Training: 2022-04-11 10:17:26,645-rank_id: 0
Training: 2022-04-11 10:17:39,953-: margin_list              [1.0, 0.5, 0.0]
Training: 2022-04-11 10:17:39,954-: network                  r50
Training: 2022-04-11 10:17:39,954-: resume                   False
Training: 2022-04-11 10:17:39,954-: output                   work_dirs/ms1mv3_r50
Training: 2022-04-11 10:17:39,954-: embedding_size           512
Training: 2022-04-11 10:17:39,954-: sample_rate              1.0
Training: 2022-04-11 10:17:39,954-: interclass_filtering_threshold 0
Training: 2022-04-11 10:17:39,954-: fp16                     True
Training: 2022-04-11 10:17:39,954-: batch_size               128
Training: 2022-04-11 10:17:39,954-: optimizer                sgd
Training: 2022-04-11 10:17:39,954-: lr                       0.1
Training: 2022-04-11 10:17:39,954-: momentum                 0.9
Training: 2022-04-11 10:17:39,955-: weight_decay             0.0005
Training: 2022-04-11 10:17:39,955-: verbose                  2000
Training: 2022-04-11 10:17:39,955-: frequent                 10
Training: 2022-04-11 10:17:39,955-: dali                     False
Training: 2022-04-11 10:17:39,955-: rec                      /train_tmp/ms1m-retinaface-t1
Training: 2022-04-11 10:17:39,955-: num_classes              93431
Training: 2022-04-11 10:17:39,955-: num_image                5179510
Training: 2022-04-11 10:17:39,955-: num_epoch                20
Training: 2022-04-11 10:17:39,955-: warmup_epoch             0
Training: 2022-04-11 10:17:39,955-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-11 10:17:39,955-: total_batch_size         1024
Training: 2022-04-11 10:17:39,955-: warmup_step              0
Training: 2022-04-11 10:17:39,955-: total_step               101160
Training: 2022-04-11 10:18:47,693-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-11 10:18:51,063-Speed 5516.93 samples/sec   Loss 46.4909   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-11 10:18:52,897-Speed 5584.33 samples/sec   Loss 47.7616   LearningRate 0.0999   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-11 10:18:54,750-Speed 5529.89 samples/sec   Loss 48.9438   LearningRate 0.0999   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-11 10:18:56,555-Speed 5677.36 samples/sec   Loss 47.4847   LearningRate 0.0999   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-11 10:18:58,388-Speed 5589.57 samples/sec   Loss 47.7831   LearningRate 0.0999   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-11 10:19:00,209-Speed 5626.29 samples/sec   Loss 47.4355   LearningRate 0.0999   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-11 10:19:02,042-Speed 5587.78 samples/sec   Loss 47.4098   LearningRate 0.0998   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-11 10:19:03,851-Speed 5666.01 samples/sec   Loss 46.9978   LearningRate 0.0998   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-11 10:19:05,670-Speed 5632.45 samples/sec   Loss 46.8949   LearningRate 0.0998   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-11 10:19:07,493-Speed 5618.04 samples/sec   Loss 46.5454   LearningRate 0.0998   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-11 10:19:09,309-Speed 5643.97 samples/sec   Loss 46.6333   LearningRate 0.0998   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 10:19:11,127-Speed 5636.34 samples/sec   Loss 46.4097   LearningRate 0.0997   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 10:19:12,938-Speed 5657.22 samples/sec   Loss 46.2107   LearningRate 0.0997   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 10:19:14,760-Speed 5621.95 samples/sec   Loss 46.0634   LearningRate 0.0997   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 10:19:16,563-Speed 5680.73 samples/sec   Loss 45.8538   LearningRate 0.0997   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 10:19:18,397-Speed 5588.11 samples/sec   Loss 45.6578   LearningRate 0.0997   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 10:19:20,208-Speed 5659.81 samples/sec   Loss 45.4912   LearningRate 0.0996   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 10:19:22,031-Speed 5620.45 samples/sec   Loss 45.3730   LearningRate 0.0996   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 10:19:23,882-Speed 5535.05 samples/sec   Loss 45.0480   LearningRate 0.0996   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 10:19:25,728-Speed 5549.95 samples/sec   Loss 45.0384   LearningRate 0.0996   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:19:27,592-Speed 5495.80 samples/sec   Loss 44.6643   LearningRate 0.0996   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:19:29,427-Speed 5584.33 samples/sec   Loss 44.5782   LearningRate 0.0995   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:19:31,237-Speed 5661.30 samples/sec   Loss 44.4282   LearningRate 0.0995   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:19:33,043-Speed 5674.77 samples/sec   Loss 44.2476   LearningRate 0.0995   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:19:34,854-Speed 5655.25 samples/sec   Loss 44.1646   LearningRate 0.0995   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:19:36,702-Speed 5544.66 samples/sec   Loss 43.8462   LearningRate 0.0995   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:19:38,514-Speed 5653.97 samples/sec   Loss 43.6716   LearningRate 0.0994   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:19:40,360-Speed 5549.07 samples/sec   Loss 43.5314   LearningRate 0.0994   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:19:42,194-Speed 5587.34 samples/sec   Loss 43.3533   LearningRate 0.0994   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:19:43,998-Speed 5679.95 samples/sec   Loss 43.2353   LearningRate 0.0994   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:19:45,810-Speed 5653.80 samples/sec   Loss 43.0359   LearningRate 0.0994   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:19:47,613-Speed 5684.82 samples/sec   Loss 42.8901   LearningRate 0.0993   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:19:49,446-Speed 5588.53 samples/sec   Loss 42.7585   LearningRate 0.0993   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:19:51,250-Speed 5678.35 samples/sec   Loss 42.5605   LearningRate 0.0993   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:19:53,050-Speed 5692.07 samples/sec   Loss 42.3554   LearningRate 0.0993   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:19:54,854-Speed 5679.23 samples/sec   Loss 42.2043   LearningRate 0.0993   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:19:56,650-Speed 5703.35 samples/sec   Loss 42.0335   LearningRate 0.0993   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:19:58,466-Speed 5643.58 samples/sec   Loss 41.7946   LearningRate 0.0992   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:00,264-Speed 5698.82 samples/sec   Loss 41.8366   LearningRate 0.0992   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:02,081-Speed 5637.54 samples/sec   Loss 41.6395   LearningRate 0.0992   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:03,889-Speed 5668.54 samples/sec   Loss 41.3742   LearningRate 0.0992   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:05,715-Speed 5610.14 samples/sec   Loss 41.3265   LearningRate 0.0992   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:07,508-Speed 5711.38 samples/sec   Loss 41.2555   LearningRate 0.0991   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:09,306-Speed 5699.33 samples/sec   Loss 41.0300   LearningRate 0.0991   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:11,100-Speed 5712.01 samples/sec   Loss 40.8966   LearningRate 0.0991   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:12,909-Speed 5664.07 samples/sec   Loss 40.7091   LearningRate 0.0991   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:14,729-Speed 5630.23 samples/sec   Loss 40.4529   LearningRate 0.0991   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:16,526-Speed 5701.34 samples/sec   Loss 40.3084   LearningRate 0.0990   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:18,320-Speed 5710.11 samples/sec   Loss 40.1962   LearningRate 0.0990   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:20,136-Speed 5641.95 samples/sec   Loss 39.9974   LearningRate 0.0990   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 10:20:21,935-Speed 5694.53 samples/sec   Loss 39.8377   LearningRate 0.0990   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 10:20:23,739-Speed 5680.85 samples/sec   Loss 39.7291   LearningRate 0.0990   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:25,537-Speed 5699.58 samples/sec   Loss 39.6395   LearningRate 0.0989   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:27,358-Speed 5623.74 samples/sec   Loss 39.4790   LearningRate 0.0989   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:29,151-Speed 5714.80 samples/sec   Loss 39.4034   LearningRate 0.0989   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:30,957-Speed 5674.18 samples/sec   Loss 39.1978   LearningRate 0.0989   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:32,753-Speed 5706.43 samples/sec   Loss 38.9289   LearningRate 0.0989   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:34,557-Speed 5677.43 samples/sec   Loss 38.6782   LearningRate 0.0988   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:36,378-Speed 5626.01 samples/sec   Loss 38.6011   LearningRate 0.0988   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:38,174-Speed 5704.00 samples/sec   Loss 38.4482   LearningRate 0.0988   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:39,979-Speed 5676.18 samples/sec   Loss 38.3041   LearningRate 0.0988   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:20:41,778-Speed 5692.56 samples/sec   Loss 38.1693   LearningRate 0.0988   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:20:43,580-Speed 5686.96 samples/sec   Loss 38.0635   LearningRate 0.0987   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:20:45,402-Speed 5622.71 samples/sec   Loss 37.8941   LearningRate 0.0987   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:20:47,248-Speed 5613.31 samples/sec   Loss 37.7611   LearningRate 0.0987   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:20:49,058-Speed 5660.32 samples/sec   Loss 37.6149   LearningRate 0.0987   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:20:50,853-Speed 5708.71 samples/sec   Loss 37.3386   LearningRate 0.0987   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:20:52,666-Speed 5648.91 samples/sec   Loss 37.1413   LearningRate 0.0986   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:20:54,463-Speed 5702.43 samples/sec   Loss 36.9727   LearningRate 0.0986   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:20:56,274-Speed 5659.32 samples/sec   Loss 36.8127   LearningRate 0.0986   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:20:58,072-Speed 5698.27 samples/sec   Loss 36.6828   LearningRate 0.0986   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:20:59,885-Speed 5649.51 samples/sec   Loss 36.5986   LearningRate 0.0986   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 10:21:01,681-Speed 5706.18 samples/sec   Loss 36.3823   LearningRate 0.0985   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:03,517-Speed 5580.56 samples/sec   Loss 36.2943   LearningRate 0.0985   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:05,316-Speed 5693.28 samples/sec   Loss 36.0890   LearningRate 0.0985   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:07,128-Speed 5654.85 samples/sec   Loss 35.8788   LearningRate 0.0985   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:08,923-Speed 5707.71 samples/sec   Loss 35.8505   LearningRate 0.0985   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:10,732-Speed 5666.43 samples/sec   Loss 35.6045   LearningRate 0.0984   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:12,538-Speed 5670.39 samples/sec   Loss 35.4374   LearningRate 0.0984   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:14,347-Speed 5664.52 samples/sec   Loss 35.2312   LearningRate 0.0984   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:16,151-Speed 5678.98 samples/sec   Loss 35.0735   LearningRate 0.0984   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:17,979-Speed 5607.07 samples/sec   Loss 35.0335   LearningRate 0.0984   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:19,766-Speed 5732.24 samples/sec   Loss 34.7683   LearningRate 0.0983   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:21,560-Speed 5709.84 samples/sec   Loss 34.7170   LearningRate 0.0983   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:23,371-Speed 5660.20 samples/sec   Loss 34.6129   LearningRate 0.0983   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:25,169-Speed 5698.14 samples/sec   Loss 34.2699   LearningRate 0.0983   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:26,983-Speed 5645.66 samples/sec   Loss 34.1601   LearningRate 0.0983   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:28,785-Speed 5688.72 samples/sec   Loss 33.9710   LearningRate 0.0982   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:30,603-Speed 5635.10 samples/sec   Loss 33.7977   LearningRate 0.0982   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:32,397-Speed 5710.21 samples/sec   Loss 33.6662   LearningRate 0.0982   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:34,190-Speed 5715.68 samples/sec   Loss 33.5587   LearningRate 0.0982   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:35,996-Speed 5673.12 samples/sec   Loss 33.2995   LearningRate 0.0982   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:37,791-Speed 5707.13 samples/sec   Loss 33.2040   LearningRate 0.0982   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 10:21:39,600-Speed 5664.66 samples/sec   Loss 33.0864   LearningRate 0.0981   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:41,409-Speed 5660.71 samples/sec   Loss 32.8163   LearningRate 0.0981   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:43,224-Speed 5648.44 samples/sec   Loss 32.6597   LearningRate 0.0981   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:45,024-Speed 5690.04 samples/sec   Loss 32.6784   LearningRate 0.0981   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:46,823-Speed 5695.66 samples/sec   Loss 32.3713   LearningRate 0.0981   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:48,638-Speed 5645.27 samples/sec   Loss 32.2157   LearningRate 0.0980   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:50,437-Speed 5693.12 samples/sec   Loss 32.0790   LearningRate 0.0980   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:52,256-Speed 5683.34 samples/sec   Loss 31.9050   LearningRate 0.0980   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:54,076-Speed 5629.71 samples/sec   Loss 31.7893   LearningRate 0.0980   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:55,875-Speed 5695.23 samples/sec   Loss 31.5327   LearningRate 0.0980   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:57,665-Speed 5722.37 samples/sec   Loss 31.3468   LearningRate 0.0979   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:21:59,483-Speed 5637.27 samples/sec   Loss 31.4009   LearningRate 0.0979   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:01,284-Speed 5689.12 samples/sec   Loss 30.9880   LearningRate 0.0979   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:03,094-Speed 5658.68 samples/sec   Loss 30.8924   LearningRate 0.0979   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:04,893-Speed 5697.56 samples/sec   Loss 30.6821   LearningRate 0.0979   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:06,707-Speed 5647.69 samples/sec   Loss 30.6243   LearningRate 0.0978   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:08,503-Speed 5706.15 samples/sec   Loss 30.4638   LearningRate 0.0978   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:10,314-Speed 5655.32 samples/sec   Loss 30.2822   LearningRate 0.0978   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:12,138-Speed 5617.76 samples/sec   Loss 29.8347   LearningRate 0.0978   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:13,954-Speed 5640.17 samples/sec   Loss 30.1160   LearningRate 0.0978   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:15,763-Speed 5664.17 samples/sec   Loss 29.8251   LearningRate 0.0977   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:17,568-Speed 5675.98 samples/sec   Loss 29.7573   LearningRate 0.0977   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:19,401-Speed 5589.08 samples/sec   Loss 29.4313   LearningRate 0.0977   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:21,199-Speed 5700.19 samples/sec   Loss 29.3663   LearningRate 0.0977   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:23,014-Speed 5643.94 samples/sec   Loss 29.2106   LearningRate 0.0977   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:24,828-Speed 5647.70 samples/sec   Loss 28.8505   LearningRate 0.0976   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:26,666-Speed 5574.80 samples/sec   Loss 28.8590   LearningRate 0.0976   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:28,473-Speed 5670.50 samples/sec   Loss 28.8144   LearningRate 0.0976   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:30,293-Speed 5629.30 samples/sec   Loss 28.7098   LearningRate 0.0976   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:32,091-Speed 5697.73 samples/sec   Loss 28.4814   LearningRate 0.0976   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:33,896-Speed 5677.03 samples/sec   Loss 28.4194   LearningRate 0.0975   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:35,748-Speed 5530.42 samples/sec   Loss 28.1322   LearningRate 0.0975   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:37,585-Speed 5576.20 samples/sec   Loss 28.0226   LearningRate 0.0975   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:39,420-Speed 5586.89 samples/sec   Loss 28.0697   LearningRate 0.0975   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:41,236-Speed 5638.19 samples/sec   Loss 27.8822   LearningRate 0.0975   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:43,036-Speed 5694.25 samples/sec   Loss 27.7145   LearningRate 0.0974   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:44,837-Speed 5687.94 samples/sec   Loss 27.7751   LearningRate 0.0974   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:46,658-Speed 5627.21 samples/sec   Loss 27.3507   LearningRate 0.0974   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:48,456-Speed 5699.84 samples/sec   Loss 27.2585   LearningRate 0.0974   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:50,277-Speed 5624.16 samples/sec   Loss 27.2788   LearningRate 0.0974   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:52,092-Speed 5645.51 samples/sec   Loss 26.8259   LearningRate 0.0973   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 10:22:53,917-Speed 5613.98 samples/sec   Loss 26.7544   LearningRate 0.0973   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:55,719-Speed 5684.27 samples/sec   Loss 26.6459   LearningRate 0.0973   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:57,517-Speed 5698.19 samples/sec   Loss 26.6205   LearningRate 0.0973   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:22:59,330-Speed 5652.76 samples/sec   Loss 26.3864   LearningRate 0.0973   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:01,150-Speed 5627.31 samples/sec   Loss 26.3627   LearningRate 0.0973   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:02,977-Speed 5609.69 samples/sec   Loss 26.3985   LearningRate 0.0972   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:04,781-Speed 5678.04 samples/sec   Loss 26.1920   LearningRate 0.0972   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:06,593-Speed 5653.96 samples/sec   Loss 26.1284   LearningRate 0.0972   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:08,414-Speed 5627.57 samples/sec   Loss 26.1102   LearningRate 0.0972   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:10,210-Speed 5703.54 samples/sec   Loss 25.8820   LearningRate 0.0972   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:12,022-Speed 5654.75 samples/sec   Loss 25.8636   LearningRate 0.0971   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:13,847-Speed 5616.44 samples/sec   Loss 25.2601   LearningRate 0.0971   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:15,654-Speed 5669.26 samples/sec   Loss 25.2155   LearningRate 0.0971   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:17,473-Speed 5632.37 samples/sec   Loss 25.0876   LearningRate 0.0971   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:19,287-Speed 5647.31 samples/sec   Loss 25.2609   LearningRate 0.0971   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:21,102-Speed 5644.65 samples/sec   Loss 24.8778   LearningRate 0.0970   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:22,901-Speed 5695.58 samples/sec   Loss 24.7665   LearningRate 0.0970   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:24,697-Speed 5702.74 samples/sec   Loss 24.8773   LearningRate 0.0970   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:26,524-Speed 5608.65 samples/sec   Loss 24.9178   LearningRate 0.0970   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:28,321-Speed 5703.35 samples/sec   Loss 24.6759   LearningRate 0.0970   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:30,130-Speed 5662.23 samples/sec   Loss 24.5354   LearningRate 0.0969   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:31,928-Speed 5699.28 samples/sec   Loss 24.4808   LearningRate 0.0969   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:33,741-Speed 5651.40 samples/sec   Loss 24.2417   LearningRate 0.0969   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:35,544-Speed 5682.56 samples/sec   Loss 24.0406   LearningRate 0.0969   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:37,354-Speed 5658.04 samples/sec   Loss 24.0455   LearningRate 0.0969   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:39,176-Speed 5625.50 samples/sec   Loss 24.0369   LearningRate 0.0968   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:41,000-Speed 5616.97 samples/sec   Loss 24.0231   LearningRate 0.0968   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:42,827-Speed 5606.98 samples/sec   Loss 23.9764   LearningRate 0.0968   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:44,624-Speed 5703.12 samples/sec   Loss 23.6806   LearningRate 0.0968   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:46,442-Speed 5635.10 samples/sec   Loss 23.7135   LearningRate 0.0968   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:48,254-Speed 5654.70 samples/sec   Loss 23.4787   LearningRate 0.0967   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 10:23:50,067-Speed 5651.36 samples/sec   Loss 23.4784   LearningRate 0.0967   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:23:51,884-Speed 5638.47 samples/sec   Loss 23.3026   LearningRate 0.0967   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 10:23:53,687-Speed 5683.30 samples/sec   Loss 23.1758   LearningRate 0.0967   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 10:23:55,495-Speed 5667.00 samples/sec   Loss 23.4219   LearningRate 0.0967   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 10:23:57,298-Speed 5680.65 samples/sec   Loss 23.2719   LearningRate 0.0966   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 10:23:59,106-Speed 5667.65 samples/sec   Loss 23.0730   LearningRate 0.0966   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 10:24:00,909-Speed 5684.11 samples/sec   Loss 22.7610   LearningRate 0.0966   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 10:24:02,725-Speed 5640.99 samples/sec   Loss 22.7564   LearningRate 0.0966   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 10:24:04,528-Speed 5682.58 samples/sec   Loss 22.6443   LearningRate 0.0966   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 10:24:06,359-Speed 5593.19 samples/sec   Loss 22.8637   LearningRate 0.0966   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 10:24:08,161-Speed 5687.03 samples/sec   Loss 22.4802   LearningRate 0.0965   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 10:24:09,971-Speed 5660.80 samples/sec   Loss 22.2499   LearningRate 0.0965   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:11,774-Speed 5683.84 samples/sec   Loss 22.2451   LearningRate 0.0965   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:13,598-Speed 5616.67 samples/sec   Loss 22.4408   LearningRate 0.0965   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:15,404-Speed 5672.11 samples/sec   Loss 22.2847   LearningRate 0.0965   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:17,235-Speed 5596.39 samples/sec   Loss 22.0673   LearningRate 0.0964   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:19,038-Speed 5683.79 samples/sec   Loss 22.1250   LearningRate 0.0964   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:20,847-Speed 5661.35 samples/sec   Loss 21.9384   LearningRate 0.0964   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:22,653-Speed 5674.06 samples/sec   Loss 21.9013   LearningRate 0.0964   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:24,483-Speed 5596.88 samples/sec   Loss 21.8593   LearningRate 0.0964   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:26,290-Speed 5672.17 samples/sec   Loss 21.7724   LearningRate 0.0963   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:28,114-Speed 5616.50 samples/sec   Loss 21.7505   LearningRate 0.0963   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 10:24:29,929-Speed 5643.44 samples/sec   Loss 21.6520   LearningRate 0.0963   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:31,744-Speed 5645.07 samples/sec   Loss 21.5013   LearningRate 0.0963   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:33,559-Speed 5644.86 samples/sec   Loss 21.3828   LearningRate 0.0963   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:35,381-Speed 5622.95 samples/sec   Loss 21.2292   LearningRate 0.0962   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:37,189-Speed 5665.88 samples/sec   Loss 21.3528   LearningRate 0.0962   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:39,012-Speed 5621.17 samples/sec   Loss 21.3797   LearningRate 0.0962   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:40,835-Speed 5621.43 samples/sec   Loss 21.2023   LearningRate 0.0962   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:42,645-Speed 5660.21 samples/sec   Loss 21.0634   LearningRate 0.0962   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:44,445-Speed 5688.78 samples/sec   Loss 20.9776   LearningRate 0.0961   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 10:24:46,252-Speed 5671.47 samples/sec   Loss 20.9944   LearningRate 0.0961   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 10:24:48,053-Speed 5691.88 samples/sec   Loss 20.9149   LearningRate 0.0961   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 10:24:49,851-Speed 5697.41 samples/sec   Loss 20.7147   LearningRate 0.0961   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 10:25:17,220-[lfw][2000]XNorm: 22.104179
Training: 2022-04-11 10:25:17,220-[lfw][2000]Accuracy-Flip: 0.98233+-0.00429
Training: 2022-04-11 10:25:17,221-[lfw][2000]Accuracy-Highest: 0.98233
Training: 2022-04-11 10:25:48,545-[cfp_fp][2000]XNorm: 19.029918
Training: 2022-04-11 10:25:48,545-[cfp_fp][2000]Accuracy-Flip: 0.78843+-0.01866
Training: 2022-04-11 10:25:48,546-[cfp_fp][2000]Accuracy-Highest: 0.78843
Training: 2022-04-11 10:26:15,466-[agedb_30][2000]XNorm: 21.376326
Training: 2022-04-11 10:26:15,466-[agedb_30][2000]Accuracy-Flip: 0.88083+-0.02024
Training: 2022-04-11 10:26:15,467-[agedb_30][2000]Accuracy-Highest: 0.88083
Training: 2022-04-11 10:26:17,293-Speed 117.11 samples/sec   Loss 20.7624   LearningRate 0.0961   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 10:26:19,124-Speed 5593.35 samples/sec   Loss 20.6601   LearningRate 0.0960   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 10:26:20,920-Speed 5702.51 samples/sec   Loss 20.5377   LearningRate 0.0960   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 10:26:22,722-Speed 5686.54 samples/sec   Loss 20.7418   LearningRate 0.0960   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 10:26:24,526-Speed 5678.60 samples/sec   Loss 20.5106   LearningRate 0.0960   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 10:26:26,341-Speed 5644.01 samples/sec   Loss 20.4104   LearningRate 0.0960   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 10:26:28,152-Speed 5657.41 samples/sec   Loss 20.5386   LearningRate 0.0959   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-11 10:26:29,961-Speed 5665.21 samples/sec   Loss 20.3654   LearningRate 0.0959   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:26:31,759-Speed 5697.23 samples/sec   Loss 20.0453   LearningRate 0.0959   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:26:33,580-Speed 5623.89 samples/sec   Loss 20.2358   LearningRate 0.0959   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:26:35,381-Speed 5690.06 samples/sec   Loss 19.9373   LearningRate 0.0959   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:26:37,193-Speed 5650.40 samples/sec   Loss 20.2047   LearningRate 0.0959   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:26:39,015-Speed 5626.84 samples/sec   Loss 19.9315   LearningRate 0.0958   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:26:40,832-Speed 5637.66 samples/sec   Loss 20.0434   LearningRate 0.0958   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:26:42,633-Speed 5686.89 samples/sec   Loss 19.8979   LearningRate 0.0958   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:26:44,434-Speed 5689.14 samples/sec   Loss 19.7741   LearningRate 0.0958   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:26:46,239-Speed 5678.06 samples/sec   Loss 19.5920   LearningRate 0.0958   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:26:48,060-Speed 5625.33 samples/sec   Loss 19.8045   LearningRate 0.0957   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:26:49,860-Speed 5692.10 samples/sec   Loss 19.4399   LearningRate 0.0957   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:26:51,680-Speed 5627.83 samples/sec   Loss 19.6926   LearningRate 0.0957   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:26:53,479-Speed 5694.70 samples/sec   Loss 19.7728   LearningRate 0.0957   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:26:55,279-Speed 5692.64 samples/sec   Loss 19.5057   LearningRate 0.0957   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:26:57,075-Speed 5702.14 samples/sec   Loss 19.4345   LearningRate 0.0956   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:26:58,874-Speed 5696.22 samples/sec   Loss 19.3661   LearningRate 0.0956   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:00,704-Speed 5599.19 samples/sec   Loss 19.4497   LearningRate 0.0956   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:02,513-Speed 5664.18 samples/sec   Loss 19.4401   LearningRate 0.0956   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:04,325-Speed 5651.25 samples/sec   Loss 19.3271   LearningRate 0.0956   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:06,117-Speed 5719.27 samples/sec   Loss 19.3011   LearningRate 0.0955   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:27:07,936-Speed 5632.05 samples/sec   Loss 19.3059   LearningRate 0.0955   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:27:09,737-Speed 5689.94 samples/sec   Loss 19.1645   LearningRate 0.0955   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:27:11,578-Speed 5563.22 samples/sec   Loss 19.0775   LearningRate 0.0955   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:27:13,380-Speed 5685.62 samples/sec   Loss 19.0453   LearningRate 0.0955   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:27:15,203-Speed 5619.51 samples/sec   Loss 19.1023   LearningRate 0.0954   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:27:17,011-Speed 5667.10 samples/sec   Loss 19.0684   LearningRate 0.0954   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:27:18,843-Speed 5593.29 samples/sec   Loss 18.8219   LearningRate 0.0954   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:27:20,640-Speed 5701.81 samples/sec   Loss 18.9022   LearningRate 0.0954   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:27:22,440-Speed 5689.72 samples/sec   Loss 18.7882   LearningRate 0.0954   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:27:24,240-Speed 5692.55 samples/sec   Loss 18.8524   LearningRate 0.0953   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:26,040-Speed 5689.95 samples/sec   Loss 18.5639   LearningRate 0.0953   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:27,864-Speed 5617.60 samples/sec   Loss 18.6735   LearningRate 0.0953   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:29,666-Speed 5685.06 samples/sec   Loss 18.7788   LearningRate 0.0953   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:31,466-Speed 5693.08 samples/sec   Loss 18.7434   LearningRate 0.0953   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:33,284-Speed 5632.91 samples/sec   Loss 18.7743   LearningRate 0.0953   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:35,085-Speed 5688.02 samples/sec   Loss 18.5661   LearningRate 0.0952   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:36,895-Speed 5661.73 samples/sec   Loss 18.4116   LearningRate 0.0952   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:38,799-Speed 5379.67 samples/sec   Loss 18.2477   LearningRate 0.0952   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:40,632-Speed 5589.32 samples/sec   Loss 18.3961   LearningRate 0.0952   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:42,440-Speed 5666.78 samples/sec   Loss 18.4272   LearningRate 0.0952   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:44,243-Speed 5679.13 samples/sec   Loss 18.1715   LearningRate 0.0951   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:46,047-Speed 5679.42 samples/sec   Loss 18.3040   LearningRate 0.0951   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:47,864-Speed 5639.15 samples/sec   Loss 18.0568   LearningRate 0.0951   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:49,736-Speed 5473.14 samples/sec   Loss 18.1629   LearningRate 0.0951   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:51,554-Speed 5633.34 samples/sec   Loss 18.0580   LearningRate 0.0951   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:53,379-Speed 5613.23 samples/sec   Loss 18.0795   LearningRate 0.0950   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:55,183-Speed 5679.52 samples/sec   Loss 18.2306   LearningRate 0.0950   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:57,012-Speed 5603.53 samples/sec   Loss 18.1229   LearningRate 0.0950   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:27:58,814-Speed 5684.06 samples/sec   Loss 18.0125   LearningRate 0.0950   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:00,623-Speed 5665.61 samples/sec   Loss 18.0086   LearningRate 0.0950   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:02,454-Speed 5594.78 samples/sec   Loss 18.0573   LearningRate 0.0949   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:04,261-Speed 5667.99 samples/sec   Loss 17.9344   LearningRate 0.0949   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:06,066-Speed 5677.94 samples/sec   Loss 17.9421   LearningRate 0.0949   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:07,878-Speed 5653.10 samples/sec   Loss 17.7839   LearningRate 0.0949   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:09,678-Speed 5693.27 samples/sec   Loss 17.8511   LearningRate 0.0949   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:11,478-Speed 5689.93 samples/sec   Loss 17.9065   LearningRate 0.0948   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:13,282-Speed 5679.01 samples/sec   Loss 17.8711   LearningRate 0.0948   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:15,104-Speed 5622.01 samples/sec   Loss 17.7877   LearningRate 0.0948   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:16,907-Speed 5681.33 samples/sec   Loss 17.7784   LearningRate 0.0948   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:18,716-Speed 5663.20 samples/sec   Loss 17.9212   LearningRate 0.0948   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 10:28:20,521-Speed 5677.89 samples/sec   Loss 17.6326   LearningRate 0.0948   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:22,349-Speed 5604.34 samples/sec   Loss 17.5514   LearningRate 0.0947   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:24,226-Speed 5455.67 samples/sec   Loss 17.5808   LearningRate 0.0947   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:26,073-Speed 5549.58 samples/sec   Loss 17.4657   LearningRate 0.0947   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:27,879-Speed 5672.15 samples/sec   Loss 17.3828   LearningRate 0.0947   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:29,713-Speed 5587.40 samples/sec   Loss 17.3145   LearningRate 0.0947   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:31,516-Speed 5680.80 samples/sec   Loss 17.5230   LearningRate 0.0946   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:33,322-Speed 5670.54 samples/sec   Loss 17.3036   LearningRate 0.0946   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:35,139-Speed 5640.43 samples/sec   Loss 17.1966   LearningRate 0.0946   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:36,941-Speed 5684.67 samples/sec   Loss 17.4028   LearningRate 0.0946   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:38,767-Speed 5610.31 samples/sec   Loss 17.5928   LearningRate 0.0946   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:40,570-Speed 5683.33 samples/sec   Loss 17.1422   LearningRate 0.0945   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:42,398-Speed 5602.24 samples/sec   Loss 17.1664   LearningRate 0.0945   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:44,217-Speed 5635.03 samples/sec   Loss 17.2146   LearningRate 0.0945   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:46,039-Speed 5623.34 samples/sec   Loss 17.2919   LearningRate 0.0945   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:47,860-Speed 5623.83 samples/sec   Loss 16.9046   LearningRate 0.0945   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:49,678-Speed 5635.70 samples/sec   Loss 17.1070   LearningRate 0.0944   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:51,505-Speed 5609.23 samples/sec   Loss 17.1812   LearningRate 0.0944   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:53,314-Speed 5662.79 samples/sec   Loss 16.9266   LearningRate 0.0944   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:55,126-Speed 5654.26 samples/sec   Loss 17.0160   LearningRate 0.0944   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:56,931-Speed 5676.89 samples/sec   Loss 17.0089   LearningRate 0.0944   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:28:58,757-Speed 5612.08 samples/sec   Loss 17.2049   LearningRate 0.0943   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:00,559-Speed 5684.92 samples/sec   Loss 17.0994   LearningRate 0.0943   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:02,377-Speed 5635.19 samples/sec   Loss 16.9177   LearningRate 0.0943   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:29:04,192-Speed 5643.56 samples/sec   Loss 16.8105   LearningRate 0.0943   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:29:06,014-Speed 5622.47 samples/sec   Loss 16.8842   LearningRate 0.0943   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:29:07,826-Speed 5653.67 samples/sec   Loss 16.9343   LearningRate 0.0943   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:29:09,627-Speed 5688.39 samples/sec   Loss 16.7080   LearningRate 0.0942   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:29:11,454-Speed 5609.11 samples/sec   Loss 16.8043   LearningRate 0.0942   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:29:13,263-Speed 5662.99 samples/sec   Loss 16.7278   LearningRate 0.0942   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:29:15,081-Speed 5634.81 samples/sec   Loss 16.5420   LearningRate 0.0942   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:29:16,890-Speed 5663.92 samples/sec   Loss 16.9258   LearningRate 0.0942   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:29:18,715-Speed 5612.62 samples/sec   Loss 16.5707   LearningRate 0.0941   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:29:20,532-Speed 5638.03 samples/sec   Loss 16.5548   LearningRate 0.0941   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:22,344-Speed 5655.88 samples/sec   Loss 16.5275   LearningRate 0.0941   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:24,146-Speed 5684.69 samples/sec   Loss 16.6147   LearningRate 0.0941   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:25,958-Speed 5654.26 samples/sec   Loss 16.6600   LearningRate 0.0941   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:27,758-Speed 5689.52 samples/sec   Loss 16.7889   LearningRate 0.0940   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:29,563-Speed 5681.31 samples/sec   Loss 16.5736   LearningRate 0.0940   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:31,381-Speed 5638.44 samples/sec   Loss 16.6153   LearningRate 0.0940   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:33,182-Speed 5686.47 samples/sec   Loss 16.4371   LearningRate 0.0940   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:35,024-Speed 5559.71 samples/sec   Loss 16.4638   LearningRate 0.0940   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:36,837-Speed 5650.73 samples/sec   Loss 16.4705   LearningRate 0.0939   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:38,650-Speed 5652.17 samples/sec   Loss 16.3947   LearningRate 0.0939   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:40,467-Speed 5638.68 samples/sec   Loss 16.5907   LearningRate 0.0939   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:42,296-Speed 5600.07 samples/sec   Loss 16.3097   LearningRate 0.0939   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:44,126-Speed 5600.90 samples/sec   Loss 16.4525   LearningRate 0.0939   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:45,927-Speed 5689.51 samples/sec   Loss 16.4288   LearningRate 0.0939   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:47,755-Speed 5605.11 samples/sec   Loss 16.1946   LearningRate 0.0938   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:49,569-Speed 5648.88 samples/sec   Loss 16.2788   LearningRate 0.0938   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:51,378-Speed 5662.51 samples/sec   Loss 16.2976   LearningRate 0.0938   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:53,207-Speed 5600.47 samples/sec   Loss 16.2102   LearningRate 0.0938   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:55,012-Speed 5674.41 samples/sec   Loss 16.0012   LearningRate 0.0938   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:56,836-Speed 5618.29 samples/sec   Loss 16.1334   LearningRate 0.0937   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:29:58,647-Speed 5657.22 samples/sec   Loss 16.0630   LearningRate 0.0937   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:00,463-Speed 5642.79 samples/sec   Loss 15.8742   LearningRate 0.0937   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:02,295-Speed 5589.75 samples/sec   Loss 15.9853   LearningRate 0.0937   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:04,109-Speed 5647.51 samples/sec   Loss 16.0923   LearningRate 0.0937   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:05,914-Speed 5676.75 samples/sec   Loss 16.1936   LearningRate 0.0936   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:07,741-Speed 5605.60 samples/sec   Loss 15.9101   LearningRate 0.0936   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:09,563-Speed 5627.82 samples/sec   Loss 16.0635   LearningRate 0.0936   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:11,372-Speed 5663.23 samples/sec   Loss 16.1890   LearningRate 0.0936   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:13,177-Speed 5674.74 samples/sec   Loss 16.0598   LearningRate 0.0936   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:14,997-Speed 5627.49 samples/sec   Loss 15.9004   LearningRate 0.0935   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:16,807-Speed 5662.53 samples/sec   Loss 15.8386   LearningRate 0.0935   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:18,645-Speed 5576.50 samples/sec   Loss 15.7893   LearningRate 0.0935   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:20,451-Speed 5670.59 samples/sec   Loss 15.8763   LearningRate 0.0935   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:22,280-Speed 5602.75 samples/sec   Loss 15.7363   LearningRate 0.0935   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:24,086-Speed 5672.06 samples/sec   Loss 15.8609   LearningRate 0.0934   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:25,913-Speed 5608.90 samples/sec   Loss 15.9805   LearningRate 0.0934   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:27,725-Speed 5653.60 samples/sec   Loss 15.8404   LearningRate 0.0934   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:29,547-Speed 5624.70 samples/sec   Loss 15.8264   LearningRate 0.0934   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:31,352-Speed 5673.39 samples/sec   Loss 15.9259   LearningRate 0.0934   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:33,159-Speed 5671.40 samples/sec   Loss 15.6376   LearningRate 0.0934   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:34,977-Speed 5634.64 samples/sec   Loss 15.5486   LearningRate 0.0933   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:36,780-Speed 5680.82 samples/sec   Loss 15.6729   LearningRate 0.0933   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:38,625-Speed 5551.79 samples/sec   Loss 15.6851   LearningRate 0.0933   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:40,436-Speed 5660.08 samples/sec   Loss 15.6951   LearningRate 0.0933   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:42,264-Speed 5603.43 samples/sec   Loss 15.8545   LearningRate 0.0933   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:44,079-Speed 5645.46 samples/sec   Loss 15.6127   LearningRate 0.0932   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:45,886-Speed 5669.94 samples/sec   Loss 15.8901   LearningRate 0.0932   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:47,697-Speed 5657.51 samples/sec   Loss 15.4876   LearningRate 0.0932   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:49,551-Speed 5524.10 samples/sec   Loss 15.5882   LearningRate 0.0932   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:51,356-Speed 5676.97 samples/sec   Loss 15.4385   LearningRate 0.0932   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:53,180-Speed 5615.49 samples/sec   Loss 15.4648   LearningRate 0.0931   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:54,985-Speed 5676.34 samples/sec   Loss 15.4455   LearningRate 0.0931   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:56,821-Speed 5582.30 samples/sec   Loss 15.3902   LearningRate 0.0931   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:30:58,629-Speed 5663.80 samples/sec   Loss 15.4605   LearningRate 0.0931   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:00,458-Speed 5601.99 samples/sec   Loss 15.3243   LearningRate 0.0931   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:02,261-Speed 5682.83 samples/sec   Loss 15.5163   LearningRate 0.0930   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:04,075-Speed 5648.90 samples/sec   Loss 15.2501   LearningRate 0.0930   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:05,903-Speed 5603.94 samples/sec   Loss 15.3311   LearningRate 0.0930   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:07,733-Speed 5598.13 samples/sec   Loss 15.5576   LearningRate 0.0930   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:09,547-Speed 5649.84 samples/sec   Loss 15.3339   LearningRate 0.0930   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:11,383-Speed 5578.31 samples/sec   Loss 15.4005   LearningRate 0.0930   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:13,206-Speed 5619.22 samples/sec   Loss 15.5160   LearningRate 0.0929   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:15,049-Speed 5558.55 samples/sec   Loss 15.4700   LearningRate 0.0929   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:16,860-Speed 5660.99 samples/sec   Loss 15.3727   LearningRate 0.0929   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:18,695-Speed 5580.13 samples/sec   Loss 15.0632   LearningRate 0.0929   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:20,509-Speed 5649.14 samples/sec   Loss 15.4167   LearningRate 0.0929   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:22,317-Speed 5665.78 samples/sec   Loss 15.2029   LearningRate 0.0928   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:24,150-Speed 5588.38 samples/sec   Loss 15.0555   LearningRate 0.0928   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:25,982-Speed 5591.26 samples/sec   Loss 15.2378   LearningRate 0.0928   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:27,797-Speed 5646.13 samples/sec   Loss 15.1146   LearningRate 0.0928   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:29,611-Speed 5648.06 samples/sec   Loss 15.3051   LearningRate 0.0928   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:31,427-Speed 5639.79 samples/sec   Loss 15.2899   LearningRate 0.0927   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:33,239-Speed 5652.89 samples/sec   Loss 15.2427   LearningRate 0.0927   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:35,044-Speed 5677.37 samples/sec   Loss 15.2869   LearningRate 0.0927   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:36,872-Speed 5604.81 samples/sec   Loss 15.0408   LearningRate 0.0927   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:38,691-Speed 5631.34 samples/sec   Loss 15.0831   LearningRate 0.0927   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:40,493-Speed 5686.91 samples/sec   Loss 14.9994   LearningRate 0.0926   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:42,300-Speed 5666.19 samples/sec   Loss 14.7811   LearningRate 0.0926   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:44,135-Speed 5583.51 samples/sec   Loss 15.1221   LearningRate 0.0926   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:45,941-Speed 5672.74 samples/sec   Loss 14.9768   LearningRate 0.0926   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:47,753-Speed 5657.11 samples/sec   Loss 15.0303   LearningRate 0.0926   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:31:49,561-Speed 5664.67 samples/sec   Loss 15.0189   LearningRate 0.0926   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:51,368-Speed 5669.61 samples/sec   Loss 15.0276   LearningRate 0.0925   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:53,234-Speed 5491.54 samples/sec   Loss 15.2168   LearningRate 0.0925   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:55,059-Speed 5613.11 samples/sec   Loss 14.9617   LearningRate 0.0925   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:56,872-Speed 5648.11 samples/sec   Loss 15.1094   LearningRate 0.0925   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:31:58,699-Speed 5609.44 samples/sec   Loss 14.8623   LearningRate 0.0925   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:32:00,503-Speed 5680.56 samples/sec   Loss 15.1127   LearningRate 0.0924   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:32:02,347-Speed 5556.48 samples/sec   Loss 15.0681   LearningRate 0.0924   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:32:04,168-Speed 5622.36 samples/sec   Loss 14.9873   LearningRate 0.0924   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:32:05,989-Speed 5627.90 samples/sec   Loss 14.8364   LearningRate 0.0924   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:32:07,795-Speed 5671.93 samples/sec   Loss 14.9906   LearningRate 0.0924   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:32:09,630-Speed 5585.88 samples/sec   Loss 15.0032   LearningRate 0.0923   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:32:11,452-Speed 5622.42 samples/sec   Loss 14.8255   LearningRate 0.0923   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:32:13,262-Speed 5660.35 samples/sec   Loss 15.0347   LearningRate 0.0923   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:32:15,079-Speed 5636.48 samples/sec   Loss 14.8245   LearningRate 0.0923   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:32:16,891-Speed 5656.11 samples/sec   Loss 15.0418   LearningRate 0.0923   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:32:18,717-Speed 5608.87 samples/sec   Loss 14.7038   LearningRate 0.0922   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:32:46,140-[lfw][4000]XNorm: 23.260032
Training: 2022-04-11 10:32:46,141-[lfw][4000]Accuracy-Flip: 0.99183+-0.00431
Training: 2022-04-11 10:32:46,141-[lfw][4000]Accuracy-Highest: 0.99183
Training: 2022-04-11 10:33:17,820-[cfp_fp][4000]XNorm: 20.164622
Training: 2022-04-11 10:33:17,821-[cfp_fp][4000]Accuracy-Flip: 0.89571+-0.01525
Training: 2022-04-11 10:33:17,822-[cfp_fp][4000]Accuracy-Highest: 0.89571
Training: 2022-04-11 10:33:44,834-[agedb_30][4000]XNorm: 22.959244
Training: 2022-04-11 10:33:44,834-[agedb_30][4000]Accuracy-Flip: 0.93717+-0.01436
Training: 2022-04-11 10:33:44,835-[agedb_30][4000]Accuracy-Highest: 0.93717
Training: 2022-04-11 10:33:46,674-Speed 116.42 samples/sec   Loss 14.8352   LearningRate 0.0922   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:33:48,525-Speed 5532.77 samples/sec   Loss 14.7560   LearningRate 0.0922   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:33:50,337-Speed 5654.31 samples/sec   Loss 15.0010   LearningRate 0.0922   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:33:52,146-Speed 5662.88 samples/sec   Loss 14.6601   LearningRate 0.0922   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:33:53,947-Speed 5688.89 samples/sec   Loss 14.6642   LearningRate 0.0922   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:33:55,742-Speed 5705.81 samples/sec   Loss 14.9258   LearningRate 0.0921   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:33:57,538-Speed 5704.88 samples/sec   Loss 14.6584   LearningRate 0.0921   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:33:59,367-Speed 5600.43 samples/sec   Loss 14.6031   LearningRate 0.0921   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:34:01,166-Speed 5695.34 samples/sec   Loss 14.8554   LearningRate 0.0921   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:34:02,990-Speed 5617.01 samples/sec   Loss 14.5310   LearningRate 0.0921   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:04,796-Speed 5673.33 samples/sec   Loss 14.6064   LearningRate 0.0920   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:06,592-Speed 5703.79 samples/sec   Loss 14.6974   LearningRate 0.0920   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:08,403-Speed 5657.59 samples/sec   Loss 14.7565   LearningRate 0.0920   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:10,199-Speed 5704.49 samples/sec   Loss 14.7896   LearningRate 0.0920   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:12,032-Speed 5587.52 samples/sec   Loss 14.7258   LearningRate 0.0920   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:13,829-Speed 5702.00 samples/sec   Loss 14.5509   LearningRate 0.0919   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:15,655-Speed 5611.41 samples/sec   Loss 14.6705   LearningRate 0.0919   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:17,490-Speed 5585.81 samples/sec   Loss 14.6242   LearningRate 0.0919   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:19,305-Speed 5643.38 samples/sec   Loss 14.5372   LearningRate 0.0919   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:21,107-Speed 5684.13 samples/sec   Loss 14.6084   LearningRate 0.0919   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:22,922-Speed 5643.81 samples/sec   Loss 14.6871   LearningRate 0.0918   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:24,741-Speed 5635.35 samples/sec   Loss 14.6208   LearningRate 0.0918   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:26,552-Speed 5656.35 samples/sec   Loss 14.5753   LearningRate 0.0918   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:28,380-Speed 5602.93 samples/sec   Loss 14.4113   LearningRate 0.0918   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:30,197-Speed 5639.35 samples/sec   Loss 14.5555   LearningRate 0.0918   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:31,999-Speed 5687.05 samples/sec   Loss 14.5779   LearningRate 0.0918   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:33,821-Speed 5621.21 samples/sec   Loss 14.5375   LearningRate 0.0917   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:35,632-Speed 5659.45 samples/sec   Loss 14.4429   LearningRate 0.0917   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:37,479-Speed 5547.29 samples/sec   Loss 14.5675   LearningRate 0.0917   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:39,283-Speed 5678.61 samples/sec   Loss 14.5345   LearningRate 0.0917   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:41,089-Speed 5671.76 samples/sec   Loss 14.3902   LearningRate 0.0917   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:42,921-Speed 5593.55 samples/sec   Loss 14.5561   LearningRate 0.0916   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:44,719-Speed 5695.47 samples/sec   Loss 14.4433   LearningRate 0.0916   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:46,543-Speed 5619.58 samples/sec   Loss 14.2751   LearningRate 0.0916   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:48,352-Speed 5664.26 samples/sec   Loss 14.3524   LearningRate 0.0916   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:50,157-Speed 5679.17 samples/sec   Loss 14.2299   LearningRate 0.0916   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:52,018-Speed 5505.20 samples/sec   Loss 14.4516   LearningRate 0.0915   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:53,816-Speed 5697.98 samples/sec   Loss 14.5251   LearningRate 0.0915   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:55,635-Speed 5631.22 samples/sec   Loss 14.5187   LearningRate 0.0915   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:57,452-Speed 5641.58 samples/sec   Loss 14.1329   LearningRate 0.0915   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:34:59,273-Speed 5631.16 samples/sec   Loss 14.6178   LearningRate 0.0915   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:01,073-Speed 5692.29 samples/sec   Loss 14.4046   LearningRate 0.0915   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:02,875-Speed 5683.11 samples/sec   Loss 14.2910   LearningRate 0.0914   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:04,701-Speed 5612.41 samples/sec   Loss 14.4359   LearningRate 0.0914   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:35:06,522-Speed 5624.14 samples/sec   Loss 14.4591   LearningRate 0.0914   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:35:08,324-Speed 5686.29 samples/sec   Loss 14.3280   LearningRate 0.0914   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:35:10,140-Speed 5643.51 samples/sec   Loss 14.3956   LearningRate 0.0914   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:35:11,978-Speed 5573.11 samples/sec   Loss 14.1635   LearningRate 0.0913   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:35:13,792-Speed 5646.27 samples/sec   Loss 14.3969   LearningRate 0.0913   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:35:15,590-Speed 5697.12 samples/sec   Loss 14.1759   LearningRate 0.0913   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:35:17,410-Speed 5629.75 samples/sec   Loss 14.2035   LearningRate 0.0913   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:35:19,212-Speed 5687.25 samples/sec   Loss 14.1490   LearningRate 0.0913   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:35:21,032-Speed 5629.52 samples/sec   Loss 14.2135   LearningRate 0.0912   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:35:22,847-Speed 5642.70 samples/sec   Loss 14.2401   LearningRate 0.0912   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:24,651-Speed 5677.25 samples/sec   Loss 14.1957   LearningRate 0.0912   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:26,459-Speed 5668.16 samples/sec   Loss 14.2022   LearningRate 0.0912   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:28,257-Speed 5696.28 samples/sec   Loss 14.2901   LearningRate 0.0912   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:30,083-Speed 5611.11 samples/sec   Loss 14.0914   LearningRate 0.0912   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:31,895-Speed 5654.71 samples/sec   Loss 14.0757   LearningRate 0.0911   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:33,701-Speed 5672.81 samples/sec   Loss 14.0173   LearningRate 0.0911   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:35,519-Speed 5635.46 samples/sec   Loss 13.9970   LearningRate 0.0911   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:37,349-Speed 5596.93 samples/sec   Loss 14.0917   LearningRate 0.0911   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:39,152-Speed 5683.24 samples/sec   Loss 14.0596   LearningRate 0.0911   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:40,963-Speed 5657.72 samples/sec   Loss 14.0620   LearningRate 0.0910   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:42,778-Speed 5643.48 samples/sec   Loss 14.1978   LearningRate 0.0910   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:44,588-Speed 5657.28 samples/sec   Loss 14.0539   LearningRate 0.0910   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:46,407-Speed 5633.86 samples/sec   Loss 14.0892   LearningRate 0.0910   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:48,227-Speed 5628.89 samples/sec   Loss 14.1379   LearningRate 0.0910   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:50,041-Speed 5647.49 samples/sec   Loss 14.2191   LearningRate 0.0909   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:51,842-Speed 5685.93 samples/sec   Loss 14.0157   LearningRate 0.0909   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:53,642-Speed 5692.82 samples/sec   Loss 14.0730   LearningRate 0.0909   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:55,443-Speed 5685.79 samples/sec   Loss 14.1903   LearningRate 0.0909   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:57,243-Speed 5692.23 samples/sec   Loss 13.9665   LearningRate 0.0909   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:35:59,058-Speed 5643.17 samples/sec   Loss 14.1053   LearningRate 0.0908   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:00,851-Speed 5713.54 samples/sec   Loss 13.9622   LearningRate 0.0908   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:02,668-Speed 5639.73 samples/sec   Loss 14.0518   LearningRate 0.0908   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:04,501-Speed 5589.08 samples/sec   Loss 14.0174   LearningRate 0.0908   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:06,312-Speed 5656.26 samples/sec   Loss 13.9407   LearningRate 0.0908   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:08,145-Speed 5587.35 samples/sec   Loss 14.1331   LearningRate 0.0908   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:09,988-Speed 5559.84 samples/sec   Loss 14.0537   LearningRate 0.0907   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:11,790-Speed 5685.79 samples/sec   Loss 13.8678   LearningRate 0.0907   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:13,591-Speed 5687.21 samples/sec   Loss 13.8339   LearningRate 0.0907   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:15,389-Speed 5698.35 samples/sec   Loss 13.9297   LearningRate 0.0907   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:17,198-Speed 5663.00 samples/sec   Loss 13.9346   LearningRate 0.0907   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:19,007-Speed 5662.31 samples/sec   Loss 13.8631   LearningRate 0.0906   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:20,810-Speed 5681.41 samples/sec   Loss 13.7722   LearningRate 0.0906   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:22,615-Speed 5675.55 samples/sec   Loss 13.6137   LearningRate 0.0906   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:24,420-Speed 5675.64 samples/sec   Loss 13.6963   LearningRate 0.0906   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:26,229-Speed 5661.99 samples/sec   Loss 14.0454   LearningRate 0.0906   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:28,030-Speed 5687.19 samples/sec   Loss 13.8097   LearningRate 0.0905   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:29,839-Speed 5663.84 samples/sec   Loss 14.0158   LearningRate 0.0905   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:31,654-Speed 5643.56 samples/sec   Loss 13.6022   LearningRate 0.0905   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:33,463-Speed 5662.89 samples/sec   Loss 13.8251   LearningRate 0.0905   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:35,281-Speed 5634.20 samples/sec   Loss 13.6397   LearningRate 0.0905   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:37,099-Speed 5635.56 samples/sec   Loss 13.7075   LearningRate 0.0905   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:38,899-Speed 5692.91 samples/sec   Loss 13.8186   LearningRate 0.0904   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:40,725-Speed 5610.90 samples/sec   Loss 13.9245   LearningRate 0.0904   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:42,536-Speed 5655.81 samples/sec   Loss 13.7158   LearningRate 0.0904   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:36:44,339-Speed 5682.37 samples/sec   Loss 13.8730   LearningRate 0.0904   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:46,151-Speed 5652.60 samples/sec   Loss 13.7772   LearningRate 0.0904   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:47,959-Speed 5667.94 samples/sec   Loss 13.9282   LearningRate 0.0903   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:49,783-Speed 5617.14 samples/sec   Loss 13.7227   LearningRate 0.0903   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:51,587-Speed 5678.14 samples/sec   Loss 13.9853   LearningRate 0.0903   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:53,408-Speed 5623.99 samples/sec   Loss 13.6879   LearningRate 0.0903   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:36:55,289-Speed 5446.22 samples/sec   Loss 13.7270   LearningRate 0.0903   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:37:06,416-Speed 920.43 samples/sec   Loss 13.5324   LearningRate 0.0902   Epoch: 1   Global Step: 5060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:37:08,279-Speed 5502.00 samples/sec   Loss 12.7101   LearningRate 0.0902   Epoch: 1   Global Step: 5070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:37:10,119-Speed 5568.73 samples/sec   Loss 12.6316   LearningRate 0.0902   Epoch: 1   Global Step: 5080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:37:12,312-Speed 4672.67 samples/sec   Loss 12.7901   LearningRate 0.0902   Epoch: 1   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:14,183-Speed 5474.43 samples/sec   Loss 12.9441   LearningRate 0.0902   Epoch: 1   Global Step: 5100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:16,007-Speed 5620.51 samples/sec   Loss 12.5830   LearningRate 0.0902   Epoch: 1   Global Step: 5110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:17,866-Speed 5512.19 samples/sec   Loss 12.7615   LearningRate 0.0901   Epoch: 1   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:19,686-Speed 5628.77 samples/sec   Loss 12.8707   LearningRate 0.0901   Epoch: 1   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:21,491-Speed 5674.53 samples/sec   Loss 12.9579   LearningRate 0.0901   Epoch: 1   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:23,361-Speed 5482.46 samples/sec   Loss 12.5509   LearningRate 0.0901   Epoch: 1   Global Step: 5150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:25,182-Speed 5624.37 samples/sec   Loss 13.0103   LearningRate 0.0901   Epoch: 1   Global Step: 5160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:27,073-Speed 5418.23 samples/sec   Loss 12.8882   LearningRate 0.0900   Epoch: 1   Global Step: 5170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:28,903-Speed 5599.09 samples/sec   Loss 12.8982   LearningRate 0.0900   Epoch: 1   Global Step: 5180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:30,745-Speed 5561.18 samples/sec   Loss 12.7832   LearningRate 0.0900   Epoch: 1   Global Step: 5190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:32,552-Speed 5671.06 samples/sec   Loss 12.9718   LearningRate 0.0900   Epoch: 1   Global Step: 5200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:34,374-Speed 5620.63 samples/sec   Loss 13.0446   LearningRate 0.0900   Epoch: 1   Global Step: 5210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:36,211-Speed 5575.52 samples/sec   Loss 12.9717   LearningRate 0.0899   Epoch: 1   Global Step: 5220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:38,111-Speed 5392.11 samples/sec   Loss 12.9838   LearningRate 0.0899   Epoch: 1   Global Step: 5230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:39,927-Speed 5641.45 samples/sec   Loss 12.8896   LearningRate 0.0899   Epoch: 1   Global Step: 5240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:41,826-Speed 5394.45 samples/sec   Loss 12.9028   LearningRate 0.0899   Epoch: 1   Global Step: 5250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:43,632-Speed 5673.17 samples/sec   Loss 13.1036   LearningRate 0.0899   Epoch: 1   Global Step: 5260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:45,443-Speed 5657.97 samples/sec   Loss 12.9867   LearningRate 0.0899   Epoch: 1   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:47,251-Speed 5665.06 samples/sec   Loss 12.9529   LearningRate 0.0898   Epoch: 1   Global Step: 5280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:49,055-Speed 5679.81 samples/sec   Loss 13.0769   LearningRate 0.0898   Epoch: 1   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:50,871-Speed 5639.77 samples/sec   Loss 13.0320   LearningRate 0.0898   Epoch: 1   Global Step: 5300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:52,725-Speed 5527.67 samples/sec   Loss 12.9372   LearningRate 0.0898   Epoch: 1   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:54,558-Speed 5587.96 samples/sec   Loss 13.1562   LearningRate 0.0898   Epoch: 1   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:56,410-Speed 5531.21 samples/sec   Loss 13.1167   LearningRate 0.0897   Epoch: 1   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:37:58,233-Speed 5620.02 samples/sec   Loss 13.0225   LearningRate 0.0897   Epoch: 1   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:00,064-Speed 5592.58 samples/sec   Loss 12.9835   LearningRate 0.0897   Epoch: 1   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:01,885-Speed 5626.09 samples/sec   Loss 13.1080   LearningRate 0.0897   Epoch: 1   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:03,714-Speed 5600.97 samples/sec   Loss 12.9629   LearningRate 0.0897   Epoch: 1   Global Step: 5370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:05,713-Speed 5126.55 samples/sec   Loss 13.0872   LearningRate 0.0896   Epoch: 1   Global Step: 5380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:07,568-Speed 5524.43 samples/sec   Loss 13.2431   LearningRate 0.0896   Epoch: 1   Global Step: 5390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:09,375-Speed 5666.66 samples/sec   Loss 13.0025   LearningRate 0.0896   Epoch: 1   Global Step: 5400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:11,185-Speed 5661.89 samples/sec   Loss 13.1376   LearningRate 0.0896   Epoch: 1   Global Step: 5410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:13,012-Speed 5607.12 samples/sec   Loss 13.2745   LearningRate 0.0896   Epoch: 1   Global Step: 5420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:14,831-Speed 5632.28 samples/sec   Loss 13.2431   LearningRate 0.0896   Epoch: 1   Global Step: 5430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:16,701-Speed 5479.26 samples/sec   Loss 12.9731   LearningRate 0.0895   Epoch: 1   Global Step: 5440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:18,546-Speed 5551.75 samples/sec   Loss 13.0090   LearningRate 0.0895   Epoch: 1   Global Step: 5450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:20,380-Speed 5585.57 samples/sec   Loss 13.3556   LearningRate 0.0895   Epoch: 1   Global Step: 5460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:22,224-Speed 5557.75 samples/sec   Loss 12.8313   LearningRate 0.0895   Epoch: 1   Global Step: 5470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:24,041-Speed 5637.47 samples/sec   Loss 13.0894   LearningRate 0.0895   Epoch: 1   Global Step: 5480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:25,920-Speed 5450.83 samples/sec   Loss 12.9563   LearningRate 0.0894   Epoch: 1   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:27,794-Speed 5465.56 samples/sec   Loss 13.0975   LearningRate 0.0894   Epoch: 1   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:29,606-Speed 5655.38 samples/sec   Loss 13.0973   LearningRate 0.0894   Epoch: 1   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:31,431-Speed 5611.39 samples/sec   Loss 13.1723   LearningRate 0.0894   Epoch: 1   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:33,233-Speed 5685.19 samples/sec   Loss 12.7862   LearningRate 0.0894   Epoch: 1   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:35,039-Speed 5671.24 samples/sec   Loss 13.0970   LearningRate 0.0893   Epoch: 1   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:36,892-Speed 5530.19 samples/sec   Loss 12.9878   LearningRate 0.0893   Epoch: 1   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:38,698-Speed 5670.87 samples/sec   Loss 13.0856   LearningRate 0.0893   Epoch: 1   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:40,510-Speed 5654.31 samples/sec   Loss 13.0774   LearningRate 0.0893   Epoch: 1   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:42,350-Speed 5566.90 samples/sec   Loss 13.1334   LearningRate 0.0893   Epoch: 1   Global Step: 5580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:44,152-Speed 5686.13 samples/sec   Loss 13.1176   LearningRate 0.0893   Epoch: 1   Global Step: 5590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:45,972-Speed 5630.77 samples/sec   Loss 12.8540   LearningRate 0.0892   Epoch: 1   Global Step: 5600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:47,774-Speed 5683.47 samples/sec   Loss 13.2200   LearningRate 0.0892   Epoch: 1   Global Step: 5610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:49,581-Speed 5667.52 samples/sec   Loss 13.0439   LearningRate 0.0892   Epoch: 1   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:51,387-Speed 5674.67 samples/sec   Loss 12.9588   LearningRate 0.0892   Epoch: 1   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:53,191-Speed 5679.60 samples/sec   Loss 13.1851   LearningRate 0.0892   Epoch: 1   Global Step: 5640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:54,999-Speed 5663.03 samples/sec   Loss 13.0537   LearningRate 0.0891   Epoch: 1   Global Step: 5650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:38:56,823-Speed 5616.17 samples/sec   Loss 12.9605   LearningRate 0.0891   Epoch: 1   Global Step: 5660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:38:58,662-Speed 5572.97 samples/sec   Loss 13.1506   LearningRate 0.0891   Epoch: 1   Global Step: 5670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:00,465-Speed 5681.15 samples/sec   Loss 13.0340   LearningRate 0.0891   Epoch: 1   Global Step: 5680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:02,299-Speed 5585.14 samples/sec   Loss 13.0677   LearningRate 0.0891   Epoch: 1   Global Step: 5690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:04,108-Speed 5663.16 samples/sec   Loss 13.0480   LearningRate 0.0890   Epoch: 1   Global Step: 5700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:05,914-Speed 5672.65 samples/sec   Loss 13.0132   LearningRate 0.0890   Epoch: 1   Global Step: 5710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:07,729-Speed 5644.66 samples/sec   Loss 13.2471   LearningRate 0.0890   Epoch: 1   Global Step: 5720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:09,556-Speed 5605.90 samples/sec   Loss 13.0837   LearningRate 0.0890   Epoch: 1   Global Step: 5730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:11,375-Speed 5634.81 samples/sec   Loss 13.2122   LearningRate 0.0890   Epoch: 1   Global Step: 5740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:13,194-Speed 5631.08 samples/sec   Loss 12.8343   LearningRate 0.0890   Epoch: 1   Global Step: 5750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:15,001-Speed 5669.84 samples/sec   Loss 12.9690   LearningRate 0.0889   Epoch: 1   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:16,805-Speed 5675.81 samples/sec   Loss 13.1013   LearningRate 0.0889   Epoch: 1   Global Step: 5770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:18,607-Speed 5686.87 samples/sec   Loss 12.9236   LearningRate 0.0889   Epoch: 1   Global Step: 5780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:20,412-Speed 5674.20 samples/sec   Loss 13.0280   LearningRate 0.0889   Epoch: 1   Global Step: 5790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:22,223-Speed 5660.01 samples/sec   Loss 13.0497   LearningRate 0.0889   Epoch: 1   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:24,034-Speed 5655.50 samples/sec   Loss 12.7674   LearningRate 0.0888   Epoch: 1   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:25,841-Speed 5670.48 samples/sec   Loss 12.8418   LearningRate 0.0888   Epoch: 1   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:27,667-Speed 5610.12 samples/sec   Loss 12.9545   LearningRate 0.0888   Epoch: 1   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:29,502-Speed 5583.74 samples/sec   Loss 12.9573   LearningRate 0.0888   Epoch: 1   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:31,327-Speed 5612.29 samples/sec   Loss 12.9757   LearningRate 0.0888   Epoch: 1   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:33,125-Speed 5697.97 samples/sec   Loss 13.1358   LearningRate 0.0887   Epoch: 1   Global Step: 5860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:34,944-Speed 5633.60 samples/sec   Loss 13.0596   LearningRate 0.0887   Epoch: 1   Global Step: 5870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:36,750-Speed 5669.74 samples/sec   Loss 12.9641   LearningRate 0.0887   Epoch: 1   Global Step: 5880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:38,581-Speed 5596.50 samples/sec   Loss 13.0119   LearningRate 0.0887   Epoch: 1   Global Step: 5890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:40,390-Speed 5661.43 samples/sec   Loss 13.1278   LearningRate 0.0887   Epoch: 1   Global Step: 5900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:42,227-Speed 5576.21 samples/sec   Loss 12.9312   LearningRate 0.0887   Epoch: 1   Global Step: 5910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:44,036-Speed 5664.65 samples/sec   Loss 13.0090   LearningRate 0.0886   Epoch: 1   Global Step: 5920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:45,839-Speed 5682.28 samples/sec   Loss 13.0311   LearningRate 0.0886   Epoch: 1   Global Step: 5930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:47,647-Speed 5665.09 samples/sec   Loss 13.0848   LearningRate 0.0886   Epoch: 1   Global Step: 5940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:49,457-Speed 5659.15 samples/sec   Loss 12.9528   LearningRate 0.0886   Epoch: 1   Global Step: 5950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:39:51,264-Speed 5671.59 samples/sec   Loss 13.0910   LearningRate 0.0886   Epoch: 1   Global Step: 5960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:53,087-Speed 5620.49 samples/sec   Loss 12.9472   LearningRate 0.0885   Epoch: 1   Global Step: 5970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:54,902-Speed 5643.40 samples/sec   Loss 12.9948   LearningRate 0.0885   Epoch: 1   Global Step: 5980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:56,706-Speed 5675.81 samples/sec   Loss 12.7992   LearningRate 0.0885   Epoch: 1   Global Step: 5990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:39:58,519-Speed 5650.29 samples/sec   Loss 12.9557   LearningRate 0.0885   Epoch: 1   Global Step: 6000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:40:25,899-[lfw][6000]XNorm: 21.248232
Training: 2022-04-11 10:40:25,900-[lfw][6000]Accuracy-Flip: 0.99300+-0.00407
Training: 2022-04-11 10:40:25,900-[lfw][6000]Accuracy-Highest: 0.99300
Training: 2022-04-11 10:40:57,496-[cfp_fp][6000]XNorm: 18.533061
Training: 2022-04-11 10:40:57,498-[cfp_fp][6000]Accuracy-Flip: 0.90586+-0.01327
Training: 2022-04-11 10:40:57,498-[cfp_fp][6000]Accuracy-Highest: 0.90586
Training: 2022-04-11 10:41:24,454-[agedb_30][6000]XNorm: 20.919483
Training: 2022-04-11 10:41:24,455-[agedb_30][6000]Accuracy-Flip: 0.95100+-0.01057
Training: 2022-04-11 10:41:24,455-[agedb_30][6000]Accuracy-Highest: 0.95100
Training: 2022-04-11 10:41:26,286-Speed 116.67 samples/sec   Loss 13.0460   LearningRate 0.0885   Epoch: 1   Global Step: 6010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:28,112-Speed 5611.59 samples/sec   Loss 13.0969   LearningRate 0.0885   Epoch: 1   Global Step: 6020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:29,953-Speed 5564.02 samples/sec   Loss 13.0301   LearningRate 0.0884   Epoch: 1   Global Step: 6030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:31,783-Speed 5603.00 samples/sec   Loss 12.8132   LearningRate 0.0884   Epoch: 1   Global Step: 6040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:33,619-Speed 5582.04 samples/sec   Loss 13.1540   LearningRate 0.0884   Epoch: 1   Global Step: 6050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:35,417-Speed 5699.90 samples/sec   Loss 12.7747   LearningRate 0.0884   Epoch: 1   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:37,239-Speed 5623.87 samples/sec   Loss 13.0897   LearningRate 0.0884   Epoch: 1   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:39,061-Speed 5622.17 samples/sec   Loss 12.8382   LearningRate 0.0883   Epoch: 1   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:40,898-Speed 5578.71 samples/sec   Loss 12.9663   LearningRate 0.0883   Epoch: 1   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:42,741-Speed 5559.85 samples/sec   Loss 12.9375   LearningRate 0.0883   Epoch: 1   Global Step: 6100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:44,542-Speed 5689.28 samples/sec   Loss 12.8034   LearningRate 0.0883   Epoch: 1   Global Step: 6110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:46,403-Speed 5506.95 samples/sec   Loss 12.8811   LearningRate 0.0883   Epoch: 1   Global Step: 6120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:48,203-Speed 5689.19 samples/sec   Loss 12.9183   LearningRate 0.0882   Epoch: 1   Global Step: 6130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:50,014-Speed 5657.44 samples/sec   Loss 12.9950   LearningRate 0.0882   Epoch: 1   Global Step: 6140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:51,830-Speed 5640.71 samples/sec   Loss 12.9172   LearningRate 0.0882   Epoch: 1   Global Step: 6150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:53,678-Speed 5546.40 samples/sec   Loss 13.0348   LearningRate 0.0882   Epoch: 1   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:55,515-Speed 5578.48 samples/sec   Loss 12.8712   LearningRate 0.0882   Epoch: 1   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:57,358-Speed 5558.06 samples/sec   Loss 13.0438   LearningRate 0.0882   Epoch: 1   Global Step: 6180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:41:59,171-Speed 5653.57 samples/sec   Loss 12.8557   LearningRate 0.0881   Epoch: 1   Global Step: 6190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:00,993-Speed 5624.94 samples/sec   Loss 12.9627   LearningRate 0.0881   Epoch: 1   Global Step: 6200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:02,829-Speed 5582.68 samples/sec   Loss 12.9853   LearningRate 0.0881   Epoch: 1   Global Step: 6210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:04,642-Speed 5651.88 samples/sec   Loss 12.7977   LearningRate 0.0881   Epoch: 1   Global Step: 6220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:06,473-Speed 5595.58 samples/sec   Loss 12.8056   LearningRate 0.0881   Epoch: 1   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:08,282-Speed 5664.62 samples/sec   Loss 12.7324   LearningRate 0.0880   Epoch: 1   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:10,103-Speed 5625.99 samples/sec   Loss 12.8508   LearningRate 0.0880   Epoch: 1   Global Step: 6250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:11,915-Speed 5655.30 samples/sec   Loss 12.9397   LearningRate 0.0880   Epoch: 1   Global Step: 6260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:13,731-Speed 5642.46 samples/sec   Loss 12.9782   LearningRate 0.0880   Epoch: 1   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:15,545-Speed 5645.32 samples/sec   Loss 12.8440   LearningRate 0.0880   Epoch: 1   Global Step: 6280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:17,389-Speed 5558.71 samples/sec   Loss 12.7867   LearningRate 0.0880   Epoch: 1   Global Step: 6290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:19,205-Speed 5639.25 samples/sec   Loss 13.0146   LearningRate 0.0879   Epoch: 1   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:21,030-Speed 5615.02 samples/sec   Loss 12.5868   LearningRate 0.0879   Epoch: 1   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:22,836-Speed 5675.25 samples/sec   Loss 12.9063   LearningRate 0.0879   Epoch: 1   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:24,680-Speed 5553.38 samples/sec   Loss 12.8544   LearningRate 0.0879   Epoch: 1   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:26,501-Speed 5626.39 samples/sec   Loss 12.8457   LearningRate 0.0879   Epoch: 1   Global Step: 6340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:28,309-Speed 5667.67 samples/sec   Loss 12.6736   LearningRate 0.0878   Epoch: 1   Global Step: 6350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:30,131-Speed 5623.25 samples/sec   Loss 12.7501   LearningRate 0.0878   Epoch: 1   Global Step: 6360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:31,937-Speed 5672.05 samples/sec   Loss 12.7433   LearningRate 0.0878   Epoch: 1   Global Step: 6370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:33,757-Speed 5630.15 samples/sec   Loss 12.7742   LearningRate 0.0878   Epoch: 1   Global Step: 6380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:35,612-Speed 5525.07 samples/sec   Loss 12.9016   LearningRate 0.0878   Epoch: 1   Global Step: 6390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:37,417-Speed 5675.05 samples/sec   Loss 12.8651   LearningRate 0.0877   Epoch: 1   Global Step: 6400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:39,237-Speed 5631.40 samples/sec   Loss 12.8425   LearningRate 0.0877   Epoch: 1   Global Step: 6410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:41,071-Speed 5588.19 samples/sec   Loss 12.8022   LearningRate 0.0877   Epoch: 1   Global Step: 6420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:42,886-Speed 5643.39 samples/sec   Loss 12.7121   LearningRate 0.0877   Epoch: 1   Global Step: 6430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:44,698-Speed 5656.06 samples/sec   Loss 12.9270   LearningRate 0.0877   Epoch: 1   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:46,533-Speed 5581.95 samples/sec   Loss 12.7209   LearningRate 0.0877   Epoch: 1   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:48,335-Speed 5684.45 samples/sec   Loss 12.5311   LearningRate 0.0876   Epoch: 1   Global Step: 6460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:50,141-Speed 5675.03 samples/sec   Loss 12.8062   LearningRate 0.0876   Epoch: 1   Global Step: 6470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:51,963-Speed 5621.83 samples/sec   Loss 12.7824   LearningRate 0.0876   Epoch: 1   Global Step: 6480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:53,769-Speed 5673.79 samples/sec   Loss 12.6871   LearningRate 0.0876   Epoch: 1   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:55,606-Speed 5576.86 samples/sec   Loss 12.7557   LearningRate 0.0876   Epoch: 1   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:57,439-Speed 5588.68 samples/sec   Loss 12.6582   LearningRate 0.0875   Epoch: 1   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:42:59,260-Speed 5627.20 samples/sec   Loss 12.6049   LearningRate 0.0875   Epoch: 1   Global Step: 6520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:01,069-Speed 5665.11 samples/sec   Loss 12.5169   LearningRate 0.0875   Epoch: 1   Global Step: 6530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:02,873-Speed 5679.92 samples/sec   Loss 12.6284   LearningRate 0.0875   Epoch: 1   Global Step: 6540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:04,687-Speed 5645.57 samples/sec   Loss 12.5024   LearningRate 0.0875   Epoch: 1   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:06,542-Speed 5522.80 samples/sec   Loss 12.5904   LearningRate 0.0875   Epoch: 1   Global Step: 6560   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 10:43:08,364-Speed 5625.51 samples/sec   Loss 12.6673   LearningRate 0.0874   Epoch: 1   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:10,173-Speed 5664.34 samples/sec   Loss 12.5183   LearningRate 0.0874   Epoch: 1   Global Step: 6580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:11,978-Speed 5672.83 samples/sec   Loss 12.6507   LearningRate 0.0874   Epoch: 1   Global Step: 6590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:13,809-Speed 5595.27 samples/sec   Loss 12.6270   LearningRate 0.0874   Epoch: 1   Global Step: 6600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:15,613-Speed 5678.95 samples/sec   Loss 12.9027   LearningRate 0.0874   Epoch: 1   Global Step: 6610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:17,437-Speed 5621.15 samples/sec   Loss 12.6904   LearningRate 0.0873   Epoch: 1   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:19,242-Speed 5674.84 samples/sec   Loss 12.6569   LearningRate 0.0873   Epoch: 1   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:21,048-Speed 5675.52 samples/sec   Loss 12.7405   LearningRate 0.0873   Epoch: 1   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:22,852-Speed 5677.66 samples/sec   Loss 12.7071   LearningRate 0.0873   Epoch: 1   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:24,668-Speed 5640.43 samples/sec   Loss 12.4697   LearningRate 0.0873   Epoch: 1   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:26,469-Speed 5688.33 samples/sec   Loss 12.6572   LearningRate 0.0872   Epoch: 1   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:28,278-Speed 5662.07 samples/sec   Loss 12.5173   LearningRate 0.0872   Epoch: 1   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:30,138-Speed 5510.52 samples/sec   Loss 12.7278   LearningRate 0.0872   Epoch: 1   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:31,946-Speed 5666.78 samples/sec   Loss 12.7123   LearningRate 0.0872   Epoch: 1   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:33,747-Speed 5688.29 samples/sec   Loss 12.6018   LearningRate 0.0872   Epoch: 1   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:35,554-Speed 5669.95 samples/sec   Loss 12.5258   LearningRate 0.0872   Epoch: 1   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:37,399-Speed 5554.05 samples/sec   Loss 12.6752   LearningRate 0.0871   Epoch: 1   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:39,235-Speed 5579.49 samples/sec   Loss 12.4883   LearningRate 0.0871   Epoch: 1   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:41,056-Speed 5629.54 samples/sec   Loss 12.5412   LearningRate 0.0871   Epoch: 1   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:42,866-Speed 5659.97 samples/sec   Loss 12.6946   LearningRate 0.0871   Epoch: 1   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:44,665-Speed 5696.90 samples/sec   Loss 12.7151   LearningRate 0.0871   Epoch: 1   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:46,485-Speed 5626.89 samples/sec   Loss 12.7068   LearningRate 0.0870   Epoch: 1   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:48,323-Speed 5576.75 samples/sec   Loss 12.4374   LearningRate 0.0870   Epoch: 1   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:50,127-Speed 5677.69 samples/sec   Loss 12.5583   LearningRate 0.0870   Epoch: 1   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:52,043-Speed 5349.41 samples/sec   Loss 12.4958   LearningRate 0.0870   Epoch: 1   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:53,854-Speed 5656.83 samples/sec   Loss 12.6356   LearningRate 0.0870   Epoch: 1   Global Step: 6820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:55,744-Speed 5418.51 samples/sec   Loss 12.6588   LearningRate 0.0870   Epoch: 1   Global Step: 6830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:57,560-Speed 5642.88 samples/sec   Loss 12.5232   LearningRate 0.0869   Epoch: 1   Global Step: 6840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:43:59,385-Speed 5616.78 samples/sec   Loss 12.6561   LearningRate 0.0869   Epoch: 1   Global Step: 6850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:01,196-Speed 5658.35 samples/sec   Loss 12.5035   LearningRate 0.0869   Epoch: 1   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:03,022-Speed 5610.92 samples/sec   Loss 12.3713   LearningRate 0.0869   Epoch: 1   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:04,830-Speed 5666.64 samples/sec   Loss 12.2574   LearningRate 0.0869   Epoch: 1   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:06,633-Speed 5680.86 samples/sec   Loss 12.5321   LearningRate 0.0868   Epoch: 1   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:08,488-Speed 5525.28 samples/sec   Loss 12.5137   LearningRate 0.0868   Epoch: 1   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:10,305-Speed 5641.13 samples/sec   Loss 12.7288   LearningRate 0.0868   Epoch: 1   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:12,136-Speed 5592.79 samples/sec   Loss 12.6058   LearningRate 0.0868   Epoch: 1   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:13,946-Speed 5661.42 samples/sec   Loss 12.4628   LearningRate 0.0868   Epoch: 1   Global Step: 6930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:15,760-Speed 5647.88 samples/sec   Loss 12.3408   LearningRate 0.0867   Epoch: 1   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:17,589-Speed 5599.99 samples/sec   Loss 12.5080   LearningRate 0.0867   Epoch: 1   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:19,392-Speed 5681.90 samples/sec   Loss 12.5734   LearningRate 0.0867   Epoch: 1   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:21,222-Speed 5601.16 samples/sec   Loss 12.5579   LearningRate 0.0867   Epoch: 1   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:23,027-Speed 5675.87 samples/sec   Loss 12.4360   LearningRate 0.0867   Epoch: 1   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:24,874-Speed 5548.36 samples/sec   Loss 12.3342   LearningRate 0.0867   Epoch: 1   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:26,699-Speed 5614.61 samples/sec   Loss 12.6542   LearningRate 0.0866   Epoch: 1   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:28,509-Speed 5658.78 samples/sec   Loss 12.4144   LearningRate 0.0866   Epoch: 1   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:30,322-Speed 5655.97 samples/sec   Loss 12.5295   LearningRate 0.0866   Epoch: 1   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:32,142-Speed 5629.96 samples/sec   Loss 12.3459   LearningRate 0.0866   Epoch: 1   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:33,950-Speed 5664.64 samples/sec   Loss 12.4526   LearningRate 0.0866   Epoch: 1   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:35,800-Speed 5537.73 samples/sec   Loss 12.1538   LearningRate 0.0865   Epoch: 1   Global Step: 7050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:37,635-Speed 5587.99 samples/sec   Loss 12.4727   LearningRate 0.0865   Epoch: 1   Global Step: 7060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:39,445-Speed 5659.41 samples/sec   Loss 12.4049   LearningRate 0.0865   Epoch: 1   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:41,300-Speed 5525.01 samples/sec   Loss 12.3364   LearningRate 0.0865   Epoch: 1   Global Step: 7080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:43,128-Speed 5604.07 samples/sec   Loss 12.4486   LearningRate 0.0865   Epoch: 1   Global Step: 7090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:44,943-Speed 5644.79 samples/sec   Loss 12.5996   LearningRate 0.0865   Epoch: 1   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:46,804-Speed 5503.80 samples/sec   Loss 12.4024   LearningRate 0.0864   Epoch: 1   Global Step: 7110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:48,635-Speed 5597.11 samples/sec   Loss 12.5218   LearningRate 0.0864   Epoch: 1   Global Step: 7120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:50,463-Speed 5607.36 samples/sec   Loss 12.4821   LearningRate 0.0864   Epoch: 1   Global Step: 7130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:44:52,283-Speed 5627.84 samples/sec   Loss 12.3142   LearningRate 0.0864   Epoch: 1   Global Step: 7140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:44:54,101-Speed 5635.72 samples/sec   Loss 12.6460   LearningRate 0.0864   Epoch: 1   Global Step: 7150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:44:55,938-Speed 5577.04 samples/sec   Loss 12.4471   LearningRate 0.0863   Epoch: 1   Global Step: 7160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:44:57,761-Speed 5620.64 samples/sec   Loss 12.3937   LearningRate 0.0863   Epoch: 1   Global Step: 7170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:44:59,607-Speed 5550.19 samples/sec   Loss 12.2504   LearningRate 0.0863   Epoch: 1   Global Step: 7180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:45:01,437-Speed 5600.62 samples/sec   Loss 12.3219   LearningRate 0.0863   Epoch: 1   Global Step: 7190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:45:03,282-Speed 5551.71 samples/sec   Loss 12.3793   LearningRate 0.0863   Epoch: 1   Global Step: 7200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:45:05,112-Speed 5601.33 samples/sec   Loss 12.4543   LearningRate 0.0863   Epoch: 1   Global Step: 7210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:45:06,947-Speed 5581.79 samples/sec   Loss 12.4835   LearningRate 0.0862   Epoch: 1   Global Step: 7220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:45:08,772-Speed 5614.02 samples/sec   Loss 12.3443   LearningRate 0.0862   Epoch: 1   Global Step: 7230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:45:10,619-Speed 5546.65 samples/sec   Loss 12.4225   LearningRate 0.0862   Epoch: 1   Global Step: 7240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:12,434-Speed 5646.38 samples/sec   Loss 12.3596   LearningRate 0.0862   Epoch: 1   Global Step: 7250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:14,244-Speed 5661.60 samples/sec   Loss 12.4606   LearningRate 0.0862   Epoch: 1   Global Step: 7260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:16,074-Speed 5597.92 samples/sec   Loss 12.2994   LearningRate 0.0861   Epoch: 1   Global Step: 7270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:17,892-Speed 5635.56 samples/sec   Loss 12.2949   LearningRate 0.0861   Epoch: 1   Global Step: 7280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:19,699-Speed 5667.83 samples/sec   Loss 12.3504   LearningRate 0.0861   Epoch: 1   Global Step: 7290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:21,531-Speed 5594.74 samples/sec   Loss 12.4878   LearningRate 0.0861   Epoch: 1   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:23,335-Speed 5678.87 samples/sec   Loss 12.2946   LearningRate 0.0861   Epoch: 1   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:25,148-Speed 5650.25 samples/sec   Loss 12.2962   LearningRate 0.0861   Epoch: 1   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:26,964-Speed 5644.66 samples/sec   Loss 12.3721   LearningRate 0.0860   Epoch: 1   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:28,809-Speed 5552.94 samples/sec   Loss 12.3238   LearningRate 0.0860   Epoch: 1   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:30,631-Speed 5624.03 samples/sec   Loss 12.3716   LearningRate 0.0860   Epoch: 1   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:32,436-Speed 5677.33 samples/sec   Loss 12.1304   LearningRate 0.0860   Epoch: 1   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:34,272-Speed 5578.83 samples/sec   Loss 12.3488   LearningRate 0.0860   Epoch: 1   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:36,088-Speed 5639.85 samples/sec   Loss 12.3365   LearningRate 0.0859   Epoch: 1   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:37,902-Speed 5649.43 samples/sec   Loss 12.3259   LearningRate 0.0859   Epoch: 1   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:39,706-Speed 5677.85 samples/sec   Loss 12.3469   LearningRate 0.0859   Epoch: 1   Global Step: 7400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:41,557-Speed 5535.73 samples/sec   Loss 12.2020   LearningRate 0.0859   Epoch: 1   Global Step: 7410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:43,367-Speed 5660.27 samples/sec   Loss 12.3475   LearningRate 0.0859   Epoch: 1   Global Step: 7420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:45,203-Speed 5580.42 samples/sec   Loss 12.2354   LearningRate 0.0858   Epoch: 1   Global Step: 7430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:47,010-Speed 5670.69 samples/sec   Loss 12.2645   LearningRate 0.0858   Epoch: 1   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:48,825-Speed 5645.42 samples/sec   Loss 12.1575   LearningRate 0.0858   Epoch: 1   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:50,667-Speed 5563.89 samples/sec   Loss 12.3376   LearningRate 0.0858   Epoch: 1   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:52,491-Speed 5616.95 samples/sec   Loss 12.3649   LearningRate 0.0858   Epoch: 1   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:54,314-Speed 5621.36 samples/sec   Loss 12.0055   LearningRate 0.0858   Epoch: 1   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:56,122-Speed 5666.94 samples/sec   Loss 12.4362   LearningRate 0.0857   Epoch: 1   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:57,947-Speed 5612.87 samples/sec   Loss 12.3192   LearningRate 0.0857   Epoch: 1   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:45:59,759-Speed 5655.13 samples/sec   Loss 12.2157   LearningRate 0.0857   Epoch: 1   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:01,632-Speed 5471.26 samples/sec   Loss 12.0909   LearningRate 0.0857   Epoch: 1   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:03,457-Speed 5615.05 samples/sec   Loss 12.1535   LearningRate 0.0857   Epoch: 1   Global Step: 7530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:05,262-Speed 5676.45 samples/sec   Loss 12.1442   LearningRate 0.0856   Epoch: 1   Global Step: 7540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:07,082-Speed 5633.05 samples/sec   Loss 12.1462   LearningRate 0.0856   Epoch: 1   Global Step: 7550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:08,886-Speed 5679.87 samples/sec   Loss 12.3226   LearningRate 0.0856   Epoch: 1   Global Step: 7560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:10,697-Speed 5655.26 samples/sec   Loss 12.2491   LearningRate 0.0856   Epoch: 1   Global Step: 7570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:12,540-Speed 5559.37 samples/sec   Loss 12.3216   LearningRate 0.0856   Epoch: 1   Global Step: 7580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:14,360-Speed 5628.88 samples/sec   Loss 11.9703   LearningRate 0.0856   Epoch: 1   Global Step: 7590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:16,245-Speed 5436.97 samples/sec   Loss 12.2993   LearningRate 0.0855   Epoch: 1   Global Step: 7600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:18,049-Speed 5681.05 samples/sec   Loss 12.2175   LearningRate 0.0855   Epoch: 1   Global Step: 7610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:19,910-Speed 5504.44 samples/sec   Loss 12.0743   LearningRate 0.0855   Epoch: 1   Global Step: 7620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:46:21,711-Speed 5688.55 samples/sec   Loss 12.1409   LearningRate 0.0855   Epoch: 1   Global Step: 7630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:46:23,578-Speed 5486.86 samples/sec   Loss 12.1249   LearningRate 0.0855   Epoch: 1   Global Step: 7640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:46:25,396-Speed 5636.32 samples/sec   Loss 12.1659   LearningRate 0.0854   Epoch: 1   Global Step: 7650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:46:27,248-Speed 5593.80 samples/sec   Loss 12.0944   LearningRate 0.0854   Epoch: 1   Global Step: 7660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:46:29,059-Speed 5659.30 samples/sec   Loss 12.2074   LearningRate 0.0854   Epoch: 1   Global Step: 7670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:46:30,904-Speed 5552.12 samples/sec   Loss 12.1342   LearningRate 0.0854   Epoch: 1   Global Step: 7680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:46:32,708-Speed 5680.58 samples/sec   Loss 12.1195   LearningRate 0.0854   Epoch: 1   Global Step: 7690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:46:34,530-Speed 5623.84 samples/sec   Loss 12.2056   LearningRate 0.0854   Epoch: 1   Global Step: 7700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:46:36,362-Speed 5592.27 samples/sec   Loss 12.0693   LearningRate 0.0853   Epoch: 1   Global Step: 7710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:46:38,227-Speed 5495.25 samples/sec   Loss 11.9777   LearningRate 0.0853   Epoch: 1   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:40,040-Speed 5652.78 samples/sec   Loss 12.0948   LearningRate 0.0853   Epoch: 1   Global Step: 7730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:41,859-Speed 5644.27 samples/sec   Loss 12.0751   LearningRate 0.0853   Epoch: 1   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:43,722-Speed 5497.74 samples/sec   Loss 12.0482   LearningRate 0.0853   Epoch: 1   Global Step: 7750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:45,529-Speed 5671.60 samples/sec   Loss 12.1496   LearningRate 0.0852   Epoch: 1   Global Step: 7760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:47,410-Speed 5445.94 samples/sec   Loss 12.0476   LearningRate 0.0852   Epoch: 1   Global Step: 7770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:49,219-Speed 5663.48 samples/sec   Loss 12.2557   LearningRate 0.0852   Epoch: 1   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:51,097-Speed 5456.99 samples/sec   Loss 12.2037   LearningRate 0.0852   Epoch: 1   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:52,903-Speed 5671.27 samples/sec   Loss 12.0654   LearningRate 0.0852   Epoch: 1   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:54,731-Speed 5604.40 samples/sec   Loss 12.1520   LearningRate 0.0852   Epoch: 1   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:56,544-Speed 5651.50 samples/sec   Loss 11.9496   LearningRate 0.0851   Epoch: 1   Global Step: 7820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:46:58,359-Speed 5644.49 samples/sec   Loss 12.0411   LearningRate 0.0851   Epoch: 1   Global Step: 7830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:00,215-Speed 5523.17 samples/sec   Loss 12.2397   LearningRate 0.0851   Epoch: 1   Global Step: 7840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:02,030-Speed 5644.02 samples/sec   Loss 11.9815   LearningRate 0.0851   Epoch: 1   Global Step: 7850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:03,868-Speed 5575.94 samples/sec   Loss 12.1241   LearningRate 0.0851   Epoch: 1   Global Step: 7860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:05,675-Speed 5669.49 samples/sec   Loss 12.0092   LearningRate 0.0850   Epoch: 1   Global Step: 7870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:07,485-Speed 5659.75 samples/sec   Loss 11.9638   LearningRate 0.0850   Epoch: 1   Global Step: 7880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:09,336-Speed 5535.46 samples/sec   Loss 11.8489   LearningRate 0.0850   Epoch: 1   Global Step: 7890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:11,160-Speed 5617.92 samples/sec   Loss 12.1543   LearningRate 0.0850   Epoch: 1   Global Step: 7900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:12,970-Speed 5659.13 samples/sec   Loss 12.0715   LearningRate 0.0850   Epoch: 1   Global Step: 7910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:14,842-Speed 5474.85 samples/sec   Loss 12.2130   LearningRate 0.0850   Epoch: 1   Global Step: 7920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:16,652-Speed 5658.82 samples/sec   Loss 12.2111   LearningRate 0.0849   Epoch: 1   Global Step: 7930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:18,493-Speed 5568.09 samples/sec   Loss 11.9880   LearningRate 0.0849   Epoch: 1   Global Step: 7940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:20,330-Speed 5578.05 samples/sec   Loss 11.8472   LearningRate 0.0849   Epoch: 1   Global Step: 7950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:22,161-Speed 5595.92 samples/sec   Loss 11.9506   LearningRate 0.0849   Epoch: 1   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:24,021-Speed 5507.71 samples/sec   Loss 11.9863   LearningRate 0.0849   Epoch: 1   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:25,831-Speed 5661.04 samples/sec   Loss 12.0470   LearningRate 0.0848   Epoch: 1   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:27,649-Speed 5635.62 samples/sec   Loss 11.9003   LearningRate 0.0848   Epoch: 1   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:29,493-Speed 5556.93 samples/sec   Loss 12.0393   LearningRate 0.0848   Epoch: 1   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:47:57,215-[lfw][8000]XNorm: 22.081268
Training: 2022-04-11 10:47:57,215-[lfw][8000]Accuracy-Flip: 0.99417+-0.00318
Training: 2022-04-11 10:47:57,216-[lfw][8000]Accuracy-Highest: 0.99417
Training: 2022-04-11 10:48:28,862-[cfp_fp][8000]XNorm: 19.295822
Training: 2022-04-11 10:48:28,863-[cfp_fp][8000]Accuracy-Flip: 0.92086+-0.01054
Training: 2022-04-11 10:48:28,863-[cfp_fp][8000]Accuracy-Highest: 0.92086
Training: 2022-04-11 10:48:56,125-[agedb_30][8000]XNorm: 21.211359
Training: 2022-04-11 10:48:56,126-[agedb_30][8000]Accuracy-Flip: 0.95567+-0.00854
Training: 2022-04-11 10:48:56,127-[agedb_30][8000]Accuracy-Highest: 0.95567
Training: 2022-04-11 10:48:57,950-Speed 115.76 samples/sec   Loss 12.0456   LearningRate 0.0848   Epoch: 1   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:48:59,756-Speed 5671.80 samples/sec   Loss 12.0100   LearningRate 0.0848   Epoch: 1   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:01,555-Speed 5696.50 samples/sec   Loss 12.1607   LearningRate 0.0848   Epoch: 1   Global Step: 8030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:03,369-Speed 5649.71 samples/sec   Loss 12.1217   LearningRate 0.0847   Epoch: 1   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:05,175-Speed 5672.58 samples/sec   Loss 11.9567   LearningRate 0.0847   Epoch: 1   Global Step: 8050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:06,978-Speed 5681.59 samples/sec   Loss 12.0276   LearningRate 0.0847   Epoch: 1   Global Step: 8060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:08,820-Speed 5563.28 samples/sec   Loss 11.9976   LearningRate 0.0847   Epoch: 1   Global Step: 8070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:10,634-Speed 5646.91 samples/sec   Loss 11.7917   LearningRate 0.0847   Epoch: 1   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:12,452-Speed 5636.05 samples/sec   Loss 12.0303   LearningRate 0.0846   Epoch: 1   Global Step: 8090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:14,266-Speed 5646.07 samples/sec   Loss 12.0692   LearningRate 0.0846   Epoch: 1   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:16,068-Speed 5684.71 samples/sec   Loss 11.8618   LearningRate 0.0846   Epoch: 1   Global Step: 8110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:17,911-Speed 5561.01 samples/sec   Loss 11.9524   LearningRate 0.0846   Epoch: 1   Global Step: 8120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:19,710-Speed 5693.63 samples/sec   Loss 12.0166   LearningRate 0.0846   Epoch: 1   Global Step: 8130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:21,536-Speed 5611.23 samples/sec   Loss 11.9919   LearningRate 0.0846   Epoch: 1   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:23,343-Speed 5669.48 samples/sec   Loss 11.9697   LearningRate 0.0845   Epoch: 1   Global Step: 8150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:25,150-Speed 5670.49 samples/sec   Loss 11.8702   LearningRate 0.0845   Epoch: 1   Global Step: 8160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:26,972-Speed 5619.94 samples/sec   Loss 11.8602   LearningRate 0.0845   Epoch: 1   Global Step: 8170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:28,793-Speed 5626.49 samples/sec   Loss 11.9774   LearningRate 0.0845   Epoch: 1   Global Step: 8180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:30,658-Speed 5494.40 samples/sec   Loss 11.8104   LearningRate 0.0845   Epoch: 1   Global Step: 8190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:32,458-Speed 5691.72 samples/sec   Loss 12.1036   LearningRate 0.0844   Epoch: 1   Global Step: 8200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:34,296-Speed 5575.87 samples/sec   Loss 12.0969   LearningRate 0.0844   Epoch: 1   Global Step: 8210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:36,105-Speed 5664.94 samples/sec   Loss 12.0233   LearningRate 0.0844   Epoch: 1   Global Step: 8220   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 10:49:37,922-Speed 5639.12 samples/sec   Loss 11.9617   LearningRate 0.0844   Epoch: 1   Global Step: 8230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:39,780-Speed 5515.53 samples/sec   Loss 12.1158   LearningRate 0.0844   Epoch: 1   Global Step: 8240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:41,589-Speed 5664.44 samples/sec   Loss 12.0268   LearningRate 0.0844   Epoch: 1   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:43,388-Speed 5695.07 samples/sec   Loss 12.0212   LearningRate 0.0843   Epoch: 1   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:45,229-Speed 5565.56 samples/sec   Loss 11.9674   LearningRate 0.0843   Epoch: 1   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:47,040-Speed 5658.13 samples/sec   Loss 11.8822   LearningRate 0.0843   Epoch: 1   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:48,883-Speed 5556.46 samples/sec   Loss 11.7937   LearningRate 0.0843   Epoch: 1   Global Step: 8290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:50,733-Speed 5539.92 samples/sec   Loss 12.0021   LearningRate 0.0843   Epoch: 1   Global Step: 8300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:52,555-Speed 5623.53 samples/sec   Loss 11.9657   LearningRate 0.0842   Epoch: 1   Global Step: 8310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:54,388-Speed 5587.68 samples/sec   Loss 11.8800   LearningRate 0.0842   Epoch: 1   Global Step: 8320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:56,204-Speed 5640.33 samples/sec   Loss 11.8922   LearningRate 0.0842   Epoch: 1   Global Step: 8330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:58,022-Speed 5637.09 samples/sec   Loss 12.0319   LearningRate 0.0842   Epoch: 1   Global Step: 8340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:49:59,829-Speed 5668.36 samples/sec   Loss 11.7271   LearningRate 0.0842   Epoch: 1   Global Step: 8350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:01,662-Speed 5588.79 samples/sec   Loss 11.8530   LearningRate 0.0842   Epoch: 1   Global Step: 8360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:03,498-Speed 5581.12 samples/sec   Loss 12.1727   LearningRate 0.0841   Epoch: 1   Global Step: 8370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:05,310-Speed 5652.85 samples/sec   Loss 11.8916   LearningRate 0.0841   Epoch: 1   Global Step: 8380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:07,202-Speed 5417.43 samples/sec   Loss 11.8874   LearningRate 0.0841   Epoch: 1   Global Step: 8390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:09,013-Speed 5656.89 samples/sec   Loss 11.7485   LearningRate 0.0841   Epoch: 1   Global Step: 8400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:10,837-Speed 5617.07 samples/sec   Loss 11.9435   LearningRate 0.0841   Epoch: 1   Global Step: 8410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:12,678-Speed 5562.38 samples/sec   Loss 11.9612   LearningRate 0.0840   Epoch: 1   Global Step: 8420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:14,488-Speed 5659.74 samples/sec   Loss 11.8089   LearningRate 0.0840   Epoch: 1   Global Step: 8430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:16,307-Speed 5634.05 samples/sec   Loss 11.9132   LearningRate 0.0840   Epoch: 1   Global Step: 8440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:18,160-Speed 5528.70 samples/sec   Loss 11.7615   LearningRate 0.0840   Epoch: 1   Global Step: 8450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:19,991-Speed 5595.12 samples/sec   Loss 11.9148   LearningRate 0.0840   Epoch: 1   Global Step: 8460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:21,834-Speed 5560.70 samples/sec   Loss 11.9500   LearningRate 0.0840   Epoch: 1   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:23,671-Speed 5578.73 samples/sec   Loss 11.9070   LearningRate 0.0839   Epoch: 1   Global Step: 8480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:25,478-Speed 5672.93 samples/sec   Loss 11.8167   LearningRate 0.0839   Epoch: 1   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:27,301-Speed 5618.11 samples/sec   Loss 11.7210   LearningRate 0.0839   Epoch: 1   Global Step: 8500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:29,106-Speed 5676.97 samples/sec   Loss 11.8292   LearningRate 0.0839   Epoch: 1   Global Step: 8510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:30,908-Speed 5684.38 samples/sec   Loss 11.7854   LearningRate 0.0839   Epoch: 1   Global Step: 8520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:32,719-Speed 5659.54 samples/sec   Loss 11.8030   LearningRate 0.0838   Epoch: 1   Global Step: 8530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:34,557-Speed 5574.49 samples/sec   Loss 11.7852   LearningRate 0.0838   Epoch: 1   Global Step: 8540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:36,383-Speed 5608.58 samples/sec   Loss 11.8129   LearningRate 0.0838   Epoch: 1   Global Step: 8550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:38,230-Speed 5547.74 samples/sec   Loss 12.0251   LearningRate 0.0838   Epoch: 1   Global Step: 8560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:40,042-Speed 5654.99 samples/sec   Loss 11.8900   LearningRate 0.0838   Epoch: 1   Global Step: 8570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:41,896-Speed 5524.51 samples/sec   Loss 11.8700   LearningRate 0.0838   Epoch: 1   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:43,722-Speed 5611.09 samples/sec   Loss 11.8679   LearningRate 0.0837   Epoch: 1   Global Step: 8590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:45,545-Speed 5618.98 samples/sec   Loss 11.8199   LearningRate 0.0837   Epoch: 1   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:50:47,353-Speed 5665.75 samples/sec   Loss 12.0055   LearningRate 0.0837   Epoch: 1   Global Step: 8610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:50:49,170-Speed 5639.86 samples/sec   Loss 11.6771   LearningRate 0.0837   Epoch: 1   Global Step: 8620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:50:50,979-Speed 5662.52 samples/sec   Loss 11.6362   LearningRate 0.0837   Epoch: 1   Global Step: 8630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:50:52,824-Speed 5552.78 samples/sec   Loss 11.9845   LearningRate 0.0836   Epoch: 1   Global Step: 8640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:50:54,636-Speed 5654.75 samples/sec   Loss 11.8356   LearningRate 0.0836   Epoch: 1   Global Step: 8650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:50:56,476-Speed 5567.36 samples/sec   Loss 11.7226   LearningRate 0.0836   Epoch: 1   Global Step: 8660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:50:58,277-Speed 5689.86 samples/sec   Loss 11.7836   LearningRate 0.0836   Epoch: 1   Global Step: 8670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:51:00,084-Speed 5667.02 samples/sec   Loss 11.7587   LearningRate 0.0836   Epoch: 1   Global Step: 8680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:51:01,902-Speed 5637.70 samples/sec   Loss 11.7364   LearningRate 0.0836   Epoch: 1   Global Step: 8690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:51:03,711-Speed 5663.03 samples/sec   Loss 11.7948   LearningRate 0.0835   Epoch: 1   Global Step: 8700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:51:05,522-Speed 5655.29 samples/sec   Loss 11.8346   LearningRate 0.0835   Epoch: 1   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:07,346-Speed 5617.43 samples/sec   Loss 11.8069   LearningRate 0.0835   Epoch: 1   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:09,195-Speed 5543.54 samples/sec   Loss 11.7656   LearningRate 0.0835   Epoch: 1   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:11,076-Speed 5444.77 samples/sec   Loss 11.9411   LearningRate 0.0835   Epoch: 1   Global Step: 8740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:12,883-Speed 5668.84 samples/sec   Loss 11.7921   LearningRate 0.0834   Epoch: 1   Global Step: 8750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:14,710-Speed 5607.56 samples/sec   Loss 11.7172   LearningRate 0.0834   Epoch: 1   Global Step: 8760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:16,539-Speed 5602.03 samples/sec   Loss 11.6484   LearningRate 0.0834   Epoch: 1   Global Step: 8770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:18,356-Speed 5636.92 samples/sec   Loss 11.7092   LearningRate 0.0834   Epoch: 1   Global Step: 8780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:20,163-Speed 5671.24 samples/sec   Loss 11.5859   LearningRate 0.0834   Epoch: 1   Global Step: 8790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:21,971-Speed 5666.44 samples/sec   Loss 11.9443   LearningRate 0.0834   Epoch: 1   Global Step: 8800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:23,770-Speed 5693.49 samples/sec   Loss 11.6350   LearningRate 0.0833   Epoch: 1   Global Step: 8810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:25,622-Speed 5533.72 samples/sec   Loss 11.5946   LearningRate 0.0833   Epoch: 1   Global Step: 8820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:27,474-Speed 5531.25 samples/sec   Loss 11.5152   LearningRate 0.0833   Epoch: 1   Global Step: 8830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:29,306-Speed 5591.30 samples/sec   Loss 11.7577   LearningRate 0.0833   Epoch: 1   Global Step: 8840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:31,123-Speed 5639.76 samples/sec   Loss 11.7672   LearningRate 0.0833   Epoch: 1   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:32,928-Speed 5675.42 samples/sec   Loss 11.5403   LearningRate 0.0833   Epoch: 1   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:34,749-Speed 5626.40 samples/sec   Loss 11.7925   LearningRate 0.0832   Epoch: 1   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:36,563-Speed 5649.60 samples/sec   Loss 11.6250   LearningRate 0.0832   Epoch: 1   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:38,398-Speed 5581.37 samples/sec   Loss 11.6479   LearningRate 0.0832   Epoch: 1   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:40,207-Speed 5665.97 samples/sec   Loss 11.6031   LearningRate 0.0832   Epoch: 1   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:42,042-Speed 5582.22 samples/sec   Loss 11.7884   LearningRate 0.0832   Epoch: 1   Global Step: 8910   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 10:51:43,856-Speed 5648.32 samples/sec   Loss 11.7479   LearningRate 0.0831   Epoch: 1   Global Step: 8920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:45,665-Speed 5659.92 samples/sec   Loss 11.6054   LearningRate 0.0831   Epoch: 1   Global Step: 8930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:47,493-Speed 5607.08 samples/sec   Loss 11.6829   LearningRate 0.0831   Epoch: 1   Global Step: 8940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:49,326-Speed 5587.33 samples/sec   Loss 11.7054   LearningRate 0.0831   Epoch: 1   Global Step: 8950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:51:51,180-Speed 5526.19 samples/sec   Loss 11.6280   LearningRate 0.0831   Epoch: 1   Global Step: 8960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:51:52,996-Speed 5643.68 samples/sec   Loss 11.7370   LearningRate 0.0831   Epoch: 1   Global Step: 8970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:51:54,801-Speed 5674.34 samples/sec   Loss 11.6904   LearningRate 0.0830   Epoch: 1   Global Step: 8980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:51:56,650-Speed 5543.21 samples/sec   Loss 11.6614   LearningRate 0.0830   Epoch: 1   Global Step: 8990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:51:58,507-Speed 5515.06 samples/sec   Loss 11.6586   LearningRate 0.0830   Epoch: 1   Global Step: 9000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:00,336-Speed 5601.59 samples/sec   Loss 11.8396   LearningRate 0.0830   Epoch: 1   Global Step: 9010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:02,151-Speed 5645.42 samples/sec   Loss 11.5966   LearningRate 0.0830   Epoch: 1   Global Step: 9020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:03,996-Speed 5553.54 samples/sec   Loss 11.6397   LearningRate 0.0829   Epoch: 1   Global Step: 9030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:05,800-Speed 5677.79 samples/sec   Loss 11.9146   LearningRate 0.0829   Epoch: 1   Global Step: 9040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:07,633-Speed 5588.91 samples/sec   Loss 11.5535   LearningRate 0.0829   Epoch: 1   Global Step: 9050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:09,438-Speed 5678.23 samples/sec   Loss 11.5365   LearningRate 0.0829   Epoch: 1   Global Step: 9060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:11,249-Speed 5655.59 samples/sec   Loss 11.6009   LearningRate 0.0829   Epoch: 1   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:13,071-Speed 5624.59 samples/sec   Loss 11.6722   LearningRate 0.0829   Epoch: 1   Global Step: 9080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:14,892-Speed 5623.38 samples/sec   Loss 11.7785   LearningRate 0.0828   Epoch: 1   Global Step: 9090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:16,739-Speed 5547.08 samples/sec   Loss 11.5616   LearningRate 0.0828   Epoch: 1   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:18,543-Speed 5678.96 samples/sec   Loss 11.6224   LearningRate 0.0828   Epoch: 1   Global Step: 9110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:20,352-Speed 5662.82 samples/sec   Loss 11.5517   LearningRate 0.0828   Epoch: 1   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:22,215-Speed 5501.46 samples/sec   Loss 11.7127   LearningRate 0.0828   Epoch: 1   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:24,017-Speed 5683.68 samples/sec   Loss 11.6960   LearningRate 0.0827   Epoch: 1   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:25,955-Speed 5286.92 samples/sec   Loss 11.5106   LearningRate 0.0827   Epoch: 1   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:27,753-Speed 5696.28 samples/sec   Loss 11.6576   LearningRate 0.0827   Epoch: 1   Global Step: 9160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:29,596-Speed 5559.17 samples/sec   Loss 11.5236   LearningRate 0.0827   Epoch: 1   Global Step: 9170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:31,436-Speed 5570.85 samples/sec   Loss 11.4550   LearningRate 0.0827   Epoch: 1   Global Step: 9180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:33,235-Speed 5694.14 samples/sec   Loss 11.5504   LearningRate 0.0827   Epoch: 1   Global Step: 9190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:35,040-Speed 5674.93 samples/sec   Loss 11.5508   LearningRate 0.0826   Epoch: 1   Global Step: 9200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:36,858-Speed 5635.28 samples/sec   Loss 11.7055   LearningRate 0.0826   Epoch: 1   Global Step: 9210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:38,663-Speed 5678.03 samples/sec   Loss 11.5971   LearningRate 0.0826   Epoch: 1   Global Step: 9220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:40,474-Speed 5656.49 samples/sec   Loss 11.5486   LearningRate 0.0826   Epoch: 1   Global Step: 9230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:42,321-Speed 5547.61 samples/sec   Loss 11.4217   LearningRate 0.0826   Epoch: 1   Global Step: 9240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:44,127-Speed 5671.40 samples/sec   Loss 11.6198   LearningRate 0.0825   Epoch: 1   Global Step: 9250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:52:46,003-Speed 5462.06 samples/sec   Loss 11.5756   LearningRate 0.0825   Epoch: 1   Global Step: 9260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:47,809-Speed 5673.30 samples/sec   Loss 11.4635   LearningRate 0.0825   Epoch: 1   Global Step: 9270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:49,631-Speed 5622.10 samples/sec   Loss 11.3894   LearningRate 0.0825   Epoch: 1   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:51,533-Speed 5386.21 samples/sec   Loss 11.5023   LearningRate 0.0825   Epoch: 1   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:53,348-Speed 5644.25 samples/sec   Loss 11.4400   LearningRate 0.0825   Epoch: 1   Global Step: 9300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:55,159-Speed 5655.45 samples/sec   Loss 11.6683   LearningRate 0.0824   Epoch: 1   Global Step: 9310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:56,995-Speed 5579.55 samples/sec   Loss 11.5974   LearningRate 0.0824   Epoch: 1   Global Step: 9320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:52:58,799-Speed 5680.26 samples/sec   Loss 11.4540   LearningRate 0.0824   Epoch: 1   Global Step: 9330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:00,658-Speed 5511.53 samples/sec   Loss 11.6107   LearningRate 0.0824   Epoch: 1   Global Step: 9340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:02,475-Speed 5640.53 samples/sec   Loss 11.5538   LearningRate 0.0824   Epoch: 1   Global Step: 9350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:04,301-Speed 5607.90 samples/sec   Loss 11.7580   LearningRate 0.0824   Epoch: 1   Global Step: 9360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:06,112-Speed 5658.52 samples/sec   Loss 11.4134   LearningRate 0.0823   Epoch: 1   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:07,914-Speed 5685.28 samples/sec   Loss 11.6036   LearningRate 0.0823   Epoch: 1   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:09,744-Speed 5600.27 samples/sec   Loss 11.6929   LearningRate 0.0823   Epoch: 1   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:11,553-Speed 5661.52 samples/sec   Loss 11.4477   LearningRate 0.0823   Epoch: 1   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:13,382-Speed 5601.95 samples/sec   Loss 11.5334   LearningRate 0.0823   Epoch: 1   Global Step: 9410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:15,202-Speed 5629.53 samples/sec   Loss 11.4329   LearningRate 0.0822   Epoch: 1   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:17,035-Speed 5586.82 samples/sec   Loss 11.4974   LearningRate 0.0822   Epoch: 1   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:18,855-Speed 5627.83 samples/sec   Loss 11.7482   LearningRate 0.0822   Epoch: 1   Global Step: 9440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:20,661-Speed 5673.59 samples/sec   Loss 11.3903   LearningRate 0.0822   Epoch: 1   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:22,474-Speed 5652.82 samples/sec   Loss 11.6052   LearningRate 0.0822   Epoch: 1   Global Step: 9460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:24,291-Speed 5640.54 samples/sec   Loss 11.5193   LearningRate 0.0822   Epoch: 1   Global Step: 9470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:26,098-Speed 5667.45 samples/sec   Loss 11.3829   LearningRate 0.0821   Epoch: 1   Global Step: 9480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:27,926-Speed 5605.86 samples/sec   Loss 11.4876   LearningRate 0.0821   Epoch: 1   Global Step: 9490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:29,760-Speed 5585.12 samples/sec   Loss 11.5169   LearningRate 0.0821   Epoch: 1   Global Step: 9500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:31,582-Speed 5625.67 samples/sec   Loss 11.4479   LearningRate 0.0821   Epoch: 1   Global Step: 9510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:33,402-Speed 5630.45 samples/sec   Loss 11.4840   LearningRate 0.0821   Epoch: 1   Global Step: 9520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:35,209-Speed 5670.63 samples/sec   Loss 11.5313   LearningRate 0.0820   Epoch: 1   Global Step: 9530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:37,028-Speed 5632.63 samples/sec   Loss 11.4213   LearningRate 0.0820   Epoch: 1   Global Step: 9540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:38,843-Speed 5645.62 samples/sec   Loss 11.3721   LearningRate 0.0820   Epoch: 1   Global Step: 9550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:40,686-Speed 5558.52 samples/sec   Loss 11.5621   LearningRate 0.0820   Epoch: 1   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:42,494-Speed 5666.01 samples/sec   Loss 11.2946   LearningRate 0.0820   Epoch: 1   Global Step: 9570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:44,296-Speed 5684.17 samples/sec   Loss 11.5335   LearningRate 0.0820   Epoch: 1   Global Step: 9580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:46,129-Speed 5590.34 samples/sec   Loss 11.8026   LearningRate 0.0819   Epoch: 1   Global Step: 9590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:47,929-Speed 5691.07 samples/sec   Loss 11.5379   LearningRate 0.0819   Epoch: 1   Global Step: 9600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:49,733-Speed 5679.74 samples/sec   Loss 11.3824   LearningRate 0.0819   Epoch: 1   Global Step: 9610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:51,577-Speed 5556.37 samples/sec   Loss 11.3665   LearningRate 0.0819   Epoch: 1   Global Step: 9620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:53,389-Speed 5652.58 samples/sec   Loss 11.2230   LearningRate 0.0819   Epoch: 1   Global Step: 9630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:55,215-Speed 5612.19 samples/sec   Loss 11.4752   LearningRate 0.0818   Epoch: 1   Global Step: 9640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:57,024-Speed 5662.97 samples/sec   Loss 11.4465   LearningRate 0.0818   Epoch: 1   Global Step: 9650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:53:58,824-Speed 5689.77 samples/sec   Loss 11.4819   LearningRate 0.0818   Epoch: 1   Global Step: 9660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:00,662-Speed 5574.32 samples/sec   Loss 11.3358   LearningRate 0.0818   Epoch: 1   Global Step: 9670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:02,469-Speed 5673.50 samples/sec   Loss 11.3520   LearningRate 0.0818   Epoch: 1   Global Step: 9680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:04,272-Speed 5683.12 samples/sec   Loss 11.4599   LearningRate 0.0818   Epoch: 1   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:06,097-Speed 5611.86 samples/sec   Loss 11.5488   LearningRate 0.0817   Epoch: 1   Global Step: 9700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:07,897-Speed 5692.51 samples/sec   Loss 11.3132   LearningRate 0.0817   Epoch: 1   Global Step: 9710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:09,745-Speed 5544.81 samples/sec   Loss 11.3766   LearningRate 0.0817   Epoch: 1   Global Step: 9720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:11,549-Speed 5679.83 samples/sec   Loss 11.2326   LearningRate 0.0817   Epoch: 1   Global Step: 9730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:13,353-Speed 5677.26 samples/sec   Loss 11.2633   LearningRate 0.0817   Epoch: 1   Global Step: 9740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:15,168-Speed 5642.72 samples/sec   Loss 11.3867   LearningRate 0.0817   Epoch: 1   Global Step: 9750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:17,009-Speed 5565.65 samples/sec   Loss 11.2646   LearningRate 0.0816   Epoch: 1   Global Step: 9760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:18,821-Speed 5655.31 samples/sec   Loss 11.3250   LearningRate 0.0816   Epoch: 1   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:20,651-Speed 5599.80 samples/sec   Loss 11.4040   LearningRate 0.0816   Epoch: 1   Global Step: 9780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:22,462-Speed 5663.67 samples/sec   Loss 11.4125   LearningRate 0.0816   Epoch: 1   Global Step: 9790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:24,336-Speed 5465.37 samples/sec   Loss 11.5447   LearningRate 0.0816   Epoch: 1   Global Step: 9800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:26,149-Speed 5651.61 samples/sec   Loss 11.3842   LearningRate 0.0815   Epoch: 1   Global Step: 9810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:27,988-Speed 5569.54 samples/sec   Loss 11.3801   LearningRate 0.0815   Epoch: 1   Global Step: 9820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:29,794-Speed 5673.73 samples/sec   Loss 11.2870   LearningRate 0.0815   Epoch: 1   Global Step: 9830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:31,642-Speed 5543.84 samples/sec   Loss 11.2840   LearningRate 0.0815   Epoch: 1   Global Step: 9840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:33,445-Speed 5681.10 samples/sec   Loss 11.4703   LearningRate 0.0815   Epoch: 1   Global Step: 9850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:35,290-Speed 5555.02 samples/sec   Loss 11.5834   LearningRate 0.0815   Epoch: 1   Global Step: 9860   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 10:54:37,101-Speed 5654.31 samples/sec   Loss 11.3345   LearningRate 0.0814   Epoch: 1   Global Step: 9870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:38,923-Speed 5627.00 samples/sec   Loss 11.3866   LearningRate 0.0814   Epoch: 1   Global Step: 9880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:40,730-Speed 5667.89 samples/sec   Loss 11.4509   LearningRate 0.0814   Epoch: 1   Global Step: 9890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:42,555-Speed 5614.77 samples/sec   Loss 11.3410   LearningRate 0.0814   Epoch: 1   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:44,358-Speed 5678.78 samples/sec   Loss 11.4007   LearningRate 0.0814   Epoch: 1   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:46,193-Speed 5583.42 samples/sec   Loss 11.2595   LearningRate 0.0813   Epoch: 1   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:48,008-Speed 5644.16 samples/sec   Loss 11.2341   LearningRate 0.0813   Epoch: 1   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:49,856-Speed 5546.56 samples/sec   Loss 11.5700   LearningRate 0.0813   Epoch: 1   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:51,675-Speed 5632.80 samples/sec   Loss 11.3144   LearningRate 0.0813   Epoch: 1   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:53,504-Speed 5601.84 samples/sec   Loss 11.3037   LearningRate 0.0813   Epoch: 1   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:55,382-Speed 5454.14 samples/sec   Loss 11.2136   LearningRate 0.0813   Epoch: 1   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:57,187-Speed 5676.30 samples/sec   Loss 11.3557   LearningRate 0.0812   Epoch: 1   Global Step: 9980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:54:59,009-Speed 5622.15 samples/sec   Loss 11.4865   LearningRate 0.0812   Epoch: 1   Global Step: 9990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:55:00,821-Speed 5654.72 samples/sec   Loss 11.2960   LearningRate 0.0812   Epoch: 1   Global Step: 10000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:55:28,458-[lfw][10000]XNorm: 21.465626
Training: 2022-04-11 10:55:28,458-[lfw][10000]Accuracy-Flip: 0.99483+-0.00293
Training: 2022-04-11 10:55:28,458-[lfw][10000]Accuracy-Highest: 0.99483
Training: 2022-04-11 10:55:59,773-[cfp_fp][10000]XNorm: 18.616215
Training: 2022-04-11 10:55:59,773-[cfp_fp][10000]Accuracy-Flip: 0.92457+-0.00973
Training: 2022-04-11 10:55:59,774-[cfp_fp][10000]Accuracy-Highest: 0.92457
Training: 2022-04-11 10:56:27,155-[agedb_30][10000]XNorm: 21.377548
Training: 2022-04-11 10:56:27,155-[agedb_30][10000]Accuracy-Flip: 0.96050+-0.00925
Training: 2022-04-11 10:56:27,155-[agedb_30][10000]Accuracy-Highest: 0.96050
Training: 2022-04-11 10:56:28,998-Speed 116.13 samples/sec   Loss 11.2706   LearningRate 0.0812   Epoch: 1   Global Step: 10010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:30,809-Speed 5658.11 samples/sec   Loss 11.2755   LearningRate 0.0812   Epoch: 1   Global Step: 10020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:32,605-Speed 5704.45 samples/sec   Loss 11.2625   LearningRate 0.0812   Epoch: 1   Global Step: 10030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:34,470-Speed 5492.45 samples/sec   Loss 11.4125   LearningRate 0.0811   Epoch: 1   Global Step: 10040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:36,274-Speed 5679.56 samples/sec   Loss 11.3464   LearningRate 0.0811   Epoch: 1   Global Step: 10050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:38,102-Speed 5606.33 samples/sec   Loss 11.2382   LearningRate 0.0811   Epoch: 1   Global Step: 10060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:39,903-Speed 5688.29 samples/sec   Loss 11.2890   LearningRate 0.0811   Epoch: 1   Global Step: 10070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:41,735-Speed 5593.77 samples/sec   Loss 11.3047   LearningRate 0.0811   Epoch: 1   Global Step: 10080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:43,533-Speed 5695.60 samples/sec   Loss 11.2983   LearningRate 0.0810   Epoch: 1   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:45,361-Speed 5605.30 samples/sec   Loss 11.3921   LearningRate 0.0810   Epoch: 1   Global Step: 10100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:47,307-Speed 5264.46 samples/sec   Loss 11.1143   LearningRate 0.0810   Epoch: 1   Global Step: 10110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:56:59,442-Speed 843.95 samples/sec   Loss 10.7863   LearningRate 0.0810   Epoch: 2   Global Step: 10120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:01,292-Speed 5539.19 samples/sec   Loss 10.4029   LearningRate 0.0810   Epoch: 2   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:03,151-Speed 5512.99 samples/sec   Loss 10.6321   LearningRate 0.0810   Epoch: 2   Global Step: 10140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:04,981-Speed 5601.90 samples/sec   Loss 10.6596   LearningRate 0.0809   Epoch: 2   Global Step: 10150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:06,918-Speed 5290.13 samples/sec   Loss 10.2906   LearningRate 0.0809   Epoch: 2   Global Step: 10160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:08,909-Speed 5146.06 samples/sec   Loss 10.3439   LearningRate 0.0809   Epoch: 2   Global Step: 10170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:10,757-Speed 5543.09 samples/sec   Loss 10.5188   LearningRate 0.0809   Epoch: 2   Global Step: 10180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:12,603-Speed 5549.34 samples/sec   Loss 10.5784   LearningRate 0.0809   Epoch: 2   Global Step: 10190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:14,437-Speed 5621.25 samples/sec   Loss 10.4105   LearningRate 0.0809   Epoch: 2   Global Step: 10200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:16,327-Speed 5420.76 samples/sec   Loss 10.5836   LearningRate 0.0808   Epoch: 2   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:18,157-Speed 5598.95 samples/sec   Loss 10.6532   LearningRate 0.0808   Epoch: 2   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:20,016-Speed 5509.26 samples/sec   Loss 10.4835   LearningRate 0.0808   Epoch: 2   Global Step: 10230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:21,821-Speed 5678.31 samples/sec   Loss 10.6118   LearningRate 0.0808   Epoch: 2   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:23,659-Speed 5574.13 samples/sec   Loss 10.3883   LearningRate 0.0808   Epoch: 2   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:25,471-Speed 5656.72 samples/sec   Loss 10.5154   LearningRate 0.0807   Epoch: 2   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:27,270-Speed 5693.93 samples/sec   Loss 10.5790   LearningRate 0.0807   Epoch: 2   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:29,106-Speed 5581.75 samples/sec   Loss 10.6977   LearningRate 0.0807   Epoch: 2   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:30,964-Speed 5512.02 samples/sec   Loss 10.7059   LearningRate 0.0807   Epoch: 2   Global Step: 10290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:32,769-Speed 5676.88 samples/sec   Loss 10.6855   LearningRate 0.0807   Epoch: 2   Global Step: 10300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:34,599-Speed 5599.33 samples/sec   Loss 10.6687   LearningRate 0.0807   Epoch: 2   Global Step: 10310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:36,431-Speed 5589.81 samples/sec   Loss 10.6844   LearningRate 0.0806   Epoch: 2   Global Step: 10320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:38,265-Speed 5586.72 samples/sec   Loss 10.6289   LearningRate 0.0806   Epoch: 2   Global Step: 10330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:40,078-Speed 5652.36 samples/sec   Loss 10.7103   LearningRate 0.0806   Epoch: 2   Global Step: 10340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:41,921-Speed 5556.40 samples/sec   Loss 10.8042   LearningRate 0.0806   Epoch: 2   Global Step: 10350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:43,760-Speed 5572.00 samples/sec   Loss 10.7823   LearningRate 0.0806   Epoch: 2   Global Step: 10360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:45,555-Speed 5706.60 samples/sec   Loss 10.6930   LearningRate 0.0805   Epoch: 2   Global Step: 10370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:47,388-Speed 5596.83 samples/sec   Loss 10.7881   LearningRate 0.0805   Epoch: 2   Global Step: 10380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:49,191-Speed 5683.34 samples/sec   Loss 10.6766   LearningRate 0.0805   Epoch: 2   Global Step: 10390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:51,003-Speed 5652.06 samples/sec   Loss 10.7716   LearningRate 0.0805   Epoch: 2   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:52,904-Speed 5391.73 samples/sec   Loss 10.8460   LearningRate 0.0805   Epoch: 2   Global Step: 10410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:54,759-Speed 5522.85 samples/sec   Loss 10.7293   LearningRate 0.0805   Epoch: 2   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:56,596-Speed 5578.38 samples/sec   Loss 10.8922   LearningRate 0.0804   Epoch: 2   Global Step: 10430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:57:58,441-Speed 5553.59 samples/sec   Loss 10.7553   LearningRate 0.0804   Epoch: 2   Global Step: 10440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:00,247-Speed 5673.28 samples/sec   Loss 10.6825   LearningRate 0.0804   Epoch: 2   Global Step: 10450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:02,108-Speed 5505.29 samples/sec   Loss 10.6781   LearningRate 0.0804   Epoch: 2   Global Step: 10460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:03,925-Speed 5639.63 samples/sec   Loss 10.6175   LearningRate 0.0804   Epoch: 2   Global Step: 10470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:05,771-Speed 5552.23 samples/sec   Loss 10.7481   LearningRate 0.0804   Epoch: 2   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:07,592-Speed 5624.69 samples/sec   Loss 10.5945   LearningRate 0.0803   Epoch: 2   Global Step: 10490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:09,433-Speed 5566.83 samples/sec   Loss 10.7872   LearningRate 0.0803   Epoch: 2   Global Step: 10500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:11,248-Speed 5646.67 samples/sec   Loss 10.7455   LearningRate 0.0803   Epoch: 2   Global Step: 10510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:13,075-Speed 5607.15 samples/sec   Loss 10.6704   LearningRate 0.0803   Epoch: 2   Global Step: 10520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:14,890-Speed 5642.08 samples/sec   Loss 10.8112   LearningRate 0.0803   Epoch: 2   Global Step: 10530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:16,726-Speed 5581.57 samples/sec   Loss 10.9836   LearningRate 0.0802   Epoch: 2   Global Step: 10540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:18,578-Speed 5532.65 samples/sec   Loss 10.9387   LearningRate 0.0802   Epoch: 2   Global Step: 10550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:20,461-Speed 5440.82 samples/sec   Loss 10.8568   LearningRate 0.0802   Epoch: 2   Global Step: 10560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:22,285-Speed 5616.47 samples/sec   Loss 10.8439   LearningRate 0.0802   Epoch: 2   Global Step: 10570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:24,124-Speed 5572.52 samples/sec   Loss 10.9868   LearningRate 0.0802   Epoch: 2   Global Step: 10580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:25,959-Speed 5580.89 samples/sec   Loss 10.8578   LearningRate 0.0802   Epoch: 2   Global Step: 10590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:27,773-Speed 5647.18 samples/sec   Loss 10.7951   LearningRate 0.0801   Epoch: 2   Global Step: 10600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:29,601-Speed 5605.71 samples/sec   Loss 10.8886   LearningRate 0.0801   Epoch: 2   Global Step: 10610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:31,419-Speed 5636.39 samples/sec   Loss 10.9178   LearningRate 0.0801   Epoch: 2   Global Step: 10620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:33,237-Speed 5635.11 samples/sec   Loss 10.8221   LearningRate 0.0801   Epoch: 2   Global Step: 10630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:35,048-Speed 5657.07 samples/sec   Loss 10.8386   LearningRate 0.0801   Epoch: 2   Global Step: 10640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:36,851-Speed 5681.04 samples/sec   Loss 11.0019   LearningRate 0.0801   Epoch: 2   Global Step: 10650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:58:38,704-Speed 5530.69 samples/sec   Loss 10.7744   LearningRate 0.0800   Epoch: 2   Global Step: 10660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:58:40,514-Speed 5661.12 samples/sec   Loss 10.8629   LearningRate 0.0800   Epoch: 2   Global Step: 10670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:58:42,363-Speed 5539.29 samples/sec   Loss 11.0091   LearningRate 0.0800   Epoch: 2   Global Step: 10680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:58:44,183-Speed 5630.50 samples/sec   Loss 11.0246   LearningRate 0.0800   Epoch: 2   Global Step: 10690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:58:45,992-Speed 5663.94 samples/sec   Loss 11.0382   LearningRate 0.0800   Epoch: 2   Global Step: 10700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:58:47,823-Speed 5593.63 samples/sec   Loss 10.9172   LearningRate 0.0799   Epoch: 2   Global Step: 10710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:58:49,641-Speed 5634.82 samples/sec   Loss 11.0299   LearningRate 0.0799   Epoch: 2   Global Step: 10720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:58:51,451-Speed 5659.80 samples/sec   Loss 10.9407   LearningRate 0.0799   Epoch: 2   Global Step: 10730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:58:53,298-Speed 5548.65 samples/sec   Loss 10.9434   LearningRate 0.0799   Epoch: 2   Global Step: 10740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:58:55,141-Speed 5557.94 samples/sec   Loss 11.0094   LearningRate 0.0799   Epoch: 2   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:56,947-Speed 5672.70 samples/sec   Loss 10.7443   LearningRate 0.0799   Epoch: 2   Global Step: 10760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:58:58,784-Speed 5576.04 samples/sec   Loss 10.9317   LearningRate 0.0798   Epoch: 2   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:00,614-Speed 5599.86 samples/sec   Loss 11.0744   LearningRate 0.0798   Epoch: 2   Global Step: 10780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:02,419-Speed 5678.03 samples/sec   Loss 10.9402   LearningRate 0.0798   Epoch: 2   Global Step: 10790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:04,223-Speed 5677.57 samples/sec   Loss 10.8177   LearningRate 0.0798   Epoch: 2   Global Step: 10800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:06,065-Speed 5560.25 samples/sec   Loss 10.8487   LearningRate 0.0798   Epoch: 2   Global Step: 10810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:07,884-Speed 5634.77 samples/sec   Loss 11.1048   LearningRate 0.0798   Epoch: 2   Global Step: 10820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:09,740-Speed 5521.18 samples/sec   Loss 10.9379   LearningRate 0.0797   Epoch: 2   Global Step: 10830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:11,618-Speed 5454.32 samples/sec   Loss 10.8515   LearningRate 0.0797   Epoch: 2   Global Step: 10840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:13,428-Speed 5660.90 samples/sec   Loss 10.8613   LearningRate 0.0797   Epoch: 2   Global Step: 10850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:15,259-Speed 5595.45 samples/sec   Loss 11.0316   LearningRate 0.0797   Epoch: 2   Global Step: 10860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:17,138-Speed 5451.56 samples/sec   Loss 10.7536   LearningRate 0.0797   Epoch: 2   Global Step: 10870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:18,959-Speed 5625.27 samples/sec   Loss 10.7949   LearningRate 0.0796   Epoch: 2   Global Step: 10880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:20,805-Speed 5552.45 samples/sec   Loss 10.8597   LearningRate 0.0796   Epoch: 2   Global Step: 10890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:22,615-Speed 5659.45 samples/sec   Loss 10.8825   LearningRate 0.0796   Epoch: 2   Global Step: 10900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:24,483-Speed 5484.24 samples/sec   Loss 11.0401   LearningRate 0.0796   Epoch: 2   Global Step: 10910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:26,300-Speed 5638.61 samples/sec   Loss 10.8538   LearningRate 0.0796   Epoch: 2   Global Step: 10920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:28,131-Speed 5596.39 samples/sec   Loss 10.8505   LearningRate 0.0796   Epoch: 2   Global Step: 10930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:29,983-Speed 5531.75 samples/sec   Loss 11.0514   LearningRate 0.0795   Epoch: 2   Global Step: 10940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:31,833-Speed 5535.94 samples/sec   Loss 10.8038   LearningRate 0.0795   Epoch: 2   Global Step: 10950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:33,641-Speed 5667.66 samples/sec   Loss 10.9510   LearningRate 0.0795   Epoch: 2   Global Step: 10960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:35,471-Speed 5598.01 samples/sec   Loss 10.8496   LearningRate 0.0795   Epoch: 2   Global Step: 10970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:37,279-Speed 5667.82 samples/sec   Loss 10.9952   LearningRate 0.0795   Epoch: 2   Global Step: 10980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:39,114-Speed 5583.69 samples/sec   Loss 10.9900   LearningRate 0.0795   Epoch: 2   Global Step: 10990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:40,930-Speed 5641.92 samples/sec   Loss 10.9581   LearningRate 0.0794   Epoch: 2   Global Step: 11000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:42,784-Speed 5529.10 samples/sec   Loss 11.0176   LearningRate 0.0794   Epoch: 2   Global Step: 11010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:44,592-Speed 5666.95 samples/sec   Loss 11.0167   LearningRate 0.0794   Epoch: 2   Global Step: 11020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:46,417-Speed 5614.93 samples/sec   Loss 11.0250   LearningRate 0.0794   Epoch: 2   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:48,233-Speed 5640.29 samples/sec   Loss 10.9033   LearningRate 0.0794   Epoch: 2   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:50,044-Speed 5656.76 samples/sec   Loss 11.0729   LearningRate 0.0793   Epoch: 2   Global Step: 11050   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 10:59:51,920-Speed 5461.38 samples/sec   Loss 10.8672   LearningRate 0.0793   Epoch: 2   Global Step: 11060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 10:59:53,735-Speed 5645.08 samples/sec   Loss 10.8274   LearningRate 0.0793   Epoch: 2   Global Step: 11070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:59:55,627-Speed 5415.95 samples/sec   Loss 11.0687   LearningRate 0.0793   Epoch: 2   Global Step: 11080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:59:57,440-Speed 5651.23 samples/sec   Loss 10.8795   LearningRate 0.0793   Epoch: 2   Global Step: 11090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 10:59:59,273-Speed 5587.65 samples/sec   Loss 10.8208   LearningRate 0.0793   Epoch: 2   Global Step: 11100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:00:01,098-Speed 5612.43 samples/sec   Loss 10.8442   LearningRate 0.0792   Epoch: 2   Global Step: 11110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:00:02,950-Speed 5532.43 samples/sec   Loss 10.9444   LearningRate 0.0792   Epoch: 2   Global Step: 11120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:00:04,764-Speed 5647.99 samples/sec   Loss 10.9898   LearningRate 0.0792   Epoch: 2   Global Step: 11130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:00:06,586-Speed 5622.83 samples/sec   Loss 10.8858   LearningRate 0.0792   Epoch: 2   Global Step: 11140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:00:08,415-Speed 5603.12 samples/sec   Loss 10.9190   LearningRate 0.0792   Epoch: 2   Global Step: 11150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:00:10,228-Speed 5649.88 samples/sec   Loss 10.9349   LearningRate 0.0792   Epoch: 2   Global Step: 11160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:00:12,060-Speed 5594.64 samples/sec   Loss 10.8825   LearningRate 0.0791   Epoch: 2   Global Step: 11170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:13,903-Speed 5556.74 samples/sec   Loss 10.8739   LearningRate 0.0791   Epoch: 2   Global Step: 11180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:15,717-Speed 5650.09 samples/sec   Loss 10.8690   LearningRate 0.0791   Epoch: 2   Global Step: 11190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:17,575-Speed 5512.69 samples/sec   Loss 10.8599   LearningRate 0.0791   Epoch: 2   Global Step: 11200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:19,398-Speed 5622.21 samples/sec   Loss 10.9907   LearningRate 0.0791   Epoch: 2   Global Step: 11210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:21,204-Speed 5669.87 samples/sec   Loss 10.8815   LearningRate 0.0790   Epoch: 2   Global Step: 11220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:23,018-Speed 5650.64 samples/sec   Loss 10.8990   LearningRate 0.0790   Epoch: 2   Global Step: 11230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:24,876-Speed 5514.93 samples/sec   Loss 10.8905   LearningRate 0.0790   Epoch: 2   Global Step: 11240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:26,687-Speed 5657.17 samples/sec   Loss 10.9895   LearningRate 0.0790   Epoch: 2   Global Step: 11250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:28,556-Speed 5483.14 samples/sec   Loss 10.9642   LearningRate 0.0790   Epoch: 2   Global Step: 11260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:30,365-Speed 5663.12 samples/sec   Loss 11.0745   LearningRate 0.0790   Epoch: 2   Global Step: 11270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:32,199-Speed 5585.12 samples/sec   Loss 10.8470   LearningRate 0.0789   Epoch: 2   Global Step: 11280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:34,009-Speed 5659.67 samples/sec   Loss 10.9754   LearningRate 0.0789   Epoch: 2   Global Step: 11290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:35,815-Speed 5675.08 samples/sec   Loss 10.8640   LearningRate 0.0789   Epoch: 2   Global Step: 11300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:37,678-Speed 5499.15 samples/sec   Loss 10.8083   LearningRate 0.0789   Epoch: 2   Global Step: 11310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:39,498-Speed 5627.29 samples/sec   Loss 10.8530   LearningRate 0.0789   Epoch: 2   Global Step: 11320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:41,363-Speed 5495.12 samples/sec   Loss 10.8023   LearningRate 0.0789   Epoch: 2   Global Step: 11330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:43,187-Speed 5616.56 samples/sec   Loss 11.0187   LearningRate 0.0788   Epoch: 2   Global Step: 11340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:45,042-Speed 5523.35 samples/sec   Loss 10.9112   LearningRate 0.0788   Epoch: 2   Global Step: 11350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:46,864-Speed 5621.61 samples/sec   Loss 10.8199   LearningRate 0.0788   Epoch: 2   Global Step: 11360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:48,683-Speed 5635.12 samples/sec   Loss 10.7629   LearningRate 0.0788   Epoch: 2   Global Step: 11370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:50,506-Speed 5622.04 samples/sec   Loss 10.7965   LearningRate 0.0788   Epoch: 2   Global Step: 11380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:52,316-Speed 5661.18 samples/sec   Loss 11.0544   LearningRate 0.0787   Epoch: 2   Global Step: 11390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:54,184-Speed 5485.12 samples/sec   Loss 11.0171   LearningRate 0.0787   Epoch: 2   Global Step: 11400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:56,003-Speed 5633.71 samples/sec   Loss 10.9338   LearningRate 0.0787   Epoch: 2   Global Step: 11410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:57,808-Speed 5677.21 samples/sec   Loss 11.0658   LearningRate 0.0787   Epoch: 2   Global Step: 11420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:00:59,660-Speed 5532.21 samples/sec   Loss 10.8703   LearningRate 0.0787   Epoch: 2   Global Step: 11430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:01,471-Speed 5658.60 samples/sec   Loss 10.7047   LearningRate 0.0787   Epoch: 2   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:03,301-Speed 5597.33 samples/sec   Loss 11.0785   LearningRate 0.0786   Epoch: 2   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:05,132-Speed 5596.82 samples/sec   Loss 10.9350   LearningRate 0.0786   Epoch: 2   Global Step: 11460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:06,978-Speed 5550.51 samples/sec   Loss 10.7847   LearningRate 0.0786   Epoch: 2   Global Step: 11470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:08,782-Speed 5679.15 samples/sec   Loss 11.0867   LearningRate 0.0786   Epoch: 2   Global Step: 11480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:10,596-Speed 5649.68 samples/sec   Loss 10.9988   LearningRate 0.0786   Epoch: 2   Global Step: 11490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:12,447-Speed 5535.17 samples/sec   Loss 10.9107   LearningRate 0.0786   Epoch: 2   Global Step: 11500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:14,263-Speed 5641.91 samples/sec   Loss 11.0429   LearningRate 0.0785   Epoch: 2   Global Step: 11510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:16,108-Speed 5552.01 samples/sec   Loss 10.9685   LearningRate 0.0785   Epoch: 2   Global Step: 11520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:01:17,920-Speed 5655.28 samples/sec   Loss 10.7218   LearningRate 0.0785   Epoch: 2   Global Step: 11530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:01:19,788-Speed 5483.64 samples/sec   Loss 10.8217   LearningRate 0.0785   Epoch: 2   Global Step: 11540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:01:21,592-Speed 5679.84 samples/sec   Loss 10.7811   LearningRate 0.0785   Epoch: 2   Global Step: 11550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:01:23,397-Speed 5676.50 samples/sec   Loss 10.7625   LearningRate 0.0785   Epoch: 2   Global Step: 11560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:01:25,228-Speed 5594.87 samples/sec   Loss 10.7891   LearningRate 0.0784   Epoch: 2   Global Step: 11570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:01:27,032-Speed 5679.83 samples/sec   Loss 10.7629   LearningRate 0.0784   Epoch: 2   Global Step: 11580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:01:28,842-Speed 5659.22 samples/sec   Loss 10.7186   LearningRate 0.0784   Epoch: 2   Global Step: 11590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:01:30,691-Speed 5541.67 samples/sec   Loss 10.9326   LearningRate 0.0784   Epoch: 2   Global Step: 11600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:01:32,505-Speed 5647.86 samples/sec   Loss 10.9604   LearningRate 0.0784   Epoch: 2   Global Step: 11610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 11:01:34,315-Speed 5659.41 samples/sec   Loss 10.7845   LearningRate 0.0783   Epoch: 2   Global Step: 11620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:36,151-Speed 5579.54 samples/sec   Loss 10.8868   LearningRate 0.0783   Epoch: 2   Global Step: 11630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:37,959-Speed 5665.99 samples/sec   Loss 10.7978   LearningRate 0.0783   Epoch: 2   Global Step: 11640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:39,763-Speed 5681.46 samples/sec   Loss 10.8024   LearningRate 0.0783   Epoch: 2   Global Step: 11650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:41,610-Speed 5545.66 samples/sec   Loss 10.8556   LearningRate 0.0783   Epoch: 2   Global Step: 11660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:43,416-Speed 5671.12 samples/sec   Loss 10.7521   LearningRate 0.0783   Epoch: 2   Global Step: 11670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:45,235-Speed 5631.70 samples/sec   Loss 10.7035   LearningRate 0.0782   Epoch: 2   Global Step: 11680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:47,067-Speed 5593.93 samples/sec   Loss 10.8777   LearningRate 0.0782   Epoch: 2   Global Step: 11690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:48,881-Speed 5648.66 samples/sec   Loss 10.9212   LearningRate 0.0782   Epoch: 2   Global Step: 11700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:50,683-Speed 5684.99 samples/sec   Loss 10.9078   LearningRate 0.0782   Epoch: 2   Global Step: 11710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:52,497-Speed 5646.69 samples/sec   Loss 10.9283   LearningRate 0.0782   Epoch: 2   Global Step: 11720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:01:54,324-Speed 5606.75 samples/sec   Loss 10.8292   LearningRate 0.0782   Epoch: 2   Global Step: 11730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:01:56,135-Speed 5655.59 samples/sec   Loss 10.7382   LearningRate 0.0781   Epoch: 2   Global Step: 11740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:01:57,961-Speed 5611.10 samples/sec   Loss 10.9490   LearningRate 0.0781   Epoch: 2   Global Step: 11750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:01:59,777-Speed 5641.02 samples/sec   Loss 10.8564   LearningRate 0.0781   Epoch: 2   Global Step: 11760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:01,591-Speed 5658.19 samples/sec   Loss 10.7119   LearningRate 0.0781   Epoch: 2   Global Step: 11770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:03,423-Speed 5590.96 samples/sec   Loss 10.6929   LearningRate 0.0781   Epoch: 2   Global Step: 11780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:05,241-Speed 5635.83 samples/sec   Loss 10.8463   LearningRate 0.0780   Epoch: 2   Global Step: 11790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:07,048-Speed 5669.08 samples/sec   Loss 10.7058   LearningRate 0.0780   Epoch: 2   Global Step: 11800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:08,886-Speed 5573.25 samples/sec   Loss 10.9077   LearningRate 0.0780   Epoch: 2   Global Step: 11810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:10,691-Speed 5675.32 samples/sec   Loss 10.8853   LearningRate 0.0780   Epoch: 2   Global Step: 11820   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 11:02:12,488-Speed 5699.20 samples/sec   Loss 10.9163   LearningRate 0.0780   Epoch: 2   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:14,326-Speed 5574.06 samples/sec   Loss 10.7939   LearningRate 0.0780   Epoch: 2   Global Step: 11840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:16,141-Speed 5647.93 samples/sec   Loss 10.9000   LearningRate 0.0779   Epoch: 2   Global Step: 11850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:02:18,000-Speed 5513.90 samples/sec   Loss 10.7967   LearningRate 0.0779   Epoch: 2   Global Step: 11860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:02:19,808-Speed 5665.63 samples/sec   Loss 10.9241   LearningRate 0.0779   Epoch: 2   Global Step: 11870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:02:21,630-Speed 5623.63 samples/sec   Loss 10.6826   LearningRate 0.0779   Epoch: 2   Global Step: 11880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:02:23,439-Speed 5660.65 samples/sec   Loss 10.7957   LearningRate 0.0779   Epoch: 2   Global Step: 11890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:02:25,269-Speed 5598.95 samples/sec   Loss 10.9247   LearningRate 0.0779   Epoch: 2   Global Step: 11900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:02:27,084-Speed 5645.58 samples/sec   Loss 10.7837   LearningRate 0.0778   Epoch: 2   Global Step: 11910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:02:28,894-Speed 5658.44 samples/sec   Loss 10.8484   LearningRate 0.0778   Epoch: 2   Global Step: 11920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:02:30,708-Speed 5649.35 samples/sec   Loss 10.8752   LearningRate 0.0778   Epoch: 2   Global Step: 11930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:02:32,578-Speed 5479.16 samples/sec   Loss 10.6911   LearningRate 0.0778   Epoch: 2   Global Step: 11940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:02:34,390-Speed 5652.01 samples/sec   Loss 10.6911   LearningRate 0.0778   Epoch: 2   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:36,234-Speed 5561.61 samples/sec   Loss 10.5788   LearningRate 0.0778   Epoch: 2   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:38,070-Speed 5582.61 samples/sec   Loss 10.9517   LearningRate 0.0777   Epoch: 2   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:39,912-Speed 5563.21 samples/sec   Loss 10.6974   LearningRate 0.0777   Epoch: 2   Global Step: 11980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:41,731-Speed 5631.24 samples/sec   Loss 10.7780   LearningRate 0.0777   Epoch: 2   Global Step: 11990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:02:43,574-Speed 5560.59 samples/sec   Loss 10.8642   LearningRate 0.0777   Epoch: 2   Global Step: 12000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:03:10,848-[lfw][12000]XNorm: 21.607418
Training: 2022-04-11 11:03:10,849-[lfw][12000]Accuracy-Flip: 0.99667+-0.00289
Training: 2022-04-11 11:03:10,850-[lfw][12000]Accuracy-Highest: 0.99667
Training: 2022-04-11 11:03:42,388-[cfp_fp][12000]XNorm: 18.734406
Training: 2022-04-11 11:03:42,389-[cfp_fp][12000]Accuracy-Flip: 0.91457+-0.01385
Training: 2022-04-11 11:03:42,390-[cfp_fp][12000]Accuracy-Highest: 0.92457
Training: 2022-04-11 11:04:09,618-[agedb_30][12000]XNorm: 21.070338
Training: 2022-04-11 11:04:09,619-[agedb_30][12000]Accuracy-Flip: 0.96283+-0.00949
Training: 2022-04-11 11:04:09,620-[agedb_30][12000]Accuracy-Highest: 0.96283
Training: 2022-04-11 11:04:11,463-Speed 116.51 samples/sec   Loss 10.5584   LearningRate 0.0777   Epoch: 2   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:13,292-Speed 5601.12 samples/sec   Loss 10.7694   LearningRate 0.0776   Epoch: 2   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:15,111-Speed 5632.40 samples/sec   Loss 10.7634   LearningRate 0.0776   Epoch: 2   Global Step: 12030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:16,959-Speed 5544.97 samples/sec   Loss 10.7593   LearningRate 0.0776   Epoch: 2   Global Step: 12040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:18,808-Speed 5542.37 samples/sec   Loss 10.7825   LearningRate 0.0776   Epoch: 2   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:20,612-Speed 5681.39 samples/sec   Loss 10.7645   LearningRate 0.0776   Epoch: 2   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:22,414-Speed 5684.52 samples/sec   Loss 10.9004   LearningRate 0.0776   Epoch: 2   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:24,228-Speed 5644.91 samples/sec   Loss 10.8093   LearningRate 0.0775   Epoch: 2   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:26,028-Speed 5692.32 samples/sec   Loss 10.8178   LearningRate 0.0775   Epoch: 2   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:27,827-Speed 5696.35 samples/sec   Loss 10.8346   LearningRate 0.0775   Epoch: 2   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:29,626-Speed 5692.56 samples/sec   Loss 10.7646   LearningRate 0.0775   Epoch: 2   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:31,449-Speed 5618.89 samples/sec   Loss 10.7074   LearningRate 0.0775   Epoch: 2   Global Step: 12120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:33,246-Speed 5701.78 samples/sec   Loss 10.8078   LearningRate 0.0775   Epoch: 2   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:35,053-Speed 5675.30 samples/sec   Loss 10.7176   LearningRate 0.0774   Epoch: 2   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:36,858-Speed 5673.02 samples/sec   Loss 10.8367   LearningRate 0.0774   Epoch: 2   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:38,700-Speed 5562.28 samples/sec   Loss 10.7505   LearningRate 0.0774   Epoch: 2   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:40,534-Speed 5587.17 samples/sec   Loss 10.7744   LearningRate 0.0774   Epoch: 2   Global Step: 12170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:42,388-Speed 5525.89 samples/sec   Loss 10.8297   LearningRate 0.0774   Epoch: 2   Global Step: 12180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:44,216-Speed 5605.15 samples/sec   Loss 10.8789   LearningRate 0.0774   Epoch: 2   Global Step: 12190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:46,017-Speed 5687.14 samples/sec   Loss 10.8403   LearningRate 0.0773   Epoch: 2   Global Step: 12200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:47,857-Speed 5568.63 samples/sec   Loss 10.8414   LearningRate 0.0773   Epoch: 2   Global Step: 12210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:49,661-Speed 5680.64 samples/sec   Loss 10.6280   LearningRate 0.0773   Epoch: 2   Global Step: 12220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:51,466-Speed 5675.08 samples/sec   Loss 10.7909   LearningRate 0.0773   Epoch: 2   Global Step: 12230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:53,304-Speed 5573.08 samples/sec   Loss 10.8833   LearningRate 0.0773   Epoch: 2   Global Step: 12240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:55,111-Speed 5668.91 samples/sec   Loss 10.8863   LearningRate 0.0772   Epoch: 2   Global Step: 12250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:56,927-Speed 5642.22 samples/sec   Loss 10.7653   LearningRate 0.0772   Epoch: 2   Global Step: 12260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:04:58,772-Speed 5553.84 samples/sec   Loss 10.8695   LearningRate 0.0772   Epoch: 2   Global Step: 12270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:00,617-Speed 5552.13 samples/sec   Loss 10.6295   LearningRate 0.0772   Epoch: 2   Global Step: 12280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:02,502-Speed 5434.05 samples/sec   Loss 10.7332   LearningRate 0.0772   Epoch: 2   Global Step: 12290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:04,309-Speed 5670.58 samples/sec   Loss 10.7263   LearningRate 0.0772   Epoch: 2   Global Step: 12300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:06,166-Speed 5518.26 samples/sec   Loss 10.8730   LearningRate 0.0771   Epoch: 2   Global Step: 12310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:07,976-Speed 5660.99 samples/sec   Loss 10.8595   LearningRate 0.0771   Epoch: 2   Global Step: 12320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:09,845-Speed 5481.50 samples/sec   Loss 10.7656   LearningRate 0.0771   Epoch: 2   Global Step: 12330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:11,679-Speed 5586.54 samples/sec   Loss 10.6923   LearningRate 0.0771   Epoch: 2   Global Step: 12340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:13,515-Speed 5579.82 samples/sec   Loss 10.5370   LearningRate 0.0771   Epoch: 2   Global Step: 12350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:15,386-Speed 5473.69 samples/sec   Loss 10.6060   LearningRate 0.0771   Epoch: 2   Global Step: 12360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:17,218-Speed 5593.43 samples/sec   Loss 10.6867   LearningRate 0.0770   Epoch: 2   Global Step: 12370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:19,036-Speed 5637.59 samples/sec   Loss 10.7949   LearningRate 0.0770   Epoch: 2   Global Step: 12380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:20,875-Speed 5568.03 samples/sec   Loss 10.6406   LearningRate 0.0770   Epoch: 2   Global Step: 12390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:22,697-Speed 5622.59 samples/sec   Loss 10.6963   LearningRate 0.0770   Epoch: 2   Global Step: 12400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:24,502-Speed 5676.99 samples/sec   Loss 10.7042   LearningRate 0.0770   Epoch: 2   Global Step: 12410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:26,373-Speed 5477.35 samples/sec   Loss 10.7649   LearningRate 0.0770   Epoch: 2   Global Step: 12420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:28,187-Speed 5647.06 samples/sec   Loss 10.6857   LearningRate 0.0769   Epoch: 2   Global Step: 12430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:30,048-Speed 5506.62 samples/sec   Loss 10.7394   LearningRate 0.0769   Epoch: 2   Global Step: 12440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:31,884-Speed 5578.19 samples/sec   Loss 10.7291   LearningRate 0.0769   Epoch: 2   Global Step: 12450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:33,694-Speed 5662.34 samples/sec   Loss 10.5138   LearningRate 0.0769   Epoch: 2   Global Step: 12460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:35,539-Speed 5554.29 samples/sec   Loss 10.5487   LearningRate 0.0769   Epoch: 2   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:37,354-Speed 5645.44 samples/sec   Loss 10.7627   LearningRate 0.0768   Epoch: 2   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:39,168-Speed 5647.99 samples/sec   Loss 10.5869   LearningRate 0.0768   Epoch: 2   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:40,999-Speed 5595.18 samples/sec   Loss 10.5757   LearningRate 0.0768   Epoch: 2   Global Step: 12500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:42,813-Speed 5649.26 samples/sec   Loss 10.6799   LearningRate 0.0768   Epoch: 2   Global Step: 12510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:44,679-Speed 5490.79 samples/sec   Loss 10.6053   LearningRate 0.0768   Epoch: 2   Global Step: 12520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:46,490-Speed 5655.34 samples/sec   Loss 10.9026   LearningRate 0.0768   Epoch: 2   Global Step: 12530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:48,307-Speed 5639.66 samples/sec   Loss 10.7052   LearningRate 0.0767   Epoch: 2   Global Step: 12540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:50,147-Speed 5568.33 samples/sec   Loss 10.5894   LearningRate 0.0767   Epoch: 2   Global Step: 12550   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 11:05:51,986-Speed 5569.53 samples/sec   Loss 10.8885   LearningRate 0.0767   Epoch: 2   Global Step: 12560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:53,810-Speed 5617.92 samples/sec   Loss 10.6221   LearningRate 0.0767   Epoch: 2   Global Step: 12570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:55,615-Speed 5678.00 samples/sec   Loss 10.5591   LearningRate 0.0767   Epoch: 2   Global Step: 12580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:57,452-Speed 5574.60 samples/sec   Loss 10.5346   LearningRate 0.0767   Epoch: 2   Global Step: 12590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:05:59,262-Speed 5663.39 samples/sec   Loss 10.6327   LearningRate 0.0766   Epoch: 2   Global Step: 12600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:01,111-Speed 5539.30 samples/sec   Loss 10.5079   LearningRate 0.0766   Epoch: 2   Global Step: 12610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:02,913-Speed 5687.38 samples/sec   Loss 10.4161   LearningRate 0.0766   Epoch: 2   Global Step: 12620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:04,778-Speed 5492.30 samples/sec   Loss 10.6621   LearningRate 0.0766   Epoch: 2   Global Step: 12630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:06,598-Speed 5633.37 samples/sec   Loss 10.5926   LearningRate 0.0766   Epoch: 2   Global Step: 12640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:08,417-Speed 5632.05 samples/sec   Loss 10.7041   LearningRate 0.0766   Epoch: 2   Global Step: 12650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:10,220-Speed 5684.44 samples/sec   Loss 10.6198   LearningRate 0.0765   Epoch: 2   Global Step: 12660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:12,056-Speed 5579.06 samples/sec   Loss 10.4871   LearningRate 0.0765   Epoch: 2   Global Step: 12670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:13,887-Speed 5596.86 samples/sec   Loss 10.6369   LearningRate 0.0765   Epoch: 2   Global Step: 12680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:15,717-Speed 5596.43 samples/sec   Loss 10.5819   LearningRate 0.0765   Epoch: 2   Global Step: 12690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:17,541-Speed 5617.65 samples/sec   Loss 10.7154   LearningRate 0.0765   Epoch: 2   Global Step: 12700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:19,377-Speed 5578.84 samples/sec   Loss 10.6576   LearningRate 0.0765   Epoch: 2   Global Step: 12710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:21,225-Speed 5546.83 samples/sec   Loss 10.5620   LearningRate 0.0764   Epoch: 2   Global Step: 12720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:23,032-Speed 5671.86 samples/sec   Loss 10.5055   LearningRate 0.0764   Epoch: 2   Global Step: 12730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:24,886-Speed 5524.96 samples/sec   Loss 10.4842   LearningRate 0.0764   Epoch: 2   Global Step: 12740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:26,707-Speed 5624.81 samples/sec   Loss 10.5609   LearningRate 0.0764   Epoch: 2   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:28,600-Speed 5411.99 samples/sec   Loss 10.7339   LearningRate 0.0764   Epoch: 2   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:30,447-Speed 5547.09 samples/sec   Loss 10.6683   LearningRate 0.0763   Epoch: 2   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:32,275-Speed 5607.03 samples/sec   Loss 10.6234   LearningRate 0.0763   Epoch: 2   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:34,147-Speed 5472.39 samples/sec   Loss 10.4997   LearningRate 0.0763   Epoch: 2   Global Step: 12790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:35,950-Speed 5685.64 samples/sec   Loss 10.6002   LearningRate 0.0763   Epoch: 2   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:37,755-Speed 5674.07 samples/sec   Loss 10.6474   LearningRate 0.0763   Epoch: 2   Global Step: 12810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:39,623-Speed 5485.79 samples/sec   Loss 10.4996   LearningRate 0.0763   Epoch: 2   Global Step: 12820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:41,464-Speed 5567.16 samples/sec   Loss 10.5925   LearningRate 0.0762   Epoch: 2   Global Step: 12830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:43,313-Speed 5543.07 samples/sec   Loss 10.6412   LearningRate 0.0762   Epoch: 2   Global Step: 12840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:45,143-Speed 5599.43 samples/sec   Loss 10.6881   LearningRate 0.0762   Epoch: 2   Global Step: 12850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:46,979-Speed 5579.50 samples/sec   Loss 10.6771   LearningRate 0.0762   Epoch: 2   Global Step: 12860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:48,839-Speed 5508.06 samples/sec   Loss 10.5116   LearningRate 0.0762   Epoch: 2   Global Step: 12870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:50,654-Speed 5643.82 samples/sec   Loss 10.5456   LearningRate 0.0762   Epoch: 2   Global Step: 12880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:52,495-Speed 5565.02 samples/sec   Loss 10.6008   LearningRate 0.0761   Epoch: 2   Global Step: 12890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:54,333-Speed 5576.37 samples/sec   Loss 10.2828   LearningRate 0.0761   Epoch: 2   Global Step: 12900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:56,153-Speed 5627.82 samples/sec   Loss 10.6359   LearningRate 0.0761   Epoch: 2   Global Step: 12910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:57,989-Speed 5582.18 samples/sec   Loss 10.6319   LearningRate 0.0761   Epoch: 2   Global Step: 12920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:06:59,811-Speed 5623.27 samples/sec   Loss 10.6139   LearningRate 0.0761   Epoch: 2   Global Step: 12930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:07:01,667-Speed 5518.15 samples/sec   Loss 10.5523   LearningRate 0.0761   Epoch: 2   Global Step: 12940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:03,562-Speed 5408.55 samples/sec   Loss 10.5389   LearningRate 0.0760   Epoch: 2   Global Step: 12950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:05,373-Speed 5656.27 samples/sec   Loss 10.4496   LearningRate 0.0760   Epoch: 2   Global Step: 12960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:07,193-Speed 5631.76 samples/sec   Loss 10.5996   LearningRate 0.0760   Epoch: 2   Global Step: 12970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:09,017-Speed 5614.87 samples/sec   Loss 10.4378   LearningRate 0.0760   Epoch: 2   Global Step: 12980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:10,844-Speed 5610.64 samples/sec   Loss 10.6668   LearningRate 0.0760   Epoch: 2   Global Step: 12990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:12,696-Speed 5531.22 samples/sec   Loss 10.3875   LearningRate 0.0759   Epoch: 2   Global Step: 13000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:14,509-Speed 5653.34 samples/sec   Loss 10.5119   LearningRate 0.0759   Epoch: 2   Global Step: 13010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:16,341-Speed 5591.75 samples/sec   Loss 10.6711   LearningRate 0.0759   Epoch: 2   Global Step: 13020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:18,187-Speed 5550.19 samples/sec   Loss 10.6379   LearningRate 0.0759   Epoch: 2   Global Step: 13030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:20,005-Speed 5636.47 samples/sec   Loss 10.5415   LearningRate 0.0759   Epoch: 2   Global Step: 13040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:21,830-Speed 5614.17 samples/sec   Loss 10.5231   LearningRate 0.0759   Epoch: 2   Global Step: 13050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:23,655-Speed 5614.72 samples/sec   Loss 10.6517   LearningRate 0.0758   Epoch: 2   Global Step: 13060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:25,478-Speed 5619.11 samples/sec   Loss 10.5238   LearningRate 0.0758   Epoch: 2   Global Step: 13070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:27,283-Speed 5678.15 samples/sec   Loss 10.7000   LearningRate 0.0758   Epoch: 2   Global Step: 13080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:29,121-Speed 5573.42 samples/sec   Loss 10.5555   LearningRate 0.0758   Epoch: 2   Global Step: 13090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:30,941-Speed 5631.49 samples/sec   Loss 10.5287   LearningRate 0.0758   Epoch: 2   Global Step: 13100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:32,794-Speed 5532.15 samples/sec   Loss 10.5723   LearningRate 0.0758   Epoch: 2   Global Step: 13110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:34,613-Speed 5629.03 samples/sec   Loss 10.6088   LearningRate 0.0757   Epoch: 2   Global Step: 13120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:36,425-Speed 5655.35 samples/sec   Loss 10.4620   LearningRate 0.0757   Epoch: 2   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:38,266-Speed 5564.04 samples/sec   Loss 10.3533   LearningRate 0.0757   Epoch: 2   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:40,111-Speed 5554.17 samples/sec   Loss 10.4939   LearningRate 0.0757   Epoch: 2   Global Step: 13150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:41,908-Speed 5700.70 samples/sec   Loss 10.4757   LearningRate 0.0757   Epoch: 2   Global Step: 13160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:43,720-Speed 5655.43 samples/sec   Loss 10.5212   LearningRate 0.0757   Epoch: 2   Global Step: 13170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:45,533-Speed 5649.47 samples/sec   Loss 10.4663   LearningRate 0.0756   Epoch: 2   Global Step: 13180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:47,352-Speed 5633.29 samples/sec   Loss 10.4633   LearningRate 0.0756   Epoch: 2   Global Step: 13190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:49,178-Speed 5607.83 samples/sec   Loss 10.6121   LearningRate 0.0756   Epoch: 2   Global Step: 13200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:51,049-Speed 5475.35 samples/sec   Loss 10.7067   LearningRate 0.0756   Epoch: 2   Global Step: 13210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:52,866-Speed 5640.14 samples/sec   Loss 10.3911   LearningRate 0.0756   Epoch: 2   Global Step: 13220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:54,669-Speed 5681.84 samples/sec   Loss 10.4250   LearningRate 0.0756   Epoch: 2   Global Step: 13230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:56,471-Speed 5684.03 samples/sec   Loss 10.6248   LearningRate 0.0755   Epoch: 2   Global Step: 13240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:07:58,292-Speed 5625.24 samples/sec   Loss 10.6641   LearningRate 0.0755   Epoch: 2   Global Step: 13250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:00,107-Speed 5644.14 samples/sec   Loss 10.5558   LearningRate 0.0755   Epoch: 2   Global Step: 13260   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 11:08:01,927-Speed 5628.95 samples/sec   Loss 10.5418   LearningRate 0.0755   Epoch: 2   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:03,753-Speed 5609.83 samples/sec   Loss 10.5623   LearningRate 0.0755   Epoch: 2   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:05,563-Speed 5661.71 samples/sec   Loss 10.4062   LearningRate 0.0755   Epoch: 2   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:07,447-Speed 5436.11 samples/sec   Loss 10.5161   LearningRate 0.0754   Epoch: 2   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:09,311-Speed 5497.84 samples/sec   Loss 10.5949   LearningRate 0.0754   Epoch: 2   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:11,128-Speed 5640.03 samples/sec   Loss 10.4365   LearningRate 0.0754   Epoch: 2   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:12,954-Speed 5609.24 samples/sec   Loss 10.6075   LearningRate 0.0754   Epoch: 2   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:14,770-Speed 5643.28 samples/sec   Loss 10.5141   LearningRate 0.0754   Epoch: 2   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:16,611-Speed 5563.01 samples/sec   Loss 10.6235   LearningRate 0.0753   Epoch: 2   Global Step: 13350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:18,424-Speed 5651.52 samples/sec   Loss 10.4983   LearningRate 0.0753   Epoch: 2   Global Step: 13360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:20,259-Speed 5585.17 samples/sec   Loss 10.3953   LearningRate 0.0753   Epoch: 2   Global Step: 13370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:22,068-Speed 5664.11 samples/sec   Loss 10.5441   LearningRate 0.0753   Epoch: 2   Global Step: 13380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:23,909-Speed 5564.66 samples/sec   Loss 10.4686   LearningRate 0.0753   Epoch: 2   Global Step: 13390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:25,726-Speed 5637.36 samples/sec   Loss 10.5450   LearningRate 0.0753   Epoch: 2   Global Step: 13400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:27,565-Speed 5571.01 samples/sec   Loss 10.3557   LearningRate 0.0752   Epoch: 2   Global Step: 13410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:29,388-Speed 5621.15 samples/sec   Loss 10.4981   LearningRate 0.0752   Epoch: 2   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:31,214-Speed 5615.91 samples/sec   Loss 10.5017   LearningRate 0.0752   Epoch: 2   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:33,020-Speed 5675.26 samples/sec   Loss 10.5129   LearningRate 0.0752   Epoch: 2   Global Step: 13440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:34,880-Speed 5506.23 samples/sec   Loss 10.5099   LearningRate 0.0752   Epoch: 2   Global Step: 13450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:36,711-Speed 5596.63 samples/sec   Loss 10.3272   LearningRate 0.0752   Epoch: 2   Global Step: 13460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:38,578-Speed 5485.94 samples/sec   Loss 10.5943   LearningRate 0.0751   Epoch: 2   Global Step: 13470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:40,408-Speed 5598.40 samples/sec   Loss 10.4270   LearningRate 0.0751   Epoch: 2   Global Step: 13480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:42,223-Speed 5647.19 samples/sec   Loss 10.4879   LearningRate 0.0751   Epoch: 2   Global Step: 13490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:44,066-Speed 5559.56 samples/sec   Loss 10.4478   LearningRate 0.0751   Epoch: 2   Global Step: 13500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:45,881-Speed 5643.61 samples/sec   Loss 10.5355   LearningRate 0.0751   Epoch: 2   Global Step: 13510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:47,720-Speed 5571.21 samples/sec   Loss 10.5185   LearningRate 0.0751   Epoch: 2   Global Step: 13520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:49,605-Speed 5434.07 samples/sec   Loss 10.5718   LearningRate 0.0750   Epoch: 2   Global Step: 13530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:51,418-Speed 5651.27 samples/sec   Loss 10.2559   LearningRate 0.0750   Epoch: 2   Global Step: 13540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:53,235-Speed 5640.32 samples/sec   Loss 10.4348   LearningRate 0.0750   Epoch: 2   Global Step: 13550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:55,060-Speed 5612.90 samples/sec   Loss 10.4105   LearningRate 0.0750   Epoch: 2   Global Step: 13560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:56,861-Speed 5689.62 samples/sec   Loss 10.3556   LearningRate 0.0750   Epoch: 2   Global Step: 13570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:08:58,666-Speed 5672.47 samples/sec   Loss 10.4080   LearningRate 0.0750   Epoch: 2   Global Step: 13580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:00,476-Speed 5660.50 samples/sec   Loss 10.3780   LearningRate 0.0749   Epoch: 2   Global Step: 13590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:02,302-Speed 5612.43 samples/sec   Loss 10.3694   LearningRate 0.0749   Epoch: 2   Global Step: 13600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:04,119-Speed 5638.72 samples/sec   Loss 10.2348   LearningRate 0.0749   Epoch: 2   Global Step: 13610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:05,958-Speed 5569.82 samples/sec   Loss 10.2489   LearningRate 0.0749   Epoch: 2   Global Step: 13620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:07,752-Speed 5710.28 samples/sec   Loss 10.2977   LearningRate 0.0749   Epoch: 2   Global Step: 13630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:09,621-Speed 5482.20 samples/sec   Loss 10.4979   LearningRate 0.0749   Epoch: 2   Global Step: 13640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:11,436-Speed 5643.42 samples/sec   Loss 10.5157   LearningRate 0.0748   Epoch: 2   Global Step: 13650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:13,273-Speed 5576.92 samples/sec   Loss 10.4931   LearningRate 0.0748   Epoch: 2   Global Step: 13660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:15,086-Speed 5651.46 samples/sec   Loss 10.4747   LearningRate 0.0748   Epoch: 2   Global Step: 13670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:16,913-Speed 5608.26 samples/sec   Loss 10.4801   LearningRate 0.0748   Epoch: 2   Global Step: 13680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:18,745-Speed 5591.01 samples/sec   Loss 10.4308   LearningRate 0.0748   Epoch: 2   Global Step: 13690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:20,563-Speed 5633.45 samples/sec   Loss 10.3758   LearningRate 0.0747   Epoch: 2   Global Step: 13700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:22,396-Speed 5589.32 samples/sec   Loss 10.4958   LearningRate 0.0747   Epoch: 2   Global Step: 13710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:24,203-Speed 5670.76 samples/sec   Loss 10.6315   LearningRate 0.0747   Epoch: 2   Global Step: 13720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:26,025-Speed 5623.90 samples/sec   Loss 10.3470   LearningRate 0.0747   Epoch: 2   Global Step: 13730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:27,861-Speed 5580.26 samples/sec   Loss 10.4949   LearningRate 0.0747   Epoch: 2   Global Step: 13740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:29,679-Speed 5635.81 samples/sec   Loss 10.4033   LearningRate 0.0747   Epoch: 2   Global Step: 13750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:31,501-Speed 5623.40 samples/sec   Loss 10.4730   LearningRate 0.0746   Epoch: 2   Global Step: 13760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:33,358-Speed 5517.20 samples/sec   Loss 10.2792   LearningRate 0.0746   Epoch: 2   Global Step: 13770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:35,162-Speed 5679.93 samples/sec   Loss 10.3923   LearningRate 0.0746   Epoch: 2   Global Step: 13780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:36,964-Speed 5683.82 samples/sec   Loss 10.2999   LearningRate 0.0746   Epoch: 2   Global Step: 13790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:38,805-Speed 5566.76 samples/sec   Loss 10.5138   LearningRate 0.0746   Epoch: 2   Global Step: 13800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:40,627-Speed 5623.40 samples/sec   Loss 10.3753   LearningRate 0.0746   Epoch: 2   Global Step: 13810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:42,444-Speed 5638.41 samples/sec   Loss 10.3568   LearningRate 0.0745   Epoch: 2   Global Step: 13820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:44,288-Speed 5555.78 samples/sec   Loss 10.3638   LearningRate 0.0745   Epoch: 2   Global Step: 13830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:46,125-Speed 5578.06 samples/sec   Loss 10.2708   LearningRate 0.0745   Epoch: 2   Global Step: 13840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:47,956-Speed 5594.46 samples/sec   Loss 10.2901   LearningRate 0.0745   Epoch: 2   Global Step: 13850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:49,786-Speed 5599.77 samples/sec   Loss 10.3214   LearningRate 0.0745   Epoch: 2   Global Step: 13860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:51,672-Speed 5431.49 samples/sec   Loss 10.4779   LearningRate 0.0745   Epoch: 2   Global Step: 13870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:53,487-Speed 5647.72 samples/sec   Loss 10.2223   LearningRate 0.0744   Epoch: 2   Global Step: 13880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:09:55,300-Speed 5648.84 samples/sec   Loss 10.3197   LearningRate 0.0744   Epoch: 2   Global Step: 13890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:57,132-Speed 5593.32 samples/sec   Loss 10.2944   LearningRate 0.0744   Epoch: 2   Global Step: 13900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:09:58,940-Speed 5666.95 samples/sec   Loss 10.3363   LearningRate 0.0744   Epoch: 2   Global Step: 13910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:10:00,790-Speed 5537.65 samples/sec   Loss 10.3287   LearningRate 0.0744   Epoch: 2   Global Step: 13920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:10:02,613-Speed 5620.06 samples/sec   Loss 10.1891   LearningRate 0.0744   Epoch: 2   Global Step: 13930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:10:04,441-Speed 5605.14 samples/sec   Loss 10.3477   LearningRate 0.0743   Epoch: 2   Global Step: 13940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:10:06,256-Speed 5646.00 samples/sec   Loss 10.3669   LearningRate 0.0743   Epoch: 2   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:10:08,092-Speed 5580.05 samples/sec   Loss 10.4482   LearningRate 0.0743   Epoch: 2   Global Step: 13960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:10:09,912-Speed 5627.53 samples/sec   Loss 10.4291   LearningRate 0.0743   Epoch: 2   Global Step: 13970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:10:11,719-Speed 5671.81 samples/sec   Loss 10.4160   LearningRate 0.0743   Epoch: 2   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:10:13,524-Speed 5673.34 samples/sec   Loss 10.3808   LearningRate 0.0743   Epoch: 2   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:10:15,349-Speed 5613.27 samples/sec   Loss 10.2403   LearningRate 0.0742   Epoch: 2   Global Step: 14000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:10:42,570-[lfw][14000]XNorm: 21.175048
Training: 2022-04-11 11:10:42,571-[lfw][14000]Accuracy-Flip: 0.99517+-0.00353
Training: 2022-04-11 11:10:42,571-[lfw][14000]Accuracy-Highest: 0.99667
Training: 2022-04-11 11:11:13,888-[cfp_fp][14000]XNorm: 18.010395
Training: 2022-04-11 11:11:13,889-[cfp_fp][14000]Accuracy-Flip: 0.93243+-0.01148
Training: 2022-04-11 11:11:13,890-[cfp_fp][14000]Accuracy-Highest: 0.93243
Training: 2022-04-11 11:11:41,025-[agedb_30][14000]XNorm: 20.614065
Training: 2022-04-11 11:11:41,026-[agedb_30][14000]Accuracy-Flip: 0.96417+-0.01020
Training: 2022-04-11 11:11:41,027-[agedb_30][14000]Accuracy-Highest: 0.96417
Training: 2022-04-11 11:11:42,862-Speed 117.01 samples/sec   Loss 10.4229   LearningRate 0.0742   Epoch: 2   Global Step: 14010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:11:44,659-Speed 5700.70 samples/sec   Loss 10.2819   LearningRate 0.0742   Epoch: 2   Global Step: 14020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:11:46,466-Speed 5668.08 samples/sec   Loss 10.2335   LearningRate 0.0742   Epoch: 2   Global Step: 14030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 11:11:48,303-Speed 5576.98 samples/sec   Loss 10.1398   LearningRate 0.0742   Epoch: 2   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:11:50,113-Speed 5660.73 samples/sec   Loss 10.3161   LearningRate 0.0742   Epoch: 2   Global Step: 14050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:11:51,922-Speed 5665.16 samples/sec   Loss 10.3416   LearningRate 0.0741   Epoch: 2   Global Step: 14060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:11:53,743-Speed 5625.40 samples/sec   Loss 10.2850   LearningRate 0.0741   Epoch: 2   Global Step: 14070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:11:55,544-Speed 5688.65 samples/sec   Loss 10.1804   LearningRate 0.0741   Epoch: 2   Global Step: 14080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:11:57,334-Speed 5723.81 samples/sec   Loss 10.4727   LearningRate 0.0741   Epoch: 2   Global Step: 14090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:11:59,176-Speed 5561.23 samples/sec   Loss 10.2654   LearningRate 0.0741   Epoch: 2   Global Step: 14100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:00,996-Speed 5629.12 samples/sec   Loss 10.2326   LearningRate 0.0740   Epoch: 2   Global Step: 14110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:02,808-Speed 5654.46 samples/sec   Loss 10.3686   LearningRate 0.0740   Epoch: 2   Global Step: 14120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:04,653-Speed 5553.13 samples/sec   Loss 10.3205   LearningRate 0.0740   Epoch: 2   Global Step: 14130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:06,494-Speed 5565.05 samples/sec   Loss 10.4062   LearningRate 0.0740   Epoch: 2   Global Step: 14140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:08,316-Speed 5623.31 samples/sec   Loss 10.1521   LearningRate 0.0740   Epoch: 2   Global Step: 14150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:10,130-Speed 5645.14 samples/sec   Loss 10.5547   LearningRate 0.0740   Epoch: 2   Global Step: 14160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:11,961-Speed 5598.32 samples/sec   Loss 10.3129   LearningRate 0.0739   Epoch: 2   Global Step: 14170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:13,791-Speed 5596.83 samples/sec   Loss 10.2736   LearningRate 0.0739   Epoch: 2   Global Step: 14180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:15,650-Speed 5512.63 samples/sec   Loss 10.4428   LearningRate 0.0739   Epoch: 2   Global Step: 14190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:17,479-Speed 5601.61 samples/sec   Loss 10.3138   LearningRate 0.0739   Epoch: 2   Global Step: 14200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:19,302-Speed 5620.02 samples/sec   Loss 10.3545   LearningRate 0.0739   Epoch: 2   Global Step: 14210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:21,119-Speed 5638.69 samples/sec   Loss 10.2218   LearningRate 0.0739   Epoch: 2   Global Step: 14220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:22,958-Speed 5574.78 samples/sec   Loss 10.3755   LearningRate 0.0738   Epoch: 2   Global Step: 14230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:24,822-Speed 5495.81 samples/sec   Loss 10.3809   LearningRate 0.0738   Epoch: 2   Global Step: 14240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:26,641-Speed 5632.54 samples/sec   Loss 10.2394   LearningRate 0.0738   Epoch: 2   Global Step: 14250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:28,462-Speed 5625.16 samples/sec   Loss 10.2333   LearningRate 0.0738   Epoch: 2   Global Step: 14260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:30,296-Speed 5587.26 samples/sec   Loss 10.2584   LearningRate 0.0738   Epoch: 2   Global Step: 14270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:32,102-Speed 5674.62 samples/sec   Loss 10.2232   LearningRate 0.0738   Epoch: 2   Global Step: 14280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:33,937-Speed 5584.64 samples/sec   Loss 10.1995   LearningRate 0.0737   Epoch: 2   Global Step: 14290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:35,759-Speed 5623.31 samples/sec   Loss 10.4934   LearningRate 0.0737   Epoch: 2   Global Step: 14300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:37,611-Speed 5531.27 samples/sec   Loss 10.3839   LearningRate 0.0737   Epoch: 2   Global Step: 14310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:39,411-Speed 5692.10 samples/sec   Loss 10.5791   LearningRate 0.0737   Epoch: 2   Global Step: 14320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:41,217-Speed 5674.60 samples/sec   Loss 10.4217   LearningRate 0.0737   Epoch: 2   Global Step: 14330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:43,038-Speed 5625.31 samples/sec   Loss 10.4859   LearningRate 0.0737   Epoch: 2   Global Step: 14340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:44,845-Speed 5669.63 samples/sec   Loss 10.3963   LearningRate 0.0736   Epoch: 2   Global Step: 14350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:46,658-Speed 5651.91 samples/sec   Loss 10.2601   LearningRate 0.0736   Epoch: 2   Global Step: 14360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:48,528-Speed 5480.90 samples/sec   Loss 10.5168   LearningRate 0.0736   Epoch: 2   Global Step: 14370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:50,336-Speed 5664.18 samples/sec   Loss 10.3266   LearningRate 0.0736   Epoch: 2   Global Step: 14380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:52,152-Speed 5641.20 samples/sec   Loss 10.2659   LearningRate 0.0736   Epoch: 2   Global Step: 14390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:53,982-Speed 5600.83 samples/sec   Loss 10.3944   LearningRate 0.0736   Epoch: 2   Global Step: 14400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:55,787-Speed 5673.84 samples/sec   Loss 10.1857   LearningRate 0.0735   Epoch: 2   Global Step: 14410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:12:57,643-Speed 5521.92 samples/sec   Loss 10.2655   LearningRate 0.0735   Epoch: 2   Global Step: 14420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:12:59,447-Speed 5677.81 samples/sec   Loss 10.5424   LearningRate 0.0735   Epoch: 2   Global Step: 14430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:01,275-Speed 5605.86 samples/sec   Loss 10.3504   LearningRate 0.0735   Epoch: 2   Global Step: 14440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:03,140-Speed 5493.84 samples/sec   Loss 10.3526   LearningRate 0.0735   Epoch: 2   Global Step: 14450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:04,945-Speed 5674.73 samples/sec   Loss 10.2592   LearningRate 0.0735   Epoch: 2   Global Step: 14460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:06,790-Speed 5551.95 samples/sec   Loss 10.3171   LearningRate 0.0734   Epoch: 2   Global Step: 14470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:08,616-Speed 5612.60 samples/sec   Loss 10.1961   LearningRate 0.0734   Epoch: 2   Global Step: 14480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:10,454-Speed 5574.31 samples/sec   Loss 10.4714   LearningRate 0.0734   Epoch: 2   Global Step: 14490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:12,286-Speed 5591.90 samples/sec   Loss 10.3372   LearningRate 0.0734   Epoch: 2   Global Step: 14500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:14,096-Speed 5661.67 samples/sec   Loss 10.4409   LearningRate 0.0734   Epoch: 2   Global Step: 14510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:15,948-Speed 5530.47 samples/sec   Loss 10.1532   LearningRate 0.0734   Epoch: 2   Global Step: 14520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:17,774-Speed 5612.99 samples/sec   Loss 10.3296   LearningRate 0.0733   Epoch: 2   Global Step: 14530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:19,625-Speed 5533.64 samples/sec   Loss 10.2730   LearningRate 0.0733   Epoch: 2   Global Step: 14540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:21,442-Speed 5639.54 samples/sec   Loss 10.2714   LearningRate 0.0733   Epoch: 2   Global Step: 14550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:23,278-Speed 5580.83 samples/sec   Loss 10.3900   LearningRate 0.0733   Epoch: 2   Global Step: 14560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:25,113-Speed 5582.84 samples/sec   Loss 10.3890   LearningRate 0.0733   Epoch: 2   Global Step: 14570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:26,962-Speed 5541.91 samples/sec   Loss 10.2635   LearningRate 0.0733   Epoch: 2   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:28,771-Speed 5663.27 samples/sec   Loss 10.1485   LearningRate 0.0732   Epoch: 2   Global Step: 14590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:30,603-Speed 5591.05 samples/sec   Loss 10.1749   LearningRate 0.0732   Epoch: 2   Global Step: 14600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:32,407-Speed 5682.14 samples/sec   Loss 10.1698   LearningRate 0.0732   Epoch: 2   Global Step: 14610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:34,220-Speed 5650.06 samples/sec   Loss 10.4196   LearningRate 0.0732   Epoch: 2   Global Step: 14620   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 11:13:36,020-Speed 5692.17 samples/sec   Loss 10.3980   LearningRate 0.0732   Epoch: 2   Global Step: 14630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:37,844-Speed 5616.20 samples/sec   Loss 10.3629   LearningRate 0.0732   Epoch: 2   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:39,652-Speed 5666.05 samples/sec   Loss 10.3148   LearningRate 0.0731   Epoch: 2   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:41,499-Speed 5548.93 samples/sec   Loss 10.2656   LearningRate 0.0731   Epoch: 2   Global Step: 14660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:13:43,306-Speed 5670.33 samples/sec   Loss 10.3704   LearningRate 0.0731   Epoch: 2   Global Step: 14670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:13:45,134-Speed 5661.35 samples/sec   Loss 10.1868   LearningRate 0.0731   Epoch: 2   Global Step: 14680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:13:46,968-Speed 5588.57 samples/sec   Loss 10.2479   LearningRate 0.0731   Epoch: 2   Global Step: 14690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:13:48,811-Speed 5559.01 samples/sec   Loss 10.2344   LearningRate 0.0730   Epoch: 2   Global Step: 14700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:13:50,643-Speed 5593.16 samples/sec   Loss 10.2243   LearningRate 0.0730   Epoch: 2   Global Step: 14710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:13:52,483-Speed 5568.12 samples/sec   Loss 10.2215   LearningRate 0.0730   Epoch: 2   Global Step: 14720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:13:54,344-Speed 5507.57 samples/sec   Loss 10.4156   LearningRate 0.0730   Epoch: 2   Global Step: 14730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:13:56,149-Speed 5675.52 samples/sec   Loss 10.3005   LearningRate 0.0730   Epoch: 2   Global Step: 14740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:13:57,966-Speed 5639.85 samples/sec   Loss 10.2261   LearningRate 0.0730   Epoch: 2   Global Step: 14750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:13:59,797-Speed 5595.03 samples/sec   Loss 10.3630   LearningRate 0.0729   Epoch: 2   Global Step: 14760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:01,660-Speed 5498.09 samples/sec   Loss 10.2410   LearningRate 0.0729   Epoch: 2   Global Step: 14770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:03,493-Speed 5589.93 samples/sec   Loss 10.2375   LearningRate 0.0729   Epoch: 2   Global Step: 14780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:05,306-Speed 5650.56 samples/sec   Loss 10.1675   LearningRate 0.0729   Epoch: 2   Global Step: 14790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:07,130-Speed 5615.99 samples/sec   Loss 10.1812   LearningRate 0.0729   Epoch: 2   Global Step: 14800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:08,936-Speed 5673.72 samples/sec   Loss 10.4261   LearningRate 0.0729   Epoch: 2   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:10,779-Speed 5560.02 samples/sec   Loss 10.0879   LearningRate 0.0728   Epoch: 2   Global Step: 14820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:12,610-Speed 5596.03 samples/sec   Loss 10.1744   LearningRate 0.0728   Epoch: 2   Global Step: 14830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:14,421-Speed 5656.53 samples/sec   Loss 10.1357   LearningRate 0.0728   Epoch: 2   Global Step: 14840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:16,259-Speed 5572.51 samples/sec   Loss 10.0885   LearningRate 0.0728   Epoch: 2   Global Step: 14850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:18,075-Speed 5642.23 samples/sec   Loss 10.1903   LearningRate 0.0728   Epoch: 2   Global Step: 14860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:19,922-Speed 5549.30 samples/sec   Loss 10.0753   LearningRate 0.0728   Epoch: 2   Global Step: 14870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:21,731-Speed 5664.38 samples/sec   Loss 10.3372   LearningRate 0.0727   Epoch: 2   Global Step: 14880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:23,572-Speed 5565.60 samples/sec   Loss 10.2383   LearningRate 0.0727   Epoch: 2   Global Step: 14890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:25,414-Speed 5561.86 samples/sec   Loss 10.2270   LearningRate 0.0727   Epoch: 2   Global Step: 14900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:27,265-Speed 5534.97 samples/sec   Loss 10.1695   LearningRate 0.0727   Epoch: 2   Global Step: 14910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:29,106-Speed 5567.42 samples/sec   Loss 10.0158   LearningRate 0.0727   Epoch: 2   Global Step: 14920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:30,941-Speed 5584.52 samples/sec   Loss 10.1865   LearningRate 0.0727   Epoch: 2   Global Step: 14930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:32,787-Speed 5551.74 samples/sec   Loss 10.2817   LearningRate 0.0726   Epoch: 2   Global Step: 14940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:34,596-Speed 5662.62 samples/sec   Loss 10.3195   LearningRate 0.0726   Epoch: 2   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:36,410-Speed 5648.28 samples/sec   Loss 10.2143   LearningRate 0.0726   Epoch: 2   Global Step: 14960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:38,225-Speed 5645.10 samples/sec   Loss 10.1532   LearningRate 0.0726   Epoch: 2   Global Step: 14970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:40,041-Speed 5642.75 samples/sec   Loss 10.1277   LearningRate 0.0726   Epoch: 2   Global Step: 14980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:41,879-Speed 5575.09 samples/sec   Loss 10.2220   LearningRate 0.0726   Epoch: 2   Global Step: 14990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:43,697-Speed 5635.90 samples/sec   Loss 10.4588   LearningRate 0.0725   Epoch: 2   Global Step: 15000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:45,534-Speed 5577.13 samples/sec   Loss 10.1825   LearningRate 0.0725   Epoch: 2   Global Step: 15010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:47,342-Speed 5666.41 samples/sec   Loss 10.2895   LearningRate 0.0725   Epoch: 2   Global Step: 15020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:49,174-Speed 5593.06 samples/sec   Loss 10.2441   LearningRate 0.0725   Epoch: 2   Global Step: 15030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:50,996-Speed 5623.14 samples/sec   Loss 10.0314   LearningRate 0.0725   Epoch: 2   Global Step: 15040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:52,848-Speed 5531.00 samples/sec   Loss 10.0158   LearningRate 0.0725   Epoch: 2   Global Step: 15050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:14:54,685-Speed 5579.24 samples/sec   Loss 10.3277   LearningRate 0.0724   Epoch: 2   Global Step: 15060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:56,510-Speed 5614.67 samples/sec   Loss 10.0066   LearningRate 0.0724   Epoch: 2   Global Step: 15070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:14:58,322-Speed 5654.72 samples/sec   Loss 10.3099   LearningRate 0.0724   Epoch: 2   Global Step: 15080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:00,196-Speed 5466.25 samples/sec   Loss 10.2446   LearningRate 0.0724   Epoch: 2   Global Step: 15090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:02,013-Speed 5639.31 samples/sec   Loss 10.1341   LearningRate 0.0724   Epoch: 2   Global Step: 15100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:03,858-Speed 5553.26 samples/sec   Loss 10.1426   LearningRate 0.0724   Epoch: 2   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:05,691-Speed 5589.05 samples/sec   Loss 10.2103   LearningRate 0.0723   Epoch: 2   Global Step: 15120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:07,560-Speed 5481.79 samples/sec   Loss 10.0046   LearningRate 0.0723   Epoch: 2   Global Step: 15130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:09,371-Speed 5657.57 samples/sec   Loss 10.1051   LearningRate 0.0723   Epoch: 2   Global Step: 15140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:11,187-Speed 5643.27 samples/sec   Loss 10.2427   LearningRate 0.0723   Epoch: 2   Global Step: 15150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:13,018-Speed 5597.62 samples/sec   Loss 10.2983   LearningRate 0.0723   Epoch: 2   Global Step: 15160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:14,891-Speed 5471.45 samples/sec   Loss 10.0217   LearningRate 0.0723   Epoch: 2   Global Step: 15170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:26,128-Speed 911.37 samples/sec   Loss 9.7422   LearningRate 0.0722   Epoch: 3   Global Step: 15180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:27,999-Speed 5476.89 samples/sec   Loss 9.2672   LearningRate 0.0722   Epoch: 3   Global Step: 15190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:29,828-Speed 5605.30 samples/sec   Loss 9.2866   LearningRate 0.0722   Epoch: 3   Global Step: 15200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:31,684-Speed 5518.44 samples/sec   Loss 9.1695   LearningRate 0.0722   Epoch: 3   Global Step: 15210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:33,513-Speed 5606.45 samples/sec   Loss 9.2647   LearningRate 0.0722   Epoch: 3   Global Step: 15220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:35,511-Speed 5126.90 samples/sec   Loss 9.2683   LearningRate 0.0722   Epoch: 3   Global Step: 15230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:37,341-Speed 5601.36 samples/sec   Loss 9.3141   LearningRate 0.0721   Epoch: 3   Global Step: 15240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:39,188-Speed 5547.50 samples/sec   Loss 9.2501   LearningRate 0.0721   Epoch: 3   Global Step: 15250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:40,991-Speed 5683.73 samples/sec   Loss 9.5842   LearningRate 0.0721   Epoch: 3   Global Step: 15260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:42,814-Speed 5619.42 samples/sec   Loss 9.4031   LearningRate 0.0721   Epoch: 3   Global Step: 15270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:44,643-Speed 5600.51 samples/sec   Loss 9.5399   LearningRate 0.0721   Epoch: 3   Global Step: 15280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:46,488-Speed 5553.26 samples/sec   Loss 9.3562   LearningRate 0.0721   Epoch: 3   Global Step: 15290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:48,319-Speed 5595.86 samples/sec   Loss 9.5488   LearningRate 0.0720   Epoch: 3   Global Step: 15300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:50,169-Speed 5538.14 samples/sec   Loss 9.5202   LearningRate 0.0720   Epoch: 3   Global Step: 15310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:51,996-Speed 5607.13 samples/sec   Loss 9.5305   LearningRate 0.0720   Epoch: 3   Global Step: 15320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:53,834-Speed 5576.39 samples/sec   Loss 9.4968   LearningRate 0.0720   Epoch: 3   Global Step: 15330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:55,652-Speed 5636.90 samples/sec   Loss 9.5614   LearningRate 0.0720   Epoch: 3   Global Step: 15340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:57,465-Speed 5649.17 samples/sec   Loss 9.4829   LearningRate 0.0720   Epoch: 3   Global Step: 15350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:15:59,337-Speed 5476.81 samples/sec   Loss 9.4750   LearningRate 0.0719   Epoch: 3   Global Step: 15360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:01,169-Speed 5590.94 samples/sec   Loss 9.4762   LearningRate 0.0719   Epoch: 3   Global Step: 15370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:03,040-Speed 5475.94 samples/sec   Loss 9.5156   LearningRate 0.0719   Epoch: 3   Global Step: 15380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:04,865-Speed 5614.49 samples/sec   Loss 9.5526   LearningRate 0.0719   Epoch: 3   Global Step: 15390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:06,731-Speed 5490.56 samples/sec   Loss 9.7076   LearningRate 0.0719   Epoch: 3   Global Step: 15400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:08,544-Speed 5653.12 samples/sec   Loss 9.4177   LearningRate 0.0719   Epoch: 3   Global Step: 15410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:10,358-Speed 5646.35 samples/sec   Loss 9.5048   LearningRate 0.0718   Epoch: 3   Global Step: 15420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:12,238-Speed 5449.08 samples/sec   Loss 9.4426   LearningRate 0.0718   Epoch: 3   Global Step: 15430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:14,063-Speed 5613.53 samples/sec   Loss 9.5377   LearningRate 0.0718   Epoch: 3   Global Step: 15440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:15,916-Speed 5529.94 samples/sec   Loss 9.6546   LearningRate 0.0718   Epoch: 3   Global Step: 15450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:17,815-Speed 5396.76 samples/sec   Loss 9.6172   LearningRate 0.0718   Epoch: 3   Global Step: 15460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:16:19,663-Speed 5543.17 samples/sec   Loss 9.6561   LearningRate 0.0718   Epoch: 3   Global Step: 15470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:16:21,474-Speed 5656.82 samples/sec   Loss 9.7446   LearningRate 0.0717   Epoch: 3   Global Step: 15480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:16:23,327-Speed 5528.15 samples/sec   Loss 9.6784   LearningRate 0.0717   Epoch: 3   Global Step: 15490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:25,144-Speed 5638.37 samples/sec   Loss 9.7405   LearningRate 0.0717   Epoch: 3   Global Step: 15500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:26,999-Speed 5524.26 samples/sec   Loss 9.7065   LearningRate 0.0717   Epoch: 3   Global Step: 15510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:28,841-Speed 5563.45 samples/sec   Loss 9.6947   LearningRate 0.0717   Epoch: 3   Global Step: 15520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:30,682-Speed 5566.12 samples/sec   Loss 9.6581   LearningRate 0.0717   Epoch: 3   Global Step: 15530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:32,488-Speed 5672.70 samples/sec   Loss 9.6695   LearningRate 0.0716   Epoch: 3   Global Step: 15540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:34,352-Speed 5497.24 samples/sec   Loss 9.7013   LearningRate 0.0716   Epoch: 3   Global Step: 15550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:36,182-Speed 5598.49 samples/sec   Loss 9.6707   LearningRate 0.0716   Epoch: 3   Global Step: 15560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:37,992-Speed 5660.75 samples/sec   Loss 9.6270   LearningRate 0.0716   Epoch: 3   Global Step: 15570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:39,831-Speed 5571.76 samples/sec   Loss 9.5192   LearningRate 0.0716   Epoch: 3   Global Step: 15580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:41,666-Speed 5583.03 samples/sec   Loss 9.6647   LearningRate 0.0716   Epoch: 3   Global Step: 15590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:16:43,501-Speed 5582.97 samples/sec   Loss 9.7626   LearningRate 0.0715   Epoch: 3   Global Step: 15600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:16:45,325-Speed 5618.46 samples/sec   Loss 9.8003   LearningRate 0.0715   Epoch: 3   Global Step: 15610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:16:47,173-Speed 5543.40 samples/sec   Loss 9.6622   LearningRate 0.0715   Epoch: 3   Global Step: 15620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:16:49,026-Speed 5529.38 samples/sec   Loss 9.6264   LearningRate 0.0715   Epoch: 3   Global Step: 15630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:16:50,846-Speed 5631.05 samples/sec   Loss 9.7569   LearningRate 0.0715   Epoch: 3   Global Step: 15640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:16:52,678-Speed 5596.66 samples/sec   Loss 9.8125   LearningRate 0.0715   Epoch: 3   Global Step: 15650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:54,523-Speed 5553.18 samples/sec   Loss 9.6842   LearningRate 0.0714   Epoch: 3   Global Step: 15660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:56,335-Speed 5653.45 samples/sec   Loss 9.9041   LearningRate 0.0714   Epoch: 3   Global Step: 15670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:16:58,205-Speed 5478.80 samples/sec   Loss 9.7151   LearningRate 0.0714   Epoch: 3   Global Step: 15680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:17:00,037-Speed 5592.53 samples/sec   Loss 9.6325   LearningRate 0.0714   Epoch: 3   Global Step: 15690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:17:01,876-Speed 5570.94 samples/sec   Loss 9.5297   LearningRate 0.0714   Epoch: 3   Global Step: 15700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:17:03,694-Speed 5636.40 samples/sec   Loss 9.7934   LearningRate 0.0714   Epoch: 3   Global Step: 15710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:17:05,552-Speed 5517.16 samples/sec   Loss 9.9596   LearningRate 0.0713   Epoch: 3   Global Step: 15720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:17:07,417-Speed 5492.03 samples/sec   Loss 9.7960   LearningRate 0.0713   Epoch: 3   Global Step: 15730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:17:09,246-Speed 5601.55 samples/sec   Loss 9.7476   LearningRate 0.0713   Epoch: 3   Global Step: 15740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:17:11,066-Speed 5629.64 samples/sec   Loss 9.7464   LearningRate 0.0713   Epoch: 3   Global Step: 15750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:12,951-Speed 5436.13 samples/sec   Loss 9.8830   LearningRate 0.0713   Epoch: 3   Global Step: 15760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:14,773-Speed 5625.09 samples/sec   Loss 9.6342   LearningRate 0.0713   Epoch: 3   Global Step: 15770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:16,627-Speed 5527.72 samples/sec   Loss 9.8571   LearningRate 0.0712   Epoch: 3   Global Step: 15780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:18,443-Speed 5643.52 samples/sec   Loss 9.8766   LearningRate 0.0712   Epoch: 3   Global Step: 15790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:20,266-Speed 5617.05 samples/sec   Loss 9.8247   LearningRate 0.0712   Epoch: 3   Global Step: 15800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:22,095-Speed 5605.60 samples/sec   Loss 9.8807   LearningRate 0.0712   Epoch: 3   Global Step: 15810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:23,929-Speed 5586.66 samples/sec   Loss 9.8184   LearningRate 0.0712   Epoch: 3   Global Step: 15820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:25,789-Speed 5506.78 samples/sec   Loss 9.8078   LearningRate 0.0712   Epoch: 3   Global Step: 15830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:27,635-Speed 5549.56 samples/sec   Loss 9.7401   LearningRate 0.0711   Epoch: 3   Global Step: 15840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:29,464-Speed 5604.12 samples/sec   Loss 9.6050   LearningRate 0.0711   Epoch: 3   Global Step: 15850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:31,292-Speed 5601.87 samples/sec   Loss 9.8767   LearningRate 0.0711   Epoch: 3   Global Step: 15860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:33,121-Speed 5604.78 samples/sec   Loss 9.8067   LearningRate 0.0711   Epoch: 3   Global Step: 15870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:34,971-Speed 5536.51 samples/sec   Loss 9.9310   LearningRate 0.0711   Epoch: 3   Global Step: 15880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:36,796-Speed 5615.54 samples/sec   Loss 9.8389   LearningRate 0.0711   Epoch: 3   Global Step: 15890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:38,631-Speed 5582.79 samples/sec   Loss 9.7977   LearningRate 0.0710   Epoch: 3   Global Step: 15900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:40,441-Speed 5660.83 samples/sec   Loss 9.9632   LearningRate 0.0710   Epoch: 3   Global Step: 15910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:42,309-Speed 5483.89 samples/sec   Loss 9.7426   LearningRate 0.0710   Epoch: 3   Global Step: 15920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:44,126-Speed 5639.68 samples/sec   Loss 9.8097   LearningRate 0.0710   Epoch: 3   Global Step: 15930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:45,992-Speed 5491.69 samples/sec   Loss 10.0355   LearningRate 0.0710   Epoch: 3   Global Step: 15940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:47,805-Speed 5651.13 samples/sec   Loss 9.9540   LearningRate 0.0710   Epoch: 3   Global Step: 15950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:49,675-Speed 5477.21 samples/sec   Loss 9.7964   LearningRate 0.0709   Epoch: 3   Global Step: 15960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:51,498-Speed 5622.58 samples/sec   Loss 9.9411   LearningRate 0.0709   Epoch: 3   Global Step: 15970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:53,343-Speed 5551.60 samples/sec   Loss 9.9845   LearningRate 0.0709   Epoch: 3   Global Step: 15980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:55,176-Speed 5590.40 samples/sec   Loss 9.9387   LearningRate 0.0709   Epoch: 3   Global Step: 15990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:17:57,059-Speed 5441.97 samples/sec   Loss 9.8023   LearningRate 0.0709   Epoch: 3   Global Step: 16000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:18:24,249-[lfw][16000]XNorm: 21.707389
Training: 2022-04-11 11:18:24,250-[lfw][16000]Accuracy-Flip: 0.99583+-0.00227
Training: 2022-04-11 11:18:24,251-[lfw][16000]Accuracy-Highest: 0.99667
Training: 2022-04-11 11:18:55,973-[cfp_fp][16000]XNorm: 18.765134
Training: 2022-04-11 11:18:55,974-[cfp_fp][16000]Accuracy-Flip: 0.94543+-0.01141
Training: 2022-04-11 11:18:55,975-[cfp_fp][16000]Accuracy-Highest: 0.94543
Training: 2022-04-11 11:19:23,138-[agedb_30][16000]XNorm: 21.307326
Training: 2022-04-11 11:19:23,138-[agedb_30][16000]Accuracy-Flip: 0.96350+-0.01031
Training: 2022-04-11 11:19:23,139-[agedb_30][16000]Accuracy-Highest: 0.96417
Training: 2022-04-11 11:19:24,958-Speed 116.50 samples/sec   Loss 9.8535   LearningRate 0.0709   Epoch: 3   Global Step: 16010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:19:26,815-Speed 5514.75 samples/sec   Loss 9.8577   LearningRate 0.0708   Epoch: 3   Global Step: 16020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:19:28,669-Speed 5526.77 samples/sec   Loss 10.0185   LearningRate 0.0708   Epoch: 3   Global Step: 16030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:19:30,489-Speed 5631.52 samples/sec   Loss 9.7050   LearningRate 0.0708   Epoch: 3   Global Step: 16040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:19:32,287-Speed 5699.25 samples/sec   Loss 9.8121   LearningRate 0.0708   Epoch: 3   Global Step: 16050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:19:34,157-Speed 5477.88 samples/sec   Loss 9.8915   LearningRate 0.0708   Epoch: 3   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:19:35,976-Speed 5631.76 samples/sec   Loss 9.8870   LearningRate 0.0708   Epoch: 3   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:19:37,795-Speed 5631.57 samples/sec   Loss 9.6810   LearningRate 0.0707   Epoch: 3   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:19:39,615-Speed 5630.49 samples/sec   Loss 9.7815   LearningRate 0.0707   Epoch: 3   Global Step: 16090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:19:41,501-Speed 5431.97 samples/sec   Loss 9.8909   LearningRate 0.0707   Epoch: 3   Global Step: 16100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:19:43,318-Speed 5637.11 samples/sec   Loss 9.9432   LearningRate 0.0707   Epoch: 3   Global Step: 16110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:19:45,120-Speed 5689.15 samples/sec   Loss 9.9369   LearningRate 0.0707   Epoch: 3   Global Step: 16120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:19:46,961-Speed 5564.81 samples/sec   Loss 9.9648   LearningRate 0.0707   Epoch: 3   Global Step: 16130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:19:48,777-Speed 5640.12 samples/sec   Loss 9.7658   LearningRate 0.0706   Epoch: 3   Global Step: 16140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:19:50,637-Speed 5508.49 samples/sec   Loss 9.9512   LearningRate 0.0706   Epoch: 3   Global Step: 16150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:19:52,479-Speed 5563.09 samples/sec   Loss 9.9464   LearningRate 0.0706   Epoch: 3   Global Step: 16160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:19:54,319-Speed 5567.01 samples/sec   Loss 9.7215   LearningRate 0.0706   Epoch: 3   Global Step: 16170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:19:56,150-Speed 5595.83 samples/sec   Loss 9.7173   LearningRate 0.0706   Epoch: 3   Global Step: 16180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:19:57,983-Speed 5590.45 samples/sec   Loss 9.9226   LearningRate 0.0706   Epoch: 3   Global Step: 16190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:19:59,850-Speed 5487.66 samples/sec   Loss 9.8222   LearningRate 0.0705   Epoch: 3   Global Step: 16200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:20:01,712-Speed 5501.72 samples/sec   Loss 9.8331   LearningRate 0.0705   Epoch: 3   Global Step: 16210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:20:03,559-Speed 5547.41 samples/sec   Loss 9.9176   LearningRate 0.0705   Epoch: 3   Global Step: 16220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:20:05,421-Speed 5503.28 samples/sec   Loss 9.8154   LearningRate 0.0705   Epoch: 3   Global Step: 16230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:20:07,229-Speed 5665.21 samples/sec   Loss 9.8936   LearningRate 0.0705   Epoch: 3   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:20:09,037-Speed 5665.44 samples/sec   Loss 9.9894   LearningRate 0.0705   Epoch: 3   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:20:10,864-Speed 5610.09 samples/sec   Loss 9.8433   LearningRate 0.0704   Epoch: 3   Global Step: 16260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:12,687-Speed 5616.95 samples/sec   Loss 9.9429   LearningRate 0.0704   Epoch: 3   Global Step: 16270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:14,546-Speed 5513.71 samples/sec   Loss 9.8351   LearningRate 0.0704   Epoch: 3   Global Step: 16280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:16,357-Speed 5656.26 samples/sec   Loss 9.8261   LearningRate 0.0704   Epoch: 3   Global Step: 16290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:18,183-Speed 5609.75 samples/sec   Loss 9.8026   LearningRate 0.0704   Epoch: 3   Global Step: 16300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:20,007-Speed 5617.97 samples/sec   Loss 9.8540   LearningRate 0.0704   Epoch: 3   Global Step: 16310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:21,845-Speed 5577.91 samples/sec   Loss 9.7294   LearningRate 0.0703   Epoch: 3   Global Step: 16320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:23,658-Speed 5651.62 samples/sec   Loss 9.9765   LearningRate 0.0703   Epoch: 3   Global Step: 16330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:25,533-Speed 5465.30 samples/sec   Loss 9.7949   LearningRate 0.0703   Epoch: 3   Global Step: 16340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:27,374-Speed 5563.81 samples/sec   Loss 9.9667   LearningRate 0.0703   Epoch: 3   Global Step: 16350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:29,186-Speed 5656.65 samples/sec   Loss 9.9314   LearningRate 0.0703   Epoch: 3   Global Step: 16360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:31,056-Speed 5477.24 samples/sec   Loss 10.0419   LearningRate 0.0703   Epoch: 3   Global Step: 16370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:32,873-Speed 5639.93 samples/sec   Loss 9.6212   LearningRate 0.0702   Epoch: 3   Global Step: 16380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:34,714-Speed 5566.93 samples/sec   Loss 9.8229   LearningRate 0.0702   Epoch: 3   Global Step: 16390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:36,563-Speed 5540.08 samples/sec   Loss 9.6831   LearningRate 0.0702   Epoch: 3   Global Step: 16400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:38,376-Speed 5650.61 samples/sec   Loss 9.7775   LearningRate 0.0702   Epoch: 3   Global Step: 16410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:40,232-Speed 5518.99 samples/sec   Loss 9.8586   LearningRate 0.0702   Epoch: 3   Global Step: 16420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:42,053-Speed 5626.37 samples/sec   Loss 9.9701   LearningRate 0.0702   Epoch: 3   Global Step: 16430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:43,867-Speed 5648.86 samples/sec   Loss 9.8570   LearningRate 0.0701   Epoch: 3   Global Step: 16440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:45,683-Speed 5641.37 samples/sec   Loss 9.7352   LearningRate 0.0701   Epoch: 3   Global Step: 16450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:47,511-Speed 5606.83 samples/sec   Loss 9.8402   LearningRate 0.0701   Epoch: 3   Global Step: 16460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:20:49,315-Speed 5677.13 samples/sec   Loss 9.8453   LearningRate 0.0701   Epoch: 3   Global Step: 16470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:20:51,118-Speed 5681.57 samples/sec   Loss 9.9320   LearningRate 0.0701   Epoch: 3   Global Step: 16480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:20:52,935-Speed 5638.27 samples/sec   Loss 9.7371   LearningRate 0.0701   Epoch: 3   Global Step: 16490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:20:54,776-Speed 5564.71 samples/sec   Loss 9.8806   LearningRate 0.0700   Epoch: 3   Global Step: 16500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:56,578-Speed 5684.92 samples/sec   Loss 9.9198   LearningRate 0.0700   Epoch: 3   Global Step: 16510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:20:58,387-Speed 5664.62 samples/sec   Loss 9.8522   LearningRate 0.0700   Epoch: 3   Global Step: 16520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:00,199-Speed 5654.38 samples/sec   Loss 9.8525   LearningRate 0.0700   Epoch: 3   Global Step: 16530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:02,002-Speed 5680.21 samples/sec   Loss 9.7355   LearningRate 0.0700   Epoch: 3   Global Step: 16540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:03,842-Speed 5568.67 samples/sec   Loss 10.0875   LearningRate 0.0700   Epoch: 3   Global Step: 16550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:05,694-Speed 5532.25 samples/sec   Loss 10.0250   LearningRate 0.0699   Epoch: 3   Global Step: 16560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:07,545-Speed 5533.27 samples/sec   Loss 9.8762   LearningRate 0.0699   Epoch: 3   Global Step: 16570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:09,352-Speed 5669.86 samples/sec   Loss 9.6994   LearningRate 0.0699   Epoch: 3   Global Step: 16580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:11,186-Speed 5585.87 samples/sec   Loss 9.7660   LearningRate 0.0699   Epoch: 3   Global Step: 16590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:12,997-Speed 5659.38 samples/sec   Loss 9.8195   LearningRate 0.0699   Epoch: 3   Global Step: 16600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:14,854-Speed 5517.14 samples/sec   Loss 10.0186   LearningRate 0.0699   Epoch: 3   Global Step: 16610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:16,686-Speed 5591.97 samples/sec   Loss 9.8305   LearningRate 0.0698   Epoch: 3   Global Step: 16620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:18,565-Speed 5450.81 samples/sec   Loss 9.8191   LearningRate 0.0698   Epoch: 3   Global Step: 16630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:20,367-Speed 5686.38 samples/sec   Loss 9.8928   LearningRate 0.0698   Epoch: 3   Global Step: 16640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:22,208-Speed 5569.69 samples/sec   Loss 9.6248   LearningRate 0.0698   Epoch: 3   Global Step: 16650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:24,069-Speed 5506.83 samples/sec   Loss 9.8706   LearningRate 0.0698   Epoch: 3   Global Step: 16660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:25,876-Speed 5669.70 samples/sec   Loss 9.7623   LearningRate 0.0698   Epoch: 3   Global Step: 16670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:27,715-Speed 5570.01 samples/sec   Loss 9.9027   LearningRate 0.0697   Epoch: 3   Global Step: 16680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:29,523-Speed 5668.04 samples/sec   Loss 10.1129   LearningRate 0.0697   Epoch: 3   Global Step: 16690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:31,403-Speed 5447.27 samples/sec   Loss 9.8319   LearningRate 0.0697   Epoch: 3   Global Step: 16700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:33,218-Speed 5645.05 samples/sec   Loss 9.8599   LearningRate 0.0697   Epoch: 3   Global Step: 16710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:35,059-Speed 5567.51 samples/sec   Loss 9.8619   LearningRate 0.0697   Epoch: 3   Global Step: 16720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:36,878-Speed 5632.62 samples/sec   Loss 9.8377   LearningRate 0.0697   Epoch: 3   Global Step: 16730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:38,790-Speed 5359.73 samples/sec   Loss 10.0265   LearningRate 0.0696   Epoch: 3   Global Step: 16740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:40,599-Speed 5662.36 samples/sec   Loss 9.8376   LearningRate 0.0696   Epoch: 3   Global Step: 16750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:42,426-Speed 5606.99 samples/sec   Loss 9.7084   LearningRate 0.0696   Epoch: 3   Global Step: 16760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:21:44,285-Speed 5510.68 samples/sec   Loss 9.7798   LearningRate 0.0696   Epoch: 3   Global Step: 16770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:46,104-Speed 5631.66 samples/sec   Loss 9.8269   LearningRate 0.0696   Epoch: 3   Global Step: 16780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:47,941-Speed 5579.69 samples/sec   Loss 9.8415   LearningRate 0.0696   Epoch: 3   Global Step: 16790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:49,817-Speed 5460.75 samples/sec   Loss 9.8894   LearningRate 0.0695   Epoch: 3   Global Step: 16800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:51,646-Speed 5605.36 samples/sec   Loss 9.6877   LearningRate 0.0695   Epoch: 3   Global Step: 16810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:53,473-Speed 5608.15 samples/sec   Loss 9.7754   LearningRate 0.0695   Epoch: 3   Global Step: 16820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:55,292-Speed 5631.32 samples/sec   Loss 9.8902   LearningRate 0.0695   Epoch: 3   Global Step: 16830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:57,122-Speed 5599.14 samples/sec   Loss 9.8474   LearningRate 0.0695   Epoch: 3   Global Step: 16840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:21:58,983-Speed 5509.66 samples/sec   Loss 9.9744   LearningRate 0.0695   Epoch: 3   Global Step: 16850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:00,798-Speed 5643.55 samples/sec   Loss 9.8971   LearningRate 0.0694   Epoch: 3   Global Step: 16860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:02,651-Speed 5528.95 samples/sec   Loss 9.7240   LearningRate 0.0694   Epoch: 3   Global Step: 16870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:04,452-Speed 5690.15 samples/sec   Loss 9.8572   LearningRate 0.0694   Epoch: 3   Global Step: 16880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:06,295-Speed 5560.76 samples/sec   Loss 9.7008   LearningRate 0.0694   Epoch: 3   Global Step: 16890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:08,111-Speed 5638.99 samples/sec   Loss 9.7231   LearningRate 0.0694   Epoch: 3   Global Step: 16900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:09,939-Speed 5606.87 samples/sec   Loss 9.7146   LearningRate 0.0694   Epoch: 3   Global Step: 16910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:11,833-Speed 5412.12 samples/sec   Loss 10.0129   LearningRate 0.0693   Epoch: 3   Global Step: 16920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:13,651-Speed 5635.55 samples/sec   Loss 9.8516   LearningRate 0.0693   Epoch: 3   Global Step: 16930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:15,484-Speed 5591.36 samples/sec   Loss 9.9879   LearningRate 0.0693   Epoch: 3   Global Step: 16940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:17,407-Speed 5325.95 samples/sec   Loss 9.8099   LearningRate 0.0693   Epoch: 3   Global Step: 16950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:19,237-Speed 5599.69 samples/sec   Loss 9.9537   LearningRate 0.0693   Epoch: 3   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:21,051-Speed 5651.16 samples/sec   Loss 9.7895   LearningRate 0.0693   Epoch: 3   Global Step: 16970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:22,889-Speed 5573.54 samples/sec   Loss 9.7996   LearningRate 0.0692   Epoch: 3   Global Step: 16980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:24,735-Speed 5552.63 samples/sec   Loss 9.7444   LearningRate 0.0692   Epoch: 3   Global Step: 16990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:26,551-Speed 5643.07 samples/sec   Loss 9.6555   LearningRate 0.0692   Epoch: 3   Global Step: 17000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:28,383-Speed 5591.34 samples/sec   Loss 9.8103   LearningRate 0.0692   Epoch: 3   Global Step: 17010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:30,216-Speed 5591.02 samples/sec   Loss 9.8546   LearningRate 0.0692   Epoch: 3   Global Step: 17020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:32,035-Speed 5632.19 samples/sec   Loss 9.8823   LearningRate 0.0692   Epoch: 3   Global Step: 17030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:33,895-Speed 5508.01 samples/sec   Loss 9.7795   LearningRate 0.0691   Epoch: 3   Global Step: 17040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:35,780-Speed 5435.51 samples/sec   Loss 9.7552   LearningRate 0.0691   Epoch: 3   Global Step: 17050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:37,625-Speed 5552.57 samples/sec   Loss 9.8518   LearningRate 0.0691   Epoch: 3   Global Step: 17060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:39,517-Speed 5418.00 samples/sec   Loss 9.8976   LearningRate 0.0691   Epoch: 3   Global Step: 17070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:41,448-Speed 5554.84 samples/sec   Loss 10.0565   LearningRate 0.0691   Epoch: 3   Global Step: 17080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:22:43,259-Speed 5658.97 samples/sec   Loss 9.7644   LearningRate 0.0691   Epoch: 3   Global Step: 17090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:45,090-Speed 5595.29 samples/sec   Loss 9.8452   LearningRate 0.0690   Epoch: 3   Global Step: 17100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:46,935-Speed 5553.94 samples/sec   Loss 9.6633   LearningRate 0.0690   Epoch: 3   Global Step: 17110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:48,759-Speed 5617.02 samples/sec   Loss 9.9862   LearningRate 0.0690   Epoch: 3   Global Step: 17120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:50,584-Speed 5613.50 samples/sec   Loss 9.8234   LearningRate 0.0690   Epoch: 3   Global Step: 17130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:52,437-Speed 5560.33 samples/sec   Loss 9.8635   LearningRate 0.0690   Epoch: 3   Global Step: 17140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:54,275-Speed 5573.00 samples/sec   Loss 9.9201   LearningRate 0.0690   Epoch: 3   Global Step: 17150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:56,134-Speed 5513.37 samples/sec   Loss 9.8822   LearningRate 0.0690   Epoch: 3   Global Step: 17160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:57,958-Speed 5616.21 samples/sec   Loss 9.8107   LearningRate 0.0689   Epoch: 3   Global Step: 17170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:22:59,818-Speed 5508.53 samples/sec   Loss 9.8129   LearningRate 0.0689   Epoch: 3   Global Step: 17180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:01,628-Speed 5660.12 samples/sec   Loss 9.7229   LearningRate 0.0689   Epoch: 3   Global Step: 17190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:03,477-Speed 5540.82 samples/sec   Loss 9.7290   LearningRate 0.0689   Epoch: 3   Global Step: 17200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:05,347-Speed 5480.20 samples/sec   Loss 9.8794   LearningRate 0.0689   Epoch: 3   Global Step: 17210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:07,177-Speed 5597.64 samples/sec   Loss 9.7873   LearningRate 0.0689   Epoch: 3   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:09,008-Speed 5596.02 samples/sec   Loss 9.7497   LearningRate 0.0688   Epoch: 3   Global Step: 17230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:10,836-Speed 5603.41 samples/sec   Loss 9.7374   LearningRate 0.0688   Epoch: 3   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:12,738-Speed 5386.35 samples/sec   Loss 9.8960   LearningRate 0.0688   Epoch: 3   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:14,578-Speed 5581.02 samples/sec   Loss 9.7455   LearningRate 0.0688   Epoch: 3   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:16,410-Speed 5591.94 samples/sec   Loss 9.8371   LearningRate 0.0688   Epoch: 3   Global Step: 17270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:18,277-Speed 5488.09 samples/sec   Loss 9.8210   LearningRate 0.0688   Epoch: 3   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:20,085-Speed 5665.55 samples/sec   Loss 9.8478   LearningRate 0.0687   Epoch: 3   Global Step: 17290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:21,937-Speed 5555.50 samples/sec   Loss 9.6400   LearningRate 0.0687   Epoch: 3   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:23,748-Speed 5657.00 samples/sec   Loss 9.6963   LearningRate 0.0687   Epoch: 3   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:25,611-Speed 5499.20 samples/sec   Loss 9.7106   LearningRate 0.0687   Epoch: 3   Global Step: 17320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:27,421-Speed 5659.73 samples/sec   Loss 9.7105   LearningRate 0.0687   Epoch: 3   Global Step: 17330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:23:29,239-Speed 5636.97 samples/sec   Loss 9.6856   LearningRate 0.0687   Epoch: 3   Global Step: 17340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:23:31,079-Speed 5567.11 samples/sec   Loss 9.8966   LearningRate 0.0686   Epoch: 3   Global Step: 17350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:23:32,940-Speed 5504.99 samples/sec   Loss 9.6918   LearningRate 0.0686   Epoch: 3   Global Step: 17360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:23:34,806-Speed 5500.99 samples/sec   Loss 9.7422   LearningRate 0.0686   Epoch: 3   Global Step: 17370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:23:36,614-Speed 5664.59 samples/sec   Loss 9.8557   LearningRate 0.0686   Epoch: 3   Global Step: 17380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:23:38,458-Speed 5557.69 samples/sec   Loss 9.7064   LearningRate 0.0686   Epoch: 3   Global Step: 17390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:23:40,275-Speed 5635.83 samples/sec   Loss 9.8195   LearningRate 0.0686   Epoch: 3   Global Step: 17400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:23:42,135-Speed 5510.58 samples/sec   Loss 9.7915   LearningRate 0.0685   Epoch: 3   Global Step: 17410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:23:43,957-Speed 5623.39 samples/sec   Loss 9.7521   LearningRate 0.0685   Epoch: 3   Global Step: 17420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:23:45,780-Speed 5652.27 samples/sec   Loss 9.7854   LearningRate 0.0685   Epoch: 3   Global Step: 17430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:23:47,590-Speed 5659.77 samples/sec   Loss 9.8044   LearningRate 0.0685   Epoch: 3   Global Step: 17440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:23:49,447-Speed 5515.96 samples/sec   Loss 9.9810   LearningRate 0.0685   Epoch: 3   Global Step: 17450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:23:51,280-Speed 5590.54 samples/sec   Loss 9.7376   LearningRate 0.0685   Epoch: 3   Global Step: 17460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:23:53,124-Speed 5557.59 samples/sec   Loss 9.7496   LearningRate 0.0684   Epoch: 3   Global Step: 17470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:23:54,938-Speed 5648.03 samples/sec   Loss 9.8548   LearningRate 0.0684   Epoch: 3   Global Step: 17480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:23:56,808-Speed 5478.04 samples/sec   Loss 9.8909   LearningRate 0.0684   Epoch: 3   Global Step: 17490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:23:58,622-Speed 5649.34 samples/sec   Loss 9.7727   LearningRate 0.0684   Epoch: 3   Global Step: 17500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:00,440-Speed 5632.74 samples/sec   Loss 9.8800   LearningRate 0.0684   Epoch: 3   Global Step: 17510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:02,321-Speed 5448.25 samples/sec   Loss 9.5394   LearningRate 0.0684   Epoch: 3   Global Step: 17520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:04,143-Speed 5621.47 samples/sec   Loss 9.9100   LearningRate 0.0683   Epoch: 3   Global Step: 17530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:05,993-Speed 5599.85 samples/sec   Loss 9.6983   LearningRate 0.0683   Epoch: 3   Global Step: 17540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:07,834-Speed 5564.84 samples/sec   Loss 9.9008   LearningRate 0.0683   Epoch: 3   Global Step: 17550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:09,640-Speed 5675.63 samples/sec   Loss 9.8273   LearningRate 0.0683   Epoch: 3   Global Step: 17560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:11,467-Speed 5605.01 samples/sec   Loss 9.6758   LearningRate 0.0683   Epoch: 3   Global Step: 17570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:13,272-Speed 5675.24 samples/sec   Loss 9.6769   LearningRate 0.0683   Epoch: 3   Global Step: 17580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:15,134-Speed 5549.39 samples/sec   Loss 9.7803   LearningRate 0.0682   Epoch: 3   Global Step: 17590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:17,030-Speed 5406.07 samples/sec   Loss 9.5972   LearningRate 0.0682   Epoch: 3   Global Step: 17600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:18,862-Speed 5591.61 samples/sec   Loss 9.6290   LearningRate 0.0682   Epoch: 3   Global Step: 17610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:20,676-Speed 5648.11 samples/sec   Loss 9.6090   LearningRate 0.0682   Epoch: 3   Global Step: 17620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:22,520-Speed 5561.32 samples/sec   Loss 9.5346   LearningRate 0.0682   Epoch: 3   Global Step: 17630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:24,338-Speed 5634.42 samples/sec   Loss 9.7274   LearningRate 0.0682   Epoch: 3   Global Step: 17640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:26,166-Speed 5602.41 samples/sec   Loss 9.7458   LearningRate 0.0681   Epoch: 3   Global Step: 17650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:27,997-Speed 5598.26 samples/sec   Loss 9.6961   LearningRate 0.0681   Epoch: 3   Global Step: 17660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:29,859-Speed 5501.62 samples/sec   Loss 9.7564   LearningRate 0.0681   Epoch: 3   Global Step: 17670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:31,666-Speed 5667.51 samples/sec   Loss 9.7235   LearningRate 0.0681   Epoch: 3   Global Step: 17680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:33,534-Speed 5486.89 samples/sec   Loss 9.6522   LearningRate 0.0681   Epoch: 3   Global Step: 17690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:35,363-Speed 5601.40 samples/sec   Loss 9.6061   LearningRate 0.0681   Epoch: 3   Global Step: 17700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:37,245-Speed 5443.43 samples/sec   Loss 9.6914   LearningRate 0.0681   Epoch: 3   Global Step: 17710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:39,113-Speed 5484.92 samples/sec   Loss 9.7055   LearningRate 0.0680   Epoch: 3   Global Step: 17720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:40,972-Speed 5511.51 samples/sec   Loss 9.6264   LearningRate 0.0680   Epoch: 3   Global Step: 17730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:42,792-Speed 5627.60 samples/sec   Loss 9.7498   LearningRate 0.0680   Epoch: 3   Global Step: 17740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:44,654-Speed 5621.16 samples/sec   Loss 9.6676   LearningRate 0.0680   Epoch: 3   Global Step: 17750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:46,475-Speed 5627.28 samples/sec   Loss 9.6469   LearningRate 0.0680   Epoch: 3   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:48,284-Speed 5664.27 samples/sec   Loss 9.6412   LearningRate 0.0680   Epoch: 3   Global Step: 17770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:50,100-Speed 5639.63 samples/sec   Loss 9.5711   LearningRate 0.0679   Epoch: 3   Global Step: 17780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:51,927-Speed 5645.57 samples/sec   Loss 9.6351   LearningRate 0.0679   Epoch: 3   Global Step: 17790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:24:53,759-Speed 5592.00 samples/sec   Loss 9.7247   LearningRate 0.0679   Epoch: 3   Global Step: 17800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:55,595-Speed 5579.11 samples/sec   Loss 9.7093   LearningRate 0.0679   Epoch: 3   Global Step: 17810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:57,402-Speed 5670.39 samples/sec   Loss 9.7825   LearningRate 0.0679   Epoch: 3   Global Step: 17820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:24:59,271-Speed 5484.39 samples/sec   Loss 9.8229   LearningRate 0.0679   Epoch: 3   Global Step: 17830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:25:01,116-Speed 5553.13 samples/sec   Loss 9.7594   LearningRate 0.0678   Epoch: 3   Global Step: 17840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:25:02,985-Speed 5482.86 samples/sec   Loss 9.7354   LearningRate 0.0678   Epoch: 3   Global Step: 17850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:25:04,823-Speed 5635.87 samples/sec   Loss 9.6907   LearningRate 0.0678   Epoch: 3   Global Step: 17860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:25:06,665-Speed 5561.59 samples/sec   Loss 9.6811   LearningRate 0.0678   Epoch: 3   Global Step: 17870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:25:08,484-Speed 5631.61 samples/sec   Loss 9.8286   LearningRate 0.0678   Epoch: 3   Global Step: 17880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:25:10,338-Speed 5525.83 samples/sec   Loss 9.7515   LearningRate 0.0678   Epoch: 3   Global Step: 17890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:25:12,162-Speed 5617.49 samples/sec   Loss 9.7404   LearningRate 0.0677   Epoch: 3   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:14,032-Speed 5480.54 samples/sec   Loss 9.7273   LearningRate 0.0677   Epoch: 3   Global Step: 17910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:15,963-Speed 5433.53 samples/sec   Loss 9.5745   LearningRate 0.0677   Epoch: 3   Global Step: 17920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:17,793-Speed 5599.15 samples/sec   Loss 9.7354   LearningRate 0.0677   Epoch: 3   Global Step: 17930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:19,674-Speed 5447.33 samples/sec   Loss 9.7046   LearningRate 0.0677   Epoch: 3   Global Step: 17940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:21,483-Speed 5662.68 samples/sec   Loss 9.7533   LearningRate 0.0677   Epoch: 3   Global Step: 17950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:23,338-Speed 5666.48 samples/sec   Loss 9.7818   LearningRate 0.0676   Epoch: 3   Global Step: 17960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:25,164-Speed 5613.19 samples/sec   Loss 9.7492   LearningRate 0.0676   Epoch: 3   Global Step: 17970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:27,016-Speed 5529.38 samples/sec   Loss 9.6849   LearningRate 0.0676   Epoch: 3   Global Step: 17980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:28,828-Speed 5656.39 samples/sec   Loss 9.7916   LearningRate 0.0676   Epoch: 3   Global Step: 17990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:30,639-Speed 5656.55 samples/sec   Loss 9.8895   LearningRate 0.0676   Epoch: 3   Global Step: 18000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:25:58,172-[lfw][18000]XNorm: 21.568299
Training: 2022-04-11 11:25:58,175-[lfw][18000]Accuracy-Flip: 0.99467+-0.00323
Training: 2022-04-11 11:25:58,175-[lfw][18000]Accuracy-Highest: 0.99667
Training: 2022-04-11 11:26:30,004-[cfp_fp][18000]XNorm: 18.345091
Training: 2022-04-11 11:26:30,005-[cfp_fp][18000]Accuracy-Flip: 0.94057+-0.01056
Training: 2022-04-11 11:26:30,005-[cfp_fp][18000]Accuracy-Highest: 0.94543
Training: 2022-04-11 11:26:57,171-[agedb_30][18000]XNorm: 21.054906
Training: 2022-04-11 11:26:57,177-[agedb_30][18000]Accuracy-Flip: 0.96467+-0.00823
Training: 2022-04-11 11:26:57,178-[agedb_30][18000]Accuracy-Highest: 0.96467
Training: 2022-04-11 11:26:59,008-Speed 115.88 samples/sec   Loss 9.6451   LearningRate 0.0676   Epoch: 3   Global Step: 18010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:00,862-Speed 5524.32 samples/sec   Loss 9.6848   LearningRate 0.0675   Epoch: 3   Global Step: 18020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:02,699-Speed 5576.92 samples/sec   Loss 9.6439   LearningRate 0.0675   Epoch: 3   Global Step: 18030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:04,551-Speed 5546.76 samples/sec   Loss 9.4978   LearningRate 0.0675   Epoch: 3   Global Step: 18040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:06,387-Speed 5578.67 samples/sec   Loss 9.8556   LearningRate 0.0675   Epoch: 3   Global Step: 18050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:08,213-Speed 5610.30 samples/sec   Loss 9.6893   LearningRate 0.0675   Epoch: 3   Global Step: 18060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:10,062-Speed 5541.10 samples/sec   Loss 9.7069   LearningRate 0.0675   Epoch: 3   Global Step: 18070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:11,873-Speed 5659.20 samples/sec   Loss 9.7192   LearningRate 0.0674   Epoch: 3   Global Step: 18080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:13,743-Speed 5479.18 samples/sec   Loss 9.8718   LearningRate 0.0674   Epoch: 3   Global Step: 18090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:15,567-Speed 5616.85 samples/sec   Loss 9.6326   LearningRate 0.0674   Epoch: 3   Global Step: 18100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:17,427-Speed 5509.17 samples/sec   Loss 9.5998   LearningRate 0.0674   Epoch: 3   Global Step: 18110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:19,272-Speed 5550.76 samples/sec   Loss 9.7258   LearningRate 0.0674   Epoch: 3   Global Step: 18120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:21,084-Speed 5656.12 samples/sec   Loss 9.6384   LearningRate 0.0674   Epoch: 3   Global Step: 18130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:22,961-Speed 5458.57 samples/sec   Loss 9.7463   LearningRate 0.0674   Epoch: 3   Global Step: 18140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:24,867-Speed 5545.78 samples/sec   Loss 9.6689   LearningRate 0.0673   Epoch: 3   Global Step: 18150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:26,726-Speed 5511.81 samples/sec   Loss 9.7866   LearningRate 0.0673   Epoch: 3   Global Step: 18160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:28,550-Speed 5614.76 samples/sec   Loss 9.6107   LearningRate 0.0673   Epoch: 3   Global Step: 18170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:30,439-Speed 5422.28 samples/sec   Loss 9.6911   LearningRate 0.0673   Epoch: 3   Global Step: 18180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:32,269-Speed 5598.88 samples/sec   Loss 9.6558   LearningRate 0.0673   Epoch: 3   Global Step: 18190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:34,075-Speed 5674.13 samples/sec   Loss 9.7653   LearningRate 0.0673   Epoch: 3   Global Step: 18200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:35,888-Speed 5652.03 samples/sec   Loss 9.7052   LearningRate 0.0672   Epoch: 3   Global Step: 18210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:37,727-Speed 5571.38 samples/sec   Loss 9.8009   LearningRate 0.0672   Epoch: 3   Global Step: 18220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:39,533-Speed 5672.51 samples/sec   Loss 9.7974   LearningRate 0.0672   Epoch: 3   Global Step: 18230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:41,382-Speed 5586.89 samples/sec   Loss 9.6860   LearningRate 0.0672   Epoch: 3   Global Step: 18240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:43,254-Speed 5473.47 samples/sec   Loss 9.4467   LearningRate 0.0672   Epoch: 3   Global Step: 18250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:45,065-Speed 5657.05 samples/sec   Loss 9.6698   LearningRate 0.0672   Epoch: 3   Global Step: 18260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:46,916-Speed 5535.37 samples/sec   Loss 9.5147   LearningRate 0.0671   Epoch: 3   Global Step: 18270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:48,752-Speed 5578.78 samples/sec   Loss 9.5890   LearningRate 0.0671   Epoch: 3   Global Step: 18280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:50,584-Speed 5591.98 samples/sec   Loss 9.4861   LearningRate 0.0671   Epoch: 3   Global Step: 18290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:52,398-Speed 5648.02 samples/sec   Loss 9.6374   LearningRate 0.0671   Epoch: 3   Global Step: 18300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:54,279-Speed 5652.61 samples/sec   Loss 9.6432   LearningRate 0.0671   Epoch: 3   Global Step: 18310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:56,091-Speed 5653.90 samples/sec   Loss 9.4563   LearningRate 0.0671   Epoch: 3   Global Step: 18320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:27:57,894-Speed 5681.87 samples/sec   Loss 9.7239   LearningRate 0.0670   Epoch: 3   Global Step: 18330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:27:59,769-Speed 5463.26 samples/sec   Loss 9.5770   LearningRate 0.0670   Epoch: 3   Global Step: 18340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:01,627-Speed 5512.30 samples/sec   Loss 9.4883   LearningRate 0.0670   Epoch: 3   Global Step: 18350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:03,487-Speed 5580.12 samples/sec   Loss 9.5406   LearningRate 0.0670   Epoch: 3   Global Step: 18360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:05,345-Speed 5514.55 samples/sec   Loss 9.4174   LearningRate 0.0670   Epoch: 3   Global Step: 18370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:07,212-Speed 5489.45 samples/sec   Loss 9.6331   LearningRate 0.0670   Epoch: 3   Global Step: 18380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:09,029-Speed 5637.75 samples/sec   Loss 9.6496   LearningRate 0.0669   Epoch: 3   Global Step: 18390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:10,860-Speed 5596.53 samples/sec   Loss 9.5902   LearningRate 0.0669   Epoch: 3   Global Step: 18400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:12,670-Speed 5661.53 samples/sec   Loss 9.5513   LearningRate 0.0669   Epoch: 3   Global Step: 18410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:14,505-Speed 5584.47 samples/sec   Loss 9.5787   LearningRate 0.0669   Epoch: 3   Global Step: 18420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:16,358-Speed 5527.36 samples/sec   Loss 9.7610   LearningRate 0.0669   Epoch: 3   Global Step: 18430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:18,214-Speed 5521.03 samples/sec   Loss 9.4972   LearningRate 0.0669   Epoch: 3   Global Step: 18440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:20,020-Speed 5673.76 samples/sec   Loss 9.4514   LearningRate 0.0668   Epoch: 3   Global Step: 18450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:21,867-Speed 5546.40 samples/sec   Loss 9.5261   LearningRate 0.0668   Epoch: 3   Global Step: 18460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:23,699-Speed 5621.18 samples/sec   Loss 9.4197   LearningRate 0.0668   Epoch: 3   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:25,548-Speed 5540.92 samples/sec   Loss 9.5033   LearningRate 0.0668   Epoch: 3   Global Step: 18480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:27,366-Speed 5634.83 samples/sec   Loss 9.5490   LearningRate 0.0668   Epoch: 3   Global Step: 18490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:29,215-Speed 5542.77 samples/sec   Loss 9.4302   LearningRate 0.0668   Epoch: 3   Global Step: 18500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:31,034-Speed 5632.35 samples/sec   Loss 9.5304   LearningRate 0.0668   Epoch: 3   Global Step: 18510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:32,857-Speed 5618.50 samples/sec   Loss 9.7842   LearningRate 0.0667   Epoch: 3   Global Step: 18520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:34,677-Speed 5642.40 samples/sec   Loss 9.7479   LearningRate 0.0667   Epoch: 3   Global Step: 18530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:36,528-Speed 5535.19 samples/sec   Loss 9.4723   LearningRate 0.0667   Epoch: 3   Global Step: 18540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:38,349-Speed 5624.23 samples/sec   Loss 9.5053   LearningRate 0.0667   Epoch: 3   Global Step: 18550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:28:40,182-Speed 5588.29 samples/sec   Loss 9.5242   LearningRate 0.0667   Epoch: 3   Global Step: 18560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:42,016-Speed 5676.55 samples/sec   Loss 9.6632   LearningRate 0.0667   Epoch: 3   Global Step: 18570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:43,852-Speed 5580.89 samples/sec   Loss 9.5933   LearningRate 0.0666   Epoch: 3   Global Step: 18580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:45,688-Speed 5579.50 samples/sec   Loss 9.6054   LearningRate 0.0666   Epoch: 3   Global Step: 18590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:47,496-Speed 5667.83 samples/sec   Loss 9.6974   LearningRate 0.0666   Epoch: 3   Global Step: 18600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:49,338-Speed 5560.41 samples/sec   Loss 9.8450   LearningRate 0.0666   Epoch: 3   Global Step: 18610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:51,159-Speed 5627.75 samples/sec   Loss 9.6836   LearningRate 0.0666   Epoch: 3   Global Step: 18620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:52,987-Speed 5601.76 samples/sec   Loss 9.6801   LearningRate 0.0666   Epoch: 3   Global Step: 18630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:54,953-Speed 5460.63 samples/sec   Loss 9.5873   LearningRate 0.0665   Epoch: 3   Global Step: 18640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:56,762-Speed 5668.79 samples/sec   Loss 9.5545   LearningRate 0.0665   Epoch: 3   Global Step: 18650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:28:58,624-Speed 5500.95 samples/sec   Loss 9.6199   LearningRate 0.0665   Epoch: 3   Global Step: 18660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:29:00,436-Speed 5652.35 samples/sec   Loss 9.5251   LearningRate 0.0665   Epoch: 3   Global Step: 18670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:29:02,283-Speed 5547.31 samples/sec   Loss 9.7057   LearningRate 0.0665   Epoch: 3   Global Step: 18680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:29:04,168-Speed 5435.82 samples/sec   Loss 9.6076   LearningRate 0.0665   Epoch: 3   Global Step: 18690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:05,999-Speed 5597.08 samples/sec   Loss 9.6317   LearningRate 0.0664   Epoch: 3   Global Step: 18700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:07,815-Speed 5640.49 samples/sec   Loss 9.6349   LearningRate 0.0664   Epoch: 3   Global Step: 18710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:09,668-Speed 5530.73 samples/sec   Loss 9.6587   LearningRate 0.0664   Epoch: 3   Global Step: 18720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:11,544-Speed 5470.93 samples/sec   Loss 9.5304   LearningRate 0.0664   Epoch: 3   Global Step: 18730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:13,360-Speed 5641.01 samples/sec   Loss 9.5182   LearningRate 0.0664   Epoch: 3   Global Step: 18740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:15,178-Speed 5635.29 samples/sec   Loss 9.5942   LearningRate 0.0664   Epoch: 3   Global Step: 18750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:16,997-Speed 5629.97 samples/sec   Loss 9.4230   LearningRate 0.0663   Epoch: 3   Global Step: 18760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:18,813-Speed 5643.93 samples/sec   Loss 9.4904   LearningRate 0.0663   Epoch: 3   Global Step: 18770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:20,619-Speed 5671.18 samples/sec   Loss 9.5394   LearningRate 0.0663   Epoch: 3   Global Step: 18780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:22,432-Speed 5650.05 samples/sec   Loss 9.5614   LearningRate 0.0663   Epoch: 3   Global Step: 18790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:29:24,346-Speed 5486.24 samples/sec   Loss 9.5302   LearningRate 0.0663   Epoch: 3   Global Step: 18800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:29:26,165-Speed 5631.99 samples/sec   Loss 9.4425   LearningRate 0.0663   Epoch: 3   Global Step: 18810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:29:28,058-Speed 5411.41 samples/sec   Loss 9.7237   LearningRate 0.0663   Epoch: 3   Global Step: 18820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:29:29,874-Speed 5641.20 samples/sec   Loss 9.4600   LearningRate 0.0662   Epoch: 3   Global Step: 18830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:29:31,702-Speed 5603.59 samples/sec   Loss 9.4349   LearningRate 0.0662   Epoch: 3   Global Step: 18840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:29:33,549-Speed 5548.14 samples/sec   Loss 9.4748   LearningRate 0.0662   Epoch: 3   Global Step: 18850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:35,360-Speed 5656.24 samples/sec   Loss 9.4543   LearningRate 0.0662   Epoch: 3   Global Step: 18860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:37,209-Speed 5539.69 samples/sec   Loss 9.7226   LearningRate 0.0662   Epoch: 3   Global Step: 18870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:39,089-Speed 5450.54 samples/sec   Loss 9.5602   LearningRate 0.0662   Epoch: 3   Global Step: 18880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:40,908-Speed 5633.20 samples/sec   Loss 9.4871   LearningRate 0.0661   Epoch: 3   Global Step: 18890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:42,775-Speed 5486.05 samples/sec   Loss 9.5794   LearningRate 0.0661   Epoch: 3   Global Step: 18900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:44,608-Speed 5589.07 samples/sec   Loss 9.5220   LearningRate 0.0661   Epoch: 3   Global Step: 18910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:29:46,450-Speed 5562.68 samples/sec   Loss 9.7158   LearningRate 0.0661   Epoch: 3   Global Step: 18920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:29:48,619-Speed 4722.59 samples/sec   Loss 9.5813   LearningRate 0.0661   Epoch: 3   Global Step: 18930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:29:50,446-Speed 5607.50 samples/sec   Loss 9.5326   LearningRate 0.0661   Epoch: 3   Global Step: 18940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:29:52,262-Speed 5640.48 samples/sec   Loss 9.3677   LearningRate 0.0660   Epoch: 3   Global Step: 18950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:29:54,100-Speed 5589.39 samples/sec   Loss 9.6395   LearningRate 0.0660   Epoch: 3   Global Step: 18960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:29:55,924-Speed 5623.04 samples/sec   Loss 9.7113   LearningRate 0.0660   Epoch: 3   Global Step: 18970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:29:57,750-Speed 5608.88 samples/sec   Loss 9.4972   LearningRate 0.0660   Epoch: 3   Global Step: 18980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:29:59,594-Speed 5554.76 samples/sec   Loss 9.2905   LearningRate 0.0660   Epoch: 3   Global Step: 18990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:01,404-Speed 5659.11 samples/sec   Loss 9.4224   LearningRate 0.0660   Epoch: 3   Global Step: 19000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:03,243-Speed 5573.09 samples/sec   Loss 9.3748   LearningRate 0.0659   Epoch: 3   Global Step: 19010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:05,062-Speed 5639.31 samples/sec   Loss 9.5497   LearningRate 0.0659   Epoch: 3   Global Step: 19020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:06,891-Speed 5600.46 samples/sec   Loss 9.6033   LearningRate 0.0659   Epoch: 3   Global Step: 19030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:08,706-Speed 5645.73 samples/sec   Loss 9.6268   LearningRate 0.0659   Epoch: 3   Global Step: 19040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:10,539-Speed 5587.17 samples/sec   Loss 9.5015   LearningRate 0.0659   Epoch: 3   Global Step: 19050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:12,363-Speed 5618.08 samples/sec   Loss 9.6052   LearningRate 0.0659   Epoch: 3   Global Step: 19060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:14,194-Speed 5594.64 samples/sec   Loss 9.5093   LearningRate 0.0659   Epoch: 3   Global Step: 19070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:16,040-Speed 5546.45 samples/sec   Loss 9.5260   LearningRate 0.0658   Epoch: 3   Global Step: 19080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:17,867-Speed 5607.86 samples/sec   Loss 9.5322   LearningRate 0.0658   Epoch: 3   Global Step: 19090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:19,685-Speed 5634.27 samples/sec   Loss 9.6057   LearningRate 0.0658   Epoch: 3   Global Step: 19100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:21,519-Speed 5585.56 samples/sec   Loss 9.5298   LearningRate 0.0658   Epoch: 3   Global Step: 19110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:23,354-Speed 5582.63 samples/sec   Loss 9.5603   LearningRate 0.0658   Epoch: 3   Global Step: 19120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:30:25,159-Speed 5677.14 samples/sec   Loss 9.3503   LearningRate 0.0658   Epoch: 3   Global Step: 19130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:26,980-Speed 5624.64 samples/sec   Loss 9.5665   LearningRate 0.0657   Epoch: 3   Global Step: 19140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:28,859-Speed 5451.74 samples/sec   Loss 9.3690   LearningRate 0.0657   Epoch: 3   Global Step: 19150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:30,699-Speed 5567.53 samples/sec   Loss 9.3624   LearningRate 0.0657   Epoch: 3   Global Step: 19160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:32,521-Speed 5621.03 samples/sec   Loss 9.4120   LearningRate 0.0657   Epoch: 3   Global Step: 19170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:34,357-Speed 5582.53 samples/sec   Loss 9.5617   LearningRate 0.0657   Epoch: 3   Global Step: 19180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:36,204-Speed 5545.56 samples/sec   Loss 9.6025   LearningRate 0.0657   Epoch: 3   Global Step: 19190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:38,028-Speed 5614.01 samples/sec   Loss 9.6203   LearningRate 0.0656   Epoch: 3   Global Step: 19200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:39,860-Speed 5593.20 samples/sec   Loss 9.4069   LearningRate 0.0656   Epoch: 3   Global Step: 19210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:41,670-Speed 5659.90 samples/sec   Loss 9.5118   LearningRate 0.0656   Epoch: 3   Global Step: 19220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:30:43,482-Speed 5652.27 samples/sec   Loss 9.5425   LearningRate 0.0656   Epoch: 3   Global Step: 19230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:45,318-Speed 5580.48 samples/sec   Loss 9.6254   LearningRate 0.0656   Epoch: 3   Global Step: 19240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:47,134-Speed 5640.33 samples/sec   Loss 9.5292   LearningRate 0.0656   Epoch: 3   Global Step: 19250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:48,946-Speed 5652.60 samples/sec   Loss 9.5187   LearningRate 0.0655   Epoch: 3   Global Step: 19260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:50,781-Speed 5582.84 samples/sec   Loss 9.4885   LearningRate 0.0655   Epoch: 3   Global Step: 19270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:52,607-Speed 5611.17 samples/sec   Loss 9.3042   LearningRate 0.0655   Epoch: 3   Global Step: 19280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:54,421-Speed 5646.50 samples/sec   Loss 9.6058   LearningRate 0.0655   Epoch: 3   Global Step: 19290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:56,299-Speed 5454.13 samples/sec   Loss 9.4508   LearningRate 0.0655   Epoch: 3   Global Step: 19300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:58,127-Speed 5605.68 samples/sec   Loss 9.5499   LearningRate 0.0655   Epoch: 3   Global Step: 19310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:30:59,951-Speed 5615.01 samples/sec   Loss 9.5728   LearningRate 0.0655   Epoch: 3   Global Step: 19320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:01,814-Speed 5499.69 samples/sec   Loss 9.6744   LearningRate 0.0654   Epoch: 3   Global Step: 19330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:03,678-Speed 5496.54 samples/sec   Loss 9.6133   LearningRate 0.0654   Epoch: 3   Global Step: 19340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:05,512-Speed 5585.66 samples/sec   Loss 9.4522   LearningRate 0.0654   Epoch: 3   Global Step: 19350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:07,339-Speed 5607.01 samples/sec   Loss 9.6713   LearningRate 0.0654   Epoch: 3   Global Step: 19360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:09,145-Speed 5670.75 samples/sec   Loss 9.4486   LearningRate 0.0654   Epoch: 3   Global Step: 19370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:10,969-Speed 5615.85 samples/sec   Loss 9.3320   LearningRate 0.0654   Epoch: 3   Global Step: 19380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:12,782-Speed 5650.24 samples/sec   Loss 9.3997   LearningRate 0.0653   Epoch: 3   Global Step: 19390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:14,586-Speed 5679.45 samples/sec   Loss 9.3838   LearningRate 0.0653   Epoch: 3   Global Step: 19400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:16,411-Speed 5613.04 samples/sec   Loss 9.5607   LearningRate 0.0653   Epoch: 3   Global Step: 19410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:18,293-Speed 5442.15 samples/sec   Loss 9.4846   LearningRate 0.0653   Epoch: 3   Global Step: 19420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:20,113-Speed 5626.97 samples/sec   Loss 9.5313   LearningRate 0.0653   Epoch: 3   Global Step: 19430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:21,922-Speed 5662.95 samples/sec   Loss 9.3742   LearningRate 0.0653   Epoch: 3   Global Step: 19440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:23,747-Speed 5615.32 samples/sec   Loss 9.2798   LearningRate 0.0652   Epoch: 3   Global Step: 19450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:25,579-Speed 5591.91 samples/sec   Loss 9.5113   LearningRate 0.0652   Epoch: 3   Global Step: 19460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:27,405-Speed 5609.87 samples/sec   Loss 9.6070   LearningRate 0.0652   Epoch: 3   Global Step: 19470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:29,214-Speed 5660.92 samples/sec   Loss 9.5072   LearningRate 0.0652   Epoch: 3   Global Step: 19480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:31,044-Speed 5599.21 samples/sec   Loss 9.5412   LearningRate 0.0652   Epoch: 3   Global Step: 19490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:32,874-Speed 5595.30 samples/sec   Loss 9.4831   LearningRate 0.0652   Epoch: 3   Global Step: 19500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:34,691-Speed 5640.79 samples/sec   Loss 9.4864   LearningRate 0.0651   Epoch: 3   Global Step: 19510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:36,579-Speed 5425.48 samples/sec   Loss 9.3843   LearningRate 0.0651   Epoch: 3   Global Step: 19520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:38,423-Speed 5554.41 samples/sec   Loss 9.2462   LearningRate 0.0651   Epoch: 3   Global Step: 19530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:40,278-Speed 5522.14 samples/sec   Loss 9.4994   LearningRate 0.0651   Epoch: 3   Global Step: 19540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:42,097-Speed 5634.54 samples/sec   Loss 9.3434   LearningRate 0.0651   Epoch: 3   Global Step: 19550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:43,944-Speed 5545.55 samples/sec   Loss 9.3679   LearningRate 0.0651   Epoch: 3   Global Step: 19560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:45,800-Speed 5517.07 samples/sec   Loss 9.3636   LearningRate 0.0651   Epoch: 3   Global Step: 19570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:47,657-Speed 5518.51 samples/sec   Loss 9.3930   LearningRate 0.0650   Epoch: 3   Global Step: 19580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:49,527-Speed 5476.71 samples/sec   Loss 9.4179   LearningRate 0.0650   Epoch: 3   Global Step: 19590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:51,342-Speed 5644.36 samples/sec   Loss 9.4288   LearningRate 0.0650   Epoch: 3   Global Step: 19600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:53,181-Speed 5571.47 samples/sec   Loss 9.5508   LearningRate 0.0650   Epoch: 3   Global Step: 19610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:31:54,993-Speed 5652.67 samples/sec   Loss 9.4262   LearningRate 0.0650   Epoch: 3   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:56,816-Speed 5617.84 samples/sec   Loss 9.5121   LearningRate 0.0650   Epoch: 3   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:31:58,652-Speed 5578.82 samples/sec   Loss 9.3885   LearningRate 0.0649   Epoch: 3   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:00,503-Speed 5536.46 samples/sec   Loss 9.4445   LearningRate 0.0649   Epoch: 3   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:02,312-Speed 5661.76 samples/sec   Loss 9.2871   LearningRate 0.0649   Epoch: 3   Global Step: 19660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:04,149-Speed 5577.27 samples/sec   Loss 9.5117   LearningRate 0.0649   Epoch: 3   Global Step: 19670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:05,976-Speed 5608.31 samples/sec   Loss 9.5957   LearningRate 0.0649   Epoch: 3   Global Step: 19680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:07,796-Speed 5626.91 samples/sec   Loss 9.3552   LearningRate 0.0649   Epoch: 3   Global Step: 19690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:09,613-Speed 5636.46 samples/sec   Loss 9.3700   LearningRate 0.0648   Epoch: 3   Global Step: 19700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:11,425-Speed 5653.80 samples/sec   Loss 9.3905   LearningRate 0.0648   Epoch: 3   Global Step: 19710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:13,245-Speed 5630.15 samples/sec   Loss 9.3503   LearningRate 0.0648   Epoch: 3   Global Step: 19720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:15,062-Speed 5637.25 samples/sec   Loss 9.4171   LearningRate 0.0648   Epoch: 3   Global Step: 19730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:16,961-Speed 5394.62 samples/sec   Loss 9.4291   LearningRate 0.0648   Epoch: 3   Global Step: 19740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:18,830-Speed 5479.02 samples/sec   Loss 9.3575   LearningRate 0.0648   Epoch: 3   Global Step: 19750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:20,660-Speed 5597.78 samples/sec   Loss 9.3912   LearningRate 0.0647   Epoch: 3   Global Step: 19760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:22,475-Speed 5643.89 samples/sec   Loss 9.4830   LearningRate 0.0647   Epoch: 3   Global Step: 19770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:24,303-Speed 5606.24 samples/sec   Loss 9.4905   LearningRate 0.0647   Epoch: 3   Global Step: 19780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:26,174-Speed 5474.24 samples/sec   Loss 9.4197   LearningRate 0.0647   Epoch: 3   Global Step: 19790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:28,007-Speed 5586.35 samples/sec   Loss 9.2783   LearningRate 0.0647   Epoch: 3   Global Step: 19800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:29,845-Speed 5574.37 samples/sec   Loss 9.3175   LearningRate 0.0647   Epoch: 3   Global Step: 19810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:31,669-Speed 5614.53 samples/sec   Loss 9.6238   LearningRate 0.0647   Epoch: 3   Global Step: 19820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:33,493-Speed 5618.89 samples/sec   Loss 9.3811   LearningRate 0.0646   Epoch: 3   Global Step: 19830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:35,304-Speed 5656.12 samples/sec   Loss 9.3259   LearningRate 0.0646   Epoch: 3   Global Step: 19840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:37,119-Speed 5643.37 samples/sec   Loss 9.2479   LearningRate 0.0646   Epoch: 3   Global Step: 19850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:38,942-Speed 5618.81 samples/sec   Loss 9.3732   LearningRate 0.0646   Epoch: 3   Global Step: 19860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:40,764-Speed 5620.95 samples/sec   Loss 9.2802   LearningRate 0.0646   Epoch: 3   Global Step: 19870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:42,599-Speed 5585.21 samples/sec   Loss 9.3868   LearningRate 0.0646   Epoch: 3   Global Step: 19880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:44,414-Speed 5643.35 samples/sec   Loss 9.4954   LearningRate 0.0645   Epoch: 3   Global Step: 19890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:46,228-Speed 5646.90 samples/sec   Loss 9.2656   LearningRate 0.0645   Epoch: 3   Global Step: 19900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:48,051-Speed 5618.04 samples/sec   Loss 9.3758   LearningRate 0.0645   Epoch: 3   Global Step: 19910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:49,875-Speed 5618.41 samples/sec   Loss 9.4203   LearningRate 0.0645   Epoch: 3   Global Step: 19920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:51,709-Speed 5584.80 samples/sec   Loss 9.4560   LearningRate 0.0645   Epoch: 3   Global Step: 19930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:32:53,535-Speed 5607.64 samples/sec   Loss 9.2285   LearningRate 0.0645   Epoch: 3   Global Step: 19940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:55,372-Speed 5578.80 samples/sec   Loss 9.2507   LearningRate 0.0644   Epoch: 3   Global Step: 19950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:57,204-Speed 5591.60 samples/sec   Loss 9.3670   LearningRate 0.0644   Epoch: 3   Global Step: 19960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:32:59,052-Speed 5542.65 samples/sec   Loss 9.2116   LearningRate 0.0644   Epoch: 3   Global Step: 19970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:33:00,877-Speed 5614.08 samples/sec   Loss 9.4806   LearningRate 0.0644   Epoch: 3   Global Step: 19980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:33:02,739-Speed 5500.27 samples/sec   Loss 9.5349   LearningRate 0.0644   Epoch: 3   Global Step: 19990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:33:04,565-Speed 5610.50 samples/sec   Loss 9.3772   LearningRate 0.0644   Epoch: 3   Global Step: 20000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:33:31,994-[lfw][20000]XNorm: 21.186255
Training: 2022-04-11 11:33:31,995-[lfw][20000]Accuracy-Flip: 0.99633+-0.00277
Training: 2022-04-11 11:33:31,996-[lfw][20000]Accuracy-Highest: 0.99667
Training: 2022-04-11 11:34:03,204-[cfp_fp][20000]XNorm: 18.244188
Training: 2022-04-11 11:34:03,206-[cfp_fp][20000]Accuracy-Flip: 0.95486+-0.00836
Training: 2022-04-11 11:34:03,206-[cfp_fp][20000]Accuracy-Highest: 0.95486
Training: 2022-04-11 11:34:30,163-[agedb_30][20000]XNorm: 20.944213
Training: 2022-04-11 11:34:30,164-[agedb_30][20000]Accuracy-Flip: 0.97033+-0.00894
Training: 2022-04-11 11:34:30,165-[agedb_30][20000]Accuracy-Highest: 0.97033
Training: 2022-04-11 11:34:32,041-Speed 117.06 samples/sec   Loss 9.3294   LearningRate 0.0644   Epoch: 3   Global Step: 20010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:33,897-Speed 5520.48 samples/sec   Loss 9.2522   LearningRate 0.0643   Epoch: 3   Global Step: 20020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:35,776-Speed 5454.02 samples/sec   Loss 9.4319   LearningRate 0.0643   Epoch: 3   Global Step: 20030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:37,617-Speed 5563.96 samples/sec   Loss 9.4107   LearningRate 0.0643   Epoch: 3   Global Step: 20040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:39,466-Speed 5542.40 samples/sec   Loss 9.2184   LearningRate 0.0643   Epoch: 3   Global Step: 20050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:41,319-Speed 5529.57 samples/sec   Loss 9.4813   LearningRate 0.0643   Epoch: 3   Global Step: 20060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:43,136-Speed 5639.95 samples/sec   Loss 9.4274   LearningRate 0.0643   Epoch: 3   Global Step: 20070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:44,961-Speed 5613.88 samples/sec   Loss 9.3704   LearningRate 0.0642   Epoch: 3   Global Step: 20080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:46,799-Speed 5576.29 samples/sec   Loss 9.3525   LearningRate 0.0642   Epoch: 3   Global Step: 20090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:48,631-Speed 5593.13 samples/sec   Loss 9.2881   LearningRate 0.0642   Epoch: 3   Global Step: 20100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:50,535-Speed 5379.40 samples/sec   Loss 9.3532   LearningRate 0.0642   Epoch: 3   Global Step: 20110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:52,363-Speed 5606.22 samples/sec   Loss 9.2000   LearningRate 0.0642   Epoch: 3   Global Step: 20120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:54,221-Speed 5515.77 samples/sec   Loss 9.4689   LearningRate 0.0642   Epoch: 3   Global Step: 20130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:56,065-Speed 5553.84 samples/sec   Loss 9.3001   LearningRate 0.0641   Epoch: 3   Global Step: 20140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:57,894-Speed 5602.72 samples/sec   Loss 9.4400   LearningRate 0.0641   Epoch: 3   Global Step: 20150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:34:59,708-Speed 5647.42 samples/sec   Loss 9.3003   LearningRate 0.0641   Epoch: 3   Global Step: 20160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:01,595-Speed 5433.17 samples/sec   Loss 9.4453   LearningRate 0.0641   Epoch: 3   Global Step: 20170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:03,419-Speed 5616.19 samples/sec   Loss 9.5310   LearningRate 0.0641   Epoch: 3   Global Step: 20180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:05,264-Speed 5553.54 samples/sec   Loss 9.3606   LearningRate 0.0641   Epoch: 3   Global Step: 20190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:07,141-Speed 5458.65 samples/sec   Loss 9.3868   LearningRate 0.0641   Epoch: 3   Global Step: 20200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:09,022-Speed 5446.83 samples/sec   Loss 9.2697   LearningRate 0.0640   Epoch: 3   Global Step: 20210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:10,839-Speed 5637.97 samples/sec   Loss 9.4822   LearningRate 0.0640   Epoch: 3   Global Step: 20220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:12,765-Speed 5319.65 samples/sec   Loss 9.2583   LearningRate 0.0640   Epoch: 3   Global Step: 20230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:24,624-Speed 863.58 samples/sec   Loss 8.6482   LearningRate 0.0640   Epoch: 4   Global Step: 20240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:26,481-Speed 5519.76 samples/sec   Loss 8.5766   LearningRate 0.0640   Epoch: 4   Global Step: 20250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:28,343-Speed 5503.30 samples/sec   Loss 8.4664   LearningRate 0.0640   Epoch: 4   Global Step: 20260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:30,170-Speed 5608.29 samples/sec   Loss 8.7219   LearningRate 0.0639   Epoch: 4   Global Step: 20270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:31,994-Speed 5615.43 samples/sec   Loss 8.5181   LearningRate 0.0639   Epoch: 4   Global Step: 20280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:33,802-Speed 5665.02 samples/sec   Loss 8.4967   LearningRate 0.0639   Epoch: 4   Global Step: 20290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:35,628-Speed 5612.21 samples/sec   Loss 8.4135   LearningRate 0.0639   Epoch: 4   Global Step: 20300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:37,472-Speed 5555.10 samples/sec   Loss 8.4607   LearningRate 0.0639   Epoch: 4   Global Step: 20310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:39,309-Speed 5576.15 samples/sec   Loss 8.6387   LearningRate 0.0639   Epoch: 4   Global Step: 20320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:41,144-Speed 5585.47 samples/sec   Loss 8.7072   LearningRate 0.0638   Epoch: 4   Global Step: 20330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:42,992-Speed 5546.02 samples/sec   Loss 8.5758   LearningRate 0.0638   Epoch: 4   Global Step: 20340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:44,821-Speed 5598.04 samples/sec   Loss 8.6679   LearningRate 0.0638   Epoch: 4   Global Step: 20350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:46,660-Speed 5571.84 samples/sec   Loss 8.7407   LearningRate 0.0638   Epoch: 4   Global Step: 20360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:48,510-Speed 5537.88 samples/sec   Loss 8.7627   LearningRate 0.0638   Epoch: 4   Global Step: 20370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:50,347-Speed 5578.24 samples/sec   Loss 8.6404   LearningRate 0.0638   Epoch: 4   Global Step: 20380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:52,199-Speed 5531.92 samples/sec   Loss 8.7708   LearningRate 0.0638   Epoch: 4   Global Step: 20390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:35:54,040-Speed 5561.76 samples/sec   Loss 8.6206   LearningRate 0.0637   Epoch: 4   Global Step: 20400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:55,872-Speed 5596.00 samples/sec   Loss 8.6880   LearningRate 0.0637   Epoch: 4   Global Step: 20410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:57,697-Speed 5611.65 samples/sec   Loss 8.7636   LearningRate 0.0637   Epoch: 4   Global Step: 20420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:35:59,569-Speed 5472.47 samples/sec   Loss 8.8118   LearningRate 0.0637   Epoch: 4   Global Step: 20430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:01,391-Speed 5623.18 samples/sec   Loss 8.7410   LearningRate 0.0637   Epoch: 4   Global Step: 20440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:03,220-Speed 5601.35 samples/sec   Loss 8.7259   LearningRate 0.0637   Epoch: 4   Global Step: 20450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:05,033-Speed 5651.05 samples/sec   Loss 8.6106   LearningRate 0.0636   Epoch: 4   Global Step: 20460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:06,853-Speed 5631.32 samples/sec   Loss 8.6570   LearningRate 0.0636   Epoch: 4   Global Step: 20470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:08,663-Speed 5659.37 samples/sec   Loss 8.7595   LearningRate 0.0636   Epoch: 4   Global Step: 20480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:10,528-Speed 5493.99 samples/sec   Loss 8.8462   LearningRate 0.0636   Epoch: 4   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:12,349-Speed 5627.11 samples/sec   Loss 8.6175   LearningRate 0.0636   Epoch: 4   Global Step: 20500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:14,193-Speed 5554.39 samples/sec   Loss 8.8099   LearningRate 0.0636   Epoch: 4   Global Step: 20510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:16,006-Speed 5652.08 samples/sec   Loss 8.7622   LearningRate 0.0635   Epoch: 4   Global Step: 20520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:17,831-Speed 5612.49 samples/sec   Loss 8.9073   LearningRate 0.0635   Epoch: 4   Global Step: 20530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:19,643-Speed 5655.80 samples/sec   Loss 8.8464   LearningRate 0.0635   Epoch: 4   Global Step: 20540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:21,498-Speed 5521.85 samples/sec   Loss 8.7543   LearningRate 0.0635   Epoch: 4   Global Step: 20550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:23,325-Speed 5606.66 samples/sec   Loss 8.8590   LearningRate 0.0635   Epoch: 4   Global Step: 20560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:25,187-Speed 5500.89 samples/sec   Loss 8.9266   LearningRate 0.0635   Epoch: 4   Global Step: 20570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:27,013-Speed 5612.89 samples/sec   Loss 9.0738   LearningRate 0.0635   Epoch: 4   Global Step: 20580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:28,825-Speed 5654.90 samples/sec   Loss 8.9077   LearningRate 0.0634   Epoch: 4   Global Step: 20590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:30,626-Speed 5688.91 samples/sec   Loss 8.8913   LearningRate 0.0634   Epoch: 4   Global Step: 20600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:36:32,464-Speed 5570.81 samples/sec   Loss 8.9797   LearningRate 0.0634   Epoch: 4   Global Step: 20610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:34,313-Speed 5541.03 samples/sec   Loss 8.9939   LearningRate 0.0634   Epoch: 4   Global Step: 20620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:36,137-Speed 5618.97 samples/sec   Loss 8.8119   LearningRate 0.0634   Epoch: 4   Global Step: 20630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:37,969-Speed 5590.64 samples/sec   Loss 8.8743   LearningRate 0.0634   Epoch: 4   Global Step: 20640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:39,800-Speed 5595.50 samples/sec   Loss 8.9089   LearningRate 0.0633   Epoch: 4   Global Step: 20650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:41,631-Speed 5596.09 samples/sec   Loss 8.6811   LearningRate 0.0633   Epoch: 4   Global Step: 20660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:43,478-Speed 5546.61 samples/sec   Loss 9.0246   LearningRate 0.0633   Epoch: 4   Global Step: 20670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:45,295-Speed 5638.39 samples/sec   Loss 9.0654   LearningRate 0.0633   Epoch: 4   Global Step: 20680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:47,126-Speed 5595.24 samples/sec   Loss 9.0391   LearningRate 0.0633   Epoch: 4   Global Step: 20690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:48,998-Speed 5473.49 samples/sec   Loss 9.0327   LearningRate 0.0633   Epoch: 4   Global Step: 20700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:50,799-Speed 5686.34 samples/sec   Loss 8.9276   LearningRate 0.0632   Epoch: 4   Global Step: 20710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:52,646-Speed 5546.49 samples/sec   Loss 8.9945   LearningRate 0.0632   Epoch: 4   Global Step: 20720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:54,478-Speed 5594.47 samples/sec   Loss 8.8051   LearningRate 0.0632   Epoch: 4   Global Step: 20730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:56,304-Speed 5612.05 samples/sec   Loss 9.1188   LearningRate 0.0632   Epoch: 4   Global Step: 20740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:58,165-Speed 5504.42 samples/sec   Loss 8.9152   LearningRate 0.0632   Epoch: 4   Global Step: 20750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:36:59,995-Speed 5598.47 samples/sec   Loss 8.9878   LearningRate 0.0632   Epoch: 4   Global Step: 20760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:01,831-Speed 5578.13 samples/sec   Loss 8.9650   LearningRate 0.0632   Epoch: 4   Global Step: 20770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:03,679-Speed 5543.85 samples/sec   Loss 9.0670   LearningRate 0.0631   Epoch: 4   Global Step: 20780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:05,505-Speed 5613.09 samples/sec   Loss 8.9290   LearningRate 0.0631   Epoch: 4   Global Step: 20790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:07,349-Speed 5556.94 samples/sec   Loss 8.9355   LearningRate 0.0631   Epoch: 4   Global Step: 20800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:09,166-Speed 5638.61 samples/sec   Loss 9.0107   LearningRate 0.0631   Epoch: 4   Global Step: 20810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:11,027-Speed 5507.82 samples/sec   Loss 8.8538   LearningRate 0.0631   Epoch: 4   Global Step: 20820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:12,853-Speed 5611.77 samples/sec   Loss 8.9198   LearningRate 0.0631   Epoch: 4   Global Step: 20830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:14,702-Speed 5541.18 samples/sec   Loss 8.9219   LearningRate 0.0630   Epoch: 4   Global Step: 20840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:16,522-Speed 5629.45 samples/sec   Loss 8.9512   LearningRate 0.0630   Epoch: 4   Global Step: 20850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:18,387-Speed 5493.27 samples/sec   Loss 9.0284   LearningRate 0.0630   Epoch: 4   Global Step: 20860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:20,231-Speed 5555.11 samples/sec   Loss 9.1492   LearningRate 0.0630   Epoch: 4   Global Step: 20870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:22,083-Speed 5532.49 samples/sec   Loss 9.0794   LearningRate 0.0630   Epoch: 4   Global Step: 20880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:23,929-Speed 5548.03 samples/sec   Loss 9.0104   LearningRate 0.0630   Epoch: 4   Global Step: 20890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:25,751-Speed 5626.14 samples/sec   Loss 9.0634   LearningRate 0.0629   Epoch: 4   Global Step: 20900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:27,582-Speed 5596.84 samples/sec   Loss 8.9186   LearningRate 0.0629   Epoch: 4   Global Step: 20910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:37:29,426-Speed 5555.54 samples/sec   Loss 8.8288   LearningRate 0.0629   Epoch: 4   Global Step: 20920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:37:31,256-Speed 5598.07 samples/sec   Loss 9.0643   LearningRate 0.0629   Epoch: 4   Global Step: 20930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:37:33,095-Speed 5570.02 samples/sec   Loss 9.1214   LearningRate 0.0629   Epoch: 4   Global Step: 20940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:37:34,927-Speed 5592.90 samples/sec   Loss 8.9265   LearningRate 0.0629   Epoch: 4   Global Step: 20950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:37:36,750-Speed 5622.28 samples/sec   Loss 8.9585   LearningRate 0.0629   Epoch: 4   Global Step: 20960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:37:38,571-Speed 5623.55 samples/sec   Loss 9.2076   LearningRate 0.0628   Epoch: 4   Global Step: 20970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:37:40,374-Speed 5681.00 samples/sec   Loss 8.9342   LearningRate 0.0628   Epoch: 4   Global Step: 20980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:42,215-Speed 5565.54 samples/sec   Loss 9.0224   LearningRate 0.0628   Epoch: 4   Global Step: 20990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:44,042-Speed 5608.39 samples/sec   Loss 8.9401   LearningRate 0.0628   Epoch: 4   Global Step: 21000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:45,866-Speed 5615.34 samples/sec   Loss 9.0417   LearningRate 0.0628   Epoch: 4   Global Step: 21010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:47,707-Speed 5565.61 samples/sec   Loss 9.0081   LearningRate 0.0628   Epoch: 4   Global Step: 21020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:49,543-Speed 5578.68 samples/sec   Loss 9.0729   LearningRate 0.0627   Epoch: 4   Global Step: 21030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:51,363-Speed 5630.87 samples/sec   Loss 9.0329   LearningRate 0.0627   Epoch: 4   Global Step: 21040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:53,189-Speed 5608.71 samples/sec   Loss 9.1826   LearningRate 0.0627   Epoch: 4   Global Step: 21050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:55,006-Speed 5638.52 samples/sec   Loss 9.0234   LearningRate 0.0627   Epoch: 4   Global Step: 21060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:56,821-Speed 5643.21 samples/sec   Loss 8.9388   LearningRate 0.0627   Epoch: 4   Global Step: 21070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:37:58,648-Speed 5609.83 samples/sec   Loss 9.1965   LearningRate 0.0627   Epoch: 4   Global Step: 21080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:38:00,480-Speed 5592.97 samples/sec   Loss 8.9683   LearningRate 0.0627   Epoch: 4   Global Step: 21090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:02,308-Speed 5602.14 samples/sec   Loss 9.2176   LearningRate 0.0626   Epoch: 4   Global Step: 21100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:04,158-Speed 5540.59 samples/sec   Loss 9.0106   LearningRate 0.0626   Epoch: 4   Global Step: 21110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:05,994-Speed 5579.59 samples/sec   Loss 9.1290   LearningRate 0.0626   Epoch: 4   Global Step: 21120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:07,822-Speed 5606.95 samples/sec   Loss 9.0068   LearningRate 0.0626   Epoch: 4   Global Step: 21130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:09,653-Speed 5594.91 samples/sec   Loss 9.0173   LearningRate 0.0626   Epoch: 4   Global Step: 21140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:38:11,497-Speed 5555.23 samples/sec   Loss 9.1656   LearningRate 0.0626   Epoch: 4   Global Step: 21150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:38:13,332-Speed 5584.32 samples/sec   Loss 9.1176   LearningRate 0.0625   Epoch: 4   Global Step: 21160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:38:15,161-Speed 5600.74 samples/sec   Loss 9.1196   LearningRate 0.0625   Epoch: 4   Global Step: 21170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:38:16,987-Speed 5610.63 samples/sec   Loss 9.0190   LearningRate 0.0625   Epoch: 4   Global Step: 21180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:38:18,829-Speed 5565.20 samples/sec   Loss 9.2860   LearningRate 0.0625   Epoch: 4   Global Step: 21190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:38:20,670-Speed 5563.60 samples/sec   Loss 9.0494   LearningRate 0.0625   Epoch: 4   Global Step: 21200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:38:22,490-Speed 5628.91 samples/sec   Loss 9.1284   LearningRate 0.0625   Epoch: 4   Global Step: 21210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:38:24,317-Speed 5610.46 samples/sec   Loss 9.0635   LearningRate 0.0624   Epoch: 4   Global Step: 21220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:38:26,155-Speed 5574.34 samples/sec   Loss 9.0335   LearningRate 0.0624   Epoch: 4   Global Step: 21230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:38:28,009-Speed 5527.78 samples/sec   Loss 9.2640   LearningRate 0.0624   Epoch: 4   Global Step: 21240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:29,834-Speed 5611.02 samples/sec   Loss 9.0977   LearningRate 0.0624   Epoch: 4   Global Step: 21250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:31,675-Speed 5569.09 samples/sec   Loss 9.1753   LearningRate 0.0624   Epoch: 4   Global Step: 21260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:33,512-Speed 5576.55 samples/sec   Loss 9.0162   LearningRate 0.0624   Epoch: 4   Global Step: 21270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:35,357-Speed 5553.15 samples/sec   Loss 9.0695   LearningRate 0.0624   Epoch: 4   Global Step: 21280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:37,187-Speed 5598.78 samples/sec   Loss 8.9281   LearningRate 0.0623   Epoch: 4   Global Step: 21290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:39,034-Speed 5547.48 samples/sec   Loss 9.0216   LearningRate 0.0623   Epoch: 4   Global Step: 21300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:40,906-Speed 5474.00 samples/sec   Loss 9.0483   LearningRate 0.0623   Epoch: 4   Global Step: 21310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:42,732-Speed 5610.72 samples/sec   Loss 9.2169   LearningRate 0.0623   Epoch: 4   Global Step: 21320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:44,578-Speed 5548.08 samples/sec   Loss 9.1596   LearningRate 0.0623   Epoch: 4   Global Step: 21330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:46,421-Speed 5561.06 samples/sec   Loss 9.1560   LearningRate 0.0623   Epoch: 4   Global Step: 21340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:38:48,246-Speed 5613.93 samples/sec   Loss 9.1570   LearningRate 0.0622   Epoch: 4   Global Step: 21350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:38:50,104-Speed 5515.70 samples/sec   Loss 9.0884   LearningRate 0.0622   Epoch: 4   Global Step: 21360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:38:51,917-Speed 5651.80 samples/sec   Loss 8.9653   LearningRate 0.0622   Epoch: 4   Global Step: 21370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:53,744-Speed 5608.01 samples/sec   Loss 9.1094   LearningRate 0.0622   Epoch: 4   Global Step: 21380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:55,618-Speed 5465.78 samples/sec   Loss 9.1228   LearningRate 0.0622   Epoch: 4   Global Step: 21390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:57,443-Speed 5615.46 samples/sec   Loss 9.0272   LearningRate 0.0622   Epoch: 4   Global Step: 21400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:38:59,284-Speed 5562.48 samples/sec   Loss 9.0606   LearningRate 0.0622   Epoch: 4   Global Step: 21410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:01,148-Speed 5498.86 samples/sec   Loss 9.2062   LearningRate 0.0621   Epoch: 4   Global Step: 21420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:03,006-Speed 5514.17 samples/sec   Loss 8.9079   LearningRate 0.0621   Epoch: 4   Global Step: 21430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:04,869-Speed 5500.09 samples/sec   Loss 8.9460   LearningRate 0.0621   Epoch: 4   Global Step: 21440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:06,695-Speed 5611.46 samples/sec   Loss 9.2022   LearningRate 0.0621   Epoch: 4   Global Step: 21450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:08,539-Speed 5558.08 samples/sec   Loss 9.1706   LearningRate 0.0621   Epoch: 4   Global Step: 21460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:10,380-Speed 5563.58 samples/sec   Loss 9.1807   LearningRate 0.0621   Epoch: 4   Global Step: 21470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:12,241-Speed 5507.82 samples/sec   Loss 8.9765   LearningRate 0.0620   Epoch: 4   Global Step: 21480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:14,082-Speed 5563.91 samples/sec   Loss 8.9541   LearningRate 0.0620   Epoch: 4   Global Step: 21490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:15,907-Speed 5615.20 samples/sec   Loss 9.2012   LearningRate 0.0620   Epoch: 4   Global Step: 21500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:17,739-Speed 5589.96 samples/sec   Loss 9.1648   LearningRate 0.0620   Epoch: 4   Global Step: 21510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:19,578-Speed 5573.43 samples/sec   Loss 9.0589   LearningRate 0.0620   Epoch: 4   Global Step: 21520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:21,405-Speed 5606.93 samples/sec   Loss 9.0309   LearningRate 0.0620   Epoch: 4   Global Step: 21530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:23,255-Speed 5538.31 samples/sec   Loss 8.8048   LearningRate 0.0619   Epoch: 4   Global Step: 21540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:25,132-Speed 5459.95 samples/sec   Loss 9.1259   LearningRate 0.0619   Epoch: 4   Global Step: 21550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:26,990-Speed 5512.94 samples/sec   Loss 9.1696   LearningRate 0.0619   Epoch: 4   Global Step: 21560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:28,824-Speed 5589.50 samples/sec   Loss 9.2847   LearningRate 0.0619   Epoch: 4   Global Step: 21570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:30,673-Speed 5538.91 samples/sec   Loss 9.1107   LearningRate 0.0619   Epoch: 4   Global Step: 21580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:32,507-Speed 5586.95 samples/sec   Loss 8.9733   LearningRate 0.0619   Epoch: 4   Global Step: 21590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:34,360-Speed 5529.33 samples/sec   Loss 9.1503   LearningRate 0.0619   Epoch: 4   Global Step: 21600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:36,185-Speed 5613.44 samples/sec   Loss 8.9587   LearningRate 0.0618   Epoch: 4   Global Step: 21610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:38,006-Speed 5624.33 samples/sec   Loss 8.9859   LearningRate 0.0618   Epoch: 4   Global Step: 21620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:39,852-Speed 5550.49 samples/sec   Loss 9.0532   LearningRate 0.0618   Epoch: 4   Global Step: 21630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:41,664-Speed 5656.83 samples/sec   Loss 8.9689   LearningRate 0.0618   Epoch: 4   Global Step: 21640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:43,483-Speed 5632.10 samples/sec   Loss 8.9926   LearningRate 0.0618   Epoch: 4   Global Step: 21650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:45,341-Speed 5513.05 samples/sec   Loss 9.0233   LearningRate 0.0618   Epoch: 4   Global Step: 21660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:39:47,174-Speed 5588.30 samples/sec   Loss 9.1168   LearningRate 0.0617   Epoch: 4   Global Step: 21670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:49,035-Speed 5507.07 samples/sec   Loss 9.1135   LearningRate 0.0617   Epoch: 4   Global Step: 21680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:50,879-Speed 5555.33 samples/sec   Loss 9.0946   LearningRate 0.0617   Epoch: 4   Global Step: 21690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:52,718-Speed 5572.16 samples/sec   Loss 9.0539   LearningRate 0.0617   Epoch: 4   Global Step: 21700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:54,582-Speed 5496.32 samples/sec   Loss 9.1149   LearningRate 0.0617   Epoch: 4   Global Step: 21710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:56,403-Speed 5625.03 samples/sec   Loss 9.0266   LearningRate 0.0617   Epoch: 4   Global Step: 21720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:39:58,225-Speed 5626.20 samples/sec   Loss 9.1609   LearningRate 0.0617   Epoch: 4   Global Step: 21730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:00,077-Speed 5529.53 samples/sec   Loss 9.0236   LearningRate 0.0616   Epoch: 4   Global Step: 21740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:01,945-Speed 5491.09 samples/sec   Loss 8.9622   LearningRate 0.0616   Epoch: 4   Global Step: 21750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:03,800-Speed 5522.36 samples/sec   Loss 9.1783   LearningRate 0.0616   Epoch: 4   Global Step: 21760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:05,663-Speed 5497.98 samples/sec   Loss 9.1724   LearningRate 0.0616   Epoch: 4   Global Step: 21770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:07,523-Speed 5509.32 samples/sec   Loss 9.0428   LearningRate 0.0616   Epoch: 4   Global Step: 21780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:09,362-Speed 5569.94 samples/sec   Loss 8.9968   LearningRate 0.0616   Epoch: 4   Global Step: 21790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:11,191-Speed 5604.49 samples/sec   Loss 9.0394   LearningRate 0.0615   Epoch: 4   Global Step: 21800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:13,018-Speed 5610.29 samples/sec   Loss 9.1294   LearningRate 0.0615   Epoch: 4   Global Step: 21810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:14,862-Speed 5556.87 samples/sec   Loss 9.0006   LearningRate 0.0615   Epoch: 4   Global Step: 21820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:16,708-Speed 5550.03 samples/sec   Loss 9.2050   LearningRate 0.0615   Epoch: 4   Global Step: 21830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:40:18,549-Speed 5566.32 samples/sec   Loss 9.0071   LearningRate 0.0615   Epoch: 4   Global Step: 21840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:40:20,373-Speed 5615.64 samples/sec   Loss 8.9448   LearningRate 0.0615   Epoch: 4   Global Step: 21850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:40:22,205-Speed 5593.28 samples/sec   Loss 9.0794   LearningRate 0.0615   Epoch: 4   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:40:24,058-Speed 5529.35 samples/sec   Loss 9.1400   LearningRate 0.0614   Epoch: 4   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:40:25,885-Speed 5606.37 samples/sec   Loss 9.1576   LearningRate 0.0614   Epoch: 4   Global Step: 21880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:40:27,714-Speed 5600.37 samples/sec   Loss 8.9935   LearningRate 0.0614   Epoch: 4   Global Step: 21890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:40:29,552-Speed 5574.58 samples/sec   Loss 8.9498   LearningRate 0.0614   Epoch: 4   Global Step: 21900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:40:31,388-Speed 5580.22 samples/sec   Loss 9.0718   LearningRate 0.0614   Epoch: 4   Global Step: 21910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:40:33,218-Speed 5597.92 samples/sec   Loss 9.0503   LearningRate 0.0614   Epoch: 4   Global Step: 21920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:40:35,039-Speed 5626.53 samples/sec   Loss 9.1201   LearningRate 0.0613   Epoch: 4   Global Step: 21930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:36,863-Speed 5615.75 samples/sec   Loss 9.1180   LearningRate 0.0613   Epoch: 4   Global Step: 21940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:38,696-Speed 5589.91 samples/sec   Loss 8.9405   LearningRate 0.0613   Epoch: 4   Global Step: 21950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:40,539-Speed 5559.04 samples/sec   Loss 9.1246   LearningRate 0.0613   Epoch: 4   Global Step: 21960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:42,372-Speed 5591.28 samples/sec   Loss 8.9822   LearningRate 0.0613   Epoch: 4   Global Step: 21970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:44,216-Speed 5553.34 samples/sec   Loss 9.0481   LearningRate 0.0613   Epoch: 4   Global Step: 21980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:46,049-Speed 5589.31 samples/sec   Loss 9.1369   LearningRate 0.0612   Epoch: 4   Global Step: 21990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:40:47,905-Speed 5519.96 samples/sec   Loss 8.9961   LearningRate 0.0612   Epoch: 4   Global Step: 22000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:41:15,282-[lfw][22000]XNorm: 22.424608
Training: 2022-04-11 11:41:15,282-[lfw][22000]Accuracy-Flip: 0.99617+-0.00248
Training: 2022-04-11 11:41:15,283-[lfw][22000]Accuracy-Highest: 0.99667
Training: 2022-04-11 11:41:46,889-[cfp_fp][22000]XNorm: 19.330436
Training: 2022-04-11 11:41:46,890-[cfp_fp][22000]Accuracy-Flip: 0.94543+-0.01160
Training: 2022-04-11 11:41:46,891-[cfp_fp][22000]Accuracy-Highest: 0.95486
Training: 2022-04-11 11:42:14,177-[agedb_30][22000]XNorm: 22.090508
Training: 2022-04-11 11:42:14,177-[agedb_30][22000]Accuracy-Flip: 0.96850+-0.01029
Training: 2022-04-11 11:42:14,178-[agedb_30][22000]Accuracy-Highest: 0.97033
Training: 2022-04-11 11:42:16,034-Speed 116.19 samples/sec   Loss 9.1488   LearningRate 0.0612   Epoch: 4   Global Step: 22010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:42:17,873-Speed 5570.65 samples/sec   Loss 9.0137   LearningRate 0.0612   Epoch: 4   Global Step: 22020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:42:19,693-Speed 5629.61 samples/sec   Loss 9.0522   LearningRate 0.0612   Epoch: 4   Global Step: 22030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:21,538-Speed 5555.42 samples/sec   Loss 9.1663   LearningRate 0.0612   Epoch: 4   Global Step: 22040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:23,385-Speed 5546.18 samples/sec   Loss 9.0663   LearningRate 0.0612   Epoch: 4   Global Step: 22050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:25,235-Speed 5539.77 samples/sec   Loss 9.0132   LearningRate 0.0611   Epoch: 4   Global Step: 22060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:27,063-Speed 5603.30 samples/sec   Loss 9.0077   LearningRate 0.0611   Epoch: 4   Global Step: 22070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:28,962-Speed 5394.59 samples/sec   Loss 9.1119   LearningRate 0.0611   Epoch: 4   Global Step: 22080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:30,804-Speed 5561.79 samples/sec   Loss 9.2437   LearningRate 0.0611   Epoch: 4   Global Step: 22090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:32,658-Speed 5527.03 samples/sec   Loss 9.0369   LearningRate 0.0611   Epoch: 4   Global Step: 22100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:34,492-Speed 5585.76 samples/sec   Loss 9.0595   LearningRate 0.0611   Epoch: 4   Global Step: 22110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:36,339-Speed 5547.98 samples/sec   Loss 9.0965   LearningRate 0.0610   Epoch: 4   Global Step: 22120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:38,194-Speed 5524.56 samples/sec   Loss 9.0976   LearningRate 0.0610   Epoch: 4   Global Step: 22130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:42:40,044-Speed 5535.38 samples/sec   Loss 9.0409   LearningRate 0.0610   Epoch: 4   Global Step: 22140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:42:41,873-Speed 5602.61 samples/sec   Loss 8.8690   LearningRate 0.0610   Epoch: 4   Global Step: 22150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:42:43,713-Speed 5570.51 samples/sec   Loss 9.1478   LearningRate 0.0610   Epoch: 4   Global Step: 22160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:42:45,542-Speed 5600.46 samples/sec   Loss 8.9875   LearningRate 0.0610   Epoch: 4   Global Step: 22170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:47,362-Speed 5629.67 samples/sec   Loss 9.1308   LearningRate 0.0610   Epoch: 4   Global Step: 22180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:49,221-Speed 5510.21 samples/sec   Loss 8.9853   LearningRate 0.0609   Epoch: 4   Global Step: 22190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:51,051-Speed 5597.69 samples/sec   Loss 9.1591   LearningRate 0.0609   Epoch: 4   Global Step: 22200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:52,892-Speed 5564.33 samples/sec   Loss 9.1435   LearningRate 0.0609   Epoch: 4   Global Step: 22210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:54,741-Speed 5541.49 samples/sec   Loss 8.8779   LearningRate 0.0609   Epoch: 4   Global Step: 22220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:56,593-Speed 5531.83 samples/sec   Loss 9.2203   LearningRate 0.0609   Epoch: 4   Global Step: 22230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:42:58,415-Speed 5622.36 samples/sec   Loss 8.9296   LearningRate 0.0609   Epoch: 4   Global Step: 22240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:00,254-Speed 5571.63 samples/sec   Loss 8.9042   LearningRate 0.0608   Epoch: 4   Global Step: 22250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:02,124-Speed 5481.15 samples/sec   Loss 8.8208   LearningRate 0.0608   Epoch: 4   Global Step: 22260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:03,991-Speed 5485.50 samples/sec   Loss 9.0956   LearningRate 0.0608   Epoch: 4   Global Step: 22270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:43:05,832-Speed 5564.89 samples/sec   Loss 8.9704   LearningRate 0.0608   Epoch: 4   Global Step: 22280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:43:07,672-Speed 5567.90 samples/sec   Loss 9.0060   LearningRate 0.0608   Epoch: 4   Global Step: 22290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:43:09,510-Speed 5574.13 samples/sec   Loss 9.1643   LearningRate 0.0608   Epoch: 4   Global Step: 22300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:43:11,349-Speed 5571.35 samples/sec   Loss 8.9860   LearningRate 0.0608   Epoch: 4   Global Step: 22310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:43:13,187-Speed 5575.02 samples/sec   Loss 9.0114   LearningRate 0.0607   Epoch: 4   Global Step: 22320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:43:15,014-Speed 5605.78 samples/sec   Loss 9.0746   LearningRate 0.0607   Epoch: 4   Global Step: 22330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:16,839-Speed 5614.35 samples/sec   Loss 8.9978   LearningRate 0.0607   Epoch: 4   Global Step: 22340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:18,667-Speed 5608.02 samples/sec   Loss 9.0279   LearningRate 0.0607   Epoch: 4   Global Step: 22350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:20,513-Speed 5547.12 samples/sec   Loss 9.0785   LearningRate 0.0607   Epoch: 4   Global Step: 22360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:22,355-Speed 5562.51 samples/sec   Loss 9.0155   LearningRate 0.0607   Epoch: 4   Global Step: 22370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:24,187-Speed 5592.34 samples/sec   Loss 8.9030   LearningRate 0.0606   Epoch: 4   Global Step: 22380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:26,009-Speed 5622.23 samples/sec   Loss 9.1562   LearningRate 0.0606   Epoch: 4   Global Step: 22390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:27,856-Speed 5543.29 samples/sec   Loss 8.9540   LearningRate 0.0606   Epoch: 4   Global Step: 22400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:29,706-Speed 5537.03 samples/sec   Loss 9.0604   LearningRate 0.0606   Epoch: 4   Global Step: 22410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:31,535-Speed 5604.38 samples/sec   Loss 8.9574   LearningRate 0.0606   Epoch: 4   Global Step: 22420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:33,357-Speed 5621.90 samples/sec   Loss 8.9831   LearningRate 0.0606   Epoch: 4   Global Step: 22430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:43:35,172-Speed 5643.34 samples/sec   Loss 9.1597   LearningRate 0.0606   Epoch: 4   Global Step: 22440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:37,015-Speed 5559.70 samples/sec   Loss 8.8150   LearningRate 0.0605   Epoch: 4   Global Step: 22450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:38,855-Speed 5569.03 samples/sec   Loss 9.1738   LearningRate 0.0605   Epoch: 4   Global Step: 22460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:40,709-Speed 5524.75 samples/sec   Loss 9.1477   LearningRate 0.0605   Epoch: 4   Global Step: 22470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:42,537-Speed 5609.29 samples/sec   Loss 9.2229   LearningRate 0.0605   Epoch: 4   Global Step: 22480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:44,388-Speed 5533.75 samples/sec   Loss 9.1641   LearningRate 0.0605   Epoch: 4   Global Step: 22490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:46,252-Speed 5531.26 samples/sec   Loss 9.0189   LearningRate 0.0605   Epoch: 4   Global Step: 22500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:48,076-Speed 5617.99 samples/sec   Loss 8.9965   LearningRate 0.0604   Epoch: 4   Global Step: 22510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:49,938-Speed 5503.34 samples/sec   Loss 9.0372   LearningRate 0.0604   Epoch: 4   Global Step: 22520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:52,857-Speed 3508.84 samples/sec   Loss 9.0345   LearningRate 0.0604   Epoch: 4   Global Step: 22530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:43:54,692-Speed 5582.05 samples/sec   Loss 8.9961   LearningRate 0.0604   Epoch: 4   Global Step: 22540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:43:56,523-Speed 5597.50 samples/sec   Loss 8.9718   LearningRate 0.0604   Epoch: 4   Global Step: 22550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:43:58,345-Speed 5620.61 samples/sec   Loss 8.9974   LearningRate 0.0604   Epoch: 4   Global Step: 22560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:00,177-Speed 5591.83 samples/sec   Loss 8.9779   LearningRate 0.0604   Epoch: 4   Global Step: 22570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:02,008-Speed 5596.85 samples/sec   Loss 9.0048   LearningRate 0.0603   Epoch: 4   Global Step: 22580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:03,841-Speed 5588.34 samples/sec   Loss 8.9004   LearningRate 0.0603   Epoch: 4   Global Step: 22590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:05,682-Speed 5562.91 samples/sec   Loss 9.0147   LearningRate 0.0603   Epoch: 4   Global Step: 22600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:07,520-Speed 5574.83 samples/sec   Loss 9.0589   LearningRate 0.0603   Epoch: 4   Global Step: 22610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:09,350-Speed 5598.57 samples/sec   Loss 8.8839   LearningRate 0.0603   Epoch: 4   Global Step: 22620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:11,182-Speed 5592.24 samples/sec   Loss 9.1251   LearningRate 0.0603   Epoch: 4   Global Step: 22630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:13,074-Speed 5414.59 samples/sec   Loss 9.0501   LearningRate 0.0602   Epoch: 4   Global Step: 22640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:14,914-Speed 5567.94 samples/sec   Loss 9.0388   LearningRate 0.0602   Epoch: 4   Global Step: 22650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:16,775-Speed 5509.84 samples/sec   Loss 9.0445   LearningRate 0.0602   Epoch: 4   Global Step: 22660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:18,601-Speed 5610.30 samples/sec   Loss 9.1455   LearningRate 0.0602   Epoch: 4   Global Step: 22670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:20,460-Speed 5510.47 samples/sec   Loss 9.0248   LearningRate 0.0602   Epoch: 4   Global Step: 22680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:22,302-Speed 5563.14 samples/sec   Loss 8.9169   LearningRate 0.0602   Epoch: 4   Global Step: 22690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:24,126-Speed 5615.96 samples/sec   Loss 8.9059   LearningRate 0.0602   Epoch: 4   Global Step: 22700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:25,966-Speed 5568.93 samples/sec   Loss 9.0197   LearningRate 0.0601   Epoch: 4   Global Step: 22710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:27,794-Speed 5603.80 samples/sec   Loss 9.0123   LearningRate 0.0601   Epoch: 4   Global Step: 22720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:29,633-Speed 5572.26 samples/sec   Loss 9.0942   LearningRate 0.0601   Epoch: 4   Global Step: 22730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:31,473-Speed 5566.77 samples/sec   Loss 9.0907   LearningRate 0.0601   Epoch: 4   Global Step: 22740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:33,308-Speed 5582.91 samples/sec   Loss 8.9583   LearningRate 0.0601   Epoch: 4   Global Step: 22750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:35,160-Speed 5532.15 samples/sec   Loss 8.9972   LearningRate 0.0601   Epoch: 4   Global Step: 22760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:37,012-Speed 5531.42 samples/sec   Loss 8.9135   LearningRate 0.0600   Epoch: 4   Global Step: 22770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:38,841-Speed 5601.83 samples/sec   Loss 9.0594   LearningRate 0.0600   Epoch: 4   Global Step: 22780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:40,680-Speed 5570.90 samples/sec   Loss 9.0284   LearningRate 0.0600   Epoch: 4   Global Step: 22790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:42,535-Speed 5521.73 samples/sec   Loss 9.0077   LearningRate 0.0600   Epoch: 4   Global Step: 22800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:44,370-Speed 5583.77 samples/sec   Loss 8.9549   LearningRate 0.0600   Epoch: 4   Global Step: 22810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:46,204-Speed 5586.38 samples/sec   Loss 8.9230   LearningRate 0.0600   Epoch: 4   Global Step: 22820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:48,096-Speed 5416.16 samples/sec   Loss 8.9673   LearningRate 0.0600   Epoch: 4   Global Step: 22830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:49,960-Speed 5495.07 samples/sec   Loss 8.9314   LearningRate 0.0599   Epoch: 4   Global Step: 22840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:51,797-Speed 5577.06 samples/sec   Loss 8.9357   LearningRate 0.0599   Epoch: 4   Global Step: 22850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:53,702-Speed 5379.33 samples/sec   Loss 8.9116   LearningRate 0.0599   Epoch: 4   Global Step: 22860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:44:55,532-Speed 5598.87 samples/sec   Loss 9.0411   LearningRate 0.0599   Epoch: 4   Global Step: 22870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:57,380-Speed 5542.46 samples/sec   Loss 9.1573   LearningRate 0.0599   Epoch: 4   Global Step: 22880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:44:59,217-Speed 5579.00 samples/sec   Loss 8.9021   LearningRate 0.0599   Epoch: 4   Global Step: 22890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:01,078-Speed 5503.13 samples/sec   Loss 9.0126   LearningRate 0.0598   Epoch: 4   Global Step: 22900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:02,943-Speed 5493.00 samples/sec   Loss 8.8894   LearningRate 0.0598   Epoch: 4   Global Step: 22910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:04,797-Speed 5528.18 samples/sec   Loss 8.9648   LearningRate 0.0598   Epoch: 4   Global Step: 22920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:06,668-Speed 5475.42 samples/sec   Loss 8.9708   LearningRate 0.0598   Epoch: 4   Global Step: 22930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:08,504-Speed 5581.72 samples/sec   Loss 8.9556   LearningRate 0.0598   Epoch: 4   Global Step: 22940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:10,355-Speed 5534.30 samples/sec   Loss 9.0424   LearningRate 0.0598   Epoch: 4   Global Step: 22950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:12,218-Speed 5497.63 samples/sec   Loss 9.0785   LearningRate 0.0598   Epoch: 4   Global Step: 22960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:14,048-Speed 5599.38 samples/sec   Loss 8.9464   LearningRate 0.0597   Epoch: 4   Global Step: 22970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:15,922-Speed 5466.54 samples/sec   Loss 8.9517   LearningRate 0.0597   Epoch: 4   Global Step: 22980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:17,791-Speed 5481.84 samples/sec   Loss 8.8234   LearningRate 0.0597   Epoch: 4   Global Step: 22990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:19,636-Speed 5552.46 samples/sec   Loss 8.9967   LearningRate 0.0597   Epoch: 4   Global Step: 23000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:21,484-Speed 5545.43 samples/sec   Loss 8.9078   LearningRate 0.0597   Epoch: 4   Global Step: 23010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:23,377-Speed 5411.85 samples/sec   Loss 8.9144   LearningRate 0.0597   Epoch: 4   Global Step: 23020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:25,248-Speed 5476.97 samples/sec   Loss 8.8859   LearningRate 0.0597   Epoch: 4   Global Step: 23030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:27,103-Speed 5522.43 samples/sec   Loss 8.9320   LearningRate 0.0596   Epoch: 4   Global Step: 23040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:28,945-Speed 5559.81 samples/sec   Loss 9.0526   LearningRate 0.0596   Epoch: 4   Global Step: 23050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:30,795-Speed 5537.38 samples/sec   Loss 8.9455   LearningRate 0.0596   Epoch: 4   Global Step: 23060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:32,634-Speed 5571.93 samples/sec   Loss 8.8616   LearningRate 0.0596   Epoch: 4   Global Step: 23070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:34,477-Speed 5560.31 samples/sec   Loss 8.9456   LearningRate 0.0596   Epoch: 4   Global Step: 23080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:36,388-Speed 5359.96 samples/sec   Loss 8.8161   LearningRate 0.0596   Epoch: 4   Global Step: 23090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:38,277-Speed 5423.19 samples/sec   Loss 8.9216   LearningRate 0.0595   Epoch: 4   Global Step: 23100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:40,151-Speed 5466.79 samples/sec   Loss 9.0122   LearningRate 0.0595   Epoch: 4   Global Step: 23110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:41,997-Speed 5550.30 samples/sec   Loss 8.8167   LearningRate 0.0595   Epoch: 4   Global Step: 23120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:43,855-Speed 5515.56 samples/sec   Loss 8.9899   LearningRate 0.0595   Epoch: 4   Global Step: 23130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:45,702-Speed 5545.13 samples/sec   Loss 8.9160   LearningRate 0.0595   Epoch: 4   Global Step: 23140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:47,574-Speed 5474.77 samples/sec   Loss 8.7978   LearningRate 0.0595   Epoch: 4   Global Step: 23150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:49,461-Speed 5430.13 samples/sec   Loss 8.9641   LearningRate 0.0595   Epoch: 4   Global Step: 23160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:51,333-Speed 5472.70 samples/sec   Loss 9.0036   LearningRate 0.0594   Epoch: 4   Global Step: 23170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:45:53,186-Speed 5526.75 samples/sec   Loss 8.9836   LearningRate 0.0594   Epoch: 4   Global Step: 23180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:55,034-Speed 5547.17 samples/sec   Loss 8.8652   LearningRate 0.0594   Epoch: 4   Global Step: 23190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:56,891-Speed 5516.17 samples/sec   Loss 9.0068   LearningRate 0.0594   Epoch: 4   Global Step: 23200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:45:58,744-Speed 5527.06 samples/sec   Loss 8.9560   LearningRate 0.0594   Epoch: 4   Global Step: 23210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:00,616-Speed 5475.09 samples/sec   Loss 8.8148   LearningRate 0.0594   Epoch: 4   Global Step: 23220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:02,473-Speed 5516.95 samples/sec   Loss 8.8197   LearningRate 0.0593   Epoch: 4   Global Step: 23230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:04,317-Speed 5555.31 samples/sec   Loss 8.8265   LearningRate 0.0593   Epoch: 4   Global Step: 23240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:06,153-Speed 5578.39 samples/sec   Loss 8.7919   LearningRate 0.0593   Epoch: 4   Global Step: 23250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:07,992-Speed 5573.11 samples/sec   Loss 8.7991   LearningRate 0.0593   Epoch: 4   Global Step: 23260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:09,847-Speed 5523.43 samples/sec   Loss 9.0195   LearningRate 0.0593   Epoch: 4   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:11,673-Speed 5608.20 samples/sec   Loss 8.7596   LearningRate 0.0593   Epoch: 4   Global Step: 23280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:13,564-Speed 5419.82 samples/sec   Loss 8.9190   LearningRate 0.0593   Epoch: 4   Global Step: 23290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:15,439-Speed 5464.24 samples/sec   Loss 8.9619   LearningRate 0.0592   Epoch: 4   Global Step: 23300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:17,274-Speed 5582.82 samples/sec   Loss 8.9572   LearningRate 0.0592   Epoch: 4   Global Step: 23310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:19,106-Speed 5593.98 samples/sec   Loss 8.9832   LearningRate 0.0592   Epoch: 4   Global Step: 23320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:20,942-Speed 5577.37 samples/sec   Loss 8.9018   LearningRate 0.0592   Epoch: 4   Global Step: 23330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:22,796-Speed 5525.36 samples/sec   Loss 8.9687   LearningRate 0.0592   Epoch: 4   Global Step: 23340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:24,630-Speed 5586.26 samples/sec   Loss 8.9295   LearningRate 0.0592   Epoch: 4   Global Step: 23350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:26,498-Speed 5484.66 samples/sec   Loss 9.0251   LearningRate 0.0591   Epoch: 4   Global Step: 23360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:28,348-Speed 5539.29 samples/sec   Loss 9.0687   LearningRate 0.0591   Epoch: 4   Global Step: 23370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:30,200-Speed 5531.06 samples/sec   Loss 8.8480   LearningRate 0.0591   Epoch: 4   Global Step: 23380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:32,036-Speed 5581.46 samples/sec   Loss 8.9868   LearningRate 0.0591   Epoch: 4   Global Step: 23390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:33,910-Speed 5466.81 samples/sec   Loss 8.8625   LearningRate 0.0591   Epoch: 4   Global Step: 23400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:35,760-Speed 5537.41 samples/sec   Loss 8.8740   LearningRate 0.0591   Epoch: 4   Global Step: 23410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:37,607-Speed 5548.75 samples/sec   Loss 9.0180   LearningRate 0.0591   Epoch: 4   Global Step: 23420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:39,444-Speed 5578.14 samples/sec   Loss 8.7877   LearningRate 0.0590   Epoch: 4   Global Step: 23430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:46:41,271-Speed 5606.35 samples/sec   Loss 8.9422   LearningRate 0.0590   Epoch: 4   Global Step: 23440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:43,102-Speed 5593.63 samples/sec   Loss 8.8941   LearningRate 0.0590   Epoch: 4   Global Step: 23450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:44,943-Speed 5563.63 samples/sec   Loss 8.9328   LearningRate 0.0590   Epoch: 4   Global Step: 23460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:46,822-Speed 5455.30 samples/sec   Loss 9.0511   LearningRate 0.0590   Epoch: 4   Global Step: 23470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:48,657-Speed 5581.52 samples/sec   Loss 8.8542   LearningRate 0.0590   Epoch: 4   Global Step: 23480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:50,505-Speed 5543.66 samples/sec   Loss 9.0490   LearningRate 0.0590   Epoch: 4   Global Step: 23490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:52,356-Speed 5534.78 samples/sec   Loss 8.8497   LearningRate 0.0589   Epoch: 4   Global Step: 23500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:54,228-Speed 5476.83 samples/sec   Loss 8.8596   LearningRate 0.0589   Epoch: 4   Global Step: 23510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:56,058-Speed 5598.73 samples/sec   Loss 8.8359   LearningRate 0.0589   Epoch: 4   Global Step: 23520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:57,915-Speed 5514.99 samples/sec   Loss 8.9291   LearningRate 0.0589   Epoch: 4   Global Step: 23530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:46:59,760-Speed 5553.68 samples/sec   Loss 9.0132   LearningRate 0.0589   Epoch: 4   Global Step: 23540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:01,638-Speed 5454.68 samples/sec   Loss 9.0085   LearningRate 0.0589   Epoch: 4   Global Step: 23550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:03,483-Speed 5552.77 samples/sec   Loss 8.9291   LearningRate 0.0588   Epoch: 4   Global Step: 23560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:05,342-Speed 5511.49 samples/sec   Loss 8.6456   LearningRate 0.0588   Epoch: 4   Global Step: 23570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:07,172-Speed 5598.23 samples/sec   Loss 8.8007   LearningRate 0.0588   Epoch: 4   Global Step: 23580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:09,040-Speed 5486.42 samples/sec   Loss 8.8486   LearningRate 0.0588   Epoch: 4   Global Step: 23590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:47:10,885-Speed 5553.75 samples/sec   Loss 8.8026   LearningRate 0.0588   Epoch: 4   Global Step: 23600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:47:12,732-Speed 5546.39 samples/sec   Loss 9.0247   LearningRate 0.0588   Epoch: 4   Global Step: 23610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:47:14,575-Speed 5557.14 samples/sec   Loss 8.9574   LearningRate 0.0588   Epoch: 4   Global Step: 23620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:47:16,423-Speed 5545.39 samples/sec   Loss 9.0239   LearningRate 0.0587   Epoch: 4   Global Step: 23630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:47:18,291-Speed 5486.52 samples/sec   Loss 8.8679   LearningRate 0.0587   Epoch: 4   Global Step: 23640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:47:20,148-Speed 5516.77 samples/sec   Loss 8.7254   LearningRate 0.0587   Epoch: 4   Global Step: 23650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:47:21,991-Speed 5558.73 samples/sec   Loss 8.8189   LearningRate 0.0587   Epoch: 4   Global Step: 23660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:47:23,872-Speed 5447.41 samples/sec   Loss 8.7376   LearningRate 0.0587   Epoch: 4   Global Step: 23670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:47:25,753-Speed 5447.89 samples/sec   Loss 8.8045   LearningRate 0.0587   Epoch: 4   Global Step: 23680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:47:27,609-Speed 5519.54 samples/sec   Loss 8.9857   LearningRate 0.0586   Epoch: 4   Global Step: 23690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:29,474-Speed 5491.70 samples/sec   Loss 8.9177   LearningRate 0.0586   Epoch: 4   Global Step: 23700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:31,315-Speed 5565.03 samples/sec   Loss 8.9886   LearningRate 0.0586   Epoch: 4   Global Step: 23710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:33,180-Speed 5493.09 samples/sec   Loss 8.7573   LearningRate 0.0586   Epoch: 4   Global Step: 23720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:35,082-Speed 5388.42 samples/sec   Loss 8.9703   LearningRate 0.0586   Epoch: 4   Global Step: 23730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:36,959-Speed 5456.79 samples/sec   Loss 8.7581   LearningRate 0.0586   Epoch: 4   Global Step: 23740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:38,824-Speed 5494.87 samples/sec   Loss 8.7347   LearningRate 0.0586   Epoch: 4   Global Step: 23750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:40,705-Speed 5443.73 samples/sec   Loss 8.7263   LearningRate 0.0585   Epoch: 4   Global Step: 23760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:42,632-Speed 5317.47 samples/sec   Loss 8.9052   LearningRate 0.0585   Epoch: 4   Global Step: 23770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:44,500-Speed 5485.55 samples/sec   Loss 8.8379   LearningRate 0.0585   Epoch: 4   Global Step: 23780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:47:46,352-Speed 5532.55 samples/sec   Loss 8.7652   LearningRate 0.0585   Epoch: 4   Global Step: 23790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:48,189-Speed 5578.26 samples/sec   Loss 8.5751   LearningRate 0.0585   Epoch: 4   Global Step: 23800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:50,032-Speed 5556.55 samples/sec   Loss 8.8739   LearningRate 0.0585   Epoch: 4   Global Step: 23810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:51,900-Speed 5484.55 samples/sec   Loss 8.8944   LearningRate 0.0585   Epoch: 4   Global Step: 23820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:53,745-Speed 5552.93 samples/sec   Loss 9.0171   LearningRate 0.0584   Epoch: 4   Global Step: 23830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:55,590-Speed 5553.08 samples/sec   Loss 8.9388   LearningRate 0.0584   Epoch: 4   Global Step: 23840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:57,425-Speed 5581.29 samples/sec   Loss 8.8041   LearningRate 0.0584   Epoch: 4   Global Step: 23850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:47:59,293-Speed 5486.68 samples/sec   Loss 9.0566   LearningRate 0.0584   Epoch: 4   Global Step: 23860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:48:01,157-Speed 5495.86 samples/sec   Loss 8.7307   LearningRate 0.0584   Epoch: 4   Global Step: 23870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:48:03,012-Speed 5522.34 samples/sec   Loss 8.7890   LearningRate 0.0584   Epoch: 4   Global Step: 23880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:48:04,837-Speed 5612.25 samples/sec   Loss 8.9201   LearningRate 0.0583   Epoch: 4   Global Step: 23890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:48:06,682-Speed 5555.73 samples/sec   Loss 8.8615   LearningRate 0.0583   Epoch: 4   Global Step: 23900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:48:08,509-Speed 5606.25 samples/sec   Loss 8.8849   LearningRate 0.0583   Epoch: 4   Global Step: 23910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:48:10,335-Speed 5611.66 samples/sec   Loss 8.8813   LearningRate 0.0583   Epoch: 4   Global Step: 23920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:48:12,201-Speed 5490.44 samples/sec   Loss 8.9424   LearningRate 0.0583   Epoch: 4   Global Step: 23930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:48:14,031-Speed 5595.85 samples/sec   Loss 8.8207   LearningRate 0.0583   Epoch: 4   Global Step: 23940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:48:15,882-Speed 5535.75 samples/sec   Loss 8.7079   LearningRate 0.0583   Epoch: 4   Global Step: 23950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:48:17,717-Speed 5584.29 samples/sec   Loss 8.8734   LearningRate 0.0582   Epoch: 4   Global Step: 23960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:48:19,579-Speed 5502.08 samples/sec   Loss 8.9125   LearningRate 0.0582   Epoch: 4   Global Step: 23970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:48:21,429-Speed 5537.63 samples/sec   Loss 8.8114   LearningRate 0.0582   Epoch: 4   Global Step: 23980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:48:23,304-Speed 5464.18 samples/sec   Loss 8.8395   LearningRate 0.0582   Epoch: 4   Global Step: 23990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:48:25,153-Speed 5541.46 samples/sec   Loss 8.7440   LearningRate 0.0582   Epoch: 4   Global Step: 24000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:48:52,091-[lfw][24000]XNorm: 22.663238
Training: 2022-04-11 11:48:52,091-[lfw][24000]Accuracy-Flip: 0.99633+-0.00267
Training: 2022-04-11 11:48:52,092-[lfw][24000]Accuracy-Highest: 0.99667
Training: 2022-04-11 11:49:23,434-[cfp_fp][24000]XNorm: 19.661125
Training: 2022-04-11 11:49:23,434-[cfp_fp][24000]Accuracy-Flip: 0.94657+-0.01216
Training: 2022-04-11 11:49:23,435-[cfp_fp][24000]Accuracy-Highest: 0.95486
Training: 2022-04-11 11:49:50,424-[agedb_30][24000]XNorm: 22.181026
Training: 2022-04-11 11:49:50,425-[agedb_30][24000]Accuracy-Flip: 0.97083+-0.00696
Training: 2022-04-11 11:49:50,426-[agedb_30][24000]Accuracy-Highest: 0.97083
Training: 2022-04-11 11:49:52,298-Speed 117.51 samples/sec   Loss 8.8683   LearningRate 0.0582   Epoch: 4   Global Step: 24010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:49:54,146-Speed 5541.19 samples/sec   Loss 8.8980   LearningRate 0.0581   Epoch: 4   Global Step: 24020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:49:55,980-Speed 5587.53 samples/sec   Loss 8.7410   LearningRate 0.0581   Epoch: 4   Global Step: 24030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:49:57,823-Speed 5558.76 samples/sec   Loss 8.7524   LearningRate 0.0581   Epoch: 4   Global Step: 24040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:49:59,690-Speed 5488.97 samples/sec   Loss 8.9425   LearningRate 0.0581   Epoch: 4   Global Step: 24050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:01,532-Speed 5561.16 samples/sec   Loss 8.6334   LearningRate 0.0581   Epoch: 4   Global Step: 24060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:03,366-Speed 5585.09 samples/sec   Loss 8.8283   LearningRate 0.0581   Epoch: 4   Global Step: 24070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:05,267-Speed 5391.02 samples/sec   Loss 8.9035   LearningRate 0.0581   Epoch: 4   Global Step: 24080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:07,097-Speed 5597.47 samples/sec   Loss 8.7001   LearningRate 0.0580   Epoch: 4   Global Step: 24090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:50:08,944-Speed 5548.20 samples/sec   Loss 8.7939   LearningRate 0.0580   Epoch: 4   Global Step: 24100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:50:10,793-Speed 5539.77 samples/sec   Loss 8.8095   LearningRate 0.0580   Epoch: 4   Global Step: 24110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:50:12,648-Speed 5524.37 samples/sec   Loss 8.9128   LearningRate 0.0580   Epoch: 4   Global Step: 24120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:50:14,513-Speed 5491.40 samples/sec   Loss 8.7011   LearningRate 0.0580   Epoch: 4   Global Step: 24130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:50:16,394-Speed 5448.64 samples/sec   Loss 8.6672   LearningRate 0.0580   Epoch: 4   Global Step: 24140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:50:18,234-Speed 5566.15 samples/sec   Loss 8.7838   LearningRate 0.0580   Epoch: 4   Global Step: 24150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:50:20,087-Speed 5529.55 samples/sec   Loss 8.8448   LearningRate 0.0579   Epoch: 4   Global Step: 24160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:50:21,956-Speed 5481.53 samples/sec   Loss 8.9513   LearningRate 0.0579   Epoch: 4   Global Step: 24170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:50:23,790-Speed 5586.60 samples/sec   Loss 8.7050   LearningRate 0.0579   Epoch: 4   Global Step: 24180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:50:25,631-Speed 5564.00 samples/sec   Loss 8.7694   LearningRate 0.0579   Epoch: 4   Global Step: 24190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:27,478-Speed 5547.40 samples/sec   Loss 8.6519   LearningRate 0.0579   Epoch: 4   Global Step: 24200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:29,331-Speed 5530.08 samples/sec   Loss 8.9031   LearningRate 0.0579   Epoch: 4   Global Step: 24210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:31,179-Speed 5543.75 samples/sec   Loss 9.0054   LearningRate 0.0578   Epoch: 4   Global Step: 24220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:33,021-Speed 5565.45 samples/sec   Loss 8.7753   LearningRate 0.0578   Epoch: 4   Global Step: 24230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:34,893-Speed 5471.53 samples/sec   Loss 8.7837   LearningRate 0.0578   Epoch: 4   Global Step: 24240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:36,745-Speed 5530.53 samples/sec   Loss 8.9111   LearningRate 0.0578   Epoch: 4   Global Step: 24250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:38,580-Speed 5581.34 samples/sec   Loss 8.8629   LearningRate 0.0578   Epoch: 4   Global Step: 24260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:40,417-Speed 5577.76 samples/sec   Loss 8.7569   LearningRate 0.0578   Epoch: 4   Global Step: 24270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:42,258-Speed 5565.61 samples/sec   Loss 8.7056   LearningRate 0.0578   Epoch: 4   Global Step: 24280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:44,147-Speed 5421.91 samples/sec   Loss 8.8646   LearningRate 0.0577   Epoch: 4   Global Step: 24290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:50:46,003-Speed 5521.96 samples/sec   Loss 8.8741   LearningRate 0.0577   Epoch: 4   Global Step: 24300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:50:47,856-Speed 5527.39 samples/sec   Loss 8.9313   LearningRate 0.0577   Epoch: 4   Global Step: 24310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:50:49,700-Speed 5554.68 samples/sec   Loss 8.6681   LearningRate 0.0577   Epoch: 4   Global Step: 24320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:51,536-Speed 5579.14 samples/sec   Loss 9.0343   LearningRate 0.0577   Epoch: 4   Global Step: 24330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:53,409-Speed 5470.67 samples/sec   Loss 8.8306   LearningRate 0.0577   Epoch: 4   Global Step: 24340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:55,245-Speed 5578.58 samples/sec   Loss 8.8342   LearningRate 0.0577   Epoch: 4   Global Step: 24350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:57,126-Speed 5447.93 samples/sec   Loss 8.9402   LearningRate 0.0576   Epoch: 4   Global Step: 24360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:50:58,971-Speed 5553.82 samples/sec   Loss 8.7521   LearningRate 0.0576   Epoch: 4   Global Step: 24370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:00,816-Speed 5551.28 samples/sec   Loss 8.9036   LearningRate 0.0576   Epoch: 4   Global Step: 24380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:02,660-Speed 5557.93 samples/sec   Loss 8.8394   LearningRate 0.0576   Epoch: 4   Global Step: 24390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:04,515-Speed 5522.09 samples/sec   Loss 8.7830   LearningRate 0.0576   Epoch: 4   Global Step: 24400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:06,370-Speed 5524.88 samples/sec   Loss 8.8758   LearningRate 0.0576   Epoch: 4   Global Step: 24410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:08,210-Speed 5565.48 samples/sec   Loss 8.7076   LearningRate 0.0575   Epoch: 4   Global Step: 24420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:10,062-Speed 5532.83 samples/sec   Loss 8.7657   LearningRate 0.0575   Epoch: 4   Global Step: 24430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:11,938-Speed 5460.05 samples/sec   Loss 9.0067   LearningRate 0.0575   Epoch: 4   Global Step: 24440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:13,792-Speed 5528.15 samples/sec   Loss 8.8809   LearningRate 0.0575   Epoch: 4   Global Step: 24450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:15,649-Speed 5514.94 samples/sec   Loss 8.6940   LearningRate 0.0575   Epoch: 4   Global Step: 24460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:17,521-Speed 5476.29 samples/sec   Loss 8.9246   LearningRate 0.0575   Epoch: 4   Global Step: 24470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:19,374-Speed 5528.81 samples/sec   Loss 8.9154   LearningRate 0.0575   Epoch: 4   Global Step: 24480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:21,241-Speed 5487.17 samples/sec   Loss 8.6428   LearningRate 0.0574   Epoch: 4   Global Step: 24490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:23,088-Speed 5546.83 samples/sec   Loss 8.7853   LearningRate 0.0574   Epoch: 4   Global Step: 24500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:24,970-Speed 5443.48 samples/sec   Loss 8.7682   LearningRate 0.0574   Epoch: 4   Global Step: 24510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:26,854-Speed 5437.85 samples/sec   Loss 8.6647   LearningRate 0.0574   Epoch: 4   Global Step: 24520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:28,753-Speed 5393.17 samples/sec   Loss 8.7683   LearningRate 0.0574   Epoch: 4   Global Step: 24530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:30,594-Speed 5566.10 samples/sec   Loss 8.7643   LearningRate 0.0574   Epoch: 4   Global Step: 24540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:32,482-Speed 5425.77 samples/sec   Loss 8.8163   LearningRate 0.0574   Epoch: 4   Global Step: 24550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:34,345-Speed 5500.31 samples/sec   Loss 8.6139   LearningRate 0.0573   Epoch: 4   Global Step: 24560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:36,187-Speed 5561.96 samples/sec   Loss 8.7482   LearningRate 0.0573   Epoch: 4   Global Step: 24570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:38,025-Speed 5574.80 samples/sec   Loss 8.7912   LearningRate 0.0573   Epoch: 4   Global Step: 24580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:39,879-Speed 5525.10 samples/sec   Loss 8.9074   LearningRate 0.0573   Epoch: 4   Global Step: 24590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:41,738-Speed 5509.87 samples/sec   Loss 8.6607   LearningRate 0.0573   Epoch: 4   Global Step: 24600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:43,627-Speed 5424.37 samples/sec   Loss 8.7991   LearningRate 0.0573   Epoch: 4   Global Step: 24610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:45,498-Speed 5476.17 samples/sec   Loss 8.6615   LearningRate 0.0572   Epoch: 4   Global Step: 24620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:47,358-Speed 5508.32 samples/sec   Loss 8.7725   LearningRate 0.0572   Epoch: 4   Global Step: 24630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:49,247-Speed 5422.74 samples/sec   Loss 8.9604   LearningRate 0.0572   Epoch: 4   Global Step: 24640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:51,115-Speed 5484.20 samples/sec   Loss 8.7349   LearningRate 0.0572   Epoch: 4   Global Step: 24650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:52,958-Speed 5559.00 samples/sec   Loss 8.9019   LearningRate 0.0572   Epoch: 4   Global Step: 24660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:51:54,784-Speed 5610.32 samples/sec   Loss 8.7768   LearningRate 0.0572   Epoch: 4   Global Step: 24670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:56,632-Speed 5541.67 samples/sec   Loss 8.7278   LearningRate 0.0572   Epoch: 4   Global Step: 24680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:51:58,470-Speed 5575.91 samples/sec   Loss 8.9318   LearningRate 0.0571   Epoch: 4   Global Step: 24690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:00,319-Speed 5539.05 samples/sec   Loss 8.7123   LearningRate 0.0571   Epoch: 4   Global Step: 24700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:02,175-Speed 5518.76 samples/sec   Loss 8.7453   LearningRate 0.0571   Epoch: 4   Global Step: 24710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:04,037-Speed 5503.22 samples/sec   Loss 8.5444   LearningRate 0.0571   Epoch: 4   Global Step: 24720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:05,878-Speed 5564.27 samples/sec   Loss 8.8387   LearningRate 0.0571   Epoch: 4   Global Step: 24730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:07,717-Speed 5570.56 samples/sec   Loss 8.7729   LearningRate 0.0571   Epoch: 4   Global Step: 24740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:09,585-Speed 5483.59 samples/sec   Loss 8.7107   LearningRate 0.0571   Epoch: 4   Global Step: 24750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:11,433-Speed 5542.45 samples/sec   Loss 8.7806   LearningRate 0.0570   Epoch: 4   Global Step: 24760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:13,270-Speed 5576.97 samples/sec   Loss 8.7295   LearningRate 0.0570   Epoch: 4   Global Step: 24770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:15,121-Speed 5532.42 samples/sec   Loss 8.8027   LearningRate 0.0570   Epoch: 4   Global Step: 24780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:16,976-Speed 5526.00 samples/sec   Loss 8.8311   LearningRate 0.0570   Epoch: 4   Global Step: 24790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:18,818-Speed 5561.03 samples/sec   Loss 8.7296   LearningRate 0.0570   Epoch: 4   Global Step: 24800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:20,659-Speed 5563.97 samples/sec   Loss 8.6709   LearningRate 0.0570   Epoch: 4   Global Step: 24810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:22,490-Speed 5593.49 samples/sec   Loss 8.7069   LearningRate 0.0569   Epoch: 4   Global Step: 24820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:24,338-Speed 5542.44 samples/sec   Loss 8.7834   LearningRate 0.0569   Epoch: 4   Global Step: 24830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:26,175-Speed 5578.84 samples/sec   Loss 8.7961   LearningRate 0.0569   Epoch: 4   Global Step: 24840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:28,017-Speed 5560.49 samples/sec   Loss 8.8554   LearningRate 0.0569   Epoch: 4   Global Step: 24850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:29,852-Speed 5582.40 samples/sec   Loss 8.7281   LearningRate 0.0569   Epoch: 4   Global Step: 24860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:31,704-Speed 5532.18 samples/sec   Loss 8.8269   LearningRate 0.0569   Epoch: 4   Global Step: 24870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:33,606-Speed 5385.88 samples/sec   Loss 8.7200   LearningRate 0.0569   Epoch: 4   Global Step: 24880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:35,467-Speed 5506.37 samples/sec   Loss 8.8437   LearningRate 0.0568   Epoch: 4   Global Step: 24890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:37,351-Speed 5436.14 samples/sec   Loss 8.8281   LearningRate 0.0568   Epoch: 4   Global Step: 24900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:39,203-Speed 5536.80 samples/sec   Loss 8.7143   LearningRate 0.0568   Epoch: 4   Global Step: 24910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:41,094-Speed 5416.21 samples/sec   Loss 8.6780   LearningRate 0.0568   Epoch: 4   Global Step: 24920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:42,944-Speed 5539.21 samples/sec   Loss 8.7438   LearningRate 0.0568   Epoch: 4   Global Step: 24930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:44,795-Speed 5533.92 samples/sec   Loss 8.6065   LearningRate 0.0568   Epoch: 4   Global Step: 24940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:46,664-Speed 5483.20 samples/sec   Loss 8.5492   LearningRate 0.0568   Epoch: 4   Global Step: 24950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:48,501-Speed 5575.70 samples/sec   Loss 8.8259   LearningRate 0.0567   Epoch: 4   Global Step: 24960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:50,337-Speed 5579.75 samples/sec   Loss 8.5969   LearningRate 0.0567   Epoch: 4   Global Step: 24970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:52:52,180-Speed 5560.89 samples/sec   Loss 8.6139   LearningRate 0.0567   Epoch: 4   Global Step: 24980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:54,012-Speed 5589.45 samples/sec   Loss 8.7460   LearningRate 0.0567   Epoch: 4   Global Step: 24990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:55,879-Speed 5489.75 samples/sec   Loss 8.7743   LearningRate 0.0567   Epoch: 4   Global Step: 25000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:57,726-Speed 5547.86 samples/sec   Loss 8.9216   LearningRate 0.0567   Epoch: 4   Global Step: 25010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:52:59,612-Speed 5429.93 samples/sec   Loss 8.6622   LearningRate 0.0567   Epoch: 4   Global Step: 25020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:53:01,518-Speed 5376.88 samples/sec   Loss 8.5565   LearningRate 0.0566   Epoch: 4   Global Step: 25030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:53:03,388-Speed 5477.11 samples/sec   Loss 8.6938   LearningRate 0.0566   Epoch: 4   Global Step: 25040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:53:05,227-Speed 5570.98 samples/sec   Loss 8.7749   LearningRate 0.0566   Epoch: 4   Global Step: 25050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:53:07,099-Speed 5473.20 samples/sec   Loss 8.7235   LearningRate 0.0566   Epoch: 4   Global Step: 25060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:53:08,938-Speed 5569.80 samples/sec   Loss 8.8347   LearningRate 0.0566   Epoch: 4   Global Step: 25070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:53:10,810-Speed 5473.25 samples/sec   Loss 8.9363   LearningRate 0.0566   Epoch: 4   Global Step: 25080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:12,668-Speed 5513.89 samples/sec   Loss 8.6411   LearningRate 0.0565   Epoch: 4   Global Step: 25090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:14,525-Speed 5517.34 samples/sec   Loss 8.8658   LearningRate 0.0565   Epoch: 4   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:16,385-Speed 5509.33 samples/sec   Loss 8.7654   LearningRate 0.0565   Epoch: 4   Global Step: 25110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:18,251-Speed 5490.09 samples/sec   Loss 8.6989   LearningRate 0.0565   Epoch: 4   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:20,090-Speed 5569.16 samples/sec   Loss 8.7203   LearningRate 0.0565   Epoch: 4   Global Step: 25130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:21,954-Speed 5498.54 samples/sec   Loss 8.7994   LearningRate 0.0565   Epoch: 4   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:23,892-Speed 5284.90 samples/sec   Loss 8.5850   LearningRate 0.0565   Epoch: 4   Global Step: 25150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:25,805-Speed 5355.87 samples/sec   Loss 8.8399   LearningRate 0.0564   Epoch: 4   Global Step: 25160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:27,645-Speed 5566.80 samples/sec   Loss 8.9422   LearningRate 0.0564   Epoch: 4   Global Step: 25170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:29,507-Speed 5501.52 samples/sec   Loss 8.8035   LearningRate 0.0564   Epoch: 4   Global Step: 25180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:31,343-Speed 5580.55 samples/sec   Loss 8.6897   LearningRate 0.0564   Epoch: 4   Global Step: 25190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:33,204-Speed 5505.97 samples/sec   Loss 8.6920   LearningRate 0.0564   Epoch: 4   Global Step: 25200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:35,039-Speed 5582.92 samples/sec   Loss 8.5288   LearningRate 0.0564   Epoch: 4   Global Step: 25210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:36,894-Speed 5522.26 samples/sec   Loss 8.6997   LearningRate 0.0564   Epoch: 4   Global Step: 25220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:38,752-Speed 5516.43 samples/sec   Loss 8.8221   LearningRate 0.0563   Epoch: 4   Global Step: 25230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:40,624-Speed 5469.87 samples/sec   Loss 8.7297   LearningRate 0.0563   Epoch: 4   Global Step: 25240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:53:42,479-Speed 5524.21 samples/sec   Loss 8.6393   LearningRate 0.0563   Epoch: 4   Global Step: 25250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:53:44,323-Speed 5554.51 samples/sec   Loss 8.7365   LearningRate 0.0563   Epoch: 4   Global Step: 25260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:53:46,214-Speed 5418.75 samples/sec   Loss 8.6418   LearningRate 0.0563   Epoch: 4   Global Step: 25270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:53:48,195-Speed 5173.75 samples/sec   Loss 8.9139   LearningRate 0.0563   Epoch: 4   Global Step: 25280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:53:50,021-Speed 5610.39 samples/sec   Loss 8.8260   LearningRate 0.0563   Epoch: 4   Global Step: 25290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:01,105-Speed 923.95 samples/sec   Loss 7.8594   LearningRate 0.0562   Epoch: 5   Global Step: 25300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:03,030-Speed 5323.82 samples/sec   Loss 7.9958   LearningRate 0.0562   Epoch: 5   Global Step: 25310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:04,887-Speed 5515.19 samples/sec   Loss 7.9973   LearningRate 0.0562   Epoch: 5   Global Step: 25320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:06,760-Speed 5468.71 samples/sec   Loss 7.9905   LearningRate 0.0562   Epoch: 5   Global Step: 25330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:08,618-Speed 5516.14 samples/sec   Loss 7.9165   LearningRate 0.0562   Epoch: 5   Global Step: 25340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:10,477-Speed 5511.29 samples/sec   Loss 7.9705   LearningRate 0.0562   Epoch: 5   Global Step: 25350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:54:12,356-Speed 5452.57 samples/sec   Loss 8.0484   LearningRate 0.0561   Epoch: 5   Global Step: 25360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:54:14,215-Speed 5509.85 samples/sec   Loss 7.9482   LearningRate 0.0561   Epoch: 5   Global Step: 25370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:54:16,048-Speed 5587.11 samples/sec   Loss 8.2060   LearningRate 0.0561   Epoch: 5   Global Step: 25380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:17,897-Speed 5542.10 samples/sec   Loss 7.9745   LearningRate 0.0561   Epoch: 5   Global Step: 25390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:19,740-Speed 5557.33 samples/sec   Loss 8.1128   LearningRate 0.0561   Epoch: 5   Global Step: 25400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:21,599-Speed 5510.56 samples/sec   Loss 8.0413   LearningRate 0.0561   Epoch: 5   Global Step: 25410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:23,441-Speed 5562.65 samples/sec   Loss 7.9129   LearningRate 0.0561   Epoch: 5   Global Step: 25420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:25,303-Speed 5503.40 samples/sec   Loss 7.9079   LearningRate 0.0560   Epoch: 5   Global Step: 25430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:27,152-Speed 5539.98 samples/sec   Loss 7.9922   LearningRate 0.0560   Epoch: 5   Global Step: 25440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:28,991-Speed 5571.29 samples/sec   Loss 7.9993   LearningRate 0.0560   Epoch: 5   Global Step: 25450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:30,838-Speed 5545.79 samples/sec   Loss 8.2440   LearningRate 0.0560   Epoch: 5   Global Step: 25460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:32,685-Speed 5547.59 samples/sec   Loss 8.1313   LearningRate 0.0560   Epoch: 5   Global Step: 25470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:34,532-Speed 5548.97 samples/sec   Loss 8.0954   LearningRate 0.0560   Epoch: 5   Global Step: 25480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:54:36,420-Speed 5426.00 samples/sec   Loss 8.1053   LearningRate 0.0560   Epoch: 5   Global Step: 25490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:54:38,306-Speed 5431.35 samples/sec   Loss 8.0093   LearningRate 0.0559   Epoch: 5   Global Step: 25500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:40,150-Speed 5555.98 samples/sec   Loss 7.9433   LearningRate 0.0559   Epoch: 5   Global Step: 25510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:42,018-Speed 5483.75 samples/sec   Loss 8.1013   LearningRate 0.0559   Epoch: 5   Global Step: 25520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:43,853-Speed 5584.26 samples/sec   Loss 8.0131   LearningRate 0.0559   Epoch: 5   Global Step: 25530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:45,708-Speed 5521.77 samples/sec   Loss 8.0758   LearningRate 0.0559   Epoch: 5   Global Step: 25540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:47,588-Speed 5450.05 samples/sec   Loss 7.9958   LearningRate 0.0559   Epoch: 5   Global Step: 25550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:49,459-Speed 5476.38 samples/sec   Loss 7.9068   LearningRate 0.0559   Epoch: 5   Global Step: 25560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:51,317-Speed 5516.24 samples/sec   Loss 7.8864   LearningRate 0.0558   Epoch: 5   Global Step: 25570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:53,185-Speed 5481.68 samples/sec   Loss 8.1356   LearningRate 0.0558   Epoch: 5   Global Step: 25580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:55,044-Speed 5513.79 samples/sec   Loss 8.1547   LearningRate 0.0558   Epoch: 5   Global Step: 25590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:54:56,915-Speed 5476.76 samples/sec   Loss 8.1525   LearningRate 0.0558   Epoch: 5   Global Step: 25600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:54:58,749-Speed 5586.67 samples/sec   Loss 8.2793   LearningRate 0.0558   Epoch: 5   Global Step: 25610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:00,602-Speed 5528.90 samples/sec   Loss 8.1673   LearningRate 0.0558   Epoch: 5   Global Step: 25620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:02,480-Speed 5453.97 samples/sec   Loss 8.0868   LearningRate 0.0557   Epoch: 5   Global Step: 25630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:04,328-Speed 5543.61 samples/sec   Loss 8.1301   LearningRate 0.0557   Epoch: 5   Global Step: 25640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:06,221-Speed 5412.32 samples/sec   Loss 8.3341   LearningRate 0.0557   Epoch: 5   Global Step: 25650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:08,068-Speed 5550.49 samples/sec   Loss 8.2235   LearningRate 0.0557   Epoch: 5   Global Step: 25660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:09,921-Speed 5526.30 samples/sec   Loss 8.2831   LearningRate 0.0557   Epoch: 5   Global Step: 25670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:11,780-Speed 5513.15 samples/sec   Loss 8.2076   LearningRate 0.0557   Epoch: 5   Global Step: 25680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:13,615-Speed 5582.63 samples/sec   Loss 8.2237   LearningRate 0.0557   Epoch: 5   Global Step: 25690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:15,481-Speed 5488.18 samples/sec   Loss 8.3686   LearningRate 0.0556   Epoch: 5   Global Step: 25700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:17,325-Speed 5558.51 samples/sec   Loss 8.3755   LearningRate 0.0556   Epoch: 5   Global Step: 25710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:19,171-Speed 5548.61 samples/sec   Loss 8.0978   LearningRate 0.0556   Epoch: 5   Global Step: 25720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:21,034-Speed 5498.89 samples/sec   Loss 8.2024   LearningRate 0.0556   Epoch: 5   Global Step: 25730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:22,881-Speed 5546.68 samples/sec   Loss 8.2478   LearningRate 0.0556   Epoch: 5   Global Step: 25740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:24,723-Speed 5561.70 samples/sec   Loss 8.1447   LearningRate 0.0556   Epoch: 5   Global Step: 25750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:26,564-Speed 5565.56 samples/sec   Loss 8.3805   LearningRate 0.0556   Epoch: 5   Global Step: 25760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:28,445-Speed 5447.56 samples/sec   Loss 8.0754   LearningRate 0.0555   Epoch: 5   Global Step: 25770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:30,286-Speed 5565.28 samples/sec   Loss 8.2309   LearningRate 0.0555   Epoch: 5   Global Step: 25780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:32,130-Speed 5556.31 samples/sec   Loss 8.3018   LearningRate 0.0555   Epoch: 5   Global Step: 25790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:33,978-Speed 5545.40 samples/sec   Loss 8.2999   LearningRate 0.0555   Epoch: 5   Global Step: 25800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:35,819-Speed 5564.43 samples/sec   Loss 8.2525   LearningRate 0.0555   Epoch: 5   Global Step: 25810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:37,671-Speed 5532.92 samples/sec   Loss 8.3290   LearningRate 0.0555   Epoch: 5   Global Step: 25820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:39,526-Speed 5521.57 samples/sec   Loss 8.2452   LearningRate 0.0555   Epoch: 5   Global Step: 25830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:41,383-Speed 5518.29 samples/sec   Loss 8.2947   LearningRate 0.0554   Epoch: 5   Global Step: 25840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:43,231-Speed 5541.16 samples/sec   Loss 8.2141   LearningRate 0.0554   Epoch: 5   Global Step: 25850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:45,081-Speed 5538.55 samples/sec   Loss 8.3431   LearningRate 0.0554   Epoch: 5   Global Step: 25860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:55:46,941-Speed 5507.84 samples/sec   Loss 8.1549   LearningRate 0.0554   Epoch: 5   Global Step: 25870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:48,794-Speed 5529.26 samples/sec   Loss 8.4363   LearningRate 0.0554   Epoch: 5   Global Step: 25880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:50,645-Speed 5536.62 samples/sec   Loss 8.5159   LearningRate 0.0554   Epoch: 5   Global Step: 25890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:52,531-Speed 5429.37 samples/sec   Loss 8.2315   LearningRate 0.0553   Epoch: 5   Global Step: 25900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:54,374-Speed 5562.14 samples/sec   Loss 8.3992   LearningRate 0.0553   Epoch: 5   Global Step: 25910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:56,213-Speed 5568.86 samples/sec   Loss 8.4290   LearningRate 0.0553   Epoch: 5   Global Step: 25920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:58,074-Speed 5507.33 samples/sec   Loss 8.2319   LearningRate 0.0553   Epoch: 5   Global Step: 25930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:55:59,939-Speed 5491.62 samples/sec   Loss 8.2608   LearningRate 0.0553   Epoch: 5   Global Step: 25940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:56:01,810-Speed 5475.89 samples/sec   Loss 8.3872   LearningRate 0.0553   Epoch: 5   Global Step: 25950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:56:03,668-Speed 5513.37 samples/sec   Loss 8.3577   LearningRate 0.0553   Epoch: 5   Global Step: 25960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:56:05,522-Speed 5526.61 samples/sec   Loss 8.3383   LearningRate 0.0552   Epoch: 5   Global Step: 25970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:56:07,388-Speed 5492.86 samples/sec   Loss 8.2705   LearningRate 0.0552   Epoch: 5   Global Step: 25980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:56:09,222-Speed 5586.59 samples/sec   Loss 8.3705   LearningRate 0.0552   Epoch: 5   Global Step: 25990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:56:11,066-Speed 5554.62 samples/sec   Loss 8.3580   LearningRate 0.0552   Epoch: 5   Global Step: 26000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:56:38,331-[lfw][26000]XNorm: 23.686060
Training: 2022-04-11 11:56:38,332-[lfw][26000]Accuracy-Flip: 0.99617+-0.00289
Training: 2022-04-11 11:56:38,333-[lfw][26000]Accuracy-Highest: 0.99667
Training: 2022-04-11 11:57:09,832-[cfp_fp][26000]XNorm: 20.557960
Training: 2022-04-11 11:57:09,833-[cfp_fp][26000]Accuracy-Flip: 0.94686+-0.01232
Training: 2022-04-11 11:57:09,833-[cfp_fp][26000]Accuracy-Highest: 0.95486
Training: 2022-04-11 11:57:36,701-[agedb_30][26000]XNorm: 22.865481
Training: 2022-04-11 11:57:36,702-[agedb_30][26000]Accuracy-Flip: 0.96850+-0.01031
Training: 2022-04-11 11:57:36,702-[agedb_30][26000]Accuracy-Highest: 0.97083
Training: 2022-04-11 11:57:38,554-Speed 117.05 samples/sec   Loss 8.3113   LearningRate 0.0552   Epoch: 5   Global Step: 26010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:57:40,402-Speed 5542.68 samples/sec   Loss 8.2872   LearningRate 0.0552   Epoch: 5   Global Step: 26020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:57:42,241-Speed 5570.70 samples/sec   Loss 8.5609   LearningRate 0.0552   Epoch: 5   Global Step: 26030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:57:44,071-Speed 5599.53 samples/sec   Loss 8.4028   LearningRate 0.0551   Epoch: 5   Global Step: 26040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:57:45,937-Speed 5488.58 samples/sec   Loss 8.3697   LearningRate 0.0551   Epoch: 5   Global Step: 26050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:57:47,768-Speed 5595.23 samples/sec   Loss 8.4226   LearningRate 0.0551   Epoch: 5   Global Step: 26060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:57:49,609-Speed 5565.43 samples/sec   Loss 8.3882   LearningRate 0.0551   Epoch: 5   Global Step: 26070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:57:51,449-Speed 5567.25 samples/sec   Loss 8.3917   LearningRate 0.0551   Epoch: 5   Global Step: 26080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:57:53,294-Speed 5555.91 samples/sec   Loss 8.5487   LearningRate 0.0551   Epoch: 5   Global Step: 26090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:57:55,125-Speed 5595.53 samples/sec   Loss 8.3458   LearningRate 0.0551   Epoch: 5   Global Step: 26100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:57:56,981-Speed 5520.95 samples/sec   Loss 8.4003   LearningRate 0.0550   Epoch: 5   Global Step: 26110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:57:58,813-Speed 5589.87 samples/sec   Loss 8.4387   LearningRate 0.0550   Epoch: 5   Global Step: 26120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:58:00,668-Speed 5523.87 samples/sec   Loss 8.3529   LearningRate 0.0550   Epoch: 5   Global Step: 26130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:58:02,505-Speed 5576.85 samples/sec   Loss 8.3904   LearningRate 0.0550   Epoch: 5   Global Step: 26140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:58:04,349-Speed 5556.99 samples/sec   Loss 8.1120   LearningRate 0.0550   Epoch: 5   Global Step: 26150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:58:06,191-Speed 5562.00 samples/sec   Loss 8.3362   LearningRate 0.0550   Epoch: 5   Global Step: 26160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:08,027-Speed 5579.22 samples/sec   Loss 8.3389   LearningRate 0.0550   Epoch: 5   Global Step: 26170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:09,884-Speed 5516.95 samples/sec   Loss 8.4420   LearningRate 0.0549   Epoch: 5   Global Step: 26180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:11,736-Speed 5531.29 samples/sec   Loss 8.3360   LearningRate 0.0549   Epoch: 5   Global Step: 26190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:13,620-Speed 5438.49 samples/sec   Loss 8.4144   LearningRate 0.0549   Epoch: 5   Global Step: 26200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:15,462-Speed 5561.63 samples/sec   Loss 8.3333   LearningRate 0.0549   Epoch: 5   Global Step: 26210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:17,333-Speed 5478.29 samples/sec   Loss 8.4300   LearningRate 0.0549   Epoch: 5   Global Step: 26220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:19,166-Speed 5587.80 samples/sec   Loss 8.4749   LearningRate 0.0549   Epoch: 5   Global Step: 26230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:21,028-Speed 5503.17 samples/sec   Loss 8.3703   LearningRate 0.0549   Epoch: 5   Global Step: 26240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:22,873-Speed 5554.12 samples/sec   Loss 8.3710   LearningRate 0.0548   Epoch: 5   Global Step: 26250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:24,723-Speed 5536.91 samples/sec   Loss 8.4915   LearningRate 0.0548   Epoch: 5   Global Step: 26260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:58:26,574-Speed 5535.80 samples/sec   Loss 8.4217   LearningRate 0.0548   Epoch: 5   Global Step: 26270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:58:28,431-Speed 5532.09 samples/sec   Loss 8.4018   LearningRate 0.0548   Epoch: 5   Global Step: 26280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:58:30,280-Speed 5539.52 samples/sec   Loss 8.4261   LearningRate 0.0548   Epoch: 5   Global Step: 26290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:58:32,110-Speed 5601.68 samples/sec   Loss 8.4949   LearningRate 0.0548   Epoch: 5   Global Step: 26300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:58:33,950-Speed 5567.89 samples/sec   Loss 8.3967   LearningRate 0.0547   Epoch: 5   Global Step: 26310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:58:35,785-Speed 5581.92 samples/sec   Loss 8.4102   LearningRate 0.0547   Epoch: 5   Global Step: 26320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:58:37,648-Speed 5499.73 samples/sec   Loss 8.4450   LearningRate 0.0547   Epoch: 5   Global Step: 26330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:58:39,510-Speed 5502.58 samples/sec   Loss 8.4012   LearningRate 0.0547   Epoch: 5   Global Step: 26340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:58:41,368-Speed 5512.95 samples/sec   Loss 8.3528   LearningRate 0.0547   Epoch: 5   Global Step: 26350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:58:43,232-Speed 5498.98 samples/sec   Loss 8.6012   LearningRate 0.0547   Epoch: 5   Global Step: 26360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:45,097-Speed 5492.69 samples/sec   Loss 8.3470   LearningRate 0.0547   Epoch: 5   Global Step: 26370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:46,948-Speed 5535.76 samples/sec   Loss 8.4026   LearningRate 0.0546   Epoch: 5   Global Step: 26380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:48,812-Speed 5496.46 samples/sec   Loss 8.3121   LearningRate 0.0546   Epoch: 5   Global Step: 26390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:50,650-Speed 5572.75 samples/sec   Loss 8.2763   LearningRate 0.0546   Epoch: 5   Global Step: 26400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:52,490-Speed 5568.52 samples/sec   Loss 8.4432   LearningRate 0.0546   Epoch: 5   Global Step: 26410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:54,339-Speed 5541.91 samples/sec   Loss 8.3492   LearningRate 0.0546   Epoch: 5   Global Step: 26420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:56,179-Speed 5567.17 samples/sec   Loss 8.3169   LearningRate 0.0546   Epoch: 5   Global Step: 26430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:58,015-Speed 5579.48 samples/sec   Loss 8.4155   LearningRate 0.0546   Epoch: 5   Global Step: 26440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:58:59,881-Speed 5489.49 samples/sec   Loss 8.3750   LearningRate 0.0545   Epoch: 5   Global Step: 26450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:59:01,718-Speed 5578.01 samples/sec   Loss 8.3969   LearningRate 0.0545   Epoch: 5   Global Step: 26460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:03,555-Speed 5576.73 samples/sec   Loss 8.2986   LearningRate 0.0545   Epoch: 5   Global Step: 26470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:05,399-Speed 5555.97 samples/sec   Loss 8.4530   LearningRate 0.0545   Epoch: 5   Global Step: 26480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:07,279-Speed 5448.89 samples/sec   Loss 8.4125   LearningRate 0.0545   Epoch: 5   Global Step: 26490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:09,121-Speed 5562.46 samples/sec   Loss 8.5343   LearningRate 0.0545   Epoch: 5   Global Step: 26500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:10,956-Speed 5581.70 samples/sec   Loss 8.4072   LearningRate 0.0545   Epoch: 5   Global Step: 26510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:12,809-Speed 5530.74 samples/sec   Loss 8.4007   LearningRate 0.0544   Epoch: 5   Global Step: 26520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:14,645-Speed 5577.97 samples/sec   Loss 8.5217   LearningRate 0.0544   Epoch: 5   Global Step: 26530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:16,509-Speed 5496.93 samples/sec   Loss 8.4172   LearningRate 0.0544   Epoch: 5   Global Step: 26540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:18,368-Speed 5509.87 samples/sec   Loss 8.4305   LearningRate 0.0544   Epoch: 5   Global Step: 26550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:20,198-Speed 5598.74 samples/sec   Loss 8.2384   LearningRate 0.0544   Epoch: 5   Global Step: 26560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:22,054-Speed 5518.95 samples/sec   Loss 8.5223   LearningRate 0.0544   Epoch: 5   Global Step: 26570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:23,956-Speed 5387.15 samples/sec   Loss 8.4511   LearningRate 0.0544   Epoch: 5   Global Step: 26580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:25,800-Speed 5553.95 samples/sec   Loss 8.4020   LearningRate 0.0543   Epoch: 5   Global Step: 26590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 11:59:27,629-Speed 5600.05 samples/sec   Loss 8.4924   LearningRate 0.0543   Epoch: 5   Global Step: 26600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:59:29,462-Speed 5589.09 samples/sec   Loss 8.5598   LearningRate 0.0543   Epoch: 5   Global Step: 26610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:59:31,299-Speed 5576.71 samples/sec   Loss 8.5327   LearningRate 0.0543   Epoch: 5   Global Step: 26620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:59:33,136-Speed 5577.56 samples/sec   Loss 8.3712   LearningRate 0.0543   Epoch: 5   Global Step: 26630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:59:34,975-Speed 5572.82 samples/sec   Loss 8.5269   LearningRate 0.0543   Epoch: 5   Global Step: 26640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:59:36,821-Speed 5547.14 samples/sec   Loss 8.5174   LearningRate 0.0543   Epoch: 5   Global Step: 26650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:59:38,666-Speed 5552.32 samples/sec   Loss 8.4521   LearningRate 0.0542   Epoch: 5   Global Step: 26660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:59:40,537-Speed 5476.94 samples/sec   Loss 8.4675   LearningRate 0.0542   Epoch: 5   Global Step: 26670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:59:42,369-Speed 5590.95 samples/sec   Loss 8.5100   LearningRate 0.0542   Epoch: 5   Global Step: 26680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:59:44,237-Speed 5485.62 samples/sec   Loss 8.4790   LearningRate 0.0542   Epoch: 5   Global Step: 26690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:59:46,104-Speed 5487.39 samples/sec   Loss 8.4453   LearningRate 0.0542   Epoch: 5   Global Step: 26700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 11:59:47,971-Speed 5487.75 samples/sec   Loss 8.2758   LearningRate 0.0542   Epoch: 5   Global Step: 26710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:59:49,838-Speed 5486.40 samples/sec   Loss 8.2491   LearningRate 0.0541   Epoch: 5   Global Step: 26720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:59:51,744-Speed 5376.97 samples/sec   Loss 8.3095   LearningRate 0.0541   Epoch: 5   Global Step: 26730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:59:53,609-Speed 5492.58 samples/sec   Loss 8.3886   LearningRate 0.0541   Epoch: 5   Global Step: 26740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:59:55,462-Speed 5528.63 samples/sec   Loss 8.3801   LearningRate 0.0541   Epoch: 5   Global Step: 26750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:59:57,307-Speed 5553.08 samples/sec   Loss 8.2918   LearningRate 0.0541   Epoch: 5   Global Step: 26760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 11:59:59,194-Speed 5427.28 samples/sec   Loss 8.5417   LearningRate 0.0541   Epoch: 5   Global Step: 26770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:01,051-Speed 5517.40 samples/sec   Loss 8.3241   LearningRate 0.0541   Epoch: 5   Global Step: 26780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:02,897-Speed 5552.83 samples/sec   Loss 8.6314   LearningRate 0.0540   Epoch: 5   Global Step: 26790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:04,734-Speed 5577.04 samples/sec   Loss 8.3379   LearningRate 0.0540   Epoch: 5   Global Step: 26800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:06,581-Speed 5544.66 samples/sec   Loss 8.4976   LearningRate 0.0540   Epoch: 5   Global Step: 26810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:00:08,407-Speed 5610.38 samples/sec   Loss 8.5351   LearningRate 0.0540   Epoch: 5   Global Step: 26820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:10,290-Speed 5439.63 samples/sec   Loss 8.3579   LearningRate 0.0540   Epoch: 5   Global Step: 26830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:12,153-Speed 5501.02 samples/sec   Loss 8.4027   LearningRate 0.0540   Epoch: 5   Global Step: 26840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:13,992-Speed 5570.29 samples/sec   Loss 8.4658   LearningRate 0.0540   Epoch: 5   Global Step: 26850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:15,861-Speed 5483.57 samples/sec   Loss 8.3837   LearningRate 0.0539   Epoch: 5   Global Step: 26860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:17,705-Speed 5556.38 samples/sec   Loss 8.5581   LearningRate 0.0539   Epoch: 5   Global Step: 26870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:19,567-Speed 5501.36 samples/sec   Loss 8.5174   LearningRate 0.0539   Epoch: 5   Global Step: 26880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:21,402-Speed 5583.99 samples/sec   Loss 8.4283   LearningRate 0.0539   Epoch: 5   Global Step: 26890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:23,275-Speed 5467.95 samples/sec   Loss 8.3949   LearningRate 0.0539   Epoch: 5   Global Step: 26900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:25,135-Speed 5546.25 samples/sec   Loss 8.5020   LearningRate 0.0539   Epoch: 5   Global Step: 26910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:27,004-Speed 5478.32 samples/sec   Loss 8.4746   LearningRate 0.0539   Epoch: 5   Global Step: 26920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:00:28,847-Speed 5559.34 samples/sec   Loss 8.3257   LearningRate 0.0538   Epoch: 5   Global Step: 26930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:30,701-Speed 5526.14 samples/sec   Loss 8.4291   LearningRate 0.0538   Epoch: 5   Global Step: 26940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:32,537-Speed 5581.62 samples/sec   Loss 8.4835   LearningRate 0.0538   Epoch: 5   Global Step: 26950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:34,392-Speed 5522.68 samples/sec   Loss 8.3105   LearningRate 0.0538   Epoch: 5   Global Step: 26960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:36,274-Speed 5444.07 samples/sec   Loss 8.2898   LearningRate 0.0538   Epoch: 5   Global Step: 26970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:38,173-Speed 5394.73 samples/sec   Loss 8.3155   LearningRate 0.0538   Epoch: 5   Global Step: 26980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:40,030-Speed 5519.80 samples/sec   Loss 8.3564   LearningRate 0.0538   Epoch: 5   Global Step: 26990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:41,899-Speed 5483.36 samples/sec   Loss 8.4053   LearningRate 0.0537   Epoch: 5   Global Step: 27000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:43,733-Speed 5583.96 samples/sec   Loss 8.3842   LearningRate 0.0537   Epoch: 5   Global Step: 27010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:45,568-Speed 5581.96 samples/sec   Loss 8.3383   LearningRate 0.0537   Epoch: 5   Global Step: 27020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:00:47,436-Speed 5487.68 samples/sec   Loss 8.3049   LearningRate 0.0537   Epoch: 5   Global Step: 27030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:00:49,318-Speed 5441.87 samples/sec   Loss 8.3972   LearningRate 0.0537   Epoch: 5   Global Step: 27040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:00:51,196-Speed 5457.06 samples/sec   Loss 8.4736   LearningRate 0.0537   Epoch: 5   Global Step: 27050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:00:53,032-Speed 5580.32 samples/sec   Loss 8.5063   LearningRate 0.0537   Epoch: 5   Global Step: 27060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:00:54,879-Speed 5546.58 samples/sec   Loss 8.3603   LearningRate 0.0536   Epoch: 5   Global Step: 27070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:00:56,717-Speed 5575.10 samples/sec   Loss 8.4733   LearningRate 0.0536   Epoch: 5   Global Step: 27080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:00:58,559-Speed 5561.25 samples/sec   Loss 8.3597   LearningRate 0.0536   Epoch: 5   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:00,403-Speed 5556.09 samples/sec   Loss 8.3274   LearningRate 0.0536   Epoch: 5   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:02,290-Speed 5429.39 samples/sec   Loss 8.3973   LearningRate 0.0536   Epoch: 5   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:04,137-Speed 5548.54 samples/sec   Loss 8.5611   LearningRate 0.0536   Epoch: 5   Global Step: 27120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:05,978-Speed 5562.29 samples/sec   Loss 8.5603   LearningRate 0.0536   Epoch: 5   Global Step: 27130   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 12:01:07,842-Speed 5499.59 samples/sec   Loss 8.5733   LearningRate 0.0535   Epoch: 5   Global Step: 27140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:09,676-Speed 5585.36 samples/sec   Loss 8.4103   LearningRate 0.0535   Epoch: 5   Global Step: 27150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:11,510-Speed 5585.79 samples/sec   Loss 8.5210   LearningRate 0.0535   Epoch: 5   Global Step: 27160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:13,358-Speed 5543.37 samples/sec   Loss 8.3768   LearningRate 0.0535   Epoch: 5   Global Step: 27170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:15,215-Speed 5518.46 samples/sec   Loss 8.5651   LearningRate 0.0535   Epoch: 5   Global Step: 27180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:17,052-Speed 5578.77 samples/sec   Loss 8.2715   LearningRate 0.0535   Epoch: 5   Global Step: 27190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:18,884-Speed 5589.92 samples/sec   Loss 8.2676   LearningRate 0.0535   Epoch: 5   Global Step: 27200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:01:20,736-Speed 5530.79 samples/sec   Loss 8.2520   LearningRate 0.0534   Epoch: 5   Global Step: 27210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:01:22,598-Speed 5506.09 samples/sec   Loss 8.4752   LearningRate 0.0534   Epoch: 5   Global Step: 27220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:01:24,495-Speed 5401.15 samples/sec   Loss 8.3062   LearningRate 0.0534   Epoch: 5   Global Step: 27230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:01:26,373-Speed 5452.71 samples/sec   Loss 8.4726   LearningRate 0.0534   Epoch: 5   Global Step: 27240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:01:28,233-Speed 5511.53 samples/sec   Loss 8.4793   LearningRate 0.0534   Epoch: 5   Global Step: 27250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:01:30,075-Speed 5561.44 samples/sec   Loss 8.6362   LearningRate 0.0534   Epoch: 5   Global Step: 27260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:01:31,937-Speed 5499.97 samples/sec   Loss 8.2608   LearningRate 0.0534   Epoch: 5   Global Step: 27270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:01:33,785-Speed 5545.75 samples/sec   Loss 8.3662   LearningRate 0.0533   Epoch: 5   Global Step: 27280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:01:35,653-Speed 5483.77 samples/sec   Loss 8.3536   LearningRate 0.0533   Epoch: 5   Global Step: 27290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:01:37,504-Speed 5536.35 samples/sec   Loss 8.5538   LearningRate 0.0533   Epoch: 5   Global Step: 27300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:39,350-Speed 5551.63 samples/sec   Loss 8.5409   LearningRate 0.0533   Epoch: 5   Global Step: 27310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:41,198-Speed 5542.29 samples/sec   Loss 8.4843   LearningRate 0.0533   Epoch: 5   Global Step: 27320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:43,044-Speed 5551.80 samples/sec   Loss 8.2969   LearningRate 0.0533   Epoch: 5   Global Step: 27330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:44,908-Speed 5498.85 samples/sec   Loss 8.2616   LearningRate 0.0533   Epoch: 5   Global Step: 27340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:46,759-Speed 5534.73 samples/sec   Loss 8.4107   LearningRate 0.0532   Epoch: 5   Global Step: 27350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:48,639-Speed 5450.12 samples/sec   Loss 8.4442   LearningRate 0.0532   Epoch: 5   Global Step: 27360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:50,500-Speed 5504.26 samples/sec   Loss 8.4485   LearningRate 0.0532   Epoch: 5   Global Step: 27370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:52,407-Speed 5372.60 samples/sec   Loss 8.3164   LearningRate 0.0532   Epoch: 5   Global Step: 27380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:54,273-Speed 5490.34 samples/sec   Loss 8.2379   LearningRate 0.0532   Epoch: 5   Global Step: 27390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:56,120-Speed 5547.93 samples/sec   Loss 8.4549   LearningRate 0.0532   Epoch: 5   Global Step: 27400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:57,960-Speed 5568.48 samples/sec   Loss 8.2746   LearningRate 0.0532   Epoch: 5   Global Step: 27410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:01:59,828-Speed 5485.31 samples/sec   Loss 8.3201   LearningRate 0.0531   Epoch: 5   Global Step: 27420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:02:01,691-Speed 5497.29 samples/sec   Loss 8.4051   LearningRate 0.0531   Epoch: 5   Global Step: 27430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:02:03,548-Speed 5519.94 samples/sec   Loss 8.3740   LearningRate 0.0531   Epoch: 5   Global Step: 27440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:02:05,401-Speed 5526.78 samples/sec   Loss 8.2378   LearningRate 0.0531   Epoch: 5   Global Step: 27450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:07,248-Speed 5549.52 samples/sec   Loss 8.3632   LearningRate 0.0531   Epoch: 5   Global Step: 27460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:09,094-Speed 5550.54 samples/sec   Loss 8.4551   LearningRate 0.0531   Epoch: 5   Global Step: 27470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:10,959-Speed 5491.95 samples/sec   Loss 8.4327   LearningRate 0.0530   Epoch: 5   Global Step: 27480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:12,828-Speed 5483.81 samples/sec   Loss 8.4799   LearningRate 0.0530   Epoch: 5   Global Step: 27490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:14,690-Speed 5500.37 samples/sec   Loss 8.1732   LearningRate 0.0530   Epoch: 5   Global Step: 27500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:16,558-Speed 5484.88 samples/sec   Loss 8.3669   LearningRate 0.0530   Epoch: 5   Global Step: 27510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:18,408-Speed 5540.49 samples/sec   Loss 8.4474   LearningRate 0.0530   Epoch: 5   Global Step: 27520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:20,256-Speed 5546.27 samples/sec   Loss 8.4618   LearningRate 0.0530   Epoch: 5   Global Step: 27530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:22,097-Speed 5565.65 samples/sec   Loss 8.2562   LearningRate 0.0530   Epoch: 5   Global Step: 27540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:23,961-Speed 5495.40 samples/sec   Loss 8.2433   LearningRate 0.0529   Epoch: 5   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:02:25,822-Speed 5503.66 samples/sec   Loss 8.3351   LearningRate 0.0529   Epoch: 5   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:02:27,686-Speed 5498.73 samples/sec   Loss 8.3518   LearningRate 0.0529   Epoch: 5   Global Step: 27570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:02:29,546-Speed 5507.99 samples/sec   Loss 8.1414   LearningRate 0.0529   Epoch: 5   Global Step: 27580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:02:31,372-Speed 5610.70 samples/sec   Loss 8.3439   LearningRate 0.0529   Epoch: 5   Global Step: 27590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:33,220-Speed 5541.97 samples/sec   Loss 8.4567   LearningRate 0.0529   Epoch: 5   Global Step: 27600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:35,064-Speed 5558.34 samples/sec   Loss 8.4057   LearningRate 0.0529   Epoch: 5   Global Step: 27610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:36,921-Speed 5514.50 samples/sec   Loss 8.3242   LearningRate 0.0528   Epoch: 5   Global Step: 27620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:38,783-Speed 5504.21 samples/sec   Loss 8.3048   LearningRate 0.0528   Epoch: 5   Global Step: 27630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:40,646-Speed 5498.95 samples/sec   Loss 8.5332   LearningRate 0.0528   Epoch: 5   Global Step: 27640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:42,520-Speed 5465.62 samples/sec   Loss 8.2178   LearningRate 0.0528   Epoch: 5   Global Step: 27650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:44,355-Speed 5582.03 samples/sec   Loss 8.3095   LearningRate 0.0528   Epoch: 5   Global Step: 27660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:46,229-Speed 5469.33 samples/sec   Loss 8.1855   LearningRate 0.0528   Epoch: 5   Global Step: 27670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:48,085-Speed 5518.84 samples/sec   Loss 8.4089   LearningRate 0.0528   Epoch: 5   Global Step: 27680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:49,964-Speed 5452.63 samples/sec   Loss 8.3852   LearningRate 0.0527   Epoch: 5   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:02:51,807-Speed 5560.60 samples/sec   Loss 8.5719   LearningRate 0.0527   Epoch: 5   Global Step: 27700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:02:53,649-Speed 5561.41 samples/sec   Loss 8.3621   LearningRate 0.0527   Epoch: 5   Global Step: 27710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:02:55,502-Speed 5528.71 samples/sec   Loss 8.3582   LearningRate 0.0527   Epoch: 5   Global Step: 27720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:57,340-Speed 5576.97 samples/sec   Loss 8.4390   LearningRate 0.0527   Epoch: 5   Global Step: 27730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:02:59,219-Speed 5451.72 samples/sec   Loss 8.4111   LearningRate 0.0527   Epoch: 5   Global Step: 27740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:01,055-Speed 5578.94 samples/sec   Loss 8.3365   LearningRate 0.0527   Epoch: 5   Global Step: 27750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:02,924-Speed 5483.47 samples/sec   Loss 8.3462   LearningRate 0.0526   Epoch: 5   Global Step: 27760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:04,776-Speed 5531.38 samples/sec   Loss 8.5338   LearningRate 0.0526   Epoch: 5   Global Step: 27770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:06,630-Speed 5526.03 samples/sec   Loss 8.3994   LearningRate 0.0526   Epoch: 5   Global Step: 27780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:08,480-Speed 5537.75 samples/sec   Loss 8.2585   LearningRate 0.0526   Epoch: 5   Global Step: 27790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:10,350-Speed 5480.71 samples/sec   Loss 8.2488   LearningRate 0.0526   Epoch: 5   Global Step: 27800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:12,203-Speed 5527.56 samples/sec   Loss 8.3506   LearningRate 0.0526   Epoch: 5   Global Step: 27810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:14,059-Speed 5521.49 samples/sec   Loss 8.3889   LearningRate 0.0526   Epoch: 5   Global Step: 27820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:03:15,927-Speed 5484.40 samples/sec   Loss 8.6433   LearningRate 0.0525   Epoch: 5   Global Step: 27830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:03:17,782-Speed 5523.68 samples/sec   Loss 8.4080   LearningRate 0.0525   Epoch: 5   Global Step: 27840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:03:19,649-Speed 5486.51 samples/sec   Loss 8.4544   LearningRate 0.0525   Epoch: 5   Global Step: 27850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:03:21,496-Speed 5546.43 samples/sec   Loss 8.4559   LearningRate 0.0525   Epoch: 5   Global Step: 27860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:03:23,369-Speed 5471.92 samples/sec   Loss 8.4471   LearningRate 0.0525   Epoch: 5   Global Step: 27870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:03:25,233-Speed 5494.24 samples/sec   Loss 8.2938   LearningRate 0.0525   Epoch: 5   Global Step: 27880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:03:27,136-Speed 5383.97 samples/sec   Loss 8.3076   LearningRate 0.0525   Epoch: 5   Global Step: 27890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:03:28,982-Speed 5552.28 samples/sec   Loss 8.4537   LearningRate 0.0524   Epoch: 5   Global Step: 27900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:30,849-Speed 5485.46 samples/sec   Loss 8.2950   LearningRate 0.0524   Epoch: 5   Global Step: 27910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:32,688-Speed 5572.15 samples/sec   Loss 8.3352   LearningRate 0.0524   Epoch: 5   Global Step: 27920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:34,538-Speed 5539.44 samples/sec   Loss 8.4057   LearningRate 0.0524   Epoch: 5   Global Step: 27930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:36,378-Speed 5566.92 samples/sec   Loss 8.2723   LearningRate 0.0524   Epoch: 5   Global Step: 27940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:38,227-Speed 5541.90 samples/sec   Loss 8.2334   LearningRate 0.0524   Epoch: 5   Global Step: 27950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:40,092-Speed 5493.67 samples/sec   Loss 8.6583   LearningRate 0.0524   Epoch: 5   Global Step: 27960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:41,932-Speed 5568.35 samples/sec   Loss 8.5122   LearningRate 0.0523   Epoch: 5   Global Step: 27970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:43,801-Speed 5481.91 samples/sec   Loss 8.3704   LearningRate 0.0523   Epoch: 5   Global Step: 27980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:45,668-Speed 5486.49 samples/sec   Loss 8.5193   LearningRate 0.0523   Epoch: 5   Global Step: 27990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:03:47,537-Speed 5482.85 samples/sec   Loss 8.4258   LearningRate 0.0523   Epoch: 5   Global Step: 28000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:04:14,926-[lfw][28000]XNorm: 21.572095
Training: 2022-04-11 12:04:14,927-[lfw][28000]Accuracy-Flip: 0.99683+-0.00229
Training: 2022-04-11 12:04:14,927-[lfw][28000]Accuracy-Highest: 0.99683
Training: 2022-04-11 12:04:46,190-[cfp_fp][28000]XNorm: 18.669144
Training: 2022-04-11 12:04:46,191-[cfp_fp][28000]Accuracy-Flip: 0.94900+-0.00958
Training: 2022-04-11 12:04:46,191-[cfp_fp][28000]Accuracy-Highest: 0.95486
Training: 2022-04-11 12:05:13,138-[agedb_30][28000]XNorm: 21.295386
Training: 2022-04-11 12:05:13,139-[agedb_30][28000]Accuracy-Flip: 0.97200+-0.00674
Training: 2022-04-11 12:05:13,140-[agedb_30][28000]Accuracy-Highest: 0.97200
Training: 2022-04-11 12:05:15,007-Speed 117.07 samples/sec   Loss 8.2531   LearningRate 0.0523   Epoch: 5   Global Step: 28010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:16,850-Speed 5560.11 samples/sec   Loss 8.4365   LearningRate 0.0523   Epoch: 5   Global Step: 28020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:18,701-Speed 5534.83 samples/sec   Loss 8.4490   LearningRate 0.0523   Epoch: 5   Global Step: 28030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:20,535-Speed 5586.17 samples/sec   Loss 8.4145   LearningRate 0.0522   Epoch: 5   Global Step: 28040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:05:22,377-Speed 5560.99 samples/sec   Loss 8.2561   LearningRate 0.0522   Epoch: 5   Global Step: 28050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:05:24,259-Speed 5445.28 samples/sec   Loss 8.1994   LearningRate 0.0522   Epoch: 5   Global Step: 28060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:05:26,129-Speed 5476.72 samples/sec   Loss 8.2507   LearningRate 0.0522   Epoch: 5   Global Step: 28070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:05:27,977-Speed 5544.85 samples/sec   Loss 8.2773   LearningRate 0.0522   Epoch: 5   Global Step: 28080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:05:29,806-Speed 5601.81 samples/sec   Loss 8.4586   LearningRate 0.0522   Epoch: 5   Global Step: 28090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:05:31,654-Speed 5544.96 samples/sec   Loss 8.3054   LearningRate 0.0522   Epoch: 5   Global Step: 28100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:05:33,507-Speed 5528.33 samples/sec   Loss 8.2758   LearningRate 0.0521   Epoch: 5   Global Step: 28110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:05:35,366-Speed 5511.47 samples/sec   Loss 8.3626   LearningRate 0.0521   Epoch: 5   Global Step: 28120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:05:37,198-Speed 5591.84 samples/sec   Loss 8.4187   LearningRate 0.0521   Epoch: 5   Global Step: 28130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:05:39,056-Speed 5515.66 samples/sec   Loss 8.3760   LearningRate 0.0521   Epoch: 5   Global Step: 28140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:40,931-Speed 5461.84 samples/sec   Loss 8.3051   LearningRate 0.0521   Epoch: 5   Global Step: 28150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:42,800-Speed 5482.18 samples/sec   Loss 8.2094   LearningRate 0.0521   Epoch: 5   Global Step: 28160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:44,637-Speed 5577.45 samples/sec   Loss 8.2356   LearningRate 0.0521   Epoch: 5   Global Step: 28170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:46,467-Speed 5598.89 samples/sec   Loss 8.3114   LearningRate 0.0520   Epoch: 5   Global Step: 28180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:48,328-Speed 5503.26 samples/sec   Loss 8.3538   LearningRate 0.0520   Epoch: 5   Global Step: 28190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:50,203-Speed 5466.53 samples/sec   Loss 8.4719   LearningRate 0.0520   Epoch: 5   Global Step: 28200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:52,057-Speed 5525.48 samples/sec   Loss 8.3584   LearningRate 0.0520   Epoch: 5   Global Step: 28210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:53,993-Speed 5292.20 samples/sec   Loss 8.2761   LearningRate 0.0520   Epoch: 5   Global Step: 28220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:55,834-Speed 5564.30 samples/sec   Loss 8.3776   LearningRate 0.0520   Epoch: 5   Global Step: 28230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:05:57,684-Speed 5537.11 samples/sec   Loss 8.3553   LearningRate 0.0520   Epoch: 5   Global Step: 28240   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 12:05:59,528-Speed 5557.13 samples/sec   Loss 8.2983   LearningRate 0.0519   Epoch: 5   Global Step: 28250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:06:01,400-Speed 5473.85 samples/sec   Loss 8.4711   LearningRate 0.0519   Epoch: 5   Global Step: 28260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:06:03,275-Speed 5464.14 samples/sec   Loss 8.2969   LearningRate 0.0519   Epoch: 5   Global Step: 28270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:06:05,112-Speed 5577.56 samples/sec   Loss 8.2498   LearningRate 0.0519   Epoch: 5   Global Step: 28280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:06:06,969-Speed 5516.55 samples/sec   Loss 8.2706   LearningRate 0.0519   Epoch: 5   Global Step: 28290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:06:08,819-Speed 5539.72 samples/sec   Loss 8.1783   LearningRate 0.0519   Epoch: 5   Global Step: 28300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:06:10,659-Speed 5567.08 samples/sec   Loss 8.2993   LearningRate 0.0519   Epoch: 5   Global Step: 28310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:06:12,509-Speed 5538.09 samples/sec   Loss 8.2228   LearningRate 0.0518   Epoch: 5   Global Step: 28320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:06:14,350-Speed 5565.33 samples/sec   Loss 8.3775   LearningRate 0.0518   Epoch: 5   Global Step: 28330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:16,185-Speed 5583.16 samples/sec   Loss 8.4580   LearningRate 0.0518   Epoch: 5   Global Step: 28340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:18,028-Speed 5556.96 samples/sec   Loss 8.3969   LearningRate 0.0518   Epoch: 5   Global Step: 28350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:19,862-Speed 5587.94 samples/sec   Loss 8.1387   LearningRate 0.0518   Epoch: 5   Global Step: 28360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:21,692-Speed 5599.16 samples/sec   Loss 8.4050   LearningRate 0.0518   Epoch: 5   Global Step: 28370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:23,526-Speed 5586.45 samples/sec   Loss 8.2246   LearningRate 0.0518   Epoch: 5   Global Step: 28380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:25,365-Speed 5570.61 samples/sec   Loss 8.4814   LearningRate 0.0517   Epoch: 5   Global Step: 28390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:27,219-Speed 5525.24 samples/sec   Loss 8.2688   LearningRate 0.0517   Epoch: 5   Global Step: 28400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:29,056-Speed 5577.82 samples/sec   Loss 8.4238   LearningRate 0.0517   Epoch: 5   Global Step: 28410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:30,877-Speed 5626.91 samples/sec   Loss 8.2021   LearningRate 0.0517   Epoch: 5   Global Step: 28420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 12:06:32,709-Speed 5588.68 samples/sec   Loss 8.2140   LearningRate 0.0517   Epoch: 5   Global Step: 28430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 12:06:34,573-Speed 5497.67 samples/sec   Loss 8.1906   LearningRate 0.0517   Epoch: 5   Global Step: 28440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 12:06:36,405-Speed 5594.38 samples/sec   Loss 8.2916   LearningRate 0.0517   Epoch: 5   Global Step: 28450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 12:06:38,304-Speed 5395.28 samples/sec   Loss 8.4215   LearningRate 0.0516   Epoch: 5   Global Step: 28460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 12:06:40,162-Speed 5514.09 samples/sec   Loss 8.4582   LearningRate 0.0516   Epoch: 5   Global Step: 28470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 12:06:42,003-Speed 5564.11 samples/sec   Loss 8.1978   LearningRate 0.0516   Epoch: 5   Global Step: 28480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 12:06:43,874-Speed 5476.23 samples/sec   Loss 8.4248   LearningRate 0.0516   Epoch: 5   Global Step: 28490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 12:06:45,718-Speed 5557.03 samples/sec   Loss 8.2315   LearningRate 0.0516   Epoch: 5   Global Step: 28500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 12:06:47,572-Speed 5526.29 samples/sec   Loss 8.2005   LearningRate 0.0516   Epoch: 5   Global Step: 28510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 12:06:49,409-Speed 5578.44 samples/sec   Loss 8.3637   LearningRate 0.0516   Epoch: 5   Global Step: 28520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:51,237-Speed 5601.71 samples/sec   Loss 8.2882   LearningRate 0.0515   Epoch: 5   Global Step: 28530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:53,095-Speed 5516.27 samples/sec   Loss 8.2709   LearningRate 0.0515   Epoch: 5   Global Step: 28540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:54,960-Speed 5493.75 samples/sec   Loss 8.1551   LearningRate 0.0515   Epoch: 5   Global Step: 28550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:56,789-Speed 5601.11 samples/sec   Loss 8.2968   LearningRate 0.0515   Epoch: 5   Global Step: 28560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:06:58,629-Speed 5569.14 samples/sec   Loss 8.3752   LearningRate 0.0515   Epoch: 5   Global Step: 28570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:00,507-Speed 5455.49 samples/sec   Loss 8.1867   LearningRate 0.0515   Epoch: 5   Global Step: 28580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:02,376-Speed 5481.38 samples/sec   Loss 8.3019   LearningRate 0.0515   Epoch: 5   Global Step: 28590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:04,245-Speed 5480.49 samples/sec   Loss 8.3514   LearningRate 0.0514   Epoch: 5   Global Step: 28600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:06,080-Speed 5582.58 samples/sec   Loss 8.2931   LearningRate 0.0514   Epoch: 5   Global Step: 28610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:07,952-Speed 5473.64 samples/sec   Loss 8.4357   LearningRate 0.0514   Epoch: 5   Global Step: 28620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:09,787-Speed 5586.01 samples/sec   Loss 8.2628   LearningRate 0.0514   Epoch: 5   Global Step: 28630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:11,753-Speed 5211.72 samples/sec   Loss 8.2155   LearningRate 0.0514   Epoch: 5   Global Step: 28640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:13,608-Speed 5521.99 samples/sec   Loss 8.0900   LearningRate 0.0514   Epoch: 5   Global Step: 28650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:15,480-Speed 5472.78 samples/sec   Loss 8.3342   LearningRate 0.0514   Epoch: 5   Global Step: 28660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:17,336-Speed 5518.50 samples/sec   Loss 8.4144   LearningRate 0.0513   Epoch: 5   Global Step: 28670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:19,201-Speed 5498.82 samples/sec   Loss 8.4299   LearningRate 0.0513   Epoch: 5   Global Step: 28680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:21,064-Speed 5497.82 samples/sec   Loss 8.2923   LearningRate 0.0513   Epoch: 5   Global Step: 28690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:22,919-Speed 5525.32 samples/sec   Loss 8.1640   LearningRate 0.0513   Epoch: 5   Global Step: 28700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:24,765-Speed 5549.95 samples/sec   Loss 8.3031   LearningRate 0.0513   Epoch: 5   Global Step: 28710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:26,604-Speed 5570.48 samples/sec   Loss 8.2356   LearningRate 0.0513   Epoch: 5   Global Step: 28720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:28,451-Speed 5548.19 samples/sec   Loss 8.2920   LearningRate 0.0513   Epoch: 5   Global Step: 28730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:30,295-Speed 5556.22 samples/sec   Loss 8.1844   LearningRate 0.0513   Epoch: 5   Global Step: 28740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:32,134-Speed 5572.16 samples/sec   Loss 8.2185   LearningRate 0.0512   Epoch: 5   Global Step: 28750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:33,972-Speed 5574.37 samples/sec   Loss 8.3014   LearningRate 0.0512   Epoch: 5   Global Step: 28760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:35,825-Speed 5529.24 samples/sec   Loss 8.1919   LearningRate 0.0512   Epoch: 5   Global Step: 28770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:37,674-Speed 5541.04 samples/sec   Loss 8.3381   LearningRate 0.0512   Epoch: 5   Global Step: 28780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:39,514-Speed 5568.07 samples/sec   Loss 8.1495   LearningRate 0.0512   Epoch: 5   Global Step: 28790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:41,347-Speed 5590.40 samples/sec   Loss 8.2527   LearningRate 0.0512   Epoch: 5   Global Step: 28800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:43,186-Speed 5567.77 samples/sec   Loss 8.2253   LearningRate 0.0512   Epoch: 5   Global Step: 28810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:45,033-Speed 5550.15 samples/sec   Loss 8.2291   LearningRate 0.0511   Epoch: 5   Global Step: 28820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:46,890-Speed 5517.48 samples/sec   Loss 8.2480   LearningRate 0.0511   Epoch: 5   Global Step: 28830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:48,729-Speed 5571.85 samples/sec   Loss 8.3740   LearningRate 0.0511   Epoch: 5   Global Step: 28840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:50,580-Speed 5534.37 samples/sec   Loss 8.1921   LearningRate 0.0511   Epoch: 5   Global Step: 28850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:52,471-Speed 5417.19 samples/sec   Loss 8.4349   LearningRate 0.0511   Epoch: 5   Global Step: 28860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:07:54,310-Speed 5571.76 samples/sec   Loss 8.1867   LearningRate 0.0511   Epoch: 5   Global Step: 28870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:56,179-Speed 5481.11 samples/sec   Loss 8.3780   LearningRate 0.0511   Epoch: 5   Global Step: 28880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:58,021-Speed 5562.45 samples/sec   Loss 8.2977   LearningRate 0.0510   Epoch: 5   Global Step: 28890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:07:59,910-Speed 5423.94 samples/sec   Loss 8.2476   LearningRate 0.0510   Epoch: 5   Global Step: 28900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:08:01,740-Speed 5599.65 samples/sec   Loss 8.3718   LearningRate 0.0510   Epoch: 5   Global Step: 28910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:03,609-Speed 5481.99 samples/sec   Loss 8.2592   LearningRate 0.0510   Epoch: 5   Global Step: 28920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:05,446-Speed 5576.36 samples/sec   Loss 8.3387   LearningRate 0.0510   Epoch: 5   Global Step: 28930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:07,299-Speed 5530.54 samples/sec   Loss 8.1448   LearningRate 0.0510   Epoch: 5   Global Step: 28940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:09,133-Speed 5584.89 samples/sec   Loss 8.3823   LearningRate 0.0510   Epoch: 5   Global Step: 28950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:10,991-Speed 5513.50 samples/sec   Loss 8.2973   LearningRate 0.0509   Epoch: 5   Global Step: 28960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:12,862-Speed 5476.32 samples/sec   Loss 8.4185   LearningRate 0.0509   Epoch: 5   Global Step: 28970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:14,757-Speed 5408.48 samples/sec   Loss 8.1122   LearningRate 0.0509   Epoch: 5   Global Step: 28980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:16,615-Speed 5514.55 samples/sec   Loss 8.2383   LearningRate 0.0509   Epoch: 5   Global Step: 28990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:18,471-Speed 5527.34 samples/sec   Loss 8.1854   LearningRate 0.0509   Epoch: 5   Global Step: 29000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:20,316-Speed 5555.61 samples/sec   Loss 8.2587   LearningRate 0.0509   Epoch: 5   Global Step: 29010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:08:22,178-Speed 5500.75 samples/sec   Loss 8.2797   LearningRate 0.0509   Epoch: 5   Global Step: 29020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:24,049-Speed 5475.98 samples/sec   Loss 8.0976   LearningRate 0.0508   Epoch: 5   Global Step: 29030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:25,954-Speed 5377.59 samples/sec   Loss 8.2116   LearningRate 0.0508   Epoch: 5   Global Step: 29040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:27,842-Speed 5458.27 samples/sec   Loss 8.2585   LearningRate 0.0508   Epoch: 5   Global Step: 29050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:29,690-Speed 5546.84 samples/sec   Loss 8.3012   LearningRate 0.0508   Epoch: 5   Global Step: 29060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:31,550-Speed 5507.05 samples/sec   Loss 8.2457   LearningRate 0.0508   Epoch: 5   Global Step: 29070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:33,387-Speed 5578.86 samples/sec   Loss 8.1583   LearningRate 0.0508   Epoch: 5   Global Step: 29080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:35,267-Speed 5449.59 samples/sec   Loss 8.3500   LearningRate 0.0508   Epoch: 5   Global Step: 29090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:37,152-Speed 5434.06 samples/sec   Loss 8.2778   LearningRate 0.0507   Epoch: 5   Global Step: 29100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:38,997-Speed 5555.66 samples/sec   Loss 8.1743   LearningRate 0.0507   Epoch: 5   Global Step: 29110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:40,870-Speed 5468.95 samples/sec   Loss 8.1253   LearningRate 0.0507   Epoch: 5   Global Step: 29120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:08:42,704-Speed 5587.70 samples/sec   Loss 8.2585   LearningRate 0.0507   Epoch: 5   Global Step: 29130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:08:44,552-Speed 5545.44 samples/sec   Loss 8.2064   LearningRate 0.0507   Epoch: 5   Global Step: 29140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:46,390-Speed 5572.76 samples/sec   Loss 8.2100   LearningRate 0.0507   Epoch: 5   Global Step: 29150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:48,222-Speed 5593.58 samples/sec   Loss 8.1507   LearningRate 0.0507   Epoch: 5   Global Step: 29160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:50,078-Speed 5518.85 samples/sec   Loss 8.1677   LearningRate 0.0506   Epoch: 5   Global Step: 29170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:51,922-Speed 5557.89 samples/sec   Loss 8.3150   LearningRate 0.0506   Epoch: 5   Global Step: 29180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:53,778-Speed 5520.05 samples/sec   Loss 8.1007   LearningRate 0.0506   Epoch: 5   Global Step: 29190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:55,630-Speed 5531.73 samples/sec   Loss 8.0966   LearningRate 0.0506   Epoch: 5   Global Step: 29200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:57,496-Speed 5490.91 samples/sec   Loss 8.2018   LearningRate 0.0506   Epoch: 5   Global Step: 29210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:08:59,342-Speed 5551.45 samples/sec   Loss 8.3142   LearningRate 0.0506   Epoch: 5   Global Step: 29220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:01,196-Speed 5527.37 samples/sec   Loss 8.1559   LearningRate 0.0506   Epoch: 5   Global Step: 29230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:03,033-Speed 5576.33 samples/sec   Loss 8.3141   LearningRate 0.0505   Epoch: 5   Global Step: 29240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:04,912-Speed 5454.32 samples/sec   Loss 8.0888   LearningRate 0.0505   Epoch: 5   Global Step: 29250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:06,760-Speed 5544.27 samples/sec   Loss 8.2265   LearningRate 0.0505   Epoch: 5   Global Step: 29260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:08,606-Speed 5551.05 samples/sec   Loss 8.2333   LearningRate 0.0505   Epoch: 5   Global Step: 29270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:10,489-Speed 5437.94 samples/sec   Loss 8.2119   LearningRate 0.0505   Epoch: 5   Global Step: 29280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:12,333-Speed 5555.91 samples/sec   Loss 8.1201   LearningRate 0.0505   Epoch: 5   Global Step: 29290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:14,208-Speed 5466.93 samples/sec   Loss 8.1023   LearningRate 0.0505   Epoch: 5   Global Step: 29300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:16,068-Speed 5508.44 samples/sec   Loss 8.3206   LearningRate 0.0504   Epoch: 5   Global Step: 29310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:17,910-Speed 5560.74 samples/sec   Loss 8.1367   LearningRate 0.0504   Epoch: 5   Global Step: 29320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:19,782-Speed 5474.35 samples/sec   Loss 8.1176   LearningRate 0.0504   Epoch: 5   Global Step: 29330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:21,608-Speed 5609.88 samples/sec   Loss 8.2028   LearningRate 0.0504   Epoch: 5   Global Step: 29340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:23,456-Speed 5545.57 samples/sec   Loss 8.1617   LearningRate 0.0504   Epoch: 5   Global Step: 29350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:25,322-Speed 5490.43 samples/sec   Loss 8.1236   LearningRate 0.0504   Epoch: 5   Global Step: 29360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:27,195-Speed 5469.77 samples/sec   Loss 8.0855   LearningRate 0.0504   Epoch: 5   Global Step: 29370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:29,052-Speed 5514.65 samples/sec   Loss 8.3254   LearningRate 0.0503   Epoch: 5   Global Step: 29380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:30,906-Speed 5529.76 samples/sec   Loss 8.2028   LearningRate 0.0503   Epoch: 5   Global Step: 29390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:32,766-Speed 5507.01 samples/sec   Loss 8.1420   LearningRate 0.0503   Epoch: 5   Global Step: 29400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:34,611-Speed 5554.00 samples/sec   Loss 8.3020   LearningRate 0.0503   Epoch: 5   Global Step: 29410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:36,453-Speed 5560.08 samples/sec   Loss 8.3329   LearningRate 0.0503   Epoch: 5   Global Step: 29420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:38,307-Speed 5527.79 samples/sec   Loss 8.3249   LearningRate 0.0503   Epoch: 5   Global Step: 29430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:40,142-Speed 5582.72 samples/sec   Loss 8.1347   LearningRate 0.0503   Epoch: 5   Global Step: 29440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:41,985-Speed 5559.43 samples/sec   Loss 8.1604   LearningRate 0.0503   Epoch: 5   Global Step: 29450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:43,841-Speed 5519.24 samples/sec   Loss 8.1399   LearningRate 0.0502   Epoch: 5   Global Step: 29460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:45,694-Speed 5530.06 samples/sec   Loss 7.9869   LearningRate 0.0502   Epoch: 5   Global Step: 29470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:09:47,568-Speed 5467.26 samples/sec   Loss 8.3419   LearningRate 0.0502   Epoch: 5   Global Step: 29480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:49,409-Speed 5565.76 samples/sec   Loss 8.1693   LearningRate 0.0502   Epoch: 5   Global Step: 29490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:51,288-Speed 5450.09 samples/sec   Loss 8.4249   LearningRate 0.0502   Epoch: 5   Global Step: 29500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:53,142-Speed 5527.92 samples/sec   Loss 8.1664   LearningRate 0.0502   Epoch: 5   Global Step: 29510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:55,010-Speed 5485.08 samples/sec   Loss 8.1525   LearningRate 0.0502   Epoch: 5   Global Step: 29520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:56,872-Speed 5502.07 samples/sec   Loss 8.3148   LearningRate 0.0501   Epoch: 5   Global Step: 29530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:09:58,705-Speed 5589.05 samples/sec   Loss 8.2683   LearningRate 0.0501   Epoch: 5   Global Step: 29540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:00,562-Speed 5515.77 samples/sec   Loss 8.2660   LearningRate 0.0501   Epoch: 5   Global Step: 29550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:02,404-Speed 5563.43 samples/sec   Loss 8.2039   LearningRate 0.0501   Epoch: 5   Global Step: 29560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:04,250-Speed 5547.58 samples/sec   Loss 8.0126   LearningRate 0.0501   Epoch: 5   Global Step: 29570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:06,135-Speed 5436.45 samples/sec   Loss 8.0582   LearningRate 0.0501   Epoch: 5   Global Step: 29580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:07,989-Speed 5526.46 samples/sec   Loss 8.0846   LearningRate 0.0501   Epoch: 5   Global Step: 29590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:09,830-Speed 5562.64 samples/sec   Loss 8.1622   LearningRate 0.0500   Epoch: 5   Global Step: 29600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:11,682-Speed 5533.71 samples/sec   Loss 8.2724   LearningRate 0.0500   Epoch: 5   Global Step: 29610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:13,539-Speed 5515.43 samples/sec   Loss 8.2620   LearningRate 0.0500   Epoch: 5   Global Step: 29620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:15,410-Speed 5482.86 samples/sec   Loss 8.0825   LearningRate 0.0500   Epoch: 5   Global Step: 29630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:17,252-Speed 5560.78 samples/sec   Loss 8.2017   LearningRate 0.0500   Epoch: 5   Global Step: 29640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:10:19,123-Speed 5476.31 samples/sec   Loss 8.3208   LearningRate 0.0500   Epoch: 5   Global Step: 29650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:10:20,953-Speed 5597.76 samples/sec   Loss 8.3224   LearningRate 0.0500   Epoch: 5   Global Step: 29660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:10:22,792-Speed 5572.25 samples/sec   Loss 8.2101   LearningRate 0.0499   Epoch: 5   Global Step: 29670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:10:24,642-Speed 5538.45 samples/sec   Loss 8.3978   LearningRate 0.0499   Epoch: 5   Global Step: 29680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:10:26,503-Speed 5502.23 samples/sec   Loss 8.0928   LearningRate 0.0499   Epoch: 5   Global Step: 29690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:10:28,340-Speed 5577.95 samples/sec   Loss 8.0851   LearningRate 0.0499   Epoch: 5   Global Step: 29700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:10:30,227-Speed 5430.99 samples/sec   Loss 8.1053   LearningRate 0.0499   Epoch: 5   Global Step: 29710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:10:32,046-Speed 5630.65 samples/sec   Loss 8.2263   LearningRate 0.0499   Epoch: 5   Global Step: 29720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:10:33,897-Speed 5534.99 samples/sec   Loss 8.1652   LearningRate 0.0499   Epoch: 5   Global Step: 29730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:10:35,754-Speed 5517.13 samples/sec   Loss 8.1492   LearningRate 0.0498   Epoch: 5   Global Step: 29740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:10:37,630-Speed 5461.68 samples/sec   Loss 8.2106   LearningRate 0.0498   Epoch: 5   Global Step: 29750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:10:39,481-Speed 5566.93 samples/sec   Loss 8.0671   LearningRate 0.0498   Epoch: 5   Global Step: 29760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:10:41,352-Speed 5476.95 samples/sec   Loss 8.2335   LearningRate 0.0498   Epoch: 5   Global Step: 29770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:10:43,189-Speed 5577.59 samples/sec   Loss 8.3034   LearningRate 0.0498   Epoch: 5   Global Step: 29780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:10:45,060-Speed 5472.93 samples/sec   Loss 8.1283   LearningRate 0.0498   Epoch: 5   Global Step: 29790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:10:46,902-Speed 5565.63 samples/sec   Loss 8.3201   LearningRate 0.0498   Epoch: 5   Global Step: 29800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:10:48,779-Speed 5457.94 samples/sec   Loss 8.2143   LearningRate 0.0497   Epoch: 5   Global Step: 29810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:10:50,645-Speed 5489.89 samples/sec   Loss 8.0630   LearningRate 0.0497   Epoch: 5   Global Step: 29820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:52,485-Speed 5566.97 samples/sec   Loss 8.1768   LearningRate 0.0497   Epoch: 5   Global Step: 29830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:54,340-Speed 5524.49 samples/sec   Loss 7.9531   LearningRate 0.0497   Epoch: 5   Global Step: 29840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:56,208-Speed 5482.77 samples/sec   Loss 8.2399   LearningRate 0.0497   Epoch: 5   Global Step: 29850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:58,055-Speed 5548.03 samples/sec   Loss 8.1340   LearningRate 0.0497   Epoch: 5   Global Step: 29860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:10:59,915-Speed 5507.78 samples/sec   Loss 8.1942   LearningRate 0.0497   Epoch: 5   Global Step: 29870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:01,763-Speed 5545.48 samples/sec   Loss 8.2016   LearningRate 0.0496   Epoch: 5   Global Step: 29880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:03,609-Speed 5549.68 samples/sec   Loss 8.0189   LearningRate 0.0496   Epoch: 5   Global Step: 29890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:05,467-Speed 5510.47 samples/sec   Loss 8.1015   LearningRate 0.0496   Epoch: 5   Global Step: 29900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:07,322-Speed 5525.83 samples/sec   Loss 8.0681   LearningRate 0.0496   Epoch: 5   Global Step: 29910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:09,187-Speed 5493.57 samples/sec   Loss 8.1087   LearningRate 0.0496   Epoch: 5   Global Step: 29920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:11:11,024-Speed 5577.63 samples/sec   Loss 8.1826   LearningRate 0.0496   Epoch: 5   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:12,862-Speed 5570.65 samples/sec   Loss 8.3014   LearningRate 0.0496   Epoch: 5   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:14,714-Speed 5534.52 samples/sec   Loss 8.0661   LearningRate 0.0496   Epoch: 5   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:16,558-Speed 5558.70 samples/sec   Loss 8.1836   LearningRate 0.0495   Epoch: 5   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:18,430-Speed 5470.95 samples/sec   Loss 8.2496   LearningRate 0.0495   Epoch: 5   Global Step: 29970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:20,267-Speed 5578.38 samples/sec   Loss 8.1561   LearningRate 0.0495   Epoch: 5   Global Step: 29980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:22,115-Speed 5542.91 samples/sec   Loss 7.9694   LearningRate 0.0495   Epoch: 5   Global Step: 29990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:23,966-Speed 5535.17 samples/sec   Loss 8.1143   LearningRate 0.0495   Epoch: 5   Global Step: 30000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:11:51,409-[lfw][30000]XNorm: 22.979852
Training: 2022-04-11 12:11:51,409-[lfw][30000]Accuracy-Flip: 0.99650+-0.00302
Training: 2022-04-11 12:11:51,410-[lfw][30000]Accuracy-Highest: 0.99683
Training: 2022-04-11 12:12:22,840-[cfp_fp][30000]XNorm: 19.932093
Training: 2022-04-11 12:12:22,841-[cfp_fp][30000]Accuracy-Flip: 0.95786+-0.00846
Training: 2022-04-11 12:12:22,842-[cfp_fp][30000]Accuracy-Highest: 0.95786
Training: 2022-04-11 12:12:50,050-[agedb_30][30000]XNorm: 22.505847
Training: 2022-04-11 12:12:50,051-[agedb_30][30000]Accuracy-Flip: 0.97417+-0.00739
Training: 2022-04-11 12:12:50,052-[agedb_30][30000]Accuracy-Highest: 0.97417
Training: 2022-04-11 12:12:51,907-Speed 116.44 samples/sec   Loss 8.1568   LearningRate 0.0495   Epoch: 5   Global Step: 30010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:12:53,739-Speed 5589.31 samples/sec   Loss 8.0571   LearningRate 0.0495   Epoch: 5   Global Step: 30020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:12:55,582-Speed 5558.58 samples/sec   Loss 8.2008   LearningRate 0.0494   Epoch: 5   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:12:57,421-Speed 5569.70 samples/sec   Loss 8.1877   LearningRate 0.0494   Epoch: 5   Global Step: 30040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:12:59,264-Speed 5563.79 samples/sec   Loss 8.2891   LearningRate 0.0494   Epoch: 5   Global Step: 30050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:13:01,142-Speed 5454.94 samples/sec   Loss 8.4213   LearningRate 0.0494   Epoch: 5   Global Step: 30060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:13:02,992-Speed 5538.30 samples/sec   Loss 8.1191   LearningRate 0.0494   Epoch: 5   Global Step: 30070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:13:04,854-Speed 5504.70 samples/sec   Loss 8.3017   LearningRate 0.0494   Epoch: 5   Global Step: 30080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:13:06,707-Speed 5526.65 samples/sec   Loss 8.1812   LearningRate 0.0494   Epoch: 5   Global Step: 30090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:13:08,537-Speed 5599.16 samples/sec   Loss 8.2713   LearningRate 0.0493   Epoch: 5   Global Step: 30100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:13:10,405-Speed 5483.56 samples/sec   Loss 8.1517   LearningRate 0.0493   Epoch: 5   Global Step: 30110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:13:12,263-Speed 5515.16 samples/sec   Loss 8.3442   LearningRate 0.0493   Epoch: 5   Global Step: 30120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:13:14,117-Speed 5526.06 samples/sec   Loss 8.0572   LearningRate 0.0493   Epoch: 5   Global Step: 30130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:13:16,002-Speed 5435.29 samples/sec   Loss 8.2055   LearningRate 0.0493   Epoch: 5   Global Step: 30140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:13:17,861-Speed 5510.50 samples/sec   Loss 8.1478   LearningRate 0.0493   Epoch: 5   Global Step: 30150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 12:13:19,723-Speed 5503.52 samples/sec   Loss 8.0420   LearningRate 0.0493   Epoch: 5   Global Step: 30160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 12:13:21,568-Speed 5552.63 samples/sec   Loss 8.1616   LearningRate 0.0492   Epoch: 5   Global Step: 30170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:13:23,434-Speed 5490.95 samples/sec   Loss 7.9543   LearningRate 0.0492   Epoch: 5   Global Step: 30180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:13:25,316-Speed 5444.94 samples/sec   Loss 8.1337   LearningRate 0.0492   Epoch: 5   Global Step: 30190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:13:27,185-Speed 5481.27 samples/sec   Loss 8.1655   LearningRate 0.0492   Epoch: 5   Global Step: 30200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:13:29,027-Speed 5561.08 samples/sec   Loss 8.0569   LearningRate 0.0492   Epoch: 5   Global Step: 30210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:13:30,876-Speed 5541.12 samples/sec   Loss 8.1606   LearningRate 0.0492   Epoch: 5   Global Step: 30220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:13:32,714-Speed 5572.64 samples/sec   Loss 8.0982   LearningRate 0.0492   Epoch: 5   Global Step: 30230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:13:34,589-Speed 5465.76 samples/sec   Loss 8.1161   LearningRate 0.0491   Epoch: 5   Global Step: 30240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:13:36,425-Speed 5579.12 samples/sec   Loss 8.1440   LearningRate 0.0491   Epoch: 5   Global Step: 30250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:13:38,288-Speed 5500.69 samples/sec   Loss 8.2080   LearningRate 0.0491   Epoch: 5   Global Step: 30260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:13:40,123-Speed 5582.54 samples/sec   Loss 8.0763   LearningRate 0.0491   Epoch: 5   Global Step: 30270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:13:41,967-Speed 5555.70 samples/sec   Loss 8.1450   LearningRate 0.0491   Epoch: 5   Global Step: 30280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:13:43,798-Speed 5594.75 samples/sec   Loss 8.0893   LearningRate 0.0491   Epoch: 5   Global Step: 30290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:13:45,645-Speed 5546.29 samples/sec   Loss 8.0887   LearningRate 0.0491   Epoch: 5   Global Step: 30300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:13:47,482-Speed 5579.18 samples/sec   Loss 8.2007   LearningRate 0.0491   Epoch: 5   Global Step: 30310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:13:49,317-Speed 5579.92 samples/sec   Loss 8.0463   LearningRate 0.0490   Epoch: 5   Global Step: 30320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:13:51,195-Speed 5457.26 samples/sec   Loss 8.3381   LearningRate 0.0490   Epoch: 5   Global Step: 30330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:13:53,095-Speed 5392.61 samples/sec   Loss 8.3902   LearningRate 0.0490   Epoch: 5   Global Step: 30340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:04,635-Speed 887.46 samples/sec   Loss 7.9798   LearningRate 0.0490   Epoch: 6   Global Step: 30350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:06,534-Speed 5395.39 samples/sec   Loss 7.3059   LearningRate 0.0490   Epoch: 6   Global Step: 30360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:08,409-Speed 5465.75 samples/sec   Loss 7.2502   LearningRate 0.0490   Epoch: 6   Global Step: 30370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:10,277-Speed 5485.13 samples/sec   Loss 7.2784   LearningRate 0.0490   Epoch: 6   Global Step: 30380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:12,149-Speed 5471.43 samples/sec   Loss 7.1624   LearningRate 0.0489   Epoch: 6   Global Step: 30390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:14,038-Speed 5424.16 samples/sec   Loss 7.2509   LearningRate 0.0489   Epoch: 6   Global Step: 30400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:15,941-Speed 5385.45 samples/sec   Loss 7.3422   LearningRate 0.0489   Epoch: 6   Global Step: 30410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:17,804-Speed 5497.04 samples/sec   Loss 7.2362   LearningRate 0.0489   Epoch: 6   Global Step: 30420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:19,668-Speed 5498.08 samples/sec   Loss 7.2689   LearningRate 0.0489   Epoch: 6   Global Step: 30430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:21,535-Speed 5488.12 samples/sec   Loss 7.4526   LearningRate 0.0489   Epoch: 6   Global Step: 30440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:23,390-Speed 5522.39 samples/sec   Loss 7.3569   LearningRate 0.0489   Epoch: 6   Global Step: 30450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:25,232-Speed 5561.91 samples/sec   Loss 7.4468   LearningRate 0.0488   Epoch: 6   Global Step: 30460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:27,086-Speed 5525.02 samples/sec   Loss 7.4102   LearningRate 0.0488   Epoch: 6   Global Step: 30470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:29,015-Speed 5311.25 samples/sec   Loss 7.3867   LearningRate 0.0488   Epoch: 6   Global Step: 30480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:30,882-Speed 5486.36 samples/sec   Loss 7.4307   LearningRate 0.0488   Epoch: 6   Global Step: 30490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:32,755-Speed 5471.97 samples/sec   Loss 7.5243   LearningRate 0.0488   Epoch: 6   Global Step: 30500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:34,577-Speed 5623.13 samples/sec   Loss 7.5794   LearningRate 0.0488   Epoch: 6   Global Step: 30510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:36,435-Speed 5512.22 samples/sec   Loss 7.4054   LearningRate 0.0488   Epoch: 6   Global Step: 30520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:38,289-Speed 5526.75 samples/sec   Loss 7.5556   LearningRate 0.0487   Epoch: 6   Global Step: 30530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:40,125-Speed 5581.61 samples/sec   Loss 7.5434   LearningRate 0.0487   Epoch: 6   Global Step: 30540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:41,964-Speed 5571.42 samples/sec   Loss 7.3630   LearningRate 0.0487   Epoch: 6   Global Step: 30550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:14:43,808-Speed 5553.76 samples/sec   Loss 7.5939   LearningRate 0.0487   Epoch: 6   Global Step: 30560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:14:45,673-Speed 5497.08 samples/sec   Loss 7.5506   LearningRate 0.0487   Epoch: 6   Global Step: 30570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:14:47,513-Speed 5568.04 samples/sec   Loss 7.4743   LearningRate 0.0487   Epoch: 6   Global Step: 30580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:14:49,408-Speed 5407.25 samples/sec   Loss 7.5177   LearningRate 0.0487   Epoch: 6   Global Step: 30590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:14:51,248-Speed 5567.09 samples/sec   Loss 7.4754   LearningRate 0.0487   Epoch: 6   Global Step: 30600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:14:53,127-Speed 5450.68 samples/sec   Loss 7.4088   LearningRate 0.0486   Epoch: 6   Global Step: 30610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:14:54,967-Speed 5569.78 samples/sec   Loss 7.4078   LearningRate 0.0486   Epoch: 6   Global Step: 30620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:14:56,819-Speed 5533.46 samples/sec   Loss 7.5664   LearningRate 0.0486   Epoch: 6   Global Step: 30630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:14:58,704-Speed 5434.17 samples/sec   Loss 7.6147   LearningRate 0.0486   Epoch: 6   Global Step: 30640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:00,555-Speed 5534.47 samples/sec   Loss 7.5083   LearningRate 0.0486   Epoch: 6   Global Step: 30650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:02,397-Speed 5566.03 samples/sec   Loss 7.5391   LearningRate 0.0486   Epoch: 6   Global Step: 30660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:15:04,293-Speed 5402.21 samples/sec   Loss 7.5387   LearningRate 0.0486   Epoch: 6   Global Step: 30670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:15:06,137-Speed 5557.20 samples/sec   Loss 7.3934   LearningRate 0.0485   Epoch: 6   Global Step: 30680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:08,017-Speed 5450.97 samples/sec   Loss 7.6460   LearningRate 0.0485   Epoch: 6   Global Step: 30690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:09,856-Speed 5571.75 samples/sec   Loss 7.7221   LearningRate 0.0485   Epoch: 6   Global Step: 30700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:11,712-Speed 5519.79 samples/sec   Loss 7.4344   LearningRate 0.0485   Epoch: 6   Global Step: 30710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:13,579-Speed 5488.52 samples/sec   Loss 7.6509   LearningRate 0.0485   Epoch: 6   Global Step: 30720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:15,440-Speed 5505.30 samples/sec   Loss 7.4932   LearningRate 0.0485   Epoch: 6   Global Step: 30730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:17,292-Speed 5531.35 samples/sec   Loss 7.6282   LearningRate 0.0485   Epoch: 6   Global Step: 30740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:19,147-Speed 5523.68 samples/sec   Loss 7.4510   LearningRate 0.0484   Epoch: 6   Global Step: 30750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:21,023-Speed 5461.64 samples/sec   Loss 7.5707   LearningRate 0.0484   Epoch: 6   Global Step: 30760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:22,884-Speed 5507.12 samples/sec   Loss 7.7058   LearningRate 0.0484   Epoch: 6   Global Step: 30770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:24,762-Speed 5455.50 samples/sec   Loss 7.7447   LearningRate 0.0484   Epoch: 6   Global Step: 30780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:15:26,640-Speed 5455.86 samples/sec   Loss 7.6119   LearningRate 0.0484   Epoch: 6   Global Step: 30790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:15:28,461-Speed 5625.90 samples/sec   Loss 7.6775   LearningRate 0.0484   Epoch: 6   Global Step: 30800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:15:30,359-Speed 5398.42 samples/sec   Loss 7.7338   LearningRate 0.0484   Epoch: 6   Global Step: 30810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:15:32,207-Speed 5542.31 samples/sec   Loss 7.6595   LearningRate 0.0483   Epoch: 6   Global Step: 30820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:15:34,070-Speed 5501.78 samples/sec   Loss 7.6825   LearningRate 0.0483   Epoch: 6   Global Step: 30830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:15:35,902-Speed 5591.39 samples/sec   Loss 7.7567   LearningRate 0.0483   Epoch: 6   Global Step: 30840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:15:37,775-Speed 5469.68 samples/sec   Loss 7.7843   LearningRate 0.0483   Epoch: 6   Global Step: 30850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:15:39,651-Speed 5462.74 samples/sec   Loss 7.6855   LearningRate 0.0483   Epoch: 6   Global Step: 30860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:15:41,507-Speed 5518.43 samples/sec   Loss 7.6283   LearningRate 0.0483   Epoch: 6   Global Step: 30870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:15:43,390-Speed 5442.15 samples/sec   Loss 7.8191   LearningRate 0.0483   Epoch: 6   Global Step: 30880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:15:45,224-Speed 5585.74 samples/sec   Loss 7.8524   LearningRate 0.0483   Epoch: 6   Global Step: 30890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:15:47,104-Speed 5448.25 samples/sec   Loss 7.5946   LearningRate 0.0482   Epoch: 6   Global Step: 30900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:48,991-Speed 5430.79 samples/sec   Loss 7.6516   LearningRate 0.0482   Epoch: 6   Global Step: 30910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:50,837-Speed 5549.49 samples/sec   Loss 7.8220   LearningRate 0.0482   Epoch: 6   Global Step: 30920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:52,675-Speed 5574.53 samples/sec   Loss 7.5784   LearningRate 0.0482   Epoch: 6   Global Step: 30930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:54,518-Speed 5557.98 samples/sec   Loss 7.7981   LearningRate 0.0482   Epoch: 6   Global Step: 30940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:56,356-Speed 5571.77 samples/sec   Loss 7.7871   LearningRate 0.0482   Epoch: 6   Global Step: 30950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:15:58,208-Speed 5531.96 samples/sec   Loss 7.7279   LearningRate 0.0482   Epoch: 6   Global Step: 30960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:00,064-Speed 5523.75 samples/sec   Loss 7.6603   LearningRate 0.0481   Epoch: 6   Global Step: 30970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:01,901-Speed 5574.20 samples/sec   Loss 7.7035   LearningRate 0.0481   Epoch: 6   Global Step: 30980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:03,746-Speed 5553.63 samples/sec   Loss 7.6348   LearningRate 0.0481   Epoch: 6   Global Step: 30990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:05,581-Speed 5584.64 samples/sec   Loss 7.7047   LearningRate 0.0481   Epoch: 6   Global Step: 31000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:16:07,410-Speed 5599.60 samples/sec   Loss 7.7531   LearningRate 0.0481   Epoch: 6   Global Step: 31010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:09,266-Speed 5521.13 samples/sec   Loss 7.8909   LearningRate 0.0481   Epoch: 6   Global Step: 31020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:11,162-Speed 5404.07 samples/sec   Loss 7.7124   LearningRate 0.0481   Epoch: 6   Global Step: 31030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:13,040-Speed 5452.64 samples/sec   Loss 7.7422   LearningRate 0.0480   Epoch: 6   Global Step: 31040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:14,894-Speed 5526.08 samples/sec   Loss 7.6490   LearningRate 0.0480   Epoch: 6   Global Step: 31050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:16,767-Speed 5472.46 samples/sec   Loss 7.7469   LearningRate 0.0480   Epoch: 6   Global Step: 31060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:18,614-Speed 5546.01 samples/sec   Loss 7.7832   LearningRate 0.0480   Epoch: 6   Global Step: 31070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:20,458-Speed 5557.59 samples/sec   Loss 7.7223   LearningRate 0.0480   Epoch: 6   Global Step: 31080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:22,333-Speed 5461.07 samples/sec   Loss 7.7036   LearningRate 0.0480   Epoch: 6   Global Step: 31090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:24,181-Speed 5545.70 samples/sec   Loss 7.7510   LearningRate 0.0480   Epoch: 6   Global Step: 31100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:26,065-Speed 5439.15 samples/sec   Loss 7.7271   LearningRate 0.0480   Epoch: 6   Global Step: 31110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:16:27,902-Speed 5576.91 samples/sec   Loss 7.7418   LearningRate 0.0479   Epoch: 6   Global Step: 31120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:16:29,739-Speed 5577.67 samples/sec   Loss 7.8867   LearningRate 0.0479   Epoch: 6   Global Step: 31130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:16:31,590-Speed 5535.45 samples/sec   Loss 7.7370   LearningRate 0.0479   Epoch: 6   Global Step: 31140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:16:33,425-Speed 5581.28 samples/sec   Loss 7.7120   LearningRate 0.0479   Epoch: 6   Global Step: 31150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:16:35,290-Speed 5496.51 samples/sec   Loss 7.8885   LearningRate 0.0479   Epoch: 6   Global Step: 31160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:37,160-Speed 5484.38 samples/sec   Loss 7.9500   LearningRate 0.0479   Epoch: 6   Global Step: 31170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:39,053-Speed 5412.63 samples/sec   Loss 7.8214   LearningRate 0.0479   Epoch: 6   Global Step: 31180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:40,927-Speed 5467.50 samples/sec   Loss 7.8986   LearningRate 0.0478   Epoch: 6   Global Step: 31190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:42,788-Speed 5506.52 samples/sec   Loss 7.8139   LearningRate 0.0478   Epoch: 6   Global Step: 31200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:44,624-Speed 5581.25 samples/sec   Loss 7.7602   LearningRate 0.0478   Epoch: 6   Global Step: 31210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:46,496-Speed 5472.69 samples/sec   Loss 7.8710   LearningRate 0.0478   Epoch: 6   Global Step: 31220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:48,361-Speed 5492.03 samples/sec   Loss 7.7914   LearningRate 0.0478   Epoch: 6   Global Step: 31230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:50,222-Speed 5507.70 samples/sec   Loss 7.8395   LearningRate 0.0478   Epoch: 6   Global Step: 31240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:52,072-Speed 5535.56 samples/sec   Loss 7.8436   LearningRate 0.0478   Epoch: 6   Global Step: 31250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:16:53,928-Speed 5520.75 samples/sec   Loss 7.9222   LearningRate 0.0477   Epoch: 6   Global Step: 31260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:16:55,786-Speed 5515.58 samples/sec   Loss 7.8208   LearningRate 0.0477   Epoch: 6   Global Step: 31270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:16:57,621-Speed 5584.03 samples/sec   Loss 7.6607   LearningRate 0.0477   Epoch: 6   Global Step: 31280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:16:59,454-Speed 5588.08 samples/sec   Loss 7.7613   LearningRate 0.0477   Epoch: 6   Global Step: 31290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:01,338-Speed 5437.42 samples/sec   Loss 7.8085   LearningRate 0.0477   Epoch: 6   Global Step: 31300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:03,188-Speed 5540.14 samples/sec   Loss 7.7764   LearningRate 0.0477   Epoch: 6   Global Step: 31310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:05,057-Speed 5480.36 samples/sec   Loss 7.7524   LearningRate 0.0477   Epoch: 6   Global Step: 31320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:06,896-Speed 5570.59 samples/sec   Loss 7.9987   LearningRate 0.0477   Epoch: 6   Global Step: 31330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:08,734-Speed 5574.04 samples/sec   Loss 7.8001   LearningRate 0.0476   Epoch: 6   Global Step: 31340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:10,581-Speed 5548.94 samples/sec   Loss 7.9100   LearningRate 0.0476   Epoch: 6   Global Step: 31350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:12,453-Speed 5473.00 samples/sec   Loss 7.8911   LearningRate 0.0476   Epoch: 6   Global Step: 31360   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 12:17:14,308-Speed 5524.37 samples/sec   Loss 7.7634   LearningRate 0.0476   Epoch: 6   Global Step: 31370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:16,181-Speed 5471.32 samples/sec   Loss 7.8077   LearningRate 0.0476   Epoch: 6   Global Step: 31380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:18,030-Speed 5538.83 samples/sec   Loss 7.8482   LearningRate 0.0476   Epoch: 6   Global Step: 31390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:19,876-Speed 5551.02 samples/sec   Loss 7.7472   LearningRate 0.0476   Epoch: 6   Global Step: 31400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:17:21,736-Speed 5508.57 samples/sec   Loss 7.6672   LearningRate 0.0475   Epoch: 6   Global Step: 31410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:17:23,572-Speed 5581.90 samples/sec   Loss 7.9211   LearningRate 0.0475   Epoch: 6   Global Step: 31420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:17:25,409-Speed 5577.54 samples/sec   Loss 7.7372   LearningRate 0.0475   Epoch: 6   Global Step: 31430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:17:27,271-Speed 5500.23 samples/sec   Loss 7.7068   LearningRate 0.0475   Epoch: 6   Global Step: 31440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:17:29,157-Speed 5431.71 samples/sec   Loss 7.7355   LearningRate 0.0475   Epoch: 6   Global Step: 31450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:17:31,000-Speed 5561.53 samples/sec   Loss 7.8547   LearningRate 0.0475   Epoch: 6   Global Step: 31460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:17:32,877-Speed 5457.81 samples/sec   Loss 7.8643   LearningRate 0.0475   Epoch: 6   Global Step: 31470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:17:34,713-Speed 5582.05 samples/sec   Loss 7.7408   LearningRate 0.0474   Epoch: 6   Global Step: 31480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:17:36,555-Speed 5560.98 samples/sec   Loss 7.8939   LearningRate 0.0474   Epoch: 6   Global Step: 31490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:17:38,406-Speed 5534.62 samples/sec   Loss 7.5938   LearningRate 0.0474   Epoch: 6   Global Step: 31500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:17:40,273-Speed 5489.35 samples/sec   Loss 8.0806   LearningRate 0.0474   Epoch: 6   Global Step: 31510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:17:42,155-Speed 5444.82 samples/sec   Loss 7.8101   LearningRate 0.0474   Epoch: 6   Global Step: 31520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:17:44,002-Speed 5544.56 samples/sec   Loss 7.8521   LearningRate 0.0474   Epoch: 6   Global Step: 31530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:17:45,865-Speed 5502.78 samples/sec   Loss 7.7985   LearningRate 0.0474   Epoch: 6   Global Step: 31540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:17:47,712-Speed 5544.55 samples/sec   Loss 7.8112   LearningRate 0.0474   Epoch: 6   Global Step: 31550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:17:49,578-Speed 5491.24 samples/sec   Loss 7.8883   LearningRate 0.0473   Epoch: 6   Global Step: 31560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:17:51,474-Speed 5404.42 samples/sec   Loss 7.9587   LearningRate 0.0473   Epoch: 6   Global Step: 31570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:17:53,328-Speed 5524.30 samples/sec   Loss 7.9567   LearningRate 0.0473   Epoch: 6   Global Step: 31580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:17:55,175-Speed 5548.09 samples/sec   Loss 7.8119   LearningRate 0.0473   Epoch: 6   Global Step: 31590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:17:57,011-Speed 5581.75 samples/sec   Loss 7.6257   LearningRate 0.0473   Epoch: 6   Global Step: 31600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:17:58,891-Speed 5447.56 samples/sec   Loss 7.7109   LearningRate 0.0473   Epoch: 6   Global Step: 31610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:00,748-Speed 5519.20 samples/sec   Loss 7.7536   LearningRate 0.0473   Epoch: 6   Global Step: 31620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:02,617-Speed 5480.92 samples/sec   Loss 7.7975   LearningRate 0.0472   Epoch: 6   Global Step: 31630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:04,484-Speed 5488.18 samples/sec   Loss 7.8087   LearningRate 0.0472   Epoch: 6   Global Step: 31640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:06,352-Speed 5484.70 samples/sec   Loss 7.7747   LearningRate 0.0472   Epoch: 6   Global Step: 31650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:08,201-Speed 5539.67 samples/sec   Loss 7.7461   LearningRate 0.0472   Epoch: 6   Global Step: 31660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:10,068-Speed 5491.18 samples/sec   Loss 7.8200   LearningRate 0.0472   Epoch: 6   Global Step: 31670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:11,953-Speed 5434.58 samples/sec   Loss 7.7499   LearningRate 0.0472   Epoch: 6   Global Step: 31680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:13,843-Speed 5419.11 samples/sec   Loss 7.7222   LearningRate 0.0472   Epoch: 6   Global Step: 31690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:15,727-Speed 5441.98 samples/sec   Loss 7.6993   LearningRate 0.0471   Epoch: 6   Global Step: 31700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:17,589-Speed 5502.69 samples/sec   Loss 7.6928   LearningRate 0.0471   Epoch: 6   Global Step: 31710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:19,483-Speed 5409.54 samples/sec   Loss 7.8549   LearningRate 0.0471   Epoch: 6   Global Step: 31720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:21,331-Speed 5543.12 samples/sec   Loss 7.7535   LearningRate 0.0471   Epoch: 6   Global Step: 31730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:23,182-Speed 5533.19 samples/sec   Loss 7.8575   LearningRate 0.0471   Epoch: 6   Global Step: 31740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:25,061-Speed 5454.44 samples/sec   Loss 7.7156   LearningRate 0.0471   Epoch: 6   Global Step: 31750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:26,911-Speed 5539.30 samples/sec   Loss 7.9385   LearningRate 0.0471   Epoch: 6   Global Step: 31760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:28,764-Speed 5528.04 samples/sec   Loss 7.8382   LearningRate 0.0471   Epoch: 6   Global Step: 31770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:30,622-Speed 5514.01 samples/sec   Loss 7.8611   LearningRate 0.0470   Epoch: 6   Global Step: 31780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:32,481-Speed 5513.17 samples/sec   Loss 7.8941   LearningRate 0.0470   Epoch: 6   Global Step: 31790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:34,329-Speed 5544.35 samples/sec   Loss 7.7079   LearningRate 0.0470   Epoch: 6   Global Step: 31800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:36,201-Speed 5471.23 samples/sec   Loss 7.9679   LearningRate 0.0470   Epoch: 6   Global Step: 31810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:38,051-Speed 5540.37 samples/sec   Loss 7.8563   LearningRate 0.0470   Epoch: 6   Global Step: 31820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:18:39,920-Speed 5479.92 samples/sec   Loss 7.7982   LearningRate 0.0470   Epoch: 6   Global Step: 31830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:18:41,775-Speed 5522.83 samples/sec   Loss 7.8532   LearningRate 0.0470   Epoch: 6   Global Step: 31840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:18:43,648-Speed 5471.10 samples/sec   Loss 7.7607   LearningRate 0.0469   Epoch: 6   Global Step: 31850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:18:45,510-Speed 5503.80 samples/sec   Loss 7.8235   LearningRate 0.0469   Epoch: 6   Global Step: 31860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:18:47,364-Speed 5527.31 samples/sec   Loss 7.9518   LearningRate 0.0469   Epoch: 6   Global Step: 31870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:18:49,229-Speed 5492.67 samples/sec   Loss 7.9817   LearningRate 0.0469   Epoch: 6   Global Step: 31880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:18:51,112-Speed 5441.20 samples/sec   Loss 7.7706   LearningRate 0.0469   Epoch: 6   Global Step: 31890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:18:52,975-Speed 5500.55 samples/sec   Loss 7.7481   LearningRate 0.0469   Epoch: 6   Global Step: 31900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:18:54,843-Speed 5485.05 samples/sec   Loss 7.8878   LearningRate 0.0469   Epoch: 6   Global Step: 31910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:18:56,691-Speed 5544.09 samples/sec   Loss 8.0042   LearningRate 0.0468   Epoch: 6   Global Step: 31920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:18:58,550-Speed 5512.99 samples/sec   Loss 7.8521   LearningRate 0.0468   Epoch: 6   Global Step: 31930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:19:00,432-Speed 5444.77 samples/sec   Loss 7.8309   LearningRate 0.0468   Epoch: 6   Global Step: 31940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:19:02,284-Speed 5530.69 samples/sec   Loss 7.9511   LearningRate 0.0468   Epoch: 6   Global Step: 31950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:19:04,122-Speed 5573.70 samples/sec   Loss 7.6924   LearningRate 0.0468   Epoch: 6   Global Step: 31960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:19:05,966-Speed 5554.90 samples/sec   Loss 7.8935   LearningRate 0.0468   Epoch: 6   Global Step: 31970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:19:07,845-Speed 5452.55 samples/sec   Loss 7.8329   LearningRate 0.0468   Epoch: 6   Global Step: 31980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:19:09,698-Speed 5531.81 samples/sec   Loss 7.8312   LearningRate 0.0468   Epoch: 6   Global Step: 31990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:19:11,584-Speed 5431.41 samples/sec   Loss 7.7212   LearningRate 0.0467   Epoch: 6   Global Step: 32000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:19:38,844-[lfw][32000]XNorm: 23.115894
Training: 2022-04-11 12:19:38,845-[lfw][32000]Accuracy-Flip: 0.99733+-0.00213
Training: 2022-04-11 12:19:38,845-[lfw][32000]Accuracy-Highest: 0.99733
Training: 2022-04-11 12:20:10,055-[cfp_fp][32000]XNorm: 20.174163
Training: 2022-04-11 12:20:10,056-[cfp_fp][32000]Accuracy-Flip: 0.96000+-0.00748
Training: 2022-04-11 12:20:10,056-[cfp_fp][32000]Accuracy-Highest: 0.96000
Training: 2022-04-11 12:20:36,906-[agedb_30][32000]XNorm: 22.824152
Training: 2022-04-11 12:20:36,907-[agedb_30][32000]Accuracy-Flip: 0.97467+-0.00767
Training: 2022-04-11 12:20:36,907-[agedb_30][32000]Accuracy-Highest: 0.97467
Training: 2022-04-11 12:20:38,766-Speed 117.46 samples/sec   Loss 7.8172   LearningRate 0.0467   Epoch: 6   Global Step: 32010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:20:40,590-Speed 5616.29 samples/sec   Loss 7.8672   LearningRate 0.0467   Epoch: 6   Global Step: 32020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:20:42,438-Speed 5545.64 samples/sec   Loss 7.8104   LearningRate 0.0467   Epoch: 6   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:20:44,261-Speed 5617.32 samples/sec   Loss 7.7444   LearningRate 0.0467   Epoch: 6   Global Step: 32040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:20:46,087-Speed 5610.02 samples/sec   Loss 7.8451   LearningRate 0.0467   Epoch: 6   Global Step: 32050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:20:47,917-Speed 5599.66 samples/sec   Loss 7.8648   LearningRate 0.0467   Epoch: 6   Global Step: 32060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:20:49,753-Speed 5577.59 samples/sec   Loss 8.1877   LearningRate 0.0466   Epoch: 6   Global Step: 32070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:20:51,581-Speed 5605.68 samples/sec   Loss 7.7958   LearningRate 0.0466   Epoch: 6   Global Step: 32080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:20:53,447-Speed 5489.26 samples/sec   Loss 7.7898   LearningRate 0.0466   Epoch: 6   Global Step: 32090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:20:55,298-Speed 5536.20 samples/sec   Loss 7.7875   LearningRate 0.0466   Epoch: 6   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:20:57,150-Speed 5531.23 samples/sec   Loss 7.8510   LearningRate 0.0466   Epoch: 6   Global Step: 32110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:20:58,998-Speed 5545.10 samples/sec   Loss 7.7769   LearningRate 0.0466   Epoch: 6   Global Step: 32120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:00,922-Speed 5324.59 samples/sec   Loss 7.7723   LearningRate 0.0466   Epoch: 6   Global Step: 32130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:02,751-Speed 5602.38 samples/sec   Loss 7.9388   LearningRate 0.0466   Epoch: 6   Global Step: 32140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:04,602-Speed 5533.72 samples/sec   Loss 7.8654   LearningRate 0.0465   Epoch: 6   Global Step: 32150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:06,501-Speed 5394.22 samples/sec   Loss 7.7083   LearningRate 0.0465   Epoch: 6   Global Step: 32160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:08,327-Speed 5610.57 samples/sec   Loss 7.8744   LearningRate 0.0465   Epoch: 6   Global Step: 32170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:10,187-Speed 5509.80 samples/sec   Loss 7.7989   LearningRate 0.0465   Epoch: 6   Global Step: 32180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:12,025-Speed 5573.63 samples/sec   Loss 7.9759   LearningRate 0.0465   Epoch: 6   Global Step: 32190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:13,857-Speed 5591.76 samples/sec   Loss 7.9297   LearningRate 0.0465   Epoch: 6   Global Step: 32200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:15,718-Speed 5504.45 samples/sec   Loss 7.8319   LearningRate 0.0465   Epoch: 6   Global Step: 32210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:17,582-Speed 5496.39 samples/sec   Loss 7.7899   LearningRate 0.0464   Epoch: 6   Global Step: 32220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:19,434-Speed 5534.85 samples/sec   Loss 7.8900   LearningRate 0.0464   Epoch: 6   Global Step: 32230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:21,279-Speed 5553.70 samples/sec   Loss 7.9016   LearningRate 0.0464   Epoch: 6   Global Step: 32240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:23,155-Speed 5460.08 samples/sec   Loss 7.9122   LearningRate 0.0464   Epoch: 6   Global Step: 32250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:25,019-Speed 5499.39 samples/sec   Loss 7.8563   LearningRate 0.0464   Epoch: 6   Global Step: 32260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:26,892-Speed 5469.59 samples/sec   Loss 7.6342   LearningRate 0.0464   Epoch: 6   Global Step: 32270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:28,730-Speed 5572.17 samples/sec   Loss 7.8511   LearningRate 0.0464   Epoch: 6   Global Step: 32280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:30,569-Speed 5572.83 samples/sec   Loss 7.8383   LearningRate 0.0463   Epoch: 6   Global Step: 32290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:32,426-Speed 5515.41 samples/sec   Loss 7.7036   LearningRate 0.0463   Epoch: 6   Global Step: 32300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:34,321-Speed 5407.98 samples/sec   Loss 7.8943   LearningRate 0.0463   Epoch: 6   Global Step: 32310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:36,170-Speed 5540.82 samples/sec   Loss 7.8889   LearningRate 0.0463   Epoch: 6   Global Step: 32320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:38,026-Speed 5521.11 samples/sec   Loss 7.8097   LearningRate 0.0463   Epoch: 6   Global Step: 32330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:39,897-Speed 5476.46 samples/sec   Loss 7.7858   LearningRate 0.0463   Epoch: 6   Global Step: 32340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:41,746-Speed 5539.56 samples/sec   Loss 7.8922   LearningRate 0.0463   Epoch: 6   Global Step: 32350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:43,584-Speed 5573.82 samples/sec   Loss 7.7658   LearningRate 0.0463   Epoch: 6   Global Step: 32360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:21:45,452-Speed 5485.10 samples/sec   Loss 7.8927   LearningRate 0.0462   Epoch: 6   Global Step: 32370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:47,286-Speed 5586.26 samples/sec   Loss 7.8878   LearningRate 0.0462   Epoch: 6   Global Step: 32380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:49,163-Speed 5455.89 samples/sec   Loss 7.7578   LearningRate 0.0462   Epoch: 6   Global Step: 32390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:51,007-Speed 5557.75 samples/sec   Loss 7.9819   LearningRate 0.0462   Epoch: 6   Global Step: 32400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:52,864-Speed 5516.70 samples/sec   Loss 7.7699   LearningRate 0.0462   Epoch: 6   Global Step: 32410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:54,707-Speed 5559.95 samples/sec   Loss 7.7128   LearningRate 0.0462   Epoch: 6   Global Step: 32420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:56,560-Speed 5529.83 samples/sec   Loss 8.0545   LearningRate 0.0462   Epoch: 6   Global Step: 32430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:21:58,419-Speed 5510.31 samples/sec   Loss 7.7619   LearningRate 0.0461   Epoch: 6   Global Step: 32440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:00,260-Speed 5564.91 samples/sec   Loss 7.8742   LearningRate 0.0461   Epoch: 6   Global Step: 32450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:02,158-Speed 5396.76 samples/sec   Loss 7.9284   LearningRate 0.0461   Epoch: 6   Global Step: 32460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:04,004-Speed 5550.62 samples/sec   Loss 7.9005   LearningRate 0.0461   Epoch: 6   Global Step: 32470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:22:05,934-Speed 5307.16 samples/sec   Loss 7.7764   LearningRate 0.0461   Epoch: 6   Global Step: 32480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:22:07,785-Speed 5535.93 samples/sec   Loss 7.9627   LearningRate 0.0461   Epoch: 6   Global Step: 32490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:09,634-Speed 5540.72 samples/sec   Loss 7.8242   LearningRate 0.0461   Epoch: 6   Global Step: 32500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:11,469-Speed 5581.95 samples/sec   Loss 7.8342   LearningRate 0.0461   Epoch: 6   Global Step: 32510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:13,362-Speed 5414.17 samples/sec   Loss 7.7436   LearningRate 0.0460   Epoch: 6   Global Step: 32520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:15,228-Speed 5489.23 samples/sec   Loss 7.8853   LearningRate 0.0460   Epoch: 6   Global Step: 32530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:17,121-Speed 5413.83 samples/sec   Loss 7.9151   LearningRate 0.0460   Epoch: 6   Global Step: 32540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:18,996-Speed 5461.52 samples/sec   Loss 7.7815   LearningRate 0.0460   Epoch: 6   Global Step: 32550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:20,827-Speed 5596.79 samples/sec   Loss 7.7933   LearningRate 0.0460   Epoch: 6   Global Step: 32560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:22,687-Speed 5508.35 samples/sec   Loss 7.7358   LearningRate 0.0460   Epoch: 6   Global Step: 32570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:24,538-Speed 5536.80 samples/sec   Loss 7.8878   LearningRate 0.0460   Epoch: 6   Global Step: 32580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:26,385-Speed 5545.07 samples/sec   Loss 7.7267   LearningRate 0.0459   Epoch: 6   Global Step: 32590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:22:28,208-Speed 5619.88 samples/sec   Loss 7.7426   LearningRate 0.0459   Epoch: 6   Global Step: 32600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:30,062-Speed 5527.17 samples/sec   Loss 7.8468   LearningRate 0.0459   Epoch: 6   Global Step: 32610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:31,913-Speed 5532.11 samples/sec   Loss 7.7741   LearningRate 0.0459   Epoch: 6   Global Step: 32620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:33,771-Speed 5514.57 samples/sec   Loss 7.6163   LearningRate 0.0459   Epoch: 6   Global Step: 32630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:35,626-Speed 5524.77 samples/sec   Loss 7.7564   LearningRate 0.0459   Epoch: 6   Global Step: 32640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:37,478-Speed 5533.53 samples/sec   Loss 7.8408   LearningRate 0.0459   Epoch: 6   Global Step: 32650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:22:39,315-Speed 5576.99 samples/sec   Loss 7.8771   LearningRate 0.0459   Epoch: 6   Global Step: 32660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:22:41,218-Speed 5382.94 samples/sec   Loss 7.8209   LearningRate 0.0458   Epoch: 6   Global Step: 32670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:22:43,070-Speed 5537.62 samples/sec   Loss 7.6171   LearningRate 0.0458   Epoch: 6   Global Step: 32680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:22:44,913-Speed 5558.62 samples/sec   Loss 7.9518   LearningRate 0.0458   Epoch: 6   Global Step: 32690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:22:46,745-Speed 5590.95 samples/sec   Loss 7.8468   LearningRate 0.0458   Epoch: 6   Global Step: 32700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:22:48,623-Speed 5457.15 samples/sec   Loss 7.9088   LearningRate 0.0458   Epoch: 6   Global Step: 32710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:22:50,475-Speed 5532.26 samples/sec   Loss 7.8805   LearningRate 0.0458   Epoch: 6   Global Step: 32720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:22:52,353-Speed 5452.55 samples/sec   Loss 7.8380   LearningRate 0.0458   Epoch: 6   Global Step: 32730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:22:54,188-Speed 5583.82 samples/sec   Loss 7.7316   LearningRate 0.0457   Epoch: 6   Global Step: 32740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:22:56,023-Speed 5585.53 samples/sec   Loss 7.8079   LearningRate 0.0457   Epoch: 6   Global Step: 32750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:57,875-Speed 5532.54 samples/sec   Loss 7.7419   LearningRate 0.0457   Epoch: 6   Global Step: 32760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:22:59,759-Speed 5436.23 samples/sec   Loss 7.7384   LearningRate 0.0457   Epoch: 6   Global Step: 32770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:01,591-Speed 5592.07 samples/sec   Loss 8.0306   LearningRate 0.0457   Epoch: 6   Global Step: 32780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:03,428-Speed 5576.03 samples/sec   Loss 7.7276   LearningRate 0.0457   Epoch: 6   Global Step: 32790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:05,279-Speed 5534.99 samples/sec   Loss 7.8621   LearningRate 0.0457   Epoch: 6   Global Step: 32800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:07,110-Speed 5594.73 samples/sec   Loss 7.7992   LearningRate 0.0457   Epoch: 6   Global Step: 32810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:08,959-Speed 5543.43 samples/sec   Loss 7.8001   LearningRate 0.0456   Epoch: 6   Global Step: 32820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:10,815-Speed 5520.31 samples/sec   Loss 7.9027   LearningRate 0.0456   Epoch: 6   Global Step: 32830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:12,654-Speed 5571.29 samples/sec   Loss 7.7174   LearningRate 0.0456   Epoch: 6   Global Step: 32840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:14,486-Speed 5590.03 samples/sec   Loss 7.8056   LearningRate 0.0456   Epoch: 6   Global Step: 32850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:23:16,336-Speed 5539.09 samples/sec   Loss 7.8510   LearningRate 0.0456   Epoch: 6   Global Step: 32860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:18,197-Speed 5505.81 samples/sec   Loss 7.8914   LearningRate 0.0456   Epoch: 6   Global Step: 32870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:20,051-Speed 5527.84 samples/sec   Loss 7.9917   LearningRate 0.0456   Epoch: 6   Global Step: 32880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:21,905-Speed 5525.28 samples/sec   Loss 7.8986   LearningRate 0.0455   Epoch: 6   Global Step: 32890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:23,741-Speed 5579.94 samples/sec   Loss 7.9427   LearningRate 0.0455   Epoch: 6   Global Step: 32900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:25,589-Speed 5543.15 samples/sec   Loss 7.6428   LearningRate 0.0455   Epoch: 6   Global Step: 32910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:27,463-Speed 5467.31 samples/sec   Loss 7.8626   LearningRate 0.0455   Epoch: 6   Global Step: 32920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:29,294-Speed 5595.82 samples/sec   Loss 7.7932   LearningRate 0.0455   Epoch: 6   Global Step: 32930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:31,150-Speed 5517.72 samples/sec   Loss 7.6934   LearningRate 0.0455   Epoch: 6   Global Step: 32940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:33,007-Speed 5517.46 samples/sec   Loss 7.7629   LearningRate 0.0455   Epoch: 6   Global Step: 32950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:34,860-Speed 5528.31 samples/sec   Loss 7.9930   LearningRate 0.0455   Epoch: 6   Global Step: 32960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:23:36,716-Speed 5522.43 samples/sec   Loss 7.7696   LearningRate 0.0454   Epoch: 6   Global Step: 32970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:23:38,592-Speed 5464.95 samples/sec   Loss 7.7735   LearningRate 0.0454   Epoch: 6   Global Step: 32980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:23:40,423-Speed 5593.39 samples/sec   Loss 7.8090   LearningRate 0.0454   Epoch: 6   Global Step: 32990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:23:42,266-Speed 5560.00 samples/sec   Loss 7.8465   LearningRate 0.0454   Epoch: 6   Global Step: 33000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:23:44,121-Speed 5523.50 samples/sec   Loss 7.6945   LearningRate 0.0454   Epoch: 6   Global Step: 33010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:23:45,957-Speed 5580.44 samples/sec   Loss 7.7841   LearningRate 0.0454   Epoch: 6   Global Step: 33020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:23:47,810-Speed 5530.21 samples/sec   Loss 7.5762   LearningRate 0.0454   Epoch: 6   Global Step: 33030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:49,673-Speed 5500.19 samples/sec   Loss 7.8742   LearningRate 0.0453   Epoch: 6   Global Step: 33040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:51,529-Speed 5520.80 samples/sec   Loss 7.7191   LearningRate 0.0453   Epoch: 6   Global Step: 33050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:53,370-Speed 5565.40 samples/sec   Loss 7.7933   LearningRate 0.0453   Epoch: 6   Global Step: 33060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:55,244-Speed 5466.40 samples/sec   Loss 7.7936   LearningRate 0.0453   Epoch: 6   Global Step: 33070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:57,077-Speed 5590.98 samples/sec   Loss 7.6494   LearningRate 0.0453   Epoch: 6   Global Step: 33080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:23:58,911-Speed 5586.45 samples/sec   Loss 7.8231   LearningRate 0.0453   Epoch: 6   Global Step: 33090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:00,783-Speed 5471.97 samples/sec   Loss 7.8999   LearningRate 0.0453   Epoch: 6   Global Step: 33100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:02,630-Speed 5547.04 samples/sec   Loss 7.8166   LearningRate 0.0453   Epoch: 6   Global Step: 33110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:04,502-Speed 5473.25 samples/sec   Loss 7.8342   LearningRate 0.0452   Epoch: 6   Global Step: 33120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:06,401-Speed 5393.43 samples/sec   Loss 7.8136   LearningRate 0.0452   Epoch: 6   Global Step: 33130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:08,269-Speed 5487.42 samples/sec   Loss 7.8229   LearningRate 0.0452   Epoch: 6   Global Step: 33140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:10,114-Speed 5551.44 samples/sec   Loss 7.8298   LearningRate 0.0452   Epoch: 6   Global Step: 33150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:12,004-Speed 5420.58 samples/sec   Loss 7.8278   LearningRate 0.0452   Epoch: 6   Global Step: 33160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:13,854-Speed 5538.72 samples/sec   Loss 7.9875   LearningRate 0.0452   Epoch: 6   Global Step: 33170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:15,712-Speed 5513.84 samples/sec   Loss 7.8468   LearningRate 0.0452   Epoch: 6   Global Step: 33180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:17,542-Speed 5598.73 samples/sec   Loss 7.8166   LearningRate 0.0451   Epoch: 6   Global Step: 33190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:19,376-Speed 5588.20 samples/sec   Loss 7.7554   LearningRate 0.0451   Epoch: 6   Global Step: 33200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:21,231-Speed 5521.98 samples/sec   Loss 7.9387   LearningRate 0.0451   Epoch: 6   Global Step: 33210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:23,066-Speed 5582.21 samples/sec   Loss 7.9264   LearningRate 0.0451   Epoch: 6   Global Step: 33220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:24,918-Speed 5533.46 samples/sec   Loss 7.8113   LearningRate 0.0451   Epoch: 6   Global Step: 33230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:26,785-Speed 5487.85 samples/sec   Loss 7.6535   LearningRate 0.0451   Epoch: 6   Global Step: 33240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:28,651-Speed 5489.59 samples/sec   Loss 7.7778   LearningRate 0.0451   Epoch: 6   Global Step: 33250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:30,493-Speed 5559.80 samples/sec   Loss 7.7544   LearningRate 0.0451   Epoch: 6   Global Step: 33260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:32,354-Speed 5505.60 samples/sec   Loss 7.9692   LearningRate 0.0450   Epoch: 6   Global Step: 33270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:24:34,240-Speed 5433.15 samples/sec   Loss 7.6382   LearningRate 0.0450   Epoch: 6   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:36,106-Speed 5491.23 samples/sec   Loss 7.8605   LearningRate 0.0450   Epoch: 6   Global Step: 33290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:37,960-Speed 5527.70 samples/sec   Loss 8.0382   LearningRate 0.0450   Epoch: 6   Global Step: 33300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:39,858-Speed 5394.82 samples/sec   Loss 7.8679   LearningRate 0.0450   Epoch: 6   Global Step: 33310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:41,708-Speed 5538.99 samples/sec   Loss 7.6330   LearningRate 0.0450   Epoch: 6   Global Step: 33320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:43,546-Speed 5571.72 samples/sec   Loss 7.8068   LearningRate 0.0450   Epoch: 6   Global Step: 33330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:45,382-Speed 5580.95 samples/sec   Loss 7.8760   LearningRate 0.0449   Epoch: 6   Global Step: 33340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:47,214-Speed 5592.05 samples/sec   Loss 7.9399   LearningRate 0.0449   Epoch: 6   Global Step: 33350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:49,056-Speed 5563.16 samples/sec   Loss 7.8021   LearningRate 0.0449   Epoch: 6   Global Step: 33360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:50,928-Speed 5472.05 samples/sec   Loss 7.9132   LearningRate 0.0449   Epoch: 6   Global Step: 33370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:52,757-Speed 5601.91 samples/sec   Loss 7.7187   LearningRate 0.0449   Epoch: 6   Global Step: 33380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:54,651-Speed 5409.44 samples/sec   Loss 7.6275   LearningRate 0.0449   Epoch: 6   Global Step: 33390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:56,498-Speed 5545.98 samples/sec   Loss 7.8699   LearningRate 0.0449   Epoch: 6   Global Step: 33400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:24:58,332-Speed 5586.87 samples/sec   Loss 7.7544   LearningRate 0.0449   Epoch: 6   Global Step: 33410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:25:00,175-Speed 5556.98 samples/sec   Loss 7.6054   LearningRate 0.0448   Epoch: 6   Global Step: 33420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:25:02,006-Speed 5595.59 samples/sec   Loss 7.7493   LearningRate 0.0448   Epoch: 6   Global Step: 33430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:03,894-Speed 5425.97 samples/sec   Loss 7.8699   LearningRate 0.0448   Epoch: 6   Global Step: 33440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:05,756-Speed 5503.81 samples/sec   Loss 7.9099   LearningRate 0.0448   Epoch: 6   Global Step: 33450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:07,622-Speed 5490.51 samples/sec   Loss 7.6693   LearningRate 0.0448   Epoch: 6   Global Step: 33460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:09,501-Speed 5453.59 samples/sec   Loss 7.7900   LearningRate 0.0448   Epoch: 6   Global Step: 33470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:11,333-Speed 5591.53 samples/sec   Loss 7.6920   LearningRate 0.0448   Epoch: 6   Global Step: 33480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:13,185-Speed 5530.27 samples/sec   Loss 7.8135   LearningRate 0.0447   Epoch: 6   Global Step: 33490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:15,031-Speed 5551.57 samples/sec   Loss 7.7283   LearningRate 0.0447   Epoch: 6   Global Step: 33500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:16,903-Speed 5470.75 samples/sec   Loss 7.4527   LearningRate 0.0447   Epoch: 6   Global Step: 33510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:18,743-Speed 5570.00 samples/sec   Loss 7.7123   LearningRate 0.0447   Epoch: 6   Global Step: 33520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:20,574-Speed 5596.06 samples/sec   Loss 7.6839   LearningRate 0.0447   Epoch: 6   Global Step: 33530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:25:22,414-Speed 5567.48 samples/sec   Loss 7.6986   LearningRate 0.0447   Epoch: 6   Global Step: 33540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:25:24,274-Speed 5506.61 samples/sec   Loss 7.6339   LearningRate 0.0447   Epoch: 6   Global Step: 33550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:25:26,112-Speed 5575.30 samples/sec   Loss 7.6666   LearningRate 0.0447   Epoch: 6   Global Step: 33560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:25:27,943-Speed 5594.47 samples/sec   Loss 7.7588   LearningRate 0.0446   Epoch: 6   Global Step: 33570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:25:29,801-Speed 5514.64 samples/sec   Loss 7.8584   LearningRate 0.0446   Epoch: 6   Global Step: 33580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:31,655-Speed 5525.87 samples/sec   Loss 7.9058   LearningRate 0.0446   Epoch: 6   Global Step: 33590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:33,512-Speed 5517.52 samples/sec   Loss 7.8030   LearningRate 0.0446   Epoch: 6   Global Step: 33600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:35,354-Speed 5560.50 samples/sec   Loss 7.8142   LearningRate 0.0446   Epoch: 6   Global Step: 33610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:37,234-Speed 5448.61 samples/sec   Loss 7.6599   LearningRate 0.0446   Epoch: 6   Global Step: 33620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:39,111-Speed 5459.62 samples/sec   Loss 7.7660   LearningRate 0.0446   Epoch: 6   Global Step: 33630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:40,957-Speed 5550.91 samples/sec   Loss 7.6099   LearningRate 0.0445   Epoch: 6   Global Step: 33640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:42,802-Speed 5551.57 samples/sec   Loss 7.7876   LearningRate 0.0445   Epoch: 6   Global Step: 33650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:44,645-Speed 5558.29 samples/sec   Loss 8.0403   LearningRate 0.0445   Epoch: 6   Global Step: 33660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:46,493-Speed 5542.49 samples/sec   Loss 7.7355   LearningRate 0.0445   Epoch: 6   Global Step: 33670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:48,328-Speed 5584.84 samples/sec   Loss 7.7568   LearningRate 0.0445   Epoch: 6   Global Step: 33680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:50,177-Speed 5541.14 samples/sec   Loss 7.7582   LearningRate 0.0445   Epoch: 6   Global Step: 33690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:52,030-Speed 5528.86 samples/sec   Loss 7.7506   LearningRate 0.0445   Epoch: 6   Global Step: 33700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:53,872-Speed 5562.79 samples/sec   Loss 7.8130   LearningRate 0.0445   Epoch: 6   Global Step: 33710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:25:55,701-Speed 5603.06 samples/sec   Loss 7.5727   LearningRate 0.0444   Epoch: 6   Global Step: 33720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:57,538-Speed 5575.33 samples/sec   Loss 7.8591   LearningRate 0.0444   Epoch: 6   Global Step: 33730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:25:59,422-Speed 5437.27 samples/sec   Loss 7.7324   LearningRate 0.0444   Epoch: 6   Global Step: 33740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:26:01,284-Speed 5501.01 samples/sec   Loss 7.7706   LearningRate 0.0444   Epoch: 6   Global Step: 33750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:26:03,131-Speed 5550.17 samples/sec   Loss 7.6952   LearningRate 0.0444   Epoch: 6   Global Step: 33760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:26:04,983-Speed 5531.38 samples/sec   Loss 7.7628   LearningRate 0.0444   Epoch: 6   Global Step: 33770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:26:06,852-Speed 5480.45 samples/sec   Loss 7.9505   LearningRate 0.0444   Epoch: 6   Global Step: 33780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:26:08,696-Speed 5557.42 samples/sec   Loss 7.6943   LearningRate 0.0444   Epoch: 6   Global Step: 33790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:26:10,568-Speed 5472.03 samples/sec   Loss 7.7991   LearningRate 0.0443   Epoch: 6   Global Step: 33800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:26:12,482-Speed 5352.72 samples/sec   Loss 7.7704   LearningRate 0.0443   Epoch: 6   Global Step: 33810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:26:14,324-Speed 5562.08 samples/sec   Loss 7.8729   LearningRate 0.0443   Epoch: 6   Global Step: 33820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:16,179-Speed 5523.23 samples/sec   Loss 7.5614   LearningRate 0.0443   Epoch: 6   Global Step: 33830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:18,042-Speed 5499.92 samples/sec   Loss 7.6727   LearningRate 0.0443   Epoch: 6   Global Step: 33840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:19,936-Speed 5408.92 samples/sec   Loss 7.5671   LearningRate 0.0443   Epoch: 6   Global Step: 33850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:21,808-Speed 5474.69 samples/sec   Loss 7.7087   LearningRate 0.0443   Epoch: 6   Global Step: 33860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:23,658-Speed 5536.83 samples/sec   Loss 7.7338   LearningRate 0.0442   Epoch: 6   Global Step: 33870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:25,507-Speed 5541.15 samples/sec   Loss 7.7294   LearningRate 0.0442   Epoch: 6   Global Step: 33880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:27,356-Speed 5542.81 samples/sec   Loss 7.7658   LearningRate 0.0442   Epoch: 6   Global Step: 33890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:29,206-Speed 5536.38 samples/sec   Loss 7.7305   LearningRate 0.0442   Epoch: 6   Global Step: 33900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:31,048-Speed 5563.15 samples/sec   Loss 7.7094   LearningRate 0.0442   Epoch: 6   Global Step: 33910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:32,909-Speed 5504.26 samples/sec   Loss 7.6499   LearningRate 0.0442   Epoch: 6   Global Step: 33920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:34,776-Speed 5489.66 samples/sec   Loss 7.7560   LearningRate 0.0442   Epoch: 6   Global Step: 33930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:36,621-Speed 5551.57 samples/sec   Loss 7.6173   LearningRate 0.0442   Epoch: 6   Global Step: 33940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:38,543-Speed 5331.63 samples/sec   Loss 7.8100   LearningRate 0.0441   Epoch: 6   Global Step: 33950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:40,407-Speed 5496.86 samples/sec   Loss 7.5573   LearningRate 0.0441   Epoch: 6   Global Step: 33960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:42,245-Speed 5573.89 samples/sec   Loss 7.7032   LearningRate 0.0441   Epoch: 6   Global Step: 33970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:44,124-Speed 5452.04 samples/sec   Loss 7.6881   LearningRate 0.0441   Epoch: 6   Global Step: 33980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:45,971-Speed 5547.62 samples/sec   Loss 7.8292   LearningRate 0.0441   Epoch: 6   Global Step: 33990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:26:47,843-Speed 5472.74 samples/sec   Loss 7.7838   LearningRate 0.0441   Epoch: 6   Global Step: 34000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:27:15,041-[lfw][34000]XNorm: 21.252891
Training: 2022-04-11 12:27:15,042-[lfw][34000]Accuracy-Flip: 0.99583+-0.00344
Training: 2022-04-11 12:27:15,042-[lfw][34000]Accuracy-Highest: 0.99733
Training: 2022-04-11 12:27:46,370-[cfp_fp][34000]XNorm: 18.514315
Training: 2022-04-11 12:27:46,370-[cfp_fp][34000]Accuracy-Flip: 0.95843+-0.00794
Training: 2022-04-11 12:27:46,371-[cfp_fp][34000]Accuracy-Highest: 0.96000
Training: 2022-04-11 12:28:13,469-[agedb_30][34000]XNorm: 21.146796
Training: 2022-04-11 12:28:13,470-[agedb_30][34000]Accuracy-Flip: 0.97350+-0.00728
Training: 2022-04-11 12:28:13,470-[agedb_30][34000]Accuracy-Highest: 0.97467
Training: 2022-04-11 12:28:15,350-Speed 117.02 samples/sec   Loss 7.6448   LearningRate 0.0441   Epoch: 6   Global Step: 34010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:17,213-Speed 5499.13 samples/sec   Loss 7.6987   LearningRate 0.0440   Epoch: 6   Global Step: 34020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:19,039-Speed 5612.42 samples/sec   Loss 7.7863   LearningRate 0.0440   Epoch: 6   Global Step: 34030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:20,888-Speed 5541.36 samples/sec   Loss 7.8192   LearningRate 0.0440   Epoch: 6   Global Step: 34040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:22,730-Speed 5560.67 samples/sec   Loss 7.8106   LearningRate 0.0440   Epoch: 6   Global Step: 34050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:24,577-Speed 5548.74 samples/sec   Loss 7.7196   LearningRate 0.0440   Epoch: 6   Global Step: 34060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:26,428-Speed 5534.42 samples/sec   Loss 7.8004   LearningRate 0.0440   Epoch: 6   Global Step: 34070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:28,301-Speed 5468.78 samples/sec   Loss 7.6943   LearningRate 0.0440   Epoch: 6   Global Step: 34080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:30,181-Speed 5449.58 samples/sec   Loss 7.8512   LearningRate 0.0440   Epoch: 6   Global Step: 34090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:32,034-Speed 5530.85 samples/sec   Loss 7.7099   LearningRate 0.0439   Epoch: 6   Global Step: 34100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:33,900-Speed 5490.33 samples/sec   Loss 7.7482   LearningRate 0.0439   Epoch: 6   Global Step: 34110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:35,776-Speed 5460.46 samples/sec   Loss 7.7974   LearningRate 0.0439   Epoch: 6   Global Step: 34120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:37,624-Speed 5545.41 samples/sec   Loss 7.6904   LearningRate 0.0439   Epoch: 6   Global Step: 34130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:39,500-Speed 5460.33 samples/sec   Loss 7.7530   LearningRate 0.0439   Epoch: 6   Global Step: 34140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:28:41,334-Speed 5588.34 samples/sec   Loss 7.6852   LearningRate 0.0439   Epoch: 6   Global Step: 34150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:43,186-Speed 5533.01 samples/sec   Loss 7.8457   LearningRate 0.0439   Epoch: 6   Global Step: 34160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:45,044-Speed 5512.43 samples/sec   Loss 7.7451   LearningRate 0.0439   Epoch: 6   Global Step: 34170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:46,900-Speed 5521.99 samples/sec   Loss 7.6835   LearningRate 0.0438   Epoch: 6   Global Step: 34180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:48,793-Speed 5411.14 samples/sec   Loss 7.7416   LearningRate 0.0438   Epoch: 6   Global Step: 34190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:50,630-Speed 5576.97 samples/sec   Loss 7.5113   LearningRate 0.0438   Epoch: 6   Global Step: 34200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:52,488-Speed 5513.84 samples/sec   Loss 7.6807   LearningRate 0.0438   Epoch: 6   Global Step: 34210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:54,356-Speed 5486.53 samples/sec   Loss 7.6283   LearningRate 0.0438   Epoch: 6   Global Step: 34220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:56,194-Speed 5574.13 samples/sec   Loss 7.7186   LearningRate 0.0438   Epoch: 6   Global Step: 34230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:58,058-Speed 5498.84 samples/sec   Loss 7.5391   LearningRate 0.0438   Epoch: 6   Global Step: 34240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:28:59,889-Speed 5595.20 samples/sec   Loss 7.6697   LearningRate 0.0437   Epoch: 6   Global Step: 34250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:29:01,747-Speed 5511.14 samples/sec   Loss 7.5912   LearningRate 0.0437   Epoch: 6   Global Step: 34260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:29:03,607-Speed 5509.07 samples/sec   Loss 7.7380   LearningRate 0.0437   Epoch: 6   Global Step: 34270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:05,470-Speed 5499.53 samples/sec   Loss 7.6005   LearningRate 0.0437   Epoch: 6   Global Step: 34280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:07,319-Speed 5540.80 samples/sec   Loss 7.7721   LearningRate 0.0437   Epoch: 6   Global Step: 34290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:09,160-Speed 5565.53 samples/sec   Loss 7.5232   LearningRate 0.0437   Epoch: 6   Global Step: 34300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:11,010-Speed 5539.37 samples/sec   Loss 7.7081   LearningRate 0.0437   Epoch: 6   Global Step: 34310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:12,894-Speed 5436.90 samples/sec   Loss 7.7224   LearningRate 0.0437   Epoch: 6   Global Step: 34320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:14,780-Speed 5431.85 samples/sec   Loss 7.7763   LearningRate 0.0436   Epoch: 6   Global Step: 34330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:16,629-Speed 5544.05 samples/sec   Loss 7.7788   LearningRate 0.0436   Epoch: 6   Global Step: 34340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:18,488-Speed 5513.14 samples/sec   Loss 7.6784   LearningRate 0.0436   Epoch: 6   Global Step: 34350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:20,345-Speed 5514.17 samples/sec   Loss 7.7453   LearningRate 0.0436   Epoch: 6   Global Step: 34360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:22,217-Speed 5473.04 samples/sec   Loss 7.6172   LearningRate 0.0436   Epoch: 6   Global Step: 34370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:24,073-Speed 5522.66 samples/sec   Loss 7.9378   LearningRate 0.0436   Epoch: 6   Global Step: 34380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:25,968-Speed 5405.58 samples/sec   Loss 7.7842   LearningRate 0.0436   Epoch: 6   Global Step: 34390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:27,812-Speed 5557.89 samples/sec   Loss 7.7703   LearningRate 0.0436   Epoch: 6   Global Step: 34400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:29,641-Speed 5599.97 samples/sec   Loss 7.8722   LearningRate 0.0435   Epoch: 6   Global Step: 34410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:31,492-Speed 5534.07 samples/sec   Loss 7.6951   LearningRate 0.0435   Epoch: 6   Global Step: 34420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:33,332-Speed 5568.51 samples/sec   Loss 7.7970   LearningRate 0.0435   Epoch: 6   Global Step: 34430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:35,203-Speed 5477.77 samples/sec   Loss 7.7554   LearningRate 0.0435   Epoch: 6   Global Step: 34440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:37,035-Speed 5590.71 samples/sec   Loss 7.6711   LearningRate 0.0435   Epoch: 6   Global Step: 34450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:38,930-Speed 5407.05 samples/sec   Loss 7.7067   LearningRate 0.0435   Epoch: 6   Global Step: 34460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:40,781-Speed 5535.99 samples/sec   Loss 7.6524   LearningRate 0.0435   Epoch: 6   Global Step: 34470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:42,630-Speed 5542.23 samples/sec   Loss 7.8006   LearningRate 0.0434   Epoch: 6   Global Step: 34480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:44,488-Speed 5512.31 samples/sec   Loss 7.5277   LearningRate 0.0434   Epoch: 6   Global Step: 34490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:46,346-Speed 5515.97 samples/sec   Loss 7.6108   LearningRate 0.0434   Epoch: 6   Global Step: 34500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:48,195-Speed 5541.24 samples/sec   Loss 7.5848   LearningRate 0.0434   Epoch: 6   Global Step: 34510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:50,034-Speed 5569.25 samples/sec   Loss 7.6625   LearningRate 0.0434   Epoch: 6   Global Step: 34520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:51,864-Speed 5598.51 samples/sec   Loss 7.6741   LearningRate 0.0434   Epoch: 6   Global Step: 34530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:53,701-Speed 5578.82 samples/sec   Loss 7.7112   LearningRate 0.0434   Epoch: 6   Global Step: 34540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:55,542-Speed 5561.71 samples/sec   Loss 7.7139   LearningRate 0.0434   Epoch: 6   Global Step: 34550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:57,374-Speed 5593.38 samples/sec   Loss 7.6870   LearningRate 0.0433   Epoch: 6   Global Step: 34560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:29:59,299-Speed 5321.48 samples/sec   Loss 7.6357   LearningRate 0.0433   Epoch: 6   Global Step: 34570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:01,136-Speed 5578.05 samples/sec   Loss 7.6239   LearningRate 0.0433   Epoch: 6   Global Step: 34580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:03,015-Speed 5453.11 samples/sec   Loss 7.7003   LearningRate 0.0433   Epoch: 6   Global Step: 34590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:04,873-Speed 5513.26 samples/sec   Loss 7.6567   LearningRate 0.0433   Epoch: 6   Global Step: 34600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:06,705-Speed 5589.79 samples/sec   Loss 7.6198   LearningRate 0.0433   Epoch: 6   Global Step: 34610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:08,562-Speed 5517.29 samples/sec   Loss 7.6069   LearningRate 0.0433   Epoch: 6   Global Step: 34620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:10,402-Speed 5567.39 samples/sec   Loss 7.6405   LearningRate 0.0433   Epoch: 6   Global Step: 34630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:12,255-Speed 5529.88 samples/sec   Loss 7.6756   LearningRate 0.0432   Epoch: 6   Global Step: 34640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:14,114-Speed 5512.89 samples/sec   Loss 7.6816   LearningRate 0.0432   Epoch: 6   Global Step: 34650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:15,950-Speed 5585.27 samples/sec   Loss 7.7128   LearningRate 0.0432   Epoch: 6   Global Step: 34660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:17,787-Speed 5576.75 samples/sec   Loss 7.7774   LearningRate 0.0432   Epoch: 6   Global Step: 34670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:19,632-Speed 5553.92 samples/sec   Loss 7.6784   LearningRate 0.0432   Epoch: 6   Global Step: 34680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:21,481-Speed 5539.46 samples/sec   Loss 7.7542   LearningRate 0.0432   Epoch: 6   Global Step: 34690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:23,350-Speed 5482.72 samples/sec   Loss 7.7003   LearningRate 0.0432   Epoch: 6   Global Step: 34700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:25,184-Speed 5586.19 samples/sec   Loss 7.5536   LearningRate 0.0431   Epoch: 6   Global Step: 34710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:27,023-Speed 5567.73 samples/sec   Loss 7.6887   LearningRate 0.0431   Epoch: 6   Global Step: 34720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:28,870-Speed 5548.05 samples/sec   Loss 7.7731   LearningRate 0.0431   Epoch: 6   Global Step: 34730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:30,705-Speed 5582.35 samples/sec   Loss 7.5781   LearningRate 0.0431   Epoch: 6   Global Step: 34740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:32,539-Speed 5586.40 samples/sec   Loss 7.6782   LearningRate 0.0431   Epoch: 6   Global Step: 34750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:34,372-Speed 5590.21 samples/sec   Loss 7.5721   LearningRate 0.0431   Epoch: 6   Global Step: 34760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:36,249-Speed 5458.24 samples/sec   Loss 7.5718   LearningRate 0.0431   Epoch: 6   Global Step: 34770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:38,097-Speed 5543.60 samples/sec   Loss 7.5025   LearningRate 0.0431   Epoch: 6   Global Step: 34780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:39,932-Speed 5581.70 samples/sec   Loss 7.7744   LearningRate 0.0430   Epoch: 6   Global Step: 34790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:30:41,763-Speed 5594.43 samples/sec   Loss 7.5884   LearningRate 0.0430   Epoch: 6   Global Step: 34800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:43,597-Speed 5586.81 samples/sec   Loss 7.7121   LearningRate 0.0430   Epoch: 6   Global Step: 34810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:45,431-Speed 5585.99 samples/sec   Loss 7.6354   LearningRate 0.0430   Epoch: 6   Global Step: 34820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:47,267-Speed 5579.03 samples/sec   Loss 7.6438   LearningRate 0.0430   Epoch: 6   Global Step: 34830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:49,100-Speed 5588.57 samples/sec   Loss 7.5641   LearningRate 0.0430   Epoch: 6   Global Step: 34840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:50,958-Speed 5515.35 samples/sec   Loss 7.6354   LearningRate 0.0430   Epoch: 6   Global Step: 34850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:52,833-Speed 5464.17 samples/sec   Loss 7.7144   LearningRate 0.0430   Epoch: 6   Global Step: 34860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:54,667-Speed 5585.04 samples/sec   Loss 7.6060   LearningRate 0.0429   Epoch: 6   Global Step: 34870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:56,522-Speed 5521.24 samples/sec   Loss 7.6698   LearningRate 0.0429   Epoch: 6   Global Step: 34880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:30:58,378-Speed 5522.42 samples/sec   Loss 7.5418   LearningRate 0.0429   Epoch: 6   Global Step: 34890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:00,234-Speed 5521.57 samples/sec   Loss 7.6208   LearningRate 0.0429   Epoch: 6   Global Step: 34900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:02,078-Speed 5555.50 samples/sec   Loss 7.6112   LearningRate 0.0429   Epoch: 6   Global Step: 34910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:03,929-Speed 5535.66 samples/sec   Loss 7.6631   LearningRate 0.0429   Epoch: 6   Global Step: 34920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:05,800-Speed 5476.40 samples/sec   Loss 7.7069   LearningRate 0.0429   Epoch: 6   Global Step: 34930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:07,640-Speed 5568.43 samples/sec   Loss 7.7586   LearningRate 0.0429   Epoch: 6   Global Step: 34940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:09,499-Speed 5509.88 samples/sec   Loss 7.7397   LearningRate 0.0428   Epoch: 6   Global Step: 34950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:11,356-Speed 5519.19 samples/sec   Loss 7.5131   LearningRate 0.0428   Epoch: 6   Global Step: 34960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:13,261-Speed 5378.21 samples/sec   Loss 7.7479   LearningRate 0.0428   Epoch: 6   Global Step: 34970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:15,098-Speed 5576.59 samples/sec   Loss 7.7056   LearningRate 0.0428   Epoch: 6   Global Step: 34980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:17,067-Speed 5204.13 samples/sec   Loss 7.5985   LearningRate 0.0428   Epoch: 6   Global Step: 34990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:18,949-Speed 5446.10 samples/sec   Loss 7.7094   LearningRate 0.0428   Epoch: 6   Global Step: 35000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:20,781-Speed 5590.49 samples/sec   Loss 7.6983   LearningRate 0.0428   Epoch: 6   Global Step: 35010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:22,638-Speed 5518.56 samples/sec   Loss 7.4808   LearningRate 0.0427   Epoch: 6   Global Step: 35020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:24,482-Speed 5556.00 samples/sec   Loss 7.4950   LearningRate 0.0427   Epoch: 6   Global Step: 35030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:26,374-Speed 5415.19 samples/sec   Loss 7.4478   LearningRate 0.0427   Epoch: 6   Global Step: 35040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:28,214-Speed 5569.33 samples/sec   Loss 7.7352   LearningRate 0.0427   Epoch: 6   Global Step: 35050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:30,072-Speed 5513.29 samples/sec   Loss 7.6860   LearningRate 0.0427   Epoch: 6   Global Step: 35060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:31,905-Speed 5588.50 samples/sec   Loss 7.6513   LearningRate 0.0427   Epoch: 6   Global Step: 35070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:33,769-Speed 5499.07 samples/sec   Loss 7.6704   LearningRate 0.0427   Epoch: 6   Global Step: 35080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:35,624-Speed 5523.81 samples/sec   Loss 7.5950   LearningRate 0.0427   Epoch: 6   Global Step: 35090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:37,461-Speed 5577.19 samples/sec   Loss 7.6603   LearningRate 0.0426   Epoch: 6   Global Step: 35100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:39,301-Speed 5568.03 samples/sec   Loss 7.6834   LearningRate 0.0426   Epoch: 6   Global Step: 35110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:41,150-Speed 5539.59 samples/sec   Loss 7.5959   LearningRate 0.0426   Epoch: 6   Global Step: 35120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:42,993-Speed 5559.29 samples/sec   Loss 7.5897   LearningRate 0.0426   Epoch: 6   Global Step: 35130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:44,858-Speed 5495.75 samples/sec   Loss 7.6376   LearningRate 0.0426   Epoch: 6   Global Step: 35140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:46,693-Speed 5580.40 samples/sec   Loss 7.6057   LearningRate 0.0426   Epoch: 6   Global Step: 35150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:48,576-Speed 5445.51 samples/sec   Loss 7.7197   LearningRate 0.0426   Epoch: 6   Global Step: 35160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:50,410-Speed 5585.81 samples/sec   Loss 7.5069   LearningRate 0.0426   Epoch: 6   Global Step: 35170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:52,275-Speed 5492.85 samples/sec   Loss 7.6951   LearningRate 0.0425   Epoch: 6   Global Step: 35180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:54,115-Speed 5571.52 samples/sec   Loss 7.7805   LearningRate 0.0425   Epoch: 6   Global Step: 35190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:55,982-Speed 5485.94 samples/sec   Loss 7.5768   LearningRate 0.0425   Epoch: 6   Global Step: 35200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:31:57,850-Speed 5485.84 samples/sec   Loss 7.6065   LearningRate 0.0425   Epoch: 6   Global Step: 35210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:31:59,696-Speed 5550.16 samples/sec   Loss 7.6051   LearningRate 0.0425   Epoch: 6   Global Step: 35220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:01,549-Speed 5528.33 samples/sec   Loss 7.7132   LearningRate 0.0425   Epoch: 6   Global Step: 35230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:03,393-Speed 5556.56 samples/sec   Loss 7.7606   LearningRate 0.0425   Epoch: 6   Global Step: 35240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:05,257-Speed 5496.13 samples/sec   Loss 7.5316   LearningRate 0.0425   Epoch: 6   Global Step: 35250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:07,115-Speed 5517.02 samples/sec   Loss 7.5908   LearningRate 0.0424   Epoch: 6   Global Step: 35260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:08,984-Speed 5483.94 samples/sec   Loss 7.7293   LearningRate 0.0424   Epoch: 6   Global Step: 35270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:10,829-Speed 5551.73 samples/sec   Loss 7.6939   LearningRate 0.0424   Epoch: 6   Global Step: 35280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:12,704-Speed 5463.00 samples/sec   Loss 7.7711   LearningRate 0.0424   Epoch: 6   Global Step: 35290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:14,548-Speed 5558.93 samples/sec   Loss 7.7120   LearningRate 0.0424   Epoch: 6   Global Step: 35300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:16,384-Speed 5577.87 samples/sec   Loss 7.5016   LearningRate 0.0424   Epoch: 6   Global Step: 35310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:18,275-Speed 5419.15 samples/sec   Loss 7.4841   LearningRate 0.0424   Epoch: 6   Global Step: 35320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:20,123-Speed 5542.81 samples/sec   Loss 7.5541   LearningRate 0.0423   Epoch: 6   Global Step: 35330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:21,970-Speed 5547.44 samples/sec   Loss 7.5653   LearningRate 0.0423   Epoch: 6   Global Step: 35340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:23,809-Speed 5571.84 samples/sec   Loss 7.6912   LearningRate 0.0423   Epoch: 6   Global Step: 35350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:25,703-Speed 5409.51 samples/sec   Loss 7.4992   LearningRate 0.0423   Epoch: 6   Global Step: 35360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:27,540-Speed 5577.08 samples/sec   Loss 7.4513   LearningRate 0.0423   Epoch: 6   Global Step: 35370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:29,425-Speed 5436.64 samples/sec   Loss 7.5123   LearningRate 0.0423   Epoch: 6   Global Step: 35380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:31,271-Speed 5548.38 samples/sec   Loss 7.5689   LearningRate 0.0423   Epoch: 6   Global Step: 35390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:33,187-Speed 5348.87 samples/sec   Loss 7.5958   LearningRate 0.0423   Epoch: 6   Global Step: 35400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:46,416-Speed 774.15 samples/sec   Loss 7.3153   LearningRate 0.0422   Epoch: 7   Global Step: 35410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:48,370-Speed 5245.21 samples/sec   Loss 6.8312   LearningRate 0.0422   Epoch: 7   Global Step: 35420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:50,350-Speed 5172.86 samples/sec   Loss 6.7185   LearningRate 0.0422   Epoch: 7   Global Step: 35430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:52,198-Speed 5543.05 samples/sec   Loss 6.8867   LearningRate 0.0422   Epoch: 7   Global Step: 35440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:32:54,062-Speed 5498.23 samples/sec   Loss 6.8067   LearningRate 0.0422   Epoch: 7   Global Step: 35450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:55,901-Speed 5570.74 samples/sec   Loss 6.8127   LearningRate 0.0422   Epoch: 7   Global Step: 35460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:57,739-Speed 5573.98 samples/sec   Loss 6.7433   LearningRate 0.0422   Epoch: 7   Global Step: 35470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:32:59,633-Speed 5407.55 samples/sec   Loss 6.8166   LearningRate 0.0422   Epoch: 7   Global Step: 35480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:01,502-Speed 5481.52 samples/sec   Loss 6.7084   LearningRate 0.0421   Epoch: 7   Global Step: 35490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:03,357-Speed 5522.48 samples/sec   Loss 6.8278   LearningRate 0.0421   Epoch: 7   Global Step: 35500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:05,214-Speed 5516.84 samples/sec   Loss 6.7407   LearningRate 0.0421   Epoch: 7   Global Step: 35510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:07,096-Speed 5445.72 samples/sec   Loss 6.8429   LearningRate 0.0421   Epoch: 7   Global Step: 35520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:08,945-Speed 5540.70 samples/sec   Loss 6.9025   LearningRate 0.0421   Epoch: 7   Global Step: 35530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:10,880-Speed 5292.98 samples/sec   Loss 6.8793   LearningRate 0.0421   Epoch: 7   Global Step: 35540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:12,736-Speed 5519.94 samples/sec   Loss 7.0037   LearningRate 0.0421   Epoch: 7   Global Step: 35550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:14,611-Speed 5468.46 samples/sec   Loss 6.9260   LearningRate 0.0421   Epoch: 7   Global Step: 35560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:16,444-Speed 5590.52 samples/sec   Loss 6.8980   LearningRate 0.0420   Epoch: 7   Global Step: 35570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:18,290-Speed 5551.33 samples/sec   Loss 6.7627   LearningRate 0.0420   Epoch: 7   Global Step: 35580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:20,155-Speed 5490.97 samples/sec   Loss 6.9195   LearningRate 0.0420   Epoch: 7   Global Step: 35590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:22,023-Speed 5485.82 samples/sec   Loss 6.8972   LearningRate 0.0420   Epoch: 7   Global Step: 35600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:23,884-Speed 5506.04 samples/sec   Loss 6.9000   LearningRate 0.0420   Epoch: 7   Global Step: 35610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:25,734-Speed 5537.16 samples/sec   Loss 6.8121   LearningRate 0.0420   Epoch: 7   Global Step: 35620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:27,581-Speed 5547.74 samples/sec   Loss 6.7918   LearningRate 0.0420   Epoch: 7   Global Step: 35630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:29,424-Speed 5557.80 samples/sec   Loss 6.9331   LearningRate 0.0419   Epoch: 7   Global Step: 35640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:31,295-Speed 5477.26 samples/sec   Loss 6.9363   LearningRate 0.0419   Epoch: 7   Global Step: 35650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:33,139-Speed 5557.83 samples/sec   Loss 6.9875   LearningRate 0.0419   Epoch: 7   Global Step: 35660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:34,986-Speed 5545.16 samples/sec   Loss 6.9152   LearningRate 0.0419   Epoch: 7   Global Step: 35670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:36,871-Speed 5435.13 samples/sec   Loss 6.9711   LearningRate 0.0419   Epoch: 7   Global Step: 35680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:38,739-Speed 5483.99 samples/sec   Loss 7.1447   LearningRate 0.0419   Epoch: 7   Global Step: 35690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:40,663-Speed 5325.16 samples/sec   Loss 7.0813   LearningRate 0.0419   Epoch: 7   Global Step: 35700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:42,538-Speed 5462.92 samples/sec   Loss 7.0596   LearningRate 0.0419   Epoch: 7   Global Step: 35710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:44,409-Speed 5478.58 samples/sec   Loss 6.9415   LearningRate 0.0418   Epoch: 7   Global Step: 35720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:46,272-Speed 5498.31 samples/sec   Loss 7.0631   LearningRate 0.0418   Epoch: 7   Global Step: 35730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:48,144-Speed 5472.90 samples/sec   Loss 7.0419   LearningRate 0.0418   Epoch: 7   Global Step: 35740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:50,019-Speed 5465.68 samples/sec   Loss 7.0307   LearningRate 0.0418   Epoch: 7   Global Step: 35750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:51,921-Speed 5384.06 samples/sec   Loss 6.9262   LearningRate 0.0418   Epoch: 7   Global Step: 35760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:33:53,783-Speed 5503.36 samples/sec   Loss 6.9669   LearningRate 0.0418   Epoch: 7   Global Step: 35770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:55,629-Speed 5549.99 samples/sec   Loss 7.1253   LearningRate 0.0418   Epoch: 7   Global Step: 35780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:57,495-Speed 5489.57 samples/sec   Loss 6.9343   LearningRate 0.0418   Epoch: 7   Global Step: 35790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:33:59,373-Speed 5457.24 samples/sec   Loss 7.0837   LearningRate 0.0417   Epoch: 7   Global Step: 35800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:01,223-Speed 5537.75 samples/sec   Loss 7.0552   LearningRate 0.0417   Epoch: 7   Global Step: 35810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:03,082-Speed 5520.64 samples/sec   Loss 6.9479   LearningRate 0.0417   Epoch: 7   Global Step: 35820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:04,958-Speed 5461.56 samples/sec   Loss 7.0976   LearningRate 0.0417   Epoch: 7   Global Step: 35830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:06,852-Speed 5406.45 samples/sec   Loss 7.1185   LearningRate 0.0417   Epoch: 7   Global Step: 35840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:08,691-Speed 5573.12 samples/sec   Loss 7.1022   LearningRate 0.0417   Epoch: 7   Global Step: 35850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:10,528-Speed 5576.79 samples/sec   Loss 7.1606   LearningRate 0.0417   Epoch: 7   Global Step: 35860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:12,389-Speed 5504.35 samples/sec   Loss 7.1960   LearningRate 0.0417   Epoch: 7   Global Step: 35870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:34:14,253-Speed 5496.98 samples/sec   Loss 7.1658   LearningRate 0.0416   Epoch: 7   Global Step: 35880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:34:16,098-Speed 5553.90 samples/sec   Loss 7.0486   LearningRate 0.0416   Epoch: 7   Global Step: 35890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:17,950-Speed 5530.94 samples/sec   Loss 7.1115   LearningRate 0.0416   Epoch: 7   Global Step: 35900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:19,799-Speed 5542.55 samples/sec   Loss 7.0427   LearningRate 0.0416   Epoch: 7   Global Step: 35910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:21,667-Speed 5484.57 samples/sec   Loss 7.1792   LearningRate 0.0416   Epoch: 7   Global Step: 35920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:23,515-Speed 5545.53 samples/sec   Loss 7.1646   LearningRate 0.0416   Epoch: 7   Global Step: 35930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:25,398-Speed 5441.55 samples/sec   Loss 7.1205   LearningRate 0.0416   Epoch: 7   Global Step: 35940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:27,254-Speed 5519.75 samples/sec   Loss 7.0552   LearningRate 0.0416   Epoch: 7   Global Step: 35950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:29,137-Speed 5439.62 samples/sec   Loss 7.0466   LearningRate 0.0415   Epoch: 7   Global Step: 35960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:31,049-Speed 5360.51 samples/sec   Loss 7.1857   LearningRate 0.0415   Epoch: 7   Global Step: 35970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:32,886-Speed 5577.59 samples/sec   Loss 7.1394   LearningRate 0.0415   Epoch: 7   Global Step: 35980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:34:34,733-Speed 5547.14 samples/sec   Loss 7.0655   LearningRate 0.0415   Epoch: 7   Global Step: 35990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:34:36,614-Speed 5443.94 samples/sec   Loss 7.1522   LearningRate 0.0415   Epoch: 7   Global Step: 36000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:35:03,783-[lfw][36000]XNorm: 22.880578
Training: 2022-04-11 12:35:03,784-[lfw][36000]Accuracy-Flip: 0.99617+-0.00289
Training: 2022-04-11 12:35:03,785-[lfw][36000]Accuracy-Highest: 0.99733
Training: 2022-04-11 12:35:35,185-[cfp_fp][36000]XNorm: 20.194693
Training: 2022-04-11 12:35:35,186-[cfp_fp][36000]Accuracy-Flip: 0.95814+-0.00994
Training: 2022-04-11 12:35:35,187-[cfp_fp][36000]Accuracy-Highest: 0.96000
Training: 2022-04-11 12:36:02,208-[agedb_30][36000]XNorm: 22.727170
Training: 2022-04-11 12:36:02,209-[agedb_30][36000]Accuracy-Flip: 0.97200+-0.00733
Training: 2022-04-11 12:36:02,210-[agedb_30][36000]Accuracy-Highest: 0.97467
Training: 2022-04-11 12:36:04,062-Speed 117.10 samples/sec   Loss 7.1927   LearningRate 0.0415   Epoch: 7   Global Step: 36010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:05,912-Speed 5540.40 samples/sec   Loss 7.0326   LearningRate 0.0415   Epoch: 7   Global Step: 36020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:07,761-Speed 5540.64 samples/sec   Loss 7.1624   LearningRate 0.0415   Epoch: 7   Global Step: 36030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:09,603-Speed 5561.29 samples/sec   Loss 7.1124   LearningRate 0.0414   Epoch: 7   Global Step: 36040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:11,460-Speed 5528.85 samples/sec   Loss 7.0952   LearningRate 0.0414   Epoch: 7   Global Step: 36050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:13,328-Speed 5483.41 samples/sec   Loss 7.1266   LearningRate 0.0414   Epoch: 7   Global Step: 36060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:15,187-Speed 5511.34 samples/sec   Loss 7.2736   LearningRate 0.0414   Epoch: 7   Global Step: 36070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:17,042-Speed 5522.10 samples/sec   Loss 7.1773   LearningRate 0.0414   Epoch: 7   Global Step: 36080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:18,865-Speed 5619.47 samples/sec   Loss 7.2986   LearningRate 0.0414   Epoch: 7   Global Step: 36090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:20,740-Speed 5465.86 samples/sec   Loss 7.0618   LearningRate 0.0414   Epoch: 7   Global Step: 36100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:22,581-Speed 5564.35 samples/sec   Loss 7.3058   LearningRate 0.0414   Epoch: 7   Global Step: 36110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:24,434-Speed 5528.75 samples/sec   Loss 7.1966   LearningRate 0.0413   Epoch: 7   Global Step: 36120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:26,275-Speed 5567.63 samples/sec   Loss 7.3689   LearningRate 0.0413   Epoch: 7   Global Step: 36130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:28,126-Speed 5535.56 samples/sec   Loss 7.2175   LearningRate 0.0413   Epoch: 7   Global Step: 36140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:29,967-Speed 5565.59 samples/sec   Loss 7.1806   LearningRate 0.0413   Epoch: 7   Global Step: 36150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:31,824-Speed 5514.48 samples/sec   Loss 7.1026   LearningRate 0.0413   Epoch: 7   Global Step: 36160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:33,672-Speed 5546.14 samples/sec   Loss 7.0559   LearningRate 0.0413   Epoch: 7   Global Step: 36170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:35,520-Speed 5544.28 samples/sec   Loss 7.4083   LearningRate 0.0413   Epoch: 7   Global Step: 36180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:37,387-Speed 5489.44 samples/sec   Loss 7.2900   LearningRate 0.0412   Epoch: 7   Global Step: 36190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:39,249-Speed 5500.89 samples/sec   Loss 7.1745   LearningRate 0.0412   Epoch: 7   Global Step: 36200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:41,087-Speed 5575.62 samples/sec   Loss 7.2617   LearningRate 0.0412   Epoch: 7   Global Step: 36210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:42,937-Speed 5538.09 samples/sec   Loss 7.1818   LearningRate 0.0412   Epoch: 7   Global Step: 36220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:44,803-Speed 5490.52 samples/sec   Loss 7.2438   LearningRate 0.0412   Epoch: 7   Global Step: 36230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:46,655-Speed 5529.56 samples/sec   Loss 7.2563   LearningRate 0.0412   Epoch: 7   Global Step: 36240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:48,526-Speed 5515.63 samples/sec   Loss 7.2927   LearningRate 0.0412   Epoch: 7   Global Step: 36250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:50,373-Speed 5548.17 samples/sec   Loss 7.3555   LearningRate 0.0412   Epoch: 7   Global Step: 36260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:52,208-Speed 5584.46 samples/sec   Loss 7.2582   LearningRate 0.0411   Epoch: 7   Global Step: 36270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:36:54,042-Speed 5583.72 samples/sec   Loss 7.2452   LearningRate 0.0411   Epoch: 7   Global Step: 36280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:55,904-Speed 5503.77 samples/sec   Loss 7.3181   LearningRate 0.0411   Epoch: 7   Global Step: 36290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:57,738-Speed 5586.87 samples/sec   Loss 7.1760   LearningRate 0.0411   Epoch: 7   Global Step: 36300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:36:59,620-Speed 5443.91 samples/sec   Loss 7.1506   LearningRate 0.0411   Epoch: 7   Global Step: 36310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:01,454-Speed 5585.54 samples/sec   Loss 7.2046   LearningRate 0.0411   Epoch: 7   Global Step: 36320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:03,351-Speed 5402.06 samples/sec   Loss 7.3357   LearningRate 0.0411   Epoch: 7   Global Step: 36330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:05,203-Speed 5532.19 samples/sec   Loss 7.1469   LearningRate 0.0411   Epoch: 7   Global Step: 36340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:07,062-Speed 5510.40 samples/sec   Loss 7.0477   LearningRate 0.0410   Epoch: 7   Global Step: 36350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:08,899-Speed 5578.45 samples/sec   Loss 7.4247   LearningRate 0.0410   Epoch: 7   Global Step: 36360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:10,790-Speed 5418.41 samples/sec   Loss 7.2905   LearningRate 0.0410   Epoch: 7   Global Step: 36370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:12,640-Speed 5542.60 samples/sec   Loss 7.2808   LearningRate 0.0410   Epoch: 7   Global Step: 36380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:14,488-Speed 5542.55 samples/sec   Loss 7.3088   LearningRate 0.0410   Epoch: 7   Global Step: 36390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:16,347-Speed 5510.56 samples/sec   Loss 7.1697   LearningRate 0.0410   Epoch: 7   Global Step: 36400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:18,206-Speed 5512.21 samples/sec   Loss 7.2811   LearningRate 0.0410   Epoch: 7   Global Step: 36410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:37:20,113-Speed 5371.34 samples/sec   Loss 7.2202   LearningRate 0.0410   Epoch: 7   Global Step: 36420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:37:21,960-Speed 5548.01 samples/sec   Loss 7.1497   LearningRate 0.0409   Epoch: 7   Global Step: 36430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:37:23,809-Speed 5542.13 samples/sec   Loss 7.2821   LearningRate 0.0409   Epoch: 7   Global Step: 36440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:37:25,653-Speed 5553.99 samples/sec   Loss 7.2648   LearningRate 0.0409   Epoch: 7   Global Step: 36450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:27,522-Speed 5483.85 samples/sec   Loss 7.0901   LearningRate 0.0409   Epoch: 7   Global Step: 36460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:29,379-Speed 5517.34 samples/sec   Loss 7.2875   LearningRate 0.0409   Epoch: 7   Global Step: 36470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:31,241-Speed 5502.46 samples/sec   Loss 7.1745   LearningRate 0.0409   Epoch: 7   Global Step: 36480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:33,084-Speed 5561.01 samples/sec   Loss 7.1247   LearningRate 0.0409   Epoch: 7   Global Step: 36490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:34,946-Speed 5500.21 samples/sec   Loss 7.1096   LearningRate 0.0409   Epoch: 7   Global Step: 36500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:36,783-Speed 5576.92 samples/sec   Loss 7.2799   LearningRate 0.0408   Epoch: 7   Global Step: 36510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:38,630-Speed 5548.51 samples/sec   Loss 7.3210   LearningRate 0.0408   Epoch: 7   Global Step: 36520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:40,487-Speed 5515.44 samples/sec   Loss 7.2506   LearningRate 0.0408   Epoch: 7   Global Step: 36530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:42,327-Speed 5572.74 samples/sec   Loss 7.3599   LearningRate 0.0408   Epoch: 7   Global Step: 36540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:44,164-Speed 5574.30 samples/sec   Loss 7.3970   LearningRate 0.0408   Epoch: 7   Global Step: 36550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:37:46,001-Speed 5578.33 samples/sec   Loss 7.2064   LearningRate 0.0408   Epoch: 7   Global Step: 36560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:37:47,907-Speed 5376.15 samples/sec   Loss 7.3339   LearningRate 0.0408   Epoch: 7   Global Step: 36570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:37:49,751-Speed 5556.95 samples/sec   Loss 7.3524   LearningRate 0.0408   Epoch: 7   Global Step: 36580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:37:51,682-Speed 5304.54 samples/sec   Loss 7.3653   LearningRate 0.0407   Epoch: 7   Global Step: 36590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:53,525-Speed 5562.27 samples/sec   Loss 7.3132   LearningRate 0.0407   Epoch: 7   Global Step: 36600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:55,363-Speed 5572.73 samples/sec   Loss 7.3768   LearningRate 0.0407   Epoch: 7   Global Step: 36610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:57,190-Speed 5608.21 samples/sec   Loss 7.2834   LearningRate 0.0407   Epoch: 7   Global Step: 36620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:37:59,020-Speed 5600.69 samples/sec   Loss 7.2792   LearningRate 0.0407   Epoch: 7   Global Step: 36630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:00,866-Speed 5548.93 samples/sec   Loss 7.3254   LearningRate 0.0407   Epoch: 7   Global Step: 36640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:02,710-Speed 5555.80 samples/sec   Loss 7.2431   LearningRate 0.0407   Epoch: 7   Global Step: 36650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:04,612-Speed 5386.25 samples/sec   Loss 7.1577   LearningRate 0.0407   Epoch: 7   Global Step: 36660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:06,473-Speed 5505.61 samples/sec   Loss 7.2077   LearningRate 0.0406   Epoch: 7   Global Step: 36670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:08,309-Speed 5581.48 samples/sec   Loss 7.3839   LearningRate 0.0406   Epoch: 7   Global Step: 36680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:10,140-Speed 5592.31 samples/sec   Loss 7.3221   LearningRate 0.0406   Epoch: 7   Global Step: 36690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:38:11,957-Speed 5638.47 samples/sec   Loss 7.3667   LearningRate 0.0406   Epoch: 7   Global Step: 36700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:38:13,826-Speed 5483.77 samples/sec   Loss 7.1651   LearningRate 0.0406   Epoch: 7   Global Step: 36710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:38:15,689-Speed 5497.99 samples/sec   Loss 7.2407   LearningRate 0.0406   Epoch: 7   Global Step: 36720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:38:17,527-Speed 5574.23 samples/sec   Loss 7.3079   LearningRate 0.0406   Epoch: 7   Global Step: 36730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:38:19,368-Speed 5563.47 samples/sec   Loss 7.3666   LearningRate 0.0406   Epoch: 7   Global Step: 36740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:38:21,207-Speed 5572.05 samples/sec   Loss 7.3504   LearningRate 0.0405   Epoch: 7   Global Step: 36750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:38:23,090-Speed 5439.87 samples/sec   Loss 7.3259   LearningRate 0.0405   Epoch: 7   Global Step: 36760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:38:24,939-Speed 5538.72 samples/sec   Loss 7.1635   LearningRate 0.0405   Epoch: 7   Global Step: 36770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:38:26,815-Speed 5461.36 samples/sec   Loss 7.2268   LearningRate 0.0405   Epoch: 7   Global Step: 36780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:38:28,671-Speed 5520.43 samples/sec   Loss 7.2803   LearningRate 0.0405   Epoch: 7   Global Step: 36790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:38:30,558-Speed 5428.58 samples/sec   Loss 7.2692   LearningRate 0.0405   Epoch: 7   Global Step: 36800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:32,388-Speed 5601.39 samples/sec   Loss 7.2966   LearningRate 0.0405   Epoch: 7   Global Step: 36810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:34,237-Speed 5540.03 samples/sec   Loss 7.3694   LearningRate 0.0405   Epoch: 7   Global Step: 36820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:36,107-Speed 5478.83 samples/sec   Loss 7.2321   LearningRate 0.0404   Epoch: 7   Global Step: 36830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:37,967-Speed 5510.18 samples/sec   Loss 7.2100   LearningRate 0.0404   Epoch: 7   Global Step: 36840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:39,845-Speed 5454.21 samples/sec   Loss 7.2006   LearningRate 0.0404   Epoch: 7   Global Step: 36850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:41,715-Speed 5478.87 samples/sec   Loss 7.3006   LearningRate 0.0404   Epoch: 7   Global Step: 36860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:43,560-Speed 5552.40 samples/sec   Loss 7.3151   LearningRate 0.0404   Epoch: 7   Global Step: 36870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:45,386-Speed 5610.34 samples/sec   Loss 7.3550   LearningRate 0.0404   Epoch: 7   Global Step: 36880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:47,282-Speed 5405.02 samples/sec   Loss 7.2414   LearningRate 0.0404   Epoch: 7   Global Step: 36890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:49,155-Speed 5469.11 samples/sec   Loss 7.2934   LearningRate 0.0404   Epoch: 7   Global Step: 36900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:38:50,996-Speed 5566.30 samples/sec   Loss 7.3663   LearningRate 0.0403   Epoch: 7   Global Step: 36910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:38:52,845-Speed 5541.86 samples/sec   Loss 7.3236   LearningRate 0.0403   Epoch: 7   Global Step: 36920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:54,690-Speed 5553.03 samples/sec   Loss 7.3877   LearningRate 0.0403   Epoch: 7   Global Step: 36930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:56,552-Speed 5502.99 samples/sec   Loss 7.3796   LearningRate 0.0403   Epoch: 7   Global Step: 36940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:38:58,413-Speed 5504.97 samples/sec   Loss 7.4457   LearningRate 0.0403   Epoch: 7   Global Step: 36950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:00,265-Speed 5533.46 samples/sec   Loss 7.2736   LearningRate 0.0403   Epoch: 7   Global Step: 36960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:02,156-Speed 5416.33 samples/sec   Loss 7.3485   LearningRate 0.0403   Epoch: 7   Global Step: 36970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:04,010-Speed 5528.15 samples/sec   Loss 7.3046   LearningRate 0.0403   Epoch: 7   Global Step: 36980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:05,858-Speed 5544.44 samples/sec   Loss 7.1818   LearningRate 0.0402   Epoch: 7   Global Step: 36990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:07,691-Speed 5589.81 samples/sec   Loss 7.2420   LearningRate 0.0402   Epoch: 7   Global Step: 37000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:09,524-Speed 5588.66 samples/sec   Loss 7.2176   LearningRate 0.0402   Epoch: 7   Global Step: 37010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:11,390-Speed 5492.49 samples/sec   Loss 7.4222   LearningRate 0.0402   Epoch: 7   Global Step: 37020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:39:13,234-Speed 5555.37 samples/sec   Loss 7.3547   LearningRate 0.0402   Epoch: 7   Global Step: 37030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:39:15,088-Speed 5527.42 samples/sec   Loss 7.2173   LearningRate 0.0402   Epoch: 7   Global Step: 37040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:16,971-Speed 5440.77 samples/sec   Loss 7.3726   LearningRate 0.0402   Epoch: 7   Global Step: 37050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:18,832-Speed 5504.87 samples/sec   Loss 7.3876   LearningRate 0.0402   Epoch: 7   Global Step: 37060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:20,681-Speed 5540.86 samples/sec   Loss 7.3740   LearningRate 0.0401   Epoch: 7   Global Step: 37070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:22,529-Speed 5546.46 samples/sec   Loss 7.3948   LearningRate 0.0401   Epoch: 7   Global Step: 37080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:24,428-Speed 5393.30 samples/sec   Loss 7.3049   LearningRate 0.0401   Epoch: 7   Global Step: 37090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:26,267-Speed 5572.09 samples/sec   Loss 7.2828   LearningRate 0.0401   Epoch: 7   Global Step: 37100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:28,136-Speed 5482.51 samples/sec   Loss 7.4496   LearningRate 0.0401   Epoch: 7   Global Step: 37110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:30,046-Speed 5364.72 samples/sec   Loss 7.4069   LearningRate 0.0401   Epoch: 7   Global Step: 37120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:39:31,932-Speed 5430.10 samples/sec   Loss 7.1488   LearningRate 0.0401   Epoch: 7   Global Step: 37130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:39:33,776-Speed 5558.88 samples/sec   Loss 7.3612   LearningRate 0.0401   Epoch: 7   Global Step: 37140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:39:35,637-Speed 5505.18 samples/sec   Loss 7.2890   LearningRate 0.0400   Epoch: 7   Global Step: 37150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:39:37,491-Speed 5527.95 samples/sec   Loss 7.3202   LearningRate 0.0400   Epoch: 7   Global Step: 37160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:39:39,373-Speed 5442.20 samples/sec   Loss 7.3220   LearningRate 0.0400   Epoch: 7   Global Step: 37170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:39:41,216-Speed 5558.11 samples/sec   Loss 7.2697   LearningRate 0.0400   Epoch: 7   Global Step: 37180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:39:43,061-Speed 5555.65 samples/sec   Loss 7.2139   LearningRate 0.0400   Epoch: 7   Global Step: 37190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:39:44,898-Speed 5575.03 samples/sec   Loss 7.3846   LearningRate 0.0400   Epoch: 7   Global Step: 37200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:39:46,734-Speed 5580.74 samples/sec   Loss 7.3605   LearningRate 0.0400   Epoch: 7   Global Step: 37210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:39:48,573-Speed 5570.72 samples/sec   Loss 7.2917   LearningRate 0.0400   Epoch: 7   Global Step: 37220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:50,455-Speed 5444.88 samples/sec   Loss 7.3886   LearningRate 0.0399   Epoch: 7   Global Step: 37230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:52,284-Speed 5601.96 samples/sec   Loss 7.3300   LearningRate 0.0399   Epoch: 7   Global Step: 37240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:54,131-Speed 5547.07 samples/sec   Loss 7.1480   LearningRate 0.0399   Epoch: 7   Global Step: 37250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:55,962-Speed 5594.16 samples/sec   Loss 7.1673   LearningRate 0.0399   Epoch: 7   Global Step: 37260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:57,842-Speed 5450.98 samples/sec   Loss 7.2497   LearningRate 0.0399   Epoch: 7   Global Step: 37270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:39:59,677-Speed 5583.47 samples/sec   Loss 7.1974   LearningRate 0.0399   Epoch: 7   Global Step: 37280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:01,527-Speed 5537.42 samples/sec   Loss 7.4213   LearningRate 0.0399   Epoch: 7   Global Step: 37290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:03,393-Speed 5493.70 samples/sec   Loss 7.1763   LearningRate 0.0399   Epoch: 7   Global Step: 37300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:05,260-Speed 5485.98 samples/sec   Loss 7.1280   LearningRate 0.0398   Epoch: 7   Global Step: 37310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:07,087-Speed 5606.80 samples/sec   Loss 7.2806   LearningRate 0.0398   Epoch: 7   Global Step: 37320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:08,958-Speed 5475.86 samples/sec   Loss 7.2546   LearningRate 0.0398   Epoch: 7   Global Step: 37330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:10,798-Speed 5569.19 samples/sec   Loss 7.3808   LearningRate 0.0398   Epoch: 7   Global Step: 37340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:12,636-Speed 5574.27 samples/sec   Loss 7.3491   LearningRate 0.0398   Epoch: 7   Global Step: 37350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:14,526-Speed 5421.30 samples/sec   Loss 7.3019   LearningRate 0.0398   Epoch: 7   Global Step: 37360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:16,365-Speed 5571.09 samples/sec   Loss 7.1205   LearningRate 0.0398   Epoch: 7   Global Step: 37370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:18,219-Speed 5524.89 samples/sec   Loss 7.1771   LearningRate 0.0398   Epoch: 7   Global Step: 37380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:20,119-Speed 5393.11 samples/sec   Loss 7.3028   LearningRate 0.0397   Epoch: 7   Global Step: 37390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:21,956-Speed 5575.43 samples/sec   Loss 7.3604   LearningRate 0.0397   Epoch: 7   Global Step: 37400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:23,832-Speed 5462.43 samples/sec   Loss 7.3126   LearningRate 0.0397   Epoch: 7   Global Step: 37410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:25,694-Speed 5501.71 samples/sec   Loss 7.1579   LearningRate 0.0397   Epoch: 7   Global Step: 37420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:40:27,561-Speed 5490.76 samples/sec   Loss 7.3156   LearningRate 0.0397   Epoch: 7   Global Step: 37430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:40:29,404-Speed 5556.98 samples/sec   Loss 7.3682   LearningRate 0.0397   Epoch: 7   Global Step: 37440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:40:31,280-Speed 5462.53 samples/sec   Loss 7.3856   LearningRate 0.0397   Epoch: 7   Global Step: 37450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:40:33,127-Speed 5547.40 samples/sec   Loss 7.1886   LearningRate 0.0397   Epoch: 7   Global Step: 37460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:40:34,976-Speed 5540.78 samples/sec   Loss 7.3322   LearningRate 0.0396   Epoch: 7   Global Step: 37470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:40:36,847-Speed 5476.31 samples/sec   Loss 7.4443   LearningRate 0.0396   Epoch: 7   Global Step: 37480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:40:38,708-Speed 5506.45 samples/sec   Loss 7.2141   LearningRate 0.0396   Epoch: 7   Global Step: 37490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:40:40,585-Speed 5457.18 samples/sec   Loss 7.4424   LearningRate 0.0396   Epoch: 7   Global Step: 37500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:42,458-Speed 5469.68 samples/sec   Loss 7.2041   LearningRate 0.0396   Epoch: 7   Global Step: 37510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:44,312-Speed 5526.00 samples/sec   Loss 7.2112   LearningRate 0.0396   Epoch: 7   Global Step: 37520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:46,147-Speed 5583.30 samples/sec   Loss 7.2789   LearningRate 0.0396   Epoch: 7   Global Step: 37530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:47,992-Speed 5554.10 samples/sec   Loss 7.3838   LearningRate 0.0396   Epoch: 7   Global Step: 37540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:49,888-Speed 5403.42 samples/sec   Loss 7.3129   LearningRate 0.0395   Epoch: 7   Global Step: 37550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:51,730-Speed 5562.15 samples/sec   Loss 7.3773   LearningRate 0.0395   Epoch: 7   Global Step: 37560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:53,607-Speed 5459.75 samples/sec   Loss 7.4062   LearningRate 0.0395   Epoch: 7   Global Step: 37570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:55,453-Speed 5551.21 samples/sec   Loss 7.3602   LearningRate 0.0395   Epoch: 7   Global Step: 37580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:57,331-Speed 5455.37 samples/sec   Loss 7.3097   LearningRate 0.0395   Epoch: 7   Global Step: 37590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:40:59,175-Speed 5559.69 samples/sec   Loss 7.2588   LearningRate 0.0395   Epoch: 7   Global Step: 37600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:41:01,035-Speed 5506.66 samples/sec   Loss 7.3184   LearningRate 0.0395   Epoch: 7   Global Step: 37610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:02,904-Speed 5483.29 samples/sec   Loss 7.3348   LearningRate 0.0395   Epoch: 7   Global Step: 37620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:04,812-Speed 5369.38 samples/sec   Loss 7.3086   LearningRate 0.0394   Epoch: 7   Global Step: 37630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:06,680-Speed 5484.23 samples/sec   Loss 7.3101   LearningRate 0.0394   Epoch: 7   Global Step: 37640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:08,514-Speed 5586.06 samples/sec   Loss 7.4015   LearningRate 0.0394   Epoch: 7   Global Step: 37650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:10,350-Speed 5580.62 samples/sec   Loss 7.3300   LearningRate 0.0394   Epoch: 7   Global Step: 37660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:12,221-Speed 5472.68 samples/sec   Loss 7.2708   LearningRate 0.0394   Epoch: 7   Global Step: 37670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:14,074-Speed 5530.08 samples/sec   Loss 7.2461   LearningRate 0.0394   Epoch: 7   Global Step: 37680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:15,908-Speed 5587.76 samples/sec   Loss 7.1829   LearningRate 0.0394   Epoch: 7   Global Step: 37690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:17,747-Speed 5571.98 samples/sec   Loss 7.3221   LearningRate 0.0394   Epoch: 7   Global Step: 37700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:19,624-Speed 5457.51 samples/sec   Loss 7.2611   LearningRate 0.0393   Epoch: 7   Global Step: 37710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:41:21,446-Speed 5621.73 samples/sec   Loss 7.2379   LearningRate 0.0393   Epoch: 7   Global Step: 37720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:23,339-Speed 5411.36 samples/sec   Loss 7.2317   LearningRate 0.0393   Epoch: 7   Global Step: 37730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:25,188-Speed 5540.64 samples/sec   Loss 7.1272   LearningRate 0.0393   Epoch: 7   Global Step: 37740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:27,048-Speed 5508.59 samples/sec   Loss 7.3814   LearningRate 0.0393   Epoch: 7   Global Step: 37750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:28,915-Speed 5487.90 samples/sec   Loss 7.2466   LearningRate 0.0393   Epoch: 7   Global Step: 37760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:30,751-Speed 5579.96 samples/sec   Loss 7.2179   LearningRate 0.0393   Epoch: 7   Global Step: 37770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:32,628-Speed 5458.64 samples/sec   Loss 7.2383   LearningRate 0.0393   Epoch: 7   Global Step: 37780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:34,488-Speed 5510.04 samples/sec   Loss 7.4611   LearningRate 0.0392   Epoch: 7   Global Step: 37790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:36,342-Speed 5526.53 samples/sec   Loss 7.2715   LearningRate 0.0392   Epoch: 7   Global Step: 37800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:38,197-Speed 5523.79 samples/sec   Loss 7.3793   LearningRate 0.0392   Epoch: 7   Global Step: 37810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:40,069-Speed 5470.36 samples/sec   Loss 7.3702   LearningRate 0.0392   Epoch: 7   Global Step: 37820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:41:41,934-Speed 5492.98 samples/sec   Loss 7.2340   LearningRate 0.0392   Epoch: 7   Global Step: 37830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:43,801-Speed 5488.71 samples/sec   Loss 7.2982   LearningRate 0.0392   Epoch: 7   Global Step: 37840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:45,696-Speed 5407.24 samples/sec   Loss 7.3159   LearningRate 0.0392   Epoch: 7   Global Step: 37850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:47,579-Speed 5442.12 samples/sec   Loss 7.2849   LearningRate 0.0392   Epoch: 7   Global Step: 37860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:49,431-Speed 5529.63 samples/sec   Loss 7.2968   LearningRate 0.0391   Epoch: 7   Global Step: 37870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:51,286-Speed 5523.99 samples/sec   Loss 7.2466   LearningRate 0.0391   Epoch: 7   Global Step: 37880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:53,155-Speed 5483.30 samples/sec   Loss 7.3452   LearningRate 0.0391   Epoch: 7   Global Step: 37890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:55,020-Speed 5493.56 samples/sec   Loss 7.1239   LearningRate 0.0391   Epoch: 7   Global Step: 37900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:56,891-Speed 5475.54 samples/sec   Loss 7.3958   LearningRate 0.0391   Epoch: 7   Global Step: 37910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:41:58,743-Speed 5531.53 samples/sec   Loss 7.3397   LearningRate 0.0391   Epoch: 7   Global Step: 37920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:42:00,600-Speed 5518.34 samples/sec   Loss 7.4026   LearningRate 0.0391   Epoch: 7   Global Step: 37930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:42:02,436-Speed 5579.92 samples/sec   Loss 7.4039   LearningRate 0.0391   Epoch: 7   Global Step: 37940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:42:04,294-Speed 5515.37 samples/sec   Loss 7.1393   LearningRate 0.0390   Epoch: 7   Global Step: 37950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:42:06,132-Speed 5571.85 samples/sec   Loss 7.2668   LearningRate 0.0390   Epoch: 7   Global Step: 37960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:42:07,972-Speed 5570.74 samples/sec   Loss 7.2622   LearningRate 0.0390   Epoch: 7   Global Step: 37970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:42:09,833-Speed 5505.77 samples/sec   Loss 7.3278   LearningRate 0.0390   Epoch: 7   Global Step: 37980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:42:11,683-Speed 5537.73 samples/sec   Loss 7.2178   LearningRate 0.0390   Epoch: 7   Global Step: 37990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:42:13,551-Speed 5484.89 samples/sec   Loss 7.1594   LearningRate 0.0390   Epoch: 7   Global Step: 38000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:42:40,928-[lfw][38000]XNorm: 23.302278
Training: 2022-04-11 12:42:40,929-[lfw][38000]Accuracy-Flip: 0.99750+-0.00227
Training: 2022-04-11 12:42:40,929-[lfw][38000]Accuracy-Highest: 0.99750
Training: 2022-04-11 12:43:12,481-[cfp_fp][38000]XNorm: 20.459902
Training: 2022-04-11 12:43:12,481-[cfp_fp][38000]Accuracy-Flip: 0.96057+-0.01044
Training: 2022-04-11 12:43:12,482-[cfp_fp][38000]Accuracy-Highest: 0.96057
Training: 2022-04-11 12:43:39,444-[agedb_30][38000]XNorm: 22.929287
Training: 2022-04-11 12:43:39,445-[agedb_30][38000]Accuracy-Flip: 0.97383+-0.00775
Training: 2022-04-11 12:43:39,446-[agedb_30][38000]Accuracy-Highest: 0.97467
Training: 2022-04-11 12:43:41,298-Speed 116.70 samples/sec   Loss 7.2951   LearningRate 0.0390   Epoch: 7   Global Step: 38010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:43:43,114-Speed 5638.21 samples/sec   Loss 7.2955   LearningRate 0.0390   Epoch: 7   Global Step: 38020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:43:44,954-Speed 5568.06 samples/sec   Loss 7.3198   LearningRate 0.0389   Epoch: 7   Global Step: 38030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:43:46,828-Speed 5466.09 samples/sec   Loss 7.3842   LearningRate 0.0389   Epoch: 7   Global Step: 38040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:43:48,670-Speed 5561.46 samples/sec   Loss 7.5181   LearningRate 0.0389   Epoch: 7   Global Step: 38050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:43:50,543-Speed 5473.52 samples/sec   Loss 7.2371   LearningRate 0.0389   Epoch: 7   Global Step: 38060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:43:52,476-Speed 5298.81 samples/sec   Loss 7.4007   LearningRate 0.0389   Epoch: 7   Global Step: 38070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:43:54,316-Speed 5566.77 samples/sec   Loss 7.0927   LearningRate 0.0389   Epoch: 7   Global Step: 38080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:43:56,180-Speed 5498.15 samples/sec   Loss 7.1846   LearningRate 0.0389   Epoch: 7   Global Step: 38090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:43:58,015-Speed 5583.41 samples/sec   Loss 7.2995   LearningRate 0.0389   Epoch: 7   Global Step: 38100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:43:59,897-Speed 5443.47 samples/sec   Loss 7.3514   LearningRate 0.0388   Epoch: 7   Global Step: 38110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:01,754-Speed 5518.83 samples/sec   Loss 7.2803   LearningRate 0.0388   Epoch: 7   Global Step: 38120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:44:03,592-Speed 5571.92 samples/sec   Loss 7.2229   LearningRate 0.0388   Epoch: 7   Global Step: 38130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:44:05,429-Speed 5579.01 samples/sec   Loss 7.2823   LearningRate 0.0388   Epoch: 7   Global Step: 38140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:44:07,306-Speed 5457.72 samples/sec   Loss 7.5271   LearningRate 0.0388   Epoch: 7   Global Step: 38150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:44:09,152-Speed 5550.01 samples/sec   Loss 7.3261   LearningRate 0.0388   Epoch: 7   Global Step: 38160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:44:11,018-Speed 5491.85 samples/sec   Loss 7.0790   LearningRate 0.0388   Epoch: 7   Global Step: 38170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:12,874-Speed 5518.74 samples/sec   Loss 7.2296   LearningRate 0.0388   Epoch: 7   Global Step: 38180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:14,730-Speed 5521.06 samples/sec   Loss 7.1841   LearningRate 0.0387   Epoch: 7   Global Step: 38190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:16,606-Speed 5462.94 samples/sec   Loss 7.3462   LearningRate 0.0387   Epoch: 7   Global Step: 38200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:18,455-Speed 5540.61 samples/sec   Loss 7.2598   LearningRate 0.0387   Epoch: 7   Global Step: 38210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:20,317-Speed 5502.77 samples/sec   Loss 7.1967   LearningRate 0.0387   Epoch: 7   Global Step: 38220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:22,179-Speed 5500.56 samples/sec   Loss 7.2344   LearningRate 0.0387   Epoch: 7   Global Step: 38230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:24,033-Speed 5527.56 samples/sec   Loss 7.1425   LearningRate 0.0387   Epoch: 7   Global Step: 38240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:25,886-Speed 5530.02 samples/sec   Loss 7.2873   LearningRate 0.0387   Epoch: 7   Global Step: 38250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:27,727-Speed 5564.35 samples/sec   Loss 7.2885   LearningRate 0.0387   Epoch: 7   Global Step: 38260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:29,580-Speed 5529.59 samples/sec   Loss 7.3564   LearningRate 0.0386   Epoch: 7   Global Step: 38270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:44:31,444-Speed 5496.72 samples/sec   Loss 7.3050   LearningRate 0.0386   Epoch: 7   Global Step: 38280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:44:33,268-Speed 5619.00 samples/sec   Loss 7.2733   LearningRate 0.0386   Epoch: 7   Global Step: 38290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:35,125-Speed 5516.99 samples/sec   Loss 7.1728   LearningRate 0.0386   Epoch: 7   Global Step: 38300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:36,971-Speed 5549.30 samples/sec   Loss 7.2631   LearningRate 0.0386   Epoch: 7   Global Step: 38310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:38,833-Speed 5503.28 samples/sec   Loss 7.3069   LearningRate 0.0386   Epoch: 7   Global Step: 38320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:40,728-Speed 5408.33 samples/sec   Loss 7.4530   LearningRate 0.0386   Epoch: 7   Global Step: 38330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:42,570-Speed 5559.75 samples/sec   Loss 7.2695   LearningRate 0.0386   Epoch: 7   Global Step: 38340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:44,441-Speed 5477.82 samples/sec   Loss 7.3927   LearningRate 0.0386   Epoch: 7   Global Step: 38350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:46,273-Speed 5592.55 samples/sec   Loss 7.1840   LearningRate 0.0385   Epoch: 7   Global Step: 38360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:48,133-Speed 5510.15 samples/sec   Loss 7.4239   LearningRate 0.0385   Epoch: 7   Global Step: 38370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:49,980-Speed 5545.92 samples/sec   Loss 7.3684   LearningRate 0.0385   Epoch: 7   Global Step: 38380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:51,850-Speed 5480.21 samples/sec   Loss 7.1400   LearningRate 0.0385   Epoch: 7   Global Step: 38390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:44:53,694-Speed 5553.59 samples/sec   Loss 7.2392   LearningRate 0.0385   Epoch: 7   Global Step: 38400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:44:55,547-Speed 5530.34 samples/sec   Loss 7.2331   LearningRate 0.0385   Epoch: 7   Global Step: 38410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:44:57,376-Speed 5601.13 samples/sec   Loss 7.2665   LearningRate 0.0385   Epoch: 7   Global Step: 38420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:44:59,263-Speed 5429.11 samples/sec   Loss 7.2239   LearningRate 0.0385   Epoch: 7   Global Step: 38430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:45:01,100-Speed 5578.85 samples/sec   Loss 7.3788   LearningRate 0.0384   Epoch: 7   Global Step: 38440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:45:02,955-Speed 5523.23 samples/sec   Loss 7.1639   LearningRate 0.0384   Epoch: 7   Global Step: 38450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:45:04,849-Speed 5412.08 samples/sec   Loss 7.0478   LearningRate 0.0384   Epoch: 7   Global Step: 38460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:45:06,696-Speed 5547.20 samples/sec   Loss 7.1936   LearningRate 0.0384   Epoch: 7   Global Step: 38470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:45:08,553-Speed 5516.08 samples/sec   Loss 7.3348   LearningRate 0.0384   Epoch: 7   Global Step: 38480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:45:10,393-Speed 5570.54 samples/sec   Loss 7.1867   LearningRate 0.0384   Epoch: 7   Global Step: 38490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:45:12,264-Speed 5474.41 samples/sec   Loss 7.2439   LearningRate 0.0384   Epoch: 7   Global Step: 38500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:45:14,167-Speed 5384.47 samples/sec   Loss 7.3309   LearningRate 0.0384   Epoch: 7   Global Step: 38510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:16,057-Speed 5420.26 samples/sec   Loss 7.3102   LearningRate 0.0383   Epoch: 7   Global Step: 38520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:17,895-Speed 5576.15 samples/sec   Loss 7.2619   LearningRate 0.0383   Epoch: 7   Global Step: 38530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:19,760-Speed 5493.56 samples/sec   Loss 7.2023   LearningRate 0.0383   Epoch: 7   Global Step: 38540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:21,628-Speed 5486.07 samples/sec   Loss 7.2656   LearningRate 0.0383   Epoch: 7   Global Step: 38550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:23,476-Speed 5543.47 samples/sec   Loss 7.1761   LearningRate 0.0383   Epoch: 7   Global Step: 38560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:25,342-Speed 5489.79 samples/sec   Loss 7.3147   LearningRate 0.0383   Epoch: 7   Global Step: 38570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:27,217-Speed 5463.05 samples/sec   Loss 7.2121   LearningRate 0.0383   Epoch: 7   Global Step: 38580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:29,081-Speed 5496.54 samples/sec   Loss 7.2783   LearningRate 0.0383   Epoch: 7   Global Step: 38590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:30,926-Speed 5553.30 samples/sec   Loss 7.2454   LearningRate 0.0382   Epoch: 7   Global Step: 38600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:32,792-Speed 5491.22 samples/sec   Loss 7.2765   LearningRate 0.0382   Epoch: 7   Global Step: 38610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:45:34,642-Speed 5537.88 samples/sec   Loss 7.3737   LearningRate 0.0382   Epoch: 7   Global Step: 38620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:45:36,507-Speed 5494.21 samples/sec   Loss 7.3236   LearningRate 0.0382   Epoch: 7   Global Step: 38630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:38,359-Speed 5532.67 samples/sec   Loss 7.3940   LearningRate 0.0382   Epoch: 7   Global Step: 38640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:40,232-Speed 5469.55 samples/sec   Loss 7.2158   LearningRate 0.0382   Epoch: 7   Global Step: 38650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:42,073-Speed 5565.75 samples/sec   Loss 7.1282   LearningRate 0.0382   Epoch: 7   Global Step: 38660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:43,963-Speed 5420.76 samples/sec   Loss 7.1754   LearningRate 0.0382   Epoch: 7   Global Step: 38670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:45,805-Speed 5563.52 samples/sec   Loss 7.4189   LearningRate 0.0381   Epoch: 7   Global Step: 38680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:47,653-Speed 5542.84 samples/sec   Loss 7.1285   LearningRate 0.0381   Epoch: 7   Global Step: 38690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:49,495-Speed 5561.72 samples/sec   Loss 7.1391   LearningRate 0.0381   Epoch: 7   Global Step: 38700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:51,341-Speed 5549.68 samples/sec   Loss 7.2488   LearningRate 0.0381   Epoch: 7   Global Step: 38710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:53,247-Speed 5376.78 samples/sec   Loss 7.3477   LearningRate 0.0381   Epoch: 7   Global Step: 38720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:55,092-Speed 5553.70 samples/sec   Loss 7.4401   LearningRate 0.0381   Epoch: 7   Global Step: 38730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:45:56,918-Speed 5609.80 samples/sec   Loss 7.2984   LearningRate 0.0381   Epoch: 7   Global Step: 38740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:45:58,770-Speed 5531.66 samples/sec   Loss 7.2323   LearningRate 0.0381   Epoch: 7   Global Step: 38750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:00,621-Speed 5534.81 samples/sec   Loss 7.2751   LearningRate 0.0380   Epoch: 7   Global Step: 38760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:02,463-Speed 5562.10 samples/sec   Loss 7.2455   LearningRate 0.0380   Epoch: 7   Global Step: 38770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:04,300-Speed 5574.28 samples/sec   Loss 7.2960   LearningRate 0.0380   Epoch: 7   Global Step: 38780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:06,163-Speed 5499.64 samples/sec   Loss 7.1548   LearningRate 0.0380   Epoch: 7   Global Step: 38790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:08,044-Speed 5448.12 samples/sec   Loss 7.2430   LearningRate 0.0380   Epoch: 7   Global Step: 38800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:09,887-Speed 5559.68 samples/sec   Loss 7.2703   LearningRate 0.0380   Epoch: 7   Global Step: 38810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:11,730-Speed 5558.64 samples/sec   Loss 7.0479   LearningRate 0.0380   Epoch: 7   Global Step: 38820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:13,639-Speed 5367.14 samples/sec   Loss 7.4424   LearningRate 0.0380   Epoch: 7   Global Step: 38830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:15,508-Speed 5482.72 samples/sec   Loss 7.1784   LearningRate 0.0380   Epoch: 7   Global Step: 38840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:46:17,356-Speed 5544.67 samples/sec   Loss 7.1335   LearningRate 0.0379   Epoch: 7   Global Step: 38850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:19,197-Speed 5564.81 samples/sec   Loss 7.1020   LearningRate 0.0379   Epoch: 7   Global Step: 38860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:21,080-Speed 5440.02 samples/sec   Loss 7.1936   LearningRate 0.0379   Epoch: 7   Global Step: 38870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:22,917-Speed 5579.71 samples/sec   Loss 7.3675   LearningRate 0.0379   Epoch: 7   Global Step: 38880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:24,790-Speed 5470.54 samples/sec   Loss 7.3256   LearningRate 0.0379   Epoch: 7   Global Step: 38890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:26,631-Speed 5565.33 samples/sec   Loss 7.1014   LearningRate 0.0379   Epoch: 7   Global Step: 38900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:28,488-Speed 5517.76 samples/sec   Loss 7.1317   LearningRate 0.0379   Epoch: 7   Global Step: 38910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:30,368-Speed 5450.05 samples/sec   Loss 7.0521   LearningRate 0.0379   Epoch: 7   Global Step: 38920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:32,213-Speed 5553.31 samples/sec   Loss 7.2641   LearningRate 0.0378   Epoch: 7   Global Step: 38930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:34,066-Speed 5527.30 samples/sec   Loss 7.1140   LearningRate 0.0378   Epoch: 7   Global Step: 38940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:35,962-Speed 5405.98 samples/sec   Loss 7.5098   LearningRate 0.0378   Epoch: 7   Global Step: 38950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:46:37,821-Speed 5511.33 samples/sec   Loss 7.3043   LearningRate 0.0378   Epoch: 7   Global Step: 38960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:39,700-Speed 5453.61 samples/sec   Loss 7.1402   LearningRate 0.0378   Epoch: 7   Global Step: 38970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:41,558-Speed 5512.39 samples/sec   Loss 7.3424   LearningRate 0.0378   Epoch: 7   Global Step: 38980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:43,427-Speed 5483.77 samples/sec   Loss 7.2992   LearningRate 0.0378   Epoch: 7   Global Step: 38990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:45,268-Speed 5566.26 samples/sec   Loss 7.2855   LearningRate 0.0378   Epoch: 7   Global Step: 39000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:47,147-Speed 5450.33 samples/sec   Loss 7.3365   LearningRate 0.0377   Epoch: 7   Global Step: 39010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:49,024-Speed 5459.74 samples/sec   Loss 7.2385   LearningRate 0.0377   Epoch: 7   Global Step: 39020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:50,876-Speed 5533.64 samples/sec   Loss 7.2063   LearningRate 0.0377   Epoch: 7   Global Step: 39030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:52,778-Speed 5386.95 samples/sec   Loss 7.3576   LearningRate 0.0377   Epoch: 7   Global Step: 39040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:54,636-Speed 5514.17 samples/sec   Loss 7.2324   LearningRate 0.0377   Epoch: 7   Global Step: 39050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:46:56,512-Speed 5460.10 samples/sec   Loss 7.1616   LearningRate 0.0377   Epoch: 7   Global Step: 39060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:46:58,364-Speed 5531.65 samples/sec   Loss 7.1052   LearningRate 0.0377   Epoch: 7   Global Step: 39070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:47:00,211-Speed 5549.36 samples/sec   Loss 7.1308   LearningRate 0.0377   Epoch: 7   Global Step: 39080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:02,067-Speed 5519.18 samples/sec   Loss 6.9994   LearningRate 0.0376   Epoch: 7   Global Step: 39090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:03,929-Speed 5502.63 samples/sec   Loss 7.2671   LearningRate 0.0376   Epoch: 7   Global Step: 39100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:05,783-Speed 5528.43 samples/sec   Loss 7.3306   LearningRate 0.0376   Epoch: 7   Global Step: 39110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:07,649-Speed 5490.32 samples/sec   Loss 7.2233   LearningRate 0.0376   Epoch: 7   Global Step: 39120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:09,493-Speed 5556.61 samples/sec   Loss 7.0392   LearningRate 0.0376   Epoch: 7   Global Step: 39130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:11,362-Speed 5481.59 samples/sec   Loss 7.1560   LearningRate 0.0376   Epoch: 7   Global Step: 39140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:13,228-Speed 5491.27 samples/sec   Loss 7.1962   LearningRate 0.0376   Epoch: 7   Global Step: 39150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:15,066-Speed 5580.01 samples/sec   Loss 7.1829   LearningRate 0.0376   Epoch: 7   Global Step: 39160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:16,899-Speed 5586.50 samples/sec   Loss 7.1415   LearningRate 0.0376   Epoch: 7   Global Step: 39170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:18,740-Speed 5563.76 samples/sec   Loss 7.1927   LearningRate 0.0375   Epoch: 7   Global Step: 39180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:20,577-Speed 5578.76 samples/sec   Loss 7.3200   LearningRate 0.0375   Epoch: 7   Global Step: 39190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:22,415-Speed 5574.93 samples/sec   Loss 7.1317   LearningRate 0.0375   Epoch: 7   Global Step: 39200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:24,262-Speed 5546.15 samples/sec   Loss 7.2727   LearningRate 0.0375   Epoch: 7   Global Step: 39210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:26,120-Speed 5514.52 samples/sec   Loss 7.0880   LearningRate 0.0375   Epoch: 7   Global Step: 39220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:28,000-Speed 5447.70 samples/sec   Loss 7.3559   LearningRate 0.0375   Epoch: 7   Global Step: 39230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:29,848-Speed 5545.45 samples/sec   Loss 7.2037   LearningRate 0.0375   Epoch: 7   Global Step: 39240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:31,708-Speed 5506.40 samples/sec   Loss 7.2257   LearningRate 0.0375   Epoch: 7   Global Step: 39250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:33,545-Speed 5579.00 samples/sec   Loss 7.3363   LearningRate 0.0374   Epoch: 7   Global Step: 39260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:35,405-Speed 5506.63 samples/sec   Loss 7.1088   LearningRate 0.0374   Epoch: 7   Global Step: 39270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:37,266-Speed 5505.49 samples/sec   Loss 7.1435   LearningRate 0.0374   Epoch: 7   Global Step: 39280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:39,123-Speed 5518.70 samples/sec   Loss 7.1530   LearningRate 0.0374   Epoch: 7   Global Step: 39290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:40,981-Speed 5515.50 samples/sec   Loss 7.1660   LearningRate 0.0374   Epoch: 7   Global Step: 39300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:42,853-Speed 5472.11 samples/sec   Loss 7.1749   LearningRate 0.0374   Epoch: 7   Global Step: 39310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:44,690-Speed 5579.06 samples/sec   Loss 7.1915   LearningRate 0.0374   Epoch: 7   Global Step: 39320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:46,599-Speed 5366.01 samples/sec   Loss 7.2188   LearningRate 0.0374   Epoch: 7   Global Step: 39330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:48,452-Speed 5526.65 samples/sec   Loss 7.2466   LearningRate 0.0373   Epoch: 7   Global Step: 39340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:50,298-Speed 5552.77 samples/sec   Loss 7.0846   LearningRate 0.0373   Epoch: 7   Global Step: 39350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:52,170-Speed 5471.25 samples/sec   Loss 7.1408   LearningRate 0.0373   Epoch: 7   Global Step: 39360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:54,014-Speed 5557.91 samples/sec   Loss 7.1505   LearningRate 0.0373   Epoch: 7   Global Step: 39370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:47:55,863-Speed 5539.92 samples/sec   Loss 7.1338   LearningRate 0.0373   Epoch: 7   Global Step: 39380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:47:57,698-Speed 5584.44 samples/sec   Loss 7.0347   LearningRate 0.0373   Epoch: 7   Global Step: 39390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:47:59,555-Speed 5517.02 samples/sec   Loss 7.0921   LearningRate 0.0373   Epoch: 7   Global Step: 39400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:01,402-Speed 5547.68 samples/sec   Loss 7.3151   LearningRate 0.0373   Epoch: 7   Global Step: 39410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:03,276-Speed 5466.46 samples/sec   Loss 7.2297   LearningRate 0.0372   Epoch: 7   Global Step: 39420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:05,144-Speed 5484.91 samples/sec   Loss 7.1553   LearningRate 0.0372   Epoch: 7   Global Step: 39430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:07,000-Speed 5520.31 samples/sec   Loss 7.1540   LearningRate 0.0372   Epoch: 7   Global Step: 39440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:48:08,870-Speed 5477.51 samples/sec   Loss 7.2544   LearningRate 0.0372   Epoch: 7   Global Step: 39450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:48:10,720-Speed 5541.27 samples/sec   Loss 7.1635   LearningRate 0.0372   Epoch: 7   Global Step: 39460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:48:12,555-Speed 5582.95 samples/sec   Loss 7.2813   LearningRate 0.0372   Epoch: 7   Global Step: 39470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:48:14,409-Speed 5525.95 samples/sec   Loss 7.1896   LearningRate 0.0372   Epoch: 7   Global Step: 39480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:48:16,248-Speed 5571.63 samples/sec   Loss 7.1857   LearningRate 0.0372   Epoch: 7   Global Step: 39490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:48:18,086-Speed 5574.44 samples/sec   Loss 7.3030   LearningRate 0.0372   Epoch: 7   Global Step: 39500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:48:19,939-Speed 5529.27 samples/sec   Loss 7.3020   LearningRate 0.0371   Epoch: 7   Global Step: 39510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:48:21,794-Speed 5523.70 samples/sec   Loss 7.0638   LearningRate 0.0371   Epoch: 7   Global Step: 39520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:48:23,636-Speed 5560.26 samples/sec   Loss 7.1983   LearningRate 0.0371   Epoch: 7   Global Step: 39530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:48:25,490-Speed 5526.33 samples/sec   Loss 7.0780   LearningRate 0.0371   Epoch: 7   Global Step: 39540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:27,330-Speed 5569.68 samples/sec   Loss 7.2688   LearningRate 0.0371   Epoch: 7   Global Step: 39550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:29,204-Speed 5465.70 samples/sec   Loss 7.3026   LearningRate 0.0371   Epoch: 7   Global Step: 39560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:31,038-Speed 5587.99 samples/sec   Loss 7.2399   LearningRate 0.0371   Epoch: 7   Global Step: 39570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:32,873-Speed 5582.16 samples/sec   Loss 7.1582   LearningRate 0.0371   Epoch: 7   Global Step: 39580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:34,708-Speed 5582.17 samples/sec   Loss 7.1583   LearningRate 0.0370   Epoch: 7   Global Step: 39590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:36,547-Speed 5572.34 samples/sec   Loss 7.1304   LearningRate 0.0370   Epoch: 7   Global Step: 39600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:38,426-Speed 5450.26 samples/sec   Loss 7.1254   LearningRate 0.0370   Epoch: 7   Global Step: 39610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:40,290-Speed 5496.66 samples/sec   Loss 7.2501   LearningRate 0.0370   Epoch: 7   Global Step: 39620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:42,171-Speed 5448.42 samples/sec   Loss 7.1059   LearningRate 0.0370   Epoch: 7   Global Step: 39630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:44,016-Speed 5550.39 samples/sec   Loss 7.1074   LearningRate 0.0370   Epoch: 7   Global Step: 39640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:48:45,886-Speed 5478.74 samples/sec   Loss 7.1305   LearningRate 0.0370   Epoch: 7   Global Step: 39650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:48:47,746-Speed 5509.24 samples/sec   Loss 7.1140   LearningRate 0.0370   Epoch: 7   Global Step: 39660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:48:49,614-Speed 5483.24 samples/sec   Loss 7.2545   LearningRate 0.0369   Epoch: 7   Global Step: 39670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:48:51,454-Speed 5572.14 samples/sec   Loss 7.2272   LearningRate 0.0369   Epoch: 7   Global Step: 39680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:48:53,312-Speed 5511.41 samples/sec   Loss 7.0514   LearningRate 0.0369   Epoch: 7   Global Step: 39690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:48:55,148-Speed 5582.82 samples/sec   Loss 7.2392   LearningRate 0.0369   Epoch: 7   Global Step: 39700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:56,988-Speed 5568.91 samples/sec   Loss 7.0384   LearningRate 0.0369   Epoch: 7   Global Step: 39710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:48:58,851-Speed 5500.09 samples/sec   Loss 7.1776   LearningRate 0.0369   Epoch: 7   Global Step: 39720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:00,704-Speed 5527.57 samples/sec   Loss 7.1014   LearningRate 0.0369   Epoch: 7   Global Step: 39730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:02,578-Speed 5469.42 samples/sec   Loss 7.1293   LearningRate 0.0369   Epoch: 7   Global Step: 39740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:04,419-Speed 5563.74 samples/sec   Loss 7.1175   LearningRate 0.0369   Epoch: 7   Global Step: 39750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:06,309-Speed 5422.01 samples/sec   Loss 7.2100   LearningRate 0.0368   Epoch: 7   Global Step: 39760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:08,155-Speed 5548.80 samples/sec   Loss 7.2767   LearningRate 0.0368   Epoch: 7   Global Step: 39770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:10,028-Speed 5472.64 samples/sec   Loss 7.0829   LearningRate 0.0368   Epoch: 7   Global Step: 39780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:11,882-Speed 5525.42 samples/sec   Loss 7.1318   LearningRate 0.0368   Epoch: 7   Global Step: 39790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:13,750-Speed 5483.77 samples/sec   Loss 7.2383   LearningRate 0.0368   Epoch: 7   Global Step: 39800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:49:15,597-Speed 5547.82 samples/sec   Loss 7.1660   LearningRate 0.0368   Epoch: 7   Global Step: 39810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:49:17,492-Speed 5407.52 samples/sec   Loss 7.2523   LearningRate 0.0368   Epoch: 7   Global Step: 39820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:49:19,328-Speed 5578.41 samples/sec   Loss 7.0912   LearningRate 0.0368   Epoch: 7   Global Step: 39830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:49:21,186-Speed 5513.95 samples/sec   Loss 7.1846   LearningRate 0.0367   Epoch: 7   Global Step: 39840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:23,032-Speed 5551.27 samples/sec   Loss 6.9945   LearningRate 0.0367   Epoch: 7   Global Step: 39850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:24,921-Speed 5426.07 samples/sec   Loss 7.1986   LearningRate 0.0367   Epoch: 7   Global Step: 39860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:26,770-Speed 5538.37 samples/sec   Loss 7.0925   LearningRate 0.0367   Epoch: 7   Global Step: 39870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:28,637-Speed 5486.21 samples/sec   Loss 7.0917   LearningRate 0.0367   Epoch: 7   Global Step: 39880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:30,503-Speed 5494.07 samples/sec   Loss 7.2112   LearningRate 0.0367   Epoch: 7   Global Step: 39890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:32,383-Speed 5448.19 samples/sec   Loss 7.1259   LearningRate 0.0367   Epoch: 7   Global Step: 39900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:34,231-Speed 5543.18 samples/sec   Loss 6.9011   LearningRate 0.0367   Epoch: 7   Global Step: 39910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:36,109-Speed 5457.01 samples/sec   Loss 6.9648   LearningRate 0.0366   Epoch: 7   Global Step: 39920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:37,969-Speed 5508.05 samples/sec   Loss 7.2155   LearningRate 0.0366   Epoch: 7   Global Step: 39930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:39,858-Speed 5424.47 samples/sec   Loss 7.0926   LearningRate 0.0366   Epoch: 7   Global Step: 39940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:49:41,700-Speed 5561.39 samples/sec   Loss 7.2699   LearningRate 0.0366   Epoch: 7   Global Step: 39950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:49:43,568-Speed 5485.79 samples/sec   Loss 7.1662   LearningRate 0.0366   Epoch: 7   Global Step: 39960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:49:45,427-Speed 5509.95 samples/sec   Loss 6.9965   LearningRate 0.0366   Epoch: 7   Global Step: 39970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:49:47,293-Speed 5492.36 samples/sec   Loss 7.1114   LearningRate 0.0366   Epoch: 7   Global Step: 39980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:49:49,133-Speed 5565.34 samples/sec   Loss 7.2556   LearningRate 0.0366   Epoch: 7   Global Step: 39990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:49:51,012-Speed 5458.44 samples/sec   Loss 7.1388   LearningRate 0.0366   Epoch: 7   Global Step: 40000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:50:18,457-[lfw][40000]XNorm: 21.711813
Training: 2022-04-11 12:50:18,458-[lfw][40000]Accuracy-Flip: 0.99767+-0.00271
Training: 2022-04-11 12:50:18,458-[lfw][40000]Accuracy-Highest: 0.99767
Training: 2022-04-11 12:50:49,794-[cfp_fp][40000]XNorm: 19.317273
Training: 2022-04-11 12:50:49,794-[cfp_fp][40000]Accuracy-Flip: 0.96414+-0.01160
Training: 2022-04-11 12:50:49,795-[cfp_fp][40000]Accuracy-Highest: 0.96414
Training: 2022-04-11 12:51:16,982-[agedb_30][40000]XNorm: 21.591740
Training: 2022-04-11 12:51:16,983-[agedb_30][40000]Accuracy-Flip: 0.97567+-0.00602
Training: 2022-04-11 12:51:16,983-[agedb_30][40000]Accuracy-Highest: 0.97567
Training: 2022-04-11 12:51:18,854-Speed 116.57 samples/sec   Loss 7.1806   LearningRate 0.0365   Epoch: 7   Global Step: 40010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:51:20,683-Speed 5600.81 samples/sec   Loss 7.0874   LearningRate 0.0365   Epoch: 7   Global Step: 40020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:51:22,540-Speed 5517.48 samples/sec   Loss 7.1300   LearningRate 0.0365   Epoch: 7   Global Step: 40030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:51:24,372-Speed 5593.32 samples/sec   Loss 7.1171   LearningRate 0.0365   Epoch: 7   Global Step: 40040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:51:26,223-Speed 5533.34 samples/sec   Loss 7.1693   LearningRate 0.0365   Epoch: 7   Global Step: 40050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:51:28,062-Speed 5571.03 samples/sec   Loss 7.1524   LearningRate 0.0365   Epoch: 7   Global Step: 40060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:51:29,961-Speed 5397.03 samples/sec   Loss 7.0678   LearningRate 0.0365   Epoch: 7   Global Step: 40070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:51:31,787-Speed 5608.45 samples/sec   Loss 7.0991   LearningRate 0.0365   Epoch: 7   Global Step: 40080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:51:33,671-Speed 5440.00 samples/sec   Loss 7.1324   LearningRate 0.0364   Epoch: 7   Global Step: 40090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:51:35,519-Speed 5543.75 samples/sec   Loss 7.0850   LearningRate 0.0364   Epoch: 7   Global Step: 40100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:51:37,342-Speed 5619.68 samples/sec   Loss 7.0926   LearningRate 0.0364   Epoch: 7   Global Step: 40110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:51:39,184-Speed 5562.31 samples/sec   Loss 7.2416   LearningRate 0.0364   Epoch: 7   Global Step: 40120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:51:41,031-Speed 5545.47 samples/sec   Loss 7.2158   LearningRate 0.0364   Epoch: 7   Global Step: 40130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:51:42,936-Speed 5381.47 samples/sec   Loss 7.1444   LearningRate 0.0364   Epoch: 7   Global Step: 40140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:51:44,774-Speed 5573.32 samples/sec   Loss 7.0095   LearningRate 0.0364   Epoch: 7   Global Step: 40150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:51:46,630-Speed 5520.02 samples/sec   Loss 7.1711   LearningRate 0.0364   Epoch: 7   Global Step: 40160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:51:48,479-Speed 5541.40 samples/sec   Loss 7.1747   LearningRate 0.0363   Epoch: 7   Global Step: 40170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:51:50,338-Speed 5509.39 samples/sec   Loss 7.1285   LearningRate 0.0363   Epoch: 7   Global Step: 40180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:51:52,191-Speed 5531.20 samples/sec   Loss 7.0384   LearningRate 0.0363   Epoch: 7   Global Step: 40190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:51:54,057-Speed 5489.60 samples/sec   Loss 7.0183   LearningRate 0.0363   Epoch: 7   Global Step: 40200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:51:55,898-Speed 5564.98 samples/sec   Loss 7.0969   LearningRate 0.0363   Epoch: 7   Global Step: 40210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:51:57,779-Speed 5449.46 samples/sec   Loss 7.0555   LearningRate 0.0363   Epoch: 7   Global Step: 40220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:51:59,617-Speed 5572.47 samples/sec   Loss 6.9275   LearningRate 0.0363   Epoch: 7   Global Step: 40230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:01,509-Speed 5414.25 samples/sec   Loss 7.1170   LearningRate 0.0363   Epoch: 7   Global Step: 40240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:03,383-Speed 5468.58 samples/sec   Loss 7.0848   LearningRate 0.0363   Epoch: 7   Global Step: 40250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:05,213-Speed 5599.34 samples/sec   Loss 7.0953   LearningRate 0.0362   Epoch: 7   Global Step: 40260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:07,058-Speed 5552.00 samples/sec   Loss 7.0553   LearningRate 0.0362   Epoch: 7   Global Step: 40270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:08,891-Speed 5591.34 samples/sec   Loss 7.1141   LearningRate 0.0362   Epoch: 7   Global Step: 40280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:10,725-Speed 5582.88 samples/sec   Loss 7.0501   LearningRate 0.0362   Epoch: 7   Global Step: 40290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:12,558-Speed 5589.27 samples/sec   Loss 7.0091   LearningRate 0.0362   Epoch: 7   Global Step: 40300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:14,410-Speed 5532.96 samples/sec   Loss 7.0876   LearningRate 0.0362   Epoch: 7   Global Step: 40310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:16,292-Speed 5443.82 samples/sec   Loss 7.0404   LearningRate 0.0362   Epoch: 7   Global Step: 40320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:52:18,147-Speed 5521.63 samples/sec   Loss 7.0027   LearningRate 0.0362   Epoch: 7   Global Step: 40330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:20,015-Speed 5485.81 samples/sec   Loss 7.1544   LearningRate 0.0361   Epoch: 7   Global Step: 40340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:21,849-Speed 5585.27 samples/sec   Loss 7.1984   LearningRate 0.0361   Epoch: 7   Global Step: 40350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:23,731-Speed 5444.93 samples/sec   Loss 6.8776   LearningRate 0.0361   Epoch: 7   Global Step: 40360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:25,569-Speed 5574.73 samples/sec   Loss 7.0660   LearningRate 0.0361   Epoch: 7   Global Step: 40370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:27,425-Speed 5520.74 samples/sec   Loss 7.0552   LearningRate 0.0361   Epoch: 7   Global Step: 40380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:29,259-Speed 5587.87 samples/sec   Loss 7.1136   LearningRate 0.0361   Epoch: 7   Global Step: 40390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:31,089-Speed 5594.85 samples/sec   Loss 6.8979   LearningRate 0.0361   Epoch: 7   Global Step: 40400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:32,969-Speed 5452.07 samples/sec   Loss 7.3568   LearningRate 0.0361   Epoch: 7   Global Step: 40410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:34,829-Speed 5506.32 samples/sec   Loss 7.0255   LearningRate 0.0361   Epoch: 7   Global Step: 40420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:36,680-Speed 5537.05 samples/sec   Loss 7.1005   LearningRate 0.0360   Epoch: 7   Global Step: 40430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:52:38,535-Speed 5522.79 samples/sec   Loss 7.0519   LearningRate 0.0360   Epoch: 7   Global Step: 40440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:40,412-Speed 5458.85 samples/sec   Loss 6.9119   LearningRate 0.0360   Epoch: 7   Global Step: 40450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:42,307-Speed 5406.53 samples/sec   Loss 7.1936   LearningRate 0.0360   Epoch: 7   Global Step: 40460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:53,606-Speed 906.37 samples/sec   Loss 6.4994   LearningRate 0.0360   Epoch: 8   Global Step: 40470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:55,490-Speed 5440.68 samples/sec   Loss 6.1681   LearningRate 0.0360   Epoch: 8   Global Step: 40480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:57,368-Speed 5455.68 samples/sec   Loss 6.1962   LearningRate 0.0360   Epoch: 8   Global Step: 40490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:52:59,237-Speed 5481.64 samples/sec   Loss 6.2690   LearningRate 0.0360   Epoch: 8   Global Step: 40500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:01,367-Speed 4809.46 samples/sec   Loss 6.1490   LearningRate 0.0359   Epoch: 8   Global Step: 40510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:03,230-Speed 5500.03 samples/sec   Loss 6.0908   LearningRate 0.0359   Epoch: 8   Global Step: 40520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:05,065-Speed 5582.34 samples/sec   Loss 6.3503   LearningRate 0.0359   Epoch: 8   Global Step: 40530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:06,928-Speed 5499.92 samples/sec   Loss 6.2989   LearningRate 0.0359   Epoch: 8   Global Step: 40540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:53:08,766-Speed 5573.04 samples/sec   Loss 6.2958   LearningRate 0.0359   Epoch: 8   Global Step: 40550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:10,620-Speed 5526.42 samples/sec   Loss 6.2625   LearningRate 0.0359   Epoch: 8   Global Step: 40560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:12,470-Speed 5536.46 samples/sec   Loss 6.3425   LearningRate 0.0359   Epoch: 8   Global Step: 40570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:14,317-Speed 5547.55 samples/sec   Loss 6.3070   LearningRate 0.0359   Epoch: 8   Global Step: 40580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:16,173-Speed 5520.66 samples/sec   Loss 6.3457   LearningRate 0.0359   Epoch: 8   Global Step: 40590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:18,009-Speed 5581.80 samples/sec   Loss 6.4110   LearningRate 0.0358   Epoch: 8   Global Step: 40600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:19,843-Speed 5586.52 samples/sec   Loss 6.2888   LearningRate 0.0358   Epoch: 8   Global Step: 40610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:21,714-Speed 5476.03 samples/sec   Loss 6.5423   LearningRate 0.0358   Epoch: 8   Global Step: 40620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:23,557-Speed 5558.14 samples/sec   Loss 6.3904   LearningRate 0.0358   Epoch: 8   Global Step: 40630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:25,405-Speed 5544.67 samples/sec   Loss 6.4062   LearningRate 0.0358   Epoch: 8   Global Step: 40640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:27,295-Speed 5421.95 samples/sec   Loss 6.4455   LearningRate 0.0358   Epoch: 8   Global Step: 40650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:29,148-Speed 5527.60 samples/sec   Loss 6.4658   LearningRate 0.0358   Epoch: 8   Global Step: 40660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:31,000-Speed 5532.46 samples/sec   Loss 6.5028   LearningRate 0.0358   Epoch: 8   Global Step: 40670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:32,840-Speed 5569.10 samples/sec   Loss 6.4932   LearningRate 0.0357   Epoch: 8   Global Step: 40680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:34,697-Speed 5515.51 samples/sec   Loss 6.2801   LearningRate 0.0357   Epoch: 8   Global Step: 40690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:36,524-Speed 5609.03 samples/sec   Loss 6.5887   LearningRate 0.0357   Epoch: 8   Global Step: 40700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:38,380-Speed 5520.91 samples/sec   Loss 6.6162   LearningRate 0.0357   Epoch: 8   Global Step: 40710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:40,218-Speed 5574.10 samples/sec   Loss 6.4475   LearningRate 0.0357   Epoch: 8   Global Step: 40720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:42,071-Speed 5528.77 samples/sec   Loss 6.3871   LearningRate 0.0357   Epoch: 8   Global Step: 40730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:43,911-Speed 5568.04 samples/sec   Loss 6.4052   LearningRate 0.0357   Epoch: 8   Global Step: 40740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:45,740-Speed 5602.40 samples/sec   Loss 6.4139   LearningRate 0.0357   Epoch: 8   Global Step: 40750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:53:47,581-Speed 5563.66 samples/sec   Loss 6.5116   LearningRate 0.0356   Epoch: 8   Global Step: 40760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:53:49,426-Speed 5554.91 samples/sec   Loss 6.6111   LearningRate 0.0356   Epoch: 8   Global Step: 40770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:51,326-Speed 5389.06 samples/sec   Loss 6.3383   LearningRate 0.0356   Epoch: 8   Global Step: 40780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:53,161-Speed 5584.60 samples/sec   Loss 6.5111   LearningRate 0.0356   Epoch: 8   Global Step: 40790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:54,992-Speed 5595.58 samples/sec   Loss 6.4119   LearningRate 0.0356   Epoch: 8   Global Step: 40800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:56,831-Speed 5571.43 samples/sec   Loss 6.3418   LearningRate 0.0356   Epoch: 8   Global Step: 40810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:53:58,669-Speed 5572.28 samples/sec   Loss 6.5225   LearningRate 0.0356   Epoch: 8   Global Step: 40820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:00,514-Speed 5555.04 samples/sec   Loss 6.4528   LearningRate 0.0356   Epoch: 8   Global Step: 40830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:02,388-Speed 5467.21 samples/sec   Loss 6.5992   LearningRate 0.0356   Epoch: 8   Global Step: 40840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:04,250-Speed 5503.54 samples/sec   Loss 6.5112   LearningRate 0.0355   Epoch: 8   Global Step: 40850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:06,133-Speed 5438.53 samples/sec   Loss 6.4350   LearningRate 0.0355   Epoch: 8   Global Step: 40860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:07,965-Speed 5595.23 samples/sec   Loss 6.5268   LearningRate 0.0355   Epoch: 8   Global Step: 40870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:09,801-Speed 5581.42 samples/sec   Loss 6.6316   LearningRate 0.0355   Epoch: 8   Global Step: 40880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:11,668-Speed 5486.77 samples/sec   Loss 6.4848   LearningRate 0.0355   Epoch: 8   Global Step: 40890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:13,511-Speed 5559.98 samples/sec   Loss 6.5970   LearningRate 0.0355   Epoch: 8   Global Step: 40900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:15,360-Speed 5542.83 samples/sec   Loss 6.6499   LearningRate 0.0355   Epoch: 8   Global Step: 40910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:17,214-Speed 5525.76 samples/sec   Loss 6.5167   LearningRate 0.0355   Epoch: 8   Global Step: 40920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:19,049-Speed 5585.02 samples/sec   Loss 6.5807   LearningRate 0.0354   Epoch: 8   Global Step: 40930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:20,883-Speed 5585.06 samples/sec   Loss 6.5908   LearningRate 0.0354   Epoch: 8   Global Step: 40940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:22,858-Speed 5186.84 samples/sec   Loss 6.5316   LearningRate 0.0354   Epoch: 8   Global Step: 40950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:24,716-Speed 5517.26 samples/sec   Loss 6.6614   LearningRate 0.0354   Epoch: 8   Global Step: 40960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:26,590-Speed 5465.86 samples/sec   Loss 6.5127   LearningRate 0.0354   Epoch: 8   Global Step: 40970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:28,446-Speed 5521.11 samples/sec   Loss 6.6580   LearningRate 0.0354   Epoch: 8   Global Step: 40980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:30,317-Speed 5473.13 samples/sec   Loss 6.6972   LearningRate 0.0354   Epoch: 8   Global Step: 40990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:32,153-Speed 5581.67 samples/sec   Loss 6.3254   LearningRate 0.0354   Epoch: 8   Global Step: 41000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:34,039-Speed 5433.17 samples/sec   Loss 6.7181   LearningRate 0.0354   Epoch: 8   Global Step: 41010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:35,910-Speed 5474.67 samples/sec   Loss 6.5773   LearningRate 0.0353   Epoch: 8   Global Step: 41020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:37,756-Speed 5551.41 samples/sec   Loss 6.4268   LearningRate 0.0353   Epoch: 8   Global Step: 41030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:39,604-Speed 5544.35 samples/sec   Loss 6.5583   LearningRate 0.0353   Epoch: 8   Global Step: 41040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:41,444-Speed 5570.56 samples/sec   Loss 6.4667   LearningRate 0.0353   Epoch: 8   Global Step: 41050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:43,287-Speed 5561.12 samples/sec   Loss 6.6930   LearningRate 0.0353   Epoch: 8   Global Step: 41060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:45,143-Speed 5518.84 samples/sec   Loss 6.5914   LearningRate 0.0353   Epoch: 8   Global Step: 41070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:54:47,009-Speed 5492.08 samples/sec   Loss 6.5599   LearningRate 0.0353   Epoch: 8   Global Step: 41080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:48,844-Speed 5585.23 samples/sec   Loss 6.7914   LearningRate 0.0353   Epoch: 8   Global Step: 41090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:50,683-Speed 5570.15 samples/sec   Loss 6.6186   LearningRate 0.0352   Epoch: 8   Global Step: 41100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:52,535-Speed 5533.72 samples/sec   Loss 6.7431   LearningRate 0.0352   Epoch: 8   Global Step: 41110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:54,398-Speed 5499.60 samples/sec   Loss 6.5158   LearningRate 0.0352   Epoch: 8   Global Step: 41120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:56,243-Speed 5552.44 samples/sec   Loss 6.6296   LearningRate 0.0352   Epoch: 8   Global Step: 41130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:58,077-Speed 5584.63 samples/sec   Loss 6.7111   LearningRate 0.0352   Epoch: 8   Global Step: 41140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:54:59,929-Speed 5530.54 samples/sec   Loss 6.5551   LearningRate 0.0352   Epoch: 8   Global Step: 41150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:01,778-Speed 5541.58 samples/sec   Loss 6.5893   LearningRate 0.0352   Epoch: 8   Global Step: 41160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:03,678-Speed 5392.22 samples/sec   Loss 6.5234   LearningRate 0.0352   Epoch: 8   Global Step: 41170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:05,519-Speed 5564.83 samples/sec   Loss 6.7267   LearningRate 0.0352   Epoch: 8   Global Step: 41180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:55:07,350-Speed 5596.19 samples/sec   Loss 6.7300   LearningRate 0.0351   Epoch: 8   Global Step: 41190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:09,195-Speed 5554.20 samples/sec   Loss 6.7771   LearningRate 0.0351   Epoch: 8   Global Step: 41200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:11,097-Speed 5385.99 samples/sec   Loss 6.5884   LearningRate 0.0351   Epoch: 8   Global Step: 41210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:12,966-Speed 5482.71 samples/sec   Loss 6.6635   LearningRate 0.0351   Epoch: 8   Global Step: 41220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:14,846-Speed 5449.48 samples/sec   Loss 6.6365   LearningRate 0.0351   Epoch: 8   Global Step: 41230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:16,680-Speed 5586.45 samples/sec   Loss 6.6811   LearningRate 0.0351   Epoch: 8   Global Step: 41240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:18,583-Speed 5384.28 samples/sec   Loss 6.6635   LearningRate 0.0351   Epoch: 8   Global Step: 41250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:20,445-Speed 5501.99 samples/sec   Loss 6.8160   LearningRate 0.0351   Epoch: 8   Global Step: 41260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:22,278-Speed 5587.87 samples/sec   Loss 6.6042   LearningRate 0.0351   Epoch: 8   Global Step: 41270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:24,154-Speed 5462.08 samples/sec   Loss 6.6654   LearningRate 0.0350   Epoch: 8   Global Step: 41280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:25,998-Speed 5557.55 samples/sec   Loss 6.7233   LearningRate 0.0350   Epoch: 8   Global Step: 41290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:55:27,857-Speed 5510.70 samples/sec   Loss 6.8340   LearningRate 0.0350   Epoch: 8   Global Step: 41300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:55:29,723-Speed 5489.75 samples/sec   Loss 6.7550   LearningRate 0.0350   Epoch: 8   Global Step: 41310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:55:31,566-Speed 5558.42 samples/sec   Loss 6.8216   LearningRate 0.0350   Epoch: 8   Global Step: 41320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:55:33,398-Speed 5594.26 samples/sec   Loss 6.7038   LearningRate 0.0350   Epoch: 8   Global Step: 41330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:35,254-Speed 5519.07 samples/sec   Loss 6.7546   LearningRate 0.0350   Epoch: 8   Global Step: 41340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:37,115-Speed 5506.11 samples/sec   Loss 6.7979   LearningRate 0.0350   Epoch: 8   Global Step: 41350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:38,956-Speed 5565.87 samples/sec   Loss 6.7172   LearningRate 0.0349   Epoch: 8   Global Step: 41360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:55:40,803-Speed 5545.22 samples/sec   Loss 6.7757   LearningRate 0.0349   Epoch: 8   Global Step: 41370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:55:42,670-Speed 5489.52 samples/sec   Loss 6.6520   LearningRate 0.0349   Epoch: 8   Global Step: 41380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:55:44,512-Speed 5563.78 samples/sec   Loss 6.7837   LearningRate 0.0349   Epoch: 8   Global Step: 41390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:55:46,355-Speed 5558.30 samples/sec   Loss 6.7397   LearningRate 0.0349   Epoch: 8   Global Step: 41400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:55:48,224-Speed 5484.70 samples/sec   Loss 6.8185   LearningRate 0.0349   Epoch: 8   Global Step: 41410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:55:50,068-Speed 5554.26 samples/sec   Loss 6.7104   LearningRate 0.0349   Epoch: 8   Global Step: 41420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:55:51,908-Speed 5567.32 samples/sec   Loss 6.7141   LearningRate 0.0349   Epoch: 8   Global Step: 41430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:55:53,751-Speed 5561.57 samples/sec   Loss 6.8510   LearningRate 0.0349   Epoch: 8   Global Step: 41440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:55:55,597-Speed 5551.49 samples/sec   Loss 6.8126   LearningRate 0.0348   Epoch: 8   Global Step: 41450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:55:57,433-Speed 5580.41 samples/sec   Loss 6.6420   LearningRate 0.0348   Epoch: 8   Global Step: 41460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:55:59,295-Speed 5503.28 samples/sec   Loss 6.6847   LearningRate 0.0348   Epoch: 8   Global Step: 41470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:01,140-Speed 5553.21 samples/sec   Loss 6.7447   LearningRate 0.0348   Epoch: 8   Global Step: 41480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:03,031-Speed 5418.75 samples/sec   Loss 6.5563   LearningRate 0.0348   Epoch: 8   Global Step: 41490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:04,887-Speed 5519.98 samples/sec   Loss 6.8907   LearningRate 0.0348   Epoch: 8   Global Step: 41500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:06,755-Speed 5485.15 samples/sec   Loss 6.7297   LearningRate 0.0348   Epoch: 8   Global Step: 41510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:08,588-Speed 5591.66 samples/sec   Loss 6.7965   LearningRate 0.0348   Epoch: 8   Global Step: 41520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:10,453-Speed 5491.72 samples/sec   Loss 6.7015   LearningRate 0.0347   Epoch: 8   Global Step: 41530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:12,298-Speed 5554.34 samples/sec   Loss 6.7559   LearningRate 0.0347   Epoch: 8   Global Step: 41540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:14,139-Speed 5563.92 samples/sec   Loss 6.7466   LearningRate 0.0347   Epoch: 8   Global Step: 41550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:15,971-Speed 5594.55 samples/sec   Loss 6.8363   LearningRate 0.0347   Epoch: 8   Global Step: 41560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:17,824-Speed 5527.58 samples/sec   Loss 6.5704   LearningRate 0.0347   Epoch: 8   Global Step: 41570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:19,676-Speed 5532.05 samples/sec   Loss 6.6520   LearningRate 0.0347   Epoch: 8   Global Step: 41580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:21,549-Speed 5471.88 samples/sec   Loss 6.6998   LearningRate 0.0347   Epoch: 8   Global Step: 41590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:23,393-Speed 5556.04 samples/sec   Loss 6.6978   LearningRate 0.0347   Epoch: 8   Global Step: 41600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:25,262-Speed 5481.10 samples/sec   Loss 6.6541   LearningRate 0.0347   Epoch: 8   Global Step: 41610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:27,128-Speed 5491.85 samples/sec   Loss 6.9014   LearningRate 0.0346   Epoch: 8   Global Step: 41620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:29,011-Speed 5442.64 samples/sec   Loss 6.8556   LearningRate 0.0346   Epoch: 8   Global Step: 41630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:30,866-Speed 5522.85 samples/sec   Loss 6.7471   LearningRate 0.0346   Epoch: 8   Global Step: 41640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:32,726-Speed 5507.39 samples/sec   Loss 6.7684   LearningRate 0.0346   Epoch: 8   Global Step: 41650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:34,562-Speed 5579.89 samples/sec   Loss 6.8252   LearningRate 0.0346   Epoch: 8   Global Step: 41660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:56:36,422-Speed 5508.02 samples/sec   Loss 6.7134   LearningRate 0.0346   Epoch: 8   Global Step: 41670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:56:38,270-Speed 5544.40 samples/sec   Loss 6.8432   LearningRate 0.0346   Epoch: 8   Global Step: 41680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:56:40,107-Speed 5577.11 samples/sec   Loss 6.6112   LearningRate 0.0346   Epoch: 8   Global Step: 41690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:41,974-Speed 5488.52 samples/sec   Loss 6.6571   LearningRate 0.0345   Epoch: 8   Global Step: 41700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:43,808-Speed 5585.54 samples/sec   Loss 6.7959   LearningRate 0.0345   Epoch: 8   Global Step: 41710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:45,657-Speed 5540.82 samples/sec   Loss 6.9104   LearningRate 0.0345   Epoch: 8   Global Step: 41720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:47,491-Speed 5587.60 samples/sec   Loss 6.7875   LearningRate 0.0345   Epoch: 8   Global Step: 41730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:49,328-Speed 5576.29 samples/sec   Loss 6.8621   LearningRate 0.0345   Epoch: 8   Global Step: 41740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:51,162-Speed 5587.83 samples/sec   Loss 6.8637   LearningRate 0.0345   Epoch: 8   Global Step: 41750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:52,999-Speed 5577.41 samples/sec   Loss 6.6582   LearningRate 0.0345   Epoch: 8   Global Step: 41760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:54,851-Speed 5529.23 samples/sec   Loss 7.0618   LearningRate 0.0345   Epoch: 8   Global Step: 41770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:56,695-Speed 5555.55 samples/sec   Loss 6.7641   LearningRate 0.0345   Epoch: 8   Global Step: 41780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:56:58,544-Speed 5543.77 samples/sec   Loss 6.8819   LearningRate 0.0344   Epoch: 8   Global Step: 41790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:57:00,378-Speed 5586.39 samples/sec   Loss 6.7519   LearningRate 0.0344   Epoch: 8   Global Step: 41800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:57:02,231-Speed 5527.39 samples/sec   Loss 6.8178   LearningRate 0.0344   Epoch: 8   Global Step: 41810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:57:04,086-Speed 5521.78 samples/sec   Loss 6.7201   LearningRate 0.0344   Epoch: 8   Global Step: 41820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:57:05,930-Speed 5557.04 samples/sec   Loss 6.7628   LearningRate 0.0344   Epoch: 8   Global Step: 41830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:57:07,761-Speed 5593.22 samples/sec   Loss 6.8591   LearningRate 0.0344   Epoch: 8   Global Step: 41840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:57:09,591-Speed 5597.89 samples/sec   Loss 6.6987   LearningRate 0.0344   Epoch: 8   Global Step: 41850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:57:11,418-Speed 5606.05 samples/sec   Loss 6.6992   LearningRate 0.0344   Epoch: 8   Global Step: 41860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:13,262-Speed 5556.72 samples/sec   Loss 6.8794   LearningRate 0.0344   Epoch: 8   Global Step: 41870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:15,099-Speed 5576.52 samples/sec   Loss 6.6153   LearningRate 0.0343   Epoch: 8   Global Step: 41880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:16,935-Speed 5581.78 samples/sec   Loss 6.6003   LearningRate 0.0343   Epoch: 8   Global Step: 41890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:18,770-Speed 5582.96 samples/sec   Loss 6.7566   LearningRate 0.0343   Epoch: 8   Global Step: 41900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:20,605-Speed 5582.69 samples/sec   Loss 6.8044   LearningRate 0.0343   Epoch: 8   Global Step: 41910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:22,440-Speed 5579.62 samples/sec   Loss 6.6694   LearningRate 0.0343   Epoch: 8   Global Step: 41920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:24,273-Speed 5591.30 samples/sec   Loss 6.7956   LearningRate 0.0343   Epoch: 8   Global Step: 41930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:26,108-Speed 5580.79 samples/sec   Loss 6.8718   LearningRate 0.0343   Epoch: 8   Global Step: 41940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:27,947-Speed 5571.49 samples/sec   Loss 6.7435   LearningRate 0.0343   Epoch: 8   Global Step: 41950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:29,793-Speed 5550.60 samples/sec   Loss 6.8566   LearningRate 0.0342   Epoch: 8   Global Step: 41960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:57:31,627-Speed 5582.45 samples/sec   Loss 6.8662   LearningRate 0.0342   Epoch: 8   Global Step: 41970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:33,478-Speed 5536.45 samples/sec   Loss 6.9062   LearningRate 0.0342   Epoch: 8   Global Step: 41980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:35,313-Speed 5582.95 samples/sec   Loss 6.9330   LearningRate 0.0342   Epoch: 8   Global Step: 41990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:57:37,153-Speed 5568.89 samples/sec   Loss 6.8319   LearningRate 0.0342   Epoch: 8   Global Step: 42000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:58:04,422-[lfw][42000]XNorm: 23.336758
Training: 2022-04-11 12:58:04,422-[lfw][42000]Accuracy-Flip: 0.99750+-0.00227
Training: 2022-04-11 12:58:04,423-[lfw][42000]Accuracy-Highest: 0.99767
Training: 2022-04-11 12:58:36,034-[cfp_fp][42000]XNorm: 20.295787
Training: 2022-04-11 12:58:36,035-[cfp_fp][42000]Accuracy-Flip: 0.95957+-0.00998
Training: 2022-04-11 12:58:36,036-[cfp_fp][42000]Accuracy-Highest: 0.96414
Training: 2022-04-11 12:59:03,276-[agedb_30][42000]XNorm: 22.960819
Training: 2022-04-11 12:59:03,277-[agedb_30][42000]Accuracy-Flip: 0.97533+-0.00802
Training: 2022-04-11 12:59:03,277-[agedb_30][42000]Accuracy-Highest: 0.97567
Training: 2022-04-11 12:59:05,140-Speed 116.38 samples/sec   Loss 6.7342   LearningRate 0.0342   Epoch: 8   Global Step: 42010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:59:06,991-Speed 5535.02 samples/sec   Loss 6.8189   LearningRate 0.0342   Epoch: 8   Global Step: 42020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:59:08,817-Speed 5611.18 samples/sec   Loss 6.7574   LearningRate 0.0342   Epoch: 8   Global Step: 42030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:59:10,654-Speed 5576.33 samples/sec   Loss 6.7431   LearningRate 0.0342   Epoch: 8   Global Step: 42040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:59:12,491-Speed 5577.92 samples/sec   Loss 6.8638   LearningRate 0.0341   Epoch: 8   Global Step: 42050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:59:14,336-Speed 5555.76 samples/sec   Loss 6.7581   LearningRate 0.0341   Epoch: 8   Global Step: 42060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 12:59:16,172-Speed 5579.32 samples/sec   Loss 6.8437   LearningRate 0.0341   Epoch: 8   Global Step: 42070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:18,010-Speed 5572.82 samples/sec   Loss 6.8691   LearningRate 0.0341   Epoch: 8   Global Step: 42080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:19,878-Speed 5488.60 samples/sec   Loss 6.8858   LearningRate 0.0341   Epoch: 8   Global Step: 42090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:21,729-Speed 5533.76 samples/sec   Loss 6.6954   LearningRate 0.0341   Epoch: 8   Global Step: 42100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:23,576-Speed 5546.01 samples/sec   Loss 6.8480   LearningRate 0.0341   Epoch: 8   Global Step: 42110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:25,428-Speed 5532.95 samples/sec   Loss 6.7276   LearningRate 0.0341   Epoch: 8   Global Step: 42120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:27,316-Speed 5425.98 samples/sec   Loss 6.8074   LearningRate 0.0341   Epoch: 8   Global Step: 42130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:29,150-Speed 5586.97 samples/sec   Loss 6.6836   LearningRate 0.0340   Epoch: 8   Global Step: 42140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:30,992-Speed 5565.08 samples/sec   Loss 6.7984   LearningRate 0.0340   Epoch: 8   Global Step: 42150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:32,822-Speed 5595.68 samples/sec   Loss 6.7559   LearningRate 0.0340   Epoch: 8   Global Step: 42160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:34,668-Speed 5553.50 samples/sec   Loss 6.6072   LearningRate 0.0340   Epoch: 8   Global Step: 42170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:59:36,506-Speed 5572.54 samples/sec   Loss 6.6640   LearningRate 0.0340   Epoch: 8   Global Step: 42180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:59:38,367-Speed 5505.94 samples/sec   Loss 6.7541   LearningRate 0.0340   Epoch: 8   Global Step: 42190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:59:40,201-Speed 5588.06 samples/sec   Loss 6.7303   LearningRate 0.0340   Epoch: 8   Global Step: 42200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 12:59:42,039-Speed 5573.35 samples/sec   Loss 6.8249   LearningRate 0.0340   Epoch: 8   Global Step: 42210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:43,881-Speed 5563.18 samples/sec   Loss 6.6799   LearningRate 0.0339   Epoch: 8   Global Step: 42220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:45,716-Speed 5580.69 samples/sec   Loss 6.8450   LearningRate 0.0339   Epoch: 8   Global Step: 42230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:47,569-Speed 5527.90 samples/sec   Loss 6.7494   LearningRate 0.0339   Epoch: 8   Global Step: 42240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:49,418-Speed 5543.96 samples/sec   Loss 6.7784   LearningRate 0.0339   Epoch: 8   Global Step: 42250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:51,276-Speed 5514.10 samples/sec   Loss 6.7476   LearningRate 0.0339   Epoch: 8   Global Step: 42260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:53,110-Speed 5585.39 samples/sec   Loss 6.8323   LearningRate 0.0339   Epoch: 8   Global Step: 42270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:54,941-Speed 5597.32 samples/sec   Loss 6.7838   LearningRate 0.0339   Epoch: 8   Global Step: 42280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:56,774-Speed 5588.34 samples/sec   Loss 6.8291   LearningRate 0.0339   Epoch: 8   Global Step: 42290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 12:59:58,621-Speed 5547.35 samples/sec   Loss 6.7886   LearningRate 0.0339   Epoch: 8   Global Step: 42300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:00,468-Speed 5548.21 samples/sec   Loss 6.8730   LearningRate 0.0338   Epoch: 8   Global Step: 42310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:00:02,312-Speed 5554.57 samples/sec   Loss 6.7171   LearningRate 0.0338   Epoch: 8   Global Step: 42320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:00:04,156-Speed 5558.15 samples/sec   Loss 6.7066   LearningRate 0.0338   Epoch: 8   Global Step: 42330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:05,994-Speed 5576.76 samples/sec   Loss 6.6483   LearningRate 0.0338   Epoch: 8   Global Step: 42340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:07,843-Speed 5540.52 samples/sec   Loss 6.8202   LearningRate 0.0338   Epoch: 8   Global Step: 42350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:09,703-Speed 5508.97 samples/sec   Loss 6.9380   LearningRate 0.0338   Epoch: 8   Global Step: 42360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:11,542-Speed 5570.06 samples/sec   Loss 6.9029   LearningRate 0.0338   Epoch: 8   Global Step: 42370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:13,405-Speed 5502.88 samples/sec   Loss 6.7246   LearningRate 0.0338   Epoch: 8   Global Step: 42380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:15,257-Speed 5533.90 samples/sec   Loss 6.7584   LearningRate 0.0338   Epoch: 8   Global Step: 42390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:17,121-Speed 5494.71 samples/sec   Loss 6.7873   LearningRate 0.0337   Epoch: 8   Global Step: 42400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:18,963-Speed 5565.01 samples/sec   Loss 6.8838   LearningRate 0.0337   Epoch: 8   Global Step: 42410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:20,837-Speed 5466.37 samples/sec   Loss 6.7079   LearningRate 0.0337   Epoch: 8   Global Step: 42420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:22,670-Speed 5588.26 samples/sec   Loss 6.8712   LearningRate 0.0337   Epoch: 8   Global Step: 42430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:00:24,539-Speed 5482.79 samples/sec   Loss 6.8627   LearningRate 0.0337   Epoch: 8   Global Step: 42440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:00:26,379-Speed 5567.00 samples/sec   Loss 6.8254   LearningRate 0.0337   Epoch: 8   Global Step: 42450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:28,233-Speed 5526.17 samples/sec   Loss 6.9395   LearningRate 0.0337   Epoch: 8   Global Step: 42460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:30,084-Speed 5538.35 samples/sec   Loss 6.6583   LearningRate 0.0337   Epoch: 8   Global Step: 42470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:31,929-Speed 5551.11 samples/sec   Loss 6.9545   LearningRate 0.0336   Epoch: 8   Global Step: 42480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:33,803-Speed 5502.56 samples/sec   Loss 6.7555   LearningRate 0.0336   Epoch: 8   Global Step: 42490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:35,660-Speed 5517.99 samples/sec   Loss 6.7420   LearningRate 0.0336   Epoch: 8   Global Step: 42500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:37,528-Speed 5484.02 samples/sec   Loss 6.7120   LearningRate 0.0336   Epoch: 8   Global Step: 42510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:39,410-Speed 5446.38 samples/sec   Loss 6.7066   LearningRate 0.0336   Epoch: 8   Global Step: 42520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:41,248-Speed 5574.89 samples/sec   Loss 6.8835   LearningRate 0.0336   Epoch: 8   Global Step: 42530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:43,093-Speed 5551.24 samples/sec   Loss 6.7890   LearningRate 0.0336   Epoch: 8   Global Step: 42540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:44,956-Speed 5500.86 samples/sec   Loss 6.7638   LearningRate 0.0336   Epoch: 8   Global Step: 42550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:00:46,784-Speed 5604.13 samples/sec   Loss 6.7055   LearningRate 0.0336   Epoch: 8   Global Step: 42560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:48,664-Speed 5449.57 samples/sec   Loss 6.7590   LearningRate 0.0335   Epoch: 8   Global Step: 42570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:50,504-Speed 5568.19 samples/sec   Loss 6.8193   LearningRate 0.0335   Epoch: 8   Global Step: 42580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:00:52,337-Speed 5590.89 samples/sec   Loss 6.6093   LearningRate 0.0335   Epoch: 8   Global Step: 42590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:00:54,199-Speed 5503.69 samples/sec   Loss 6.6793   LearningRate 0.0335   Epoch: 8   Global Step: 42600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:00:56,042-Speed 5557.15 samples/sec   Loss 7.0184   LearningRate 0.0335   Epoch: 8   Global Step: 42610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:00:57,878-Speed 5579.53 samples/sec   Loss 6.7545   LearningRate 0.0335   Epoch: 8   Global Step: 42620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:00:59,721-Speed 5560.24 samples/sec   Loss 6.7609   LearningRate 0.0335   Epoch: 8   Global Step: 42630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:01:01,596-Speed 5464.86 samples/sec   Loss 6.6182   LearningRate 0.0335   Epoch: 8   Global Step: 42640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:01:03,497-Speed 5388.78 samples/sec   Loss 6.7193   LearningRate 0.0335   Epoch: 8   Global Step: 42650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:01:05,338-Speed 5564.31 samples/sec   Loss 6.7362   LearningRate 0.0334   Epoch: 8   Global Step: 42660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:01:07,236-Speed 5399.36 samples/sec   Loss 6.7710   LearningRate 0.0334   Epoch: 8   Global Step: 42670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:01:09,077-Speed 5566.23 samples/sec   Loss 6.8273   LearningRate 0.0334   Epoch: 8   Global Step: 42680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:01:10,937-Speed 5507.97 samples/sec   Loss 6.7523   LearningRate 0.0334   Epoch: 8   Global Step: 42690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:12,774-Speed 5575.02 samples/sec   Loss 6.6902   LearningRate 0.0334   Epoch: 8   Global Step: 42700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:14,611-Speed 5577.09 samples/sec   Loss 6.7339   LearningRate 0.0334   Epoch: 8   Global Step: 42710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:16,480-Speed 5484.74 samples/sec   Loss 6.8472   LearningRate 0.0334   Epoch: 8   Global Step: 42720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:18,344-Speed 5495.88 samples/sec   Loss 6.8331   LearningRate 0.0334   Epoch: 8   Global Step: 42730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:20,194-Speed 5538.97 samples/sec   Loss 6.9060   LearningRate 0.0334   Epoch: 8   Global Step: 42740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:22,092-Speed 5397.42 samples/sec   Loss 6.9642   LearningRate 0.0333   Epoch: 8   Global Step: 42750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:23,928-Speed 5581.58 samples/sec   Loss 6.6711   LearningRate 0.0333   Epoch: 8   Global Step: 42760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:25,790-Speed 5502.72 samples/sec   Loss 6.8145   LearningRate 0.0333   Epoch: 8   Global Step: 42770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:27,639-Speed 5540.91 samples/sec   Loss 6.7969   LearningRate 0.0333   Epoch: 8   Global Step: 42780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:29,507-Speed 5485.43 samples/sec   Loss 6.7171   LearningRate 0.0333   Epoch: 8   Global Step: 42790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:01:31,345-Speed 5572.38 samples/sec   Loss 6.8266   LearningRate 0.0333   Epoch: 8   Global Step: 42800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:01:33,205-Speed 5508.05 samples/sec   Loss 6.8036   LearningRate 0.0333   Epoch: 8   Global Step: 42810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:01:35,062-Speed 5520.94 samples/sec   Loss 6.8098   LearningRate 0.0333   Epoch: 8   Global Step: 42820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:36,897-Speed 5583.79 samples/sec   Loss 6.7602   LearningRate 0.0332   Epoch: 8   Global Step: 42830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:38,778-Speed 5447.74 samples/sec   Loss 6.6805   LearningRate 0.0332   Epoch: 8   Global Step: 42840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:40,653-Speed 5463.36 samples/sec   Loss 6.8152   LearningRate 0.0332   Epoch: 8   Global Step: 42850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:42,519-Speed 5491.97 samples/sec   Loss 6.6921   LearningRate 0.0332   Epoch: 8   Global Step: 42860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:44,367-Speed 5544.64 samples/sec   Loss 6.8180   LearningRate 0.0332   Epoch: 8   Global Step: 42870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:46,205-Speed 5572.38 samples/sec   Loss 6.8339   LearningRate 0.0332   Epoch: 8   Global Step: 42880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:48,062-Speed 5518.21 samples/sec   Loss 6.8608   LearningRate 0.0332   Epoch: 8   Global Step: 42890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:49,973-Speed 5361.61 samples/sec   Loss 6.6613   LearningRate 0.0332   Epoch: 8   Global Step: 42900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:51,810-Speed 5576.83 samples/sec   Loss 6.6685   LearningRate 0.0332   Epoch: 8   Global Step: 42910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:53,658-Speed 5546.23 samples/sec   Loss 6.8886   LearningRate 0.0331   Epoch: 8   Global Step: 42920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:01:55,485-Speed 5607.34 samples/sec   Loss 6.8878   LearningRate 0.0331   Epoch: 8   Global Step: 42930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:57,329-Speed 5555.34 samples/sec   Loss 6.7349   LearningRate 0.0331   Epoch: 8   Global Step: 42940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:01:59,164-Speed 5583.53 samples/sec   Loss 6.7317   LearningRate 0.0331   Epoch: 8   Global Step: 42950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:01,010-Speed 5550.53 samples/sec   Loss 6.7626   LearningRate 0.0331   Epoch: 8   Global Step: 42960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:02,888-Speed 5454.55 samples/sec   Loss 6.7822   LearningRate 0.0331   Epoch: 8   Global Step: 42970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:04,747-Speed 5513.80 samples/sec   Loss 6.7680   LearningRate 0.0331   Epoch: 8   Global Step: 42980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:06,616-Speed 5481.28 samples/sec   Loss 6.7643   LearningRate 0.0331   Epoch: 8   Global Step: 42990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:08,455-Speed 5572.66 samples/sec   Loss 6.7828   LearningRate 0.0331   Epoch: 8   Global Step: 43000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:10,299-Speed 5555.90 samples/sec   Loss 6.6962   LearningRate 0.0330   Epoch: 8   Global Step: 43010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:12,161-Speed 5500.78 samples/sec   Loss 6.7630   LearningRate 0.0330   Epoch: 8   Global Step: 43020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:14,068-Speed 5373.22 samples/sec   Loss 6.7574   LearningRate 0.0330   Epoch: 8   Global Step: 43030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:02:15,907-Speed 5571.68 samples/sec   Loss 6.6339   LearningRate 0.0330   Epoch: 8   Global Step: 43040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:17,786-Speed 5451.01 samples/sec   Loss 6.7419   LearningRate 0.0330   Epoch: 8   Global Step: 43050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:19,641-Speed 5524.45 samples/sec   Loss 6.6245   LearningRate 0.0330   Epoch: 8   Global Step: 43060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:21,511-Speed 5479.30 samples/sec   Loss 6.7720   LearningRate 0.0330   Epoch: 8   Global Step: 43070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:23,361-Speed 5535.91 samples/sec   Loss 6.5486   LearningRate 0.0330   Epoch: 8   Global Step: 43080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:25,219-Speed 5516.87 samples/sec   Loss 6.8924   LearningRate 0.0330   Epoch: 8   Global Step: 43090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:27,076-Speed 5517.47 samples/sec   Loss 6.9396   LearningRate 0.0329   Epoch: 8   Global Step: 43100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:28,914-Speed 5576.21 samples/sec   Loss 6.7810   LearningRate 0.0329   Epoch: 8   Global Step: 43110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:30,760-Speed 5551.58 samples/sec   Loss 6.7275   LearningRate 0.0329   Epoch: 8   Global Step: 43120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:02:32,626-Speed 5489.90 samples/sec   Loss 6.6861   LearningRate 0.0329   Epoch: 8   Global Step: 43130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:02:34,490-Speed 5499.12 samples/sec   Loss 6.7367   LearningRate 0.0329   Epoch: 8   Global Step: 43140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:02:36,344-Speed 5526.23 samples/sec   Loss 6.7123   LearningRate 0.0329   Epoch: 8   Global Step: 43150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:02:38,238-Speed 5408.83 samples/sec   Loss 6.7597   LearningRate 0.0329   Epoch: 8   Global Step: 43160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:02:40,086-Speed 5544.60 samples/sec   Loss 6.9539   LearningRate 0.0329   Epoch: 8   Global Step: 43170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:02:41,955-Speed 5481.56 samples/sec   Loss 6.7419   LearningRate 0.0329   Epoch: 8   Global Step: 43180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:02:43,809-Speed 5527.17 samples/sec   Loss 6.8551   LearningRate 0.0328   Epoch: 8   Global Step: 43190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:02:45,671-Speed 5504.10 samples/sec   Loss 6.7676   LearningRate 0.0328   Epoch: 8   Global Step: 43200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:02:47,543-Speed 5472.43 samples/sec   Loss 6.6800   LearningRate 0.0328   Epoch: 8   Global Step: 43210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:02:49,378-Speed 5584.04 samples/sec   Loss 6.7141   LearningRate 0.0328   Epoch: 8   Global Step: 43220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:51,248-Speed 5478.81 samples/sec   Loss 6.7122   LearningRate 0.0328   Epoch: 8   Global Step: 43230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:53,099-Speed 5535.45 samples/sec   Loss 6.6192   LearningRate 0.0328   Epoch: 8   Global Step: 43240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:54,967-Speed 5486.76 samples/sec   Loss 6.8915   LearningRate 0.0328   Epoch: 8   Global Step: 43250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:56,812-Speed 5551.26 samples/sec   Loss 6.6949   LearningRate 0.0328   Epoch: 8   Global Step: 43260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:02:58,695-Speed 5442.90 samples/sec   Loss 6.8410   LearningRate 0.0327   Epoch: 8   Global Step: 43270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:00,573-Speed 5454.48 samples/sec   Loss 6.7734   LearningRate 0.0327   Epoch: 8   Global Step: 43280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:02,422-Speed 5540.19 samples/sec   Loss 6.8076   LearningRate 0.0327   Epoch: 8   Global Step: 43290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:04,277-Speed 5523.51 samples/sec   Loss 6.6870   LearningRate 0.0327   Epoch: 8   Global Step: 43300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:06,127-Speed 5537.39 samples/sec   Loss 6.8483   LearningRate 0.0327   Epoch: 8   Global Step: 43310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:07,970-Speed 5558.77 samples/sec   Loss 6.7852   LearningRate 0.0327   Epoch: 8   Global Step: 43320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:03:09,796-Speed 5611.20 samples/sec   Loss 6.5973   LearningRate 0.0327   Epoch: 8   Global Step: 43330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:11,634-Speed 5574.78 samples/sec   Loss 6.8725   LearningRate 0.0327   Epoch: 8   Global Step: 43340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:13,512-Speed 5455.24 samples/sec   Loss 6.7624   LearningRate 0.0327   Epoch: 8   Global Step: 43350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:15,372-Speed 5507.27 samples/sec   Loss 6.8798   LearningRate 0.0326   Epoch: 8   Global Step: 43360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:17,222-Speed 5539.99 samples/sec   Loss 6.6991   LearningRate 0.0326   Epoch: 8   Global Step: 43370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:19,088-Speed 5491.84 samples/sec   Loss 6.5953   LearningRate 0.0326   Epoch: 8   Global Step: 43380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:20,934-Speed 5547.35 samples/sec   Loss 6.7638   LearningRate 0.0326   Epoch: 8   Global Step: 43390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:22,812-Speed 5457.24 samples/sec   Loss 6.7378   LearningRate 0.0326   Epoch: 8   Global Step: 43400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:24,663-Speed 5535.78 samples/sec   Loss 6.7009   LearningRate 0.0326   Epoch: 8   Global Step: 43410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:26,502-Speed 5571.45 samples/sec   Loss 6.6958   LearningRate 0.0326   Epoch: 8   Global Step: 43420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:28,369-Speed 5485.80 samples/sec   Loss 6.8362   LearningRate 0.0326   Epoch: 8   Global Step: 43430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:03:30,197-Speed 5607.60 samples/sec   Loss 6.7564   LearningRate 0.0326   Epoch: 8   Global Step: 43440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:32,066-Speed 5479.04 samples/sec   Loss 6.8192   LearningRate 0.0325   Epoch: 8   Global Step: 43450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:33,943-Speed 5461.45 samples/sec   Loss 6.6504   LearningRate 0.0325   Epoch: 8   Global Step: 43460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:35,780-Speed 5574.99 samples/sec   Loss 6.8304   LearningRate 0.0325   Epoch: 8   Global Step: 43470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:37,624-Speed 5557.16 samples/sec   Loss 6.8571   LearningRate 0.0325   Epoch: 8   Global Step: 43480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:39,474-Speed 5539.52 samples/sec   Loss 6.7873   LearningRate 0.0325   Epoch: 8   Global Step: 43490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:41,309-Speed 5583.92 samples/sec   Loss 6.7943   LearningRate 0.0325   Epoch: 8   Global Step: 43500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:43,177-Speed 5483.33 samples/sec   Loss 6.6718   LearningRate 0.0325   Epoch: 8   Global Step: 43510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:45,012-Speed 5583.70 samples/sec   Loss 6.7285   LearningRate 0.0325   Epoch: 8   Global Step: 43520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:46,887-Speed 5464.49 samples/sec   Loss 6.7506   LearningRate 0.0325   Epoch: 8   Global Step: 43530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:48,727-Speed 5569.21 samples/sec   Loss 6.7111   LearningRate 0.0324   Epoch: 8   Global Step: 43540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:03:50,585-Speed 5514.11 samples/sec   Loss 6.7940   LearningRate 0.0324   Epoch: 8   Global Step: 43550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:03:52,432-Speed 5547.15 samples/sec   Loss 6.8769   LearningRate 0.0324   Epoch: 8   Global Step: 43560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:03:54,288-Speed 5520.44 samples/sec   Loss 6.7130   LearningRate 0.0324   Epoch: 8   Global Step: 43570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:03:56,130-Speed 5561.54 samples/sec   Loss 6.6874   LearningRate 0.0324   Epoch: 8   Global Step: 43580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:57,999-Speed 5483.67 samples/sec   Loss 6.6647   LearningRate 0.0324   Epoch: 8   Global Step: 43590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:03:59,843-Speed 5558.59 samples/sec   Loss 6.7114   LearningRate 0.0324   Epoch: 8   Global Step: 43600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:01,724-Speed 5447.13 samples/sec   Loss 6.6955   LearningRate 0.0324   Epoch: 8   Global Step: 43610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:03,576-Speed 5532.90 samples/sec   Loss 6.6631   LearningRate 0.0324   Epoch: 8   Global Step: 43620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:05,412-Speed 5581.97 samples/sec   Loss 6.6863   LearningRate 0.0323   Epoch: 8   Global Step: 43630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:07,276-Speed 5495.99 samples/sec   Loss 6.7075   LearningRate 0.0323   Epoch: 8   Global Step: 43640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:09,110-Speed 5587.15 samples/sec   Loss 6.7660   LearningRate 0.0323   Epoch: 8   Global Step: 43650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:10,951-Speed 5564.55 samples/sec   Loss 6.7868   LearningRate 0.0323   Epoch: 8   Global Step: 43660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:12,814-Speed 5500.11 samples/sec   Loss 6.8824   LearningRate 0.0323   Epoch: 8   Global Step: 43670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:14,649-Speed 5584.76 samples/sec   Loss 6.8022   LearningRate 0.0323   Epoch: 8   Global Step: 43680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:16,511-Speed 5500.79 samples/sec   Loss 6.7788   LearningRate 0.0323   Epoch: 8   Global Step: 43690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:18,377-Speed 5489.54 samples/sec   Loss 6.9468   LearningRate 0.0323   Epoch: 8   Global Step: 43700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:20,243-Speed 5493.21 samples/sec   Loss 6.8303   LearningRate 0.0323   Epoch: 8   Global Step: 43710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:22,089-Speed 5551.27 samples/sec   Loss 6.7458   LearningRate 0.0322   Epoch: 8   Global Step: 43720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:23,959-Speed 5478.80 samples/sec   Loss 6.7762   LearningRate 0.0322   Epoch: 8   Global Step: 43730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:25,799-Speed 5569.02 samples/sec   Loss 6.8095   LearningRate 0.0322   Epoch: 8   Global Step: 43740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:27,678-Speed 5453.20 samples/sec   Loss 6.8189   LearningRate 0.0322   Epoch: 8   Global Step: 43750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:29,523-Speed 5553.43 samples/sec   Loss 6.8379   LearningRate 0.0322   Epoch: 8   Global Step: 43760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:31,382-Speed 5515.37 samples/sec   Loss 6.5804   LearningRate 0.0322   Epoch: 8   Global Step: 43770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:33,234-Speed 5531.95 samples/sec   Loss 6.9125   LearningRate 0.0322   Epoch: 8   Global Step: 43780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:04:35,071-Speed 5577.97 samples/sec   Loss 6.6765   LearningRate 0.0322   Epoch: 8   Global Step: 43790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:36,920-Speed 5542.45 samples/sec   Loss 6.7678   LearningRate 0.0322   Epoch: 8   Global Step: 43800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:38,793-Speed 5470.14 samples/sec   Loss 6.8545   LearningRate 0.0321   Epoch: 8   Global Step: 43810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:04:40,633-Speed 5567.97 samples/sec   Loss 6.5810   LearningRate 0.0321   Epoch: 8   Global Step: 43820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:04:42,521-Speed 5426.42 samples/sec   Loss 6.6950   LearningRate 0.0321   Epoch: 8   Global Step: 43830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:04:44,369-Speed 5546.34 samples/sec   Loss 6.5884   LearningRate 0.0321   Epoch: 8   Global Step: 43840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:04:46,203-Speed 5585.53 samples/sec   Loss 6.7142   LearningRate 0.0321   Epoch: 8   Global Step: 43850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:04:48,051-Speed 5542.28 samples/sec   Loss 6.6108   LearningRate 0.0321   Epoch: 8   Global Step: 43860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:04:49,888-Speed 5580.52 samples/sec   Loss 6.5762   LearningRate 0.0321   Epoch: 8   Global Step: 43870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:04:51,806-Speed 5340.55 samples/sec   Loss 6.6743   LearningRate 0.0321   Epoch: 8   Global Step: 43880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:04:53,669-Speed 5499.36 samples/sec   Loss 6.7076   LearningRate 0.0321   Epoch: 8   Global Step: 43890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:04:55,510-Speed 5564.60 samples/sec   Loss 6.6232   LearningRate 0.0320   Epoch: 8   Global Step: 43900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:04:57,356-Speed 5549.33 samples/sec   Loss 6.6856   LearningRate 0.0320   Epoch: 8   Global Step: 43910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:04:59,246-Speed 5422.45 samples/sec   Loss 6.5932   LearningRate 0.0320   Epoch: 8   Global Step: 43920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:05:01,095-Speed 5541.03 samples/sec   Loss 6.6491   LearningRate 0.0320   Epoch: 8   Global Step: 43930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:05:02,959-Speed 5495.28 samples/sec   Loss 6.7652   LearningRate 0.0320   Epoch: 8   Global Step: 43940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:05:04,838-Speed 5455.87 samples/sec   Loss 6.8078   LearningRate 0.0320   Epoch: 8   Global Step: 43950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:05:06,677-Speed 5569.51 samples/sec   Loss 6.6820   LearningRate 0.0320   Epoch: 8   Global Step: 43960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:05:08,577-Speed 5393.71 samples/sec   Loss 6.6173   LearningRate 0.0320   Epoch: 8   Global Step: 43970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:05:10,426-Speed 5542.24 samples/sec   Loss 6.6847   LearningRate 0.0319   Epoch: 8   Global Step: 43980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:05:12,297-Speed 5475.36 samples/sec   Loss 6.6059   LearningRate 0.0319   Epoch: 8   Global Step: 43990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:05:14,143-Speed 5552.14 samples/sec   Loss 6.8991   LearningRate 0.0319   Epoch: 8   Global Step: 44000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:05:41,473-[lfw][44000]XNorm: 22.386428
Training: 2022-04-11 13:05:41,474-[lfw][44000]Accuracy-Flip: 0.99683+-0.00283
Training: 2022-04-11 13:05:41,475-[lfw][44000]Accuracy-Highest: 0.99767
Training: 2022-04-11 13:06:13,056-[cfp_fp][44000]XNorm: 19.237414
Training: 2022-04-11 13:06:13,057-[cfp_fp][44000]Accuracy-Flip: 0.96771+-0.00698
Training: 2022-04-11 13:06:13,058-[cfp_fp][44000]Accuracy-Highest: 0.96771
Training: 2022-04-11 13:06:40,325-[agedb_30][44000]XNorm: 22.253874
Training: 2022-04-11 13:06:40,325-[agedb_30][44000]Accuracy-Flip: 0.97683+-0.00689
Training: 2022-04-11 13:06:40,326-[agedb_30][44000]Accuracy-Highest: 0.97683
Training: 2022-04-11 13:06:42,179-Speed 116.32 samples/sec   Loss 6.7326   LearningRate 0.0319   Epoch: 8   Global Step: 44010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:06:44,032-Speed 5529.78 samples/sec   Loss 6.8239   LearningRate 0.0319   Epoch: 8   Global Step: 44020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:06:45,859-Speed 5607.76 samples/sec   Loss 6.8913   LearningRate 0.0319   Epoch: 8   Global Step: 44030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:06:47,712-Speed 5527.30 samples/sec   Loss 6.8547   LearningRate 0.0319   Epoch: 8   Global Step: 44040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:06:49,561-Speed 5541.79 samples/sec   Loss 6.7406   LearningRate 0.0319   Epoch: 8   Global Step: 44050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:06:51,405-Speed 5555.69 samples/sec   Loss 6.7493   LearningRate 0.0319   Epoch: 8   Global Step: 44060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:06:53,259-Speed 5528.53 samples/sec   Loss 6.6830   LearningRate 0.0318   Epoch: 8   Global Step: 44070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:06:55,123-Speed 5494.40 samples/sec   Loss 6.7764   LearningRate 0.0318   Epoch: 8   Global Step: 44080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:06:56,978-Speed 5527.01 samples/sec   Loss 6.6421   LearningRate 0.0318   Epoch: 8   Global Step: 44090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:06:58,819-Speed 5562.73 samples/sec   Loss 6.7382   LearningRate 0.0318   Epoch: 8   Global Step: 44100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:00,653-Speed 5588.10 samples/sec   Loss 6.7964   LearningRate 0.0318   Epoch: 8   Global Step: 44110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:02,479-Speed 5611.67 samples/sec   Loss 6.6929   LearningRate 0.0318   Epoch: 8   Global Step: 44120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:04,327-Speed 5544.84 samples/sec   Loss 6.7023   LearningRate 0.0318   Epoch: 8   Global Step: 44130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:06,172-Speed 5553.10 samples/sec   Loss 6.7270   LearningRate 0.0318   Epoch: 8   Global Step: 44140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:08,045-Speed 5470.36 samples/sec   Loss 6.7858   LearningRate 0.0318   Epoch: 8   Global Step: 44150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:09,878-Speed 5590.70 samples/sec   Loss 6.5946   LearningRate 0.0317   Epoch: 8   Global Step: 44160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:11,723-Speed 5554.03 samples/sec   Loss 6.6202   LearningRate 0.0317   Epoch: 8   Global Step: 44170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:13,564-Speed 5564.81 samples/sec   Loss 6.6746   LearningRate 0.0317   Epoch: 8   Global Step: 44180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:15,438-Speed 5466.13 samples/sec   Loss 6.5878   LearningRate 0.0317   Epoch: 8   Global Step: 44190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:17,276-Speed 5573.98 samples/sec   Loss 6.7490   LearningRate 0.0317   Epoch: 8   Global Step: 44200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:19,118-Speed 5563.20 samples/sec   Loss 6.7334   LearningRate 0.0317   Epoch: 8   Global Step: 44210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:20,967-Speed 5541.45 samples/sec   Loss 6.6866   LearningRate 0.0317   Epoch: 8   Global Step: 44220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:07:22,851-Speed 5438.43 samples/sec   Loss 6.9000   LearningRate 0.0317   Epoch: 8   Global Step: 44230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:24,687-Speed 5580.09 samples/sec   Loss 6.7226   LearningRate 0.0317   Epoch: 8   Global Step: 44240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:26,549-Speed 5500.58 samples/sec   Loss 6.8437   LearningRate 0.0316   Epoch: 8   Global Step: 44250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:28,396-Speed 5548.73 samples/sec   Loss 6.6481   LearningRate 0.0316   Epoch: 8   Global Step: 44260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:30,255-Speed 5513.54 samples/sec   Loss 6.7457   LearningRate 0.0316   Epoch: 8   Global Step: 44270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:32,087-Speed 5590.59 samples/sec   Loss 6.7793   LearningRate 0.0316   Epoch: 8   Global Step: 44280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:33,960-Speed 5471.99 samples/sec   Loss 6.6992   LearningRate 0.0316   Epoch: 8   Global Step: 44290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:35,825-Speed 5495.83 samples/sec   Loss 6.6044   LearningRate 0.0316   Epoch: 8   Global Step: 44300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:37,665-Speed 5568.28 samples/sec   Loss 6.6188   LearningRate 0.0316   Epoch: 8   Global Step: 44310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:39,518-Speed 5525.76 samples/sec   Loss 6.7471   LearningRate 0.0316   Epoch: 8   Global Step: 44320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:41,352-Speed 5588.74 samples/sec   Loss 6.7097   LearningRate 0.0316   Epoch: 8   Global Step: 44330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:07:43,179-Speed 5605.76 samples/sec   Loss 6.9512   LearningRate 0.0315   Epoch: 8   Global Step: 44340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:45,022-Speed 5556.98 samples/sec   Loss 6.5896   LearningRate 0.0315   Epoch: 8   Global Step: 44350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:46,863-Speed 5566.80 samples/sec   Loss 6.6378   LearningRate 0.0315   Epoch: 8   Global Step: 44360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:48,713-Speed 5537.77 samples/sec   Loss 6.7250   LearningRate 0.0315   Epoch: 8   Global Step: 44370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:50,570-Speed 5520.37 samples/sec   Loss 6.6856   LearningRate 0.0315   Epoch: 8   Global Step: 44380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:52,418-Speed 5542.75 samples/sec   Loss 6.7420   LearningRate 0.0315   Epoch: 8   Global Step: 44390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:54,249-Speed 5593.09 samples/sec   Loss 6.5805   LearningRate 0.0315   Epoch: 8   Global Step: 44400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:56,080-Speed 5596.64 samples/sec   Loss 6.6386   LearningRate 0.0315   Epoch: 8   Global Step: 44410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:57,912-Speed 5591.48 samples/sec   Loss 6.7017   LearningRate 0.0315   Epoch: 8   Global Step: 44420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:07:59,778-Speed 5491.87 samples/sec   Loss 6.7126   LearningRate 0.0314   Epoch: 8   Global Step: 44430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:01,611-Speed 5588.48 samples/sec   Loss 6.5759   LearningRate 0.0314   Epoch: 8   Global Step: 44440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:03,460-Speed 5543.93 samples/sec   Loss 6.6771   LearningRate 0.0314   Epoch: 8   Global Step: 44450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:05,302-Speed 5565.67 samples/sec   Loss 6.7249   LearningRate 0.0314   Epoch: 8   Global Step: 44460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:07,147-Speed 5550.83 samples/sec   Loss 6.7883   LearningRate 0.0314   Epoch: 8   Global Step: 44470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:08,983-Speed 5579.18 samples/sec   Loss 6.7260   LearningRate 0.0314   Epoch: 8   Global Step: 44480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:10,822-Speed 5574.29 samples/sec   Loss 6.6477   LearningRate 0.0314   Epoch: 8   Global Step: 44490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:12,674-Speed 5530.27 samples/sec   Loss 6.5816   LearningRate 0.0314   Epoch: 8   Global Step: 44500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:14,516-Speed 5563.28 samples/sec   Loss 6.5953   LearningRate 0.0314   Epoch: 8   Global Step: 44510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:16,396-Speed 5449.92 samples/sec   Loss 6.6799   LearningRate 0.0313   Epoch: 8   Global Step: 44520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:18,240-Speed 5557.36 samples/sec   Loss 6.6589   LearningRate 0.0313   Epoch: 8   Global Step: 44530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:20,073-Speed 5589.70 samples/sec   Loss 6.7970   LearningRate 0.0313   Epoch: 8   Global Step: 44540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:08:21,914-Speed 5564.25 samples/sec   Loss 6.6431   LearningRate 0.0313   Epoch: 8   Global Step: 44550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:08:23,739-Speed 5614.90 samples/sec   Loss 6.7539   LearningRate 0.0313   Epoch: 8   Global Step: 44560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:25,600-Speed 5507.10 samples/sec   Loss 6.8352   LearningRate 0.0313   Epoch: 8   Global Step: 44570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:27,456-Speed 5519.11 samples/sec   Loss 6.6963   LearningRate 0.0313   Epoch: 8   Global Step: 44580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:29,308-Speed 5530.49 samples/sec   Loss 6.7956   LearningRate 0.0313   Epoch: 8   Global Step: 44590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:31,143-Speed 5586.80 samples/sec   Loss 6.7259   LearningRate 0.0313   Epoch: 8   Global Step: 44600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:32,996-Speed 5528.95 samples/sec   Loss 6.5131   LearningRate 0.0312   Epoch: 8   Global Step: 44610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:34,842-Speed 5549.57 samples/sec   Loss 6.6936   LearningRate 0.0312   Epoch: 8   Global Step: 44620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:36,700-Speed 5514.87 samples/sec   Loss 6.5235   LearningRate 0.0312   Epoch: 8   Global Step: 44630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:38,563-Speed 5499.32 samples/sec   Loss 6.6691   LearningRate 0.0312   Epoch: 8   Global Step: 44640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:08:40,405-Speed 5562.13 samples/sec   Loss 6.5196   LearningRate 0.0312   Epoch: 8   Global Step: 44650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:08:42,236-Speed 5596.43 samples/sec   Loss 6.6361   LearningRate 0.0312   Epoch: 8   Global Step: 44660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:08:44,090-Speed 5525.83 samples/sec   Loss 6.6451   LearningRate 0.0312   Epoch: 8   Global Step: 44670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:08:45,926-Speed 5582.84 samples/sec   Loss 6.7580   LearningRate 0.0312   Epoch: 8   Global Step: 44680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:08:47,764-Speed 5571.74 samples/sec   Loss 6.6453   LearningRate 0.0312   Epoch: 8   Global Step: 44690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:08:49,608-Speed 5555.67 samples/sec   Loss 6.6466   LearningRate 0.0312   Epoch: 8   Global Step: 44700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:08:51,454-Speed 5552.51 samples/sec   Loss 6.5198   LearningRate 0.0311   Epoch: 8   Global Step: 44710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:08:53,306-Speed 5532.92 samples/sec   Loss 6.5789   LearningRate 0.0311   Epoch: 8   Global Step: 44720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:08:55,138-Speed 5590.22 samples/sec   Loss 6.7030   LearningRate 0.0311   Epoch: 8   Global Step: 44730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:08:56,972-Speed 5587.81 samples/sec   Loss 6.6412   LearningRate 0.0311   Epoch: 8   Global Step: 44740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:08:58,821-Speed 5540.03 samples/sec   Loss 6.6910   LearningRate 0.0311   Epoch: 8   Global Step: 44750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:00,660-Speed 5573.42 samples/sec   Loss 6.7024   LearningRate 0.0311   Epoch: 8   Global Step: 44760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:02,520-Speed 5506.41 samples/sec   Loss 6.6617   LearningRate 0.0311   Epoch: 8   Global Step: 44770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:04,392-Speed 5474.18 samples/sec   Loss 6.7385   LearningRate 0.0311   Epoch: 8   Global Step: 44780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:06,254-Speed 5503.49 samples/sec   Loss 6.6167   LearningRate 0.0311   Epoch: 8   Global Step: 44790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:08,115-Speed 5503.73 samples/sec   Loss 6.6625   LearningRate 0.0310   Epoch: 8   Global Step: 44800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:09,949-Speed 5586.81 samples/sec   Loss 6.6160   LearningRate 0.0310   Epoch: 8   Global Step: 44810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:11,818-Speed 5481.54 samples/sec   Loss 6.5325   LearningRate 0.0310   Epoch: 8   Global Step: 44820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:13,662-Speed 5556.05 samples/sec   Loss 6.7730   LearningRate 0.0310   Epoch: 8   Global Step: 44830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:15,502-Speed 5569.46 samples/sec   Loss 6.6989   LearningRate 0.0310   Epoch: 8   Global Step: 44840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:17,348-Speed 5548.62 samples/sec   Loss 6.6956   LearningRate 0.0310   Epoch: 8   Global Step: 44850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:09:19,206-Speed 5513.81 samples/sec   Loss 6.5592   LearningRate 0.0310   Epoch: 8   Global Step: 44860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:09:21,073-Speed 5488.35 samples/sec   Loss 6.6674   LearningRate 0.0310   Epoch: 8   Global Step: 44870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:22,912-Speed 5571.63 samples/sec   Loss 6.6798   LearningRate 0.0310   Epoch: 8   Global Step: 44880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:24,750-Speed 5574.57 samples/sec   Loss 6.5952   LearningRate 0.0309   Epoch: 8   Global Step: 44890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:26,604-Speed 5527.00 samples/sec   Loss 6.5589   LearningRate 0.0309   Epoch: 8   Global Step: 44900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:28,443-Speed 5568.94 samples/sec   Loss 6.8130   LearningRate 0.0309   Epoch: 8   Global Step: 44910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:30,287-Speed 5556.50 samples/sec   Loss 6.6648   LearningRate 0.0309   Epoch: 8   Global Step: 44920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:32,125-Speed 5577.64 samples/sec   Loss 6.6482   LearningRate 0.0309   Epoch: 8   Global Step: 44930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:33,964-Speed 5569.04 samples/sec   Loss 6.7391   LearningRate 0.0309   Epoch: 8   Global Step: 44940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:35,799-Speed 5585.37 samples/sec   Loss 6.5748   LearningRate 0.0309   Epoch: 8   Global Step: 44950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:37,632-Speed 5588.33 samples/sec   Loss 6.6336   LearningRate 0.0309   Epoch: 8   Global Step: 44960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:39,477-Speed 5555.43 samples/sec   Loss 6.6400   LearningRate 0.0309   Epoch: 8   Global Step: 44970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 13:09:41,333-Speed 5519.30 samples/sec   Loss 6.6867   LearningRate 0.0308   Epoch: 8   Global Step: 44980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:09:43,164-Speed 5595.40 samples/sec   Loss 6.6721   LearningRate 0.0308   Epoch: 8   Global Step: 44990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:09:45,018-Speed 5527.35 samples/sec   Loss 6.5759   LearningRate 0.0308   Epoch: 8   Global Step: 45000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:09:46,859-Speed 5566.02 samples/sec   Loss 6.5959   LearningRate 0.0308   Epoch: 8   Global Step: 45010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:09:48,705-Speed 5550.39 samples/sec   Loss 6.6286   LearningRate 0.0308   Epoch: 8   Global Step: 45020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:09:50,582-Speed 5488.30 samples/sec   Loss 6.6051   LearningRate 0.0308   Epoch: 8   Global Step: 45030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:09:52,441-Speed 5509.32 samples/sec   Loss 6.7178   LearningRate 0.0308   Epoch: 8   Global Step: 45040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:09:54,312-Speed 5477.34 samples/sec   Loss 6.6482   LearningRate 0.0308   Epoch: 8   Global Step: 45050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:09:56,147-Speed 5586.09 samples/sec   Loss 6.5152   LearningRate 0.0308   Epoch: 8   Global Step: 45060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:09:58,027-Speed 5447.94 samples/sec   Loss 6.6214   LearningRate 0.0307   Epoch: 8   Global Step: 45070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:09:59,902-Speed 5464.95 samples/sec   Loss 6.5186   LearningRate 0.0307   Epoch: 8   Global Step: 45080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:01,739-Speed 5576.39 samples/sec   Loss 6.6178   LearningRate 0.0307   Epoch: 8   Global Step: 45090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:03,579-Speed 5569.99 samples/sec   Loss 6.6495   LearningRate 0.0307   Epoch: 8   Global Step: 45100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:05,423-Speed 5554.12 samples/sec   Loss 6.6696   LearningRate 0.0307   Epoch: 8   Global Step: 45110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:07,263-Speed 5569.37 samples/sec   Loss 6.6277   LearningRate 0.0307   Epoch: 8   Global Step: 45120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:09,103-Speed 5569.85 samples/sec   Loss 6.4199   LearningRate 0.0307   Epoch: 8   Global Step: 45130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:10,972-Speed 5482.60 samples/sec   Loss 6.5519   LearningRate 0.0307   Epoch: 8   Global Step: 45140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:12,817-Speed 5551.79 samples/sec   Loss 6.6577   LearningRate 0.0307   Epoch: 8   Global Step: 45150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:14,694-Speed 5457.36 samples/sec   Loss 6.5705   LearningRate 0.0306   Epoch: 8   Global Step: 45160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:16,535-Speed 5566.80 samples/sec   Loss 6.6977   LearningRate 0.0306   Epoch: 8   Global Step: 45170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:18,395-Speed 5508.86 samples/sec   Loss 6.5451   LearningRate 0.0306   Epoch: 8   Global Step: 45180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:20,227-Speed 5592.86 samples/sec   Loss 6.6171   LearningRate 0.0306   Epoch: 8   Global Step: 45190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:22,066-Speed 5571.06 samples/sec   Loss 6.5480   LearningRate 0.0306   Epoch: 8   Global Step: 45200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:23,915-Speed 5541.09 samples/sec   Loss 6.6105   LearningRate 0.0306   Epoch: 8   Global Step: 45210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:25,755-Speed 5568.56 samples/sec   Loss 6.6943   LearningRate 0.0306   Epoch: 8   Global Step: 45220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:27,594-Speed 5572.18 samples/sec   Loss 6.6509   LearningRate 0.0306   Epoch: 8   Global Step: 45230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:29,429-Speed 5581.35 samples/sec   Loss 6.7697   LearningRate 0.0306   Epoch: 8   Global Step: 45240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:31,320-Speed 5420.35 samples/sec   Loss 6.6766   LearningRate 0.0305   Epoch: 8   Global Step: 45250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:33,152-Speed 5589.46 samples/sec   Loss 6.6160   LearningRate 0.0305   Epoch: 8   Global Step: 45260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:34,991-Speed 5570.61 samples/sec   Loss 6.5437   LearningRate 0.0305   Epoch: 8   Global Step: 45270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:36,857-Speed 5491.51 samples/sec   Loss 6.6718   LearningRate 0.0305   Epoch: 8   Global Step: 45280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:38,719-Speed 5501.86 samples/sec   Loss 6.5218   LearningRate 0.0305   Epoch: 8   Global Step: 45290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:40,567-Speed 5545.31 samples/sec   Loss 6.5661   LearningRate 0.0305   Epoch: 8   Global Step: 45300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:42,406-Speed 5569.70 samples/sec   Loss 6.6692   LearningRate 0.0305   Epoch: 8   Global Step: 45310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:44,239-Speed 5587.85 samples/sec   Loss 6.5695   LearningRate 0.0305   Epoch: 8   Global Step: 45320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:46,078-Speed 5572.48 samples/sec   Loss 6.7441   LearningRate 0.0305   Epoch: 8   Global Step: 45330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:47,911-Speed 5589.01 samples/sec   Loss 6.5330   LearningRate 0.0304   Epoch: 8   Global Step: 45340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:49,752-Speed 5563.97 samples/sec   Loss 6.4978   LearningRate 0.0304   Epoch: 8   Global Step: 45350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:51,597-Speed 5555.98 samples/sec   Loss 6.5636   LearningRate 0.0304   Epoch: 8   Global Step: 45360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:10:53,441-Speed 5554.21 samples/sec   Loss 6.6404   LearningRate 0.0304   Epoch: 8   Global Step: 45370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:55,298-Speed 5515.82 samples/sec   Loss 6.6043   LearningRate 0.0304   Epoch: 8   Global Step: 45380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:57,139-Speed 5567.85 samples/sec   Loss 6.7940   LearningRate 0.0304   Epoch: 8   Global Step: 45390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:10:59,002-Speed 5496.25 samples/sec   Loss 6.7159   LearningRate 0.0304   Epoch: 8   Global Step: 45400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:11:00,838-Speed 5583.47 samples/sec   Loss 6.4973   LearningRate 0.0304   Epoch: 8   Global Step: 45410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:11:02,673-Speed 5580.69 samples/sec   Loss 6.5654   LearningRate 0.0304   Epoch: 8   Global Step: 45420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:11:04,515-Speed 5562.32 samples/sec   Loss 6.6857   LearningRate 0.0304   Epoch: 8   Global Step: 45430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:11:06,361-Speed 5548.99 samples/sec   Loss 6.6387   LearningRate 0.0303   Epoch: 8   Global Step: 45440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:11:08,222-Speed 5506.11 samples/sec   Loss 6.6281   LearningRate 0.0303   Epoch: 8   Global Step: 45450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:11:10,067-Speed 5551.00 samples/sec   Loss 6.7118   LearningRate 0.0303   Epoch: 8   Global Step: 45460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:11:11,942-Speed 5465.62 samples/sec   Loss 6.6318   LearningRate 0.0303   Epoch: 8   Global Step: 45470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:13,818-Speed 5460.11 samples/sec   Loss 6.6835   LearningRate 0.0303   Epoch: 8   Global Step: 45480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:15,672-Speed 5528.37 samples/sec   Loss 6.7797   LearningRate 0.0303   Epoch: 8   Global Step: 45490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:17,504-Speed 5592.80 samples/sec   Loss 6.6390   LearningRate 0.0303   Epoch: 8   Global Step: 45500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:19,336-Speed 5591.52 samples/sec   Loss 6.6109   LearningRate 0.0303   Epoch: 8   Global Step: 45510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:21,237-Speed 5391.20 samples/sec   Loss 6.5169   LearningRate 0.0303   Epoch: 8   Global Step: 45520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:32,363-Speed 920.42 samples/sec   Loss 5.9587   LearningRate 0.0302   Epoch: 9   Global Step: 45530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:34,241-Speed 5455.42 samples/sec   Loss 5.6053   LearningRate 0.0302   Epoch: 9   Global Step: 45540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:36,085-Speed 5555.42 samples/sec   Loss 5.8099   LearningRate 0.0302   Epoch: 9   Global Step: 45550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:37,950-Speed 5494.39 samples/sec   Loss 5.7144   LearningRate 0.0302   Epoch: 9   Global Step: 45560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:39,911-Speed 5224.04 samples/sec   Loss 5.8874   LearningRate 0.0302   Epoch: 9   Global Step: 45570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:41,789-Speed 5456.13 samples/sec   Loss 5.7745   LearningRate 0.0302   Epoch: 9   Global Step: 45580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:43,653-Speed 5495.29 samples/sec   Loss 5.8018   LearningRate 0.0302   Epoch: 9   Global Step: 45590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:45,494-Speed 5565.21 samples/sec   Loss 5.8479   LearningRate 0.0302   Epoch: 9   Global Step: 45600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:47,337-Speed 5557.28 samples/sec   Loss 5.9058   LearningRate 0.0302   Epoch: 9   Global Step: 45610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:49,214-Speed 5459.18 samples/sec   Loss 5.6494   LearningRate 0.0301   Epoch: 9   Global Step: 45620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:51,093-Speed 5451.66 samples/sec   Loss 5.7198   LearningRate 0.0301   Epoch: 9   Global Step: 45630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:11:52,945-Speed 5531.30 samples/sec   Loss 5.8512   LearningRate 0.0301   Epoch: 9   Global Step: 45640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:11:54,787-Speed 5562.80 samples/sec   Loss 5.8290   LearningRate 0.0301   Epoch: 9   Global Step: 45650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:11:56,655-Speed 5485.80 samples/sec   Loss 5.6201   LearningRate 0.0301   Epoch: 9   Global Step: 45660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:11:58,505-Speed 5538.63 samples/sec   Loss 5.8669   LearningRate 0.0301   Epoch: 9   Global Step: 45670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:12:00,357-Speed 5529.10 samples/sec   Loss 5.9704   LearningRate 0.0301   Epoch: 9   Global Step: 45680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:12:02,225-Speed 5489.79 samples/sec   Loss 5.8022   LearningRate 0.0301   Epoch: 9   Global Step: 45690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:12:04,076-Speed 5533.63 samples/sec   Loss 5.8275   LearningRate 0.0301   Epoch: 9   Global Step: 45700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:12:05,929-Speed 5530.13 samples/sec   Loss 5.7434   LearningRate 0.0300   Epoch: 9   Global Step: 45710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:12:07,774-Speed 5552.88 samples/sec   Loss 5.9307   LearningRate 0.0300   Epoch: 9   Global Step: 45720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:12:09,613-Speed 5569.74 samples/sec   Loss 5.7834   LearningRate 0.0300   Epoch: 9   Global Step: 45730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:12:11,461-Speed 5544.78 samples/sec   Loss 5.8573   LearningRate 0.0300   Epoch: 9   Global Step: 45740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:12:13,307-Speed 5551.29 samples/sec   Loss 5.8114   LearningRate 0.0300   Epoch: 9   Global Step: 45750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:12:15,193-Speed 5432.01 samples/sec   Loss 5.9919   LearningRate 0.0300   Epoch: 9   Global Step: 45760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:12:17,052-Speed 5512.55 samples/sec   Loss 5.9214   LearningRate 0.0300   Epoch: 9   Global Step: 45770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:12:18,895-Speed 5555.56 samples/sec   Loss 5.8472   LearningRate 0.0300   Epoch: 9   Global Step: 45780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:12:20,733-Speed 5576.41 samples/sec   Loss 6.0719   LearningRate 0.0300   Epoch: 9   Global Step: 45790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:12:22,576-Speed 5560.67 samples/sec   Loss 5.8488   LearningRate 0.0299   Epoch: 9   Global Step: 45800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:12:24,424-Speed 5541.77 samples/sec   Loss 5.8297   LearningRate 0.0299   Epoch: 9   Global Step: 45810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:12:26,281-Speed 5517.10 samples/sec   Loss 6.0221   LearningRate 0.0299   Epoch: 9   Global Step: 45820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:12:28,173-Speed 5416.38 samples/sec   Loss 5.9456   LearningRate 0.0299   Epoch: 9   Global Step: 45830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:12:30,027-Speed 5526.74 samples/sec   Loss 5.8951   LearningRate 0.0299   Epoch: 9   Global Step: 45840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:12:31,908-Speed 5447.34 samples/sec   Loss 5.9823   LearningRate 0.0299   Epoch: 9   Global Step: 45850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:12:33,751-Speed 5557.33 samples/sec   Loss 6.0235   LearningRate 0.0299   Epoch: 9   Global Step: 45860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:12:35,591-Speed 5566.59 samples/sec   Loss 6.0305   LearningRate 0.0299   Epoch: 9   Global Step: 45870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:12:37,443-Speed 5534.88 samples/sec   Loss 5.8903   LearningRate 0.0299   Epoch: 9   Global Step: 45880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:12:39,308-Speed 5492.89 samples/sec   Loss 5.8008   LearningRate 0.0299   Epoch: 9   Global Step: 45890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:12:41,163-Speed 5521.29 samples/sec   Loss 6.0851   LearningRate 0.0298   Epoch: 9   Global Step: 45900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:12:43,035-Speed 5475.10 samples/sec   Loss 6.0349   LearningRate 0.0298   Epoch: 9   Global Step: 45910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:12:44,884-Speed 5539.54 samples/sec   Loss 6.0669   LearningRate 0.0298   Epoch: 9   Global Step: 45920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:12:46,753-Speed 5487.56 samples/sec   Loss 5.9489   LearningRate 0.0298   Epoch: 9   Global Step: 45930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:12:48,600-Speed 5544.70 samples/sec   Loss 6.0742   LearningRate 0.0298   Epoch: 9   Global Step: 45940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:12:50,492-Speed 5416.98 samples/sec   Loss 6.1917   LearningRate 0.0298   Epoch: 9   Global Step: 45950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:12:52,328-Speed 5579.29 samples/sec   Loss 6.0585   LearningRate 0.0298   Epoch: 9   Global Step: 45960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:12:54,179-Speed 5534.33 samples/sec   Loss 5.9837   LearningRate 0.0298   Epoch: 9   Global Step: 45970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:12:56,042-Speed 5500.04 samples/sec   Loss 5.8969   LearningRate 0.0298   Epoch: 9   Global Step: 45980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:12:57,914-Speed 5474.72 samples/sec   Loss 6.1139   LearningRate 0.0297   Epoch: 9   Global Step: 45990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:12:59,757-Speed 5556.89 samples/sec   Loss 6.0678   LearningRate 0.0297   Epoch: 9   Global Step: 46000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:13:27,062-[lfw][46000]XNorm: 22.744840
Training: 2022-04-11 13:13:27,063-[lfw][46000]Accuracy-Flip: 0.99633+-0.00306
Training: 2022-04-11 13:13:27,064-[lfw][46000]Accuracy-Highest: 0.99767
Training: 2022-04-11 13:13:58,395-[cfp_fp][46000]XNorm: 19.897713
Training: 2022-04-11 13:13:58,396-[cfp_fp][46000]Accuracy-Flip: 0.96714+-0.00958
Training: 2022-04-11 13:13:58,397-[cfp_fp][46000]Accuracy-Highest: 0.96771
Training: 2022-04-11 13:14:25,640-[agedb_30][46000]XNorm: 22.336008
Training: 2022-04-11 13:14:25,641-[agedb_30][46000]Accuracy-Flip: 0.97667+-0.00749
Training: 2022-04-11 13:14:25,642-[agedb_30][46000]Accuracy-Highest: 0.97683
Training: 2022-04-11 13:14:27,486-Speed 116.72 samples/sec   Loss 6.0884   LearningRate 0.0297   Epoch: 9   Global Step: 46010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:14:29,332-Speed 5550.75 samples/sec   Loss 6.0095   LearningRate 0.0297   Epoch: 9   Global Step: 46020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:14:31,181-Speed 5540.68 samples/sec   Loss 6.1671   LearningRate 0.0297   Epoch: 9   Global Step: 46030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:14:33,027-Speed 5552.43 samples/sec   Loss 5.9846   LearningRate 0.0297   Epoch: 9   Global Step: 46040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:14:34,876-Speed 5540.23 samples/sec   Loss 6.0896   LearningRate 0.0297   Epoch: 9   Global Step: 46050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:14:36,718-Speed 5564.40 samples/sec   Loss 6.1052   LearningRate 0.0297   Epoch: 9   Global Step: 46060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:14:38,555-Speed 5576.67 samples/sec   Loss 6.1408   LearningRate 0.0297   Epoch: 9   Global Step: 46070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:14:40,433-Speed 5454.45 samples/sec   Loss 6.1048   LearningRate 0.0296   Epoch: 9   Global Step: 46080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:14:42,263-Speed 5598.56 samples/sec   Loss 6.1490   LearningRate 0.0296   Epoch: 9   Global Step: 46090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:14:44,096-Speed 5590.08 samples/sec   Loss 6.1750   LearningRate 0.0296   Epoch: 9   Global Step: 46100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:14:45,926-Speed 5597.45 samples/sec   Loss 6.1409   LearningRate 0.0296   Epoch: 9   Global Step: 46110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:14:47,758-Speed 5592.00 samples/sec   Loss 6.1731   LearningRate 0.0296   Epoch: 9   Global Step: 46120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:14:49,626-Speed 5482.03 samples/sec   Loss 6.1441   LearningRate 0.0296   Epoch: 9   Global Step: 46130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 13:14:51,461-Speed 5586.09 samples/sec   Loss 6.1539   LearningRate 0.0296   Epoch: 9   Global Step: 46140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:14:53,318-Speed 5515.39 samples/sec   Loss 5.9725   LearningRate 0.0296   Epoch: 9   Global Step: 46150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:14:55,179-Speed 5505.75 samples/sec   Loss 6.2079   LearningRate 0.0296   Epoch: 9   Global Step: 46160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 13:14:57,009-Speed 5600.37 samples/sec   Loss 6.0566   LearningRate 0.0295   Epoch: 9   Global Step: 46170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:14:58,881-Speed 5471.19 samples/sec   Loss 6.0996   LearningRate 0.0295   Epoch: 9   Global Step: 46180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:00,728-Speed 5547.94 samples/sec   Loss 6.2194   LearningRate 0.0295   Epoch: 9   Global Step: 46190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:02,606-Speed 5455.27 samples/sec   Loss 6.1373   LearningRate 0.0295   Epoch: 9   Global Step: 46200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:04,448-Speed 5561.79 samples/sec   Loss 6.1146   LearningRate 0.0295   Epoch: 9   Global Step: 46210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:06,301-Speed 5531.94 samples/sec   Loss 6.0939   LearningRate 0.0295   Epoch: 9   Global Step: 46220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:08,162-Speed 5505.48 samples/sec   Loss 6.0765   LearningRate 0.0295   Epoch: 9   Global Step: 46230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:10,000-Speed 5573.63 samples/sec   Loss 6.1983   LearningRate 0.0295   Epoch: 9   Global Step: 46240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:15:11,845-Speed 5553.14 samples/sec   Loss 6.1848   LearningRate 0.0295   Epoch: 9   Global Step: 46250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:13,697-Speed 5533.51 samples/sec   Loss 6.2287   LearningRate 0.0295   Epoch: 9   Global Step: 46260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:15,547-Speed 5537.30 samples/sec   Loss 6.1389   LearningRate 0.0294   Epoch: 9   Global Step: 46270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:17,408-Speed 5506.11 samples/sec   Loss 6.1225   LearningRate 0.0294   Epoch: 9   Global Step: 46280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:19,249-Speed 5563.86 samples/sec   Loss 6.2906   LearningRate 0.0294   Epoch: 9   Global Step: 46290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:21,133-Speed 5440.80 samples/sec   Loss 6.2963   LearningRate 0.0294   Epoch: 9   Global Step: 46300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:23,010-Speed 5456.53 samples/sec   Loss 6.1669   LearningRate 0.0294   Epoch: 9   Global Step: 46310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:24,867-Speed 5516.72 samples/sec   Loss 6.2386   LearningRate 0.0294   Epoch: 9   Global Step: 46320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:26,735-Speed 5487.47 samples/sec   Loss 6.0893   LearningRate 0.0294   Epoch: 9   Global Step: 46330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:15:28,588-Speed 5529.49 samples/sec   Loss 6.1687   LearningRate 0.0294   Epoch: 9   Global Step: 46340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:15:30,449-Speed 5506.01 samples/sec   Loss 6.2203   LearningRate 0.0294   Epoch: 9   Global Step: 46350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:15:32,289-Speed 5569.20 samples/sec   Loss 6.1720   LearningRate 0.0293   Epoch: 9   Global Step: 46360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:15:34,161-Speed 5473.73 samples/sec   Loss 6.2921   LearningRate 0.0293   Epoch: 9   Global Step: 46370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:15:36,044-Speed 5441.72 samples/sec   Loss 6.2676   LearningRate 0.0293   Epoch: 9   Global Step: 46380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:15:37,901-Speed 5517.03 samples/sec   Loss 6.3869   LearningRate 0.0293   Epoch: 9   Global Step: 46390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:15:39,779-Speed 5453.48 samples/sec   Loss 6.0655   LearningRate 0.0293   Epoch: 9   Global Step: 46400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:15:41,664-Speed 5437.56 samples/sec   Loss 6.0280   LearningRate 0.0293   Epoch: 9   Global Step: 46410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:15:43,509-Speed 5551.30 samples/sec   Loss 6.0627   LearningRate 0.0293   Epoch: 9   Global Step: 46420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:15:45,370-Speed 5504.47 samples/sec   Loss 6.1726   LearningRate 0.0293   Epoch: 9   Global Step: 46430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:47,219-Speed 5543.21 samples/sec   Loss 6.1605   LearningRate 0.0293   Epoch: 9   Global Step: 46440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:49,113-Speed 5409.52 samples/sec   Loss 6.1321   LearningRate 0.0292   Epoch: 9   Global Step: 46450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:50,985-Speed 5473.53 samples/sec   Loss 6.2879   LearningRate 0.0292   Epoch: 9   Global Step: 46460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:52,834-Speed 5538.75 samples/sec   Loss 6.1950   LearningRate 0.0292   Epoch: 9   Global Step: 46470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:54,678-Speed 5558.09 samples/sec   Loss 6.1410   LearningRate 0.0292   Epoch: 9   Global Step: 46480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:56,528-Speed 5538.62 samples/sec   Loss 6.2376   LearningRate 0.0292   Epoch: 9   Global Step: 46490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:15:58,394-Speed 5490.68 samples/sec   Loss 6.2512   LearningRate 0.0292   Epoch: 9   Global Step: 46500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:00,233-Speed 5569.26 samples/sec   Loss 6.2704   LearningRate 0.0292   Epoch: 9   Global Step: 46510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:02,082-Speed 5542.94 samples/sec   Loss 6.2874   LearningRate 0.0292   Epoch: 9   Global Step: 46520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:03,932-Speed 5539.16 samples/sec   Loss 6.2731   LearningRate 0.0292   Epoch: 9   Global Step: 46530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:16:05,774-Speed 5561.51 samples/sec   Loss 6.1908   LearningRate 0.0292   Epoch: 9   Global Step: 46540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:16:07,636-Speed 5501.43 samples/sec   Loss 6.2548   LearningRate 0.0291   Epoch: 9   Global Step: 46550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:09,472-Speed 5581.58 samples/sec   Loss 6.2923   LearningRate 0.0291   Epoch: 9   Global Step: 46560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:11,381-Speed 5367.61 samples/sec   Loss 6.3132   LearningRate 0.0291   Epoch: 9   Global Step: 46570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:13,268-Speed 5427.44 samples/sec   Loss 6.1435   LearningRate 0.0291   Epoch: 9   Global Step: 46580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:15,108-Speed 5569.11 samples/sec   Loss 6.3028   LearningRate 0.0291   Epoch: 9   Global Step: 46590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:16,963-Speed 5525.05 samples/sec   Loss 6.1320   LearningRate 0.0291   Epoch: 9   Global Step: 46600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:18,816-Speed 5528.06 samples/sec   Loss 6.3842   LearningRate 0.0291   Epoch: 9   Global Step: 46610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:20,653-Speed 5578.93 samples/sec   Loss 6.1737   LearningRate 0.0291   Epoch: 9   Global Step: 46620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:22,524-Speed 5474.19 samples/sec   Loss 6.2855   LearningRate 0.0291   Epoch: 9   Global Step: 46630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:24,366-Speed 5563.58 samples/sec   Loss 6.1667   LearningRate 0.0290   Epoch: 9   Global Step: 46640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:26,237-Speed 5476.45 samples/sec   Loss 6.4839   LearningRate 0.0290   Epoch: 9   Global Step: 46650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:16:28,105-Speed 5484.34 samples/sec   Loss 6.3499   LearningRate 0.0290   Epoch: 9   Global Step: 46660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:29,971-Speed 5490.40 samples/sec   Loss 6.2568   LearningRate 0.0290   Epoch: 9   Global Step: 46670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:31,845-Speed 5467.24 samples/sec   Loss 6.3582   LearningRate 0.0290   Epoch: 9   Global Step: 46680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:33,724-Speed 5453.46 samples/sec   Loss 6.1450   LearningRate 0.0290   Epoch: 9   Global Step: 46690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:35,570-Speed 5549.94 samples/sec   Loss 6.2888   LearningRate 0.0290   Epoch: 9   Global Step: 46700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:37,418-Speed 5546.48 samples/sec   Loss 6.4390   LearningRate 0.0290   Epoch: 9   Global Step: 46710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:16:39,286-Speed 5484.56 samples/sec   Loss 6.3369   LearningRate 0.0290   Epoch: 9   Global Step: 46720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:16:41,162-Speed 5461.70 samples/sec   Loss 6.2606   LearningRate 0.0290   Epoch: 9   Global Step: 46730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:16:43,035-Speed 5470.71 samples/sec   Loss 6.2705   LearningRate 0.0289   Epoch: 9   Global Step: 46740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:16:44,895-Speed 5505.68 samples/sec   Loss 6.2552   LearningRate 0.0289   Epoch: 9   Global Step: 46750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:16:46,730-Speed 5587.10 samples/sec   Loss 6.2991   LearningRate 0.0289   Epoch: 9   Global Step: 46760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:16:48,592-Speed 5501.33 samples/sec   Loss 6.2202   LearningRate 0.0289   Epoch: 9   Global Step: 46770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:16:50,427-Speed 5583.71 samples/sec   Loss 6.3746   LearningRate 0.0289   Epoch: 9   Global Step: 46780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:16:52,273-Speed 5549.63 samples/sec   Loss 6.3498   LearningRate 0.0289   Epoch: 9   Global Step: 46790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:16:54,123-Speed 5538.19 samples/sec   Loss 6.2634   LearningRate 0.0289   Epoch: 9   Global Step: 46800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:16:55,969-Speed 5551.59 samples/sec   Loss 6.1213   LearningRate 0.0289   Epoch: 9   Global Step: 46810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:57,838-Speed 5482.58 samples/sec   Loss 6.1571   LearningRate 0.0289   Epoch: 9   Global Step: 46820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:16:59,680-Speed 5562.56 samples/sec   Loss 6.2028   LearningRate 0.0288   Epoch: 9   Global Step: 46830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:01,537-Speed 5519.25 samples/sec   Loss 6.1327   LearningRate 0.0288   Epoch: 9   Global Step: 46840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:03,389-Speed 5531.82 samples/sec   Loss 6.2885   LearningRate 0.0288   Epoch: 9   Global Step: 46850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:05,251-Speed 5503.58 samples/sec   Loss 6.4093   LearningRate 0.0288   Epoch: 9   Global Step: 46860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:07,088-Speed 5577.88 samples/sec   Loss 6.3139   LearningRate 0.0288   Epoch: 9   Global Step: 46870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:08,964-Speed 5460.30 samples/sec   Loss 6.3517   LearningRate 0.0288   Epoch: 9   Global Step: 46880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:10,810-Speed 5552.17 samples/sec   Loss 6.1187   LearningRate 0.0288   Epoch: 9   Global Step: 46890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:12,675-Speed 5492.91 samples/sec   Loss 6.1784   LearningRate 0.0288   Epoch: 9   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:14,542-Speed 5487.55 samples/sec   Loss 6.1524   LearningRate 0.0288   Epoch: 9   Global Step: 46910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:17:16,393-Speed 5537.21 samples/sec   Loss 6.2373   LearningRate 0.0287   Epoch: 9   Global Step: 46920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:17:18,224-Speed 5593.14 samples/sec   Loss 6.0927   LearningRate 0.0287   Epoch: 9   Global Step: 46930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:20,073-Speed 5540.15 samples/sec   Loss 6.2627   LearningRate 0.0287   Epoch: 9   Global Step: 46940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:21,926-Speed 5530.69 samples/sec   Loss 6.3013   LearningRate 0.0287   Epoch: 9   Global Step: 46950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:23,779-Speed 5527.52 samples/sec   Loss 6.0969   LearningRate 0.0287   Epoch: 9   Global Step: 46960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:25,624-Speed 5553.87 samples/sec   Loss 6.2739   LearningRate 0.0287   Epoch: 9   Global Step: 46970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:27,470-Speed 5549.77 samples/sec   Loss 6.3358   LearningRate 0.0287   Epoch: 9   Global Step: 46980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:29,296-Speed 5611.77 samples/sec   Loss 6.2852   LearningRate 0.0287   Epoch: 9   Global Step: 46990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:17:31,139-Speed 5559.27 samples/sec   Loss 6.3263   LearningRate 0.0287   Epoch: 9   Global Step: 47000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:17:32,979-Speed 5565.66 samples/sec   Loss 6.2760   LearningRate 0.0287   Epoch: 9   Global Step: 47010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:17:34,826-Speed 5548.42 samples/sec   Loss 6.3749   LearningRate 0.0286   Epoch: 9   Global Step: 47020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:17:36,660-Speed 5585.72 samples/sec   Loss 6.0945   LearningRate 0.0286   Epoch: 9   Global Step: 47030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:17:38,521-Speed 5505.11 samples/sec   Loss 6.1838   LearningRate 0.0286   Epoch: 9   Global Step: 47040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:17:40,376-Speed 5524.75 samples/sec   Loss 6.1945   LearningRate 0.0286   Epoch: 9   Global Step: 47050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:17:42,223-Speed 5546.30 samples/sec   Loss 6.3514   LearningRate 0.0286   Epoch: 9   Global Step: 47060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:17:44,062-Speed 5569.90 samples/sec   Loss 6.3550   LearningRate 0.0286   Epoch: 9   Global Step: 47070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:17:45,903-Speed 5567.10 samples/sec   Loss 6.5714   LearningRate 0.0286   Epoch: 9   Global Step: 47080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:17:47,737-Speed 5585.23 samples/sec   Loss 6.4093   LearningRate 0.0286   Epoch: 9   Global Step: 47090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:49,574-Speed 5576.98 samples/sec   Loss 6.3280   LearningRate 0.0286   Epoch: 9   Global Step: 47100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:51,433-Speed 5511.30 samples/sec   Loss 6.1451   LearningRate 0.0285   Epoch: 9   Global Step: 47110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:53,291-Speed 5514.29 samples/sec   Loss 6.4699   LearningRate 0.0285   Epoch: 9   Global Step: 47120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:55,141-Speed 5537.11 samples/sec   Loss 6.2114   LearningRate 0.0285   Epoch: 9   Global Step: 47130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:56,998-Speed 5518.77 samples/sec   Loss 6.1386   LearningRate 0.0285   Epoch: 9   Global Step: 47140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:17:58,871-Speed 5470.01 samples/sec   Loss 6.3206   LearningRate 0.0285   Epoch: 9   Global Step: 47150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:00,709-Speed 5575.24 samples/sec   Loss 6.2497   LearningRate 0.0285   Epoch: 9   Global Step: 47160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:02,563-Speed 5527.11 samples/sec   Loss 6.2531   LearningRate 0.0285   Epoch: 9   Global Step: 47170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:04,413-Speed 5537.52 samples/sec   Loss 6.2876   LearningRate 0.0285   Epoch: 9   Global Step: 47180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:06,271-Speed 5516.68 samples/sec   Loss 6.4015   LearningRate 0.0285   Epoch: 9   Global Step: 47190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:08,132-Speed 5506.02 samples/sec   Loss 6.3096   LearningRate 0.0285   Epoch: 9   Global Step: 47200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:09,983-Speed 5532.34 samples/sec   Loss 6.2978   LearningRate 0.0284   Epoch: 9   Global Step: 47210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:11,872-Speed 5424.47 samples/sec   Loss 6.2505   LearningRate 0.0284   Epoch: 9   Global Step: 47220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:13,721-Speed 5542.71 samples/sec   Loss 6.2320   LearningRate 0.0284   Epoch: 9   Global Step: 47230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:15,606-Speed 5436.53 samples/sec   Loss 6.2116   LearningRate 0.0284   Epoch: 9   Global Step: 47240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:17,460-Speed 5525.41 samples/sec   Loss 6.3202   LearningRate 0.0284   Epoch: 9   Global Step: 47250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:19,314-Speed 5526.07 samples/sec   Loss 6.2478   LearningRate 0.0284   Epoch: 9   Global Step: 47260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:21,149-Speed 5585.01 samples/sec   Loss 6.1982   LearningRate 0.0284   Epoch: 9   Global Step: 47270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:23,007-Speed 5515.05 samples/sec   Loss 6.2494   LearningRate 0.0284   Epoch: 9   Global Step: 47280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:24,866-Speed 5509.12 samples/sec   Loss 6.2352   LearningRate 0.0284   Epoch: 9   Global Step: 47290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:18:26,813-Speed 5265.27 samples/sec   Loss 6.1452   LearningRate 0.0283   Epoch: 9   Global Step: 47300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:18:28,661-Speed 5545.63 samples/sec   Loss 6.3719   LearningRate 0.0283   Epoch: 9   Global Step: 47310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:18:30,513-Speed 5531.58 samples/sec   Loss 6.2962   LearningRate 0.0283   Epoch: 9   Global Step: 47320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:18:32,349-Speed 5579.33 samples/sec   Loss 6.2878   LearningRate 0.0283   Epoch: 9   Global Step: 47330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:18:34,212-Speed 5500.90 samples/sec   Loss 6.1584   LearningRate 0.0283   Epoch: 9   Global Step: 47340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:18:36,052-Speed 5567.66 samples/sec   Loss 6.3565   LearningRate 0.0283   Epoch: 9   Global Step: 47350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:18:37,912-Speed 5506.88 samples/sec   Loss 6.3299   LearningRate 0.0283   Epoch: 9   Global Step: 47360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:18:39,776-Speed 5498.53 samples/sec   Loss 6.2753   LearningRate 0.0283   Epoch: 9   Global Step: 47370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:18:41,632-Speed 5520.55 samples/sec   Loss 6.1861   LearningRate 0.0283   Epoch: 9   Global Step: 47380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:18:43,492-Speed 5508.04 samples/sec   Loss 6.2940   LearningRate 0.0283   Epoch: 9   Global Step: 47390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:18:45,347-Speed 5524.25 samples/sec   Loss 6.3891   LearningRate 0.0282   Epoch: 9   Global Step: 47400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:18:47,189-Speed 5562.23 samples/sec   Loss 6.2796   LearningRate 0.0282   Epoch: 9   Global Step: 47410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:49,049-Speed 5508.11 samples/sec   Loss 6.2838   LearningRate 0.0282   Epoch: 9   Global Step: 47420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:50,890-Speed 5563.99 samples/sec   Loss 6.1760   LearningRate 0.0282   Epoch: 9   Global Step: 47430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:52,727-Speed 5578.78 samples/sec   Loss 6.3433   LearningRate 0.0282   Epoch: 9   Global Step: 47440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:54,573-Speed 5549.62 samples/sec   Loss 6.4229   LearningRate 0.0282   Epoch: 9   Global Step: 47450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:56,414-Speed 5564.00 samples/sec   Loss 6.2531   LearningRate 0.0282   Epoch: 9   Global Step: 47460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:18:58,284-Speed 5480.57 samples/sec   Loss 6.4184   LearningRate 0.0282   Epoch: 9   Global Step: 47470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:19:00,118-Speed 5583.64 samples/sec   Loss 6.2171   LearningRate 0.0282   Epoch: 9   Global Step: 47480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:19:01,968-Speed 5539.17 samples/sec   Loss 6.2758   LearningRate 0.0281   Epoch: 9   Global Step: 47490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:19:03,838-Speed 5480.44 samples/sec   Loss 6.3353   LearningRate 0.0281   Epoch: 9   Global Step: 47500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:19:05,682-Speed 5554.60 samples/sec   Loss 6.2659   LearningRate 0.0281   Epoch: 9   Global Step: 47510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:19:07,553-Speed 5477.39 samples/sec   Loss 6.2900   LearningRate 0.0281   Epoch: 9   Global Step: 47520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:19:09,391-Speed 5573.20 samples/sec   Loss 6.2567   LearningRate 0.0281   Epoch: 9   Global Step: 47530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:11,253-Speed 5504.65 samples/sec   Loss 6.3412   LearningRate 0.0281   Epoch: 9   Global Step: 47540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:13,126-Speed 5467.50 samples/sec   Loss 6.3575   LearningRate 0.0281   Epoch: 9   Global Step: 47550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:14,978-Speed 5534.23 samples/sec   Loss 6.3419   LearningRate 0.0281   Epoch: 9   Global Step: 47560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:16,838-Speed 5507.32 samples/sec   Loss 6.2541   LearningRate 0.0281   Epoch: 9   Global Step: 47570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:18,673-Speed 5583.23 samples/sec   Loss 6.2452   LearningRate 0.0281   Epoch: 9   Global Step: 47580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:20,553-Speed 5449.77 samples/sec   Loss 6.1854   LearningRate 0.0280   Epoch: 9   Global Step: 47590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:22,393-Speed 5570.21 samples/sec   Loss 6.2484   LearningRate 0.0280   Epoch: 9   Global Step: 47600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:24,270-Speed 5456.78 samples/sec   Loss 6.4035   LearningRate 0.0280   Epoch: 9   Global Step: 47610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:26,127-Speed 5518.99 samples/sec   Loss 6.3538   LearningRate 0.0280   Epoch: 9   Global Step: 47620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:27,991-Speed 5495.07 samples/sec   Loss 6.4206   LearningRate 0.0280   Epoch: 9   Global Step: 47630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:29,838-Speed 5546.71 samples/sec   Loss 6.3680   LearningRate 0.0280   Epoch: 9   Global Step: 47640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:31,711-Speed 5471.19 samples/sec   Loss 6.3053   LearningRate 0.0280   Epoch: 9   Global Step: 47650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:33,556-Speed 5551.42 samples/sec   Loss 6.2817   LearningRate 0.0280   Epoch: 9   Global Step: 47660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:35,397-Speed 5564.86 samples/sec   Loss 6.1364   LearningRate 0.0280   Epoch: 9   Global Step: 47670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:37,265-Speed 5488.58 samples/sec   Loss 6.2495   LearningRate 0.0279   Epoch: 9   Global Step: 47680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:39,140-Speed 5463.86 samples/sec   Loss 6.2782   LearningRate 0.0279   Epoch: 9   Global Step: 47690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:40,988-Speed 5546.22 samples/sec   Loss 6.3271   LearningRate 0.0279   Epoch: 9   Global Step: 47700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:42,835-Speed 5544.31 samples/sec   Loss 6.3325   LearningRate 0.0279   Epoch: 9   Global Step: 47710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:44,699-Speed 5496.02 samples/sec   Loss 6.2763   LearningRate 0.0279   Epoch: 9   Global Step: 47720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:46,535-Speed 5581.58 samples/sec   Loss 6.3287   LearningRate 0.0279   Epoch: 9   Global Step: 47730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:19:48,376-Speed 5566.89 samples/sec   Loss 6.4357   LearningRate 0.0279   Epoch: 9   Global Step: 47740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:19:50,218-Speed 5560.64 samples/sec   Loss 6.2609   LearningRate 0.0279   Epoch: 9   Global Step: 47750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:19:52,058-Speed 5567.22 samples/sec   Loss 6.2442   LearningRate 0.0279   Epoch: 9   Global Step: 47760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:19:53,903-Speed 5552.04 samples/sec   Loss 6.1306   LearningRate 0.0279   Epoch: 9   Global Step: 47770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:55,743-Speed 5567.96 samples/sec   Loss 6.2810   LearningRate 0.0278   Epoch: 9   Global Step: 47780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:57,585-Speed 5563.15 samples/sec   Loss 6.2307   LearningRate 0.0278   Epoch: 9   Global Step: 47790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:19:59,437-Speed 5534.32 samples/sec   Loss 6.3213   LearningRate 0.0278   Epoch: 9   Global Step: 47800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:01,306-Speed 5480.64 samples/sec   Loss 6.2876   LearningRate 0.0278   Epoch: 9   Global Step: 47810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:03,147-Speed 5568.15 samples/sec   Loss 6.3618   LearningRate 0.0278   Epoch: 9   Global Step: 47820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:04,997-Speed 5539.40 samples/sec   Loss 6.1041   LearningRate 0.0278   Epoch: 9   Global Step: 47830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:06,859-Speed 5504.67 samples/sec   Loss 6.2542   LearningRate 0.0278   Epoch: 9   Global Step: 47840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:08,720-Speed 5503.50 samples/sec   Loss 6.3757   LearningRate 0.0278   Epoch: 9   Global Step: 47850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:10,587-Speed 5487.20 samples/sec   Loss 6.2078   LearningRate 0.0278   Epoch: 9   Global Step: 47860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:12,455-Speed 5488.01 samples/sec   Loss 6.3619   LearningRate 0.0278   Epoch: 9   Global Step: 47870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:14,299-Speed 5555.60 samples/sec   Loss 6.2414   LearningRate 0.0277   Epoch: 9   Global Step: 47880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:16,163-Speed 5498.27 samples/sec   Loss 6.3277   LearningRate 0.0277   Epoch: 9   Global Step: 47890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:18,007-Speed 5555.75 samples/sec   Loss 6.2398   LearningRate 0.0277   Epoch: 9   Global Step: 47900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:19,840-Speed 5588.11 samples/sec   Loss 6.1585   LearningRate 0.0277   Epoch: 9   Global Step: 47910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:21,722-Speed 5446.02 samples/sec   Loss 6.2456   LearningRate 0.0277   Epoch: 9   Global Step: 47920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:23,582-Speed 5508.58 samples/sec   Loss 6.4065   LearningRate 0.0277   Epoch: 9   Global Step: 47930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:25,431-Speed 5542.51 samples/sec   Loss 6.2701   LearningRate 0.0277   Epoch: 9   Global Step: 47940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:27,269-Speed 5574.87 samples/sec   Loss 6.2398   LearningRate 0.0277   Epoch: 9   Global Step: 47950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:29,139-Speed 5477.98 samples/sec   Loss 6.4434   LearningRate 0.0277   Epoch: 9   Global Step: 47960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:20:31,008-Speed 5484.69 samples/sec   Loss 6.1468   LearningRate 0.0276   Epoch: 9   Global Step: 47970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:20:32,854-Speed 5551.16 samples/sec   Loss 6.3850   LearningRate 0.0276   Epoch: 9   Global Step: 47980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:20:34,720-Speed 5489.10 samples/sec   Loss 6.3673   LearningRate 0.0276   Epoch: 9   Global Step: 47990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:20:36,566-Speed 5550.26 samples/sec   Loss 6.3474   LearningRate 0.0276   Epoch: 9   Global Step: 48000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:21:03,911-[lfw][48000]XNorm: 20.791561
Training: 2022-04-11 13:21:03,912-[lfw][48000]Accuracy-Flip: 0.99783+-0.00236
Training: 2022-04-11 13:21:03,912-[lfw][48000]Accuracy-Highest: 0.99783
Training: 2022-04-11 13:21:35,489-[cfp_fp][48000]XNorm: 18.188981
Training: 2022-04-11 13:21:35,490-[cfp_fp][48000]Accuracy-Flip: 0.96743+-0.00787
Training: 2022-04-11 13:21:35,491-[cfp_fp][48000]Accuracy-Highest: 0.96771
Training: 2022-04-11 13:22:02,360-[agedb_30][48000]XNorm: 20.354975
Training: 2022-04-11 13:22:02,361-[agedb_30][48000]Accuracy-Flip: 0.97817+-0.00794
Training: 2022-04-11 13:22:02,361-[agedb_30][48000]Accuracy-Highest: 0.97817
Training: 2022-04-11 13:22:04,258-Speed 116.77 samples/sec   Loss 6.4198   LearningRate 0.0276   Epoch: 9   Global Step: 48010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:06,090-Speed 5591.81 samples/sec   Loss 6.1424   LearningRate 0.0276   Epoch: 9   Global Step: 48020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:07,924-Speed 5587.59 samples/sec   Loss 6.2647   LearningRate 0.0276   Epoch: 9   Global Step: 48030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:09,755-Speed 5593.67 samples/sec   Loss 6.3502   LearningRate 0.0276   Epoch: 9   Global Step: 48040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:11,608-Speed 5529.50 samples/sec   Loss 6.3039   LearningRate 0.0276   Epoch: 9   Global Step: 48050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:13,438-Speed 5602.42 samples/sec   Loss 6.3129   LearningRate 0.0276   Epoch: 9   Global Step: 48060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:15,268-Speed 5597.59 samples/sec   Loss 6.0944   LearningRate 0.0275   Epoch: 9   Global Step: 48070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:22:17,092-Speed 5614.09 samples/sec   Loss 6.2919   LearningRate 0.0275   Epoch: 9   Global Step: 48080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:18,972-Speed 5450.31 samples/sec   Loss 6.3901   LearningRate 0.0275   Epoch: 9   Global Step: 48090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:20,803-Speed 5594.91 samples/sec   Loss 6.3005   LearningRate 0.0275   Epoch: 9   Global Step: 48100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:22,653-Speed 5539.10 samples/sec   Loss 6.2318   LearningRate 0.0275   Epoch: 9   Global Step: 48110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:24,507-Speed 5526.80 samples/sec   Loss 6.5105   LearningRate 0.0275   Epoch: 9   Global Step: 48120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:26,354-Speed 5547.15 samples/sec   Loss 6.1962   LearningRate 0.0275   Epoch: 9   Global Step: 48130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:28,223-Speed 5482.10 samples/sec   Loss 6.3738   LearningRate 0.0275   Epoch: 9   Global Step: 48140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:30,058-Speed 5583.42 samples/sec   Loss 6.3673   LearningRate 0.0275   Epoch: 9   Global Step: 48150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:31,908-Speed 5537.74 samples/sec   Loss 6.2401   LearningRate 0.0274   Epoch: 9   Global Step: 48160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:33,797-Speed 5424.75 samples/sec   Loss 6.2214   LearningRate 0.0274   Epoch: 9   Global Step: 48170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:35,706-Speed 5368.20 samples/sec   Loss 6.3813   LearningRate 0.0274   Epoch: 9   Global Step: 48180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:22:37,551-Speed 5553.97 samples/sec   Loss 6.3050   LearningRate 0.0274   Epoch: 9   Global Step: 48190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:39,398-Speed 5546.82 samples/sec   Loss 6.3704   LearningRate 0.0274   Epoch: 9   Global Step: 48200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:41,260-Speed 5501.86 samples/sec   Loss 6.2786   LearningRate 0.0274   Epoch: 9   Global Step: 48210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:43,105-Speed 5554.39 samples/sec   Loss 6.2584   LearningRate 0.0274   Epoch: 9   Global Step: 48220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:44,963-Speed 5516.19 samples/sec   Loss 6.0964   LearningRate 0.0274   Epoch: 9   Global Step: 48230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:46,813-Speed 5538.29 samples/sec   Loss 6.3063   LearningRate 0.0274   Epoch: 9   Global Step: 48240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:48,641-Speed 5605.22 samples/sec   Loss 6.1985   LearningRate 0.0274   Epoch: 9   Global Step: 48250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:50,489-Speed 5541.00 samples/sec   Loss 6.1287   LearningRate 0.0273   Epoch: 9   Global Step: 48260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:52,343-Speed 5528.73 samples/sec   Loss 6.1227   LearningRate 0.0273   Epoch: 9   Global Step: 48270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:54,203-Speed 5508.21 samples/sec   Loss 6.0873   LearningRate 0.0273   Epoch: 9   Global Step: 48280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:56,060-Speed 5514.87 samples/sec   Loss 6.2064   LearningRate 0.0273   Epoch: 9   Global Step: 48290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:22:57,897-Speed 5579.59 samples/sec   Loss 6.2082   LearningRate 0.0273   Epoch: 9   Global Step: 48300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:22:59,793-Speed 5402.50 samples/sec   Loss 6.4348   LearningRate 0.0273   Epoch: 9   Global Step: 48310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:01,671-Speed 5454.94 samples/sec   Loss 6.2583   LearningRate 0.0273   Epoch: 9   Global Step: 48320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:03,507-Speed 5582.46 samples/sec   Loss 6.2341   LearningRate 0.0273   Epoch: 9   Global Step: 48330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:05,366-Speed 5508.81 samples/sec   Loss 6.2205   LearningRate 0.0273   Epoch: 9   Global Step: 48340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:07,217-Speed 5537.87 samples/sec   Loss 6.3363   LearningRate 0.0273   Epoch: 9   Global Step: 48350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:09,082-Speed 5494.64 samples/sec   Loss 6.2751   LearningRate 0.0272   Epoch: 9   Global Step: 48360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:10,939-Speed 5516.77 samples/sec   Loss 6.2940   LearningRate 0.0272   Epoch: 9   Global Step: 48370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:12,856-Speed 5342.97 samples/sec   Loss 6.1495   LearningRate 0.0272   Epoch: 9   Global Step: 48380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:14,707-Speed 5535.48 samples/sec   Loss 6.1575   LearningRate 0.0272   Epoch: 9   Global Step: 48390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:16,555-Speed 5545.42 samples/sec   Loss 6.2552   LearningRate 0.0272   Epoch: 9   Global Step: 48400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:18,413-Speed 5515.25 samples/sec   Loss 6.2722   LearningRate 0.0272   Epoch: 9   Global Step: 48410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:20,276-Speed 5500.29 samples/sec   Loss 6.2453   LearningRate 0.0272   Epoch: 9   Global Step: 48420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:22,119-Speed 5559.11 samples/sec   Loss 6.0807   LearningRate 0.0272   Epoch: 9   Global Step: 48430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:23,978-Speed 5511.28 samples/sec   Loss 6.2639   LearningRate 0.0272   Epoch: 9   Global Step: 48440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:25,832-Speed 5527.46 samples/sec   Loss 6.2126   LearningRate 0.0271   Epoch: 9   Global Step: 48450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:27,690-Speed 5515.01 samples/sec   Loss 6.2956   LearningRate 0.0271   Epoch: 9   Global Step: 48460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:29,534-Speed 5554.63 samples/sec   Loss 6.2460   LearningRate 0.0271   Epoch: 9   Global Step: 48470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:31,371-Speed 5575.00 samples/sec   Loss 6.2888   LearningRate 0.0271   Epoch: 9   Global Step: 48480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:33,209-Speed 5574.34 samples/sec   Loss 6.2896   LearningRate 0.0271   Epoch: 9   Global Step: 48490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:35,070-Speed 5506.67 samples/sec   Loss 6.0858   LearningRate 0.0271   Epoch: 9   Global Step: 48500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:23:36,925-Speed 5522.26 samples/sec   Loss 6.2574   LearningRate 0.0271   Epoch: 9   Global Step: 48510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:23:38,761-Speed 5580.79 samples/sec   Loss 6.3101   LearningRate 0.0271   Epoch: 9   Global Step: 48520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:23:40,611-Speed 5537.58 samples/sec   Loss 6.3916   LearningRate 0.0271   Epoch: 9   Global Step: 48530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:23:42,489-Speed 5507.81 samples/sec   Loss 6.3484   LearningRate 0.0271   Epoch: 9   Global Step: 48540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:23:44,324-Speed 5582.01 samples/sec   Loss 6.1862   LearningRate 0.0270   Epoch: 9   Global Step: 48550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:23:46,175-Speed 5535.69 samples/sec   Loss 6.3391   LearningRate 0.0270   Epoch: 9   Global Step: 48560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:23:48,027-Speed 5531.99 samples/sec   Loss 6.2819   LearningRate 0.0270   Epoch: 9   Global Step: 48570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:23:49,940-Speed 5355.11 samples/sec   Loss 6.2802   LearningRate 0.0270   Epoch: 9   Global Step: 48580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:23:51,796-Speed 5519.37 samples/sec   Loss 6.1688   LearningRate 0.0270   Epoch: 9   Global Step: 48590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:23:53,635-Speed 5569.91 samples/sec   Loss 6.2068   LearningRate 0.0270   Epoch: 9   Global Step: 48600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:55,476-Speed 5566.85 samples/sec   Loss 6.1132   LearningRate 0.0270   Epoch: 9   Global Step: 48610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:57,305-Speed 5600.61 samples/sec   Loss 6.1556   LearningRate 0.0270   Epoch: 9   Global Step: 48620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:23:59,172-Speed 5489.75 samples/sec   Loss 6.2628   LearningRate 0.0270   Epoch: 9   Global Step: 48630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:01,005-Speed 5587.53 samples/sec   Loss 6.2761   LearningRate 0.0270   Epoch: 9   Global Step: 48640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:02,844-Speed 5572.92 samples/sec   Loss 6.2163   LearningRate 0.0269   Epoch: 9   Global Step: 48650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:04,706-Speed 5502.02 samples/sec   Loss 6.2096   LearningRate 0.0269   Epoch: 9   Global Step: 48660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:06,540-Speed 5584.38 samples/sec   Loss 6.2249   LearningRate 0.0269   Epoch: 9   Global Step: 48670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:08,379-Speed 5572.10 samples/sec   Loss 6.3005   LearningRate 0.0269   Epoch: 9   Global Step: 48680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:10,216-Speed 5578.25 samples/sec   Loss 6.1861   LearningRate 0.0269   Epoch: 9   Global Step: 48690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:12,057-Speed 5564.28 samples/sec   Loss 6.2262   LearningRate 0.0269   Epoch: 9   Global Step: 48700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:13,921-Speed 5496.40 samples/sec   Loss 6.2235   LearningRate 0.0269   Epoch: 9   Global Step: 48710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:15,760-Speed 5570.49 samples/sec   Loss 6.2422   LearningRate 0.0269   Epoch: 9   Global Step: 48720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:17,608-Speed 5543.88 samples/sec   Loss 6.2758   LearningRate 0.0269   Epoch: 9   Global Step: 48730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:19,447-Speed 5570.84 samples/sec   Loss 6.1557   LearningRate 0.0269   Epoch: 9   Global Step: 48740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:21,305-Speed 5512.11 samples/sec   Loss 6.2712   LearningRate 0.0268   Epoch: 9   Global Step: 48750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:24:23,146-Speed 5565.54 samples/sec   Loss 6.2062   LearningRate 0.0268   Epoch: 9   Global Step: 48760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:24:25,002-Speed 5520.38 samples/sec   Loss 6.2109   LearningRate 0.0268   Epoch: 9   Global Step: 48770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:24:26,865-Speed 5497.42 samples/sec   Loss 6.2622   LearningRate 0.0268   Epoch: 9   Global Step: 48780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:24:28,709-Speed 5557.86 samples/sec   Loss 6.3417   LearningRate 0.0268   Epoch: 9   Global Step: 48790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:24:30,579-Speed 5479.09 samples/sec   Loss 6.2234   LearningRate 0.0268   Epoch: 9   Global Step: 48800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:24:32,423-Speed 5558.33 samples/sec   Loss 6.2885   LearningRate 0.0268   Epoch: 9   Global Step: 48810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:24:34,274-Speed 5535.63 samples/sec   Loss 6.2244   LearningRate 0.0268   Epoch: 9   Global Step: 48820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:24:36,131-Speed 5516.83 samples/sec   Loss 6.4010   LearningRate 0.0268   Epoch: 9   Global Step: 48830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:24:37,979-Speed 5545.71 samples/sec   Loss 6.1020   LearningRate 0.0267   Epoch: 9   Global Step: 48840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:24:39,834-Speed 5523.05 samples/sec   Loss 6.2916   LearningRate 0.0267   Epoch: 9   Global Step: 48850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:41,680-Speed 5548.18 samples/sec   Loss 6.1243   LearningRate 0.0267   Epoch: 9   Global Step: 48860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:43,537-Speed 5516.58 samples/sec   Loss 6.1887   LearningRate 0.0267   Epoch: 9   Global Step: 48870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:45,395-Speed 5515.37 samples/sec   Loss 6.1185   LearningRate 0.0267   Epoch: 9   Global Step: 48880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:47,240-Speed 5553.77 samples/sec   Loss 6.2466   LearningRate 0.0267   Epoch: 9   Global Step: 48890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:49,093-Speed 5527.46 samples/sec   Loss 6.1773   LearningRate 0.0267   Epoch: 9   Global Step: 48900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:50,938-Speed 5552.04 samples/sec   Loss 6.2258   LearningRate 0.0267   Epoch: 9   Global Step: 48910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:52,781-Speed 5562.16 samples/sec   Loss 6.1451   LearningRate 0.0267   Epoch: 9   Global Step: 48920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:54,623-Speed 5558.94 samples/sec   Loss 6.2188   LearningRate 0.0267   Epoch: 9   Global Step: 48930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:56,465-Speed 5562.17 samples/sec   Loss 6.1524   LearningRate 0.0266   Epoch: 9   Global Step: 48940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:24:58,301-Speed 5581.06 samples/sec   Loss 6.2771   LearningRate 0.0266   Epoch: 9   Global Step: 48950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:25:00,180-Speed 5452.70 samples/sec   Loss 6.2581   LearningRate 0.0266   Epoch: 9   Global Step: 48960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:25:02,014-Speed 5585.03 samples/sec   Loss 6.3256   LearningRate 0.0266   Epoch: 9   Global Step: 48970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:03,872-Speed 5514.21 samples/sec   Loss 6.2565   LearningRate 0.0266   Epoch: 9   Global Step: 48980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:05,706-Speed 5584.22 samples/sec   Loss 6.2389   LearningRate 0.0266   Epoch: 9   Global Step: 48990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:07,574-Speed 5486.05 samples/sec   Loss 6.2105   LearningRate 0.0266   Epoch: 9   Global Step: 49000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:09,420-Speed 5547.38 samples/sec   Loss 6.2348   LearningRate 0.0266   Epoch: 9   Global Step: 49010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:11,283-Speed 5500.30 samples/sec   Loss 6.3635   LearningRate 0.0266   Epoch: 9   Global Step: 49020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:13,135-Speed 5532.49 samples/sec   Loss 6.2326   LearningRate 0.0266   Epoch: 9   Global Step: 49030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:15,013-Speed 5454.12 samples/sec   Loss 6.1325   LearningRate 0.0265   Epoch: 9   Global Step: 49040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:16,858-Speed 5552.47 samples/sec   Loss 6.1365   LearningRate 0.0265   Epoch: 9   Global Step: 49050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:18,725-Speed 5488.24 samples/sec   Loss 6.1202   LearningRate 0.0265   Epoch: 9   Global Step: 49060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:20,569-Speed 5556.21 samples/sec   Loss 6.3010   LearningRate 0.0265   Epoch: 9   Global Step: 49070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:25:22,415-Speed 5551.13 samples/sec   Loss 6.3700   LearningRate 0.0265   Epoch: 9   Global Step: 49080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:25:24,257-Speed 5560.73 samples/sec   Loss 6.1229   LearningRate 0.0265   Epoch: 9   Global Step: 49090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:26,129-Speed 5472.26 samples/sec   Loss 6.2250   LearningRate 0.0265   Epoch: 9   Global Step: 49100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:27,973-Speed 5559.54 samples/sec   Loss 6.2774   LearningRate 0.0265   Epoch: 9   Global Step: 49110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:29,837-Speed 5495.18 samples/sec   Loss 6.1934   LearningRate 0.0265   Epoch: 9   Global Step: 49120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:31,686-Speed 5539.55 samples/sec   Loss 6.1787   LearningRate 0.0265   Epoch: 9   Global Step: 49130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:33,547-Speed 5506.56 samples/sec   Loss 6.0925   LearningRate 0.0264   Epoch: 9   Global Step: 49140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:35,387-Speed 5567.26 samples/sec   Loss 6.2963   LearningRate 0.0264   Epoch: 9   Global Step: 49150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:37,254-Speed 5488.59 samples/sec   Loss 6.1041   LearningRate 0.0264   Epoch: 9   Global Step: 49160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:39,110-Speed 5520.04 samples/sec   Loss 6.3502   LearningRate 0.0264   Epoch: 9   Global Step: 49170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:40,954-Speed 5557.66 samples/sec   Loss 6.2156   LearningRate 0.0264   Epoch: 9   Global Step: 49180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:42,780-Speed 5612.02 samples/sec   Loss 6.1611   LearningRate 0.0264   Epoch: 9   Global Step: 49190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:44,656-Speed 5460.95 samples/sec   Loss 6.1695   LearningRate 0.0264   Epoch: 9   Global Step: 49200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:46,494-Speed 5571.48 samples/sec   Loss 6.2536   LearningRate 0.0264   Epoch: 9   Global Step: 49210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:48,351-Speed 5517.59 samples/sec   Loss 6.2568   LearningRate 0.0264   Epoch: 9   Global Step: 49220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:50,200-Speed 5540.41 samples/sec   Loss 6.2441   LearningRate 0.0264   Epoch: 9   Global Step: 49230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:52,050-Speed 5538.57 samples/sec   Loss 6.1039   LearningRate 0.0263   Epoch: 9   Global Step: 49240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:53,909-Speed 5512.34 samples/sec   Loss 6.0913   LearningRate 0.0263   Epoch: 9   Global Step: 49250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:55,741-Speed 5590.30 samples/sec   Loss 6.1862   LearningRate 0.0263   Epoch: 9   Global Step: 49260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:57,580-Speed 5572.49 samples/sec   Loss 6.2040   LearningRate 0.0263   Epoch: 9   Global Step: 49270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:25:59,463-Speed 5442.31 samples/sec   Loss 6.1277   LearningRate 0.0263   Epoch: 9   Global Step: 49280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:01,299-Speed 5579.29 samples/sec   Loss 6.2341   LearningRate 0.0263   Epoch: 9   Global Step: 49290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:26:03,162-Speed 5499.08 samples/sec   Loss 6.2959   LearningRate 0.0263   Epoch: 9   Global Step: 49300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:05,023-Speed 5505.21 samples/sec   Loss 6.1880   LearningRate 0.0263   Epoch: 9   Global Step: 49310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:06,873-Speed 5537.78 samples/sec   Loss 6.2379   LearningRate 0.0263   Epoch: 9   Global Step: 49320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:08,728-Speed 5526.21 samples/sec   Loss 6.1569   LearningRate 0.0263   Epoch: 9   Global Step: 49330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:10,571-Speed 5557.62 samples/sec   Loss 6.0705   LearningRate 0.0262   Epoch: 9   Global Step: 49340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:12,435-Speed 5494.81 samples/sec   Loss 6.2581   LearningRate 0.0262   Epoch: 9   Global Step: 49350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:14,276-Speed 5568.74 samples/sec   Loss 6.0729   LearningRate 0.0262   Epoch: 9   Global Step: 49360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:16,115-Speed 5568.91 samples/sec   Loss 6.2033   LearningRate 0.0262   Epoch: 9   Global Step: 49370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:17,959-Speed 5556.59 samples/sec   Loss 6.0911   LearningRate 0.0262   Epoch: 9   Global Step: 49380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:19,802-Speed 5560.36 samples/sec   Loss 6.1857   LearningRate 0.0262   Epoch: 9   Global Step: 49390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:21,644-Speed 5560.61 samples/sec   Loss 6.2917   LearningRate 0.0262   Epoch: 9   Global Step: 49400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:23,519-Speed 5465.17 samples/sec   Loss 6.2648   LearningRate 0.0262   Epoch: 9   Global Step: 49410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:25,353-Speed 5587.69 samples/sec   Loss 6.0264   LearningRate 0.0262   Epoch: 9   Global Step: 49420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:27,229-Speed 5463.62 samples/sec   Loss 6.0703   LearningRate 0.0261   Epoch: 9   Global Step: 49430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:29,080-Speed 5537.71 samples/sec   Loss 6.2600   LearningRate 0.0261   Epoch: 9   Global Step: 49440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:30,933-Speed 5526.98 samples/sec   Loss 6.3257   LearningRate 0.0261   Epoch: 9   Global Step: 49450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:32,789-Speed 5523.39 samples/sec   Loss 6.2989   LearningRate 0.0261   Epoch: 9   Global Step: 49460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:34,623-Speed 5584.98 samples/sec   Loss 6.2654   LearningRate 0.0261   Epoch: 9   Global Step: 49470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:36,464-Speed 5566.18 samples/sec   Loss 6.2278   LearningRate 0.0261   Epoch: 9   Global Step: 49480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:38,318-Speed 5526.39 samples/sec   Loss 6.1331   LearningRate 0.0261   Epoch: 9   Global Step: 49490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:40,182-Speed 5495.61 samples/sec   Loss 6.2759   LearningRate 0.0261   Epoch: 9   Global Step: 49500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:26:42,020-Speed 5575.97 samples/sec   Loss 6.1566   LearningRate 0.0261   Epoch: 9   Global Step: 49510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:43,885-Speed 5491.02 samples/sec   Loss 6.0749   LearningRate 0.0261   Epoch: 9   Global Step: 49520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:26:45,727-Speed 5562.58 samples/sec   Loss 6.1683   LearningRate 0.0260   Epoch: 9   Global Step: 49530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:26:47,588-Speed 5507.32 samples/sec   Loss 6.2760   LearningRate 0.0260   Epoch: 9   Global Step: 49540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:26:49,479-Speed 5416.59 samples/sec   Loss 6.2282   LearningRate 0.0260   Epoch: 9   Global Step: 49550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:26:51,328-Speed 5541.03 samples/sec   Loss 6.1117   LearningRate 0.0260   Epoch: 9   Global Step: 49560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:26:53,186-Speed 5517.04 samples/sec   Loss 6.2297   LearningRate 0.0260   Epoch: 9   Global Step: 49570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:26:55,039-Speed 5527.74 samples/sec   Loss 6.2075   LearningRate 0.0260   Epoch: 9   Global Step: 49580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:26:56,912-Speed 5471.66 samples/sec   Loss 6.2019   LearningRate 0.0260   Epoch: 9   Global Step: 49590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:26:58,748-Speed 5579.61 samples/sec   Loss 6.2103   LearningRate 0.0260   Epoch: 9   Global Step: 49600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:00,617-Speed 5482.84 samples/sec   Loss 6.1856   LearningRate 0.0260   Epoch: 9   Global Step: 49610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:02,461-Speed 5553.35 samples/sec   Loss 6.2881   LearningRate 0.0260   Epoch: 9   Global Step: 49620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:04,317-Speed 5520.52 samples/sec   Loss 6.1908   LearningRate 0.0259   Epoch: 9   Global Step: 49630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:06,154-Speed 5576.79 samples/sec   Loss 6.1562   LearningRate 0.0259   Epoch: 9   Global Step: 49640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:08,003-Speed 5541.18 samples/sec   Loss 6.1722   LearningRate 0.0259   Epoch: 9   Global Step: 49650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:09,839-Speed 5582.07 samples/sec   Loss 6.1825   LearningRate 0.0259   Epoch: 9   Global Step: 49660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:11,698-Speed 5509.74 samples/sec   Loss 6.2775   LearningRate 0.0259   Epoch: 9   Global Step: 49670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:13,564-Speed 5488.98 samples/sec   Loss 6.2617   LearningRate 0.0259   Epoch: 9   Global Step: 49680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:15,407-Speed 5558.67 samples/sec   Loss 6.2725   LearningRate 0.0259   Epoch: 9   Global Step: 49690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:17,265-Speed 5514.59 samples/sec   Loss 6.2229   LearningRate 0.0259   Epoch: 9   Global Step: 49700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:19,109-Speed 5556.64 samples/sec   Loss 6.1478   LearningRate 0.0259   Epoch: 9   Global Step: 49710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:20,947-Speed 5573.22 samples/sec   Loss 6.2601   LearningRate 0.0259   Epoch: 9   Global Step: 49720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:22,878-Speed 5305.82 samples/sec   Loss 6.1845   LearningRate 0.0258   Epoch: 9   Global Step: 49730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:27:24,736-Speed 5513.67 samples/sec   Loss 6.0668   LearningRate 0.0258   Epoch: 9   Global Step: 49740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:26,586-Speed 5539.83 samples/sec   Loss 6.1637   LearningRate 0.0258   Epoch: 9   Global Step: 49750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:27:28,435-Speed 5540.26 samples/sec   Loss 6.1445   LearningRate 0.0258   Epoch: 9   Global Step: 49760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:27:30,311-Speed 5462.82 samples/sec   Loss 6.2220   LearningRate 0.0258   Epoch: 9   Global Step: 49770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:27:32,157-Speed 5549.42 samples/sec   Loss 6.2151   LearningRate 0.0258   Epoch: 9   Global Step: 49780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:27:34,040-Speed 5441.86 samples/sec   Loss 6.2478   LearningRate 0.0258   Epoch: 9   Global Step: 49790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:27:35,878-Speed 5573.47 samples/sec   Loss 6.2491   LearningRate 0.0258   Epoch: 9   Global Step: 49800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:27:37,730-Speed 5529.16 samples/sec   Loss 6.1079   LearningRate 0.0258   Epoch: 9   Global Step: 49810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:27:39,595-Speed 5495.73 samples/sec   Loss 6.0489   LearningRate 0.0258   Epoch: 9   Global Step: 49820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:27:41,430-Speed 5582.02 samples/sec   Loss 6.2511   LearningRate 0.0257   Epoch: 9   Global Step: 49830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:27:43,312-Speed 5445.33 samples/sec   Loss 6.1031   LearningRate 0.0257   Epoch: 9   Global Step: 49840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:27:45,151-Speed 5572.13 samples/sec   Loss 6.2430   LearningRate 0.0257   Epoch: 9   Global Step: 49850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:46,990-Speed 5569.05 samples/sec   Loss 6.2733   LearningRate 0.0257   Epoch: 9   Global Step: 49860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:48,875-Speed 5436.25 samples/sec   Loss 6.2958   LearningRate 0.0257   Epoch: 9   Global Step: 49870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:50,715-Speed 5566.16 samples/sec   Loss 6.2158   LearningRate 0.0257   Epoch: 9   Global Step: 49880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:52,570-Speed 5522.64 samples/sec   Loss 6.2029   LearningRate 0.0257   Epoch: 9   Global Step: 49890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:54,436-Speed 5491.18 samples/sec   Loss 6.2440   LearningRate 0.0257   Epoch: 9   Global Step: 49900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:56,276-Speed 5570.03 samples/sec   Loss 6.1851   LearningRate 0.0257   Epoch: 9   Global Step: 49910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:58,109-Speed 5589.61 samples/sec   Loss 6.1899   LearningRate 0.0257   Epoch: 9   Global Step: 49920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:27:59,967-Speed 5513.08 samples/sec   Loss 6.2295   LearningRate 0.0256   Epoch: 9   Global Step: 49930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:28:01,808-Speed 5564.97 samples/sec   Loss 6.0523   LearningRate 0.0256   Epoch: 9   Global Step: 49940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:28:03,662-Speed 5529.05 samples/sec   Loss 6.0422   LearningRate 0.0256   Epoch: 9   Global Step: 49950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:28:05,524-Speed 5501.34 samples/sec   Loss 6.1570   LearningRate 0.0256   Epoch: 9   Global Step: 49960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:28:07,365-Speed 5565.54 samples/sec   Loss 6.1818   LearningRate 0.0256   Epoch: 9   Global Step: 49970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:28:09,215-Speed 5541.08 samples/sec   Loss 6.1866   LearningRate 0.0256   Epoch: 9   Global Step: 49980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:28:11,054-Speed 5568.90 samples/sec   Loss 5.9426   LearningRate 0.0256   Epoch: 9   Global Step: 49990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:28:12,921-Speed 5489.01 samples/sec   Loss 6.1781   LearningRate 0.0256   Epoch: 9   Global Step: 50000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:28:40,212-[lfw][50000]XNorm: 23.300236
Training: 2022-04-11 13:28:40,213-[lfw][50000]Accuracy-Flip: 0.99817+-0.00189
Training: 2022-04-11 13:28:40,213-[lfw][50000]Accuracy-Highest: 0.99817
Training: 2022-04-11 13:29:11,595-[cfp_fp][50000]XNorm: 20.925915
Training: 2022-04-11 13:29:11,600-[cfp_fp][50000]Accuracy-Flip: 0.97029+-0.00825
Training: 2022-04-11 13:29:11,601-[cfp_fp][50000]Accuracy-Highest: 0.97029
Training: 2022-04-11 13:29:38,747-[agedb_30][50000]XNorm: 23.270061
Training: 2022-04-11 13:29:38,748-[agedb_30][50000]Accuracy-Flip: 0.97700+-0.00657
Training: 2022-04-11 13:29:38,748-[agedb_30][50000]Accuracy-Highest: 0.97817
Training: 2022-04-11 13:29:40,602-Speed 116.79 samples/sec   Loss 6.0598   LearningRate 0.0256   Epoch: 9   Global Step: 50010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:29:42,441-Speed 5568.36 samples/sec   Loss 6.0536   LearningRate 0.0256   Epoch: 9   Global Step: 50020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:29:44,288-Speed 5546.47 samples/sec   Loss 6.1205   LearningRate 0.0255   Epoch: 9   Global Step: 50030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:29:46,118-Speed 5599.42 samples/sec   Loss 6.1326   LearningRate 0.0255   Epoch: 9   Global Step: 50040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:29:47,968-Speed 5537.71 samples/sec   Loss 6.0668   LearningRate 0.0255   Epoch: 9   Global Step: 50050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:29:49,812-Speed 5558.90 samples/sec   Loss 6.1322   LearningRate 0.0255   Epoch: 9   Global Step: 50060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:29:51,690-Speed 5454.69 samples/sec   Loss 6.0573   LearningRate 0.0255   Epoch: 9   Global Step: 50070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:29:53,543-Speed 5530.87 samples/sec   Loss 6.1183   LearningRate 0.0255   Epoch: 9   Global Step: 50080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:29:55,394-Speed 5537.11 samples/sec   Loss 6.1477   LearningRate 0.0255   Epoch: 9   Global Step: 50090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:29:57,238-Speed 5553.49 samples/sec   Loss 5.9689   LearningRate 0.0255   Epoch: 9   Global Step: 50100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:29:59,095-Speed 5520.46 samples/sec   Loss 6.1289   LearningRate 0.0255   Epoch: 9   Global Step: 50110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:00,930-Speed 5583.24 samples/sec   Loss 6.1445   LearningRate 0.0255   Epoch: 9   Global Step: 50120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:02,779-Speed 5539.74 samples/sec   Loss 5.8781   LearningRate 0.0254   Epoch: 9   Global Step: 50130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:30:04,645-Speed 5492.77 samples/sec   Loss 6.2205   LearningRate 0.0254   Epoch: 9   Global Step: 50140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:30:06,488-Speed 5556.98 samples/sec   Loss 6.2336   LearningRate 0.0254   Epoch: 9   Global Step: 50150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:30:08,324-Speed 5578.63 samples/sec   Loss 6.2484   LearningRate 0.0254   Epoch: 9   Global Step: 50160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:30:10,164-Speed 5568.31 samples/sec   Loss 6.0911   LearningRate 0.0254   Epoch: 9   Global Step: 50170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:30:12,012-Speed 5544.12 samples/sec   Loss 6.1867   LearningRate 0.0254   Epoch: 9   Global Step: 50180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:30:13,862-Speed 5539.39 samples/sec   Loss 6.1651   LearningRate 0.0254   Epoch: 9   Global Step: 50190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:30:15,713-Speed 5535.32 samples/sec   Loss 6.3055   LearningRate 0.0254   Epoch: 9   Global Step: 50200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:30:17,568-Speed 5523.39 samples/sec   Loss 6.2510   LearningRate 0.0254   Epoch: 9   Global Step: 50210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:30:19,434-Speed 5489.32 samples/sec   Loss 6.0756   LearningRate 0.0254   Epoch: 9   Global Step: 50220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:30:21,269-Speed 5584.30 samples/sec   Loss 6.1100   LearningRate 0.0253   Epoch: 9   Global Step: 50230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:23,118-Speed 5541.07 samples/sec   Loss 6.3820   LearningRate 0.0253   Epoch: 9   Global Step: 50240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:24,953-Speed 5583.60 samples/sec   Loss 6.2381   LearningRate 0.0253   Epoch: 9   Global Step: 50250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:26,795-Speed 5561.34 samples/sec   Loss 6.1444   LearningRate 0.0253   Epoch: 9   Global Step: 50260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:28,649-Speed 5524.81 samples/sec   Loss 5.9129   LearningRate 0.0253   Epoch: 9   Global Step: 50270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:30,491-Speed 5563.85 samples/sec   Loss 6.0996   LearningRate 0.0253   Epoch: 9   Global Step: 50280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:32,348-Speed 5515.98 samples/sec   Loss 6.0883   LearningRate 0.0253   Epoch: 9   Global Step: 50290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:34,205-Speed 5518.63 samples/sec   Loss 6.2804   LearningRate 0.0253   Epoch: 9   Global Step: 50300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:36,067-Speed 5503.41 samples/sec   Loss 6.2622   LearningRate 0.0253   Epoch: 9   Global Step: 50310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:37,916-Speed 5540.55 samples/sec   Loss 6.0884   LearningRate 0.0253   Epoch: 9   Global Step: 50320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:39,772-Speed 5520.31 samples/sec   Loss 6.1825   LearningRate 0.0252   Epoch: 9   Global Step: 50330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:41,636-Speed 5494.46 samples/sec   Loss 6.0882   LearningRate 0.0252   Epoch: 9   Global Step: 50340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:43,475-Speed 5571.35 samples/sec   Loss 6.1434   LearningRate 0.0252   Epoch: 9   Global Step: 50350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:45,340-Speed 5495.30 samples/sec   Loss 6.1913   LearningRate 0.0252   Epoch: 9   Global Step: 50360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:47,174-Speed 5584.76 samples/sec   Loss 6.1601   LearningRate 0.0252   Epoch: 9   Global Step: 50370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:49,022-Speed 5546.20 samples/sec   Loss 6.0693   LearningRate 0.0252   Epoch: 9   Global Step: 50380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:50,877-Speed 5521.24 samples/sec   Loss 6.2961   LearningRate 0.0252   Epoch: 9   Global Step: 50390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:52,739-Speed 5503.63 samples/sec   Loss 6.1852   LearningRate 0.0252   Epoch: 9   Global Step: 50400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:54,590-Speed 5537.26 samples/sec   Loss 6.0505   LearningRate 0.0252   Epoch: 9   Global Step: 50410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:56,428-Speed 5572.12 samples/sec   Loss 6.1867   LearningRate 0.0252   Epoch: 9   Global Step: 50420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:30:58,286-Speed 5515.13 samples/sec   Loss 6.1112   LearningRate 0.0251   Epoch: 9   Global Step: 50430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:00,134-Speed 5545.13 samples/sec   Loss 6.2366   LearningRate 0.0251   Epoch: 9   Global Step: 50440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:01,997-Speed 5499.38 samples/sec   Loss 6.1395   LearningRate 0.0251   Epoch: 9   Global Step: 50450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:03,846-Speed 5541.69 samples/sec   Loss 6.0517   LearningRate 0.0251   Epoch: 9   Global Step: 50460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:05,687-Speed 5563.72 samples/sec   Loss 6.2294   LearningRate 0.0251   Epoch: 9   Global Step: 50470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:07,532-Speed 5553.54 samples/sec   Loss 6.0274   LearningRate 0.0251   Epoch: 9   Global Step: 50480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:09,384-Speed 5533.75 samples/sec   Loss 5.9700   LearningRate 0.0251   Epoch: 9   Global Step: 50490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:11,244-Speed 5507.49 samples/sec   Loss 6.1538   LearningRate 0.0251   Epoch: 9   Global Step: 50500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:13,105-Speed 5506.58 samples/sec   Loss 6.0692   LearningRate 0.0251   Epoch: 9   Global Step: 50510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:14,958-Speed 5529.41 samples/sec   Loss 6.1825   LearningRate 0.0251   Epoch: 9   Global Step: 50520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:16,829-Speed 5474.46 samples/sec   Loss 6.0899   LearningRate 0.0250   Epoch: 9   Global Step: 50530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:31:18,660-Speed 5595.31 samples/sec   Loss 5.9258   LearningRate 0.0250   Epoch: 9   Global Step: 50540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:20,516-Speed 5521.67 samples/sec   Loss 6.0693   LearningRate 0.0250   Epoch: 9   Global Step: 50550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:22,374-Speed 5513.83 samples/sec   Loss 6.2033   LearningRate 0.0250   Epoch: 9   Global Step: 50560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:24,297-Speed 5330.51 samples/sec   Loss 6.1424   LearningRate 0.0250   Epoch: 9   Global Step: 50570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:26,107-Speed 5660.13 samples/sec   Loss 6.1475   LearningRate 0.0250   Epoch: 9   Global Step: 50580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:37,616-Speed 889.84 samples/sec   Loss 5.1303   LearningRate 0.0250   Epoch: 10   Global Step: 50590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:39,598-Speed 5170.69 samples/sec   Loss 5.2095   LearningRate 0.0250   Epoch: 10   Global Step: 50600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:41,460-Speed 5502.36 samples/sec   Loss 5.2876   LearningRate 0.0250   Epoch: 10   Global Step: 50610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:43,310-Speed 5538.54 samples/sec   Loss 5.3400   LearningRate 0.0250   Epoch: 10   Global Step: 50620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:45,146-Speed 5579.16 samples/sec   Loss 5.1747   LearningRate 0.0250   Epoch: 10   Global Step: 50630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:47,016-Speed 5479.60 samples/sec   Loss 5.2742   LearningRate 0.0249   Epoch: 10   Global Step: 50640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:48,861-Speed 5551.52 samples/sec   Loss 5.2701   LearningRate 0.0249   Epoch: 10   Global Step: 50650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:50,730-Speed 5483.09 samples/sec   Loss 5.2850   LearningRate 0.0249   Epoch: 10   Global Step: 50660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:52,627-Speed 5399.76 samples/sec   Loss 5.3410   LearningRate 0.0249   Epoch: 10   Global Step: 50670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:54,463-Speed 5578.34 samples/sec   Loss 5.3285   LearningRate 0.0249   Epoch: 10   Global Step: 50680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:56,309-Speed 5551.99 samples/sec   Loss 5.2325   LearningRate 0.0249   Epoch: 10   Global Step: 50690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:31:58,166-Speed 5516.31 samples/sec   Loss 5.1572   LearningRate 0.0249   Epoch: 10   Global Step: 50700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:00,007-Speed 5564.93 samples/sec   Loss 5.2718   LearningRate 0.0249   Epoch: 10   Global Step: 50710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:01,916-Speed 5366.84 samples/sec   Loss 5.4409   LearningRate 0.0249   Epoch: 10   Global Step: 50720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:03,817-Speed 5387.75 samples/sec   Loss 5.3327   LearningRate 0.0249   Epoch: 10   Global Step: 50730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:05,699-Speed 5444.81 samples/sec   Loss 5.4437   LearningRate 0.0248   Epoch: 10   Global Step: 50740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:32:07,561-Speed 5503.53 samples/sec   Loss 5.4280   LearningRate 0.0248   Epoch: 10   Global Step: 50750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:09,406-Speed 5552.70 samples/sec   Loss 5.3106   LearningRate 0.0248   Epoch: 10   Global Step: 50760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:11,246-Speed 5568.62 samples/sec   Loss 5.3059   LearningRate 0.0248   Epoch: 10   Global Step: 50770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:13,114-Speed 5482.31 samples/sec   Loss 5.3312   LearningRate 0.0248   Epoch: 10   Global Step: 50780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:14,989-Speed 5466.05 samples/sec   Loss 5.4414   LearningRate 0.0248   Epoch: 10   Global Step: 50790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:16,857-Speed 5483.34 samples/sec   Loss 5.4515   LearningRate 0.0248   Epoch: 10   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:18,701-Speed 5557.82 samples/sec   Loss 5.4682   LearningRate 0.0248   Epoch: 10   Global Step: 50810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:20,542-Speed 5564.33 samples/sec   Loss 5.5753   LearningRate 0.0248   Epoch: 10   Global Step: 50820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:22,393-Speed 5535.04 samples/sec   Loss 5.4622   LearningRate 0.0248   Epoch: 10   Global Step: 50830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:24,249-Speed 5520.59 samples/sec   Loss 5.4283   LearningRate 0.0247   Epoch: 10   Global Step: 50840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:26,087-Speed 5574.83 samples/sec   Loss 5.3438   LearningRate 0.0247   Epoch: 10   Global Step: 50850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:27,979-Speed 5412.85 samples/sec   Loss 5.3519   LearningRate 0.0247   Epoch: 10   Global Step: 50860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:29,828-Speed 5542.15 samples/sec   Loss 5.3885   LearningRate 0.0247   Epoch: 10   Global Step: 50870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:31,694-Speed 5489.83 samples/sec   Loss 5.4447   LearningRate 0.0247   Epoch: 10   Global Step: 50880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:33,547-Speed 5531.10 samples/sec   Loss 5.5420   LearningRate 0.0247   Epoch: 10   Global Step: 50890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:35,414-Speed 5487.60 samples/sec   Loss 5.5057   LearningRate 0.0247   Epoch: 10   Global Step: 50900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:37,258-Speed 5558.69 samples/sec   Loss 5.3686   LearningRate 0.0247   Epoch: 10   Global Step: 50910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:39,104-Speed 5550.20 samples/sec   Loss 5.4240   LearningRate 0.0247   Epoch: 10   Global Step: 50920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:40,984-Speed 5448.37 samples/sec   Loss 5.3697   LearningRate 0.0247   Epoch: 10   Global Step: 50930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:42,844-Speed 5509.39 samples/sec   Loss 5.5471   LearningRate 0.0246   Epoch: 10   Global Step: 50940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:44,687-Speed 5558.26 samples/sec   Loss 5.4601   LearningRate 0.0246   Epoch: 10   Global Step: 50950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:32:46,536-Speed 5542.40 samples/sec   Loss 5.4826   LearningRate 0.0246   Epoch: 10   Global Step: 50960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:48,387-Speed 5534.47 samples/sec   Loss 5.3353   LearningRate 0.0246   Epoch: 10   Global Step: 50970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:50,240-Speed 5532.00 samples/sec   Loss 5.3963   LearningRate 0.0246   Epoch: 10   Global Step: 50980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:52,146-Speed 5374.87 samples/sec   Loss 5.6534   LearningRate 0.0246   Epoch: 10   Global Step: 50990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:54,002-Speed 5521.59 samples/sec   Loss 5.4745   LearningRate 0.0246   Epoch: 10   Global Step: 51000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:55,857-Speed 5524.05 samples/sec   Loss 5.7287   LearningRate 0.0246   Epoch: 10   Global Step: 51010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:57,714-Speed 5514.81 samples/sec   Loss 5.5916   LearningRate 0.0246   Epoch: 10   Global Step: 51020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:32:59,570-Speed 5522.02 samples/sec   Loss 5.6224   LearningRate 0.0246   Epoch: 10   Global Step: 51030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:01,454-Speed 5436.01 samples/sec   Loss 5.5170   LearningRate 0.0245   Epoch: 10   Global Step: 51040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:03,313-Speed 5512.56 samples/sec   Loss 5.5078   LearningRate 0.0245   Epoch: 10   Global Step: 51050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:05,146-Speed 5588.86 samples/sec   Loss 5.6445   LearningRate 0.0245   Epoch: 10   Global Step: 51060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:07,041-Speed 5406.14 samples/sec   Loss 5.3973   LearningRate 0.0245   Epoch: 10   Global Step: 51070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:08,906-Speed 5494.17 samples/sec   Loss 5.5091   LearningRate 0.0245   Epoch: 10   Global Step: 51080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:10,759-Speed 5531.33 samples/sec   Loss 5.5171   LearningRate 0.0245   Epoch: 10   Global Step: 51090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:12,636-Speed 5456.95 samples/sec   Loss 5.4912   LearningRate 0.0245   Epoch: 10   Global Step: 51100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:14,517-Speed 5447.43 samples/sec   Loss 5.3980   LearningRate 0.0245   Epoch: 10   Global Step: 51110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:16,376-Speed 5511.63 samples/sec   Loss 5.4368   LearningRate 0.0245   Epoch: 10   Global Step: 51120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:18,223-Speed 5548.31 samples/sec   Loss 5.5695   LearningRate 0.0245   Epoch: 10   Global Step: 51130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:20,089-Speed 5490.17 samples/sec   Loss 5.7616   LearningRate 0.0244   Epoch: 10   Global Step: 51140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:21,926-Speed 5577.11 samples/sec   Loss 5.4973   LearningRate 0.0244   Epoch: 10   Global Step: 51150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:23,756-Speed 5597.54 samples/sec   Loss 5.6054   LearningRate 0.0244   Epoch: 10   Global Step: 51160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:25,613-Speed 5516.63 samples/sec   Loss 5.6486   LearningRate 0.0244   Epoch: 10   Global Step: 51170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:27,484-Speed 5476.38 samples/sec   Loss 5.5849   LearningRate 0.0244   Epoch: 10   Global Step: 51180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:29,326-Speed 5562.10 samples/sec   Loss 5.7270   LearningRate 0.0244   Epoch: 10   Global Step: 51190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:31,181-Speed 5524.92 samples/sec   Loss 5.6926   LearningRate 0.0244   Epoch: 10   Global Step: 51200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:33,037-Speed 5518.31 samples/sec   Loss 5.5584   LearningRate 0.0244   Epoch: 10   Global Step: 51210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:34,896-Speed 5513.47 samples/sec   Loss 5.6899   LearningRate 0.0244   Epoch: 10   Global Step: 51220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:36,780-Speed 5438.35 samples/sec   Loss 5.6838   LearningRate 0.0244   Epoch: 10   Global Step: 51230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:38,623-Speed 5555.71 samples/sec   Loss 5.8815   LearningRate 0.0244   Epoch: 10   Global Step: 51240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:40,513-Speed 5422.05 samples/sec   Loss 5.7593   LearningRate 0.0243   Epoch: 10   Global Step: 51250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:42,359-Speed 5548.82 samples/sec   Loss 5.7256   LearningRate 0.0243   Epoch: 10   Global Step: 51260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:44,230-Speed 5479.69 samples/sec   Loss 5.6466   LearningRate 0.0243   Epoch: 10   Global Step: 51270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:46,077-Speed 5544.80 samples/sec   Loss 5.6143   LearningRate 0.0243   Epoch: 10   Global Step: 51280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:47,928-Speed 5536.78 samples/sec   Loss 5.5550   LearningRate 0.0243   Epoch: 10   Global Step: 51290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:49,793-Speed 5493.30 samples/sec   Loss 5.5994   LearningRate 0.0243   Epoch: 10   Global Step: 51300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:51,658-Speed 5495.00 samples/sec   Loss 5.5051   LearningRate 0.0243   Epoch: 10   Global Step: 51310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:53,517-Speed 5511.69 samples/sec   Loss 5.5752   LearningRate 0.0243   Epoch: 10   Global Step: 51320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:55,363-Speed 5551.97 samples/sec   Loss 5.6289   LearningRate 0.0243   Epoch: 10   Global Step: 51330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:57,201-Speed 5574.33 samples/sec   Loss 5.6971   LearningRate 0.0243   Epoch: 10   Global Step: 51340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:33:59,081-Speed 5450.41 samples/sec   Loss 5.7905   LearningRate 0.0242   Epoch: 10   Global Step: 51350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:00,923-Speed 5563.58 samples/sec   Loss 5.6365   LearningRate 0.0242   Epoch: 10   Global Step: 51360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:34:02,783-Speed 5506.44 samples/sec   Loss 5.7226   LearningRate 0.0242   Epoch: 10   Global Step: 51370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:04,629-Speed 5553.01 samples/sec   Loss 5.6789   LearningRate 0.0242   Epoch: 10   Global Step: 51380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:06,473-Speed 5555.08 samples/sec   Loss 5.7514   LearningRate 0.0242   Epoch: 10   Global Step: 51390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:08,309-Speed 5578.91 samples/sec   Loss 5.6872   LearningRate 0.0242   Epoch: 10   Global Step: 51400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:10,170-Speed 5506.78 samples/sec   Loss 5.5221   LearningRate 0.0242   Epoch: 10   Global Step: 51410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:12,023-Speed 5527.61 samples/sec   Loss 5.6558   LearningRate 0.0242   Epoch: 10   Global Step: 51420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:13,862-Speed 5570.58 samples/sec   Loss 5.7777   LearningRate 0.0242   Epoch: 10   Global Step: 51430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:15,697-Speed 5583.56 samples/sec   Loss 5.6372   LearningRate 0.0242   Epoch: 10   Global Step: 51440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:17,544-Speed 5548.16 samples/sec   Loss 5.7446   LearningRate 0.0241   Epoch: 10   Global Step: 51450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:19,400-Speed 5519.32 samples/sec   Loss 5.7353   LearningRate 0.0241   Epoch: 10   Global Step: 51460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:21,234-Speed 5585.57 samples/sec   Loss 5.8049   LearningRate 0.0241   Epoch: 10   Global Step: 51470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:23,078-Speed 5556.21 samples/sec   Loss 5.6780   LearningRate 0.0241   Epoch: 10   Global Step: 51480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:24,949-Speed 5476.93 samples/sec   Loss 5.7400   LearningRate 0.0241   Epoch: 10   Global Step: 51490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:26,784-Speed 5583.13 samples/sec   Loss 5.6257   LearningRate 0.0241   Epoch: 10   Global Step: 51500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:28,634-Speed 5538.87 samples/sec   Loss 5.7227   LearningRate 0.0241   Epoch: 10   Global Step: 51510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:30,495-Speed 5504.39 samples/sec   Loss 5.7913   LearningRate 0.0241   Epoch: 10   Global Step: 51520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:32,363-Speed 5484.28 samples/sec   Loss 5.6528   LearningRate 0.0241   Epoch: 10   Global Step: 51530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:34,207-Speed 5556.96 samples/sec   Loss 5.6754   LearningRate 0.0241   Epoch: 10   Global Step: 51540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:36,056-Speed 5541.61 samples/sec   Loss 5.7359   LearningRate 0.0241   Epoch: 10   Global Step: 51550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:37,896-Speed 5565.54 samples/sec   Loss 5.6180   LearningRate 0.0240   Epoch: 10   Global Step: 51560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:39,780-Speed 5441.18 samples/sec   Loss 5.5339   LearningRate 0.0240   Epoch: 10   Global Step: 51570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:41,623-Speed 5558.64 samples/sec   Loss 5.7339   LearningRate 0.0240   Epoch: 10   Global Step: 51580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:43,493-Speed 5477.51 samples/sec   Loss 5.7482   LearningRate 0.0240   Epoch: 10   Global Step: 51590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:34:45,338-Speed 5553.19 samples/sec   Loss 5.6550   LearningRate 0.0240   Epoch: 10   Global Step: 51600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:47,208-Speed 5479.04 samples/sec   Loss 5.5799   LearningRate 0.0240   Epoch: 10   Global Step: 51610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:49,043-Speed 5583.63 samples/sec   Loss 5.5659   LearningRate 0.0240   Epoch: 10   Global Step: 51620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:50,884-Speed 5563.28 samples/sec   Loss 5.7024   LearningRate 0.0240   Epoch: 10   Global Step: 51630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:52,725-Speed 5565.03 samples/sec   Loss 5.7303   LearningRate 0.0240   Epoch: 10   Global Step: 51640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:54,602-Speed 5459.14 samples/sec   Loss 5.6975   LearningRate 0.0240   Epoch: 10   Global Step: 51650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:56,450-Speed 5543.49 samples/sec   Loss 5.5216   LearningRate 0.0239   Epoch: 10   Global Step: 51660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:34:58,293-Speed 5561.50 samples/sec   Loss 5.6110   LearningRate 0.0239   Epoch: 10   Global Step: 51670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:35:00,146-Speed 5529.56 samples/sec   Loss 5.8459   LearningRate 0.0239   Epoch: 10   Global Step: 51680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:35:02,000-Speed 5524.85 samples/sec   Loss 5.7419   LearningRate 0.0239   Epoch: 10   Global Step: 51690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:35:03,856-Speed 5519.35 samples/sec   Loss 5.7849   LearningRate 0.0239   Epoch: 10   Global Step: 51700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:05,764-Speed 5370.58 samples/sec   Loss 5.8097   LearningRate 0.0239   Epoch: 10   Global Step: 51710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:07,617-Speed 5529.96 samples/sec   Loss 5.7243   LearningRate 0.0239   Epoch: 10   Global Step: 51720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:09,474-Speed 5518.92 samples/sec   Loss 5.7829   LearningRate 0.0239   Epoch: 10   Global Step: 51730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:11,313-Speed 5569.05 samples/sec   Loss 5.7626   LearningRate 0.0239   Epoch: 10   Global Step: 51740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:13,180-Speed 5488.10 samples/sec   Loss 5.7750   LearningRate 0.0239   Epoch: 10   Global Step: 51750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:15,047-Speed 5489.26 samples/sec   Loss 5.7927   LearningRate 0.0238   Epoch: 10   Global Step: 51760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:16,892-Speed 5552.52 samples/sec   Loss 5.7847   LearningRate 0.0238   Epoch: 10   Global Step: 51770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:18,734-Speed 5561.44 samples/sec   Loss 5.7081   LearningRate 0.0238   Epoch: 10   Global Step: 51780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:20,582-Speed 5544.70 samples/sec   Loss 5.8778   LearningRate 0.0238   Epoch: 10   Global Step: 51790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:22,442-Speed 5506.63 samples/sec   Loss 5.7472   LearningRate 0.0238   Epoch: 10   Global Step: 51800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:24,289-Speed 5548.54 samples/sec   Loss 5.7271   LearningRate 0.0238   Epoch: 10   Global Step: 51810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:26,131-Speed 5559.45 samples/sec   Loss 5.6983   LearningRate 0.0238   Epoch: 10   Global Step: 51820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:27,969-Speed 5573.85 samples/sec   Loss 5.7607   LearningRate 0.0238   Epoch: 10   Global Step: 51830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:29,815-Speed 5548.64 samples/sec   Loss 5.6435   LearningRate 0.0238   Epoch: 10   Global Step: 51840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:31,655-Speed 5568.92 samples/sec   Loss 5.6690   LearningRate 0.0238   Epoch: 10   Global Step: 51850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:33,512-Speed 5517.26 samples/sec   Loss 5.8676   LearningRate 0.0238   Epoch: 10   Global Step: 51860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:35,367-Speed 5524.27 samples/sec   Loss 5.7130   LearningRate 0.0237   Epoch: 10   Global Step: 51870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:37,228-Speed 5505.42 samples/sec   Loss 5.7868   LearningRate 0.0237   Epoch: 10   Global Step: 51880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:39,080-Speed 5530.12 samples/sec   Loss 5.7549   LearningRate 0.0237   Epoch: 10   Global Step: 51890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:40,928-Speed 5546.71 samples/sec   Loss 5.7271   LearningRate 0.0237   Epoch: 10   Global Step: 51900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:35:42,773-Speed 5550.93 samples/sec   Loss 5.7043   LearningRate 0.0237   Epoch: 10   Global Step: 51910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:44,628-Speed 5522.85 samples/sec   Loss 5.7400   LearningRate 0.0237   Epoch: 10   Global Step: 51920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:46,470-Speed 5563.32 samples/sec   Loss 5.7138   LearningRate 0.0237   Epoch: 10   Global Step: 51930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:48,315-Speed 5552.46 samples/sec   Loss 5.7402   LearningRate 0.0237   Epoch: 10   Global Step: 51940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:50,167-Speed 5530.45 samples/sec   Loss 5.8021   LearningRate 0.0237   Epoch: 10   Global Step: 51950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:52,012-Speed 5553.22 samples/sec   Loss 5.8820   LearningRate 0.0237   Epoch: 10   Global Step: 51960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:53,856-Speed 5555.75 samples/sec   Loss 5.8749   LearningRate 0.0236   Epoch: 10   Global Step: 51970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:55,707-Speed 5535.87 samples/sec   Loss 5.8083   LearningRate 0.0236   Epoch: 10   Global Step: 51980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:57,582-Speed 5463.59 samples/sec   Loss 5.6832   LearningRate 0.0236   Epoch: 10   Global Step: 51990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:35:59,427-Speed 5552.58 samples/sec   Loss 5.6450   LearningRate 0.0236   Epoch: 10   Global Step: 52000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:36:26,568-[lfw][52000]XNorm: 21.797578
Training: 2022-04-11 13:36:26,569-[lfw][52000]Accuracy-Flip: 0.99717+-0.00334
Training: 2022-04-11 13:36:26,569-[lfw][52000]Accuracy-Highest: 0.99817
Training: 2022-04-11 13:36:58,181-[cfp_fp][52000]XNorm: 18.944493
Training: 2022-04-11 13:36:58,182-[cfp_fp][52000]Accuracy-Flip: 0.96586+-0.00839
Training: 2022-04-11 13:36:58,183-[cfp_fp][52000]Accuracy-Highest: 0.97029
Training: 2022-04-11 13:37:25,405-[agedb_30][52000]XNorm: 21.262291
Training: 2022-04-11 13:37:25,405-[agedb_30][52000]Accuracy-Flip: 0.97383+-0.00695
Training: 2022-04-11 13:37:25,406-[agedb_30][52000]Accuracy-Highest: 0.97817
Training: 2022-04-11 13:37:27,260-Speed 116.59 samples/sec   Loss 5.6838   LearningRate 0.0236   Epoch: 10   Global Step: 52010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:37:29,083-Speed 5618.94 samples/sec   Loss 5.8508   LearningRate 0.0236   Epoch: 10   Global Step: 52020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:30,913-Speed 5599.62 samples/sec   Loss 5.6542   LearningRate 0.0236   Epoch: 10   Global Step: 52030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:32,744-Speed 5593.19 samples/sec   Loss 5.7509   LearningRate 0.0236   Epoch: 10   Global Step: 52040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:34,579-Speed 5582.47 samples/sec   Loss 5.6976   LearningRate 0.0236   Epoch: 10   Global Step: 52050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:36,418-Speed 5573.54 samples/sec   Loss 5.8405   LearningRate 0.0236   Epoch: 10   Global Step: 52060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:38,250-Speed 5590.22 samples/sec   Loss 5.7805   LearningRate 0.0235   Epoch: 10   Global Step: 52070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:40,105-Speed 5524.00 samples/sec   Loss 5.8060   LearningRate 0.0235   Epoch: 10   Global Step: 52080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:41,949-Speed 5559.36 samples/sec   Loss 5.7258   LearningRate 0.0235   Epoch: 10   Global Step: 52090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:43,793-Speed 5556.38 samples/sec   Loss 5.7463   LearningRate 0.0235   Epoch: 10   Global Step: 52100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:45,660-Speed 5485.87 samples/sec   Loss 5.7722   LearningRate 0.0235   Epoch: 10   Global Step: 52110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:47,497-Speed 5577.00 samples/sec   Loss 5.6701   LearningRate 0.0235   Epoch: 10   Global Step: 52120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:49,344-Speed 5547.83 samples/sec   Loss 5.6989   LearningRate 0.0235   Epoch: 10   Global Step: 52130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:51,259-Speed 5351.57 samples/sec   Loss 5.7983   LearningRate 0.0235   Epoch: 10   Global Step: 52140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:53,092-Speed 5592.30 samples/sec   Loss 5.7580   LearningRate 0.0235   Epoch: 10   Global Step: 52150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:54,926-Speed 5583.59 samples/sec   Loss 5.7040   LearningRate 0.0235   Epoch: 10   Global Step: 52160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:56,780-Speed 5526.70 samples/sec   Loss 5.7102   LearningRate 0.0235   Epoch: 10   Global Step: 52170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:37:58,613-Speed 5591.15 samples/sec   Loss 5.7459   LearningRate 0.0234   Epoch: 10   Global Step: 52180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:00,468-Speed 5523.23 samples/sec   Loss 5.8408   LearningRate 0.0234   Epoch: 10   Global Step: 52190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:02,325-Speed 5515.62 samples/sec   Loss 5.8196   LearningRate 0.0234   Epoch: 10   Global Step: 52200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:04,162-Speed 5577.33 samples/sec   Loss 5.6866   LearningRate 0.0234   Epoch: 10   Global Step: 52210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:06,018-Speed 5522.27 samples/sec   Loss 5.7341   LearningRate 0.0234   Epoch: 10   Global Step: 52220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:07,854-Speed 5578.48 samples/sec   Loss 5.8757   LearningRate 0.0234   Epoch: 10   Global Step: 52230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:09,724-Speed 5479.89 samples/sec   Loss 5.7715   LearningRate 0.0234   Epoch: 10   Global Step: 52240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:11,565-Speed 5566.16 samples/sec   Loss 5.6527   LearningRate 0.0234   Epoch: 10   Global Step: 52250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:13,428-Speed 5499.19 samples/sec   Loss 5.7002   LearningRate 0.0234   Epoch: 10   Global Step: 52260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:15,346-Speed 5342.61 samples/sec   Loss 5.8580   LearningRate 0.0234   Epoch: 10   Global Step: 52270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:17,251-Speed 5378.57 samples/sec   Loss 5.7196   LearningRate 0.0233   Epoch: 10   Global Step: 52280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:19,100-Speed 5541.32 samples/sec   Loss 5.7879   LearningRate 0.0233   Epoch: 10   Global Step: 52290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:20,943-Speed 5557.99 samples/sec   Loss 5.7888   LearningRate 0.0233   Epoch: 10   Global Step: 52300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:22,813-Speed 5479.75 samples/sec   Loss 5.6608   LearningRate 0.0233   Epoch: 10   Global Step: 52310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:24,664-Speed 5532.91 samples/sec   Loss 5.7650   LearningRate 0.0233   Epoch: 10   Global Step: 52320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:38:26,522-Speed 5513.22 samples/sec   Loss 5.7522   LearningRate 0.0233   Epoch: 10   Global Step: 52330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:28,376-Speed 5526.18 samples/sec   Loss 5.7068   LearningRate 0.0233   Epoch: 10   Global Step: 52340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:38:30,214-Speed 5574.63 samples/sec   Loss 5.7362   LearningRate 0.0233   Epoch: 10   Global Step: 52350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:38:32,063-Speed 5542.53 samples/sec   Loss 5.8470   LearningRate 0.0233   Epoch: 10   Global Step: 52360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:38:33,895-Speed 5592.34 samples/sec   Loss 5.7533   LearningRate 0.0233   Epoch: 10   Global Step: 52370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:38:35,744-Speed 5541.82 samples/sec   Loss 5.7281   LearningRate 0.0233   Epoch: 10   Global Step: 52380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:38:37,614-Speed 5478.76 samples/sec   Loss 5.5591   LearningRate 0.0232   Epoch: 10   Global Step: 52390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:38:39,462-Speed 5541.76 samples/sec   Loss 5.7810   LearningRate 0.0232   Epoch: 10   Global Step: 52400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:38:41,319-Speed 5518.26 samples/sec   Loss 5.7307   LearningRate 0.0232   Epoch: 10   Global Step: 52410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:38:43,185-Speed 5489.70 samples/sec   Loss 5.8571   LearningRate 0.0232   Epoch: 10   Global Step: 52420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:38:45,055-Speed 5480.73 samples/sec   Loss 5.8080   LearningRate 0.0232   Epoch: 10   Global Step: 52430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:38:46,898-Speed 5558.42 samples/sec   Loss 5.7259   LearningRate 0.0232   Epoch: 10   Global Step: 52440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:48,778-Speed 5449.25 samples/sec   Loss 5.8520   LearningRate 0.0232   Epoch: 10   Global Step: 52450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:50,640-Speed 5504.75 samples/sec   Loss 5.7744   LearningRate 0.0232   Epoch: 10   Global Step: 52460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:52,491-Speed 5534.71 samples/sec   Loss 5.7044   LearningRate 0.0232   Epoch: 10   Global Step: 52470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:54,335-Speed 5556.27 samples/sec   Loss 5.7912   LearningRate 0.0232   Epoch: 10   Global Step: 52480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:56,181-Speed 5550.61 samples/sec   Loss 5.7446   LearningRate 0.0231   Epoch: 10   Global Step: 52490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:58,038-Speed 5516.08 samples/sec   Loss 5.6655   LearningRate 0.0231   Epoch: 10   Global Step: 52500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:38:59,896-Speed 5512.61 samples/sec   Loss 5.6522   LearningRate 0.0231   Epoch: 10   Global Step: 52510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:01,762-Speed 5492.41 samples/sec   Loss 5.8373   LearningRate 0.0231   Epoch: 10   Global Step: 52520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:03,612-Speed 5539.72 samples/sec   Loss 5.8097   LearningRate 0.0231   Epoch: 10   Global Step: 52530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:05,479-Speed 5486.17 samples/sec   Loss 5.8515   LearningRate 0.0231   Epoch: 10   Global Step: 52540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:39:07,329-Speed 5540.27 samples/sec   Loss 5.8425   LearningRate 0.0231   Epoch: 10   Global Step: 52550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:09,181-Speed 5529.42 samples/sec   Loss 5.7387   LearningRate 0.0231   Epoch: 10   Global Step: 52560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:11,035-Speed 5526.58 samples/sec   Loss 5.7857   LearningRate 0.0231   Epoch: 10   Global Step: 52570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:12,904-Speed 5481.76 samples/sec   Loss 5.7866   LearningRate 0.0231   Epoch: 10   Global Step: 52580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:14,746-Speed 5561.32 samples/sec   Loss 5.7343   LearningRate 0.0231   Epoch: 10   Global Step: 52590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:16,607-Speed 5507.32 samples/sec   Loss 5.6772   LearningRate 0.0230   Epoch: 10   Global Step: 52600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:18,449-Speed 5560.91 samples/sec   Loss 5.7838   LearningRate 0.0230   Epoch: 10   Global Step: 52610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:20,295-Speed 5549.42 samples/sec   Loss 5.7554   LearningRate 0.0230   Epoch: 10   Global Step: 52620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:22,133-Speed 5578.02 samples/sec   Loss 5.7649   LearningRate 0.0230   Epoch: 10   Global Step: 52630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:23,990-Speed 5514.86 samples/sec   Loss 5.8373   LearningRate 0.0230   Epoch: 10   Global Step: 52640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:25,823-Speed 5590.82 samples/sec   Loss 5.7671   LearningRate 0.0230   Epoch: 10   Global Step: 52650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:27,665-Speed 5561.02 samples/sec   Loss 5.7526   LearningRate 0.0230   Epoch: 10   Global Step: 52660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:29,544-Speed 5454.22 samples/sec   Loss 5.6654   LearningRate 0.0230   Epoch: 10   Global Step: 52670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:31,391-Speed 5547.33 samples/sec   Loss 5.9179   LearningRate 0.0230   Epoch: 10   Global Step: 52680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:33,247-Speed 5518.74 samples/sec   Loss 5.6338   LearningRate 0.0230   Epoch: 10   Global Step: 52690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:35,107-Speed 5506.91 samples/sec   Loss 5.8413   LearningRate 0.0229   Epoch: 10   Global Step: 52700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:36,968-Speed 5508.13 samples/sec   Loss 5.7575   LearningRate 0.0229   Epoch: 10   Global Step: 52710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:38,841-Speed 5469.42 samples/sec   Loss 5.6526   LearningRate 0.0229   Epoch: 10   Global Step: 52720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:39:40,684-Speed 5560.92 samples/sec   Loss 5.7847   LearningRate 0.0229   Epoch: 10   Global Step: 52730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:39:42,582-Speed 5396.78 samples/sec   Loss 5.6633   LearningRate 0.0229   Epoch: 10   Global Step: 52740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:39:44,423-Speed 5564.20 samples/sec   Loss 5.6285   LearningRate 0.0229   Epoch: 10   Global Step: 52750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:39:46,302-Speed 5454.39 samples/sec   Loss 5.8385   LearningRate 0.0229   Epoch: 10   Global Step: 52760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:39:48,152-Speed 5536.56 samples/sec   Loss 5.6890   LearningRate 0.0229   Epoch: 10   Global Step: 52770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:39:50,021-Speed 5480.74 samples/sec   Loss 5.6753   LearningRate 0.0229   Epoch: 10   Global Step: 52780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:39:51,874-Speed 5530.53 samples/sec   Loss 5.8054   LearningRate 0.0229   Epoch: 10   Global Step: 52790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:39:53,748-Speed 5467.15 samples/sec   Loss 5.8317   LearningRate 0.0229   Epoch: 10   Global Step: 52800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:39:55,594-Speed 5551.97 samples/sec   Loss 5.7341   LearningRate 0.0228   Epoch: 10   Global Step: 52810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:39:57,452-Speed 5514.36 samples/sec   Loss 5.8469   LearningRate 0.0228   Epoch: 10   Global Step: 52820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:39:59,303-Speed 5532.17 samples/sec   Loss 5.7997   LearningRate 0.0228   Epoch: 10   Global Step: 52830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:01,187-Speed 5439.31 samples/sec   Loss 5.8445   LearningRate 0.0228   Epoch: 10   Global Step: 52840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:03,027-Speed 5569.38 samples/sec   Loss 5.8615   LearningRate 0.0228   Epoch: 10   Global Step: 52850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:04,889-Speed 5502.10 samples/sec   Loss 5.8786   LearningRate 0.0228   Epoch: 10   Global Step: 52860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:06,734-Speed 5553.47 samples/sec   Loss 5.7795   LearningRate 0.0228   Epoch: 10   Global Step: 52870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:08,604-Speed 5477.86 samples/sec   Loss 5.8147   LearningRate 0.0228   Epoch: 10   Global Step: 52880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:10,449-Speed 5552.40 samples/sec   Loss 5.8707   LearningRate 0.0228   Epoch: 10   Global Step: 52890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:12,319-Speed 5481.30 samples/sec   Loss 5.8866   LearningRate 0.0228   Epoch: 10   Global Step: 52900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:14,158-Speed 5569.73 samples/sec   Loss 5.6144   LearningRate 0.0227   Epoch: 10   Global Step: 52910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:16,034-Speed 5462.24 samples/sec   Loss 5.7861   LearningRate 0.0227   Epoch: 10   Global Step: 52920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:17,894-Speed 5508.21 samples/sec   Loss 5.7628   LearningRate 0.0227   Epoch: 10   Global Step: 52930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:19,742-Speed 5544.66 samples/sec   Loss 5.6922   LearningRate 0.0227   Epoch: 10   Global Step: 52940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:21,591-Speed 5542.59 samples/sec   Loss 5.8598   LearningRate 0.0227   Epoch: 10   Global Step: 52950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:23,466-Speed 5462.39 samples/sec   Loss 5.8808   LearningRate 0.0227   Epoch: 10   Global Step: 52960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:25,307-Speed 5565.28 samples/sec   Loss 5.6855   LearningRate 0.0227   Epoch: 10   Global Step: 52970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:27,148-Speed 5566.29 samples/sec   Loss 5.9144   LearningRate 0.0227   Epoch: 10   Global Step: 52980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:29,000-Speed 5530.34 samples/sec   Loss 5.9113   LearningRate 0.0227   Epoch: 10   Global Step: 52990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:30,853-Speed 5529.48 samples/sec   Loss 5.7086   LearningRate 0.0227   Epoch: 10   Global Step: 53000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:32,726-Speed 5469.29 samples/sec   Loss 5.8245   LearningRate 0.0227   Epoch: 10   Global Step: 53010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:34,563-Speed 5577.30 samples/sec   Loss 5.7978   LearningRate 0.0226   Epoch: 10   Global Step: 53020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:36,396-Speed 5591.72 samples/sec   Loss 5.8962   LearningRate 0.0226   Epoch: 10   Global Step: 53030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:38,240-Speed 5554.51 samples/sec   Loss 5.7447   LearningRate 0.0226   Epoch: 10   Global Step: 53040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:40,087-Speed 5546.20 samples/sec   Loss 5.7280   LearningRate 0.0226   Epoch: 10   Global Step: 53050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:41,989-Speed 5386.64 samples/sec   Loss 5.7866   LearningRate 0.0226   Epoch: 10   Global Step: 53060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:43,837-Speed 5545.95 samples/sec   Loss 5.7707   LearningRate 0.0226   Epoch: 10   Global Step: 53070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:45,703-Speed 5490.60 samples/sec   Loss 5.8063   LearningRate 0.0226   Epoch: 10   Global Step: 53080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:47,547-Speed 5556.75 samples/sec   Loss 5.7439   LearningRate 0.0226   Epoch: 10   Global Step: 53090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:49,406-Speed 5509.79 samples/sec   Loss 5.6707   LearningRate 0.0226   Epoch: 10   Global Step: 53100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:51,288-Speed 5445.61 samples/sec   Loss 5.7937   LearningRate 0.0226   Epoch: 10   Global Step: 53110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:53,136-Speed 5543.59 samples/sec   Loss 5.8524   LearningRate 0.0226   Epoch: 10   Global Step: 53120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:40:54,980-Speed 5553.43 samples/sec   Loss 5.8172   LearningRate 0.0225   Epoch: 10   Global Step: 53130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:40:56,851-Speed 5476.47 samples/sec   Loss 5.7199   LearningRate 0.0225   Epoch: 10   Global Step: 53140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:40:58,732-Speed 5447.33 samples/sec   Loss 5.8467   LearningRate 0.0225   Epoch: 10   Global Step: 53150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:00,598-Speed 5491.14 samples/sec   Loss 5.7315   LearningRate 0.0225   Epoch: 10   Global Step: 53160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:02,454-Speed 5522.90 samples/sec   Loss 5.9080   LearningRate 0.0225   Epoch: 10   Global Step: 53170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:04,321-Speed 5486.63 samples/sec   Loss 5.7060   LearningRate 0.0225   Epoch: 10   Global Step: 53180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:06,161-Speed 5567.99 samples/sec   Loss 5.7226   LearningRate 0.0225   Epoch: 10   Global Step: 53190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:08,030-Speed 5482.66 samples/sec   Loss 5.8014   LearningRate 0.0225   Epoch: 10   Global Step: 53200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:09,886-Speed 5518.61 samples/sec   Loss 5.7669   LearningRate 0.0225   Epoch: 10   Global Step: 53210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:11,765-Speed 5453.59 samples/sec   Loss 5.7604   LearningRate 0.0225   Epoch: 10   Global Step: 53220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:13,623-Speed 5513.93 samples/sec   Loss 5.7729   LearningRate 0.0224   Epoch: 10   Global Step: 53230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:15,488-Speed 5493.52 samples/sec   Loss 5.8007   LearningRate 0.0224   Epoch: 10   Global Step: 53240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:17,345-Speed 5518.51 samples/sec   Loss 5.8082   LearningRate 0.0224   Epoch: 10   Global Step: 53250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:19,194-Speed 5540.41 samples/sec   Loss 5.7654   LearningRate 0.0224   Epoch: 10   Global Step: 53260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:21,030-Speed 5582.36 samples/sec   Loss 5.7835   LearningRate 0.0224   Epoch: 10   Global Step: 53270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:22,867-Speed 5575.99 samples/sec   Loss 5.6649   LearningRate 0.0224   Epoch: 10   Global Step: 53280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:24,741-Speed 5466.99 samples/sec   Loss 5.7344   LearningRate 0.0224   Epoch: 10   Global Step: 53290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:26,585-Speed 5557.47 samples/sec   Loss 5.7629   LearningRate 0.0224   Epoch: 10   Global Step: 53300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:28,465-Speed 5447.47 samples/sec   Loss 5.8999   LearningRate 0.0224   Epoch: 10   Global Step: 53310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:30,312-Speed 5546.42 samples/sec   Loss 5.6172   LearningRate 0.0224   Epoch: 10   Global Step: 53320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:32,179-Speed 5489.82 samples/sec   Loss 5.7652   LearningRate 0.0224   Epoch: 10   Global Step: 53330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:34,051-Speed 5473.42 samples/sec   Loss 5.7465   LearningRate 0.0223   Epoch: 10   Global Step: 53340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:35,887-Speed 5579.70 samples/sec   Loss 5.8172   LearningRate 0.0223   Epoch: 10   Global Step: 53350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:37,736-Speed 5541.73 samples/sec   Loss 5.7947   LearningRate 0.0223   Epoch: 10   Global Step: 53360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:39,570-Speed 5585.73 samples/sec   Loss 5.6645   LearningRate 0.0223   Epoch: 10   Global Step: 53370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:41,429-Speed 5512.16 samples/sec   Loss 5.6774   LearningRate 0.0223   Epoch: 10   Global Step: 53380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:43,269-Speed 5567.87 samples/sec   Loss 5.6934   LearningRate 0.0223   Epoch: 10   Global Step: 53390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:45,110-Speed 5564.06 samples/sec   Loss 5.7294   LearningRate 0.0223   Epoch: 10   Global Step: 53400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:46,957-Speed 5550.22 samples/sec   Loss 5.5792   LearningRate 0.0223   Epoch: 10   Global Step: 53410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:48,847-Speed 5420.13 samples/sec   Loss 5.8713   LearningRate 0.0223   Epoch: 10   Global Step: 53420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:50,700-Speed 5529.76 samples/sec   Loss 5.9179   LearningRate 0.0223   Epoch: 10   Global Step: 53430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:41:52,562-Speed 5500.24 samples/sec   Loss 5.8255   LearningRate 0.0223   Epoch: 10   Global Step: 53440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:54,405-Speed 5559.81 samples/sec   Loss 5.9072   LearningRate 0.0222   Epoch: 10   Global Step: 53450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:56,248-Speed 5558.87 samples/sec   Loss 5.6209   LearningRate 0.0222   Epoch: 10   Global Step: 53460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:58,116-Speed 5485.81 samples/sec   Loss 5.7548   LearningRate 0.0222   Epoch: 10   Global Step: 53470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:41:59,964-Speed 5543.86 samples/sec   Loss 5.8057   LearningRate 0.0222   Epoch: 10   Global Step: 53480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:42:01,832-Speed 5484.59 samples/sec   Loss 5.9212   LearningRate 0.0222   Epoch: 10   Global Step: 53490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:42:03,675-Speed 5560.84 samples/sec   Loss 5.8456   LearningRate 0.0222   Epoch: 10   Global Step: 53500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:42:05,530-Speed 5521.55 samples/sec   Loss 5.8002   LearningRate 0.0222   Epoch: 10   Global Step: 53510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:42:07,379-Speed 5541.26 samples/sec   Loss 5.7054   LearningRate 0.0222   Epoch: 10   Global Step: 53520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:42:09,216-Speed 5577.27 samples/sec   Loss 5.9624   LearningRate 0.0222   Epoch: 10   Global Step: 53530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:42:11,056-Speed 5566.37 samples/sec   Loss 5.6352   LearningRate 0.0222   Epoch: 10   Global Step: 53540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:12,905-Speed 5542.67 samples/sec   Loss 5.8572   LearningRate 0.0222   Epoch: 10   Global Step: 53550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:14,742-Speed 5575.22 samples/sec   Loss 5.6869   LearningRate 0.0221   Epoch: 10   Global Step: 53560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:16,605-Speed 5496.97 samples/sec   Loss 5.8512   LearningRate 0.0221   Epoch: 10   Global Step: 53570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:18,449-Speed 5558.58 samples/sec   Loss 5.7756   LearningRate 0.0221   Epoch: 10   Global Step: 53580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:20,284-Speed 5580.14 samples/sec   Loss 5.7938   LearningRate 0.0221   Epoch: 10   Global Step: 53590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:22,126-Speed 5564.40 samples/sec   Loss 5.6666   LearningRate 0.0221   Epoch: 10   Global Step: 53600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:23,966-Speed 5565.22 samples/sec   Loss 5.6841   LearningRate 0.0221   Epoch: 10   Global Step: 53610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:25,838-Speed 5473.71 samples/sec   Loss 5.7660   LearningRate 0.0221   Epoch: 10   Global Step: 53620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:27,687-Speed 5540.14 samples/sec   Loss 5.7133   LearningRate 0.0221   Epoch: 10   Global Step: 53630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:29,562-Speed 5462.49 samples/sec   Loss 5.8061   LearningRate 0.0221   Epoch: 10   Global Step: 53640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:42:31,412-Speed 5539.74 samples/sec   Loss 5.8839   LearningRate 0.0221   Epoch: 10   Global Step: 53650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:33,256-Speed 5555.01 samples/sec   Loss 5.8115   LearningRate 0.0220   Epoch: 10   Global Step: 53660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:35,097-Speed 5564.79 samples/sec   Loss 5.8337   LearningRate 0.0220   Epoch: 10   Global Step: 53670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:36,948-Speed 5535.08 samples/sec   Loss 5.8878   LearningRate 0.0220   Epoch: 10   Global Step: 53680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:38,840-Speed 5414.08 samples/sec   Loss 5.6656   LearningRate 0.0220   Epoch: 10   Global Step: 53690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:40,716-Speed 5458.85 samples/sec   Loss 5.7294   LearningRate 0.0220   Epoch: 10   Global Step: 53700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:42,575-Speed 5512.91 samples/sec   Loss 5.7705   LearningRate 0.0220   Epoch: 10   Global Step: 53710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:44,424-Speed 5540.80 samples/sec   Loss 5.6668   LearningRate 0.0220   Epoch: 10   Global Step: 53720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:46,276-Speed 5533.50 samples/sec   Loss 5.6854   LearningRate 0.0220   Epoch: 10   Global Step: 53730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:48,120-Speed 5555.07 samples/sec   Loss 5.6753   LearningRate 0.0220   Epoch: 10   Global Step: 53740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:49,991-Speed 5475.92 samples/sec   Loss 5.6945   LearningRate 0.0220   Epoch: 10   Global Step: 53750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:51,850-Speed 5510.01 samples/sec   Loss 5.7620   LearningRate 0.0220   Epoch: 10   Global Step: 53760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:53,715-Speed 5493.51 samples/sec   Loss 5.7301   LearningRate 0.0219   Epoch: 10   Global Step: 53770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:55,560-Speed 5553.53 samples/sec   Loss 5.8394   LearningRate 0.0219   Epoch: 10   Global Step: 53780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:57,405-Speed 5553.85 samples/sec   Loss 5.7692   LearningRate 0.0219   Epoch: 10   Global Step: 53790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:42:59,255-Speed 5537.87 samples/sec   Loss 5.7216   LearningRate 0.0219   Epoch: 10   Global Step: 53800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:01,113-Speed 5515.44 samples/sec   Loss 5.7042   LearningRate 0.0219   Epoch: 10   Global Step: 53810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:02,970-Speed 5516.66 samples/sec   Loss 5.6566   LearningRate 0.0219   Epoch: 10   Global Step: 53820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:04,857-Speed 5429.23 samples/sec   Loss 5.8542   LearningRate 0.0219   Epoch: 10   Global Step: 53830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:06,707-Speed 5539.34 samples/sec   Loss 5.7806   LearningRate 0.0219   Epoch: 10   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:08,551-Speed 5555.01 samples/sec   Loss 5.7051   LearningRate 0.0219   Epoch: 10   Global Step: 53850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:10,425-Speed 5468.36 samples/sec   Loss 5.7538   LearningRate 0.0219   Epoch: 10   Global Step: 53860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:12,287-Speed 5502.50 samples/sec   Loss 5.8133   LearningRate 0.0219   Epoch: 10   Global Step: 53870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:14,161-Speed 5468.24 samples/sec   Loss 5.7583   LearningRate 0.0218   Epoch: 10   Global Step: 53880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:15,998-Speed 5575.36 samples/sec   Loss 5.7999   LearningRate 0.0218   Epoch: 10   Global Step: 53890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:17,856-Speed 5515.86 samples/sec   Loss 5.9414   LearningRate 0.0218   Epoch: 10   Global Step: 53900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:19,697-Speed 5565.25 samples/sec   Loss 5.6751   LearningRate 0.0218   Epoch: 10   Global Step: 53910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:43:21,557-Speed 5507.65 samples/sec   Loss 5.8503   LearningRate 0.0218   Epoch: 10   Global Step: 53920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:43:23,403-Speed 5549.95 samples/sec   Loss 5.7812   LearningRate 0.0218   Epoch: 10   Global Step: 53930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:43:25,262-Speed 5510.74 samples/sec   Loss 5.8014   LearningRate 0.0218   Epoch: 10   Global Step: 53940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:43:27,141-Speed 5451.56 samples/sec   Loss 5.6876   LearningRate 0.0218   Epoch: 10   Global Step: 53950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:43:29,033-Speed 5417.66 samples/sec   Loss 5.9654   LearningRate 0.0218   Epoch: 10   Global Step: 53960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:43:30,940-Speed 5370.73 samples/sec   Loss 5.7887   LearningRate 0.0218   Epoch: 10   Global Step: 53970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:43:32,797-Speed 5517.31 samples/sec   Loss 5.6957   LearningRate 0.0218   Epoch: 10   Global Step: 53980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:43:34,637-Speed 5569.03 samples/sec   Loss 5.8523   LearningRate 0.0217   Epoch: 10   Global Step: 53990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:43:36,517-Speed 5449.15 samples/sec   Loss 5.7996   LearningRate 0.0217   Epoch: 10   Global Step: 54000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:44:03,632-[lfw][54000]XNorm: 22.709581
Training: 2022-04-11 13:44:03,633-[lfw][54000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-11 13:44:03,634-[lfw][54000]Accuracy-Highest: 0.99817
Training: 2022-04-11 13:44:35,029-[cfp_fp][54000]XNorm: 20.050151
Training: 2022-04-11 13:44:35,030-[cfp_fp][54000]Accuracy-Flip: 0.96186+-0.01034
Training: 2022-04-11 13:44:35,031-[cfp_fp][54000]Accuracy-Highest: 0.97029
Training: 2022-04-11 13:45:02,161-[agedb_30][54000]XNorm: 22.348742
Training: 2022-04-11 13:45:02,162-[agedb_30][54000]Accuracy-Flip: 0.97783+-0.00760
Training: 2022-04-11 13:45:02,162-[agedb_30][54000]Accuracy-Highest: 0.97817
Training: 2022-04-11 13:45:04,024-Speed 117.02 samples/sec   Loss 5.9189   LearningRate 0.0217   Epoch: 10   Global Step: 54010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:05,857-Speed 5589.35 samples/sec   Loss 5.7533   LearningRate 0.0217   Epoch: 10   Global Step: 54020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:07,703-Speed 5547.17 samples/sec   Loss 5.6040   LearningRate 0.0217   Epoch: 10   Global Step: 54030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:09,540-Speed 5576.90 samples/sec   Loss 5.7507   LearningRate 0.0217   Epoch: 10   Global Step: 54040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:11,372-Speed 5593.62 samples/sec   Loss 5.6199   LearningRate 0.0217   Epoch: 10   Global Step: 54050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:13,203-Speed 5594.16 samples/sec   Loss 5.7403   LearningRate 0.0217   Epoch: 10   Global Step: 54060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:15,034-Speed 5594.69 samples/sec   Loss 5.7391   LearningRate 0.0217   Epoch: 10   Global Step: 54070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:16,864-Speed 5597.64 samples/sec   Loss 5.6273   LearningRate 0.0217   Epoch: 10   Global Step: 54080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:18,700-Speed 5582.66 samples/sec   Loss 5.6505   LearningRate 0.0217   Epoch: 10   Global Step: 54090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:20,553-Speed 5528.76 samples/sec   Loss 5.8474   LearningRate 0.0216   Epoch: 10   Global Step: 54100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:22,401-Speed 5545.79 samples/sec   Loss 5.5895   LearningRate 0.0216   Epoch: 10   Global Step: 54110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:24,247-Speed 5549.18 samples/sec   Loss 5.7268   LearningRate 0.0216   Epoch: 10   Global Step: 54120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:26,074-Speed 5608.97 samples/sec   Loss 5.7342   LearningRate 0.0216   Epoch: 10   Global Step: 54130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:27,939-Speed 5495.56 samples/sec   Loss 5.6365   LearningRate 0.0216   Epoch: 10   Global Step: 54140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:29,787-Speed 5541.35 samples/sec   Loss 5.6630   LearningRate 0.0216   Epoch: 10   Global Step: 54150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:31,625-Speed 5573.50 samples/sec   Loss 5.5985   LearningRate 0.0216   Epoch: 10   Global Step: 54160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:33,470-Speed 5554.14 samples/sec   Loss 5.7321   LearningRate 0.0216   Epoch: 10   Global Step: 54170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:35,309-Speed 5571.46 samples/sec   Loss 5.7372   LearningRate 0.0216   Epoch: 10   Global Step: 54180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:37,167-Speed 5515.08 samples/sec   Loss 5.7052   LearningRate 0.0216   Epoch: 10   Global Step: 54190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:39,024-Speed 5517.41 samples/sec   Loss 5.7340   LearningRate 0.0215   Epoch: 10   Global Step: 54200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:40,883-Speed 5511.25 samples/sec   Loss 5.7311   LearningRate 0.0215   Epoch: 10   Global Step: 54210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:42,724-Speed 5566.08 samples/sec   Loss 5.8080   LearningRate 0.0215   Epoch: 10   Global Step: 54220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:44,595-Speed 5473.41 samples/sec   Loss 5.8639   LearningRate 0.0215   Epoch: 10   Global Step: 54230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:46,439-Speed 5556.30 samples/sec   Loss 5.6249   LearningRate 0.0215   Epoch: 10   Global Step: 54240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:48,310-Speed 5476.20 samples/sec   Loss 5.6632   LearningRate 0.0215   Epoch: 10   Global Step: 54250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:45:50,184-Speed 5466.66 samples/sec   Loss 5.7268   LearningRate 0.0215   Epoch: 10   Global Step: 54260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:52,044-Speed 5511.12 samples/sec   Loss 5.7376   LearningRate 0.0215   Epoch: 10   Global Step: 54270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:53,895-Speed 5534.71 samples/sec   Loss 5.7608   LearningRate 0.0215   Epoch: 10   Global Step: 54280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:55,750-Speed 5522.34 samples/sec   Loss 5.6530   LearningRate 0.0215   Epoch: 10   Global Step: 54290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:57,593-Speed 5562.43 samples/sec   Loss 5.7394   LearningRate 0.0215   Epoch: 10   Global Step: 54300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:45:59,445-Speed 5532.46 samples/sec   Loss 5.6920   LearningRate 0.0214   Epoch: 10   Global Step: 54310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:01,291-Speed 5551.07 samples/sec   Loss 5.7849   LearningRate 0.0214   Epoch: 10   Global Step: 54320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:46:03,169-Speed 5454.79 samples/sec   Loss 5.9050   LearningRate 0.0214   Epoch: 10   Global Step: 54330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:46:05,007-Speed 5575.76 samples/sec   Loss 5.6897   LearningRate 0.0214   Epoch: 10   Global Step: 54340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:46:06,874-Speed 5488.31 samples/sec   Loss 5.5696   LearningRate 0.0214   Epoch: 10   Global Step: 54350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:46:08,708-Speed 5586.02 samples/sec   Loss 5.8646   LearningRate 0.0214   Epoch: 10   Global Step: 54360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:46:10,565-Speed 5516.71 samples/sec   Loss 5.8399   LearningRate 0.0214   Epoch: 10   Global Step: 54370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:46:12,436-Speed 5478.42 samples/sec   Loss 5.7278   LearningRate 0.0214   Epoch: 10   Global Step: 54380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:46:14,294-Speed 5514.62 samples/sec   Loss 5.6689   LearningRate 0.0214   Epoch: 10   Global Step: 54390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:46:16,138-Speed 5554.12 samples/sec   Loss 5.8713   LearningRate 0.0214   Epoch: 10   Global Step: 54400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:46:18,009-Speed 5478.49 samples/sec   Loss 5.7891   LearningRate 0.0214   Epoch: 10   Global Step: 54410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:46:19,842-Speed 5591.71 samples/sec   Loss 5.7810   LearningRate 0.0213   Epoch: 10   Global Step: 54420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:21,692-Speed 5535.30 samples/sec   Loss 5.6521   LearningRate 0.0213   Epoch: 10   Global Step: 54430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:23,535-Speed 5562.58 samples/sec   Loss 5.6939   LearningRate 0.0213   Epoch: 10   Global Step: 54440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:25,419-Speed 5436.69 samples/sec   Loss 5.6754   LearningRate 0.0213   Epoch: 10   Global Step: 54450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:27,283-Speed 5498.35 samples/sec   Loss 5.6394   LearningRate 0.0213   Epoch: 10   Global Step: 54460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:29,138-Speed 5521.75 samples/sec   Loss 5.7549   LearningRate 0.0213   Epoch: 10   Global Step: 54470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:30,999-Speed 5507.08 samples/sec   Loss 5.7935   LearningRate 0.0213   Epoch: 10   Global Step: 54480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:32,841-Speed 5562.12 samples/sec   Loss 5.6489   LearningRate 0.0213   Epoch: 10   Global Step: 54490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:34,711-Speed 5480.92 samples/sec   Loss 5.7283   LearningRate 0.0213   Epoch: 10   Global Step: 54500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:36,559-Speed 5544.09 samples/sec   Loss 5.7868   LearningRate 0.0213   Epoch: 10   Global Step: 54510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:38,405-Speed 5549.34 samples/sec   Loss 5.6961   LearningRate 0.0213   Epoch: 10   Global Step: 54520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:40,247-Speed 5560.53 samples/sec   Loss 5.6003   LearningRate 0.0212   Epoch: 10   Global Step: 54530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:42,109-Speed 5503.30 samples/sec   Loss 5.7545   LearningRate 0.0212   Epoch: 10   Global Step: 54540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:43,949-Speed 5569.11 samples/sec   Loss 5.6955   LearningRate 0.0212   Epoch: 10   Global Step: 54550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:45,807-Speed 5513.83 samples/sec   Loss 5.6564   LearningRate 0.0212   Epoch: 10   Global Step: 54560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:47,667-Speed 5510.55 samples/sec   Loss 5.6574   LearningRate 0.0212   Epoch: 10   Global Step: 54570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:49,551-Speed 5438.96 samples/sec   Loss 5.8508   LearningRate 0.0212   Epoch: 10   Global Step: 54580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:51,387-Speed 5581.98 samples/sec   Loss 5.6973   LearningRate 0.0212   Epoch: 10   Global Step: 54590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:53,255-Speed 5485.58 samples/sec   Loss 5.7041   LearningRate 0.0212   Epoch: 10   Global Step: 54600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:55,088-Speed 5589.24 samples/sec   Loss 5.7443   LearningRate 0.0212   Epoch: 10   Global Step: 54610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:46:56,924-Speed 5581.17 samples/sec   Loss 5.6193   LearningRate 0.0212   Epoch: 10   Global Step: 54620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:46:58,759-Speed 5585.09 samples/sec   Loss 5.7329   LearningRate 0.0212   Epoch: 10   Global Step: 54630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:00,599-Speed 5568.74 samples/sec   Loss 5.7094   LearningRate 0.0211   Epoch: 10   Global Step: 54640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:02,440-Speed 5563.59 samples/sec   Loss 5.4920   LearningRate 0.0211   Epoch: 10   Global Step: 54650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:04,276-Speed 5581.28 samples/sec   Loss 5.7769   LearningRate 0.0211   Epoch: 10   Global Step: 54660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:06,109-Speed 5588.32 samples/sec   Loss 5.6320   LearningRate 0.0211   Epoch: 10   Global Step: 54670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:07,952-Speed 5559.13 samples/sec   Loss 5.8249   LearningRate 0.0211   Epoch: 10   Global Step: 54680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:09,799-Speed 5545.00 samples/sec   Loss 5.6858   LearningRate 0.0211   Epoch: 10   Global Step: 54690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:11,658-Speed 5510.23 samples/sec   Loss 5.5567   LearningRate 0.0211   Epoch: 10   Global Step: 54700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:13,539-Speed 5447.11 samples/sec   Loss 5.6349   LearningRate 0.0211   Epoch: 10   Global Step: 54710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:15,397-Speed 5512.06 samples/sec   Loss 5.8096   LearningRate 0.0211   Epoch: 10   Global Step: 54720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:17,232-Speed 5582.74 samples/sec   Loss 5.6639   LearningRate 0.0211   Epoch: 10   Global Step: 54730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:19,073-Speed 5566.34 samples/sec   Loss 5.6324   LearningRate 0.0211   Epoch: 10   Global Step: 54740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:20,909-Speed 5579.34 samples/sec   Loss 5.8911   LearningRate 0.0210   Epoch: 10   Global Step: 54750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:22,743-Speed 5587.12 samples/sec   Loss 5.6507   LearningRate 0.0210   Epoch: 10   Global Step: 54760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:24,574-Speed 5593.64 samples/sec   Loss 5.6838   LearningRate 0.0210   Epoch: 10   Global Step: 54770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:26,418-Speed 5553.35 samples/sec   Loss 5.7050   LearningRate 0.0210   Epoch: 10   Global Step: 54780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:28,274-Speed 5523.81 samples/sec   Loss 5.5862   LearningRate 0.0210   Epoch: 10   Global Step: 54790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:30,111-Speed 5574.29 samples/sec   Loss 5.8139   LearningRate 0.0210   Epoch: 10   Global Step: 54800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:31,946-Speed 5583.03 samples/sec   Loss 5.7551   LearningRate 0.0210   Epoch: 10   Global Step: 54810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:33,780-Speed 5588.00 samples/sec   Loss 5.7423   LearningRate 0.0210   Epoch: 10   Global Step: 54820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:35,632-Speed 5532.05 samples/sec   Loss 5.6422   LearningRate 0.0210   Epoch: 10   Global Step: 54830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:37,477-Speed 5551.20 samples/sec   Loss 5.6937   LearningRate 0.0210   Epoch: 10   Global Step: 54840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:39,322-Speed 5554.01 samples/sec   Loss 5.7168   LearningRate 0.0210   Epoch: 10   Global Step: 54850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:41,171-Speed 5541.38 samples/sec   Loss 5.7707   LearningRate 0.0209   Epoch: 10   Global Step: 54860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:47:43,003-Speed 5591.05 samples/sec   Loss 5.6153   LearningRate 0.0209   Epoch: 10   Global Step: 54870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:44,856-Speed 5530.25 samples/sec   Loss 5.6833   LearningRate 0.0209   Epoch: 10   Global Step: 54880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:46,721-Speed 5492.97 samples/sec   Loss 5.6826   LearningRate 0.0209   Epoch: 10   Global Step: 54890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:48,563-Speed 5564.26 samples/sec   Loss 5.7865   LearningRate 0.0209   Epoch: 10   Global Step: 54900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:50,403-Speed 5567.26 samples/sec   Loss 5.6192   LearningRate 0.0209   Epoch: 10   Global Step: 54910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:52,249-Speed 5550.07 samples/sec   Loss 5.6057   LearningRate 0.0209   Epoch: 10   Global Step: 54920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:54,086-Speed 5578.61 samples/sec   Loss 5.6434   LearningRate 0.0209   Epoch: 10   Global Step: 54930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:55,937-Speed 5534.64 samples/sec   Loss 5.6767   LearningRate 0.0209   Epoch: 10   Global Step: 54940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:57,767-Speed 5599.80 samples/sec   Loss 5.6864   LearningRate 0.0209   Epoch: 10   Global Step: 54950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:47:59,601-Speed 5585.89 samples/sec   Loss 5.7692   LearningRate 0.0209   Epoch: 10   Global Step: 54960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:01,441-Speed 5565.98 samples/sec   Loss 5.7004   LearningRate 0.0208   Epoch: 10   Global Step: 54970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:03,279-Speed 5574.27 samples/sec   Loss 5.7020   LearningRate 0.0208   Epoch: 10   Global Step: 54980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:05,122-Speed 5558.64 samples/sec   Loss 5.6455   LearningRate 0.0208   Epoch: 10   Global Step: 54990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:06,958-Speed 5580.34 samples/sec   Loss 5.7164   LearningRate 0.0208   Epoch: 10   Global Step: 55000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:08,795-Speed 5576.66 samples/sec   Loss 5.7892   LearningRate 0.0208   Epoch: 10   Global Step: 55010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:10,656-Speed 5508.37 samples/sec   Loss 5.7022   LearningRate 0.0208   Epoch: 10   Global Step: 55020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:12,484-Speed 5606.17 samples/sec   Loss 5.7236   LearningRate 0.0208   Epoch: 10   Global Step: 55030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:14,343-Speed 5511.56 samples/sec   Loss 5.6259   LearningRate 0.0208   Epoch: 10   Global Step: 55040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:16,178-Speed 5581.33 samples/sec   Loss 5.6528   LearningRate 0.0208   Epoch: 10   Global Step: 55050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:18,036-Speed 5518.11 samples/sec   Loss 5.6803   LearningRate 0.0208   Epoch: 10   Global Step: 55060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:19,878-Speed 5560.91 samples/sec   Loss 5.5741   LearningRate 0.0208   Epoch: 10   Global Step: 55070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:21,756-Speed 5456.79 samples/sec   Loss 5.6715   LearningRate 0.0207   Epoch: 10   Global Step: 55080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:23,621-Speed 5493.20 samples/sec   Loss 5.6759   LearningRate 0.0207   Epoch: 10   Global Step: 55090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:25,486-Speed 5493.41 samples/sec   Loss 5.6615   LearningRate 0.0207   Epoch: 10   Global Step: 55100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:27,340-Speed 5526.51 samples/sec   Loss 5.7286   LearningRate 0.0207   Epoch: 10   Global Step: 55110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:29,195-Speed 5523.00 samples/sec   Loss 5.6487   LearningRate 0.0207   Epoch: 10   Global Step: 55120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:48:31,044-Speed 5540.97 samples/sec   Loss 5.4630   LearningRate 0.0207   Epoch: 10   Global Step: 55130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:32,924-Speed 5452.37 samples/sec   Loss 5.6324   LearningRate 0.0207   Epoch: 10   Global Step: 55140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:34,785-Speed 5503.35 samples/sec   Loss 5.7635   LearningRate 0.0207   Epoch: 10   Global Step: 55150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:36,618-Speed 5589.27 samples/sec   Loss 5.6174   LearningRate 0.0207   Epoch: 10   Global Step: 55160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:38,487-Speed 5483.47 samples/sec   Loss 5.6854   LearningRate 0.0207   Epoch: 10   Global Step: 55170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:40,368-Speed 5447.41 samples/sec   Loss 5.5463   LearningRate 0.0207   Epoch: 10   Global Step: 55180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:42,216-Speed 5543.55 samples/sec   Loss 5.6347   LearningRate 0.0207   Epoch: 10   Global Step: 55190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:44,081-Speed 5494.39 samples/sec   Loss 5.5807   LearningRate 0.0206   Epoch: 10   Global Step: 55200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:45,918-Speed 5577.22 samples/sec   Loss 5.7519   LearningRate 0.0206   Epoch: 10   Global Step: 55210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:47,789-Speed 5476.05 samples/sec   Loss 5.6653   LearningRate 0.0206   Epoch: 10   Global Step: 55220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:49,629-Speed 5568.25 samples/sec   Loss 5.6181   LearningRate 0.0206   Epoch: 10   Global Step: 55230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:51,465-Speed 5578.24 samples/sec   Loss 5.4731   LearningRate 0.0206   Epoch: 10   Global Step: 55240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:53,314-Speed 5541.57 samples/sec   Loss 5.6735   LearningRate 0.0206   Epoch: 10   Global Step: 55250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:55,156-Speed 5563.88 samples/sec   Loss 5.6742   LearningRate 0.0206   Epoch: 10   Global Step: 55260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:57,026-Speed 5476.76 samples/sec   Loss 5.6727   LearningRate 0.0206   Epoch: 10   Global Step: 55270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:48:58,873-Speed 5549.31 samples/sec   Loss 5.6279   LearningRate 0.0206   Epoch: 10   Global Step: 55280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:00,758-Speed 5433.53 samples/sec   Loss 5.5920   LearningRate 0.0206   Epoch: 10   Global Step: 55290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:02,608-Speed 5539.42 samples/sec   Loss 5.6800   LearningRate 0.0206   Epoch: 10   Global Step: 55300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:04,458-Speed 5536.93 samples/sec   Loss 5.6449   LearningRate 0.0205   Epoch: 10   Global Step: 55310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:06,330-Speed 5473.70 samples/sec   Loss 5.5979   LearningRate 0.0205   Epoch: 10   Global Step: 55320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:08,216-Speed 5432.72 samples/sec   Loss 5.6765   LearningRate 0.0205   Epoch: 10   Global Step: 55330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:10,068-Speed 5531.93 samples/sec   Loss 5.4374   LearningRate 0.0205   Epoch: 10   Global Step: 55340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:11,911-Speed 5561.22 samples/sec   Loss 5.6005   LearningRate 0.0205   Epoch: 10   Global Step: 55350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:13,751-Speed 5568.79 samples/sec   Loss 5.5765   LearningRate 0.0205   Epoch: 10   Global Step: 55360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:15,621-Speed 5477.81 samples/sec   Loss 5.6002   LearningRate 0.0205   Epoch: 10   Global Step: 55370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:17,464-Speed 5558.83 samples/sec   Loss 5.6818   LearningRate 0.0205   Epoch: 10   Global Step: 55380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:19,321-Speed 5517.58 samples/sec   Loss 5.5366   LearningRate 0.0205   Epoch: 10   Global Step: 55390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:21,173-Speed 5532.25 samples/sec   Loss 5.7954   LearningRate 0.0205   Epoch: 10   Global Step: 55400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:23,045-Speed 5472.42 samples/sec   Loss 5.7020   LearningRate 0.0205   Epoch: 10   Global Step: 55410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:24,894-Speed 5540.36 samples/sec   Loss 5.5624   LearningRate 0.0204   Epoch: 10   Global Step: 55420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:26,760-Speed 5490.69 samples/sec   Loss 5.7360   LearningRate 0.0204   Epoch: 10   Global Step: 55430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:49:28,601-Speed 5566.19 samples/sec   Loss 5.5792   LearningRate 0.0204   Epoch: 10   Global Step: 55440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:30,463-Speed 5501.87 samples/sec   Loss 5.5037   LearningRate 0.0204   Epoch: 10   Global Step: 55450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:32,308-Speed 5553.34 samples/sec   Loss 5.8643   LearningRate 0.0204   Epoch: 10   Global Step: 55460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:34,162-Speed 5527.22 samples/sec   Loss 5.6429   LearningRate 0.0204   Epoch: 10   Global Step: 55470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:36,035-Speed 5469.86 samples/sec   Loss 5.5198   LearningRate 0.0204   Epoch: 10   Global Step: 55480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:37,898-Speed 5500.47 samples/sec   Loss 5.5862   LearningRate 0.0204   Epoch: 10   Global Step: 55490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:39,755-Speed 5514.78 samples/sec   Loss 5.5750   LearningRate 0.0204   Epoch: 10   Global Step: 55500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:41,615-Speed 5509.17 samples/sec   Loss 5.7029   LearningRate 0.0204   Epoch: 10   Global Step: 55510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:49:43,459-Speed 5555.18 samples/sec   Loss 5.6983   LearningRate 0.0204   Epoch: 10   Global Step: 55520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:49:45,319-Speed 5509.19 samples/sec   Loss 5.6516   LearningRate 0.0203   Epoch: 10   Global Step: 55530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:49:47,167-Speed 5545.59 samples/sec   Loss 5.5986   LearningRate 0.0203   Epoch: 10   Global Step: 55540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:49:49,022-Speed 5523.28 samples/sec   Loss 5.5851   LearningRate 0.0203   Epoch: 10   Global Step: 55550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:49:50,865-Speed 5559.01 samples/sec   Loss 5.7020   LearningRate 0.0203   Epoch: 10   Global Step: 55560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:49:52,718-Speed 5530.23 samples/sec   Loss 5.7410   LearningRate 0.0203   Epoch: 10   Global Step: 55570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:49:54,568-Speed 5538.00 samples/sec   Loss 5.6379   LearningRate 0.0203   Epoch: 10   Global Step: 55580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:49:56,429-Speed 5507.29 samples/sec   Loss 5.6747   LearningRate 0.0203   Epoch: 10   Global Step: 55590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:49:58,267-Speed 5574.55 samples/sec   Loss 5.5762   LearningRate 0.0203   Epoch: 10   Global Step: 55600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:00,151-Speed 5437.57 samples/sec   Loss 5.6034   LearningRate 0.0203   Epoch: 10   Global Step: 55610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:01,993-Speed 5561.13 samples/sec   Loss 5.4683   LearningRate 0.0203   Epoch: 10   Global Step: 55620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:03,918-Speed 5322.49 samples/sec   Loss 5.6437   LearningRate 0.0203   Epoch: 10   Global Step: 55630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:15,448-Speed 888.29 samples/sec   Loss 5.4565   LearningRate 0.0202   Epoch: 11   Global Step: 55640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:17,336-Speed 5427.77 samples/sec   Loss 4.8766   LearningRate 0.0202   Epoch: 11   Global Step: 55650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:19,207-Speed 5477.25 samples/sec   Loss 4.8261   LearningRate 0.0202   Epoch: 11   Global Step: 55660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:21,181-Speed 5188.49 samples/sec   Loss 4.7381   LearningRate 0.0202   Epoch: 11   Global Step: 55670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:23,060-Speed 5454.76 samples/sec   Loss 4.7083   LearningRate 0.0202   Epoch: 11   Global Step: 55680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:24,966-Speed 5375.99 samples/sec   Loss 4.8162   LearningRate 0.0202   Epoch: 11   Global Step: 55690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:26,831-Speed 5494.35 samples/sec   Loss 4.7462   LearningRate 0.0202   Epoch: 11   Global Step: 55700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:28,687-Speed 5518.91 samples/sec   Loss 4.7164   LearningRate 0.0202   Epoch: 11   Global Step: 55710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:30,550-Speed 5505.95 samples/sec   Loss 4.8561   LearningRate 0.0202   Epoch: 11   Global Step: 55720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:32,397-Speed 5551.61 samples/sec   Loss 4.6588   LearningRate 0.0202   Epoch: 11   Global Step: 55730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:34,254-Speed 5517.10 samples/sec   Loss 4.7921   LearningRate 0.0202   Epoch: 11   Global Step: 55740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:36,112-Speed 5514.08 samples/sec   Loss 4.8635   LearningRate 0.0202   Epoch: 11   Global Step: 55750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:37,988-Speed 5463.55 samples/sec   Loss 4.8222   LearningRate 0.0201   Epoch: 11   Global Step: 55760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:50:39,875-Speed 5429.70 samples/sec   Loss 4.7493   LearningRate 0.0201   Epoch: 11   Global Step: 55770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:41,756-Speed 5447.61 samples/sec   Loss 4.9538   LearningRate 0.0201   Epoch: 11   Global Step: 55780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:43,597-Speed 5564.34 samples/sec   Loss 4.8864   LearningRate 0.0201   Epoch: 11   Global Step: 55790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:45,448-Speed 5536.81 samples/sec   Loss 4.7755   LearningRate 0.0201   Epoch: 11   Global Step: 55800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:47,288-Speed 5566.76 samples/sec   Loss 4.7998   LearningRate 0.0201   Epoch: 11   Global Step: 55810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:49,153-Speed 5494.80 samples/sec   Loss 4.8333   LearningRate 0.0201   Epoch: 11   Global Step: 55820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:51,011-Speed 5513.14 samples/sec   Loss 4.9341   LearningRate 0.0201   Epoch: 11   Global Step: 55830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:52,890-Speed 5452.96 samples/sec   Loss 4.9448   LearningRate 0.0201   Epoch: 11   Global Step: 55840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:54,742-Speed 5531.51 samples/sec   Loss 4.7695   LearningRate 0.0201   Epoch: 11   Global Step: 55850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:56,622-Speed 5448.99 samples/sec   Loss 4.8345   LearningRate 0.0201   Epoch: 11   Global Step: 55860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:50:58,479-Speed 5516.75 samples/sec   Loss 4.9084   LearningRate 0.0200   Epoch: 11   Global Step: 55870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:51:00,342-Speed 5501.06 samples/sec   Loss 4.8106   LearningRate 0.0200   Epoch: 11   Global Step: 55880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:51:02,228-Speed 5432.97 samples/sec   Loss 4.8785   LearningRate 0.0200   Epoch: 11   Global Step: 55890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:51:04,090-Speed 5502.86 samples/sec   Loss 4.9176   LearningRate 0.0200   Epoch: 11   Global Step: 55900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:51:05,964-Speed 5466.71 samples/sec   Loss 4.8908   LearningRate 0.0200   Epoch: 11   Global Step: 55910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:51:07,832-Speed 5485.95 samples/sec   Loss 4.9080   LearningRate 0.0200   Epoch: 11   Global Step: 55920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:51:09,692-Speed 5506.66 samples/sec   Loss 4.9049   LearningRate 0.0200   Epoch: 11   Global Step: 55930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:51:11,555-Speed 5500.13 samples/sec   Loss 4.9196   LearningRate 0.0200   Epoch: 11   Global Step: 55940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:51:13,413-Speed 5518.28 samples/sec   Loss 4.9004   LearningRate 0.0200   Epoch: 11   Global Step: 55950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:51:15,266-Speed 5528.08 samples/sec   Loss 5.0980   LearningRate 0.0200   Epoch: 11   Global Step: 55960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:51:17,133-Speed 5487.70 samples/sec   Loss 4.8800   LearningRate 0.0200   Epoch: 11   Global Step: 55970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:51:18,988-Speed 5523.03 samples/sec   Loss 4.9868   LearningRate 0.0199   Epoch: 11   Global Step: 55980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:51:20,869-Speed 5448.97 samples/sec   Loss 4.9955   LearningRate 0.0199   Epoch: 11   Global Step: 55990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:51:22,720-Speed 5533.16 samples/sec   Loss 5.0226   LearningRate 0.0199   Epoch: 11   Global Step: 56000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:51:49,901-[lfw][56000]XNorm: 22.698495
Training: 2022-04-11 13:51:49,902-[lfw][56000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-11 13:51:49,903-[lfw][56000]Accuracy-Highest: 0.99817
Training: 2022-04-11 13:52:21,409-[cfp_fp][56000]XNorm: 20.081252
Training: 2022-04-11 13:52:21,410-[cfp_fp][56000]Accuracy-Flip: 0.97257+-0.00859
Training: 2022-04-11 13:52:21,411-[cfp_fp][56000]Accuracy-Highest: 0.97257
Training: 2022-04-11 13:52:48,523-[agedb_30][56000]XNorm: 22.154971
Training: 2022-04-11 13:52:48,524-[agedb_30][56000]Accuracy-Flip: 0.97733+-0.00651
Training: 2022-04-11 13:52:48,525-[agedb_30][56000]Accuracy-Highest: 0.97817
Training: 2022-04-11 13:52:50,408-Speed 116.78 samples/sec   Loss 5.0368   LearningRate 0.0199   Epoch: 11   Global Step: 56010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:52:52,269-Speed 5507.43 samples/sec   Loss 5.0220   LearningRate 0.0199   Epoch: 11   Global Step: 56020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:52:54,130-Speed 5504.77 samples/sec   Loss 4.9117   LearningRate 0.0199   Epoch: 11   Global Step: 56030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:52:55,971-Speed 5565.46 samples/sec   Loss 5.0721   LearningRate 0.0199   Epoch: 11   Global Step: 56040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:52:57,834-Speed 5499.91 samples/sec   Loss 5.0192   LearningRate 0.0199   Epoch: 11   Global Step: 56050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:52:59,692-Speed 5515.35 samples/sec   Loss 5.0632   LearningRate 0.0199   Epoch: 11   Global Step: 56060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:01,598-Speed 5373.68 samples/sec   Loss 5.0544   LearningRate 0.0199   Epoch: 11   Global Step: 56070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:03,438-Speed 5571.55 samples/sec   Loss 5.0412   LearningRate 0.0199   Epoch: 11   Global Step: 56080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:05,293-Speed 5521.55 samples/sec   Loss 4.9415   LearningRate 0.0198   Epoch: 11   Global Step: 56090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:07,140-Speed 5549.96 samples/sec   Loss 5.0997   LearningRate 0.0198   Epoch: 11   Global Step: 56100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:08,990-Speed 5536.12 samples/sec   Loss 4.9981   LearningRate 0.0198   Epoch: 11   Global Step: 56110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:10,830-Speed 5569.73 samples/sec   Loss 5.0542   LearningRate 0.0198   Epoch: 11   Global Step: 56120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:12,669-Speed 5568.13 samples/sec   Loss 5.0114   LearningRate 0.0198   Epoch: 11   Global Step: 56130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:14,518-Speed 5543.98 samples/sec   Loss 5.0307   LearningRate 0.0198   Epoch: 11   Global Step: 56140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:16,366-Speed 5545.16 samples/sec   Loss 4.8961   LearningRate 0.0198   Epoch: 11   Global Step: 56150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:18,262-Speed 5403.47 samples/sec   Loss 5.0311   LearningRate 0.0198   Epoch: 11   Global Step: 56160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:20,101-Speed 5571.92 samples/sec   Loss 5.0841   LearningRate 0.0198   Epoch: 11   Global Step: 56170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:21,935-Speed 5584.49 samples/sec   Loss 5.0687   LearningRate 0.0198   Epoch: 11   Global Step: 56180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:23,778-Speed 5561.53 samples/sec   Loss 5.0130   LearningRate 0.0198   Epoch: 11   Global Step: 56190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:25,646-Speed 5485.53 samples/sec   Loss 5.1587   LearningRate 0.0198   Epoch: 11   Global Step: 56200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:27,531-Speed 5433.88 samples/sec   Loss 5.1617   LearningRate 0.0197   Epoch: 11   Global Step: 56210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:29,381-Speed 5536.91 samples/sec   Loss 5.1047   LearningRate 0.0197   Epoch: 11   Global Step: 56220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:31,247-Speed 5492.65 samples/sec   Loss 5.1309   LearningRate 0.0197   Epoch: 11   Global Step: 56230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:33,090-Speed 5558.74 samples/sec   Loss 4.9600   LearningRate 0.0197   Epoch: 11   Global Step: 56240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:34,958-Speed 5487.88 samples/sec   Loss 4.9563   LearningRate 0.0197   Epoch: 11   Global Step: 56250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:36,815-Speed 5518.55 samples/sec   Loss 5.0720   LearningRate 0.0197   Epoch: 11   Global Step: 56260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:38,688-Speed 5471.26 samples/sec   Loss 5.1117   LearningRate 0.0197   Epoch: 11   Global Step: 56270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:40,544-Speed 5519.63 samples/sec   Loss 4.9760   LearningRate 0.0197   Epoch: 11   Global Step: 56280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:42,398-Speed 5524.12 samples/sec   Loss 4.9858   LearningRate 0.0197   Epoch: 11   Global Step: 56290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:44,230-Speed 5595.96 samples/sec   Loss 5.1267   LearningRate 0.0197   Epoch: 11   Global Step: 56300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:46,112-Speed 5443.60 samples/sec   Loss 5.1778   LearningRate 0.0197   Epoch: 11   Global Step: 56310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:47,948-Speed 5581.20 samples/sec   Loss 5.0369   LearningRate 0.0196   Epoch: 11   Global Step: 56320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:49,830-Speed 5442.99 samples/sec   Loss 4.9593   LearningRate 0.0196   Epoch: 11   Global Step: 56330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:51,699-Speed 5482.16 samples/sec   Loss 4.9747   LearningRate 0.0196   Epoch: 11   Global Step: 56340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:53,546-Speed 5550.42 samples/sec   Loss 4.9728   LearningRate 0.0196   Epoch: 11   Global Step: 56350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:55,398-Speed 5532.57 samples/sec   Loss 5.0192   LearningRate 0.0196   Epoch: 11   Global Step: 56360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:53:57,243-Speed 5552.52 samples/sec   Loss 5.1488   LearningRate 0.0196   Epoch: 11   Global Step: 56370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:53:59,098-Speed 5522.87 samples/sec   Loss 5.2059   LearningRate 0.0196   Epoch: 11   Global Step: 56380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:54:00,952-Speed 5526.48 samples/sec   Loss 5.0333   LearningRate 0.0196   Epoch: 11   Global Step: 56390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:54:02,829-Speed 5458.04 samples/sec   Loss 5.0774   LearningRate 0.0196   Epoch: 11   Global Step: 56400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:54:04,697-Speed 5485.85 samples/sec   Loss 4.9367   LearningRate 0.0196   Epoch: 11   Global Step: 56410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:54:06,537-Speed 5566.39 samples/sec   Loss 5.1707   LearningRate 0.0196   Epoch: 11   Global Step: 56420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:54:08,391-Speed 5529.33 samples/sec   Loss 5.0469   LearningRate 0.0196   Epoch: 11   Global Step: 56430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:54:10,244-Speed 5531.75 samples/sec   Loss 5.0769   LearningRate 0.0195   Epoch: 11   Global Step: 56440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:54:12,115-Speed 5475.61 samples/sec   Loss 5.1499   LearningRate 0.0195   Epoch: 11   Global Step: 56450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:54:13,973-Speed 5517.33 samples/sec   Loss 5.3538   LearningRate 0.0195   Epoch: 11   Global Step: 56460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:54:15,885-Speed 5360.95 samples/sec   Loss 5.1565   LearningRate 0.0195   Epoch: 11   Global Step: 56470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:17,725-Speed 5567.34 samples/sec   Loss 5.1710   LearningRate 0.0195   Epoch: 11   Global Step: 56480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:19,603-Speed 5458.64 samples/sec   Loss 4.9558   LearningRate 0.0195   Epoch: 11   Global Step: 56490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:21,451-Speed 5542.82 samples/sec   Loss 5.0597   LearningRate 0.0195   Epoch: 11   Global Step: 56500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:23,317-Speed 5494.45 samples/sec   Loss 5.2198   LearningRate 0.0195   Epoch: 11   Global Step: 56510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:25,162-Speed 5555.46 samples/sec   Loss 5.1907   LearningRate 0.0195   Epoch: 11   Global Step: 56520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:27,038-Speed 5461.51 samples/sec   Loss 5.1977   LearningRate 0.0195   Epoch: 11   Global Step: 56530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:28,908-Speed 5482.15 samples/sec   Loss 5.1429   LearningRate 0.0195   Epoch: 11   Global Step: 56540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:30,797-Speed 5425.12 samples/sec   Loss 5.0642   LearningRate 0.0194   Epoch: 11   Global Step: 56550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:32,652-Speed 5522.05 samples/sec   Loss 5.1963   LearningRate 0.0194   Epoch: 11   Global Step: 56560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:34,503-Speed 5537.48 samples/sec   Loss 5.2143   LearningRate 0.0194   Epoch: 11   Global Step: 56570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:54:36,380-Speed 5458.88 samples/sec   Loss 5.1822   LearningRate 0.0194   Epoch: 11   Global Step: 56580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:38,219-Speed 5570.67 samples/sec   Loss 5.2703   LearningRate 0.0194   Epoch: 11   Global Step: 56590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:40,104-Speed 5435.88 samples/sec   Loss 5.2754   LearningRate 0.0194   Epoch: 11   Global Step: 56600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:41,969-Speed 5491.45 samples/sec   Loss 5.0775   LearningRate 0.0194   Epoch: 11   Global Step: 56610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:43,837-Speed 5486.59 samples/sec   Loss 5.1119   LearningRate 0.0194   Epoch: 11   Global Step: 56620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:45,679-Speed 5562.33 samples/sec   Loss 5.1683   LearningRate 0.0194   Epoch: 11   Global Step: 56630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:47,551-Speed 5474.85 samples/sec   Loss 5.1847   LearningRate 0.0194   Epoch: 11   Global Step: 56640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:49,394-Speed 5559.34 samples/sec   Loss 5.2792   LearningRate 0.0194   Epoch: 11   Global Step: 56650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:51,292-Speed 5397.26 samples/sec   Loss 5.0243   LearningRate 0.0194   Epoch: 11   Global Step: 56660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:53,214-Speed 5329.72 samples/sec   Loss 5.2386   LearningRate 0.0193   Epoch: 11   Global Step: 56670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:55,212-Speed 5129.80 samples/sec   Loss 5.1173   LearningRate 0.0193   Epoch: 11   Global Step: 56680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:57,055-Speed 5558.51 samples/sec   Loss 5.0834   LearningRate 0.0193   Epoch: 11   Global Step: 56690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:54:58,914-Speed 5515.84 samples/sec   Loss 5.1004   LearningRate 0.0193   Epoch: 11   Global Step: 56700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:00,774-Speed 5507.25 samples/sec   Loss 5.1337   LearningRate 0.0193   Epoch: 11   Global Step: 56710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:02,633-Speed 5510.43 samples/sec   Loss 5.1408   LearningRate 0.0193   Epoch: 11   Global Step: 56720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:04,476-Speed 5561.70 samples/sec   Loss 5.2336   LearningRate 0.0193   Epoch: 11   Global Step: 56730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:06,339-Speed 5498.31 samples/sec   Loss 5.0605   LearningRate 0.0193   Epoch: 11   Global Step: 56740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:08,174-Speed 5581.71 samples/sec   Loss 5.1411   LearningRate 0.0193   Epoch: 11   Global Step: 56750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:10,020-Speed 5551.34 samples/sec   Loss 5.1433   LearningRate 0.0193   Epoch: 11   Global Step: 56760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:11,892-Speed 5472.37 samples/sec   Loss 5.2229   LearningRate 0.0193   Epoch: 11   Global Step: 56770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:13,772-Speed 5451.19 samples/sec   Loss 5.1995   LearningRate 0.0192   Epoch: 11   Global Step: 56780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:15,616-Speed 5557.24 samples/sec   Loss 5.1443   LearningRate 0.0192   Epoch: 11   Global Step: 56790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:17,488-Speed 5470.40 samples/sec   Loss 5.1652   LearningRate 0.0192   Epoch: 11   Global Step: 56800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:19,332-Speed 5558.46 samples/sec   Loss 5.3388   LearningRate 0.0192   Epoch: 11   Global Step: 56810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:21,169-Speed 5575.39 samples/sec   Loss 5.2346   LearningRate 0.0192   Epoch: 11   Global Step: 56820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:23,024-Speed 5522.91 samples/sec   Loss 5.2074   LearningRate 0.0192   Epoch: 11   Global Step: 56830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:24,913-Speed 5425.68 samples/sec   Loss 5.2893   LearningRate 0.0192   Epoch: 11   Global Step: 56840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:26,777-Speed 5496.03 samples/sec   Loss 5.1598   LearningRate 0.0192   Epoch: 11   Global Step: 56850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:28,630-Speed 5525.40 samples/sec   Loss 5.1656   LearningRate 0.0192   Epoch: 11   Global Step: 56860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:30,523-Speed 5414.78 samples/sec   Loss 5.2706   LearningRate 0.0192   Epoch: 11   Global Step: 56870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:32,360-Speed 5576.82 samples/sec   Loss 5.2635   LearningRate 0.0192   Epoch: 11   Global Step: 56880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:34,227-Speed 5488.92 samples/sec   Loss 5.1104   LearningRate 0.0192   Epoch: 11   Global Step: 56890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:36,080-Speed 5527.15 samples/sec   Loss 5.3019   LearningRate 0.0191   Epoch: 11   Global Step: 56900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:37,963-Speed 5442.30 samples/sec   Loss 5.2049   LearningRate 0.0191   Epoch: 11   Global Step: 56910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:39,819-Speed 5521.29 samples/sec   Loss 5.1693   LearningRate 0.0191   Epoch: 11   Global Step: 56920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:41,681-Speed 5500.47 samples/sec   Loss 5.2200   LearningRate 0.0191   Epoch: 11   Global Step: 56930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:43,522-Speed 5567.62 samples/sec   Loss 5.2132   LearningRate 0.0191   Epoch: 11   Global Step: 56940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:45,388-Speed 5489.14 samples/sec   Loss 5.1438   LearningRate 0.0191   Epoch: 11   Global Step: 56950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:47,249-Speed 5506.14 samples/sec   Loss 5.2103   LearningRate 0.0191   Epoch: 11   Global Step: 56960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:49,109-Speed 5511.52 samples/sec   Loss 5.0598   LearningRate 0.0191   Epoch: 11   Global Step: 56970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:50,986-Speed 5459.82 samples/sec   Loss 5.2323   LearningRate 0.0191   Epoch: 11   Global Step: 56980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:52,833-Speed 5544.39 samples/sec   Loss 5.0846   LearningRate 0.0191   Epoch: 11   Global Step: 56990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:55:54,707-Speed 5468.63 samples/sec   Loss 5.2632   LearningRate 0.0191   Epoch: 11   Global Step: 57000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:56,554-Speed 5546.71 samples/sec   Loss 5.3143   LearningRate 0.0190   Epoch: 11   Global Step: 57010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:55:58,390-Speed 5578.95 samples/sec   Loss 5.2309   LearningRate 0.0190   Epoch: 11   Global Step: 57020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:00,253-Speed 5500.46 samples/sec   Loss 5.2073   LearningRate 0.0190   Epoch: 11   Global Step: 57030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:02,095-Speed 5562.79 samples/sec   Loss 5.2878   LearningRate 0.0190   Epoch: 11   Global Step: 57040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:03,967-Speed 5472.78 samples/sec   Loss 5.1894   LearningRate 0.0190   Epoch: 11   Global Step: 57050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:05,801-Speed 5587.28 samples/sec   Loss 5.2012   LearningRate 0.0190   Epoch: 11   Global Step: 57060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:07,646-Speed 5550.82 samples/sec   Loss 5.1947   LearningRate 0.0190   Epoch: 11   Global Step: 57070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:09,493-Speed 5547.75 samples/sec   Loss 5.1657   LearningRate 0.0190   Epoch: 11   Global Step: 57080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:11,363-Speed 5478.05 samples/sec   Loss 5.2611   LearningRate 0.0190   Epoch: 11   Global Step: 57090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:13,217-Speed 5525.85 samples/sec   Loss 5.2955   LearningRate 0.0190   Epoch: 11   Global Step: 57100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:15,091-Speed 5468.72 samples/sec   Loss 5.1347   LearningRate 0.0190   Epoch: 11   Global Step: 57110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:16,953-Speed 5502.63 samples/sec   Loss 5.3464   LearningRate 0.0190   Epoch: 11   Global Step: 57120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:18,808-Speed 5523.59 samples/sec   Loss 5.2014   LearningRate 0.0189   Epoch: 11   Global Step: 57130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:20,648-Speed 5568.28 samples/sec   Loss 5.2178   LearningRate 0.0189   Epoch: 11   Global Step: 57140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:22,507-Speed 5509.73 samples/sec   Loss 5.1779   LearningRate 0.0189   Epoch: 11   Global Step: 57150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:24,360-Speed 5531.35 samples/sec   Loss 5.1715   LearningRate 0.0189   Epoch: 11   Global Step: 57160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:26,204-Speed 5554.17 samples/sec   Loss 5.3239   LearningRate 0.0189   Epoch: 11   Global Step: 57170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:28,063-Speed 5510.78 samples/sec   Loss 5.2199   LearningRate 0.0189   Epoch: 11   Global Step: 57180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:29,902-Speed 5573.00 samples/sec   Loss 5.2332   LearningRate 0.0189   Epoch: 11   Global Step: 57190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:31,773-Speed 5477.10 samples/sec   Loss 5.2913   LearningRate 0.0189   Epoch: 11   Global Step: 57200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:33,615-Speed 5560.77 samples/sec   Loss 5.1978   LearningRate 0.0189   Epoch: 11   Global Step: 57210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:35,485-Speed 5480.26 samples/sec   Loss 5.1838   LearningRate 0.0189   Epoch: 11   Global Step: 57220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:37,344-Speed 5509.53 samples/sec   Loss 5.2105   LearningRate 0.0189   Epoch: 11   Global Step: 57230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:39,223-Speed 5452.86 samples/sec   Loss 5.1171   LearningRate 0.0188   Epoch: 11   Global Step: 57240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:56:41,092-Speed 5481.19 samples/sec   Loss 5.3231   LearningRate 0.0188   Epoch: 11   Global Step: 57250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:42,943-Speed 5534.95 samples/sec   Loss 5.2371   LearningRate 0.0188   Epoch: 11   Global Step: 57260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:44,814-Speed 5520.14 samples/sec   Loss 5.2273   LearningRate 0.0188   Epoch: 11   Global Step: 57270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:46,654-Speed 5570.33 samples/sec   Loss 5.1047   LearningRate 0.0188   Epoch: 11   Global Step: 57280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:48,504-Speed 5535.03 samples/sec   Loss 5.2167   LearningRate 0.0188   Epoch: 11   Global Step: 57290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:50,364-Speed 5509.50 samples/sec   Loss 5.3135   LearningRate 0.0188   Epoch: 11   Global Step: 57300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:52,220-Speed 5520.00 samples/sec   Loss 5.1562   LearningRate 0.0188   Epoch: 11   Global Step: 57310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:54,069-Speed 5543.47 samples/sec   Loss 5.2181   LearningRate 0.0188   Epoch: 11   Global Step: 57320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:55,940-Speed 5476.74 samples/sec   Loss 5.2316   LearningRate 0.0188   Epoch: 11   Global Step: 57330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:57,782-Speed 5559.97 samples/sec   Loss 5.2226   LearningRate 0.0188   Epoch: 11   Global Step: 57340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:56:59,650-Speed 5487.62 samples/sec   Loss 5.4202   LearningRate 0.0188   Epoch: 11   Global Step: 57350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:01,494-Speed 5553.62 samples/sec   Loss 5.3441   LearningRate 0.0187   Epoch: 11   Global Step: 57360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:03,358-Speed 5496.69 samples/sec   Loss 5.2950   LearningRate 0.0187   Epoch: 11   Global Step: 57370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:05,228-Speed 5481.09 samples/sec   Loss 5.1390   LearningRate 0.0187   Epoch: 11   Global Step: 57380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:07,076-Speed 5541.55 samples/sec   Loss 5.2847   LearningRate 0.0187   Epoch: 11   Global Step: 57390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:08,938-Speed 5503.05 samples/sec   Loss 5.2874   LearningRate 0.0187   Epoch: 11   Global Step: 57400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:10,792-Speed 5528.47 samples/sec   Loss 5.2004   LearningRate 0.0187   Epoch: 11   Global Step: 57410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:12,653-Speed 5505.12 samples/sec   Loss 5.2390   LearningRate 0.0187   Epoch: 11   Global Step: 57420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:14,526-Speed 5470.30 samples/sec   Loss 5.2204   LearningRate 0.0187   Epoch: 11   Global Step: 57430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:16,381-Speed 5522.17 samples/sec   Loss 5.3566   LearningRate 0.0187   Epoch: 11   Global Step: 57440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:18,221-Speed 5567.57 samples/sec   Loss 5.2728   LearningRate 0.0187   Epoch: 11   Global Step: 57450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:20,075-Speed 5527.16 samples/sec   Loss 5.2827   LearningRate 0.0187   Epoch: 11   Global Step: 57460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:21,924-Speed 5541.70 samples/sec   Loss 5.2125   LearningRate 0.0187   Epoch: 11   Global Step: 57470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:23,767-Speed 5557.88 samples/sec   Loss 5.3858   LearningRate 0.0186   Epoch: 11   Global Step: 57480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:25,658-Speed 5418.64 samples/sec   Loss 5.2641   LearningRate 0.0186   Epoch: 11   Global Step: 57490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:27,515-Speed 5517.62 samples/sec   Loss 5.3542   LearningRate 0.0186   Epoch: 11   Global Step: 57500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:29,368-Speed 5532.58 samples/sec   Loss 5.2786   LearningRate 0.0186   Epoch: 11   Global Step: 57510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:31,210-Speed 5560.83 samples/sec   Loss 5.3003   LearningRate 0.0186   Epoch: 11   Global Step: 57520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:33,054-Speed 5556.97 samples/sec   Loss 5.2973   LearningRate 0.0186   Epoch: 11   Global Step: 57530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:34,906-Speed 5533.70 samples/sec   Loss 5.0813   LearningRate 0.0186   Epoch: 11   Global Step: 57540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:36,745-Speed 5570.37 samples/sec   Loss 5.2694   LearningRate 0.0186   Epoch: 11   Global Step: 57550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:38,625-Speed 5449.92 samples/sec   Loss 5.1571   LearningRate 0.0186   Epoch: 11   Global Step: 57560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:40,474-Speed 5542.19 samples/sec   Loss 5.2293   LearningRate 0.0186   Epoch: 11   Global Step: 57570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:42,344-Speed 5478.79 samples/sec   Loss 5.3060   LearningRate 0.0186   Epoch: 11   Global Step: 57580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:44,191-Speed 5546.21 samples/sec   Loss 5.2729   LearningRate 0.0186   Epoch: 11   Global Step: 57590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:46,041-Speed 5538.05 samples/sec   Loss 5.2918   LearningRate 0.0185   Epoch: 11   Global Step: 57600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:47,883-Speed 5561.75 samples/sec   Loss 5.1600   LearningRate 0.0185   Epoch: 11   Global Step: 57610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:49,777-Speed 5409.98 samples/sec   Loss 5.2322   LearningRate 0.0185   Epoch: 11   Global Step: 57620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:51,630-Speed 5530.64 samples/sec   Loss 5.2872   LearningRate 0.0185   Epoch: 11   Global Step: 57630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:57:53,487-Speed 5516.15 samples/sec   Loss 5.2342   LearningRate 0.0185   Epoch: 11   Global Step: 57640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:55,338-Speed 5536.69 samples/sec   Loss 5.3955   LearningRate 0.0185   Epoch: 11   Global Step: 57650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:57,208-Speed 5479.45 samples/sec   Loss 5.2178   LearningRate 0.0185   Epoch: 11   Global Step: 57660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:57:59,070-Speed 5505.41 samples/sec   Loss 5.3225   LearningRate 0.0185   Epoch: 11   Global Step: 57670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:00,910-Speed 5568.16 samples/sec   Loss 5.3002   LearningRate 0.0185   Epoch: 11   Global Step: 57680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:02,752-Speed 5560.59 samples/sec   Loss 5.1745   LearningRate 0.0185   Epoch: 11   Global Step: 57690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:04,624-Speed 5473.62 samples/sec   Loss 5.2734   LearningRate 0.0185   Epoch: 11   Global Step: 57700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:06,493-Speed 5482.19 samples/sec   Loss 5.3403   LearningRate 0.0184   Epoch: 11   Global Step: 57710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:08,346-Speed 5527.29 samples/sec   Loss 5.3031   LearningRate 0.0184   Epoch: 11   Global Step: 57720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:10,204-Speed 5515.48 samples/sec   Loss 5.3202   LearningRate 0.0184   Epoch: 11   Global Step: 57730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:12,056-Speed 5532.27 samples/sec   Loss 5.3653   LearningRate 0.0184   Epoch: 11   Global Step: 57740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:13,942-Speed 5432.90 samples/sec   Loss 5.2589   LearningRate 0.0184   Epoch: 11   Global Step: 57750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:15,788-Speed 5547.52 samples/sec   Loss 5.2402   LearningRate 0.0184   Epoch: 11   Global Step: 57760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:17,634-Speed 5551.28 samples/sec   Loss 5.3333   LearningRate 0.0184   Epoch: 11   Global Step: 57770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:19,481-Speed 5546.20 samples/sec   Loss 5.2912   LearningRate 0.0184   Epoch: 11   Global Step: 57780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:21,327-Speed 5549.00 samples/sec   Loss 5.2963   LearningRate 0.0184   Epoch: 11   Global Step: 57790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:23,177-Speed 5536.77 samples/sec   Loss 5.3221   LearningRate 0.0184   Epoch: 11   Global Step: 57800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:25,047-Speed 5484.30 samples/sec   Loss 5.2341   LearningRate 0.0184   Epoch: 11   Global Step: 57810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:26,883-Speed 5578.40 samples/sec   Loss 5.1913   LearningRate 0.0184   Epoch: 11   Global Step: 57820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:28,728-Speed 5552.98 samples/sec   Loss 5.2734   LearningRate 0.0183   Epoch: 11   Global Step: 57830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:30,577-Speed 5539.57 samples/sec   Loss 5.4204   LearningRate 0.0183   Epoch: 11   Global Step: 57840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 13:58:32,411-Speed 5586.91 samples/sec   Loss 5.2244   LearningRate 0.0183   Epoch: 11   Global Step: 57850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:34,262-Speed 5535.55 samples/sec   Loss 5.1822   LearningRate 0.0183   Epoch: 11   Global Step: 57860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:36,115-Speed 5528.98 samples/sec   Loss 5.2571   LearningRate 0.0183   Epoch: 11   Global Step: 57870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:37,981-Speed 5489.11 samples/sec   Loss 5.2627   LearningRate 0.0183   Epoch: 11   Global Step: 57880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:39,841-Speed 5508.97 samples/sec   Loss 5.2782   LearningRate 0.0183   Epoch: 11   Global Step: 57890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:41,702-Speed 5506.10 samples/sec   Loss 5.3433   LearningRate 0.0183   Epoch: 11   Global Step: 57900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 13:58:43,549-Speed 5546.03 samples/sec   Loss 5.2234   LearningRate 0.0183   Epoch: 11   Global Step: 57910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:45,396-Speed 5546.12 samples/sec   Loss 5.2581   LearningRate 0.0183   Epoch: 11   Global Step: 57920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:47,253-Speed 5518.41 samples/sec   Loss 5.3636   LearningRate 0.0183   Epoch: 11   Global Step: 57930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:49,111-Speed 5513.51 samples/sec   Loss 5.3622   LearningRate 0.0183   Epoch: 11   Global Step: 57940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:50,957-Speed 5550.68 samples/sec   Loss 5.2782   LearningRate 0.0182   Epoch: 11   Global Step: 57950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:52,829-Speed 5474.04 samples/sec   Loss 5.3166   LearningRate 0.0182   Epoch: 11   Global Step: 57960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:54,680-Speed 5534.61 samples/sec   Loss 5.3564   LearningRate 0.0182   Epoch: 11   Global Step: 57970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:56,537-Speed 5518.89 samples/sec   Loss 5.2957   LearningRate 0.0182   Epoch: 11   Global Step: 57980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:58:58,384-Speed 5544.43 samples/sec   Loss 5.3294   LearningRate 0.0182   Epoch: 11   Global Step: 57990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 13:59:00,217-Speed 5590.53 samples/sec   Loss 5.3350   LearningRate 0.0182   Epoch: 11   Global Step: 58000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 13:59:27,638-[lfw][58000]XNorm: 22.861997
Training: 2022-04-11 13:59:27,639-[lfw][58000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-11 13:59:27,639-[lfw][58000]Accuracy-Highest: 0.99817
Training: 2022-04-11 13:59:59,210-[cfp_fp][58000]XNorm: 20.234653
Training: 2022-04-11 13:59:59,211-[cfp_fp][58000]Accuracy-Flip: 0.97371+-0.00721
Training: 2022-04-11 13:59:59,212-[cfp_fp][58000]Accuracy-Highest: 0.97371
Training: 2022-04-11 14:00:26,480-[agedb_30][58000]XNorm: 22.841101
Training: 2022-04-11 14:00:26,481-[agedb_30][58000]Accuracy-Flip: 0.97767+-0.00659
Training: 2022-04-11 14:00:26,482-[agedb_30][58000]Accuracy-Highest: 0.97817
Training: 2022-04-11 14:00:28,331-Speed 116.22 samples/sec   Loss 5.2523   LearningRate 0.0182   Epoch: 11   Global Step: 58010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:00:30,171-Speed 5566.63 samples/sec   Loss 5.4575   LearningRate 0.0182   Epoch: 11   Global Step: 58020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:00:32,000-Speed 5602.33 samples/sec   Loss 5.3829   LearningRate 0.0182   Epoch: 11   Global Step: 58030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:00:33,845-Speed 5551.99 samples/sec   Loss 5.2061   LearningRate 0.0182   Epoch: 11   Global Step: 58040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:00:35,671-Speed 5612.80 samples/sec   Loss 5.2360   LearningRate 0.0182   Epoch: 11   Global Step: 58050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:00:37,541-Speed 5479.79 samples/sec   Loss 5.3561   LearningRate 0.0182   Epoch: 11   Global Step: 58060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:00:39,375-Speed 5585.12 samples/sec   Loss 5.3345   LearningRate 0.0181   Epoch: 11   Global Step: 58070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:00:41,223-Speed 5545.84 samples/sec   Loss 5.2834   LearningRate 0.0181   Epoch: 11   Global Step: 58080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:00:43,061-Speed 5572.90 samples/sec   Loss 5.4563   LearningRate 0.0181   Epoch: 11   Global Step: 58090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:00:44,893-Speed 5593.30 samples/sec   Loss 5.1529   LearningRate 0.0181   Epoch: 11   Global Step: 58100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:00:46,731-Speed 5573.67 samples/sec   Loss 5.2414   LearningRate 0.0181   Epoch: 11   Global Step: 58110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:00:48,620-Speed 5425.97 samples/sec   Loss 5.2433   LearningRate 0.0181   Epoch: 11   Global Step: 58120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:00:50,458-Speed 5572.46 samples/sec   Loss 5.4309   LearningRate 0.0181   Epoch: 11   Global Step: 58130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:00:52,322-Speed 5498.06 samples/sec   Loss 5.2795   LearningRate 0.0181   Epoch: 11   Global Step: 58140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:00:54,158-Speed 5580.79 samples/sec   Loss 5.2789   LearningRate 0.0181   Epoch: 11   Global Step: 58150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:00:56,022-Speed 5493.67 samples/sec   Loss 5.2987   LearningRate 0.0181   Epoch: 11   Global Step: 58160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:00:57,854-Speed 5594.31 samples/sec   Loss 5.2817   LearningRate 0.0181   Epoch: 11   Global Step: 58170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:00:59,715-Speed 5506.28 samples/sec   Loss 5.2079   LearningRate 0.0181   Epoch: 11   Global Step: 58180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:01,581-Speed 5491.56 samples/sec   Loss 5.2639   LearningRate 0.0180   Epoch: 11   Global Step: 58190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:03,453-Speed 5471.52 samples/sec   Loss 5.3517   LearningRate 0.0180   Epoch: 11   Global Step: 58200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:05,312-Speed 5512.68 samples/sec   Loss 5.2969   LearningRate 0.0180   Epoch: 11   Global Step: 58210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:07,151-Speed 5570.61 samples/sec   Loss 5.1033   LearningRate 0.0180   Epoch: 11   Global Step: 58220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:09,012-Speed 5506.85 samples/sec   Loss 5.2108   LearningRate 0.0180   Epoch: 11   Global Step: 58230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:10,894-Speed 5445.16 samples/sec   Loss 5.1943   LearningRate 0.0180   Epoch: 11   Global Step: 58240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:12,738-Speed 5558.75 samples/sec   Loss 5.2037   LearningRate 0.0180   Epoch: 11   Global Step: 58250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:14,603-Speed 5493.67 samples/sec   Loss 5.2664   LearningRate 0.0180   Epoch: 11   Global Step: 58260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:16,442-Speed 5573.33 samples/sec   Loss 5.3228   LearningRate 0.0180   Epoch: 11   Global Step: 58270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:18,319-Speed 5458.03 samples/sec   Loss 5.3406   LearningRate 0.0180   Epoch: 11   Global Step: 58280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:20,161-Speed 5563.29 samples/sec   Loss 5.2387   LearningRate 0.0180   Epoch: 11   Global Step: 58290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:22,025-Speed 5495.70 samples/sec   Loss 5.3063   LearningRate 0.0180   Epoch: 11   Global Step: 58300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:23,868-Speed 5560.49 samples/sec   Loss 5.1461   LearningRate 0.0179   Epoch: 11   Global Step: 58310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:25,727-Speed 5509.19 samples/sec   Loss 5.2855   LearningRate 0.0179   Epoch: 11   Global Step: 58320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:27,578-Speed 5537.03 samples/sec   Loss 5.3131   LearningRate 0.0179   Epoch: 11   Global Step: 58330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:29,461-Speed 5439.96 samples/sec   Loss 5.1686   LearningRate 0.0179   Epoch: 11   Global Step: 58340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:31,305-Speed 5554.59 samples/sec   Loss 5.3719   LearningRate 0.0179   Epoch: 11   Global Step: 58350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:33,173-Speed 5486.84 samples/sec   Loss 5.1635   LearningRate 0.0179   Epoch: 11   Global Step: 58360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:35,014-Speed 5564.86 samples/sec   Loss 5.3242   LearningRate 0.0179   Epoch: 11   Global Step: 58370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:36,870-Speed 5521.84 samples/sec   Loss 5.3519   LearningRate 0.0179   Epoch: 11   Global Step: 58380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:38,726-Speed 5519.61 samples/sec   Loss 5.1801   LearningRate 0.0179   Epoch: 11   Global Step: 58390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:40,582-Speed 5520.09 samples/sec   Loss 5.3671   LearningRate 0.0179   Epoch: 11   Global Step: 58400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:42,433-Speed 5534.53 samples/sec   Loss 5.3789   LearningRate 0.0179   Epoch: 11   Global Step: 58410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:01:44,283-Speed 5539.11 samples/sec   Loss 5.1224   LearningRate 0.0179   Epoch: 11   Global Step: 58420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:46,116-Speed 5588.76 samples/sec   Loss 5.1608   LearningRate 0.0178   Epoch: 11   Global Step: 58430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:47,994-Speed 5455.88 samples/sec   Loss 5.3382   LearningRate 0.0178   Epoch: 11   Global Step: 58440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:49,842-Speed 5544.98 samples/sec   Loss 5.3036   LearningRate 0.0178   Epoch: 11   Global Step: 58450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:51,703-Speed 5509.71 samples/sec   Loss 5.2943   LearningRate 0.0178   Epoch: 11   Global Step: 58460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:53,570-Speed 5486.51 samples/sec   Loss 5.3046   LearningRate 0.0178   Epoch: 11   Global Step: 58470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:01:55,406-Speed 5582.12 samples/sec   Loss 5.1696   LearningRate 0.0178   Epoch: 11   Global Step: 58480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:01:57,270-Speed 5495.38 samples/sec   Loss 5.1854   LearningRate 0.0178   Epoch: 11   Global Step: 58490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:01:59,124-Speed 5527.77 samples/sec   Loss 5.2803   LearningRate 0.0178   Epoch: 11   Global Step: 58500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:02:01,002-Speed 5455.74 samples/sec   Loss 5.3543   LearningRate 0.0178   Epoch: 11   Global Step: 58510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:02:02,851-Speed 5541.29 samples/sec   Loss 5.3380   LearningRate 0.0178   Epoch: 11   Global Step: 58520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:02:04,709-Speed 5513.21 samples/sec   Loss 5.1811   LearningRate 0.0178   Epoch: 11   Global Step: 58530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:02:06,546-Speed 5576.94 samples/sec   Loss 5.2931   LearningRate 0.0178   Epoch: 11   Global Step: 58540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:02:08,404-Speed 5515.57 samples/sec   Loss 5.2169   LearningRate 0.0177   Epoch: 11   Global Step: 58550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:02:10,246-Speed 5561.40 samples/sec   Loss 5.4143   LearningRate 0.0177   Epoch: 11   Global Step: 58560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:02:12,109-Speed 5501.37 samples/sec   Loss 5.3068   LearningRate 0.0177   Epoch: 11   Global Step: 58570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:02:13,975-Speed 5490.66 samples/sec   Loss 5.4858   LearningRate 0.0177   Epoch: 11   Global Step: 58580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:15,814-Speed 5568.21 samples/sec   Loss 5.2511   LearningRate 0.0177   Epoch: 11   Global Step: 58590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:17,653-Speed 5574.27 samples/sec   Loss 5.0967   LearningRate 0.0177   Epoch: 11   Global Step: 58600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:19,517-Speed 5494.86 samples/sec   Loss 5.3011   LearningRate 0.0177   Epoch: 11   Global Step: 58610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:21,359-Speed 5562.45 samples/sec   Loss 5.2258   LearningRate 0.0177   Epoch: 11   Global Step: 58620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:23,215-Speed 5520.62 samples/sec   Loss 5.3039   LearningRate 0.0177   Epoch: 11   Global Step: 58630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:25,102-Speed 5428.38 samples/sec   Loss 5.2485   LearningRate 0.0177   Epoch: 11   Global Step: 58640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:26,940-Speed 5574.96 samples/sec   Loss 5.4165   LearningRate 0.0177   Epoch: 11   Global Step: 58650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:28,792-Speed 5531.69 samples/sec   Loss 5.3023   LearningRate 0.0177   Epoch: 11   Global Step: 58660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:30,637-Speed 5552.80 samples/sec   Loss 5.3006   LearningRate 0.0176   Epoch: 11   Global Step: 58670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:32,507-Speed 5481.56 samples/sec   Loss 5.2382   LearningRate 0.0176   Epoch: 11   Global Step: 58680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:02:34,341-Speed 5585.62 samples/sec   Loss 5.3512   LearningRate 0.0176   Epoch: 11   Global Step: 58690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:02:36,218-Speed 5459.53 samples/sec   Loss 5.2825   LearningRate 0.0176   Epoch: 11   Global Step: 58700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:02:38,067-Speed 5542.75 samples/sec   Loss 5.2588   LearningRate 0.0176   Epoch: 11   Global Step: 58710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:02:39,917-Speed 5536.50 samples/sec   Loss 5.3700   LearningRate 0.0176   Epoch: 11   Global Step: 58720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:02:41,767-Speed 5538.24 samples/sec   Loss 5.1588   LearningRate 0.0176   Epoch: 11   Global Step: 58730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:43,607-Speed 5567.58 samples/sec   Loss 5.2051   LearningRate 0.0176   Epoch: 11   Global Step: 58740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:45,445-Speed 5573.75 samples/sec   Loss 5.1018   LearningRate 0.0176   Epoch: 11   Global Step: 58750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:47,310-Speed 5494.97 samples/sec   Loss 5.2005   LearningRate 0.0176   Epoch: 11   Global Step: 58760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:49,154-Speed 5556.29 samples/sec   Loss 5.2609   LearningRate 0.0176   Epoch: 11   Global Step: 58770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:50,986-Speed 5592.27 samples/sec   Loss 5.1655   LearningRate 0.0176   Epoch: 11   Global Step: 58780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:52,855-Speed 5480.24 samples/sec   Loss 5.2600   LearningRate 0.0175   Epoch: 11   Global Step: 58790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:54,720-Speed 5492.00 samples/sec   Loss 5.2698   LearningRate 0.0175   Epoch: 11   Global Step: 58800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:56,570-Speed 5539.43 samples/sec   Loss 5.1911   LearningRate 0.0175   Epoch: 11   Global Step: 58810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:02:58,437-Speed 5488.74 samples/sec   Loss 5.3378   LearningRate 0.0175   Epoch: 11   Global Step: 58820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:00,268-Speed 5591.59 samples/sec   Loss 5.2018   LearningRate 0.0175   Epoch: 11   Global Step: 58830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:02,111-Speed 5561.16 samples/sec   Loss 5.2082   LearningRate 0.0175   Epoch: 11   Global Step: 58840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:03,950-Speed 5572.35 samples/sec   Loss 5.1519   LearningRate 0.0175   Epoch: 11   Global Step: 58850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:05,798-Speed 5544.48 samples/sec   Loss 5.3206   LearningRate 0.0175   Epoch: 11   Global Step: 58860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:07,653-Speed 5524.42 samples/sec   Loss 5.1951   LearningRate 0.0175   Epoch: 11   Global Step: 58870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:09,485-Speed 5589.32 samples/sec   Loss 5.1549   LearningRate 0.0175   Epoch: 11   Global Step: 58880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:11,365-Speed 5449.32 samples/sec   Loss 5.3242   LearningRate 0.0175   Epoch: 11   Global Step: 58890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:13,204-Speed 5573.83 samples/sec   Loss 5.1915   LearningRate 0.0175   Epoch: 11   Global Step: 58900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:15,075-Speed 5475.76 samples/sec   Loss 5.2449   LearningRate 0.0174   Epoch: 11   Global Step: 58910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:16,916-Speed 5566.87 samples/sec   Loss 5.2697   LearningRate 0.0174   Epoch: 11   Global Step: 58920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:18,764-Speed 5540.40 samples/sec   Loss 5.1163   LearningRate 0.0174   Epoch: 11   Global Step: 58930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:20,613-Speed 5542.01 samples/sec   Loss 5.3396   LearningRate 0.0174   Epoch: 11   Global Step: 58940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:22,471-Speed 5516.01 samples/sec   Loss 5.1259   LearningRate 0.0174   Epoch: 11   Global Step: 58950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:24,316-Speed 5552.42 samples/sec   Loss 5.2247   LearningRate 0.0174   Epoch: 11   Global Step: 58960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:26,151-Speed 5583.11 samples/sec   Loss 5.1877   LearningRate 0.0174   Epoch: 11   Global Step: 58970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:28,010-Speed 5512.96 samples/sec   Loss 5.1383   LearningRate 0.0174   Epoch: 11   Global Step: 58980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:29,843-Speed 5587.51 samples/sec   Loss 5.1005   LearningRate 0.0174   Epoch: 11   Global Step: 58990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:31,682-Speed 5571.49 samples/sec   Loss 5.3231   LearningRate 0.0174   Epoch: 11   Global Step: 59000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:33,544-Speed 5504.15 samples/sec   Loss 5.2541   LearningRate 0.0174   Epoch: 11   Global Step: 59010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:35,383-Speed 5570.16 samples/sec   Loss 5.2841   LearningRate 0.0174   Epoch: 11   Global Step: 59020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:37,238-Speed 5526.15 samples/sec   Loss 5.2703   LearningRate 0.0173   Epoch: 11   Global Step: 59030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:39,096-Speed 5514.45 samples/sec   Loss 5.2325   LearningRate 0.0173   Epoch: 11   Global Step: 59040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:40,967-Speed 5476.86 samples/sec   Loss 5.3050   LearningRate 0.0173   Epoch: 11   Global Step: 59050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:42,812-Speed 5553.54 samples/sec   Loss 5.1469   LearningRate 0.0173   Epoch: 11   Global Step: 59060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:03:44,663-Speed 5535.57 samples/sec   Loss 5.4270   LearningRate 0.0173   Epoch: 11   Global Step: 59070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:46,538-Speed 5465.66 samples/sec   Loss 5.3491   LearningRate 0.0173   Epoch: 11   Global Step: 59080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:48,414-Speed 5461.11 samples/sec   Loss 5.0896   LearningRate 0.0173   Epoch: 11   Global Step: 59090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:50,311-Speed 5402.56 samples/sec   Loss 5.2485   LearningRate 0.0173   Epoch: 11   Global Step: 59100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:52,170-Speed 5511.08 samples/sec   Loss 5.2595   LearningRate 0.0173   Epoch: 11   Global Step: 59110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:54,036-Speed 5492.12 samples/sec   Loss 5.3145   LearningRate 0.0173   Epoch: 11   Global Step: 59120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:55,884-Speed 5543.01 samples/sec   Loss 5.0376   LearningRate 0.0173   Epoch: 11   Global Step: 59130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:57,748-Speed 5500.32 samples/sec   Loss 5.2571   LearningRate 0.0173   Epoch: 11   Global Step: 59140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:03:59,607-Speed 5510.68 samples/sec   Loss 5.1882   LearningRate 0.0172   Epoch: 11   Global Step: 59150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:01,446-Speed 5571.31 samples/sec   Loss 5.1662   LearningRate 0.0172   Epoch: 11   Global Step: 59160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:03,333-Speed 5429.43 samples/sec   Loss 5.2233   LearningRate 0.0172   Epoch: 11   Global Step: 59170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:05,179-Speed 5552.50 samples/sec   Loss 5.2324   LearningRate 0.0172   Epoch: 11   Global Step: 59180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:07,082-Speed 5384.62 samples/sec   Loss 5.1578   LearningRate 0.0172   Epoch: 11   Global Step: 59190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:08,913-Speed 5595.77 samples/sec   Loss 5.2627   LearningRate 0.0172   Epoch: 11   Global Step: 59200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:10,777-Speed 5495.64 samples/sec   Loss 5.1264   LearningRate 0.0172   Epoch: 11   Global Step: 59210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:12,699-Speed 5332.54 samples/sec   Loss 5.2335   LearningRate 0.0172   Epoch: 11   Global Step: 59220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:14,563-Speed 5498.26 samples/sec   Loss 5.2875   LearningRate 0.0172   Epoch: 11   Global Step: 59230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:16,400-Speed 5576.16 samples/sec   Loss 5.2110   LearningRate 0.0172   Epoch: 11   Global Step: 59240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:18,241-Speed 5568.52 samples/sec   Loss 5.2516   LearningRate 0.0172   Epoch: 11   Global Step: 59250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:20,077-Speed 5579.95 samples/sec   Loss 5.3318   LearningRate 0.0172   Epoch: 11   Global Step: 59260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:21,947-Speed 5478.61 samples/sec   Loss 5.2702   LearningRate 0.0171   Epoch: 11   Global Step: 59270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:23,793-Speed 5589.03 samples/sec   Loss 5.2822   LearningRate 0.0171   Epoch: 11   Global Step: 59280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:25,663-Speed 5478.48 samples/sec   Loss 5.1617   LearningRate 0.0171   Epoch: 11   Global Step: 59290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:27,511-Speed 5546.48 samples/sec   Loss 5.3365   LearningRate 0.0171   Epoch: 11   Global Step: 59300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:29,357-Speed 5548.33 samples/sec   Loss 5.2786   LearningRate 0.0171   Epoch: 11   Global Step: 59310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:31,236-Speed 5451.09 samples/sec   Loss 5.3341   LearningRate 0.0171   Epoch: 11   Global Step: 59320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:33,073-Speed 5579.47 samples/sec   Loss 5.3778   LearningRate 0.0171   Epoch: 11   Global Step: 59330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:34,918-Speed 5550.56 samples/sec   Loss 5.1893   LearningRate 0.0171   Epoch: 11   Global Step: 59340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:36,778-Speed 5508.26 samples/sec   Loss 5.2022   LearningRate 0.0171   Epoch: 11   Global Step: 59350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:38,614-Speed 5579.83 samples/sec   Loss 5.2556   LearningRate 0.0171   Epoch: 11   Global Step: 59360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:40,478-Speed 5497.37 samples/sec   Loss 5.3498   LearningRate 0.0171   Epoch: 11   Global Step: 59370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:42,321-Speed 5557.97 samples/sec   Loss 5.2044   LearningRate 0.0171   Epoch: 11   Global Step: 59380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:44,177-Speed 5521.61 samples/sec   Loss 5.3228   LearningRate 0.0170   Epoch: 11   Global Step: 59390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:04:46,014-Speed 5576.59 samples/sec   Loss 5.1072   LearningRate 0.0170   Epoch: 11   Global Step: 59400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 14:04:47,835-Speed 5626.87 samples/sec   Loss 5.1549   LearningRate 0.0170   Epoch: 11   Global Step: 59410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:49,681-Speed 5547.19 samples/sec   Loss 5.1600   LearningRate 0.0170   Epoch: 11   Global Step: 59420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:51,529-Speed 5545.00 samples/sec   Loss 5.2027   LearningRate 0.0170   Epoch: 11   Global Step: 59430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:53,420-Speed 5419.70 samples/sec   Loss 5.1868   LearningRate 0.0170   Epoch: 11   Global Step: 59440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:55,259-Speed 5570.22 samples/sec   Loss 5.1690   LearningRate 0.0170   Epoch: 11   Global Step: 59450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:57,120-Speed 5507.64 samples/sec   Loss 5.3898   LearningRate 0.0170   Epoch: 11   Global Step: 59460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:04:58,967-Speed 5546.12 samples/sec   Loss 5.0968   LearningRate 0.0170   Epoch: 11   Global Step: 59470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:00,886-Speed 5339.95 samples/sec   Loss 5.3073   LearningRate 0.0170   Epoch: 11   Global Step: 59480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:02,732-Speed 5550.10 samples/sec   Loss 5.0015   LearningRate 0.0170   Epoch: 11   Global Step: 59490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:04,634-Speed 5385.71 samples/sec   Loss 5.1964   LearningRate 0.0170   Epoch: 11   Global Step: 59500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:06,513-Speed 5451.56 samples/sec   Loss 5.0861   LearningRate 0.0170   Epoch: 11   Global Step: 59510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:08,352-Speed 5572.49 samples/sec   Loss 5.2467   LearningRate 0.0169   Epoch: 11   Global Step: 59520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:10,212-Speed 5510.46 samples/sec   Loss 5.1481   LearningRate 0.0169   Epoch: 11   Global Step: 59530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:12,071-Speed 5512.28 samples/sec   Loss 5.1641   LearningRate 0.0169   Epoch: 11   Global Step: 59540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:13,947-Speed 5460.87 samples/sec   Loss 5.2495   LearningRate 0.0169   Epoch: 11   Global Step: 59550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:15,804-Speed 5517.50 samples/sec   Loss 5.2998   LearningRate 0.0169   Epoch: 11   Global Step: 59560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:17,651-Speed 5546.59 samples/sec   Loss 5.1997   LearningRate 0.0169   Epoch: 11   Global Step: 59570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:19,537-Speed 5432.88 samples/sec   Loss 5.2982   LearningRate 0.0169   Epoch: 11   Global Step: 59580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:21,382-Speed 5555.64 samples/sec   Loss 5.1070   LearningRate 0.0169   Epoch: 11   Global Step: 59590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:23,248-Speed 5493.25 samples/sec   Loss 5.2454   LearningRate 0.0169   Epoch: 11   Global Step: 59600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:25,080-Speed 5589.31 samples/sec   Loss 5.1889   LearningRate 0.0169   Epoch: 11   Global Step: 59610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:26,943-Speed 5502.73 samples/sec   Loss 5.1083   LearningRate 0.0169   Epoch: 11   Global Step: 59620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:28,778-Speed 5584.62 samples/sec   Loss 5.1036   LearningRate 0.0169   Epoch: 11   Global Step: 59630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:30,632-Speed 5524.20 samples/sec   Loss 5.2134   LearningRate 0.0168   Epoch: 11   Global Step: 59640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:32,474-Speed 5562.04 samples/sec   Loss 5.2660   LearningRate 0.0168   Epoch: 11   Global Step: 59650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:34,327-Speed 5527.21 samples/sec   Loss 5.1514   LearningRate 0.0168   Epoch: 11   Global Step: 59660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:36,173-Speed 5552.96 samples/sec   Loss 5.3745   LearningRate 0.0168   Epoch: 11   Global Step: 59670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:38,026-Speed 5529.02 samples/sec   Loss 5.1265   LearningRate 0.0168   Epoch: 11   Global Step: 59680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:05:39,872-Speed 5549.51 samples/sec   Loss 5.1653   LearningRate 0.0168   Epoch: 11   Global Step: 59690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:41,730-Speed 5514.59 samples/sec   Loss 5.3313   LearningRate 0.0168   Epoch: 11   Global Step: 59700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:43,573-Speed 5557.60 samples/sec   Loss 5.1034   LearningRate 0.0168   Epoch: 11   Global Step: 59710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:45,439-Speed 5493.43 samples/sec   Loss 5.0250   LearningRate 0.0168   Epoch: 11   Global Step: 59720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:47,274-Speed 5583.11 samples/sec   Loss 5.1642   LearningRate 0.0168   Epoch: 11   Global Step: 59730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:49,147-Speed 5469.75 samples/sec   Loss 5.2001   LearningRate 0.0168   Epoch: 11   Global Step: 59740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:50,993-Speed 5549.19 samples/sec   Loss 5.2676   LearningRate 0.0168   Epoch: 11   Global Step: 59750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:52,880-Speed 5432.35 samples/sec   Loss 5.1251   LearningRate 0.0167   Epoch: 11   Global Step: 59760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:54,718-Speed 5573.51 samples/sec   Loss 5.0927   LearningRate 0.0167   Epoch: 11   Global Step: 59770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:56,573-Speed 5521.58 samples/sec   Loss 5.0700   LearningRate 0.0167   Epoch: 11   Global Step: 59780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:05:58,425-Speed 5533.93 samples/sec   Loss 5.3023   LearningRate 0.0167   Epoch: 11   Global Step: 59790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:00,291-Speed 5491.30 samples/sec   Loss 5.1946   LearningRate 0.0167   Epoch: 11   Global Step: 59800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:02,135-Speed 5554.71 samples/sec   Loss 5.2117   LearningRate 0.0167   Epoch: 11   Global Step: 59810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:04,017-Speed 5444.16 samples/sec   Loss 5.1140   LearningRate 0.0167   Epoch: 11   Global Step: 59820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:05,885-Speed 5484.14 samples/sec   Loss 5.1079   LearningRate 0.0167   Epoch: 11   Global Step: 59830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:07,727-Speed 5562.55 samples/sec   Loss 5.2891   LearningRate 0.0167   Epoch: 11   Global Step: 59840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:09,594-Speed 5485.48 samples/sec   Loss 5.0879   LearningRate 0.0167   Epoch: 11   Global Step: 59850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:11,442-Speed 5544.98 samples/sec   Loss 5.0774   LearningRate 0.0167   Epoch: 11   Global Step: 59860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:13,318-Speed 5462.03 samples/sec   Loss 5.1108   LearningRate 0.0167   Epoch: 11   Global Step: 59870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:15,197-Speed 5452.87 samples/sec   Loss 5.1542   LearningRate 0.0167   Epoch: 11   Global Step: 59880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:17,066-Speed 5482.20 samples/sec   Loss 5.1566   LearningRate 0.0166   Epoch: 11   Global Step: 59890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:18,932-Speed 5491.21 samples/sec   Loss 5.2892   LearningRate 0.0166   Epoch: 11   Global Step: 59900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:20,790-Speed 5514.11 samples/sec   Loss 5.2381   LearningRate 0.0166   Epoch: 11   Global Step: 59910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:22,650-Speed 5509.00 samples/sec   Loss 5.2525   LearningRate 0.0166   Epoch: 11   Global Step: 59920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:24,547-Speed 5399.15 samples/sec   Loss 5.0366   LearningRate 0.0166   Epoch: 11   Global Step: 59930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:26,494-Speed 5262.00 samples/sec   Loss 5.1894   LearningRate 0.0166   Epoch: 11   Global Step: 59940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:28,343-Speed 5542.37 samples/sec   Loss 5.2244   LearningRate 0.0166   Epoch: 11   Global Step: 59950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:30,210-Speed 5486.37 samples/sec   Loss 5.3106   LearningRate 0.0166   Epoch: 11   Global Step: 59960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:32,060-Speed 5541.59 samples/sec   Loss 5.2435   LearningRate 0.0166   Epoch: 11   Global Step: 59970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:33,922-Speed 5501.95 samples/sec   Loss 5.2576   LearningRate 0.0166   Epoch: 11   Global Step: 59980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:35,770-Speed 5542.35 samples/sec   Loss 5.3226   LearningRate 0.0166   Epoch: 11   Global Step: 59990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:06:37,628-Speed 5516.15 samples/sec   Loss 5.1387   LearningRate 0.0166   Epoch: 11   Global Step: 60000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:07:04,938-[lfw][60000]XNorm: 22.662538
Training: 2022-04-11 14:07:04,939-[lfw][60000]Accuracy-Flip: 0.99733+-0.00238
Training: 2022-04-11 14:07:04,940-[lfw][60000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:07:36,483-[cfp_fp][60000]XNorm: 20.125682
Training: 2022-04-11 14:07:36,484-[cfp_fp][60000]Accuracy-Flip: 0.97443+-0.00692
Training: 2022-04-11 14:07:36,485-[cfp_fp][60000]Accuracy-Highest: 0.97443
Training: 2022-04-11 14:08:03,752-[agedb_30][60000]XNorm: 22.528334
Training: 2022-04-11 14:08:03,753-[agedb_30][60000]Accuracy-Flip: 0.97867+-0.00748
Training: 2022-04-11 14:08:03,753-[agedb_30][60000]Accuracy-Highest: 0.97867
Training: 2022-04-11 14:08:05,623-Speed 116.37 samples/sec   Loss 5.1254   LearningRate 0.0165   Epoch: 11   Global Step: 60010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:07,479-Speed 5518.25 samples/sec   Loss 5.2929   LearningRate 0.0165   Epoch: 11   Global Step: 60020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:09,308-Speed 5603.49 samples/sec   Loss 5.1819   LearningRate 0.0165   Epoch: 11   Global Step: 60030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:11,154-Speed 5548.97 samples/sec   Loss 5.1316   LearningRate 0.0165   Epoch: 11   Global Step: 60040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:13,011-Speed 5519.15 samples/sec   Loss 5.1836   LearningRate 0.0165   Epoch: 11   Global Step: 60050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:14,843-Speed 5591.14 samples/sec   Loss 5.1787   LearningRate 0.0165   Epoch: 11   Global Step: 60060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:16,685-Speed 5561.93 samples/sec   Loss 5.1325   LearningRate 0.0165   Epoch: 11   Global Step: 60070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:18,524-Speed 5569.71 samples/sec   Loss 5.2634   LearningRate 0.0165   Epoch: 11   Global Step: 60080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:20,376-Speed 5534.15 samples/sec   Loss 5.1093   LearningRate 0.0165   Epoch: 11   Global Step: 60090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:22,210-Speed 5586.14 samples/sec   Loss 5.0939   LearningRate 0.0165   Epoch: 11   Global Step: 60100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:08:24,059-Speed 5539.65 samples/sec   Loss 5.4828   LearningRate 0.0165   Epoch: 11   Global Step: 60110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:08:25,892-Speed 5589.59 samples/sec   Loss 5.1568   LearningRate 0.0165   Epoch: 11   Global Step: 60120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:27,740-Speed 5543.04 samples/sec   Loss 5.2697   LearningRate 0.0165   Epoch: 11   Global Step: 60130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:29,592-Speed 5533.49 samples/sec   Loss 5.2542   LearningRate 0.0164   Epoch: 11   Global Step: 60140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:31,463-Speed 5474.96 samples/sec   Loss 5.1760   LearningRate 0.0164   Epoch: 11   Global Step: 60150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:33,304-Speed 5567.25 samples/sec   Loss 5.2566   LearningRate 0.0164   Epoch: 11   Global Step: 60160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:35,149-Speed 5552.40 samples/sec   Loss 5.0148   LearningRate 0.0164   Epoch: 11   Global Step: 60170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:36,997-Speed 5546.69 samples/sec   Loss 5.2721   LearningRate 0.0164   Epoch: 11   Global Step: 60180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:38,837-Speed 5570.42 samples/sec   Loss 5.0658   LearningRate 0.0164   Epoch: 11   Global Step: 60190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:40,701-Speed 5494.15 samples/sec   Loss 4.9649   LearningRate 0.0164   Epoch: 11   Global Step: 60200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:42,545-Speed 5556.65 samples/sec   Loss 5.0809   LearningRate 0.0164   Epoch: 11   Global Step: 60210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:44,375-Speed 5600.24 samples/sec   Loss 5.1172   LearningRate 0.0164   Epoch: 11   Global Step: 60220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:46,212-Speed 5577.85 samples/sec   Loss 5.1726   LearningRate 0.0164   Epoch: 11   Global Step: 60230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:48,095-Speed 5442.73 samples/sec   Loss 5.1660   LearningRate 0.0164   Epoch: 11   Global Step: 60240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:49,946-Speed 5534.92 samples/sec   Loss 5.1456   LearningRate 0.0164   Epoch: 11   Global Step: 60250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:51,802-Speed 5520.83 samples/sec   Loss 4.9987   LearningRate 0.0163   Epoch: 11   Global Step: 60260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:53,654-Speed 5531.51 samples/sec   Loss 5.1519   LearningRate 0.0163   Epoch: 11   Global Step: 60270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:55,589-Speed 5293.47 samples/sec   Loss 5.1364   LearningRate 0.0163   Epoch: 11   Global Step: 60280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:57,445-Speed 5521.12 samples/sec   Loss 5.2884   LearningRate 0.0163   Epoch: 11   Global Step: 60290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:08:59,281-Speed 5581.76 samples/sec   Loss 5.1469   LearningRate 0.0163   Epoch: 11   Global Step: 60300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:09:01,142-Speed 5502.74 samples/sec   Loss 5.1199   LearningRate 0.0163   Epoch: 11   Global Step: 60310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:09:03,026-Speed 5438.62 samples/sec   Loss 5.2621   LearningRate 0.0163   Epoch: 11   Global Step: 60320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:04,903-Speed 5460.91 samples/sec   Loss 5.2207   LearningRate 0.0163   Epoch: 11   Global Step: 60330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:06,762-Speed 5511.09 samples/sec   Loss 5.1756   LearningRate 0.0163   Epoch: 11   Global Step: 60340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:08,630-Speed 5484.05 samples/sec   Loss 5.1756   LearningRate 0.0163   Epoch: 11   Global Step: 60350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:10,475-Speed 5554.24 samples/sec   Loss 5.0935   LearningRate 0.0163   Epoch: 11   Global Step: 60360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:12,324-Speed 5539.20 samples/sec   Loss 5.2411   LearningRate 0.0163   Epoch: 11   Global Step: 60370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:14,169-Speed 5553.71 samples/sec   Loss 5.1947   LearningRate 0.0163   Epoch: 11   Global Step: 60380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:16,039-Speed 5479.25 samples/sec   Loss 5.1366   LearningRate 0.0162   Epoch: 11   Global Step: 60390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:17,877-Speed 5575.08 samples/sec   Loss 5.1070   LearningRate 0.0162   Epoch: 11   Global Step: 60400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:19,742-Speed 5493.83 samples/sec   Loss 5.2671   LearningRate 0.0162   Epoch: 11   Global Step: 60410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:21,588-Speed 5550.40 samples/sec   Loss 5.1572   LearningRate 0.0162   Epoch: 11   Global Step: 60420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 14:09:23,418-Speed 5598.69 samples/sec   Loss 5.2894   LearningRate 0.0162   Epoch: 11   Global Step: 60430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:25,268-Speed 5539.68 samples/sec   Loss 5.1785   LearningRate 0.0162   Epoch: 11   Global Step: 60440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:27,117-Speed 5540.36 samples/sec   Loss 5.1572   LearningRate 0.0162   Epoch: 11   Global Step: 60450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:28,958-Speed 5564.66 samples/sec   Loss 5.2314   LearningRate 0.0162   Epoch: 11   Global Step: 60460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:30,797-Speed 5568.59 samples/sec   Loss 5.1879   LearningRate 0.0162   Epoch: 11   Global Step: 60470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:32,665-Speed 5484.41 samples/sec   Loss 5.1563   LearningRate 0.0162   Epoch: 11   Global Step: 60480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:34,511-Speed 5553.67 samples/sec   Loss 4.9895   LearningRate 0.0162   Epoch: 11   Global Step: 60490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:36,370-Speed 5508.92 samples/sec   Loss 5.1713   LearningRate 0.0162   Epoch: 11   Global Step: 60500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:38,224-Speed 5526.55 samples/sec   Loss 5.0806   LearningRate 0.0161   Epoch: 11   Global Step: 60510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:40,071-Speed 5548.84 samples/sec   Loss 5.1835   LearningRate 0.0161   Epoch: 11   Global Step: 60520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:41,907-Speed 5577.69 samples/sec   Loss 5.0365   LearningRate 0.0161   Epoch: 11   Global Step: 60530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:43,773-Speed 5492.88 samples/sec   Loss 5.0851   LearningRate 0.0161   Epoch: 11   Global Step: 60540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:45,624-Speed 5533.81 samples/sec   Loss 5.1250   LearningRate 0.0161   Epoch: 11   Global Step: 60550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:47,478-Speed 5527.02 samples/sec   Loss 5.1297   LearningRate 0.0161   Epoch: 11   Global Step: 60560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:49,364-Speed 5432.45 samples/sec   Loss 5.1729   LearningRate 0.0161   Epoch: 11   Global Step: 60570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:51,234-Speed 5477.75 samples/sec   Loss 5.1520   LearningRate 0.0161   Epoch: 11   Global Step: 60580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:53,099-Speed 5493.39 samples/sec   Loss 5.2921   LearningRate 0.0161   Epoch: 11   Global Step: 60590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:54,970-Speed 5477.69 samples/sec   Loss 5.1000   LearningRate 0.0161   Epoch: 11   Global Step: 60600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:56,819-Speed 5540.53 samples/sec   Loss 5.1472   LearningRate 0.0161   Epoch: 11   Global Step: 60610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:09:58,666-Speed 5545.60 samples/sec   Loss 5.1389   LearningRate 0.0161   Epoch: 11   Global Step: 60620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:10:00,522-Speed 5522.71 samples/sec   Loss 5.2430   LearningRate 0.0161   Epoch: 11   Global Step: 60630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:10:02,355-Speed 5589.08 samples/sec   Loss 5.2157   LearningRate 0.0160   Epoch: 11   Global Step: 60640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:04,214-Speed 5511.23 samples/sec   Loss 5.1279   LearningRate 0.0160   Epoch: 11   Global Step: 60650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:06,060-Speed 5550.02 samples/sec   Loss 5.3132   LearningRate 0.0160   Epoch: 11   Global Step: 60660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:07,914-Speed 5525.43 samples/sec   Loss 5.1244   LearningRate 0.0160   Epoch: 11   Global Step: 60670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:09,750-Speed 5581.76 samples/sec   Loss 5.1396   LearningRate 0.0160   Epoch: 11   Global Step: 60680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:11,670-Speed 5334.91 samples/sec   Loss 5.2076   LearningRate 0.0160   Epoch: 11   Global Step: 60690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:23,802-Speed 844.20 samples/sec   Loss 4.7788   LearningRate 0.0160   Epoch: 12   Global Step: 60700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:25,706-Speed 5381.47 samples/sec   Loss 4.1145   LearningRate 0.0160   Epoch: 12   Global Step: 60710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:27,640-Speed 5299.60 samples/sec   Loss 4.2252   LearningRate 0.0160   Epoch: 12   Global Step: 60720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:29,524-Speed 5441.17 samples/sec   Loss 4.1748   LearningRate 0.0160   Epoch: 12   Global Step: 60730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:31,452-Speed 5313.53 samples/sec   Loss 4.2830   LearningRate 0.0160   Epoch: 12   Global Step: 60740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:10:33,679-Speed 4600.74 samples/sec   Loss 4.2633   LearningRate 0.0160   Epoch: 12   Global Step: 60750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:10:35,521-Speed 5563.34 samples/sec   Loss 4.1771   LearningRate 0.0159   Epoch: 12   Global Step: 60760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:37,437-Speed 5345.85 samples/sec   Loss 4.3699   LearningRate 0.0159   Epoch: 12   Global Step: 60770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:39,282-Speed 5556.69 samples/sec   Loss 4.2891   LearningRate 0.0159   Epoch: 12   Global Step: 60780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:41,145-Speed 5497.83 samples/sec   Loss 4.3044   LearningRate 0.0159   Epoch: 12   Global Step: 60790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:42,994-Speed 5542.05 samples/sec   Loss 4.4495   LearningRate 0.0159   Epoch: 12   Global Step: 60800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:44,870-Speed 5463.25 samples/sec   Loss 4.2491   LearningRate 0.0159   Epoch: 12   Global Step: 60810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:46,706-Speed 5578.90 samples/sec   Loss 4.3093   LearningRate 0.0159   Epoch: 12   Global Step: 60820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:48,563-Speed 5518.32 samples/sec   Loss 4.3386   LearningRate 0.0159   Epoch: 12   Global Step: 60830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:50,421-Speed 5514.78 samples/sec   Loss 4.3483   LearningRate 0.0159   Epoch: 12   Global Step: 60840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:10:52,269-Speed 5543.24 samples/sec   Loss 4.4680   LearningRate 0.0159   Epoch: 12   Global Step: 60850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:10:54,157-Speed 5427.45 samples/sec   Loss 4.3892   LearningRate 0.0159   Epoch: 12   Global Step: 60860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:10:55,997-Speed 5567.34 samples/sec   Loss 4.2900   LearningRate 0.0159   Epoch: 12   Global Step: 60870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:10:57,889-Speed 5415.84 samples/sec   Loss 4.3427   LearningRate 0.0159   Epoch: 12   Global Step: 60880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:10:59,734-Speed 5552.85 samples/sec   Loss 4.2420   LearningRate 0.0158   Epoch: 12   Global Step: 60890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:11:01,619-Speed 5437.21 samples/sec   Loss 4.3920   LearningRate 0.0158   Epoch: 12   Global Step: 60900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:11:03,486-Speed 5487.21 samples/sec   Loss 4.4917   LearningRate 0.0158   Epoch: 12   Global Step: 60910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:11:05,345-Speed 5512.28 samples/sec   Loss 4.2717   LearningRate 0.0158   Epoch: 12   Global Step: 60920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:11:07,197-Speed 5530.08 samples/sec   Loss 4.3966   LearningRate 0.0158   Epoch: 12   Global Step: 60930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:11:09,111-Speed 5355.06 samples/sec   Loss 4.3845   LearningRate 0.0158   Epoch: 12   Global Step: 60940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-11 14:11:10,973-Speed 5501.17 samples/sec   Loss 4.4085   LearningRate 0.0158   Epoch: 12   Global Step: 60950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:12,837-Speed 5497.05 samples/sec   Loss 4.3473   LearningRate 0.0158   Epoch: 12   Global Step: 60960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:14,687-Speed 5538.12 samples/sec   Loss 4.3760   LearningRate 0.0158   Epoch: 12   Global Step: 60970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:16,548-Speed 5506.33 samples/sec   Loss 4.5464   LearningRate 0.0158   Epoch: 12   Global Step: 60980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:18,425-Speed 5458.36 samples/sec   Loss 4.3909   LearningRate 0.0158   Epoch: 12   Global Step: 60990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:20,278-Speed 5527.84 samples/sec   Loss 4.3568   LearningRate 0.0158   Epoch: 12   Global Step: 61000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:22,143-Speed 5495.73 samples/sec   Loss 4.5706   LearningRate 0.0158   Epoch: 12   Global Step: 61010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:24,002-Speed 5509.67 samples/sec   Loss 4.4852   LearningRate 0.0157   Epoch: 12   Global Step: 61020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:25,890-Speed 5427.68 samples/sec   Loss 4.4716   LearningRate 0.0157   Epoch: 12   Global Step: 61030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:27,731-Speed 5563.77 samples/sec   Loss 4.4718   LearningRate 0.0157   Epoch: 12   Global Step: 61040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:29,596-Speed 5496.22 samples/sec   Loss 4.4406   LearningRate 0.0157   Epoch: 12   Global Step: 61050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:11:31,443-Speed 5546.96 samples/sec   Loss 4.3693   LearningRate 0.0157   Epoch: 12   Global Step: 61060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:11:33,315-Speed 5471.81 samples/sec   Loss 4.5140   LearningRate 0.0157   Epoch: 12   Global Step: 61070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:11:35,171-Speed 5518.64 samples/sec   Loss 4.4568   LearningRate 0.0157   Epoch: 12   Global Step: 61080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:11:37,029-Speed 5515.78 samples/sec   Loss 4.5651   LearningRate 0.0157   Epoch: 12   Global Step: 61090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:38,872-Speed 5559.76 samples/sec   Loss 4.5141   LearningRate 0.0157   Epoch: 12   Global Step: 61100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:40,776-Speed 5380.64 samples/sec   Loss 4.3464   LearningRate 0.0157   Epoch: 12   Global Step: 61110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:42,638-Speed 5502.96 samples/sec   Loss 4.4292   LearningRate 0.0157   Epoch: 12   Global Step: 61120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:44,563-Speed 5322.65 samples/sec   Loss 4.5818   LearningRate 0.0157   Epoch: 12   Global Step: 61130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:46,412-Speed 5538.98 samples/sec   Loss 4.4463   LearningRate 0.0157   Epoch: 12   Global Step: 61140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:48,300-Speed 5428.26 samples/sec   Loss 4.4590   LearningRate 0.0156   Epoch: 12   Global Step: 61150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:50,164-Speed 5498.19 samples/sec   Loss 4.4999   LearningRate 0.0156   Epoch: 12   Global Step: 61160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:52,041-Speed 5457.25 samples/sec   Loss 4.5321   LearningRate 0.0156   Epoch: 12   Global Step: 61170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:53,902-Speed 5503.32 samples/sec   Loss 4.5952   LearningRate 0.0156   Epoch: 12   Global Step: 61180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:11:55,804-Speed 5388.64 samples/sec   Loss 4.5659   LearningRate 0.0156   Epoch: 12   Global Step: 61190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:11:57,645-Speed 5563.28 samples/sec   Loss 4.6092   LearningRate 0.0156   Epoch: 12   Global Step: 61200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:11:59,521-Speed 5464.03 samples/sec   Loss 4.3627   LearningRate 0.0156   Epoch: 12   Global Step: 61210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:12:01,396-Speed 5462.57 samples/sec   Loss 4.6223   LearningRate 0.0156   Epoch: 12   Global Step: 61220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:12:03,261-Speed 5493.74 samples/sec   Loss 4.4275   LearningRate 0.0156   Epoch: 12   Global Step: 61230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:12:05,120-Speed 5511.66 samples/sec   Loss 4.3241   LearningRate 0.0156   Epoch: 12   Global Step: 61240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:06,997-Speed 5457.37 samples/sec   Loss 4.5693   LearningRate 0.0156   Epoch: 12   Global Step: 61250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:08,839-Speed 5562.79 samples/sec   Loss 4.3674   LearningRate 0.0156   Epoch: 12   Global Step: 61260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:10,725-Speed 5432.09 samples/sec   Loss 4.5742   LearningRate 0.0155   Epoch: 12   Global Step: 61270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:12,567-Speed 5561.05 samples/sec   Loss 4.6540   LearningRate 0.0155   Epoch: 12   Global Step: 61280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:14,421-Speed 5527.05 samples/sec   Loss 4.3376   LearningRate 0.0155   Epoch: 12   Global Step: 61290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:16,327-Speed 5376.64 samples/sec   Loss 4.6129   LearningRate 0.0155   Epoch: 12   Global Step: 61300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:18,183-Speed 5518.94 samples/sec   Loss 4.3244   LearningRate 0.0155   Epoch: 12   Global Step: 61310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:20,063-Speed 5448.92 samples/sec   Loss 4.5986   LearningRate 0.0155   Epoch: 12   Global Step: 61320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:21,934-Speed 5475.59 samples/sec   Loss 4.4104   LearningRate 0.0155   Epoch: 12   Global Step: 61330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:23,811-Speed 5459.37 samples/sec   Loss 4.6441   LearningRate 0.0155   Epoch: 12   Global Step: 61340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:12:25,661-Speed 5536.60 samples/sec   Loss 4.4822   LearningRate 0.0155   Epoch: 12   Global Step: 61350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:12:27,509-Speed 5542.26 samples/sec   Loss 4.6403   LearningRate 0.0155   Epoch: 12   Global Step: 61360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:29,378-Speed 5482.52 samples/sec   Loss 4.6077   LearningRate 0.0155   Epoch: 12   Global Step: 61370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:31,218-Speed 5568.56 samples/sec   Loss 4.6005   LearningRate 0.0155   Epoch: 12   Global Step: 61380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:33,062-Speed 5554.32 samples/sec   Loss 4.6518   LearningRate 0.0155   Epoch: 12   Global Step: 61390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:34,914-Speed 5533.09 samples/sec   Loss 4.7093   LearningRate 0.0154   Epoch: 12   Global Step: 61400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:36,788-Speed 5466.64 samples/sec   Loss 4.6215   LearningRate 0.0154   Epoch: 12   Global Step: 61410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:38,650-Speed 5500.97 samples/sec   Loss 4.4268   LearningRate 0.0154   Epoch: 12   Global Step: 61420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:40,552-Speed 5387.85 samples/sec   Loss 4.4706   LearningRate 0.0154   Epoch: 12   Global Step: 61430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:42,410-Speed 5514.38 samples/sec   Loss 4.7488   LearningRate 0.0154   Epoch: 12   Global Step: 61440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:44,313-Speed 5384.10 samples/sec   Loss 4.6346   LearningRate 0.0154   Epoch: 12   Global Step: 61450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:46,166-Speed 5528.55 samples/sec   Loss 4.6618   LearningRate 0.0154   Epoch: 12   Global Step: 61460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:12:48,028-Speed 5502.96 samples/sec   Loss 4.5805   LearningRate 0.0154   Epoch: 12   Global Step: 61470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:12:49,871-Speed 5556.27 samples/sec   Loss 4.5318   LearningRate 0.0154   Epoch: 12   Global Step: 61480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:12:51,767-Speed 5404.92 samples/sec   Loss 4.5699   LearningRate 0.0154   Epoch: 12   Global Step: 61490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:12:53,621-Speed 5523.64 samples/sec   Loss 4.5745   LearningRate 0.0154   Epoch: 12   Global Step: 61500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:55,470-Speed 5543.76 samples/sec   Loss 4.7373   LearningRate 0.0154   Epoch: 12   Global Step: 61510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:57,310-Speed 5565.34 samples/sec   Loss 4.6246   LearningRate 0.0154   Epoch: 12   Global Step: 61520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:12:59,144-Speed 5587.03 samples/sec   Loss 4.6409   LearningRate 0.0153   Epoch: 12   Global Step: 61530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:01,044-Speed 5392.29 samples/sec   Loss 4.6965   LearningRate 0.0153   Epoch: 12   Global Step: 61540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:02,880-Speed 5579.75 samples/sec   Loss 4.6832   LearningRate 0.0153   Epoch: 12   Global Step: 61550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:04,725-Speed 5553.14 samples/sec   Loss 4.5407   LearningRate 0.0153   Epoch: 12   Global Step: 61560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:06,624-Speed 5396.12 samples/sec   Loss 4.5623   LearningRate 0.0153   Epoch: 12   Global Step: 61570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:08,465-Speed 5561.31 samples/sec   Loss 4.6642   LearningRate 0.0153   Epoch: 12   Global Step: 61580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:10,330-Speed 5495.61 samples/sec   Loss 4.6871   LearningRate 0.0153   Epoch: 12   Global Step: 61590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:12,214-Speed 5436.20 samples/sec   Loss 4.7728   LearningRate 0.0153   Epoch: 12   Global Step: 61600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:13:14,054-Speed 5567.50 samples/sec   Loss 4.6118   LearningRate 0.0153   Epoch: 12   Global Step: 61610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:15,944-Speed 5423.66 samples/sec   Loss 4.5693   LearningRate 0.0153   Epoch: 12   Global Step: 61620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:17,830-Speed 5430.82 samples/sec   Loss 4.6699   LearningRate 0.0153   Epoch: 12   Global Step: 61630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:19,699-Speed 5481.63 samples/sec   Loss 4.5239   LearningRate 0.0153   Epoch: 12   Global Step: 61640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:21,551-Speed 5532.35 samples/sec   Loss 4.6437   LearningRate 0.0153   Epoch: 12   Global Step: 61650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:23,410-Speed 5508.84 samples/sec   Loss 4.7319   LearningRate 0.0152   Epoch: 12   Global Step: 61660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:25,293-Speed 5442.34 samples/sec   Loss 4.6177   LearningRate 0.0152   Epoch: 12   Global Step: 61670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:27,155-Speed 5502.49 samples/sec   Loss 4.8331   LearningRate 0.0152   Epoch: 12   Global Step: 61680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:28,998-Speed 5557.26 samples/sec   Loss 4.6507   LearningRate 0.0152   Epoch: 12   Global Step: 61690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:30,854-Speed 5518.96 samples/sec   Loss 4.7323   LearningRate 0.0152   Epoch: 12   Global Step: 61700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 14:13:32,713-Speed 5513.08 samples/sec   Loss 4.6817   LearningRate 0.0152   Epoch: 12   Global Step: 61710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:13:34,562-Speed 5541.42 samples/sec   Loss 4.6097   LearningRate 0.0152   Epoch: 12   Global Step: 61720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:13:36,428-Speed 5488.36 samples/sec   Loss 4.5465   LearningRate 0.0152   Epoch: 12   Global Step: 61730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:13:38,266-Speed 5574.87 samples/sec   Loss 4.7427   LearningRate 0.0152   Epoch: 12   Global Step: 61740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:13:40,106-Speed 5565.72 samples/sec   Loss 4.4728   LearningRate 0.0152   Epoch: 12   Global Step: 61750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 14:13:41,991-Speed 5436.60 samples/sec   Loss 4.5886   LearningRate 0.0152   Epoch: 12   Global Step: 61760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:13:43,837-Speed 5549.48 samples/sec   Loss 4.6513   LearningRate 0.0152   Epoch: 12   Global Step: 61770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:13:45,676-Speed 5570.24 samples/sec   Loss 4.5731   LearningRate 0.0152   Epoch: 12   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:13:47,537-Speed 5504.99 samples/sec   Loss 4.6721   LearningRate 0.0151   Epoch: 12   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:13:49,373-Speed 5582.19 samples/sec   Loss 4.6293   LearningRate 0.0151   Epoch: 12   Global Step: 61800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:13:51,223-Speed 5535.72 samples/sec   Loss 4.7339   LearningRate 0.0151   Epoch: 12   Global Step: 61810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 14:13:53,094-Speed 5477.04 samples/sec   Loss 4.7389   LearningRate 0.0151   Epoch: 12   Global Step: 61820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:13:54,937-Speed 5559.74 samples/sec   Loss 4.7286   LearningRate 0.0151   Epoch: 12   Global Step: 61830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:13:56,784-Speed 5545.56 samples/sec   Loss 4.6889   LearningRate 0.0151   Epoch: 12   Global Step: 61840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:13:58,636-Speed 5530.59 samples/sec   Loss 4.7741   LearningRate 0.0151   Epoch: 12   Global Step: 61850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:14:00,484-Speed 5543.72 samples/sec   Loss 4.6530   LearningRate 0.0151   Epoch: 12   Global Step: 61860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:14:02,324-Speed 5569.67 samples/sec   Loss 4.7980   LearningRate 0.0151   Epoch: 12   Global Step: 61870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:14:04,184-Speed 5508.40 samples/sec   Loss 4.6974   LearningRate 0.0151   Epoch: 12   Global Step: 61880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:14:06,052-Speed 5481.88 samples/sec   Loss 4.7166   LearningRate 0.0151   Epoch: 12   Global Step: 61890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:14:07,889-Speed 5576.84 samples/sec   Loss 4.8404   LearningRate 0.0151   Epoch: 12   Global Step: 61900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:14:09,722-Speed 5588.72 samples/sec   Loss 4.6763   LearningRate 0.0151   Epoch: 12   Global Step: 61910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:14:11,572-Speed 5537.64 samples/sec   Loss 4.7345   LearningRate 0.0150   Epoch: 12   Global Step: 61920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:14:13,430-Speed 5513.04 samples/sec   Loss 4.7499   LearningRate 0.0150   Epoch: 12   Global Step: 61930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:14:15,285-Speed 5522.87 samples/sec   Loss 4.7613   LearningRate 0.0150   Epoch: 12   Global Step: 61940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:14:17,127-Speed 5561.81 samples/sec   Loss 4.7766   LearningRate 0.0150   Epoch: 12   Global Step: 61950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:14:18,970-Speed 5558.94 samples/sec   Loss 4.7553   LearningRate 0.0150   Epoch: 12   Global Step: 61960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:14:20,804-Speed 5584.96 samples/sec   Loss 4.7811   LearningRate 0.0150   Epoch: 12   Global Step: 61970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:14:22,655-Speed 5535.40 samples/sec   Loss 4.6492   LearningRate 0.0150   Epoch: 12   Global Step: 61980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:14:24,495-Speed 5567.24 samples/sec   Loss 4.6925   LearningRate 0.0150   Epoch: 12   Global Step: 61990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:14:26,344-Speed 5542.04 samples/sec   Loss 4.6648   LearningRate 0.0150   Epoch: 12   Global Step: 62000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:14:53,883-[lfw][62000]XNorm: 22.733218
Training: 2022-04-11 14:14:53,883-[lfw][62000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-11 14:14:53,884-[lfw][62000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:15:25,200-[cfp_fp][62000]XNorm: 20.083805
Training: 2022-04-11 14:15:25,201-[cfp_fp][62000]Accuracy-Flip: 0.97771+-0.00816
Training: 2022-04-11 14:15:25,201-[cfp_fp][62000]Accuracy-Highest: 0.97771
Training: 2022-04-11 14:15:52,370-[agedb_30][62000]XNorm: 22.227357
Training: 2022-04-11 14:15:52,371-[agedb_30][62000]Accuracy-Flip: 0.97867+-0.00694
Training: 2022-04-11 14:15:52,372-[agedb_30][62000]Accuracy-Highest: 0.97867
Training: 2022-04-11 14:15:54,249-Speed 116.49 samples/sec   Loss 4.7526   LearningRate 0.0150   Epoch: 12   Global Step: 62010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:15:56,091-Speed 5560.63 samples/sec   Loss 4.7187   LearningRate 0.0150   Epoch: 12   Global Step: 62020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:15:57,937-Speed 5550.46 samples/sec   Loss 4.7228   LearningRate 0.0150   Epoch: 12   Global Step: 62030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:15:59,797-Speed 5509.44 samples/sec   Loss 4.7437   LearningRate 0.0150   Epoch: 12   Global Step: 62040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:01,662-Speed 5495.17 samples/sec   Loss 4.5443   LearningRate 0.0149   Epoch: 12   Global Step: 62050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:03,521-Speed 5511.25 samples/sec   Loss 4.5602   LearningRate 0.0149   Epoch: 12   Global Step: 62060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:16:05,405-Speed 5437.18 samples/sec   Loss 4.6718   LearningRate 0.0149   Epoch: 12   Global Step: 62070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:16:07,240-Speed 5585.33 samples/sec   Loss 4.7972   LearningRate 0.0149   Epoch: 12   Global Step: 62080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:16:09,095-Speed 5522.23 samples/sec   Loss 4.6767   LearningRate 0.0149   Epoch: 12   Global Step: 62090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:16:10,930-Speed 5582.93 samples/sec   Loss 4.6693   LearningRate 0.0149   Epoch: 12   Global Step: 62100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:16:12,833-Speed 5386.54 samples/sec   Loss 4.6611   LearningRate 0.0149   Epoch: 12   Global Step: 62110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:16:14,704-Speed 5474.99 samples/sec   Loss 4.6850   LearningRate 0.0149   Epoch: 12   Global Step: 62120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:16:16,565-Speed 5504.12 samples/sec   Loss 4.6601   LearningRate 0.0149   Epoch: 12   Global Step: 62130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:16:18,429-Speed 5497.52 samples/sec   Loss 4.7689   LearningRate 0.0149   Epoch: 12   Global Step: 62140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:16:20,266-Speed 5576.24 samples/sec   Loss 4.6672   LearningRate 0.0149   Epoch: 12   Global Step: 62150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:16:22,123-Speed 5517.84 samples/sec   Loss 4.6760   LearningRate 0.0149   Epoch: 12   Global Step: 62160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:23,960-Speed 5577.63 samples/sec   Loss 4.7109   LearningRate 0.0149   Epoch: 12   Global Step: 62170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:25,794-Speed 5587.55 samples/sec   Loss 4.6156   LearningRate 0.0148   Epoch: 12   Global Step: 62180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:27,656-Speed 5504.50 samples/sec   Loss 4.7574   LearningRate 0.0148   Epoch: 12   Global Step: 62190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:29,528-Speed 5472.12 samples/sec   Loss 4.7557   LearningRate 0.0148   Epoch: 12   Global Step: 62200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:31,366-Speed 5576.64 samples/sec   Loss 4.7692   LearningRate 0.0148   Epoch: 12   Global Step: 62210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:33,228-Speed 5507.18 samples/sec   Loss 4.6420   LearningRate 0.0148   Epoch: 12   Global Step: 62220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:35,085-Speed 5516.54 samples/sec   Loss 4.5510   LearningRate 0.0148   Epoch: 12   Global Step: 62230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:36,959-Speed 5466.86 samples/sec   Loss 4.8429   LearningRate 0.0148   Epoch: 12   Global Step: 62240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:38,886-Speed 5317.44 samples/sec   Loss 4.7066   LearningRate 0.0148   Epoch: 12   Global Step: 62250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:16:40,766-Speed 5451.16 samples/sec   Loss 4.9051   LearningRate 0.0148   Epoch: 12   Global Step: 62260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:16:42,632-Speed 5491.44 samples/sec   Loss 4.6814   LearningRate 0.0148   Epoch: 12   Global Step: 62270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:16:44,507-Speed 5462.69 samples/sec   Loss 4.7725   LearningRate 0.0148   Epoch: 12   Global Step: 62280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:16:46,383-Speed 5464.11 samples/sec   Loss 4.7336   LearningRate 0.0148   Epoch: 12   Global Step: 62290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:16:48,293-Speed 5362.87 samples/sec   Loss 4.7421   LearningRate 0.0148   Epoch: 12   Global Step: 62300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:16:50,134-Speed 5566.87 samples/sec   Loss 4.7459   LearningRate 0.0147   Epoch: 12   Global Step: 62310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:16:51,981-Speed 5545.08 samples/sec   Loss 4.7309   LearningRate 0.0147   Epoch: 12   Global Step: 62320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:16:53,864-Speed 5444.78 samples/sec   Loss 4.7217   LearningRate 0.0147   Epoch: 12   Global Step: 62330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:16:55,713-Speed 5541.40 samples/sec   Loss 4.8001   LearningRate 0.0147   Epoch: 12   Global Step: 62340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:16:57,568-Speed 5523.34 samples/sec   Loss 4.8055   LearningRate 0.0147   Epoch: 12   Global Step: 62350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:16:59,460-Speed 5415.63 samples/sec   Loss 4.6976   LearningRate 0.0147   Epoch: 12   Global Step: 62360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:01,302-Speed 5561.28 samples/sec   Loss 4.8236   LearningRate 0.0147   Epoch: 12   Global Step: 62370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:03,203-Speed 5390.18 samples/sec   Loss 4.7997   LearningRate 0.0147   Epoch: 12   Global Step: 62380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:05,084-Speed 5447.15 samples/sec   Loss 4.7615   LearningRate 0.0147   Epoch: 12   Global Step: 62390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:06,941-Speed 5516.15 samples/sec   Loss 4.5926   LearningRate 0.0147   Epoch: 12   Global Step: 62400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:08,777-Speed 5585.16 samples/sec   Loss 4.6714   LearningRate 0.0147   Epoch: 12   Global Step: 62410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:10,621-Speed 5557.39 samples/sec   Loss 4.7450   LearningRate 0.0147   Epoch: 12   Global Step: 62420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:12,493-Speed 5472.88 samples/sec   Loss 4.6841   LearningRate 0.0147   Epoch: 12   Global Step: 62430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:14,393-Speed 5392.10 samples/sec   Loss 4.6978   LearningRate 0.0147   Epoch: 12   Global Step: 62440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:16,245-Speed 5531.27 samples/sec   Loss 4.7832   LearningRate 0.0146   Epoch: 12   Global Step: 62450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:18,106-Speed 5508.00 samples/sec   Loss 4.7030   LearningRate 0.0146   Epoch: 12   Global Step: 62460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:19,969-Speed 5498.16 samples/sec   Loss 4.7344   LearningRate 0.0146   Epoch: 12   Global Step: 62470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:21,829-Speed 5508.91 samples/sec   Loss 4.8097   LearningRate 0.0146   Epoch: 12   Global Step: 62480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:23,691-Speed 5504.83 samples/sec   Loss 4.8469   LearningRate 0.0146   Epoch: 12   Global Step: 62490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:25,588-Speed 5400.77 samples/sec   Loss 4.8115   LearningRate 0.0146   Epoch: 12   Global Step: 62500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:27,452-Speed 5496.65 samples/sec   Loss 4.7413   LearningRate 0.0146   Epoch: 12   Global Step: 62510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:29,351-Speed 5395.22 samples/sec   Loss 4.6670   LearningRate 0.0146   Epoch: 12   Global Step: 62520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:31,204-Speed 5533.66 samples/sec   Loss 4.7502   LearningRate 0.0146   Epoch: 12   Global Step: 62530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:33,057-Speed 5531.01 samples/sec   Loss 4.7579   LearningRate 0.0146   Epoch: 12   Global Step: 62540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:17:34,901-Speed 5560.90 samples/sec   Loss 4.8226   LearningRate 0.0146   Epoch: 12   Global Step: 62550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:17:36,752-Speed 5535.03 samples/sec   Loss 4.7071   LearningRate 0.0146   Epoch: 12   Global Step: 62560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:17:38,600-Speed 5544.67 samples/sec   Loss 4.5988   LearningRate 0.0146   Epoch: 12   Global Step: 62570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:17:40,470-Speed 5480.09 samples/sec   Loss 4.8133   LearningRate 0.0145   Epoch: 12   Global Step: 62580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:17:42,339-Speed 5480.64 samples/sec   Loss 4.8087   LearningRate 0.0145   Epoch: 12   Global Step: 62590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:17:44,177-Speed 5575.17 samples/sec   Loss 4.7914   LearningRate 0.0145   Epoch: 12   Global Step: 62600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:17:46,037-Speed 5510.77 samples/sec   Loss 4.8610   LearningRate 0.0145   Epoch: 12   Global Step: 62610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:17:47,907-Speed 5487.55 samples/sec   Loss 4.6669   LearningRate 0.0145   Epoch: 12   Global Step: 62620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:17:49,772-Speed 5491.31 samples/sec   Loss 4.6912   LearningRate 0.0145   Epoch: 12   Global Step: 62630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:17:51,634-Speed 5502.96 samples/sec   Loss 4.7198   LearningRate 0.0145   Epoch: 12   Global Step: 62640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:53,491-Speed 5520.20 samples/sec   Loss 4.7829   LearningRate 0.0145   Epoch: 12   Global Step: 62650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:55,359-Speed 5482.28 samples/sec   Loss 4.7200   LearningRate 0.0145   Epoch: 12   Global Step: 62660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:57,214-Speed 5524.62 samples/sec   Loss 4.7396   LearningRate 0.0145   Epoch: 12   Global Step: 62670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:17:59,075-Speed 5507.64 samples/sec   Loss 4.5683   LearningRate 0.0145   Epoch: 12   Global Step: 62680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:00,946-Speed 5473.23 samples/sec   Loss 4.7401   LearningRate 0.0145   Epoch: 12   Global Step: 62690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:02,827-Speed 5449.96 samples/sec   Loss 4.7315   LearningRate 0.0145   Epoch: 12   Global Step: 62700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:04,666-Speed 5568.81 samples/sec   Loss 4.8538   LearningRate 0.0144   Epoch: 12   Global Step: 62710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:06,516-Speed 5538.07 samples/sec   Loss 4.8214   LearningRate 0.0144   Epoch: 12   Global Step: 62720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:08,409-Speed 5413.10 samples/sec   Loss 4.7582   LearningRate 0.0144   Epoch: 12   Global Step: 62730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:10,268-Speed 5509.64 samples/sec   Loss 4.8892   LearningRate 0.0144   Epoch: 12   Global Step: 62740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:12,146-Speed 5455.40 samples/sec   Loss 4.6895   LearningRate 0.0144   Epoch: 12   Global Step: 62750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:13,990-Speed 5557.87 samples/sec   Loss 4.8520   LearningRate 0.0144   Epoch: 12   Global Step: 62760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:15,884-Speed 5409.24 samples/sec   Loss 4.8259   LearningRate 0.0144   Epoch: 12   Global Step: 62770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:17,746-Speed 5504.56 samples/sec   Loss 4.8194   LearningRate 0.0144   Epoch: 12   Global Step: 62780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:19,598-Speed 5531.71 samples/sec   Loss 4.7387   LearningRate 0.0144   Epoch: 12   Global Step: 62790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:21,460-Speed 5500.94 samples/sec   Loss 4.7953   LearningRate 0.0144   Epoch: 12   Global Step: 62800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:23,298-Speed 5576.45 samples/sec   Loss 4.7584   LearningRate 0.0144   Epoch: 12   Global Step: 62810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:25,190-Speed 5415.11 samples/sec   Loss 4.8117   LearningRate 0.0144   Epoch: 12   Global Step: 62820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:27,032-Speed 5561.88 samples/sec   Loss 4.6803   LearningRate 0.0144   Epoch: 12   Global Step: 62830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:28,877-Speed 5552.62 samples/sec   Loss 4.7416   LearningRate 0.0143   Epoch: 12   Global Step: 62840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:30,757-Speed 5451.86 samples/sec   Loss 4.7879   LearningRate 0.0143   Epoch: 12   Global Step: 62850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:32,601-Speed 5555.85 samples/sec   Loss 4.6569   LearningRate 0.0143   Epoch: 12   Global Step: 62860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:34,461-Speed 5507.10 samples/sec   Loss 4.7374   LearningRate 0.0143   Epoch: 12   Global Step: 62870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:36,315-Speed 5524.95 samples/sec   Loss 4.7226   LearningRate 0.0143   Epoch: 12   Global Step: 62880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:38,216-Speed 5392.36 samples/sec   Loss 4.6641   LearningRate 0.0143   Epoch: 12   Global Step: 62890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:18:40,054-Speed 5574.57 samples/sec   Loss 4.6979   LearningRate 0.0143   Epoch: 12   Global Step: 62900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:41,932-Speed 5454.79 samples/sec   Loss 4.7508   LearningRate 0.0143   Epoch: 12   Global Step: 62910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:43,791-Speed 5511.78 samples/sec   Loss 4.8112   LearningRate 0.0143   Epoch: 12   Global Step: 62920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:45,626-Speed 5582.64 samples/sec   Loss 4.5974   LearningRate 0.0143   Epoch: 12   Global Step: 62930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:47,484-Speed 5514.26 samples/sec   Loss 4.7562   LearningRate 0.0143   Epoch: 12   Global Step: 62940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:49,332-Speed 5545.17 samples/sec   Loss 4.7860   LearningRate 0.0143   Epoch: 12   Global Step: 62950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:51,215-Speed 5440.68 samples/sec   Loss 4.8520   LearningRate 0.0143   Epoch: 12   Global Step: 62960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:53,050-Speed 5583.70 samples/sec   Loss 4.7208   LearningRate 0.0143   Epoch: 12   Global Step: 62970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:54,892-Speed 5561.67 samples/sec   Loss 4.7465   LearningRate 0.0142   Epoch: 12   Global Step: 62980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:56,733-Speed 5567.71 samples/sec   Loss 4.7586   LearningRate 0.0142   Epoch: 12   Global Step: 62990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:18:58,633-Speed 5392.80 samples/sec   Loss 4.7225   LearningRate 0.0142   Epoch: 12   Global Step: 63000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:00,487-Speed 5525.53 samples/sec   Loss 4.7098   LearningRate 0.0142   Epoch: 12   Global Step: 63010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:02,365-Speed 5454.88 samples/sec   Loss 4.8162   LearningRate 0.0142   Epoch: 12   Global Step: 63020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:04,219-Speed 5529.08 samples/sec   Loss 4.6317   LearningRate 0.0142   Epoch: 12   Global Step: 63030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:06,065-Speed 5549.34 samples/sec   Loss 4.8151   LearningRate 0.0142   Epoch: 12   Global Step: 63040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:07,924-Speed 5509.98 samples/sec   Loss 4.7461   LearningRate 0.0142   Epoch: 12   Global Step: 63050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:09,777-Speed 5528.04 samples/sec   Loss 4.8296   LearningRate 0.0142   Epoch: 12   Global Step: 63060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:11,646-Speed 5483.66 samples/sec   Loss 4.7776   LearningRate 0.0142   Epoch: 12   Global Step: 63070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:13,519-Speed 5470.78 samples/sec   Loss 4.6941   LearningRate 0.0142   Epoch: 12   Global Step: 63080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:15,365-Speed 5548.94 samples/sec   Loss 4.7099   LearningRate 0.0142   Epoch: 12   Global Step: 63090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:17,220-Speed 5522.45 samples/sec   Loss 4.8068   LearningRate 0.0142   Epoch: 12   Global Step: 63100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:19,066-Speed 5551.10 samples/sec   Loss 4.7024   LearningRate 0.0141   Epoch: 12   Global Step: 63110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:20,927-Speed 5504.43 samples/sec   Loss 4.8328   LearningRate 0.0141   Epoch: 12   Global Step: 63120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:22,810-Speed 5445.34 samples/sec   Loss 4.6618   LearningRate 0.0141   Epoch: 12   Global Step: 63130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:24,659-Speed 5537.95 samples/sec   Loss 4.8012   LearningRate 0.0141   Epoch: 12   Global Step: 63140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:26,514-Speed 5523.23 samples/sec   Loss 4.7565   LearningRate 0.0141   Epoch: 12   Global Step: 63150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:28,375-Speed 5506.86 samples/sec   Loss 4.7554   LearningRate 0.0141   Epoch: 12   Global Step: 63160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:30,232-Speed 5517.87 samples/sec   Loss 4.7799   LearningRate 0.0141   Epoch: 12   Global Step: 63170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:32,097-Speed 5492.59 samples/sec   Loss 4.6954   LearningRate 0.0141   Epoch: 12   Global Step: 63180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:33,957-Speed 5509.82 samples/sec   Loss 4.7426   LearningRate 0.0141   Epoch: 12   Global Step: 63190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:35,804-Speed 5546.99 samples/sec   Loss 4.7981   LearningRate 0.0141   Epoch: 12   Global Step: 63200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:37,681-Speed 5458.54 samples/sec   Loss 4.8456   LearningRate 0.0141   Epoch: 12   Global Step: 63210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:39,513-Speed 5593.47 samples/sec   Loss 4.6864   LearningRate 0.0141   Epoch: 12   Global Step: 63220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:41,395-Speed 5443.35 samples/sec   Loss 4.7695   LearningRate 0.0141   Epoch: 12   Global Step: 63230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:43,238-Speed 5562.33 samples/sec   Loss 4.7566   LearningRate 0.0141   Epoch: 12   Global Step: 63240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:45,117-Speed 5451.84 samples/sec   Loss 4.8225   LearningRate 0.0140   Epoch: 12   Global Step: 63250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:46,968-Speed 5533.50 samples/sec   Loss 4.6258   LearningRate 0.0140   Epoch: 12   Global Step: 63260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:48,833-Speed 5494.43 samples/sec   Loss 4.7539   LearningRate 0.0140   Epoch: 12   Global Step: 63270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:50,672-Speed 5573.01 samples/sec   Loss 4.6981   LearningRate 0.0140   Epoch: 12   Global Step: 63280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:52,537-Speed 5492.41 samples/sec   Loss 4.8503   LearningRate 0.0140   Epoch: 12   Global Step: 63290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:54,383-Speed 5551.56 samples/sec   Loss 4.7683   LearningRate 0.0140   Epoch: 12   Global Step: 63300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:56,246-Speed 5499.61 samples/sec   Loss 4.7873   LearningRate 0.0140   Epoch: 12   Global Step: 63310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:19:58,091-Speed 5555.85 samples/sec   Loss 4.6603   LearningRate 0.0140   Epoch: 12   Global Step: 63320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:19:59,961-Speed 5477.80 samples/sec   Loss 4.7086   LearningRate 0.0140   Epoch: 12   Global Step: 63330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:01,807-Speed 5552.44 samples/sec   Loss 4.7906   LearningRate 0.0140   Epoch: 12   Global Step: 63340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:03,684-Speed 5458.35 samples/sec   Loss 4.8122   LearningRate 0.0140   Epoch: 12   Global Step: 63350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:05,525-Speed 5565.80 samples/sec   Loss 4.7562   LearningRate 0.0140   Epoch: 12   Global Step: 63360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:07,364-Speed 5569.92 samples/sec   Loss 4.6648   LearningRate 0.0140   Epoch: 12   Global Step: 63370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:09,204-Speed 5570.48 samples/sec   Loss 4.7263   LearningRate 0.0139   Epoch: 12   Global Step: 63380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:11,075-Speed 5476.34 samples/sec   Loss 4.6921   LearningRate 0.0139   Epoch: 12   Global Step: 63390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:12,919-Speed 5555.57 samples/sec   Loss 4.7656   LearningRate 0.0139   Epoch: 12   Global Step: 63400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:14,785-Speed 5491.14 samples/sec   Loss 4.7529   LearningRate 0.0139   Epoch: 12   Global Step: 63410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:16,667-Speed 5444.67 samples/sec   Loss 4.9289   LearningRate 0.0139   Epoch: 12   Global Step: 63420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:18,506-Speed 5571.14 samples/sec   Loss 4.7854   LearningRate 0.0139   Epoch: 12   Global Step: 63430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:20,363-Speed 5519.06 samples/sec   Loss 4.6645   LearningRate 0.0139   Epoch: 12   Global Step: 63440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:22,211-Speed 5542.05 samples/sec   Loss 4.8388   LearningRate 0.0139   Epoch: 12   Global Step: 63450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:24,066-Speed 5523.69 samples/sec   Loss 4.8976   LearningRate 0.0139   Epoch: 12   Global Step: 63460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:25,936-Speed 5479.57 samples/sec   Loss 4.7431   LearningRate 0.0139   Epoch: 12   Global Step: 63470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:27,817-Speed 5447.50 samples/sec   Loss 4.7933   LearningRate 0.0139   Epoch: 12   Global Step: 63480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:29,657-Speed 5567.71 samples/sec   Loss 4.8266   LearningRate 0.0139   Epoch: 12   Global Step: 63490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:31,542-Speed 5435.40 samples/sec   Loss 4.6832   LearningRate 0.0139   Epoch: 12   Global Step: 63500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:33,381-Speed 5570.89 samples/sec   Loss 4.7640   LearningRate 0.0139   Epoch: 12   Global Step: 63510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:35,266-Speed 5435.50 samples/sec   Loss 4.7239   LearningRate 0.0138   Epoch: 12   Global Step: 63520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:37,160-Speed 5410.72 samples/sec   Loss 4.6082   LearningRate 0.0138   Epoch: 12   Global Step: 63530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:39,033-Speed 5468.18 samples/sec   Loss 4.8641   LearningRate 0.0138   Epoch: 12   Global Step: 63540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:40,867-Speed 5586.11 samples/sec   Loss 4.7758   LearningRate 0.0138   Epoch: 12   Global Step: 63550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:42,717-Speed 5539.60 samples/sec   Loss 4.7625   LearningRate 0.0138   Epoch: 12   Global Step: 63560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:20:44,568-Speed 5534.10 samples/sec   Loss 4.7926   LearningRate 0.0138   Epoch: 12   Global Step: 63570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:46,423-Speed 5523.29 samples/sec   Loss 4.6198   LearningRate 0.0138   Epoch: 12   Global Step: 63580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:48,279-Speed 5519.96 samples/sec   Loss 4.9008   LearningRate 0.0138   Epoch: 12   Global Step: 63590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:50,173-Speed 5408.65 samples/sec   Loss 4.6252   LearningRate 0.0138   Epoch: 12   Global Step: 63600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:20:52,006-Speed 5587.87 samples/sec   Loss 4.7735   LearningRate 0.0138   Epoch: 12   Global Step: 63610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:20:53,901-Speed 5408.12 samples/sec   Loss 4.6104   LearningRate 0.0138   Epoch: 12   Global Step: 63620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:20:55,746-Speed 5551.76 samples/sec   Loss 4.6824   LearningRate 0.0138   Epoch: 12   Global Step: 63630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:20:57,621-Speed 5464.41 samples/sec   Loss 4.7562   LearningRate 0.0138   Epoch: 12   Global Step: 63640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:20:59,487-Speed 5492.28 samples/sec   Loss 4.7538   LearningRate 0.0137   Epoch: 12   Global Step: 63650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:21:01,350-Speed 5500.42 samples/sec   Loss 4.6563   LearningRate 0.0137   Epoch: 12   Global Step: 63660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:21:03,250-Speed 5392.42 samples/sec   Loss 4.6497   LearningRate 0.0137   Epoch: 12   Global Step: 63670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:21:05,156-Speed 5373.72 samples/sec   Loss 4.7826   LearningRate 0.0137   Epoch: 12   Global Step: 63680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:21:07,001-Speed 5554.95 samples/sec   Loss 4.6908   LearningRate 0.0137   Epoch: 12   Global Step: 63690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:21:08,862-Speed 5505.66 samples/sec   Loss 4.8807   LearningRate 0.0137   Epoch: 12   Global Step: 63700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:21:10,706-Speed 5556.68 samples/sec   Loss 4.8143   LearningRate 0.0137   Epoch: 12   Global Step: 63710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:12,556-Speed 5535.76 samples/sec   Loss 4.6425   LearningRate 0.0137   Epoch: 12   Global Step: 63720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:14,461-Speed 5378.09 samples/sec   Loss 4.6634   LearningRate 0.0137   Epoch: 12   Global Step: 63730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:16,301-Speed 5570.58 samples/sec   Loss 4.6190   LearningRate 0.0137   Epoch: 12   Global Step: 63740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:18,208-Speed 5372.19 samples/sec   Loss 4.6966   LearningRate 0.0137   Epoch: 12   Global Step: 63750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:20,051-Speed 5560.72 samples/sec   Loss 4.6943   LearningRate 0.0137   Epoch: 12   Global Step: 63760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:21,918-Speed 5485.51 samples/sec   Loss 4.6755   LearningRate 0.0137   Epoch: 12   Global Step: 63770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:23,763-Speed 5557.12 samples/sec   Loss 4.6339   LearningRate 0.0137   Epoch: 12   Global Step: 63780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:25,637-Speed 5464.86 samples/sec   Loss 4.8446   LearningRate 0.0136   Epoch: 12   Global Step: 63790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:27,500-Speed 5499.67 samples/sec   Loss 4.9290   LearningRate 0.0136   Epoch: 12   Global Step: 63800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:29,341-Speed 5566.30 samples/sec   Loss 4.8991   LearningRate 0.0136   Epoch: 12   Global Step: 63810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:31,180-Speed 5570.10 samples/sec   Loss 4.6986   LearningRate 0.0136   Epoch: 12   Global Step: 63820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:33,033-Speed 5531.90 samples/sec   Loss 4.6795   LearningRate 0.0136   Epoch: 12   Global Step: 63830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:34,914-Speed 5445.11 samples/sec   Loss 4.7600   LearningRate 0.0136   Epoch: 12   Global Step: 63840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:36,780-Speed 5488.53 samples/sec   Loss 4.7633   LearningRate 0.0136   Epoch: 12   Global Step: 63850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:38,670-Speed 5422.00 samples/sec   Loss 4.8027   LearningRate 0.0136   Epoch: 12   Global Step: 63860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:40,542-Speed 5474.22 samples/sec   Loss 4.7866   LearningRate 0.0136   Epoch: 12   Global Step: 63870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:42,461-Speed 5338.66 samples/sec   Loss 4.8258   LearningRate 0.0136   Epoch: 12   Global Step: 63880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:44,303-Speed 5564.71 samples/sec   Loss 4.7248   LearningRate 0.0136   Epoch: 12   Global Step: 63890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:46,149-Speed 5548.52 samples/sec   Loss 4.8362   LearningRate 0.0136   Epoch: 12   Global Step: 63900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:48,026-Speed 5458.28 samples/sec   Loss 4.7586   LearningRate 0.0136   Epoch: 12   Global Step: 63910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:49,898-Speed 5474.03 samples/sec   Loss 4.7272   LearningRate 0.0136   Epoch: 12   Global Step: 63920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:51,757-Speed 5513.22 samples/sec   Loss 4.6988   LearningRate 0.0135   Epoch: 12   Global Step: 63930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:53,641-Speed 5439.21 samples/sec   Loss 4.6152   LearningRate 0.0135   Epoch: 12   Global Step: 63940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:55,483-Speed 5561.61 samples/sec   Loss 4.5510   LearningRate 0.0135   Epoch: 12   Global Step: 63950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:57,331-Speed 5543.27 samples/sec   Loss 4.7802   LearningRate 0.0135   Epoch: 12   Global Step: 63960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:21:59,182-Speed 5537.78 samples/sec   Loss 4.7483   LearningRate 0.0135   Epoch: 12   Global Step: 63970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:22:01,049-Speed 5490.26 samples/sec   Loss 4.6874   LearningRate 0.0135   Epoch: 12   Global Step: 63980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:22:02,897-Speed 5545.36 samples/sec   Loss 4.7563   LearningRate 0.0135   Epoch: 12   Global Step: 63990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:22:04,766-Speed 5480.27 samples/sec   Loss 4.5929   LearningRate 0.0135   Epoch: 12   Global Step: 64000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:22:32,046-[lfw][64000]XNorm: 23.746053
Training: 2022-04-11 14:22:32,047-[lfw][64000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-11 14:22:32,048-[lfw][64000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:23:03,218-[cfp_fp][64000]XNorm: 21.213482
Training: 2022-04-11 14:23:03,219-[cfp_fp][64000]Accuracy-Flip: 0.97471+-0.00748
Training: 2022-04-11 14:23:03,220-[cfp_fp][64000]Accuracy-Highest: 0.97771
Training: 2022-04-11 14:23:30,454-[agedb_30][64000]XNorm: 23.290081
Training: 2022-04-11 14:23:30,454-[agedb_30][64000]Accuracy-Flip: 0.97850+-0.00754
Training: 2022-04-11 14:23:30,455-[agedb_30][64000]Accuracy-Highest: 0.97867
Training: 2022-04-11 14:23:32,302-Speed 116.98 samples/sec   Loss 4.7613   LearningRate 0.0135   Epoch: 12   Global Step: 64010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:23:34,170-Speed 5484.34 samples/sec   Loss 4.7140   LearningRate 0.0135   Epoch: 12   Global Step: 64020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:23:36,001-Speed 5594.05 samples/sec   Loss 4.7202   LearningRate 0.0135   Epoch: 12   Global Step: 64030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:23:37,893-Speed 5415.93 samples/sec   Loss 4.8728   LearningRate 0.0135   Epoch: 12   Global Step: 64040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:23:39,733-Speed 5566.87 samples/sec   Loss 4.7712   LearningRate 0.0135   Epoch: 12   Global Step: 64050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:23:41,581-Speed 5546.67 samples/sec   Loss 4.7062   LearningRate 0.0135   Epoch: 12   Global Step: 64060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:23:43,440-Speed 5511.30 samples/sec   Loss 4.7106   LearningRate 0.0134   Epoch: 12   Global Step: 64070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:23:45,332-Speed 5414.73 samples/sec   Loss 4.6451   LearningRate 0.0134   Epoch: 12   Global Step: 64080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:23:47,177-Speed 5555.10 samples/sec   Loss 4.8150   LearningRate 0.0134   Epoch: 12   Global Step: 64090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:23:49,055-Speed 5458.89 samples/sec   Loss 4.6827   LearningRate 0.0134   Epoch: 12   Global Step: 64100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:23:50,967-Speed 5356.61 samples/sec   Loss 4.6264   LearningRate 0.0134   Epoch: 12   Global Step: 64110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:23:52,900-Speed 5302.97 samples/sec   Loss 4.6758   LearningRate 0.0134   Epoch: 12   Global Step: 64120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:23:54,786-Speed 5429.00 samples/sec   Loss 4.7850   LearningRate 0.0134   Epoch: 12   Global Step: 64130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:23:56,637-Speed 5538.06 samples/sec   Loss 4.8231   LearningRate 0.0134   Epoch: 12   Global Step: 64140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:23:58,531-Speed 5408.53 samples/sec   Loss 4.6909   LearningRate 0.0134   Epoch: 12   Global Step: 64150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:00,363-Speed 5593.60 samples/sec   Loss 4.6977   LearningRate 0.0134   Epoch: 12   Global Step: 64160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:02,227-Speed 5496.14 samples/sec   Loss 4.7420   LearningRate 0.0134   Epoch: 12   Global Step: 64170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:04,070-Speed 5562.02 samples/sec   Loss 4.7279   LearningRate 0.0134   Epoch: 12   Global Step: 64180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:05,941-Speed 5474.37 samples/sec   Loss 4.7529   LearningRate 0.0134   Epoch: 12   Global Step: 64190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:07,775-Speed 5586.03 samples/sec   Loss 4.6171   LearningRate 0.0133   Epoch: 12   Global Step: 64200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:24:09,609-Speed 5587.59 samples/sec   Loss 4.6200   LearningRate 0.0133   Epoch: 12   Global Step: 64210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:24:11,454-Speed 5553.19 samples/sec   Loss 4.8103   LearningRate 0.0133   Epoch: 12   Global Step: 64220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:24:13,291-Speed 5578.89 samples/sec   Loss 4.6269   LearningRate 0.0133   Epoch: 12   Global Step: 64230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:15,143-Speed 5529.80 samples/sec   Loss 4.7121   LearningRate 0.0133   Epoch: 12   Global Step: 64240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:17,003-Speed 5508.47 samples/sec   Loss 4.7341   LearningRate 0.0133   Epoch: 12   Global Step: 64250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:18,889-Speed 5435.35 samples/sec   Loss 4.6746   LearningRate 0.0133   Epoch: 12   Global Step: 64260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:20,734-Speed 5556.12 samples/sec   Loss 4.7414   LearningRate 0.0133   Epoch: 12   Global Step: 64270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:22,610-Speed 5459.77 samples/sec   Loss 4.7655   LearningRate 0.0133   Epoch: 12   Global Step: 64280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:24,489-Speed 5453.14 samples/sec   Loss 4.6674   LearningRate 0.0133   Epoch: 12   Global Step: 64290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:26,341-Speed 5532.46 samples/sec   Loss 4.7452   LearningRate 0.0133   Epoch: 12   Global Step: 64300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:28,179-Speed 5574.36 samples/sec   Loss 4.7426   LearningRate 0.0133   Epoch: 12   Global Step: 64310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:30,056-Speed 5459.48 samples/sec   Loss 4.7575   LearningRate 0.0133   Epoch: 12   Global Step: 64320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:31,933-Speed 5459.41 samples/sec   Loss 4.6878   LearningRate 0.0133   Epoch: 12   Global Step: 64330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:24:33,773-Speed 5567.26 samples/sec   Loss 4.6819   LearningRate 0.0132   Epoch: 12   Global Step: 64340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:24:35,640-Speed 5490.38 samples/sec   Loss 4.6242   LearningRate 0.0132   Epoch: 12   Global Step: 64350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:24:37,507-Speed 5486.01 samples/sec   Loss 4.8598   LearningRate 0.0132   Epoch: 12   Global Step: 64360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:39,357-Speed 5538.18 samples/sec   Loss 4.6661   LearningRate 0.0132   Epoch: 12   Global Step: 64370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:41,237-Speed 5451.05 samples/sec   Loss 4.7043   LearningRate 0.0132   Epoch: 12   Global Step: 64380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:43,092-Speed 5523.57 samples/sec   Loss 4.7306   LearningRate 0.0132   Epoch: 12   Global Step: 64390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:44,975-Speed 5439.52 samples/sec   Loss 4.7617   LearningRate 0.0132   Epoch: 12   Global Step: 64400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:46,842-Speed 5487.66 samples/sec   Loss 4.6605   LearningRate 0.0132   Epoch: 12   Global Step: 64410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:48,702-Speed 5509.96 samples/sec   Loss 4.7874   LearningRate 0.0132   Epoch: 12   Global Step: 64420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:50,582-Speed 5453.09 samples/sec   Loss 4.7400   LearningRate 0.0132   Epoch: 12   Global Step: 64430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:52,426-Speed 5556.82 samples/sec   Loss 4.7218   LearningRate 0.0132   Epoch: 12   Global Step: 64440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:54,287-Speed 5505.28 samples/sec   Loss 4.5419   LearningRate 0.0132   Epoch: 12   Global Step: 64450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:56,128-Speed 5567.01 samples/sec   Loss 4.6443   LearningRate 0.0132   Epoch: 12   Global Step: 64460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:24:58,009-Speed 5447.47 samples/sec   Loss 4.6499   LearningRate 0.0132   Epoch: 12   Global Step: 64470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:24:59,848-Speed 5570.75 samples/sec   Loss 4.7411   LearningRate 0.0131   Epoch: 12   Global Step: 64480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:01,746-Speed 5398.46 samples/sec   Loss 4.6172   LearningRate 0.0131   Epoch: 12   Global Step: 64490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:03,598-Speed 5532.37 samples/sec   Loss 4.7479   LearningRate 0.0131   Epoch: 12   Global Step: 64500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:05,455-Speed 5517.78 samples/sec   Loss 4.7092   LearningRate 0.0131   Epoch: 12   Global Step: 64510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:07,332-Speed 5456.81 samples/sec   Loss 4.7098   LearningRate 0.0131   Epoch: 12   Global Step: 64520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:09,171-Speed 5571.95 samples/sec   Loss 4.6488   LearningRate 0.0131   Epoch: 12   Global Step: 64530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:11,063-Speed 5417.66 samples/sec   Loss 4.7780   LearningRate 0.0131   Epoch: 12   Global Step: 64540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:12,913-Speed 5537.16 samples/sec   Loss 4.6057   LearningRate 0.0131   Epoch: 12   Global Step: 64550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:14,750-Speed 5577.01 samples/sec   Loss 4.6927   LearningRate 0.0131   Epoch: 12   Global Step: 64560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:16,620-Speed 5479.16 samples/sec   Loss 4.7717   LearningRate 0.0131   Epoch: 12   Global Step: 64570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:18,480-Speed 5510.40 samples/sec   Loss 4.7113   LearningRate 0.0131   Epoch: 12   Global Step: 64580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:20,354-Speed 5464.61 samples/sec   Loss 4.7648   LearningRate 0.0131   Epoch: 12   Global Step: 64590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:22,211-Speed 5520.28 samples/sec   Loss 4.6969   LearningRate 0.0131   Epoch: 12   Global Step: 64600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:24,101-Speed 5420.39 samples/sec   Loss 4.7307   LearningRate 0.0131   Epoch: 12   Global Step: 64610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:26,004-Speed 5384.11 samples/sec   Loss 4.6525   LearningRate 0.0130   Epoch: 12   Global Step: 64620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:27,871-Speed 5487.21 samples/sec   Loss 4.6995   LearningRate 0.0130   Epoch: 12   Global Step: 64630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:29,758-Speed 5429.83 samples/sec   Loss 4.7804   LearningRate 0.0130   Epoch: 12   Global Step: 64640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:31,603-Speed 5554.44 samples/sec   Loss 4.5535   LearningRate 0.0130   Epoch: 12   Global Step: 64650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:33,483-Speed 5450.39 samples/sec   Loss 4.6747   LearningRate 0.0130   Epoch: 12   Global Step: 64660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:35,338-Speed 5523.11 samples/sec   Loss 4.7027   LearningRate 0.0130   Epoch: 12   Global Step: 64670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:25:37,231-Speed 5411.07 samples/sec   Loss 4.7071   LearningRate 0.0130   Epoch: 12   Global Step: 64680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:25:39,074-Speed 5560.88 samples/sec   Loss 4.5264   LearningRate 0.0130   Epoch: 12   Global Step: 64690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:25:40,955-Speed 5446.58 samples/sec   Loss 4.5857   LearningRate 0.0130   Epoch: 12   Global Step: 64700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:25:42,800-Speed 5552.91 samples/sec   Loss 4.6518   LearningRate 0.0130   Epoch: 12   Global Step: 64710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:25:44,665-Speed 5494.76 samples/sec   Loss 4.7534   LearningRate 0.0130   Epoch: 12   Global Step: 64720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:25:46,514-Speed 5540.97 samples/sec   Loss 4.8085   LearningRate 0.0130   Epoch: 12   Global Step: 64730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:25:48,393-Speed 5453.62 samples/sec   Loss 4.7604   LearningRate 0.0130   Epoch: 12   Global Step: 64740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:25:50,258-Speed 5493.19 samples/sec   Loss 4.6959   LearningRate 0.0130   Epoch: 12   Global Step: 64750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:52,111-Speed 5529.08 samples/sec   Loss 4.6959   LearningRate 0.0129   Epoch: 12   Global Step: 64760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:53,965-Speed 5528.49 samples/sec   Loss 4.6337   LearningRate 0.0129   Epoch: 12   Global Step: 64770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:55,803-Speed 5574.72 samples/sec   Loss 4.6739   LearningRate 0.0129   Epoch: 12   Global Step: 64780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:57,692-Speed 5422.98 samples/sec   Loss 4.7533   LearningRate 0.0129   Epoch: 12   Global Step: 64790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:25:59,537-Speed 5555.06 samples/sec   Loss 4.6432   LearningRate 0.0129   Epoch: 12   Global Step: 64800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:01,412-Speed 5463.48 samples/sec   Loss 4.8700   LearningRate 0.0129   Epoch: 12   Global Step: 64810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:03,265-Speed 5530.93 samples/sec   Loss 4.7734   LearningRate 0.0129   Epoch: 12   Global Step: 64820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:05,147-Speed 5443.44 samples/sec   Loss 4.6327   LearningRate 0.0129   Epoch: 12   Global Step: 64830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:06,994-Speed 5548.15 samples/sec   Loss 4.7085   LearningRate 0.0129   Epoch: 12   Global Step: 64840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:08,880-Speed 5431.88 samples/sec   Loss 4.6587   LearningRate 0.0129   Epoch: 12   Global Step: 64850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:26:10,731-Speed 5536.73 samples/sec   Loss 4.5671   LearningRate 0.0129   Epoch: 12   Global Step: 64860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:26:12,698-Speed 5208.19 samples/sec   Loss 4.6318   LearningRate 0.0129   Epoch: 12   Global Step: 64870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:26:14,579-Speed 5447.73 samples/sec   Loss 4.6555   LearningRate 0.0129   Epoch: 12   Global Step: 64880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:26:16,475-Speed 5403.67 samples/sec   Loss 4.7400   LearningRate 0.0129   Epoch: 12   Global Step: 64890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:18,344-Speed 5482.77 samples/sec   Loss 4.7494   LearningRate 0.0128   Epoch: 12   Global Step: 64900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:20,189-Speed 5554.33 samples/sec   Loss 4.6748   LearningRate 0.0128   Epoch: 12   Global Step: 64910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:22,036-Speed 5547.80 samples/sec   Loss 4.4907   LearningRate 0.0128   Epoch: 12   Global Step: 64920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:23,886-Speed 5537.21 samples/sec   Loss 4.8125   LearningRate 0.0128   Epoch: 12   Global Step: 64930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:25,744-Speed 5516.40 samples/sec   Loss 4.5804   LearningRate 0.0128   Epoch: 12   Global Step: 64940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:27,610-Speed 5491.37 samples/sec   Loss 4.7666   LearningRate 0.0128   Epoch: 12   Global Step: 64950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:29,469-Speed 5511.27 samples/sec   Loss 4.6506   LearningRate 0.0128   Epoch: 12   Global Step: 64960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:31,338-Speed 5481.26 samples/sec   Loss 4.7193   LearningRate 0.0128   Epoch: 12   Global Step: 64970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:33,180-Speed 5563.70 samples/sec   Loss 4.7161   LearningRate 0.0128   Epoch: 12   Global Step: 64980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:35,051-Speed 5475.51 samples/sec   Loss 4.5145   LearningRate 0.0128   Epoch: 12   Global Step: 64990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:26:36,939-Speed 5429.42 samples/sec   Loss 4.6096   LearningRate 0.0128   Epoch: 12   Global Step: 65000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:26:38,816-Speed 5455.44 samples/sec   Loss 4.7717   LearningRate 0.0128   Epoch: 12   Global Step: 65010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:26:40,680-Speed 5499.69 samples/sec   Loss 4.6802   LearningRate 0.0128   Epoch: 12   Global Step: 65020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:26:42,536-Speed 5518.46 samples/sec   Loss 4.8100   LearningRate 0.0128   Epoch: 12   Global Step: 65030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:44,375-Speed 5572.94 samples/sec   Loss 4.7153   LearningRate 0.0127   Epoch: 12   Global Step: 65040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:46,234-Speed 5509.27 samples/sec   Loss 4.7028   LearningRate 0.0127   Epoch: 12   Global Step: 65050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:48,077-Speed 5560.19 samples/sec   Loss 4.7711   LearningRate 0.0127   Epoch: 12   Global Step: 65060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:49,965-Speed 5426.94 samples/sec   Loss 4.6258   LearningRate 0.0127   Epoch: 12   Global Step: 65070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:51,834-Speed 5482.54 samples/sec   Loss 4.6701   LearningRate 0.0127   Epoch: 12   Global Step: 65080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:53,702-Speed 5483.86 samples/sec   Loss 4.6217   LearningRate 0.0127   Epoch: 12   Global Step: 65090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:55,586-Speed 5439.89 samples/sec   Loss 4.5634   LearningRate 0.0127   Epoch: 12   Global Step: 65100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:57,459-Speed 5471.10 samples/sec   Loss 4.5179   LearningRate 0.0127   Epoch: 12   Global Step: 65110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:26:59,360-Speed 5389.31 samples/sec   Loss 4.6346   LearningRate 0.0127   Epoch: 12   Global Step: 65120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:01,203-Speed 5560.59 samples/sec   Loss 4.6772   LearningRate 0.0127   Epoch: 12   Global Step: 65130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:03,113-Speed 5363.80 samples/sec   Loss 4.6063   LearningRate 0.0127   Epoch: 12   Global Step: 65140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:04,963-Speed 5537.28 samples/sec   Loss 4.7064   LearningRate 0.0127   Epoch: 12   Global Step: 65150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:06,840-Speed 5460.44 samples/sec   Loss 4.7727   LearningRate 0.0127   Epoch: 12   Global Step: 65160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:08,680-Speed 5568.15 samples/sec   Loss 4.5741   LearningRate 0.0127   Epoch: 12   Global Step: 65170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:10,569-Speed 5421.84 samples/sec   Loss 4.6374   LearningRate 0.0127   Epoch: 12   Global Step: 65180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:12,412-Speed 5561.93 samples/sec   Loss 4.6970   LearningRate 0.0126   Epoch: 12   Global Step: 65190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:14,276-Speed 5493.85 samples/sec   Loss 4.7344   LearningRate 0.0126   Epoch: 12   Global Step: 65200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:16,148-Speed 5474.69 samples/sec   Loss 4.7780   LearningRate 0.0126   Epoch: 12   Global Step: 65210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:18,012-Speed 5498.68 samples/sec   Loss 4.7416   LearningRate 0.0126   Epoch: 12   Global Step: 65220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:19,868-Speed 5520.23 samples/sec   Loss 4.6161   LearningRate 0.0126   Epoch: 12   Global Step: 65230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:21,731-Speed 5500.23 samples/sec   Loss 4.5898   LearningRate 0.0126   Epoch: 12   Global Step: 65240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:23,579-Speed 5544.07 samples/sec   Loss 4.5401   LearningRate 0.0126   Epoch: 12   Global Step: 65250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:25,425-Speed 5547.20 samples/sec   Loss 4.6085   LearningRate 0.0126   Epoch: 12   Global Step: 65260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:27,300-Speed 5466.83 samples/sec   Loss 4.6873   LearningRate 0.0126   Epoch: 12   Global Step: 65270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:29,162-Speed 5501.92 samples/sec   Loss 4.8021   LearningRate 0.0126   Epoch: 12   Global Step: 65280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:31,050-Speed 5428.99 samples/sec   Loss 4.7122   LearningRate 0.0126   Epoch: 12   Global Step: 65290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:32,889-Speed 5568.70 samples/sec   Loss 4.6998   LearningRate 0.0126   Epoch: 12   Global Step: 65300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:34,732-Speed 5560.42 samples/sec   Loss 4.6948   LearningRate 0.0126   Epoch: 12   Global Step: 65310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:36,600-Speed 5484.68 samples/sec   Loss 4.6442   LearningRate 0.0126   Epoch: 12   Global Step: 65320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:38,446-Speed 5553.78 samples/sec   Loss 4.7632   LearningRate 0.0125   Epoch: 12   Global Step: 65330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:40,342-Speed 5404.24 samples/sec   Loss 4.6368   LearningRate 0.0125   Epoch: 12   Global Step: 65340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:42,184-Speed 5563.87 samples/sec   Loss 4.6666   LearningRate 0.0125   Epoch: 12   Global Step: 65350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:44,063-Speed 5452.37 samples/sec   Loss 4.5683   LearningRate 0.0125   Epoch: 12   Global Step: 65360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:45,901-Speed 5574.38 samples/sec   Loss 4.7809   LearningRate 0.0125   Epoch: 12   Global Step: 65370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:47,772-Speed 5476.70 samples/sec   Loss 4.5354   LearningRate 0.0125   Epoch: 12   Global Step: 65380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:49,668-Speed 5404.13 samples/sec   Loss 4.6032   LearningRate 0.0125   Epoch: 12   Global Step: 65390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:51,556-Speed 5425.65 samples/sec   Loss 4.6164   LearningRate 0.0125   Epoch: 12   Global Step: 65400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:53,403-Speed 5546.18 samples/sec   Loss 4.5080   LearningRate 0.0125   Epoch: 12   Global Step: 65410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:27:55,298-Speed 5406.20 samples/sec   Loss 4.7001   LearningRate 0.0125   Epoch: 12   Global Step: 65420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:57,147-Speed 5543.53 samples/sec   Loss 4.5572   LearningRate 0.0125   Epoch: 12   Global Step: 65430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:27:58,982-Speed 5581.50 samples/sec   Loss 4.4869   LearningRate 0.0125   Epoch: 12   Global Step: 65440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:28:00,864-Speed 5445.37 samples/sec   Loss 4.6166   LearningRate 0.0125   Epoch: 12   Global Step: 65450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:28:02,718-Speed 5528.06 samples/sec   Loss 4.5699   LearningRate 0.0125   Epoch: 12   Global Step: 65460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:28:04,594-Speed 5460.28 samples/sec   Loss 4.6568   LearningRate 0.0124   Epoch: 12   Global Step: 65470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:06,435-Speed 5565.20 samples/sec   Loss 4.6920   LearningRate 0.0124   Epoch: 12   Global Step: 65480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:08,311-Speed 5460.59 samples/sec   Loss 4.7800   LearningRate 0.0124   Epoch: 12   Global Step: 65490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:10,163-Speed 5531.71 samples/sec   Loss 4.5087   LearningRate 0.0124   Epoch: 12   Global Step: 65500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:11,998-Speed 5583.37 samples/sec   Loss 4.6860   LearningRate 0.0124   Epoch: 12   Global Step: 65510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:13,969-Speed 5199.13 samples/sec   Loss 4.6332   LearningRate 0.0124   Epoch: 12   Global Step: 65520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:15,815-Speed 5550.99 samples/sec   Loss 4.5869   LearningRate 0.0124   Epoch: 12   Global Step: 65530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:17,697-Speed 5442.03 samples/sec   Loss 4.7913   LearningRate 0.0124   Epoch: 12   Global Step: 65540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:19,555-Speed 5513.19 samples/sec   Loss 4.5176   LearningRate 0.0124   Epoch: 12   Global Step: 65550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:21,392-Speed 5579.15 samples/sec   Loss 4.6384   LearningRate 0.0124   Epoch: 12   Global Step: 65560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:23,298-Speed 5374.93 samples/sec   Loss 4.4999   LearningRate 0.0124   Epoch: 12   Global Step: 65570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:28:25,140-Speed 5561.37 samples/sec   Loss 4.6839   LearningRate 0.0124   Epoch: 12   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:28:27,013-Speed 5469.87 samples/sec   Loss 4.6803   LearningRate 0.0124   Epoch: 12   Global Step: 65590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:28,848-Speed 5583.61 samples/sec   Loss 4.6482   LearningRate 0.0124   Epoch: 12   Global Step: 65600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:30,733-Speed 5435.53 samples/sec   Loss 4.5858   LearningRate 0.0123   Epoch: 12   Global Step: 65610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:32,582-Speed 5543.16 samples/sec   Loss 4.6457   LearningRate 0.0123   Epoch: 12   Global Step: 65620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:34,476-Speed 5408.68 samples/sec   Loss 4.7293   LearningRate 0.0123   Epoch: 12   Global Step: 65630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:36,339-Speed 5501.52 samples/sec   Loss 4.6478   LearningRate 0.0123   Epoch: 12   Global Step: 65640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:38,219-Speed 5451.18 samples/sec   Loss 4.6543   LearningRate 0.0123   Epoch: 12   Global Step: 65650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:40,074-Speed 5524.30 samples/sec   Loss 4.5234   LearningRate 0.0123   Epoch: 12   Global Step: 65660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:41,936-Speed 5505.02 samples/sec   Loss 4.6945   LearningRate 0.0123   Epoch: 12   Global Step: 65670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:43,777-Speed 5563.35 samples/sec   Loss 4.7435   LearningRate 0.0123   Epoch: 12   Global Step: 65680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:45,655-Speed 5454.91 samples/sec   Loss 4.5286   LearningRate 0.0123   Epoch: 12   Global Step: 65690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:28:47,510-Speed 5525.61 samples/sec   Loss 4.6242   LearningRate 0.0123   Epoch: 12   Global Step: 65700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:28:49,393-Speed 5440.62 samples/sec   Loss 4.5406   LearningRate 0.0123   Epoch: 12   Global Step: 65710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:28:51,231-Speed 5577.32 samples/sec   Loss 4.5725   LearningRate 0.0123   Epoch: 12   Global Step: 65720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:53,112-Speed 5445.77 samples/sec   Loss 4.7997   LearningRate 0.0123   Epoch: 12   Global Step: 65730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:54,983-Speed 5475.37 samples/sec   Loss 4.6728   LearningRate 0.0123   Epoch: 12   Global Step: 65740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:28:56,887-Speed 5383.81 samples/sec   Loss 4.7202   LearningRate 0.0123   Epoch: 12   Global Step: 65750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:08,258-Speed 900.67 samples/sec   Loss 4.1357   LearningRate 0.0122   Epoch: 13   Global Step: 65760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:10,103-Speed 5553.89 samples/sec   Loss 3.7468   LearningRate 0.0122   Epoch: 13   Global Step: 65770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:12,029-Speed 5321.74 samples/sec   Loss 3.8559   LearningRate 0.0122   Epoch: 13   Global Step: 65780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:13,961-Speed 5300.62 samples/sec   Loss 3.6855   LearningRate 0.0122   Epoch: 13   Global Step: 65790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:15,827-Speed 5496.09 samples/sec   Loss 3.7482   LearningRate 0.0122   Epoch: 13   Global Step: 65800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:17,739-Speed 5358.55 samples/sec   Loss 3.7452   LearningRate 0.0122   Epoch: 13   Global Step: 65810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:19,604-Speed 5495.73 samples/sec   Loss 3.8483   LearningRate 0.0122   Epoch: 13   Global Step: 65820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:29:21,461-Speed 5517.56 samples/sec   Loss 3.6690   LearningRate 0.0122   Epoch: 13   Global Step: 65830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:29:23,378-Speed 5345.17 samples/sec   Loss 3.7680   LearningRate 0.0122   Epoch: 13   Global Step: 65840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:29:25,260-Speed 5441.44 samples/sec   Loss 3.8158   LearningRate 0.0122   Epoch: 13   Global Step: 65850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:29:27,124-Speed 5495.56 samples/sec   Loss 3.8556   LearningRate 0.0122   Epoch: 13   Global Step: 65860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:29:28,982-Speed 5515.84 samples/sec   Loss 3.8454   LearningRate 0.0122   Epoch: 13   Global Step: 65870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:30,840-Speed 5513.64 samples/sec   Loss 3.7195   LearningRate 0.0122   Epoch: 13   Global Step: 65880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:32,720-Speed 5450.05 samples/sec   Loss 3.8080   LearningRate 0.0122   Epoch: 13   Global Step: 65890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:34,572-Speed 5528.47 samples/sec   Loss 3.8409   LearningRate 0.0121   Epoch: 13   Global Step: 65900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:36,426-Speed 5527.71 samples/sec   Loss 3.9080   LearningRate 0.0121   Epoch: 13   Global Step: 65910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:38,299-Speed 5468.01 samples/sec   Loss 3.8098   LearningRate 0.0121   Epoch: 13   Global Step: 65920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:40,146-Speed 5545.54 samples/sec   Loss 3.9566   LearningRate 0.0121   Epoch: 13   Global Step: 65930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:42,047-Speed 5390.06 samples/sec   Loss 3.8100   LearningRate 0.0121   Epoch: 13   Global Step: 65940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:43,883-Speed 5579.19 samples/sec   Loss 3.8712   LearningRate 0.0121   Epoch: 13   Global Step: 65950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:45,726-Speed 5558.38 samples/sec   Loss 3.9478   LearningRate 0.0121   Epoch: 13   Global Step: 65960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:47,596-Speed 5479.54 samples/sec   Loss 3.7663   LearningRate 0.0121   Epoch: 13   Global Step: 65970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:29:49,487-Speed 5416.39 samples/sec   Loss 3.9307   LearningRate 0.0121   Epoch: 13   Global Step: 65980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:29:51,326-Speed 5571.21 samples/sec   Loss 3.8977   LearningRate 0.0121   Epoch: 13   Global Step: 65990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:29:53,219-Speed 5414.30 samples/sec   Loss 3.9059   LearningRate 0.0121   Epoch: 13   Global Step: 66000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:30:20,604-[lfw][66000]XNorm: 21.880126
Training: 2022-04-11 14:30:20,605-[lfw][66000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-04-11 14:30:20,605-[lfw][66000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:30:52,172-[cfp_fp][66000]XNorm: 19.674214
Training: 2022-04-11 14:30:52,174-[cfp_fp][66000]Accuracy-Flip: 0.97771+-0.00698
Training: 2022-04-11 14:30:52,174-[cfp_fp][66000]Accuracy-Highest: 0.97771
Training: 2022-04-11 14:31:19,344-[agedb_30][66000]XNorm: 21.918721
Training: 2022-04-11 14:31:19,345-[agedb_30][66000]Accuracy-Flip: 0.97917+-0.00704
Training: 2022-04-11 14:31:19,346-[agedb_30][66000]Accuracy-Highest: 0.97917
Training: 2022-04-11 14:31:21,246-Speed 116.33 samples/sec   Loss 3.8155   LearningRate 0.0121   Epoch: 13   Global Step: 66010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:31:23,096-Speed 5539.60 samples/sec   Loss 3.8461   LearningRate 0.0121   Epoch: 13   Global Step: 66020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:31:24,957-Speed 5504.83 samples/sec   Loss 3.8517   LearningRate 0.0121   Epoch: 13   Global Step: 66030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:31:26,797-Speed 5570.66 samples/sec   Loss 3.9264   LearningRate 0.0121   Epoch: 13   Global Step: 66040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:31:28,665-Speed 5484.05 samples/sec   Loss 3.9725   LearningRate 0.0120   Epoch: 13   Global Step: 66050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:31:30,569-Speed 5386.27 samples/sec   Loss 3.8912   LearningRate 0.0120   Epoch: 13   Global Step: 66060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:31:32,422-Speed 5528.55 samples/sec   Loss 3.7526   LearningRate 0.0120   Epoch: 13   Global Step: 66070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:31:34,281-Speed 5511.26 samples/sec   Loss 3.9204   LearningRate 0.0120   Epoch: 13   Global Step: 66080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:31:36,122-Speed 5563.95 samples/sec   Loss 3.7896   LearningRate 0.0120   Epoch: 13   Global Step: 66090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:31:38,043-Speed 5334.98 samples/sec   Loss 4.0096   LearningRate 0.0120   Epoch: 13   Global Step: 66100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:31:39,880-Speed 5577.04 samples/sec   Loss 3.8844   LearningRate 0.0120   Epoch: 13   Global Step: 66110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:31:41,775-Speed 5406.28 samples/sec   Loss 4.1297   LearningRate 0.0120   Epoch: 13   Global Step: 66120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:31:43,610-Speed 5581.87 samples/sec   Loss 3.9182   LearningRate 0.0120   Epoch: 13   Global Step: 66130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:31:45,515-Speed 5380.02 samples/sec   Loss 3.9638   LearningRate 0.0120   Epoch: 13   Global Step: 66140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:31:47,358-Speed 5557.72 samples/sec   Loss 3.8326   LearningRate 0.0120   Epoch: 13   Global Step: 66150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:31:49,216-Speed 5514.82 samples/sec   Loss 3.9170   LearningRate 0.0120   Epoch: 13   Global Step: 66160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:31:51,067-Speed 5536.76 samples/sec   Loss 4.0140   LearningRate 0.0120   Epoch: 13   Global Step: 66170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:31:52,929-Speed 5501.19 samples/sec   Loss 3.8904   LearningRate 0.0120   Epoch: 13   Global Step: 66180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:31:54,793-Speed 5499.00 samples/sec   Loss 3.9290   LearningRate 0.0120   Epoch: 13   Global Step: 66190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:31:56,653-Speed 5508.29 samples/sec   Loss 4.0331   LearningRate 0.0119   Epoch: 13   Global Step: 66200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:31:58,513-Speed 5510.28 samples/sec   Loss 3.9999   LearningRate 0.0119   Epoch: 13   Global Step: 66210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:00,370-Speed 5519.21 samples/sec   Loss 3.7697   LearningRate 0.0119   Epoch: 13   Global Step: 66220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:02,265-Speed 5404.17 samples/sec   Loss 4.0090   LearningRate 0.0119   Epoch: 13   Global Step: 66230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:04,120-Speed 5523.98 samples/sec   Loss 3.9316   LearningRate 0.0119   Epoch: 13   Global Step: 66240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:05,970-Speed 5541.22 samples/sec   Loss 3.8835   LearningRate 0.0119   Epoch: 13   Global Step: 66250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:07,842-Speed 5473.52 samples/sec   Loss 3.8897   LearningRate 0.0119   Epoch: 13   Global Step: 66260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:09,693-Speed 5535.64 samples/sec   Loss 4.0725   LearningRate 0.0119   Epoch: 13   Global Step: 66270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:11,549-Speed 5518.21 samples/sec   Loss 4.0323   LearningRate 0.0119   Epoch: 13   Global Step: 66280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:13,433-Speed 5438.99 samples/sec   Loss 3.9105   LearningRate 0.0119   Epoch: 13   Global Step: 66290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:15,350-Speed 5346.07 samples/sec   Loss 3.9492   LearningRate 0.0119   Epoch: 13   Global Step: 66300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:17,230-Speed 5448.63 samples/sec   Loss 4.0488   LearningRate 0.0119   Epoch: 13   Global Step: 66310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:19,079-Speed 5540.62 samples/sec   Loss 3.9379   LearningRate 0.0119   Epoch: 13   Global Step: 66320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:20,950-Speed 5475.48 samples/sec   Loss 4.0670   LearningRate 0.0119   Epoch: 13   Global Step: 66330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:22,821-Speed 5476.21 samples/sec   Loss 3.9474   LearningRate 0.0118   Epoch: 13   Global Step: 66340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:24,702-Speed 5446.59 samples/sec   Loss 3.9496   LearningRate 0.0118   Epoch: 13   Global Step: 66350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:26,561-Speed 5513.03 samples/sec   Loss 4.0355   LearningRate 0.0118   Epoch: 13   Global Step: 66360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:28,415-Speed 5526.19 samples/sec   Loss 4.0609   LearningRate 0.0118   Epoch: 13   Global Step: 66370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:30,280-Speed 5496.19 samples/sec   Loss 4.0765   LearningRate 0.0118   Epoch: 13   Global Step: 66380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:32,123-Speed 5557.39 samples/sec   Loss 4.0659   LearningRate 0.0118   Epoch: 13   Global Step: 66390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:33,986-Speed 5499.77 samples/sec   Loss 4.0594   LearningRate 0.0118   Epoch: 13   Global Step: 66400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:32:35,828-Speed 5564.50 samples/sec   Loss 3.9908   LearningRate 0.0118   Epoch: 13   Global Step: 66410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:37,708-Speed 5448.18 samples/sec   Loss 4.0227   LearningRate 0.0118   Epoch: 13   Global Step: 66420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:39,561-Speed 5529.88 samples/sec   Loss 3.8873   LearningRate 0.0118   Epoch: 13   Global Step: 66430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:41,403-Speed 5562.85 samples/sec   Loss 4.0360   LearningRate 0.0118   Epoch: 13   Global Step: 66440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:43,241-Speed 5574.92 samples/sec   Loss 4.0039   LearningRate 0.0118   Epoch: 13   Global Step: 66450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:45,078-Speed 5574.17 samples/sec   Loss 4.0821   LearningRate 0.0118   Epoch: 13   Global Step: 66460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:46,947-Speed 5481.92 samples/sec   Loss 4.0201   LearningRate 0.0118   Epoch: 13   Global Step: 66470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:48,826-Speed 5454.33 samples/sec   Loss 4.1068   LearningRate 0.0118   Epoch: 13   Global Step: 66480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:50,676-Speed 5537.50 samples/sec   Loss 4.0018   LearningRate 0.0117   Epoch: 13   Global Step: 66490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:52,554-Speed 5455.98 samples/sec   Loss 3.9654   LearningRate 0.0117   Epoch: 13   Global Step: 66500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:54,403-Speed 5544.27 samples/sec   Loss 4.0713   LearningRate 0.0117   Epoch: 13   Global Step: 66510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:56,271-Speed 5484.16 samples/sec   Loss 4.0856   LearningRate 0.0117   Epoch: 13   Global Step: 66520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:32:58,178-Speed 5373.00 samples/sec   Loss 3.9740   LearningRate 0.0117   Epoch: 13   Global Step: 66530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:00,070-Speed 5416.82 samples/sec   Loss 3.9570   LearningRate 0.0117   Epoch: 13   Global Step: 66540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:01,997-Speed 5317.72 samples/sec   Loss 4.0148   LearningRate 0.0117   Epoch: 13   Global Step: 66550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:03,868-Speed 5475.31 samples/sec   Loss 4.0635   LearningRate 0.0117   Epoch: 13   Global Step: 66560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:05,732-Speed 5497.47 samples/sec   Loss 4.1256   LearningRate 0.0117   Epoch: 13   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:07,577-Speed 5553.04 samples/sec   Loss 4.0284   LearningRate 0.0117   Epoch: 13   Global Step: 66580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:09,443-Speed 5490.68 samples/sec   Loss 3.9673   LearningRate 0.0117   Epoch: 13   Global Step: 66590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:11,345-Speed 5389.15 samples/sec   Loss 4.1801   LearningRate 0.0117   Epoch: 13   Global Step: 66600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:13,221-Speed 5464.00 samples/sec   Loss 4.1901   LearningRate 0.0117   Epoch: 13   Global Step: 66610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:15,089-Speed 5485.23 samples/sec   Loss 4.1116   LearningRate 0.0117   Epoch: 13   Global Step: 66620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:16,949-Speed 5509.93 samples/sec   Loss 4.1359   LearningRate 0.0117   Epoch: 13   Global Step: 66630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:18,791-Speed 5561.55 samples/sec   Loss 4.1914   LearningRate 0.0116   Epoch: 13   Global Step: 66640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:20,687-Speed 5405.78 samples/sec   Loss 4.0352   LearningRate 0.0116   Epoch: 13   Global Step: 66650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:22,556-Speed 5480.56 samples/sec   Loss 4.0326   LearningRate 0.0116   Epoch: 13   Global Step: 66660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:24,400-Speed 5559.42 samples/sec   Loss 3.8869   LearningRate 0.0116   Epoch: 13   Global Step: 66670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:26,259-Speed 5512.07 samples/sec   Loss 4.0117   LearningRate 0.0116   Epoch: 13   Global Step: 66680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:28,148-Speed 5423.72 samples/sec   Loss 4.0926   LearningRate 0.0116   Epoch: 13   Global Step: 66690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:30,053-Speed 5380.82 samples/sec   Loss 4.0472   LearningRate 0.0116   Epoch: 13   Global Step: 66700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:31,901-Speed 5543.06 samples/sec   Loss 4.2154   LearningRate 0.0116   Epoch: 13   Global Step: 66710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:33,755-Speed 5525.90 samples/sec   Loss 4.1350   LearningRate 0.0116   Epoch: 13   Global Step: 66720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:35,593-Speed 5575.73 samples/sec   Loss 4.0787   LearningRate 0.0116   Epoch: 13   Global Step: 66730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:37,436-Speed 5557.63 samples/sec   Loss 4.0438   LearningRate 0.0116   Epoch: 13   Global Step: 66740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:39,328-Speed 5413.89 samples/sec   Loss 4.0771   LearningRate 0.0116   Epoch: 13   Global Step: 66750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:41,168-Speed 5571.51 samples/sec   Loss 4.1115   LearningRate 0.0116   Epoch: 13   Global Step: 66760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:43,015-Speed 5546.18 samples/sec   Loss 4.0593   LearningRate 0.0116   Epoch: 13   Global Step: 66770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:44,909-Speed 5409.31 samples/sec   Loss 4.0115   LearningRate 0.0116   Epoch: 13   Global Step: 66780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:46,751-Speed 5561.73 samples/sec   Loss 4.1227   LearningRate 0.0115   Epoch: 13   Global Step: 66790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:48,628-Speed 5459.39 samples/sec   Loss 4.1141   LearningRate 0.0115   Epoch: 13   Global Step: 66800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:50,478-Speed 5539.81 samples/sec   Loss 4.1522   LearningRate 0.0115   Epoch: 13   Global Step: 66810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:33:52,350-Speed 5472.60 samples/sec   Loss 4.1459   LearningRate 0.0115   Epoch: 13   Global Step: 66820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:54,257-Speed 5370.78 samples/sec   Loss 4.0624   LearningRate 0.0115   Epoch: 13   Global Step: 66830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:56,135-Speed 5460.37 samples/sec   Loss 4.0983   LearningRate 0.0115   Epoch: 13   Global Step: 66840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:58,023-Speed 5426.97 samples/sec   Loss 4.2030   LearningRate 0.0115   Epoch: 13   Global Step: 66850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:33:59,915-Speed 5417.18 samples/sec   Loss 4.1392   LearningRate 0.0115   Epoch: 13   Global Step: 66860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:01,768-Speed 5526.93 samples/sec   Loss 4.1673   LearningRate 0.0115   Epoch: 13   Global Step: 66870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:03,658-Speed 5424.12 samples/sec   Loss 4.0805   LearningRate 0.0115   Epoch: 13   Global Step: 66880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:05,578-Speed 5336.31 samples/sec   Loss 4.1179   LearningRate 0.0115   Epoch: 13   Global Step: 66890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:07,434-Speed 5521.12 samples/sec   Loss 4.0957   LearningRate 0.0115   Epoch: 13   Global Step: 66900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:09,317-Speed 5440.35 samples/sec   Loss 3.9951   LearningRate 0.0115   Epoch: 13   Global Step: 66910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:11,131-Speed 5649.93 samples/sec   Loss 4.2436   LearningRate 0.0115   Epoch: 13   Global Step: 66920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:34:12,975-Speed 5556.44 samples/sec   Loss 4.1368   LearningRate 0.0114   Epoch: 13   Global Step: 66930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:34:14,837-Speed 5502.45 samples/sec   Loss 4.2616   LearningRate 0.0114   Epoch: 13   Global Step: 66940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:34:16,727-Speed 5421.17 samples/sec   Loss 4.1127   LearningRate 0.0114   Epoch: 13   Global Step: 66950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:34:18,593-Speed 5491.28 samples/sec   Loss 4.0774   LearningRate 0.0114   Epoch: 13   Global Step: 66960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:34:20,496-Speed 5384.54 samples/sec   Loss 4.0076   LearningRate 0.0114   Epoch: 13   Global Step: 66970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:34:22,337-Speed 5565.34 samples/sec   Loss 4.1918   LearningRate 0.0114   Epoch: 13   Global Step: 66980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:34:24,179-Speed 5561.52 samples/sec   Loss 4.2360   LearningRate 0.0114   Epoch: 13   Global Step: 66990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:34:26,055-Speed 5462.89 samples/sec   Loss 4.0671   LearningRate 0.0114   Epoch: 13   Global Step: 67000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:34:27,907-Speed 5531.68 samples/sec   Loss 4.0806   LearningRate 0.0114   Epoch: 13   Global Step: 67010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:34:29,776-Speed 5485.04 samples/sec   Loss 4.1770   LearningRate 0.0114   Epoch: 13   Global Step: 67020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:34:31,619-Speed 5558.57 samples/sec   Loss 4.0944   LearningRate 0.0114   Epoch: 13   Global Step: 67030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:34:33,492-Speed 5469.74 samples/sec   Loss 4.1016   LearningRate 0.0114   Epoch: 13   Global Step: 67040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:34:35,341-Speed 5542.86 samples/sec   Loss 4.1466   LearningRate 0.0114   Epoch: 13   Global Step: 67050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:34:37,243-Speed 5386.28 samples/sec   Loss 4.0943   LearningRate 0.0114   Epoch: 13   Global Step: 67060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:34:39,084-Speed 5563.47 samples/sec   Loss 4.1521   LearningRate 0.0114   Epoch: 13   Global Step: 67070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:34:40,975-Speed 5417.99 samples/sec   Loss 4.1380   LearningRate 0.0113   Epoch: 13   Global Step: 67080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:34:42,819-Speed 5556.98 samples/sec   Loss 4.3175   LearningRate 0.0113   Epoch: 13   Global Step: 67090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:34:44,689-Speed 5478.00 samples/sec   Loss 4.2292   LearningRate 0.0113   Epoch: 13   Global Step: 67100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:34:46,548-Speed 5511.51 samples/sec   Loss 4.1569   LearningRate 0.0113   Epoch: 13   Global Step: 67110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:34:48,428-Speed 5451.14 samples/sec   Loss 4.0476   LearningRate 0.0113   Epoch: 13   Global Step: 67120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:50,268-Speed 5569.65 samples/sec   Loss 4.1525   LearningRate 0.0113   Epoch: 13   Global Step: 67130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:52,120-Speed 5533.03 samples/sec   Loss 4.0515   LearningRate 0.0113   Epoch: 13   Global Step: 67140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:53,982-Speed 5500.37 samples/sec   Loss 4.1247   LearningRate 0.0113   Epoch: 13   Global Step: 67150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:55,825-Speed 5561.06 samples/sec   Loss 4.1346   LearningRate 0.0113   Epoch: 13   Global Step: 67160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:57,707-Speed 5444.08 samples/sec   Loss 4.2185   LearningRate 0.0113   Epoch: 13   Global Step: 67170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:34:59,550-Speed 5562.16 samples/sec   Loss 4.0696   LearningRate 0.0113   Epoch: 13   Global Step: 67180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:01,424-Speed 5465.10 samples/sec   Loss 4.2493   LearningRate 0.0113   Epoch: 13   Global Step: 67190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:03,304-Speed 5449.41 samples/sec   Loss 4.1929   LearningRate 0.0113   Epoch: 13   Global Step: 67200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:05,144-Speed 5568.81 samples/sec   Loss 4.2452   LearningRate 0.0113   Epoch: 13   Global Step: 67210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:07,004-Speed 5508.42 samples/sec   Loss 4.1278   LearningRate 0.0113   Epoch: 13   Global Step: 67220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:08,859-Speed 5522.60 samples/sec   Loss 4.0745   LearningRate 0.0112   Epoch: 13   Global Step: 67230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:10,738-Speed 5455.95 samples/sec   Loss 4.1756   LearningRate 0.0112   Epoch: 13   Global Step: 67240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:12,611-Speed 5467.43 samples/sec   Loss 4.0360   LearningRate 0.0112   Epoch: 13   Global Step: 67250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:14,494-Speed 5443.24 samples/sec   Loss 4.1258   LearningRate 0.0112   Epoch: 13   Global Step: 67260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:16,350-Speed 5519.59 samples/sec   Loss 4.1791   LearningRate 0.0112   Epoch: 13   Global Step: 67270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:18,208-Speed 5516.62 samples/sec   Loss 4.0995   LearningRate 0.0112   Epoch: 13   Global Step: 67280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:20,053-Speed 5554.05 samples/sec   Loss 4.0307   LearningRate 0.0112   Epoch: 13   Global Step: 67290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:21,936-Speed 5441.53 samples/sec   Loss 4.1069   LearningRate 0.0112   Epoch: 13   Global Step: 67300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:23,784-Speed 5542.69 samples/sec   Loss 4.2484   LearningRate 0.0112   Epoch: 13   Global Step: 67310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:25,639-Speed 5526.14 samples/sec   Loss 4.1866   LearningRate 0.0112   Epoch: 13   Global Step: 67320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:27,498-Speed 5510.01 samples/sec   Loss 4.2994   LearningRate 0.0112   Epoch: 13   Global Step: 67330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:29,398-Speed 5392.68 samples/sec   Loss 4.2570   LearningRate 0.0112   Epoch: 13   Global Step: 67340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:31,287-Speed 5424.02 samples/sec   Loss 4.1705   LearningRate 0.0112   Epoch: 13   Global Step: 67350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:33,130-Speed 5560.03 samples/sec   Loss 4.0962   LearningRate 0.0112   Epoch: 13   Global Step: 67360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:34,997-Speed 5487.89 samples/sec   Loss 4.2767   LearningRate 0.0112   Epoch: 13   Global Step: 67370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:36,881-Speed 5438.89 samples/sec   Loss 4.1611   LearningRate 0.0112   Epoch: 13   Global Step: 67380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:38,748-Speed 5486.43 samples/sec   Loss 4.2119   LearningRate 0.0111   Epoch: 13   Global Step: 67390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:40,597-Speed 5545.67 samples/sec   Loss 4.2426   LearningRate 0.0111   Epoch: 13   Global Step: 67400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:42,480-Speed 5440.35 samples/sec   Loss 4.1740   LearningRate 0.0111   Epoch: 13   Global Step: 67410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:44,321-Speed 5567.33 samples/sec   Loss 4.2069   LearningRate 0.0111   Epoch: 13   Global Step: 67420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:46,166-Speed 5554.25 samples/sec   Loss 4.1791   LearningRate 0.0111   Epoch: 13   Global Step: 67430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:48,044-Speed 5453.45 samples/sec   Loss 4.1526   LearningRate 0.0111   Epoch: 13   Global Step: 67440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:49,925-Speed 5448.95 samples/sec   Loss 4.2264   LearningRate 0.0111   Epoch: 13   Global Step: 67450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:35:51,801-Speed 5461.09 samples/sec   Loss 4.1289   LearningRate 0.0111   Epoch: 13   Global Step: 67460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:53,676-Speed 5464.25 samples/sec   Loss 4.3987   LearningRate 0.0111   Epoch: 13   Global Step: 67470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:55,523-Speed 5546.68 samples/sec   Loss 4.2028   LearningRate 0.0111   Epoch: 13   Global Step: 67480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:57,366-Speed 5559.38 samples/sec   Loss 4.1519   LearningRate 0.0111   Epoch: 13   Global Step: 67490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:35:59,238-Speed 5473.13 samples/sec   Loss 4.3414   LearningRate 0.0111   Epoch: 13   Global Step: 67500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:36:01,075-Speed 5577.67 samples/sec   Loss 4.1051   LearningRate 0.0111   Epoch: 13   Global Step: 67510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:36:02,935-Speed 5507.31 samples/sec   Loss 4.2515   LearningRate 0.0111   Epoch: 13   Global Step: 67520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:04,797-Speed 5502.57 samples/sec   Loss 4.1574   LearningRate 0.0111   Epoch: 13   Global Step: 67530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:06,640-Speed 5558.61 samples/sec   Loss 4.1359   LearningRate 0.0110   Epoch: 13   Global Step: 67540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:08,496-Speed 5520.32 samples/sec   Loss 4.1381   LearningRate 0.0110   Epoch: 13   Global Step: 67550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:10,372-Speed 5464.42 samples/sec   Loss 4.3001   LearningRate 0.0110   Epoch: 13   Global Step: 67560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:12,246-Speed 5466.33 samples/sec   Loss 4.1859   LearningRate 0.0110   Epoch: 13   Global Step: 67570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:14,103-Speed 5516.61 samples/sec   Loss 4.1774   LearningRate 0.0110   Epoch: 13   Global Step: 67580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:15,967-Speed 5497.77 samples/sec   Loss 4.1687   LearningRate 0.0110   Epoch: 13   Global Step: 67590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:17,832-Speed 5492.61 samples/sec   Loss 4.2918   LearningRate 0.0110   Epoch: 13   Global Step: 67600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:19,674-Speed 5563.93 samples/sec   Loss 4.1992   LearningRate 0.0110   Epoch: 13   Global Step: 67610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:21,515-Speed 5563.71 samples/sec   Loss 4.1210   LearningRate 0.0110   Epoch: 13   Global Step: 67620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:36:23,391-Speed 5462.15 samples/sec   Loss 4.1382   LearningRate 0.0110   Epoch: 13   Global Step: 67630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:36:25,310-Speed 5339.04 samples/sec   Loss 4.3432   LearningRate 0.0110   Epoch: 13   Global Step: 67640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:36:27,160-Speed 5540.22 samples/sec   Loss 4.3349   LearningRate 0.0110   Epoch: 13   Global Step: 67650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:36:29,014-Speed 5523.90 samples/sec   Loss 4.2466   LearningRate 0.0110   Epoch: 13   Global Step: 67660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:30,852-Speed 5575.72 samples/sec   Loss 4.1079   LearningRate 0.0110   Epoch: 13   Global Step: 67670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:32,747-Speed 5407.19 samples/sec   Loss 4.2628   LearningRate 0.0110   Epoch: 13   Global Step: 67680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:34,614-Speed 5486.63 samples/sec   Loss 4.2787   LearningRate 0.0109   Epoch: 13   Global Step: 67690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:36,484-Speed 5481.54 samples/sec   Loss 4.3350   LearningRate 0.0109   Epoch: 13   Global Step: 67700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:38,327-Speed 5558.89 samples/sec   Loss 4.2180   LearningRate 0.0109   Epoch: 13   Global Step: 67710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:40,187-Speed 5509.57 samples/sec   Loss 4.2091   LearningRate 0.0109   Epoch: 13   Global Step: 67720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:42,030-Speed 5557.95 samples/sec   Loss 4.1618   LearningRate 0.0109   Epoch: 13   Global Step: 67730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:43,886-Speed 5519.82 samples/sec   Loss 4.3754   LearningRate 0.0109   Epoch: 13   Global Step: 67740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:45,767-Speed 5448.77 samples/sec   Loss 4.2696   LearningRate 0.0109   Epoch: 13   Global Step: 67750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:47,615-Speed 5545.86 samples/sec   Loss 4.0752   LearningRate 0.0109   Epoch: 13   Global Step: 67760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:36:49,485-Speed 5477.61 samples/sec   Loss 4.2247   LearningRate 0.0109   Epoch: 13   Global Step: 67770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:36:51,374-Speed 5423.82 samples/sec   Loss 4.1913   LearningRate 0.0109   Epoch: 13   Global Step: 67780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:36:53,256-Speed 5446.11 samples/sec   Loss 4.1642   LearningRate 0.0109   Epoch: 13   Global Step: 67790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:55,097-Speed 5566.21 samples/sec   Loss 4.0879   LearningRate 0.0109   Epoch: 13   Global Step: 67800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:56,956-Speed 5511.97 samples/sec   Loss 4.2527   LearningRate 0.0109   Epoch: 13   Global Step: 67810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:36:58,854-Speed 5398.90 samples/sec   Loss 4.1304   LearningRate 0.0109   Epoch: 13   Global Step: 67820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:00,724-Speed 5479.28 samples/sec   Loss 4.2438   LearningRate 0.0109   Epoch: 13   Global Step: 67830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:02,582-Speed 5512.19 samples/sec   Loss 4.0108   LearningRate 0.0108   Epoch: 13   Global Step: 67840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:04,447-Speed 5495.16 samples/sec   Loss 4.1848   LearningRate 0.0108   Epoch: 13   Global Step: 67850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:06,302-Speed 5526.87 samples/sec   Loss 4.1909   LearningRate 0.0108   Epoch: 13   Global Step: 67860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:08,180-Speed 5455.15 samples/sec   Loss 4.2387   LearningRate 0.0108   Epoch: 13   Global Step: 67870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:10,018-Speed 5575.49 samples/sec   Loss 4.1728   LearningRate 0.0108   Epoch: 13   Global Step: 67880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:11,871-Speed 5526.01 samples/sec   Loss 4.2741   LearningRate 0.0108   Epoch: 13   Global Step: 67890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:37:13,761-Speed 5422.62 samples/sec   Loss 4.1842   LearningRate 0.0108   Epoch: 13   Global Step: 67900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:37:15,653-Speed 5419.28 samples/sec   Loss 4.1553   LearningRate 0.0108   Epoch: 13   Global Step: 67910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:17,529-Speed 5461.27 samples/sec   Loss 4.3066   LearningRate 0.0108   Epoch: 13   Global Step: 67920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:19,419-Speed 5420.00 samples/sec   Loss 4.1912   LearningRate 0.0108   Epoch: 13   Global Step: 67930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:21,261-Speed 5564.35 samples/sec   Loss 4.4101   LearningRate 0.0108   Epoch: 13   Global Step: 67940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:23,144-Speed 5440.27 samples/sec   Loss 4.3714   LearningRate 0.0108   Epoch: 13   Global Step: 67950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:24,993-Speed 5542.02 samples/sec   Loss 4.1427   LearningRate 0.0108   Epoch: 13   Global Step: 67960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:26,881-Speed 5427.68 samples/sec   Loss 4.1337   LearningRate 0.0108   Epoch: 13   Global Step: 67970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:28,729-Speed 5544.91 samples/sec   Loss 4.3434   LearningRate 0.0108   Epoch: 13   Global Step: 67980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:30,595-Speed 5492.47 samples/sec   Loss 4.2041   LearningRate 0.0108   Epoch: 13   Global Step: 67990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:32,469-Speed 5467.34 samples/sec   Loss 4.2030   LearningRate 0.0107   Epoch: 13   Global Step: 68000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:37:59,885-[lfw][68000]XNorm: 21.546903
Training: 2022-04-11 14:37:59,886-[lfw][68000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-04-11 14:37:59,886-[lfw][68000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:38:31,289-[cfp_fp][68000]XNorm: 19.608410
Training: 2022-04-11 14:38:31,290-[cfp_fp][68000]Accuracy-Flip: 0.97886+-0.00820
Training: 2022-04-11 14:38:31,291-[cfp_fp][68000]Accuracy-Highest: 0.97886
Training: 2022-04-11 14:38:58,319-[agedb_30][68000]XNorm: 21.858776
Training: 2022-04-11 14:38:58,320-[agedb_30][68000]Accuracy-Flip: 0.97867+-0.00718
Training: 2022-04-11 14:38:58,321-[agedb_30][68000]Accuracy-Highest: 0.97917
Training: 2022-04-11 14:39:00,207-Speed 116.71 samples/sec   Loss 4.2049   LearningRate 0.0107   Epoch: 13   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:39:02,086-Speed 5450.62 samples/sec   Loss 4.2409   LearningRate 0.0107   Epoch: 13   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:39:03,981-Speed 5408.95 samples/sec   Loss 4.0797   LearningRate 0.0107   Epoch: 13   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:39:05,849-Speed 5486.36 samples/sec   Loss 4.2234   LearningRate 0.0107   Epoch: 13   Global Step: 68040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:39:07,737-Speed 5427.24 samples/sec   Loss 4.3585   LearningRate 0.0107   Epoch: 13   Global Step: 68050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:09,585-Speed 5546.49 samples/sec   Loss 4.4099   LearningRate 0.0107   Epoch: 13   Global Step: 68060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:11,449-Speed 5495.81 samples/sec   Loss 4.2501   LearningRate 0.0107   Epoch: 13   Global Step: 68070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:13,326-Speed 5461.50 samples/sec   Loss 4.2014   LearningRate 0.0107   Epoch: 13   Global Step: 68080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:15,180-Speed 5525.14 samples/sec   Loss 4.2604   LearningRate 0.0107   Epoch: 13   Global Step: 68090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:17,074-Speed 5411.84 samples/sec   Loss 4.1760   LearningRate 0.0107   Epoch: 13   Global Step: 68100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:18,904-Speed 5597.93 samples/sec   Loss 4.1880   LearningRate 0.0107   Epoch: 13   Global Step: 68110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:20,741-Speed 5574.75 samples/sec   Loss 4.2958   LearningRate 0.0107   Epoch: 13   Global Step: 68120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:22,621-Speed 5452.21 samples/sec   Loss 4.1629   LearningRate 0.0107   Epoch: 13   Global Step: 68130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:24,497-Speed 5458.94 samples/sec   Loss 4.2192   LearningRate 0.0107   Epoch: 13   Global Step: 68140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:26,326-Speed 5603.82 samples/sec   Loss 4.1989   LearningRate 0.0106   Epoch: 13   Global Step: 68150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:28,222-Speed 5404.70 samples/sec   Loss 4.1857   LearningRate 0.0106   Epoch: 13   Global Step: 68160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:30,075-Speed 5527.84 samples/sec   Loss 4.2126   LearningRate 0.0106   Epoch: 13   Global Step: 68170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:31,944-Speed 5482.56 samples/sec   Loss 4.3220   LearningRate 0.0106   Epoch: 13   Global Step: 68180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:33,804-Speed 5509.85 samples/sec   Loss 4.2084   LearningRate 0.0106   Epoch: 13   Global Step: 68190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:35,672-Speed 5486.15 samples/sec   Loss 4.2133   LearningRate 0.0106   Epoch: 13   Global Step: 68200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:37,516-Speed 5553.66 samples/sec   Loss 4.0206   LearningRate 0.0106   Epoch: 13   Global Step: 68210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:39,373-Speed 5518.96 samples/sec   Loss 4.1756   LearningRate 0.0106   Epoch: 13   Global Step: 68220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:41,258-Speed 5435.33 samples/sec   Loss 4.1807   LearningRate 0.0106   Epoch: 13   Global Step: 68230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:43,103-Speed 5554.02 samples/sec   Loss 4.2309   LearningRate 0.0106   Epoch: 13   Global Step: 68240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:44,962-Speed 5511.82 samples/sec   Loss 4.1488   LearningRate 0.0106   Epoch: 13   Global Step: 68250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:39:46,808-Speed 5548.64 samples/sec   Loss 4.1832   LearningRate 0.0106   Epoch: 13   Global Step: 68260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:39:48,693-Speed 5438.38 samples/sec   Loss 4.3255   LearningRate 0.0106   Epoch: 13   Global Step: 68270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:39:50,573-Speed 5448.72 samples/sec   Loss 4.0900   LearningRate 0.0106   Epoch: 13   Global Step: 68280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:39:52,456-Speed 5440.81 samples/sec   Loss 4.2798   LearningRate 0.0106   Epoch: 13   Global Step: 68290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:39:54,287-Speed 5596.26 samples/sec   Loss 4.2361   LearningRate 0.0106   Epoch: 13   Global Step: 68300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:56,157-Speed 5479.53 samples/sec   Loss 4.2361   LearningRate 0.0105   Epoch: 13   Global Step: 68310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:58,002-Speed 5555.22 samples/sec   Loss 4.3030   LearningRate 0.0105   Epoch: 13   Global Step: 68320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:39:59,836-Speed 5584.94 samples/sec   Loss 4.1454   LearningRate 0.0105   Epoch: 13   Global Step: 68330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:01,722-Speed 5431.19 samples/sec   Loss 4.2533   LearningRate 0.0105   Epoch: 13   Global Step: 68340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:03,584-Speed 5505.41 samples/sec   Loss 4.1852   LearningRate 0.0105   Epoch: 13   Global Step: 68350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:05,696-Speed 4850.19 samples/sec   Loss 4.2310   LearningRate 0.0105   Epoch: 13   Global Step: 68360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:07,541-Speed 5552.20 samples/sec   Loss 4.2268   LearningRate 0.0105   Epoch: 13   Global Step: 68370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:10,610-Speed 3337.57 samples/sec   Loss 4.2220   LearningRate 0.0105   Epoch: 13   Global Step: 68380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:12,477-Speed 5488.29 samples/sec   Loss 4.2140   LearningRate 0.0105   Epoch: 13   Global Step: 68390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:14,350-Speed 5472.68 samples/sec   Loss 4.2933   LearningRate 0.0105   Epoch: 13   Global Step: 68400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:40:16,239-Speed 5422.76 samples/sec   Loss 4.0653   LearningRate 0.0105   Epoch: 13   Global Step: 68410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:40:18,114-Speed 5465.97 samples/sec   Loss 4.1842   LearningRate 0.0105   Epoch: 13   Global Step: 68420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:40:19,965-Speed 5536.60 samples/sec   Loss 4.1757   LearningRate 0.0105   Epoch: 13   Global Step: 68430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:40:21,815-Speed 5537.59 samples/sec   Loss 4.2046   LearningRate 0.0105   Epoch: 13   Global Step: 68440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:23,718-Speed 5385.33 samples/sec   Loss 4.2201   LearningRate 0.0105   Epoch: 13   Global Step: 68450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:25,588-Speed 5480.29 samples/sec   Loss 4.3180   LearningRate 0.0104   Epoch: 13   Global Step: 68460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:27,468-Speed 5450.77 samples/sec   Loss 4.1838   LearningRate 0.0104   Epoch: 13   Global Step: 68470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:29,321-Speed 5529.01 samples/sec   Loss 4.0703   LearningRate 0.0104   Epoch: 13   Global Step: 68480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:31,180-Speed 5509.40 samples/sec   Loss 4.1207   LearningRate 0.0104   Epoch: 13   Global Step: 68490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:33,032-Speed 5533.15 samples/sec   Loss 4.1849   LearningRate 0.0104   Epoch: 13   Global Step: 68500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:34,878-Speed 5549.71 samples/sec   Loss 4.1536   LearningRate 0.0104   Epoch: 13   Global Step: 68510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:36,845-Speed 5210.19 samples/sec   Loss 4.2071   LearningRate 0.0104   Epoch: 13   Global Step: 68520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:38,759-Speed 5354.49 samples/sec   Loss 4.0894   LearningRate 0.0104   Epoch: 13   Global Step: 68530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:40,751-Speed 5142.98 samples/sec   Loss 4.0119   LearningRate 0.0104   Epoch: 13   Global Step: 68540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:40:42,612-Speed 5504.25 samples/sec   Loss 4.2303   LearningRate 0.0104   Epoch: 13   Global Step: 68550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:40:44,482-Speed 5481.52 samples/sec   Loss 4.1464   LearningRate 0.0104   Epoch: 13   Global Step: 68560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:40:46,341-Speed 5511.74 samples/sec   Loss 4.2595   LearningRate 0.0104   Epoch: 13   Global Step: 68570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:40:48,221-Speed 5450.88 samples/sec   Loss 4.1859   LearningRate 0.0104   Epoch: 13   Global Step: 68580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:40:50,076-Speed 5523.28 samples/sec   Loss 4.2394   LearningRate 0.0104   Epoch: 13   Global Step: 68590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:40:51,977-Speed 5390.01 samples/sec   Loss 4.1553   LearningRate 0.0104   Epoch: 13   Global Step: 68600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:40:53,832-Speed 5524.42 samples/sec   Loss 4.2568   LearningRate 0.0104   Epoch: 13   Global Step: 68610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:40:55,676-Speed 5553.96 samples/sec   Loss 4.2299   LearningRate 0.0103   Epoch: 13   Global Step: 68620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:40:57,582-Speed 5375.79 samples/sec   Loss 4.2097   LearningRate 0.0103   Epoch: 13   Global Step: 68630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:40:59,451-Speed 5481.92 samples/sec   Loss 4.2560   LearningRate 0.0103   Epoch: 13   Global Step: 68640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:01,347-Speed 5402.99 samples/sec   Loss 4.2672   LearningRate 0.0103   Epoch: 13   Global Step: 68650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:03,199-Speed 5530.72 samples/sec   Loss 4.1793   LearningRate 0.0103   Epoch: 13   Global Step: 68660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:05,073-Speed 5465.77 samples/sec   Loss 4.4287   LearningRate 0.0103   Epoch: 13   Global Step: 68670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:06,938-Speed 5492.68 samples/sec   Loss 4.3255   LearningRate 0.0103   Epoch: 13   Global Step: 68680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:08,793-Speed 5525.87 samples/sec   Loss 4.2230   LearningRate 0.0103   Epoch: 13   Global Step: 68690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:10,666-Speed 5467.58 samples/sec   Loss 4.2866   LearningRate 0.0103   Epoch: 13   Global Step: 68700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:12,541-Speed 5462.00 samples/sec   Loss 4.2665   LearningRate 0.0103   Epoch: 13   Global Step: 68710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:14,418-Speed 5458.30 samples/sec   Loss 4.3526   LearningRate 0.0103   Epoch: 13   Global Step: 68720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:16,282-Speed 5495.97 samples/sec   Loss 4.1213   LearningRate 0.0103   Epoch: 13   Global Step: 68730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:18,129-Speed 5545.14 samples/sec   Loss 4.1793   LearningRate 0.0103   Epoch: 13   Global Step: 68740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:19,989-Speed 5509.21 samples/sec   Loss 4.1601   LearningRate 0.0103   Epoch: 13   Global Step: 68750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:21,831-Speed 5560.15 samples/sec   Loss 4.3055   LearningRate 0.0103   Epoch: 13   Global Step: 68760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:41:23,688-Speed 5516.99 samples/sec   Loss 4.2166   LearningRate 0.0103   Epoch: 13   Global Step: 68770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:41:25,543-Speed 5524.00 samples/sec   Loss 4.2072   LearningRate 0.0102   Epoch: 13   Global Step: 68780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:27,392-Speed 5539.95 samples/sec   Loss 4.1311   LearningRate 0.0102   Epoch: 13   Global Step: 68790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:29,242-Speed 5535.52 samples/sec   Loss 4.1757   LearningRate 0.0102   Epoch: 13   Global Step: 68800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:31,104-Speed 5503.04 samples/sec   Loss 4.3196   LearningRate 0.0102   Epoch: 13   Global Step: 68810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:32,959-Speed 5520.10 samples/sec   Loss 4.2137   LearningRate 0.0102   Epoch: 13   Global Step: 68820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:34,798-Speed 5570.11 samples/sec   Loss 4.2094   LearningRate 0.0102   Epoch: 13   Global Step: 68830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:36,663-Speed 5492.99 samples/sec   Loss 4.1799   LearningRate 0.0102   Epoch: 13   Global Step: 68840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:38,510-Speed 5547.54 samples/sec   Loss 4.2154   LearningRate 0.0102   Epoch: 13   Global Step: 68850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:40,392-Speed 5442.26 samples/sec   Loss 4.1803   LearningRate 0.0102   Epoch: 13   Global Step: 68860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:42,241-Speed 5539.10 samples/sec   Loss 4.2758   LearningRate 0.0102   Epoch: 13   Global Step: 68870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:44,085-Speed 5558.40 samples/sec   Loss 4.1931   LearningRate 0.0102   Epoch: 13   Global Step: 68880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:45,933-Speed 5542.08 samples/sec   Loss 4.1708   LearningRate 0.0102   Epoch: 13   Global Step: 68890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:47,811-Speed 5455.32 samples/sec   Loss 4.2109   LearningRate 0.0102   Epoch: 13   Global Step: 68900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:41:49,759-Speed 5258.28 samples/sec   Loss 4.2044   LearningRate 0.0102   Epoch: 13   Global Step: 68910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:51,614-Speed 5521.01 samples/sec   Loss 4.2170   LearningRate 0.0102   Epoch: 13   Global Step: 68920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:53,463-Speed 5543.59 samples/sec   Loss 4.2097   LearningRate 0.0102   Epoch: 13   Global Step: 68930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:55,311-Speed 5543.31 samples/sec   Loss 4.2076   LearningRate 0.0101   Epoch: 13   Global Step: 68940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:57,158-Speed 5546.40 samples/sec   Loss 4.2144   LearningRate 0.0101   Epoch: 13   Global Step: 68950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:41:59,009-Speed 5533.17 samples/sec   Loss 4.3838   LearningRate 0.0101   Epoch: 13   Global Step: 68960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:00,884-Speed 5463.06 samples/sec   Loss 4.1461   LearningRate 0.0101   Epoch: 13   Global Step: 68970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:02,752-Speed 5483.62 samples/sec   Loss 4.2820   LearningRate 0.0101   Epoch: 13   Global Step: 68980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:04,643-Speed 5420.24 samples/sec   Loss 4.3020   LearningRate 0.0101   Epoch: 13   Global Step: 68990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:06,493-Speed 5537.20 samples/sec   Loss 4.2194   LearningRate 0.0101   Epoch: 13   Global Step: 69000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:08,338-Speed 5551.57 samples/sec   Loss 4.2413   LearningRate 0.0101   Epoch: 13   Global Step: 69010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:10,197-Speed 5511.33 samples/sec   Loss 4.2648   LearningRate 0.0101   Epoch: 13   Global Step: 69020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:12,055-Speed 5512.81 samples/sec   Loss 4.2923   LearningRate 0.0101   Epoch: 13   Global Step: 69030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:13,990-Speed 5293.42 samples/sec   Loss 4.3378   LearningRate 0.0101   Epoch: 13   Global Step: 69040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:15,847-Speed 5516.71 samples/sec   Loss 4.1995   LearningRate 0.0101   Epoch: 13   Global Step: 69050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:17,709-Speed 5498.58 samples/sec   Loss 4.2897   LearningRate 0.0101   Epoch: 13   Global Step: 69060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:19,574-Speed 5493.43 samples/sec   Loss 4.0791   LearningRate 0.0101   Epoch: 13   Global Step: 69070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:21,420-Speed 5551.56 samples/sec   Loss 4.2794   LearningRate 0.0101   Epoch: 13   Global Step: 69080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:23,291-Speed 5472.89 samples/sec   Loss 4.1036   LearningRate 0.0101   Epoch: 13   Global Step: 69090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:25,140-Speed 5543.10 samples/sec   Loss 4.0958   LearningRate 0.0100   Epoch: 13   Global Step: 69100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:27,020-Speed 5448.13 samples/sec   Loss 4.3307   LearningRate 0.0100   Epoch: 13   Global Step: 69110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:28,866-Speed 5550.55 samples/sec   Loss 4.2130   LearningRate 0.0100   Epoch: 13   Global Step: 69120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:30,717-Speed 5534.08 samples/sec   Loss 4.2197   LearningRate 0.0100   Epoch: 13   Global Step: 69130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:32,555-Speed 5571.72 samples/sec   Loss 4.2518   LearningRate 0.0100   Epoch: 13   Global Step: 69140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:34,404-Speed 5540.56 samples/sec   Loss 4.2569   LearningRate 0.0100   Epoch: 13   Global Step: 69150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:36,261-Speed 5516.00 samples/sec   Loss 4.1214   LearningRate 0.0100   Epoch: 13   Global Step: 69160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:38,130-Speed 5481.61 samples/sec   Loss 4.1384   LearningRate 0.0100   Epoch: 13   Global Step: 69170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:39,987-Speed 5516.10 samples/sec   Loss 4.3087   LearningRate 0.0100   Epoch: 13   Global Step: 69180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:41,847-Speed 5505.85 samples/sec   Loss 4.2280   LearningRate 0.0100   Epoch: 13   Global Step: 69190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:43,710-Speed 5500.30 samples/sec   Loss 4.2971   LearningRate 0.0100   Epoch: 13   Global Step: 69200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:45,560-Speed 5536.25 samples/sec   Loss 4.1835   LearningRate 0.0100   Epoch: 13   Global Step: 69210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:47,415-Speed 5522.65 samples/sec   Loss 4.0767   LearningRate 0.0100   Epoch: 13   Global Step: 69220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:49,278-Speed 5499.58 samples/sec   Loss 4.3397   LearningRate 0.0100   Epoch: 13   Global Step: 69230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:42:51,124-Speed 5549.89 samples/sec   Loss 4.2451   LearningRate 0.0100   Epoch: 13   Global Step: 69240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:52,980-Speed 5518.93 samples/sec   Loss 4.2160   LearningRate 0.0100   Epoch: 13   Global Step: 69250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:54,845-Speed 5494.00 samples/sec   Loss 4.2404   LearningRate 0.0099   Epoch: 13   Global Step: 69260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:56,695-Speed 5534.92 samples/sec   Loss 4.2083   LearningRate 0.0099   Epoch: 13   Global Step: 69270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:42:58,563-Speed 5483.10 samples/sec   Loss 4.2626   LearningRate 0.0099   Epoch: 13   Global Step: 69280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:43:00,402-Speed 5571.12 samples/sec   Loss 4.2868   LearningRate 0.0099   Epoch: 13   Global Step: 69290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:43:02,284-Speed 5443.66 samples/sec   Loss 4.1740   LearningRate 0.0099   Epoch: 13   Global Step: 69300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:43:04,129-Speed 5552.19 samples/sec   Loss 4.1712   LearningRate 0.0099   Epoch: 13   Global Step: 69310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:43:05,991-Speed 5500.69 samples/sec   Loss 4.2866   LearningRate 0.0099   Epoch: 13   Global Step: 69320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:43:07,845-Speed 5527.15 samples/sec   Loss 4.1996   LearningRate 0.0099   Epoch: 13   Global Step: 69330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:43:09,706-Speed 5505.49 samples/sec   Loss 4.2680   LearningRate 0.0099   Epoch: 13   Global Step: 69340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:11,564-Speed 5512.72 samples/sec   Loss 4.2305   LearningRate 0.0099   Epoch: 13   Global Step: 69350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:13,449-Speed 5434.36 samples/sec   Loss 4.1895   LearningRate 0.0099   Epoch: 13   Global Step: 69360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:15,317-Speed 5481.66 samples/sec   Loss 4.2936   LearningRate 0.0099   Epoch: 13   Global Step: 69370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:17,171-Speed 5525.81 samples/sec   Loss 4.2570   LearningRate 0.0099   Epoch: 13   Global Step: 69380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:19,014-Speed 5559.37 samples/sec   Loss 4.1670   LearningRate 0.0099   Epoch: 13   Global Step: 69390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:20,855-Speed 5565.10 samples/sec   Loss 4.2988   LearningRate 0.0099   Epoch: 13   Global Step: 69400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:22,706-Speed 5532.84 samples/sec   Loss 4.2869   LearningRate 0.0099   Epoch: 13   Global Step: 69410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:24,580-Speed 5467.45 samples/sec   Loss 4.2454   LearningRate 0.0098   Epoch: 13   Global Step: 69420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:26,447-Speed 5486.32 samples/sec   Loss 4.1632   LearningRate 0.0098   Epoch: 13   Global Step: 69430   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-11 14:43:28,293-Speed 5548.85 samples/sec   Loss 4.1865   LearningRate 0.0098   Epoch: 13   Global Step: 69440   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-11 14:43:30,144-Speed 5534.07 samples/sec   Loss 4.1445   LearningRate 0.0098   Epoch: 13   Global Step: 69450   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-11 14:43:32,003-Speed 5511.42 samples/sec   Loss 4.3475   LearningRate 0.0098   Epoch: 13   Global Step: 69460   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-11 14:43:33,862-Speed 5510.91 samples/sec   Loss 4.0860   LearningRate 0.0098   Epoch: 13   Global Step: 69470   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-11 14:43:35,713-Speed 5535.23 samples/sec   Loss 4.2126   LearningRate 0.0098   Epoch: 13   Global Step: 69480   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-11 14:43:37,558-Speed 5550.04 samples/sec   Loss 4.2776   LearningRate 0.0098   Epoch: 13   Global Step: 69490   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-11 14:43:39,407-Speed 5541.20 samples/sec   Loss 4.1226   LearningRate 0.0098   Epoch: 13   Global Step: 69500   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-11 14:43:41,260-Speed 5527.15 samples/sec   Loss 4.1192   LearningRate 0.0098   Epoch: 13   Global Step: 69510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-11 14:43:43,110-Speed 5538.78 samples/sec   Loss 4.2289   LearningRate 0.0098   Epoch: 13   Global Step: 69520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-11 14:43:44,967-Speed 5515.45 samples/sec   Loss 4.0615   LearningRate 0.0098   Epoch: 13   Global Step: 69530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:46,828-Speed 5505.00 samples/sec   Loss 4.2024   LearningRate 0.0098   Epoch: 13   Global Step: 69540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:48,688-Speed 5510.37 samples/sec   Loss 4.0860   LearningRate 0.0098   Epoch: 13   Global Step: 69550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:50,545-Speed 5515.45 samples/sec   Loss 4.1162   LearningRate 0.0098   Epoch: 13   Global Step: 69560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:52,401-Speed 5517.32 samples/sec   Loss 4.2755   LearningRate 0.0098   Epoch: 13   Global Step: 69570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:54,266-Speed 5493.92 samples/sec   Loss 4.2076   LearningRate 0.0097   Epoch: 13   Global Step: 69580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:56,120-Speed 5525.10 samples/sec   Loss 4.2908   LearningRate 0.0097   Epoch: 13   Global Step: 69590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:57,981-Speed 5504.47 samples/sec   Loss 4.1072   LearningRate 0.0097   Epoch: 13   Global Step: 69600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:43:59,846-Speed 5491.57 samples/sec   Loss 4.3328   LearningRate 0.0097   Epoch: 13   Global Step: 69610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:01,718-Speed 5472.49 samples/sec   Loss 4.1736   LearningRate 0.0097   Epoch: 13   Global Step: 69620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:03,584-Speed 5490.22 samples/sec   Loss 4.1945   LearningRate 0.0097   Epoch: 13   Global Step: 69630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:05,453-Speed 5483.07 samples/sec   Loss 4.1415   LearningRate 0.0097   Epoch: 13   Global Step: 69640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:07,300-Speed 5544.28 samples/sec   Loss 4.1176   LearningRate 0.0097   Epoch: 13   Global Step: 69650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:09,143-Speed 5559.60 samples/sec   Loss 4.2343   LearningRate 0.0097   Epoch: 13   Global Step: 69660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:10,991-Speed 5543.09 samples/sec   Loss 4.3173   LearningRate 0.0097   Epoch: 13   Global Step: 69670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:12,878-Speed 5427.09 samples/sec   Loss 4.0894   LearningRate 0.0097   Epoch: 13   Global Step: 69680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:14,747-Speed 5481.34 samples/sec   Loss 4.1479   LearningRate 0.0097   Epoch: 13   Global Step: 69690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:16,609-Speed 5503.14 samples/sec   Loss 4.0978   LearningRate 0.0097   Epoch: 13   Global Step: 69700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:18,475-Speed 5488.81 samples/sec   Loss 4.1115   LearningRate 0.0097   Epoch: 13   Global Step: 69710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:20,319-Speed 5555.83 samples/sec   Loss 4.2069   LearningRate 0.0097   Epoch: 13   Global Step: 69720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:22,165-Speed 5548.39 samples/sec   Loss 3.9989   LearningRate 0.0097   Epoch: 13   Global Step: 69730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:44:24,017-Speed 5530.43 samples/sec   Loss 4.2375   LearningRate 0.0096   Epoch: 13   Global Step: 69740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:25,891-Speed 5466.80 samples/sec   Loss 4.1716   LearningRate 0.0096   Epoch: 13   Global Step: 69750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:27,822-Speed 5306.63 samples/sec   Loss 4.3268   LearningRate 0.0096   Epoch: 13   Global Step: 69760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:29,666-Speed 5554.85 samples/sec   Loss 4.2204   LearningRate 0.0096   Epoch: 13   Global Step: 69770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:31,531-Speed 5491.75 samples/sec   Loss 4.1152   LearningRate 0.0096   Epoch: 13   Global Step: 69780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:33,380-Speed 5540.38 samples/sec   Loss 4.0092   LearningRate 0.0096   Epoch: 13   Global Step: 69790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:35,236-Speed 5521.69 samples/sec   Loss 4.1637   LearningRate 0.0096   Epoch: 13   Global Step: 69800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:37,115-Speed 5450.84 samples/sec   Loss 4.2529   LearningRate 0.0096   Epoch: 13   Global Step: 69810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:38,977-Speed 5502.00 samples/sec   Loss 4.0252   LearningRate 0.0096   Epoch: 13   Global Step: 69820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:40,866-Speed 5422.92 samples/sec   Loss 4.2303   LearningRate 0.0096   Epoch: 13   Global Step: 69830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:44:42,747-Speed 5443.75 samples/sec   Loss 4.0350   LearningRate 0.0096   Epoch: 13   Global Step: 69840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:44,604-Speed 5515.86 samples/sec   Loss 4.3097   LearningRate 0.0096   Epoch: 13   Global Step: 69850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:46,463-Speed 5510.78 samples/sec   Loss 4.1898   LearningRate 0.0096   Epoch: 13   Global Step: 69860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:48,309-Speed 5550.49 samples/sec   Loss 4.2345   LearningRate 0.0096   Epoch: 13   Global Step: 69870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:50,152-Speed 5558.84 samples/sec   Loss 4.2592   LearningRate 0.0096   Epoch: 13   Global Step: 69880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:52,011-Speed 5510.23 samples/sec   Loss 4.1824   LearningRate 0.0096   Epoch: 13   Global Step: 69890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:53,862-Speed 5535.28 samples/sec   Loss 4.0838   LearningRate 0.0095   Epoch: 13   Global Step: 69900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:55,712-Speed 5537.96 samples/sec   Loss 4.0118   LearningRate 0.0095   Epoch: 13   Global Step: 69910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:57,549-Speed 5575.32 samples/sec   Loss 4.0893   LearningRate 0.0095   Epoch: 13   Global Step: 69920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:44:59,396-Speed 5545.69 samples/sec   Loss 4.1242   LearningRate 0.0095   Epoch: 13   Global Step: 69930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:45:01,239-Speed 5558.11 samples/sec   Loss 4.0218   LearningRate 0.0095   Epoch: 13   Global Step: 69940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:45:03,102-Speed 5497.92 samples/sec   Loss 4.1816   LearningRate 0.0095   Epoch: 13   Global Step: 69950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:45:04,970-Speed 5486.65 samples/sec   Loss 4.1826   LearningRate 0.0095   Epoch: 13   Global Step: 69960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:45:06,848-Speed 5453.53 samples/sec   Loss 4.1822   LearningRate 0.0095   Epoch: 13   Global Step: 69970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:45:08,708-Speed 5506.80 samples/sec   Loss 4.1381   LearningRate 0.0095   Epoch: 13   Global Step: 69980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:45:10,559-Speed 5536.95 samples/sec   Loss 4.1897   LearningRate 0.0095   Epoch: 13   Global Step: 69990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:45:12,426-Speed 5485.21 samples/sec   Loss 4.1523   LearningRate 0.0095   Epoch: 13   Global Step: 70000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:45:39,467-[lfw][70000]XNorm: 23.398558
Training: 2022-04-11 14:45:39,468-[lfw][70000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-11 14:45:39,469-[lfw][70000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:46:12,258-[cfp_fp][70000]XNorm: 21.524082
Training: 2022-04-11 14:46:12,260-[cfp_fp][70000]Accuracy-Flip: 0.98057+-0.00636
Training: 2022-04-11 14:46:12,260-[cfp_fp][70000]Accuracy-Highest: 0.98057
Training: 2022-04-11 14:46:40,150-[agedb_30][70000]XNorm: 23.294387
Training: 2022-04-11 14:46:40,151-[agedb_30][70000]Accuracy-Flip: 0.98050+-0.00658
Training: 2022-04-11 14:46:40,151-[agedb_30][70000]Accuracy-Highest: 0.98050
Training: 2022-04-11 14:46:42,035-Speed 114.28 samples/sec   Loss 4.1100   LearningRate 0.0095   Epoch: 13   Global Step: 70010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:46:43,919-Speed 5438.38 samples/sec   Loss 4.1549   LearningRate 0.0095   Epoch: 13   Global Step: 70020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:46:45,761-Speed 5561.16 samples/sec   Loss 4.1915   LearningRate 0.0095   Epoch: 13   Global Step: 70030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:46:47,605-Speed 5555.43 samples/sec   Loss 4.0625   LearningRate 0.0095   Epoch: 13   Global Step: 70040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:46:49,491-Speed 5433.96 samples/sec   Loss 4.0821   LearningRate 0.0095   Epoch: 13   Global Step: 70050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:46:51,349-Speed 5515.47 samples/sec   Loss 4.1457   LearningRate 0.0095   Epoch: 13   Global Step: 70060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:46:53,269-Speed 5335.94 samples/sec   Loss 4.2263   LearningRate 0.0094   Epoch: 13   Global Step: 70070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:46:55,119-Speed 5540.88 samples/sec   Loss 4.0287   LearningRate 0.0094   Epoch: 13   Global Step: 70080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:46:57,007-Speed 5425.62 samples/sec   Loss 4.2500   LearningRate 0.0094   Epoch: 13   Global Step: 70090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:46:58,851-Speed 5556.54 samples/sec   Loss 4.2819   LearningRate 0.0094   Epoch: 13   Global Step: 70100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:47:00,760-Speed 5369.80 samples/sec   Loss 4.1949   LearningRate 0.0094   Epoch: 13   Global Step: 70110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:47:02,654-Speed 5406.48 samples/sec   Loss 4.0872   LearningRate 0.0094   Epoch: 13   Global Step: 70120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:47:04,570-Speed 5348.94 samples/sec   Loss 4.0930   LearningRate 0.0094   Epoch: 13   Global Step: 70130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:47:06,457-Speed 5428.44 samples/sec   Loss 4.0324   LearningRate 0.0094   Epoch: 13   Global Step: 70140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:08,357-Speed 5392.24 samples/sec   Loss 4.1960   LearningRate 0.0094   Epoch: 13   Global Step: 70150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:10,231-Speed 5471.39 samples/sec   Loss 4.1251   LearningRate 0.0094   Epoch: 13   Global Step: 70160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:12,157-Speed 5317.04 samples/sec   Loss 4.2765   LearningRate 0.0094   Epoch: 13   Global Step: 70170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:14,011-Speed 5528.12 samples/sec   Loss 4.1023   LearningRate 0.0094   Epoch: 13   Global Step: 70180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:15,916-Speed 5377.26 samples/sec   Loss 4.1908   LearningRate 0.0094   Epoch: 13   Global Step: 70190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:17,760-Speed 5556.96 samples/sec   Loss 4.1640   LearningRate 0.0094   Epoch: 13   Global Step: 70200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:19,606-Speed 5546.78 samples/sec   Loss 4.1074   LearningRate 0.0094   Epoch: 13   Global Step: 70210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:21,482-Speed 5462.70 samples/sec   Loss 4.2014   LearningRate 0.0094   Epoch: 13   Global Step: 70220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:23,332-Speed 5538.22 samples/sec   Loss 4.1088   LearningRate 0.0093   Epoch: 13   Global Step: 70230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:25,175-Speed 5559.70 samples/sec   Loss 4.2140   LearningRate 0.0093   Epoch: 13   Global Step: 70240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:47:27,017-Speed 5559.25 samples/sec   Loss 4.1887   LearningRate 0.0093   Epoch: 13   Global Step: 70250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:47:28,863-Speed 5549.49 samples/sec   Loss 4.1912   LearningRate 0.0093   Epoch: 13   Global Step: 70260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:30,728-Speed 5492.71 samples/sec   Loss 4.2086   LearningRate 0.0093   Epoch: 13   Global Step: 70270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:32,566-Speed 5574.37 samples/sec   Loss 4.1652   LearningRate 0.0093   Epoch: 13   Global Step: 70280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:34,407-Speed 5562.72 samples/sec   Loss 4.0135   LearningRate 0.0093   Epoch: 13   Global Step: 70290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:36,255-Speed 5544.79 samples/sec   Loss 3.9820   LearningRate 0.0093   Epoch: 13   Global Step: 70300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:38,099-Speed 5555.86 samples/sec   Loss 4.1510   LearningRate 0.0093   Epoch: 13   Global Step: 70310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:39,945-Speed 5548.42 samples/sec   Loss 4.1205   LearningRate 0.0093   Epoch: 13   Global Step: 70320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:41,798-Speed 5529.37 samples/sec   Loss 4.0627   LearningRate 0.0093   Epoch: 13   Global Step: 70330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:43,641-Speed 5556.10 samples/sec   Loss 4.0113   LearningRate 0.0093   Epoch: 13   Global Step: 70340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:45,491-Speed 5537.39 samples/sec   Loss 4.2372   LearningRate 0.0093   Epoch: 13   Global Step: 70350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:47,348-Speed 5519.73 samples/sec   Loss 4.2120   LearningRate 0.0093   Epoch: 13   Global Step: 70360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:49,198-Speed 5536.06 samples/sec   Loss 4.0251   LearningRate 0.0093   Epoch: 13   Global Step: 70370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:51,054-Speed 5519.94 samples/sec   Loss 4.1609   LearningRate 0.0093   Epoch: 13   Global Step: 70380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:52,960-Speed 5372.94 samples/sec   Loss 4.1703   LearningRate 0.0093   Epoch: 13   Global Step: 70390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:54,808-Speed 5543.95 samples/sec   Loss 4.1351   LearningRate 0.0092   Epoch: 13   Global Step: 70400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:56,649-Speed 5563.95 samples/sec   Loss 4.1736   LearningRate 0.0092   Epoch: 13   Global Step: 70410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:47:58,495-Speed 5548.34 samples/sec   Loss 4.2027   LearningRate 0.0092   Epoch: 13   Global Step: 70420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:00,353-Speed 5515.63 samples/sec   Loss 4.2172   LearningRate 0.0092   Epoch: 13   Global Step: 70430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:02,212-Speed 5508.05 samples/sec   Loss 3.9642   LearningRate 0.0092   Epoch: 13   Global Step: 70440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:04,074-Speed 5501.46 samples/sec   Loss 4.0505   LearningRate 0.0092   Epoch: 13   Global Step: 70450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:05,940-Speed 5490.78 samples/sec   Loss 4.1250   LearningRate 0.0092   Epoch: 13   Global Step: 70460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:48:07,775-Speed 5582.42 samples/sec   Loss 4.2323   LearningRate 0.0092   Epoch: 13   Global Step: 70470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:09,625-Speed 5538.43 samples/sec   Loss 3.9639   LearningRate 0.0092   Epoch: 13   Global Step: 70480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:11,590-Speed 5211.60 samples/sec   Loss 3.9378   LearningRate 0.0092   Epoch: 13   Global Step: 70490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:13,547-Speed 5235.04 samples/sec   Loss 4.0912   LearningRate 0.0092   Epoch: 13   Global Step: 70500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:15,439-Speed 5414.74 samples/sec   Loss 4.1550   LearningRate 0.0092   Epoch: 13   Global Step: 70510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:17,286-Speed 5545.38 samples/sec   Loss 4.1659   LearningRate 0.0092   Epoch: 13   Global Step: 70520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:19,132-Speed 5550.26 samples/sec   Loss 3.9466   LearningRate 0.0092   Epoch: 13   Global Step: 70530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:20,976-Speed 5554.25 samples/sec   Loss 4.1833   LearningRate 0.0092   Epoch: 13   Global Step: 70540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:22,821-Speed 5551.40 samples/sec   Loss 4.1218   LearningRate 0.0092   Epoch: 13   Global Step: 70550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:24,670-Speed 5542.44 samples/sec   Loss 4.1293   LearningRate 0.0092   Epoch: 13   Global Step: 70560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:26,557-Speed 5428.92 samples/sec   Loss 4.2258   LearningRate 0.0091   Epoch: 13   Global Step: 70570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:28,415-Speed 5512.55 samples/sec   Loss 4.0875   LearningRate 0.0091   Epoch: 13   Global Step: 70580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:30,285-Speed 5480.75 samples/sec   Loss 4.0535   LearningRate 0.0091   Epoch: 13   Global Step: 70590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:32,128-Speed 5556.25 samples/sec   Loss 4.1828   LearningRate 0.0091   Epoch: 13   Global Step: 70600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:33,975-Speed 5547.50 samples/sec   Loss 4.2042   LearningRate 0.0091   Epoch: 13   Global Step: 70610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:35,826-Speed 5532.64 samples/sec   Loss 4.1632   LearningRate 0.0091   Epoch: 13   Global Step: 70620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:37,673-Speed 5547.40 samples/sec   Loss 3.9997   LearningRate 0.0091   Epoch: 13   Global Step: 70630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:39,519-Speed 5548.82 samples/sec   Loss 4.1292   LearningRate 0.0091   Epoch: 13   Global Step: 70640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:41,366-Speed 5547.39 samples/sec   Loss 4.0502   LearningRate 0.0091   Epoch: 13   Global Step: 70650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:43,208-Speed 5558.88 samples/sec   Loss 4.0862   LearningRate 0.0091   Epoch: 13   Global Step: 70660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:45,047-Speed 5569.89 samples/sec   Loss 4.2399   LearningRate 0.0091   Epoch: 13   Global Step: 70670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:48:46,891-Speed 5557.70 samples/sec   Loss 4.0451   LearningRate 0.0091   Epoch: 13   Global Step: 70680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:48:48,750-Speed 5508.45 samples/sec   Loss 4.1488   LearningRate 0.0091   Epoch: 13   Global Step: 70690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:48:50,602-Speed 5533.36 samples/sec   Loss 3.9720   LearningRate 0.0091   Epoch: 13   Global Step: 70700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:48:52,452-Speed 5537.64 samples/sec   Loss 4.1419   LearningRate 0.0091   Epoch: 13   Global Step: 70710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:48:54,291-Speed 5570.51 samples/sec   Loss 4.2247   LearningRate 0.0091   Epoch: 13   Global Step: 70720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:56,135-Speed 5554.23 samples/sec   Loss 4.2335   LearningRate 0.0090   Epoch: 13   Global Step: 70730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:57,981-Speed 5549.60 samples/sec   Loss 4.1420   LearningRate 0.0090   Epoch: 13   Global Step: 70740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:48:59,836-Speed 5520.31 samples/sec   Loss 4.0596   LearningRate 0.0090   Epoch: 13   Global Step: 70750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:01,698-Speed 5502.46 samples/sec   Loss 4.1340   LearningRate 0.0090   Epoch: 13   Global Step: 70760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:03,580-Speed 5444.73 samples/sec   Loss 4.2798   LearningRate 0.0090   Epoch: 13   Global Step: 70770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:05,445-Speed 5490.50 samples/sec   Loss 4.0662   LearningRate 0.0090   Epoch: 13   Global Step: 70780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:07,293-Speed 5544.24 samples/sec   Loss 4.1381   LearningRate 0.0090   Epoch: 13   Global Step: 70790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:09,138-Speed 5552.73 samples/sec   Loss 4.1456   LearningRate 0.0090   Epoch: 13   Global Step: 70800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:11,085-Speed 5262.20 samples/sec   Loss 4.1010   LearningRate 0.0090   Epoch: 13   Global Step: 70810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:22,063-Speed 932.84 samples/sec   Loss 3.4596   LearningRate 0.0090   Epoch: 14   Global Step: 70820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:49:23,940-Speed 5460.30 samples/sec   Loss 3.2745   LearningRate 0.0090   Epoch: 14   Global Step: 70830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:49:25,829-Speed 5420.44 samples/sec   Loss 3.1980   LearningRate 0.0090   Epoch: 14   Global Step: 70840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:49:27,704-Speed 5464.05 samples/sec   Loss 3.2938   LearningRate 0.0090   Epoch: 14   Global Step: 70850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:49:29,569-Speed 5492.93 samples/sec   Loss 3.2384   LearningRate 0.0090   Epoch: 14   Global Step: 70860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:49:31,412-Speed 5557.56 samples/sec   Loss 3.3431   LearningRate 0.0090   Epoch: 14   Global Step: 70870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:49:33,270-Speed 5512.22 samples/sec   Loss 3.2982   LearningRate 0.0090   Epoch: 14   Global Step: 70880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:49:35,146-Speed 5460.87 samples/sec   Loss 3.3008   LearningRate 0.0090   Epoch: 14   Global Step: 70890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:49:37,008-Speed 5500.12 samples/sec   Loss 3.2277   LearningRate 0.0089   Epoch: 14   Global Step: 70900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:49:38,850-Speed 5562.67 samples/sec   Loss 3.2129   LearningRate 0.0089   Epoch: 14   Global Step: 70910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:49:40,711-Speed 5502.99 samples/sec   Loss 3.4601   LearningRate 0.0089   Epoch: 14   Global Step: 70920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:42,581-Speed 5478.88 samples/sec   Loss 3.3423   LearningRate 0.0089   Epoch: 14   Global Step: 70930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:44,440-Speed 5510.83 samples/sec   Loss 3.3486   LearningRate 0.0089   Epoch: 14   Global Step: 70940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:46,290-Speed 5537.51 samples/sec   Loss 3.3942   LearningRate 0.0089   Epoch: 14   Global Step: 70950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:48,134-Speed 5553.82 samples/sec   Loss 3.2857   LearningRate 0.0089   Epoch: 14   Global Step: 70960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:49,983-Speed 5540.65 samples/sec   Loss 3.2967   LearningRate 0.0089   Epoch: 14   Global Step: 70970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:51,849-Speed 5489.96 samples/sec   Loss 3.2378   LearningRate 0.0089   Epoch: 14   Global Step: 70980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:53,705-Speed 5519.56 samples/sec   Loss 3.3119   LearningRate 0.0089   Epoch: 14   Global Step: 70990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:55,555-Speed 5538.41 samples/sec   Loss 3.2968   LearningRate 0.0089   Epoch: 14   Global Step: 71000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:57,405-Speed 5536.46 samples/sec   Loss 3.3233   LearningRate 0.0089   Epoch: 14   Global Step: 71010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:49:59,252-Speed 5545.21 samples/sec   Loss 3.4551   LearningRate 0.0089   Epoch: 14   Global Step: 71020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:50:01,150-Speed 5396.57 samples/sec   Loss 3.3144   LearningRate 0.0089   Epoch: 14   Global Step: 71030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:50:03,029-Speed 5451.88 samples/sec   Loss 3.3685   LearningRate 0.0089   Epoch: 14   Global Step: 71040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:04,880-Speed 5533.07 samples/sec   Loss 3.2748   LearningRate 0.0089   Epoch: 14   Global Step: 71050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:06,752-Speed 5474.82 samples/sec   Loss 3.3467   LearningRate 0.0089   Epoch: 14   Global Step: 71060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:08,596-Speed 5555.25 samples/sec   Loss 3.2811   LearningRate 0.0088   Epoch: 14   Global Step: 71070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:10,445-Speed 5541.27 samples/sec   Loss 3.2737   LearningRate 0.0088   Epoch: 14   Global Step: 71080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:12,352-Speed 5372.12 samples/sec   Loss 3.3778   LearningRate 0.0088   Epoch: 14   Global Step: 71090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:14,207-Speed 5520.31 samples/sec   Loss 3.4504   LearningRate 0.0088   Epoch: 14   Global Step: 71100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:16,077-Speed 5479.54 samples/sec   Loss 3.2747   LearningRate 0.0088   Epoch: 14   Global Step: 71110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:17,949-Speed 5469.89 samples/sec   Loss 3.3701   LearningRate 0.0088   Epoch: 14   Global Step: 71120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:19,791-Speed 5561.88 samples/sec   Loss 3.3549   LearningRate 0.0088   Epoch: 14   Global Step: 71130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:21,640-Speed 5542.07 samples/sec   Loss 3.4475   LearningRate 0.0088   Epoch: 14   Global Step: 71140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:50:23,472-Speed 5589.13 samples/sec   Loss 3.4578   LearningRate 0.0088   Epoch: 14   Global Step: 71150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:25,319-Speed 5547.94 samples/sec   Loss 3.4805   LearningRate 0.0088   Epoch: 14   Global Step: 71160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:27,174-Speed 5522.77 samples/sec   Loss 3.4459   LearningRate 0.0088   Epoch: 14   Global Step: 71170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:29,028-Speed 5526.75 samples/sec   Loss 3.3835   LearningRate 0.0088   Epoch: 14   Global Step: 71180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:30,881-Speed 5527.74 samples/sec   Loss 3.2046   LearningRate 0.0088   Epoch: 14   Global Step: 71190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:32,728-Speed 5546.33 samples/sec   Loss 3.3630   LearningRate 0.0088   Epoch: 14   Global Step: 71200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:34,579-Speed 5532.71 samples/sec   Loss 3.3930   LearningRate 0.0088   Epoch: 14   Global Step: 71210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:36,428-Speed 5539.63 samples/sec   Loss 3.4309   LearningRate 0.0088   Epoch: 14   Global Step: 71220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:38,280-Speed 5532.92 samples/sec   Loss 3.4181   LearningRate 0.0088   Epoch: 14   Global Step: 71230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:40,124-Speed 5555.40 samples/sec   Loss 3.4776   LearningRate 0.0087   Epoch: 14   Global Step: 71240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:41,971-Speed 5545.68 samples/sec   Loss 3.4029   LearningRate 0.0087   Epoch: 14   Global Step: 71250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:50:43,821-Speed 5535.92 samples/sec   Loss 3.3877   LearningRate 0.0087   Epoch: 14   Global Step: 71260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:50:45,660-Speed 5572.07 samples/sec   Loss 3.4546   LearningRate 0.0087   Epoch: 14   Global Step: 71270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:47,505-Speed 5551.78 samples/sec   Loss 3.3725   LearningRate 0.0087   Epoch: 14   Global Step: 71280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:49,359-Speed 5524.82 samples/sec   Loss 3.4579   LearningRate 0.0087   Epoch: 14   Global Step: 71290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:51,205-Speed 5550.36 samples/sec   Loss 3.5022   LearningRate 0.0087   Epoch: 14   Global Step: 71300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:53,061-Speed 5520.12 samples/sec   Loss 3.3202   LearningRate 0.0087   Epoch: 14   Global Step: 71310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:54,928-Speed 5485.02 samples/sec   Loss 3.3849   LearningRate 0.0087   Epoch: 14   Global Step: 71320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:56,778-Speed 5537.28 samples/sec   Loss 3.4439   LearningRate 0.0087   Epoch: 14   Global Step: 71330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:50:58,626-Speed 5543.13 samples/sec   Loss 3.4112   LearningRate 0.0087   Epoch: 14   Global Step: 71340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:00,509-Speed 5441.90 samples/sec   Loss 3.3757   LearningRate 0.0087   Epoch: 14   Global Step: 71350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:02,385-Speed 5459.47 samples/sec   Loss 3.4963   LearningRate 0.0087   Epoch: 14   Global Step: 71360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:04,260-Speed 5461.72 samples/sec   Loss 3.5169   LearningRate 0.0087   Epoch: 14   Global Step: 71370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:06,153-Speed 5414.97 samples/sec   Loss 3.5538   LearningRate 0.0087   Epoch: 14   Global Step: 71380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:08,005-Speed 5529.39 samples/sec   Loss 3.3298   LearningRate 0.0087   Epoch: 14   Global Step: 71390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:09,859-Speed 5526.30 samples/sec   Loss 3.5031   LearningRate 0.0087   Epoch: 14   Global Step: 71400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:11,708-Speed 5538.94 samples/sec   Loss 3.4210   LearningRate 0.0086   Epoch: 14   Global Step: 71410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:13,566-Speed 5514.52 samples/sec   Loss 3.4885   LearningRate 0.0086   Epoch: 14   Global Step: 71420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:15,426-Speed 5507.41 samples/sec   Loss 3.5095   LearningRate 0.0086   Epoch: 14   Global Step: 71430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:17,294-Speed 5483.21 samples/sec   Loss 3.4501   LearningRate 0.0086   Epoch: 14   Global Step: 71440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:19,145-Speed 5534.11 samples/sec   Loss 3.4644   LearningRate 0.0086   Epoch: 14   Global Step: 71450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:21,010-Speed 5492.75 samples/sec   Loss 3.4844   LearningRate 0.0086   Epoch: 14   Global Step: 71460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:22,910-Speed 5393.83 samples/sec   Loss 3.4220   LearningRate 0.0086   Epoch: 14   Global Step: 71470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:24,813-Speed 5382.07 samples/sec   Loss 3.5002   LearningRate 0.0086   Epoch: 14   Global Step: 71480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:26,710-Speed 5398.64 samples/sec   Loss 3.4496   LearningRate 0.0086   Epoch: 14   Global Step: 71490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:28,558-Speed 5546.38 samples/sec   Loss 3.4282   LearningRate 0.0086   Epoch: 14   Global Step: 71500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:51:30,402-Speed 5552.57 samples/sec   Loss 3.5054   LearningRate 0.0086   Epoch: 14   Global Step: 71510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:32,263-Speed 5506.85 samples/sec   Loss 3.4369   LearningRate 0.0086   Epoch: 14   Global Step: 71520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:34,108-Speed 5551.58 samples/sec   Loss 3.3452   LearningRate 0.0086   Epoch: 14   Global Step: 71530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:35,971-Speed 5499.54 samples/sec   Loss 3.4010   LearningRate 0.0086   Epoch: 14   Global Step: 71540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:37,820-Speed 5539.87 samples/sec   Loss 3.5369   LearningRate 0.0086   Epoch: 14   Global Step: 71550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:39,668-Speed 5541.47 samples/sec   Loss 3.5011   LearningRate 0.0086   Epoch: 14   Global Step: 71560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:41,532-Speed 5499.19 samples/sec   Loss 3.5728   LearningRate 0.0086   Epoch: 14   Global Step: 71570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:43,387-Speed 5524.01 samples/sec   Loss 3.4977   LearningRate 0.0086   Epoch: 14   Global Step: 71580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:45,249-Speed 5500.24 samples/sec   Loss 3.6371   LearningRate 0.0085   Epoch: 14   Global Step: 71590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:47,125-Speed 5461.72 samples/sec   Loss 3.5448   LearningRate 0.0085   Epoch: 14   Global Step: 71600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:48,976-Speed 5533.42 samples/sec   Loss 3.6053   LearningRate 0.0085   Epoch: 14   Global Step: 71610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:50,881-Speed 5378.66 samples/sec   Loss 3.5322   LearningRate 0.0085   Epoch: 14   Global Step: 71620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:52,760-Speed 5451.57 samples/sec   Loss 3.5415   LearningRate 0.0085   Epoch: 14   Global Step: 71630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:54,605-Speed 5551.31 samples/sec   Loss 3.5069   LearningRate 0.0085   Epoch: 14   Global Step: 71640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:56,450-Speed 5551.89 samples/sec   Loss 3.3895   LearningRate 0.0085   Epoch: 14   Global Step: 71650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:51:58,299-Speed 5539.73 samples/sec   Loss 3.5727   LearningRate 0.0085   Epoch: 14   Global Step: 71660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:52:00,146-Speed 5546.32 samples/sec   Loss 3.4826   LearningRate 0.0085   Epoch: 14   Global Step: 71670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:02,017-Speed 5474.57 samples/sec   Loss 3.6146   LearningRate 0.0085   Epoch: 14   Global Step: 71680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:03,869-Speed 5533.35 samples/sec   Loss 3.6037   LearningRate 0.0085   Epoch: 14   Global Step: 71690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:05,718-Speed 5538.40 samples/sec   Loss 3.5380   LearningRate 0.0085   Epoch: 14   Global Step: 71700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:07,568-Speed 5537.84 samples/sec   Loss 3.4267   LearningRate 0.0085   Epoch: 14   Global Step: 71710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:09,414-Speed 5550.62 samples/sec   Loss 3.6074   LearningRate 0.0085   Epoch: 14   Global Step: 71720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:11,265-Speed 5534.55 samples/sec   Loss 3.6630   LearningRate 0.0085   Epoch: 14   Global Step: 71730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:13,119-Speed 5524.35 samples/sec   Loss 3.5692   LearningRate 0.0085   Epoch: 14   Global Step: 71740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:14,970-Speed 5532.92 samples/sec   Loss 3.4892   LearningRate 0.0085   Epoch: 14   Global Step: 71750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:16,816-Speed 5550.71 samples/sec   Loss 3.6916   LearningRate 0.0084   Epoch: 14   Global Step: 71760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:18,678-Speed 5502.86 samples/sec   Loss 3.5221   LearningRate 0.0084   Epoch: 14   Global Step: 71770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:52:20,522-Speed 5553.05 samples/sec   Loss 3.4601   LearningRate 0.0084   Epoch: 14   Global Step: 71780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:52:22,382-Speed 5507.99 samples/sec   Loss 3.5765   LearningRate 0.0084   Epoch: 14   Global Step: 71790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:24,243-Speed 5505.80 samples/sec   Loss 3.6304   LearningRate 0.0084   Epoch: 14   Global Step: 71800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:26,113-Speed 5477.93 samples/sec   Loss 3.4878   LearningRate 0.0084   Epoch: 14   Global Step: 71810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:27,973-Speed 5505.54 samples/sec   Loss 3.5952   LearningRate 0.0084   Epoch: 14   Global Step: 71820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:29,823-Speed 5538.66 samples/sec   Loss 3.5187   LearningRate 0.0084   Epoch: 14   Global Step: 71830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:31,666-Speed 5561.20 samples/sec   Loss 3.5926   LearningRate 0.0084   Epoch: 14   Global Step: 71840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:33,512-Speed 5547.81 samples/sec   Loss 3.6140   LearningRate 0.0084   Epoch: 14   Global Step: 71850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:35,396-Speed 5437.32 samples/sec   Loss 3.7099   LearningRate 0.0084   Epoch: 14   Global Step: 71860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:37,265-Speed 5480.68 samples/sec   Loss 3.5888   LearningRate 0.0084   Epoch: 14   Global Step: 71870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:39,133-Speed 5484.15 samples/sec   Loss 3.6833   LearningRate 0.0084   Epoch: 14   Global Step: 71880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:40,995-Speed 5503.13 samples/sec   Loss 3.5040   LearningRate 0.0084   Epoch: 14   Global Step: 71890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:42,853-Speed 5511.30 samples/sec   Loss 3.5778   LearningRate 0.0084   Epoch: 14   Global Step: 71900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:44,705-Speed 5529.76 samples/sec   Loss 3.7318   LearningRate 0.0084   Epoch: 14   Global Step: 71910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:46,557-Speed 5531.76 samples/sec   Loss 3.5095   LearningRate 0.0084   Epoch: 14   Global Step: 71920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:48,407-Speed 5539.50 samples/sec   Loss 3.5176   LearningRate 0.0083   Epoch: 14   Global Step: 71930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:50,274-Speed 5487.49 samples/sec   Loss 3.6060   LearningRate 0.0083   Epoch: 14   Global Step: 71940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:52,128-Speed 5525.36 samples/sec   Loss 3.4878   LearningRate 0.0083   Epoch: 14   Global Step: 71950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:53,976-Speed 5545.24 samples/sec   Loss 3.7024   LearningRate 0.0083   Epoch: 14   Global Step: 71960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:55,825-Speed 5541.43 samples/sec   Loss 3.6240   LearningRate 0.0083   Epoch: 14   Global Step: 71970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:57,686-Speed 5502.30 samples/sec   Loss 3.4624   LearningRate 0.0083   Epoch: 14   Global Step: 71980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:52:59,542-Speed 5519.60 samples/sec   Loss 3.6188   LearningRate 0.0083   Epoch: 14   Global Step: 71990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:53:01,393-Speed 5533.42 samples/sec   Loss 3.5677   LearningRate 0.0083   Epoch: 14   Global Step: 72000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:53:30,698-[lfw][72000]XNorm: 22.745616
Training: 2022-04-11 14:53:30,698-[lfw][72000]Accuracy-Flip: 0.99783+-0.00279
Training: 2022-04-11 14:53:30,699-[lfw][72000]Accuracy-Highest: 0.99817
Training: 2022-04-11 14:54:01,540-[cfp_fp][72000]XNorm: 20.677433
Training: 2022-04-11 14:54:01,541-[cfp_fp][72000]Accuracy-Flip: 0.97686+-0.00622
Training: 2022-04-11 14:54:01,541-[cfp_fp][72000]Accuracy-Highest: 0.98057
Training: 2022-04-11 14:54:28,158-[agedb_30][72000]XNorm: 22.580363
Training: 2022-04-11 14:54:28,158-[agedb_30][72000]Accuracy-Flip: 0.98217+-0.00619
Training: 2022-04-11 14:54:28,159-[agedb_30][72000]Accuracy-Highest: 0.98217
Training: 2022-04-11 14:54:30,042-Speed 115.51 samples/sec   Loss 3.6505   LearningRate 0.0083   Epoch: 14   Global Step: 72010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:54:31,883-Speed 5566.12 samples/sec   Loss 3.5904   LearningRate 0.0083   Epoch: 14   Global Step: 72020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:54:33,735-Speed 5530.34 samples/sec   Loss 3.5700   LearningRate 0.0083   Epoch: 14   Global Step: 72030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:54:35,577-Speed 5566.75 samples/sec   Loss 3.5577   LearningRate 0.0083   Epoch: 14   Global Step: 72040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:37,447-Speed 5478.81 samples/sec   Loss 3.5716   LearningRate 0.0083   Epoch: 14   Global Step: 72050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:39,294-Speed 5548.19 samples/sec   Loss 3.5341   LearningRate 0.0083   Epoch: 14   Global Step: 72060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:41,141-Speed 5546.89 samples/sec   Loss 3.5399   LearningRate 0.0083   Epoch: 14   Global Step: 72070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:43,022-Speed 5446.38 samples/sec   Loss 3.5483   LearningRate 0.0083   Epoch: 14   Global Step: 72080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:44,864-Speed 5562.80 samples/sec   Loss 3.5305   LearningRate 0.0083   Epoch: 14   Global Step: 72090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:46,712-Speed 5543.01 samples/sec   Loss 3.5456   LearningRate 0.0083   Epoch: 14   Global Step: 72100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:48,590-Speed 5456.18 samples/sec   Loss 3.5733   LearningRate 0.0082   Epoch: 14   Global Step: 72110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:50,508-Speed 5343.85 samples/sec   Loss 3.5964   LearningRate 0.0082   Epoch: 14   Global Step: 72120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:52,358-Speed 5539.39 samples/sec   Loss 3.6696   LearningRate 0.0082   Epoch: 14   Global Step: 72130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:54,242-Speed 5438.05 samples/sec   Loss 3.5166   LearningRate 0.0082   Epoch: 14   Global Step: 72140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:54:56,074-Speed 5592.59 samples/sec   Loss 3.5006   LearningRate 0.0082   Epoch: 14   Global Step: 72150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:57,957-Speed 5444.39 samples/sec   Loss 3.5089   LearningRate 0.0082   Epoch: 14   Global Step: 72160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:54:59,838-Speed 5446.84 samples/sec   Loss 3.5872   LearningRate 0.0082   Epoch: 14   Global Step: 72170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:01,744-Speed 5374.06 samples/sec   Loss 3.6836   LearningRate 0.0082   Epoch: 14   Global Step: 72180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:03,606-Speed 5504.54 samples/sec   Loss 3.5877   LearningRate 0.0082   Epoch: 14   Global Step: 72190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:05,491-Speed 5436.64 samples/sec   Loss 3.5711   LearningRate 0.0082   Epoch: 14   Global Step: 72200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:07,402-Speed 5358.92 samples/sec   Loss 3.5393   LearningRate 0.0082   Epoch: 14   Global Step: 72210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:09,271-Speed 5482.81 samples/sec   Loss 3.5558   LearningRate 0.0082   Epoch: 14   Global Step: 72220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:11,131-Speed 5509.36 samples/sec   Loss 3.5921   LearningRate 0.0082   Epoch: 14   Global Step: 72230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:12,998-Speed 5485.89 samples/sec   Loss 3.6073   LearningRate 0.0082   Epoch: 14   Global Step: 72240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:14,875-Speed 5460.25 samples/sec   Loss 3.5104   LearningRate 0.0082   Epoch: 14   Global Step: 72250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:55:16,738-Speed 5497.73 samples/sec   Loss 3.6796   LearningRate 0.0082   Epoch: 14   Global Step: 72260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:55:18,594-Speed 5520.87 samples/sec   Loss 3.6236   LearningRate 0.0082   Epoch: 14   Global Step: 72270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:55:20,456-Speed 5504.69 samples/sec   Loss 3.6289   LearningRate 0.0082   Epoch: 14   Global Step: 72280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:55:22,302-Speed 5549.49 samples/sec   Loss 3.7108   LearningRate 0.0081   Epoch: 14   Global Step: 72290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:24,201-Speed 5395.83 samples/sec   Loss 3.5905   LearningRate 0.0081   Epoch: 14   Global Step: 72300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:26,050-Speed 5542.03 samples/sec   Loss 3.5077   LearningRate 0.0081   Epoch: 14   Global Step: 72310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:27,902-Speed 5529.55 samples/sec   Loss 3.5355   LearningRate 0.0081   Epoch: 14   Global Step: 72320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:29,772-Speed 5479.13 samples/sec   Loss 3.5957   LearningRate 0.0081   Epoch: 14   Global Step: 72330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:31,635-Speed 5497.32 samples/sec   Loss 3.5480   LearningRate 0.0081   Epoch: 14   Global Step: 72340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:33,504-Speed 5481.89 samples/sec   Loss 3.5922   LearningRate 0.0081   Epoch: 14   Global Step: 72350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:35,353-Speed 5540.48 samples/sec   Loss 3.6134   LearningRate 0.0081   Epoch: 14   Global Step: 72360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:37,218-Speed 5493.61 samples/sec   Loss 3.5706   LearningRate 0.0081   Epoch: 14   Global Step: 72370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:39,079-Speed 5503.14 samples/sec   Loss 3.6306   LearningRate 0.0081   Epoch: 14   Global Step: 72380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:40,921-Speed 5563.29 samples/sec   Loss 3.6621   LearningRate 0.0081   Epoch: 14   Global Step: 72390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:42,773-Speed 5528.28 samples/sec   Loss 3.5137   LearningRate 0.0081   Epoch: 14   Global Step: 72400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:44,625-Speed 5534.05 samples/sec   Loss 3.5361   LearningRate 0.0081   Epoch: 14   Global Step: 72410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:46,477-Speed 5528.22 samples/sec   Loss 3.6583   LearningRate 0.0081   Epoch: 14   Global Step: 72420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:48,374-Speed 5401.06 samples/sec   Loss 3.6044   LearningRate 0.0081   Epoch: 14   Global Step: 72430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:50,279-Speed 5379.72 samples/sec   Loss 3.7555   LearningRate 0.0081   Epoch: 14   Global Step: 72440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:52,153-Speed 5464.77 samples/sec   Loss 3.7113   LearningRate 0.0081   Epoch: 14   Global Step: 72450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:54,025-Speed 5472.85 samples/sec   Loss 3.6029   LearningRate 0.0080   Epoch: 14   Global Step: 72460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:55,870-Speed 5551.85 samples/sec   Loss 3.6015   LearningRate 0.0080   Epoch: 14   Global Step: 72470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:57,711-Speed 5562.50 samples/sec   Loss 3.6314   LearningRate 0.0080   Epoch: 14   Global Step: 72480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:55:59,555-Speed 5556.91 samples/sec   Loss 3.5267   LearningRate 0.0080   Epoch: 14   Global Step: 72490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:56:01,395-Speed 5567.47 samples/sec   Loss 3.5757   LearningRate 0.0080   Epoch: 14   Global Step: 72500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:03,247-Speed 5531.40 samples/sec   Loss 3.6431   LearningRate 0.0080   Epoch: 14   Global Step: 72510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:05,128-Speed 5445.25 samples/sec   Loss 3.6103   LearningRate 0.0080   Epoch: 14   Global Step: 72520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:06,981-Speed 5527.52 samples/sec   Loss 3.5858   LearningRate 0.0080   Epoch: 14   Global Step: 72530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:56:08,825-Speed 5556.65 samples/sec   Loss 3.6647   LearningRate 0.0080   Epoch: 14   Global Step: 72540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:56:10,679-Speed 5525.55 samples/sec   Loss 3.6888   LearningRate 0.0080   Epoch: 14   Global Step: 72550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:56:12,545-Speed 5489.55 samples/sec   Loss 3.6165   LearningRate 0.0080   Epoch: 14   Global Step: 72560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:56:14,425-Speed 5450.53 samples/sec   Loss 3.7861   LearningRate 0.0080   Epoch: 14   Global Step: 72570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:56:16,289-Speed 5495.10 samples/sec   Loss 3.5077   LearningRate 0.0080   Epoch: 14   Global Step: 72580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:56:18,149-Speed 5506.74 samples/sec   Loss 3.5884   LearningRate 0.0080   Epoch: 14   Global Step: 72590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:56:20,012-Speed 5497.34 samples/sec   Loss 3.7876   LearningRate 0.0080   Epoch: 14   Global Step: 72600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:56:21,868-Speed 5519.07 samples/sec   Loss 3.5060   LearningRate 0.0080   Epoch: 14   Global Step: 72610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:56:23,725-Speed 5517.56 samples/sec   Loss 3.7237   LearningRate 0.0080   Epoch: 14   Global Step: 72620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:56:25,599-Speed 5466.67 samples/sec   Loss 3.5564   LearningRate 0.0080   Epoch: 14   Global Step: 72630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:27,490-Speed 5415.35 samples/sec   Loss 3.5986   LearningRate 0.0079   Epoch: 14   Global Step: 72640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:29,336-Speed 5549.71 samples/sec   Loss 3.6892   LearningRate 0.0079   Epoch: 14   Global Step: 72650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:31,190-Speed 5525.49 samples/sec   Loss 3.6168   LearningRate 0.0079   Epoch: 14   Global Step: 72660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:33,034-Speed 5555.83 samples/sec   Loss 3.5530   LearningRate 0.0079   Epoch: 14   Global Step: 72670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:34,888-Speed 5525.72 samples/sec   Loss 3.6561   LearningRate 0.0079   Epoch: 14   Global Step: 72680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:36,736-Speed 5544.41 samples/sec   Loss 3.5711   LearningRate 0.0079   Epoch: 14   Global Step: 72690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:38,576-Speed 5565.28 samples/sec   Loss 3.6025   LearningRate 0.0079   Epoch: 14   Global Step: 72700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:40,417-Speed 5563.84 samples/sec   Loss 3.6698   LearningRate 0.0079   Epoch: 14   Global Step: 72710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:42,261-Speed 5556.02 samples/sec   Loss 3.6594   LearningRate 0.0079   Epoch: 14   Global Step: 72720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:44,106-Speed 5552.21 samples/sec   Loss 3.6177   LearningRate 0.0079   Epoch: 14   Global Step: 72730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:56:45,980-Speed 5467.53 samples/sec   Loss 3.6436   LearningRate 0.0079   Epoch: 14   Global Step: 72740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:56:47,845-Speed 5491.21 samples/sec   Loss 3.6294   LearningRate 0.0079   Epoch: 14   Global Step: 72750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:56:49,700-Speed 5522.23 samples/sec   Loss 3.6471   LearningRate 0.0079   Epoch: 14   Global Step: 72760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:56:51,552-Speed 5532.48 samples/sec   Loss 3.6595   LearningRate 0.0079   Epoch: 14   Global Step: 72770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:56:53,394-Speed 5562.75 samples/sec   Loss 3.6335   LearningRate 0.0079   Epoch: 14   Global Step: 72780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:55,248-Speed 5523.22 samples/sec   Loss 3.5847   LearningRate 0.0079   Epoch: 14   Global Step: 72790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:57,100-Speed 5532.38 samples/sec   Loss 3.5850   LearningRate 0.0079   Epoch: 14   Global Step: 72800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:56:58,971-Speed 5474.86 samples/sec   Loss 3.5984   LearningRate 0.0079   Epoch: 14   Global Step: 72810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:00,845-Speed 5465.77 samples/sec   Loss 3.6253   LearningRate 0.0078   Epoch: 14   Global Step: 72820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:02,733-Speed 5426.15 samples/sec   Loss 3.6176   LearningRate 0.0078   Epoch: 14   Global Step: 72830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:04,600-Speed 5487.30 samples/sec   Loss 3.5963   LearningRate 0.0078   Epoch: 14   Global Step: 72840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:06,466-Speed 5488.99 samples/sec   Loss 3.6106   LearningRate 0.0078   Epoch: 14   Global Step: 72850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:08,321-Speed 5521.64 samples/sec   Loss 3.5633   LearningRate 0.0078   Epoch: 14   Global Step: 72860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:10,178-Speed 5517.63 samples/sec   Loss 3.6845   LearningRate 0.0078   Epoch: 14   Global Step: 72870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:12,033-Speed 5522.95 samples/sec   Loss 3.7428   LearningRate 0.0078   Epoch: 14   Global Step: 72880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:57:13,883-Speed 5537.67 samples/sec   Loss 3.7113   LearningRate 0.0078   Epoch: 14   Global Step: 72890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:57:15,751-Speed 5482.76 samples/sec   Loss 3.6295   LearningRate 0.0078   Epoch: 14   Global Step: 72900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:17,613-Speed 5501.98 samples/sec   Loss 3.6986   LearningRate 0.0078   Epoch: 14   Global Step: 72910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:19,463-Speed 5535.71 samples/sec   Loss 3.6841   LearningRate 0.0078   Epoch: 14   Global Step: 72920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:21,311-Speed 5542.48 samples/sec   Loss 3.6820   LearningRate 0.0078   Epoch: 14   Global Step: 72930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:23,189-Speed 5455.24 samples/sec   Loss 3.6623   LearningRate 0.0078   Epoch: 14   Global Step: 72940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:25,050-Speed 5506.84 samples/sec   Loss 3.7122   LearningRate 0.0078   Epoch: 14   Global Step: 72950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:26,935-Speed 5434.17 samples/sec   Loss 3.6217   LearningRate 0.0078   Epoch: 14   Global Step: 72960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:28,790-Speed 5520.95 samples/sec   Loss 3.7007   LearningRate 0.0078   Epoch: 14   Global Step: 72970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:30,631-Speed 5565.24 samples/sec   Loss 3.6588   LearningRate 0.0078   Epoch: 14   Global Step: 72980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:57:32,490-Speed 5508.74 samples/sec   Loss 3.6605   LearningRate 0.0078   Epoch: 14   Global Step: 72990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:57:34,346-Speed 5522.49 samples/sec   Loss 3.7055   LearningRate 0.0077   Epoch: 14   Global Step: 73000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:57:36,193-Speed 5544.51 samples/sec   Loss 3.6966   LearningRate 0.0077   Epoch: 14   Global Step: 73010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:57:38,034-Speed 5563.61 samples/sec   Loss 3.6505   LearningRate 0.0077   Epoch: 14   Global Step: 73020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:57:39,875-Speed 5564.73 samples/sec   Loss 3.5096   LearningRate 0.0077   Epoch: 14   Global Step: 73030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:57:41,721-Speed 5550.06 samples/sec   Loss 3.6298   LearningRate 0.0077   Epoch: 14   Global Step: 73040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:57:43,566-Speed 5553.63 samples/sec   Loss 3.6238   LearningRate 0.0077   Epoch: 14   Global Step: 73050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:57:45,415-Speed 5536.91 samples/sec   Loss 3.6781   LearningRate 0.0077   Epoch: 14   Global Step: 73060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:57:47,280-Speed 5493.26 samples/sec   Loss 3.6442   LearningRate 0.0077   Epoch: 14   Global Step: 73070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:57:49,208-Speed 5313.68 samples/sec   Loss 3.6977   LearningRate 0.0077   Epoch: 14   Global Step: 73080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:51,115-Speed 5371.56 samples/sec   Loss 3.7812   LearningRate 0.0077   Epoch: 14   Global Step: 73090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:52,970-Speed 5523.99 samples/sec   Loss 3.7339   LearningRate 0.0077   Epoch: 14   Global Step: 73100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:54,819-Speed 5541.88 samples/sec   Loss 3.6713   LearningRate 0.0077   Epoch: 14   Global Step: 73110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:56,662-Speed 5555.25 samples/sec   Loss 3.6645   LearningRate 0.0077   Epoch: 14   Global Step: 73120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:57:58,509-Speed 5547.83 samples/sec   Loss 3.7295   LearningRate 0.0077   Epoch: 14   Global Step: 73130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:00,359-Speed 5538.41 samples/sec   Loss 3.6522   LearningRate 0.0077   Epoch: 14   Global Step: 73140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:02,206-Speed 5543.69 samples/sec   Loss 3.5896   LearningRate 0.0077   Epoch: 14   Global Step: 73150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:04,055-Speed 5541.53 samples/sec   Loss 3.6089   LearningRate 0.0077   Epoch: 14   Global Step: 73160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:05,900-Speed 5552.01 samples/sec   Loss 3.6897   LearningRate 0.0077   Epoch: 14   Global Step: 73170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:07,753-Speed 5528.05 samples/sec   Loss 3.7070   LearningRate 0.0077   Epoch: 14   Global Step: 73180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:58:09,613-Speed 5507.37 samples/sec   Loss 3.6508   LearningRate 0.0076   Epoch: 14   Global Step: 73190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:58:11,487-Speed 5464.79 samples/sec   Loss 3.5679   LearningRate 0.0076   Epoch: 14   Global Step: 73200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:58:13,399-Speed 5358.39 samples/sec   Loss 3.7252   LearningRate 0.0076   Epoch: 14   Global Step: 73210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:58:15,265-Speed 5491.98 samples/sec   Loss 3.6313   LearningRate 0.0076   Epoch: 14   Global Step: 73220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:58:17,119-Speed 5525.41 samples/sec   Loss 3.7732   LearningRate 0.0076   Epoch: 14   Global Step: 73230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:58:18,965-Speed 5548.59 samples/sec   Loss 3.6160   LearningRate 0.0076   Epoch: 14   Global Step: 73240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:58:20,810-Speed 5552.97 samples/sec   Loss 3.5793   LearningRate 0.0076   Epoch: 14   Global Step: 73250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:58:22,678-Speed 5482.52 samples/sec   Loss 3.5337   LearningRate 0.0076   Epoch: 14   Global Step: 73260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:58:24,553-Speed 5463.01 samples/sec   Loss 3.6204   LearningRate 0.0076   Epoch: 14   Global Step: 73270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:26,418-Speed 5492.17 samples/sec   Loss 3.6748   LearningRate 0.0076   Epoch: 14   Global Step: 73280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:28,275-Speed 5516.71 samples/sec   Loss 3.5959   LearningRate 0.0076   Epoch: 14   Global Step: 73290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:30,135-Speed 5508.12 samples/sec   Loss 3.6996   LearningRate 0.0076   Epoch: 14   Global Step: 73300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:32,001-Speed 5489.40 samples/sec   Loss 3.5892   LearningRate 0.0076   Epoch: 14   Global Step: 73310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:33,868-Speed 5487.10 samples/sec   Loss 3.6689   LearningRate 0.0076   Epoch: 14   Global Step: 73320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:35,721-Speed 5526.79 samples/sec   Loss 3.6027   LearningRate 0.0076   Epoch: 14   Global Step: 73330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:37,579-Speed 5514.27 samples/sec   Loss 3.6114   LearningRate 0.0076   Epoch: 14   Global Step: 73340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:39,442-Speed 5498.16 samples/sec   Loss 3.6263   LearningRate 0.0076   Epoch: 14   Global Step: 73350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:41,302-Speed 5509.43 samples/sec   Loss 3.5555   LearningRate 0.0076   Epoch: 14   Global Step: 73360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:43,161-Speed 5507.45 samples/sec   Loss 3.6577   LearningRate 0.0075   Epoch: 14   Global Step: 73370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:58:45,001-Speed 5570.40 samples/sec   Loss 3.6994   LearningRate 0.0075   Epoch: 14   Global Step: 73380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:46,845-Speed 5553.19 samples/sec   Loss 3.6250   LearningRate 0.0075   Epoch: 14   Global Step: 73390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:48,706-Speed 5504.17 samples/sec   Loss 3.6517   LearningRate 0.0075   Epoch: 14   Global Step: 73400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:50,557-Speed 5536.12 samples/sec   Loss 3.6089   LearningRate 0.0075   Epoch: 14   Global Step: 73410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:52,420-Speed 5498.50 samples/sec   Loss 3.7286   LearningRate 0.0075   Epoch: 14   Global Step: 73420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:54,277-Speed 5517.51 samples/sec   Loss 3.6905   LearningRate 0.0075   Epoch: 14   Global Step: 73430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:56,124-Speed 5544.55 samples/sec   Loss 3.6455   LearningRate 0.0075   Epoch: 14   Global Step: 73440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:57,982-Speed 5513.76 samples/sec   Loss 3.6657   LearningRate 0.0075   Epoch: 14   Global Step: 73450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:58:59,862-Speed 5449.50 samples/sec   Loss 3.6665   LearningRate 0.0075   Epoch: 14   Global Step: 73460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:01,725-Speed 5497.30 samples/sec   Loss 3.7476   LearningRate 0.0075   Epoch: 14   Global Step: 73470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:03,564-Speed 5572.65 samples/sec   Loss 3.7321   LearningRate 0.0075   Epoch: 14   Global Step: 73480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:05,418-Speed 5524.01 samples/sec   Loss 3.5652   LearningRate 0.0075   Epoch: 14   Global Step: 73490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:07,262-Speed 5555.74 samples/sec   Loss 3.5572   LearningRate 0.0075   Epoch: 14   Global Step: 73500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:09,116-Speed 5524.02 samples/sec   Loss 3.7049   LearningRate 0.0075   Epoch: 14   Global Step: 73510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:10,989-Speed 5470.36 samples/sec   Loss 3.6533   LearningRate 0.0075   Epoch: 14   Global Step: 73520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:12,870-Speed 5444.86 samples/sec   Loss 3.7319   LearningRate 0.0075   Epoch: 14   Global Step: 73530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:14,774-Speed 5382.10 samples/sec   Loss 3.5970   LearningRate 0.0075   Epoch: 14   Global Step: 73540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:16,676-Speed 5386.56 samples/sec   Loss 3.7321   LearningRate 0.0074   Epoch: 14   Global Step: 73550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:18,524-Speed 5542.01 samples/sec   Loss 3.7120   LearningRate 0.0074   Epoch: 14   Global Step: 73560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:20,370-Speed 5549.99 samples/sec   Loss 3.6200   LearningRate 0.0074   Epoch: 14   Global Step: 73570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:22,224-Speed 5524.12 samples/sec   Loss 3.7249   LearningRate 0.0074   Epoch: 14   Global Step: 73580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:59:24,073-Speed 5540.41 samples/sec   Loss 3.7881   LearningRate 0.0074   Epoch: 14   Global Step: 73590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 14:59:25,911-Speed 5573.85 samples/sec   Loss 3.6088   LearningRate 0.0074   Epoch: 14   Global Step: 73600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:27,760-Speed 5540.70 samples/sec   Loss 3.6845   LearningRate 0.0074   Epoch: 14   Global Step: 73610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:29,606-Speed 5548.60 samples/sec   Loss 3.5020   LearningRate 0.0074   Epoch: 14   Global Step: 73620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:31,477-Speed 5475.24 samples/sec   Loss 3.8572   LearningRate 0.0074   Epoch: 14   Global Step: 73630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:33,330-Speed 5526.51 samples/sec   Loss 3.6927   LearningRate 0.0074   Epoch: 14   Global Step: 73640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:35,167-Speed 5577.48 samples/sec   Loss 3.7403   LearningRate 0.0074   Epoch: 14   Global Step: 73650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:59:37,014-Speed 5545.60 samples/sec   Loss 3.6962   LearningRate 0.0074   Epoch: 14   Global Step: 73660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:59:38,858-Speed 5556.92 samples/sec   Loss 3.5671   LearningRate 0.0074   Epoch: 14   Global Step: 73670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:59:40,714-Speed 5518.09 samples/sec   Loss 3.7016   LearningRate 0.0074   Epoch: 14   Global Step: 73680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:59:42,563-Speed 5540.96 samples/sec   Loss 3.6868   LearningRate 0.0074   Epoch: 14   Global Step: 73690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:59:44,408-Speed 5552.35 samples/sec   Loss 3.6222   LearningRate 0.0074   Epoch: 14   Global Step: 73700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:59:46,260-Speed 5530.32 samples/sec   Loss 3.6707   LearningRate 0.0074   Epoch: 14   Global Step: 73710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:59:48,145-Speed 5436.18 samples/sec   Loss 3.6158   LearningRate 0.0074   Epoch: 14   Global Step: 73720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:59:50,031-Speed 5431.70 samples/sec   Loss 3.7173   LearningRate 0.0074   Epoch: 14   Global Step: 73730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:59:51,888-Speed 5515.44 samples/sec   Loss 3.7273   LearningRate 0.0073   Epoch: 14   Global Step: 73740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 14:59:53,739-Speed 5534.29 samples/sec   Loss 3.7627   LearningRate 0.0073   Epoch: 14   Global Step: 73750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:55,584-Speed 5554.13 samples/sec   Loss 3.6268   LearningRate 0.0073   Epoch: 14   Global Step: 73760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:57,426-Speed 5560.48 samples/sec   Loss 3.7715   LearningRate 0.0073   Epoch: 14   Global Step: 73770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 14:59:59,271-Speed 5551.71 samples/sec   Loss 3.7480   LearningRate 0.0073   Epoch: 14   Global Step: 73780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:01,122-Speed 5533.05 samples/sec   Loss 3.5972   LearningRate 0.0073   Epoch: 14   Global Step: 73790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:02,972-Speed 5539.71 samples/sec   Loss 3.6363   LearningRate 0.0073   Epoch: 14   Global Step: 73800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:04,829-Speed 5515.17 samples/sec   Loss 3.5416   LearningRate 0.0073   Epoch: 14   Global Step: 73810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:06,680-Speed 5533.68 samples/sec   Loss 3.5891   LearningRate 0.0073   Epoch: 14   Global Step: 73820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:08,540-Speed 5507.95 samples/sec   Loss 3.5456   LearningRate 0.0073   Epoch: 14   Global Step: 73830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:10,387-Speed 5546.54 samples/sec   Loss 3.7684   LearningRate 0.0073   Epoch: 14   Global Step: 73840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:12,240-Speed 5528.76 samples/sec   Loss 3.7153   LearningRate 0.0073   Epoch: 14   Global Step: 73850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:00:14,106-Speed 5488.06 samples/sec   Loss 3.6356   LearningRate 0.0073   Epoch: 14   Global Step: 73860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:00:15,971-Speed 5495.19 samples/sec   Loss 3.6547   LearningRate 0.0073   Epoch: 14   Global Step: 73870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:00:17,842-Speed 5475.02 samples/sec   Loss 3.6767   LearningRate 0.0073   Epoch: 14   Global Step: 73880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:00:19,708-Speed 5490.78 samples/sec   Loss 3.5807   LearningRate 0.0073   Epoch: 14   Global Step: 73890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:00:21,553-Speed 5550.14 samples/sec   Loss 3.7007   LearningRate 0.0073   Epoch: 14   Global Step: 73900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:00:23,417-Speed 5495.52 samples/sec   Loss 3.5616   LearningRate 0.0073   Epoch: 14   Global Step: 73910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:00:25,277-Speed 5506.61 samples/sec   Loss 3.6598   LearningRate 0.0073   Epoch: 14   Global Step: 73920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:00:27,134-Speed 5517.65 samples/sec   Loss 3.7446   LearningRate 0.0072   Epoch: 14   Global Step: 73930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:00:28,984-Speed 5538.28 samples/sec   Loss 3.6577   LearningRate 0.0072   Epoch: 14   Global Step: 73940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:00:30,827-Speed 5556.74 samples/sec   Loss 3.6990   LearningRate 0.0072   Epoch: 14   Global Step: 73950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:32,687-Speed 5508.20 samples/sec   Loss 3.6424   LearningRate 0.0072   Epoch: 14   Global Step: 73960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:34,559-Speed 5472.19 samples/sec   Loss 3.6622   LearningRate 0.0072   Epoch: 14   Global Step: 73970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:36,453-Speed 5408.94 samples/sec   Loss 3.6009   LearningRate 0.0072   Epoch: 14   Global Step: 73980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:38,333-Speed 5450.19 samples/sec   Loss 3.7683   LearningRate 0.0072   Epoch: 14   Global Step: 73990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:00:40,185-Speed 5530.18 samples/sec   Loss 3.5959   LearningRate 0.0072   Epoch: 14   Global Step: 74000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:01:06,962-[lfw][74000]XNorm: 22.630857
Training: 2022-04-11 15:01:06,963-[lfw][74000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 15:01:06,963-[lfw][74000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:01:41,269-[cfp_fp][74000]XNorm: 21.012182
Training: 2022-04-11 15:01:41,270-[cfp_fp][74000]Accuracy-Flip: 0.98000+-0.00709
Training: 2022-04-11 15:01:41,270-[cfp_fp][74000]Accuracy-Highest: 0.98057
Training: 2022-04-11 15:02:10,625-[agedb_30][74000]XNorm: 22.659951
Training: 2022-04-11 15:02:10,625-[agedb_30][74000]Accuracy-Flip: 0.97900+-0.00754
Training: 2022-04-11 15:02:10,626-[agedb_30][74000]Accuracy-Highest: 0.98217
Training: 2022-04-11 15:02:12,509-Speed 110.91 samples/sec   Loss 3.6705   LearningRate 0.0072   Epoch: 14   Global Step: 74010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:14,360-Speed 5535.16 samples/sec   Loss 3.7274   LearningRate 0.0072   Epoch: 14   Global Step: 74020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:16,214-Speed 5524.07 samples/sec   Loss 3.6500   LearningRate 0.0072   Epoch: 14   Global Step: 74030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:18,065-Speed 5533.69 samples/sec   Loss 3.7218   LearningRate 0.0072   Epoch: 14   Global Step: 74040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:19,902-Speed 5578.13 samples/sec   Loss 3.6610   LearningRate 0.0072   Epoch: 14   Global Step: 74050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:21,743-Speed 5563.70 samples/sec   Loss 3.6391   LearningRate 0.0072   Epoch: 14   Global Step: 74060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:23,599-Speed 5519.02 samples/sec   Loss 3.6772   LearningRate 0.0072   Epoch: 14   Global Step: 74070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:25,448-Speed 5539.49 samples/sec   Loss 3.7759   LearningRate 0.0072   Epoch: 14   Global Step: 74080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:27,299-Speed 5534.73 samples/sec   Loss 3.5913   LearningRate 0.0072   Epoch: 14   Global Step: 74090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:29,146-Speed 5547.25 samples/sec   Loss 3.7152   LearningRate 0.0072   Epoch: 14   Global Step: 74100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:30,987-Speed 5565.03 samples/sec   Loss 3.6530   LearningRate 0.0072   Epoch: 14   Global Step: 74110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:32,832-Speed 5550.03 samples/sec   Loss 3.6474   LearningRate 0.0071   Epoch: 14   Global Step: 74120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:34,706-Speed 5465.58 samples/sec   Loss 3.6258   LearningRate 0.0071   Epoch: 14   Global Step: 74130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:36,561-Speed 5523.73 samples/sec   Loss 3.6826   LearningRate 0.0071   Epoch: 14   Global Step: 74140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:38,409-Speed 5544.76 samples/sec   Loss 3.6376   LearningRate 0.0071   Epoch: 14   Global Step: 74150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:40,301-Speed 5412.08 samples/sec   Loss 3.6488   LearningRate 0.0071   Epoch: 14   Global Step: 74160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:42,165-Speed 5497.59 samples/sec   Loss 3.7133   LearningRate 0.0071   Epoch: 14   Global Step: 74170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:44,026-Speed 5504.66 samples/sec   Loss 3.5619   LearningRate 0.0071   Epoch: 14   Global Step: 74180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:45,869-Speed 5559.42 samples/sec   Loss 3.6895   LearningRate 0.0071   Epoch: 14   Global Step: 74190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:47,713-Speed 5555.22 samples/sec   Loss 3.7004   LearningRate 0.0071   Epoch: 14   Global Step: 74200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:49,576-Speed 5499.64 samples/sec   Loss 3.6902   LearningRate 0.0071   Epoch: 14   Global Step: 74210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:51,436-Speed 5508.95 samples/sec   Loss 3.6299   LearningRate 0.0071   Epoch: 14   Global Step: 74220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:53,281-Speed 5549.22 samples/sec   Loss 3.7100   LearningRate 0.0071   Epoch: 14   Global Step: 74230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:55,183-Speed 5387.40 samples/sec   Loss 3.5673   LearningRate 0.0071   Epoch: 14   Global Step: 74240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:02:57,031-Speed 5543.03 samples/sec   Loss 3.6730   LearningRate 0.0071   Epoch: 14   Global Step: 74250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:02:58,877-Speed 5548.31 samples/sec   Loss 3.7107   LearningRate 0.0071   Epoch: 14   Global Step: 74260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:00,740-Speed 5502.00 samples/sec   Loss 3.6596   LearningRate 0.0071   Epoch: 14   Global Step: 74270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:02,584-Speed 5553.53 samples/sec   Loss 3.6972   LearningRate 0.0071   Epoch: 14   Global Step: 74280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:04,432-Speed 5542.45 samples/sec   Loss 3.5839   LearningRate 0.0071   Epoch: 14   Global Step: 74290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:06,297-Speed 5492.87 samples/sec   Loss 3.7307   LearningRate 0.0071   Epoch: 14   Global Step: 74300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:08,145-Speed 5543.88 samples/sec   Loss 3.7998   LearningRate 0.0070   Epoch: 14   Global Step: 74310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:10,011-Speed 5488.54 samples/sec   Loss 3.5776   LearningRate 0.0070   Epoch: 14   Global Step: 74320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:11,900-Speed 5425.23 samples/sec   Loss 3.7098   LearningRate 0.0070   Epoch: 14   Global Step: 74330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:13,751-Speed 5531.84 samples/sec   Loss 3.6387   LearningRate 0.0070   Epoch: 14   Global Step: 74340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:15,620-Speed 5480.68 samples/sec   Loss 3.6414   LearningRate 0.0070   Epoch: 14   Global Step: 74350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:17,471-Speed 5533.57 samples/sec   Loss 3.7055   LearningRate 0.0070   Epoch: 14   Global Step: 74360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:19,322-Speed 5534.64 samples/sec   Loss 3.6837   LearningRate 0.0070   Epoch: 14   Global Step: 74370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:21,173-Speed 5537.92 samples/sec   Loss 3.5588   LearningRate 0.0070   Epoch: 14   Global Step: 74380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:23,050-Speed 5457.26 samples/sec   Loss 3.7275   LearningRate 0.0070   Epoch: 14   Global Step: 74390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:24,949-Speed 5391.66 samples/sec   Loss 3.5881   LearningRate 0.0070   Epoch: 14   Global Step: 74400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:26,833-Speed 5440.03 samples/sec   Loss 3.7644   LearningRate 0.0070   Epoch: 14   Global Step: 74410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:28,698-Speed 5490.27 samples/sec   Loss 3.6222   LearningRate 0.0070   Epoch: 14   Global Step: 74420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:30,568-Speed 5478.43 samples/sec   Loss 3.6063   LearningRate 0.0070   Epoch: 14   Global Step: 74430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:32,415-Speed 5548.10 samples/sec   Loss 3.6055   LearningRate 0.0070   Epoch: 14   Global Step: 74440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:34,274-Speed 5507.88 samples/sec   Loss 3.5655   LearningRate 0.0070   Epoch: 14   Global Step: 74450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:36,155-Speed 5446.64 samples/sec   Loss 3.6043   LearningRate 0.0070   Epoch: 14   Global Step: 74460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:38,006-Speed 5535.20 samples/sec   Loss 3.6722   LearningRate 0.0070   Epoch: 14   Global Step: 74470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:39,847-Speed 5562.53 samples/sec   Loss 3.7371   LearningRate 0.0070   Epoch: 14   Global Step: 74480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:41,710-Speed 5501.09 samples/sec   Loss 3.6725   LearningRate 0.0070   Epoch: 14   Global Step: 74490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:43,561-Speed 5533.69 samples/sec   Loss 3.5763   LearningRate 0.0069   Epoch: 14   Global Step: 74500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:45,411-Speed 5537.67 samples/sec   Loss 3.5388   LearningRate 0.0069   Epoch: 14   Global Step: 74510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:47,304-Speed 5411.32 samples/sec   Loss 3.6747   LearningRate 0.0069   Epoch: 14   Global Step: 74520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:49,214-Speed 5360.83 samples/sec   Loss 3.6317   LearningRate 0.0069   Epoch: 14   Global Step: 74530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:51,097-Speed 5441.03 samples/sec   Loss 3.6333   LearningRate 0.0069   Epoch: 14   Global Step: 74540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:03:52,943-Speed 5549.15 samples/sec   Loss 3.5185   LearningRate 0.0069   Epoch: 14   Global Step: 74550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:54,816-Speed 5468.76 samples/sec   Loss 3.7511   LearningRate 0.0069   Epoch: 14   Global Step: 74560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:56,664-Speed 5544.11 samples/sec   Loss 3.6160   LearningRate 0.0069   Epoch: 14   Global Step: 74570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:03:58,521-Speed 5516.67 samples/sec   Loss 3.6239   LearningRate 0.0069   Epoch: 14   Global Step: 74580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:04:00,377-Speed 5520.59 samples/sec   Loss 3.7595   LearningRate 0.0069   Epoch: 14   Global Step: 74590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:04:02,248-Speed 5474.18 samples/sec   Loss 3.6736   LearningRate 0.0069   Epoch: 14   Global Step: 74600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:04,093-Speed 5551.90 samples/sec   Loss 3.5967   LearningRate 0.0069   Epoch: 14   Global Step: 74610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:05,933-Speed 5567.55 samples/sec   Loss 3.5867   LearningRate 0.0069   Epoch: 14   Global Step: 74620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:07,779-Speed 5549.22 samples/sec   Loss 3.5234   LearningRate 0.0069   Epoch: 14   Global Step: 74630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:09,621-Speed 5559.94 samples/sec   Loss 3.5342   LearningRate 0.0069   Epoch: 14   Global Step: 74640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:11,480-Speed 5512.23 samples/sec   Loss 3.7197   LearningRate 0.0069   Epoch: 14   Global Step: 74650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:13,367-Speed 5429.51 samples/sec   Loss 3.6469   LearningRate 0.0069   Epoch: 14   Global Step: 74660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:15,205-Speed 5570.50 samples/sec   Loss 3.6753   LearningRate 0.0069   Epoch: 14   Global Step: 74670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:04:17,052-Speed 5549.62 samples/sec   Loss 3.5670   LearningRate 0.0069   Epoch: 14   Global Step: 74680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:04:18,899-Speed 5544.82 samples/sec   Loss 3.5516   LearningRate 0.0068   Epoch: 14   Global Step: 74690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:04:20,742-Speed 5559.68 samples/sec   Loss 3.6037   LearningRate 0.0068   Epoch: 14   Global Step: 74700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:04:22,586-Speed 5554.05 samples/sec   Loss 3.7028   LearningRate 0.0068   Epoch: 14   Global Step: 74710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:04:24,431-Speed 5552.86 samples/sec   Loss 3.6780   LearningRate 0.0068   Epoch: 14   Global Step: 74720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:04:26,274-Speed 5558.12 samples/sec   Loss 3.6420   LearningRate 0.0068   Epoch: 14   Global Step: 74730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:04:28,129-Speed 5523.07 samples/sec   Loss 3.6332   LearningRate 0.0068   Epoch: 14   Global Step: 74740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:04:29,981-Speed 5530.65 samples/sec   Loss 3.5187   LearningRate 0.0068   Epoch: 14   Global Step: 74750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:04:31,825-Speed 5555.73 samples/sec   Loss 3.5355   LearningRate 0.0068   Epoch: 14   Global Step: 74760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:04:33,675-Speed 5537.29 samples/sec   Loss 3.6414   LearningRate 0.0068   Epoch: 14   Global Step: 74770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:35,537-Speed 5499.01 samples/sec   Loss 3.7400   LearningRate 0.0068   Epoch: 14   Global Step: 74780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:37,448-Speed 5361.40 samples/sec   Loss 3.4779   LearningRate 0.0068   Epoch: 14   Global Step: 74790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:39,296-Speed 5542.81 samples/sec   Loss 3.6877   LearningRate 0.0068   Epoch: 14   Global Step: 74800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:41,149-Speed 5528.36 samples/sec   Loss 3.5901   LearningRate 0.0068   Epoch: 14   Global Step: 74810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:42,993-Speed 5555.47 samples/sec   Loss 3.7751   LearningRate 0.0068   Epoch: 14   Global Step: 74820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:44,840-Speed 5546.05 samples/sec   Loss 3.5509   LearningRate 0.0068   Epoch: 14   Global Step: 74830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:46,684-Speed 5556.25 samples/sec   Loss 3.5631   LearningRate 0.0068   Epoch: 14   Global Step: 74840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:48,532-Speed 5543.02 samples/sec   Loss 3.5768   LearningRate 0.0068   Epoch: 14   Global Step: 74850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:50,383-Speed 5533.89 samples/sec   Loss 3.6745   LearningRate 0.0068   Epoch: 14   Global Step: 74860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:52,231-Speed 5545.00 samples/sec   Loss 3.6070   LearningRate 0.0068   Epoch: 14   Global Step: 74870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:04:54,084-Speed 5527.19 samples/sec   Loss 3.6460   LearningRate 0.0067   Epoch: 14   Global Step: 74880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:55,934-Speed 5537.54 samples/sec   Loss 3.4497   LearningRate 0.0067   Epoch: 14   Global Step: 74890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:57,796-Speed 5502.13 samples/sec   Loss 3.6382   LearningRate 0.0067   Epoch: 14   Global Step: 74900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:04:59,661-Speed 5491.38 samples/sec   Loss 3.5606   LearningRate 0.0067   Epoch: 14   Global Step: 74910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:01,539-Speed 5454.09 samples/sec   Loss 3.5342   LearningRate 0.0067   Epoch: 14   Global Step: 74920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:03,406-Speed 5488.88 samples/sec   Loss 3.6982   LearningRate 0.0067   Epoch: 14   Global Step: 74930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:05,293-Speed 5428.20 samples/sec   Loss 3.8040   LearningRate 0.0067   Epoch: 14   Global Step: 74940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:07,138-Speed 5553.48 samples/sec   Loss 3.5263   LearningRate 0.0067   Epoch: 14   Global Step: 74950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:08,983-Speed 5551.43 samples/sec   Loss 3.6107   LearningRate 0.0067   Epoch: 14   Global Step: 74960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:10,836-Speed 5527.35 samples/sec   Loss 3.7097   LearningRate 0.0067   Epoch: 14   Global Step: 74970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:12,712-Speed 5461.09 samples/sec   Loss 3.7351   LearningRate 0.0067   Epoch: 14   Global Step: 74980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:05:14,580-Speed 5483.58 samples/sec   Loss 3.6796   LearningRate 0.0067   Epoch: 14   Global Step: 74990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:05:16,424-Speed 5556.51 samples/sec   Loss 3.6476   LearningRate 0.0067   Epoch: 14   Global Step: 75000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:18,273-Speed 5537.02 samples/sec   Loss 3.6127   LearningRate 0.0067   Epoch: 14   Global Step: 75010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:20,124-Speed 5534.91 samples/sec   Loss 3.5070   LearningRate 0.0067   Epoch: 14   Global Step: 75020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:21,974-Speed 5538.31 samples/sec   Loss 3.6512   LearningRate 0.0067   Epoch: 14   Global Step: 75030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:23,837-Speed 5497.10 samples/sec   Loss 3.6153   LearningRate 0.0067   Epoch: 14   Global Step: 75040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:25,696-Speed 5511.51 samples/sec   Loss 3.5214   LearningRate 0.0067   Epoch: 14   Global Step: 75050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:27,544-Speed 5543.95 samples/sec   Loss 3.6184   LearningRate 0.0067   Epoch: 14   Global Step: 75060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:29,393-Speed 5541.38 samples/sec   Loss 3.6079   LearningRate 0.0067   Epoch: 14   Global Step: 75070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:31,248-Speed 5520.95 samples/sec   Loss 3.5736   LearningRate 0.0066   Epoch: 14   Global Step: 75080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:33,100-Speed 5531.79 samples/sec   Loss 3.5436   LearningRate 0.0066   Epoch: 14   Global Step: 75090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:34,948-Speed 5541.94 samples/sec   Loss 3.6450   LearningRate 0.0066   Epoch: 14   Global Step: 75100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:05:36,785-Speed 5578.36 samples/sec   Loss 3.5911   LearningRate 0.0066   Epoch: 14   Global Step: 75110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:38,640-Speed 5522.18 samples/sec   Loss 3.5967   LearningRate 0.0066   Epoch: 14   Global Step: 75120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:40,490-Speed 5533.96 samples/sec   Loss 3.6473   LearningRate 0.0066   Epoch: 14   Global Step: 75130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:42,343-Speed 5530.11 samples/sec   Loss 3.6364   LearningRate 0.0066   Epoch: 14   Global Step: 75140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:44,193-Speed 5536.59 samples/sec   Loss 3.6324   LearningRate 0.0066   Epoch: 14   Global Step: 75150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:46,045-Speed 5532.45 samples/sec   Loss 3.5739   LearningRate 0.0066   Epoch: 14   Global Step: 75160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:47,907-Speed 5500.83 samples/sec   Loss 3.7522   LearningRate 0.0066   Epoch: 14   Global Step: 75170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:49,753-Speed 5551.27 samples/sec   Loss 3.5975   LearningRate 0.0066   Epoch: 14   Global Step: 75180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:51,619-Speed 5490.67 samples/sec   Loss 3.6537   LearningRate 0.0066   Epoch: 14   Global Step: 75190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:53,507-Speed 5424.63 samples/sec   Loss 3.5462   LearningRate 0.0066   Epoch: 14   Global Step: 75200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:55,351-Speed 5554.18 samples/sec   Loss 3.5228   LearningRate 0.0066   Epoch: 14   Global Step: 75210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:57,191-Speed 5567.79 samples/sec   Loss 3.5447   LearningRate 0.0066   Epoch: 14   Global Step: 75220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:05:59,034-Speed 5558.02 samples/sec   Loss 3.6550   LearningRate 0.0066   Epoch: 14   Global Step: 75230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:00,887-Speed 5530.92 samples/sec   Loss 3.5826   LearningRate 0.0066   Epoch: 14   Global Step: 75240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:02,733-Speed 5547.25 samples/sec   Loss 3.5062   LearningRate 0.0066   Epoch: 14   Global Step: 75250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:04,588-Speed 5522.63 samples/sec   Loss 3.7427   LearningRate 0.0066   Epoch: 14   Global Step: 75260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:06,431-Speed 5558.19 samples/sec   Loss 3.6595   LearningRate 0.0066   Epoch: 14   Global Step: 75270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:08,276-Speed 5552.50 samples/sec   Loss 3.6258   LearningRate 0.0065   Epoch: 14   Global Step: 75280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:10,131-Speed 5522.28 samples/sec   Loss 3.6179   LearningRate 0.0065   Epoch: 14   Global Step: 75290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:12,012-Speed 5447.51 samples/sec   Loss 3.6409   LearningRate 0.0065   Epoch: 14   Global Step: 75300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:13,885-Speed 5468.59 samples/sec   Loss 3.7102   LearningRate 0.0065   Epoch: 14   Global Step: 75310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:06:15,760-Speed 5462.09 samples/sec   Loss 3.5869   LearningRate 0.0065   Epoch: 14   Global Step: 75320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:17,637-Speed 5459.06 samples/sec   Loss 3.5801   LearningRate 0.0065   Epoch: 14   Global Step: 75330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:19,483-Speed 5549.18 samples/sec   Loss 3.7480   LearningRate 0.0065   Epoch: 14   Global Step: 75340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:21,338-Speed 5522.11 samples/sec   Loss 3.6146   LearningRate 0.0065   Epoch: 14   Global Step: 75350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:23,188-Speed 5535.40 samples/sec   Loss 3.6508   LearningRate 0.0065   Epoch: 14   Global Step: 75360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:25,045-Speed 5517.40 samples/sec   Loss 3.7381   LearningRate 0.0065   Epoch: 14   Global Step: 75370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:26,914-Speed 5479.50 samples/sec   Loss 3.5495   LearningRate 0.0065   Epoch: 14   Global Step: 75380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:28,765-Speed 5535.14 samples/sec   Loss 3.5165   LearningRate 0.0065   Epoch: 14   Global Step: 75390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:30,613-Speed 5544.54 samples/sec   Loss 3.7108   LearningRate 0.0065   Epoch: 14   Global Step: 75400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:32,461-Speed 5542.11 samples/sec   Loss 3.5956   LearningRate 0.0065   Epoch: 14   Global Step: 75410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:34,305-Speed 5555.00 samples/sec   Loss 3.5271   LearningRate 0.0065   Epoch: 14   Global Step: 75420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:06:36,186-Speed 5446.23 samples/sec   Loss 3.6712   LearningRate 0.0065   Epoch: 14   Global Step: 75430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:06:38,039-Speed 5529.60 samples/sec   Loss 3.6191   LearningRate 0.0065   Epoch: 14   Global Step: 75440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:06:39,887-Speed 5541.87 samples/sec   Loss 3.5672   LearningRate 0.0065   Epoch: 14   Global Step: 75450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:06:41,743-Speed 5521.07 samples/sec   Loss 3.6359   LearningRate 0.0065   Epoch: 14   Global Step: 75460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:06:43,603-Speed 5506.29 samples/sec   Loss 3.5120   LearningRate 0.0064   Epoch: 14   Global Step: 75470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:06:45,435-Speed 5590.28 samples/sec   Loss 3.5421   LearningRate 0.0064   Epoch: 14   Global Step: 75480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:47,304-Speed 5483.05 samples/sec   Loss 3.6068   LearningRate 0.0064   Epoch: 14   Global Step: 75490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:49,171-Speed 5484.25 samples/sec   Loss 3.6725   LearningRate 0.0064   Epoch: 14   Global Step: 75500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:51,022-Speed 5535.77 samples/sec   Loss 3.5530   LearningRate 0.0064   Epoch: 14   Global Step: 75510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:52,905-Speed 5441.70 samples/sec   Loss 3.5945   LearningRate 0.0064   Epoch: 14   Global Step: 75520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:54,782-Speed 5455.63 samples/sec   Loss 3.4826   LearningRate 0.0064   Epoch: 14   Global Step: 75530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:56,630-Speed 5546.08 samples/sec   Loss 3.5719   LearningRate 0.0064   Epoch: 14   Global Step: 75540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:06:58,478-Speed 5542.05 samples/sec   Loss 3.5037   LearningRate 0.0064   Epoch: 14   Global Step: 75550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:00,333-Speed 5522.33 samples/sec   Loss 3.5535   LearningRate 0.0064   Epoch: 14   Global Step: 75560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:02,194-Speed 5504.60 samples/sec   Loss 3.5986   LearningRate 0.0064   Epoch: 14   Global Step: 75570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:04,069-Speed 5463.49 samples/sec   Loss 3.6862   LearningRate 0.0064   Epoch: 14   Global Step: 75580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:07:05,940-Speed 5474.81 samples/sec   Loss 3.4611   LearningRate 0.0064   Epoch: 14   Global Step: 75590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:07:07,803-Speed 5497.02 samples/sec   Loss 3.6180   LearningRate 0.0064   Epoch: 14   Global Step: 75600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:09,650-Speed 5546.29 samples/sec   Loss 3.5684   LearningRate 0.0064   Epoch: 14   Global Step: 75610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:11,546-Speed 5406.12 samples/sec   Loss 3.5538   LearningRate 0.0064   Epoch: 14   Global Step: 75620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:13,400-Speed 5525.32 samples/sec   Loss 3.6676   LearningRate 0.0064   Epoch: 14   Global Step: 75630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:15,270-Speed 5475.75 samples/sec   Loss 3.5585   LearningRate 0.0064   Epoch: 14   Global Step: 75640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:17,117-Speed 5546.69 samples/sec   Loss 3.5486   LearningRate 0.0064   Epoch: 14   Global Step: 75650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:18,971-Speed 5524.49 samples/sec   Loss 3.4599   LearningRate 0.0064   Epoch: 14   Global Step: 75660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:20,820-Speed 5539.64 samples/sec   Loss 3.6505   LearningRate 0.0063   Epoch: 14   Global Step: 75670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:22,706-Speed 5433.95 samples/sec   Loss 3.6787   LearningRate 0.0063   Epoch: 14   Global Step: 75680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:24,580-Speed 5463.70 samples/sec   Loss 3.6664   LearningRate 0.0063   Epoch: 14   Global Step: 75690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:26,451-Speed 5475.85 samples/sec   Loss 3.5857   LearningRate 0.0063   Epoch: 14   Global Step: 75700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:07:28,324-Speed 5468.83 samples/sec   Loss 3.6008   LearningRate 0.0063   Epoch: 14   Global Step: 75710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:07:30,174-Speed 5539.17 samples/sec   Loss 3.6090   LearningRate 0.0063   Epoch: 14   Global Step: 75720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:07:32,030-Speed 5519.00 samples/sec   Loss 3.5566   LearningRate 0.0063   Epoch: 14   Global Step: 75730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:07:33,886-Speed 5520.73 samples/sec   Loss 3.6699   LearningRate 0.0063   Epoch: 14   Global Step: 75740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:35,745-Speed 5510.82 samples/sec   Loss 3.5547   LearningRate 0.0063   Epoch: 14   Global Step: 75750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:37,601-Speed 5517.43 samples/sec   Loss 3.6076   LearningRate 0.0063   Epoch: 14   Global Step: 75760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:39,455-Speed 5526.97 samples/sec   Loss 3.7757   LearningRate 0.0063   Epoch: 14   Global Step: 75770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:41,346-Speed 5415.54 samples/sec   Loss 3.5572   LearningRate 0.0063   Epoch: 14   Global Step: 75780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:43,234-Speed 5425.93 samples/sec   Loss 3.5546   LearningRate 0.0063   Epoch: 14   Global Step: 75790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:45,081-Speed 5547.60 samples/sec   Loss 3.6777   LearningRate 0.0063   Epoch: 14   Global Step: 75800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:46,960-Speed 5451.89 samples/sec   Loss 3.5845   LearningRate 0.0063   Epoch: 14   Global Step: 75810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:48,818-Speed 5512.85 samples/sec   Loss 3.6246   LearningRate 0.0063   Epoch: 14   Global Step: 75820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:50,669-Speed 5532.62 samples/sec   Loss 3.5842   LearningRate 0.0063   Epoch: 14   Global Step: 75830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:52,553-Speed 5440.44 samples/sec   Loss 3.5097   LearningRate 0.0063   Epoch: 14   Global Step: 75840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:07:54,406-Speed 5527.83 samples/sec   Loss 3.6059   LearningRate 0.0063   Epoch: 14   Global Step: 75850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:07:56,317-Speed 5361.45 samples/sec   Loss 3.5102   LearningRate 0.0063   Epoch: 14   Global Step: 75860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:07:58,136-Speed 5630.07 samples/sec   Loss 3.5701   LearningRate 0.0063   Epoch: 14   Global Step: 75870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:09,042-Speed 938.98 samples/sec   Loss 2.7754   LearningRate 0.0062   Epoch: 15   Global Step: 75880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:10,921-Speed 5453.28 samples/sec   Loss 2.7378   LearningRate 0.0062   Epoch: 15   Global Step: 75890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:12,927-Speed 5105.76 samples/sec   Loss 2.7984   LearningRate 0.0062   Epoch: 15   Global Step: 75900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:14,790-Speed 5497.97 samples/sec   Loss 2.8234   LearningRate 0.0062   Epoch: 15   Global Step: 75910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:16,789-Speed 5125.26 samples/sec   Loss 2.7623   LearningRate 0.0062   Epoch: 15   Global Step: 75920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:18,637-Speed 5543.13 samples/sec   Loss 2.7584   LearningRate 0.0062   Epoch: 15   Global Step: 75930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:20,488-Speed 5533.47 samples/sec   Loss 2.6982   LearningRate 0.0062   Epoch: 15   Global Step: 75940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:22,339-Speed 5533.29 samples/sec   Loss 2.6679   LearningRate 0.0062   Epoch: 15   Global Step: 75950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:24,188-Speed 5546.37 samples/sec   Loss 2.8115   LearningRate 0.0062   Epoch: 15   Global Step: 75960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:26,050-Speed 5501.02 samples/sec   Loss 2.8745   LearningRate 0.0062   Epoch: 15   Global Step: 75970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:27,906-Speed 5517.40 samples/sec   Loss 2.6047   LearningRate 0.0062   Epoch: 15   Global Step: 75980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:29,755-Speed 5539.35 samples/sec   Loss 2.7713   LearningRate 0.0062   Epoch: 15   Global Step: 75990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:31,606-Speed 5535.57 samples/sec   Loss 2.7014   LearningRate 0.0062   Epoch: 15   Global Step: 76000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:08:58,281-[lfw][76000]XNorm: 21.818461
Training: 2022-04-11 15:08:58,282-[lfw][76000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 15:08:58,282-[lfw][76000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:09:29,081-[cfp_fp][76000]XNorm: 20.213919
Training: 2022-04-11 15:09:29,081-[cfp_fp][76000]Accuracy-Flip: 0.97971+-0.00534
Training: 2022-04-11 15:09:29,082-[cfp_fp][76000]Accuracy-Highest: 0.98057
Training: 2022-04-11 15:09:55,638-[agedb_30][76000]XNorm: 21.908650
Training: 2022-04-11 15:09:55,638-[agedb_30][76000]Accuracy-Flip: 0.98267+-0.00684
Training: 2022-04-11 15:09:55,639-[agedb_30][76000]Accuracy-Highest: 0.98267
Training: 2022-04-11 15:09:57,500-Speed 119.22 samples/sec   Loss 2.7129   LearningRate 0.0062   Epoch: 15   Global Step: 76010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:09:59,343-Speed 5558.50 samples/sec   Loss 2.8287   LearningRate 0.0062   Epoch: 15   Global Step: 76020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:01,182-Speed 5567.92 samples/sec   Loss 2.7993   LearningRate 0.0062   Epoch: 15   Global Step: 76030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:03,041-Speed 5509.15 samples/sec   Loss 2.7418   LearningRate 0.0062   Epoch: 15   Global Step: 76040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:04,883-Speed 5562.74 samples/sec   Loss 2.8485   LearningRate 0.0062   Epoch: 15   Global Step: 76050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:06,723-Speed 5566.99 samples/sec   Loss 2.9035   LearningRate 0.0062   Epoch: 15   Global Step: 76060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:10:08,565-Speed 5560.68 samples/sec   Loss 2.7406   LearningRate 0.0062   Epoch: 15   Global Step: 76070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:10:10,404-Speed 5570.64 samples/sec   Loss 2.8702   LearningRate 0.0061   Epoch: 15   Global Step: 76080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:12,255-Speed 5535.76 samples/sec   Loss 2.8218   LearningRate 0.0061   Epoch: 15   Global Step: 76090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:14,097-Speed 5559.86 samples/sec   Loss 2.9585   LearningRate 0.0061   Epoch: 15   Global Step: 76100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:15,948-Speed 5532.92 samples/sec   Loss 2.9779   LearningRate 0.0061   Epoch: 15   Global Step: 76110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:17,814-Speed 5489.21 samples/sec   Loss 2.8627   LearningRate 0.0061   Epoch: 15   Global Step: 76120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:19,659-Speed 5552.74 samples/sec   Loss 2.7646   LearningRate 0.0061   Epoch: 15   Global Step: 76130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:21,503-Speed 5556.06 samples/sec   Loss 2.7678   LearningRate 0.0061   Epoch: 15   Global Step: 76140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:23,351-Speed 5543.12 samples/sec   Loss 2.8261   LearningRate 0.0061   Epoch: 15   Global Step: 76150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:25,204-Speed 5529.62 samples/sec   Loss 2.7687   LearningRate 0.0061   Epoch: 15   Global Step: 76160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:27,045-Speed 5564.14 samples/sec   Loss 2.7912   LearningRate 0.0061   Epoch: 15   Global Step: 76170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:28,886-Speed 5563.98 samples/sec   Loss 2.8709   LearningRate 0.0061   Epoch: 15   Global Step: 76180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:10:30,724-Speed 5573.84 samples/sec   Loss 2.9218   LearningRate 0.0061   Epoch: 15   Global Step: 76190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:32,566-Speed 5560.93 samples/sec   Loss 2.7945   LearningRate 0.0061   Epoch: 15   Global Step: 76200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:34,424-Speed 5514.55 samples/sec   Loss 2.9343   LearningRate 0.0061   Epoch: 15   Global Step: 76210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:36,261-Speed 5576.04 samples/sec   Loss 2.8321   LearningRate 0.0061   Epoch: 15   Global Step: 76220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:10:38,109-Speed 5541.99 samples/sec   Loss 2.8423   LearningRate 0.0061   Epoch: 15   Global Step: 76230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:10:39,959-Speed 5537.28 samples/sec   Loss 2.8763   LearningRate 0.0061   Epoch: 15   Global Step: 76240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:10:41,803-Speed 5555.28 samples/sec   Loss 2.8840   LearningRate 0.0061   Epoch: 15   Global Step: 76250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:10:43,647-Speed 5554.32 samples/sec   Loss 2.8272   LearningRate 0.0061   Epoch: 15   Global Step: 76260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:10:45,495-Speed 5543.27 samples/sec   Loss 2.7328   LearningRate 0.0061   Epoch: 15   Global Step: 76270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:10:47,352-Speed 5519.35 samples/sec   Loss 2.8747   LearningRate 0.0060   Epoch: 15   Global Step: 76280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:10:49,194-Speed 5558.75 samples/sec   Loss 2.8526   LearningRate 0.0060   Epoch: 15   Global Step: 76290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:10:51,055-Speed 5506.55 samples/sec   Loss 2.9074   LearningRate 0.0060   Epoch: 15   Global Step: 76300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:10:52,918-Speed 5498.96 samples/sec   Loss 2.7954   LearningRate 0.0060   Epoch: 15   Global Step: 76310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:10:54,761-Speed 5555.58 samples/sec   Loss 2.9631   LearningRate 0.0060   Epoch: 15   Global Step: 76320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:56,606-Speed 5555.40 samples/sec   Loss 2.9173   LearningRate 0.0060   Epoch: 15   Global Step: 76330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:10:58,474-Speed 5482.02 samples/sec   Loss 2.8849   LearningRate 0.0060   Epoch: 15   Global Step: 76340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:00,335-Speed 5506.02 samples/sec   Loss 2.8130   LearningRate 0.0060   Epoch: 15   Global Step: 76350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:02,198-Speed 5497.48 samples/sec   Loss 2.9340   LearningRate 0.0060   Epoch: 15   Global Step: 76360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:04,074-Speed 5461.85 samples/sec   Loss 2.7699   LearningRate 0.0060   Epoch: 15   Global Step: 76370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:05,923-Speed 5540.42 samples/sec   Loss 2.9085   LearningRate 0.0060   Epoch: 15   Global Step: 76380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:07,776-Speed 5527.79 samples/sec   Loss 2.7912   LearningRate 0.0060   Epoch: 15   Global Step: 76390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:09,619-Speed 5555.83 samples/sec   Loss 2.9010   LearningRate 0.0060   Epoch: 15   Global Step: 76400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:11,522-Speed 5384.45 samples/sec   Loss 2.9896   LearningRate 0.0060   Epoch: 15   Global Step: 76410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:13,380-Speed 5513.96 samples/sec   Loss 2.8610   LearningRate 0.0060   Epoch: 15   Global Step: 76420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:11:15,238-Speed 5513.37 samples/sec   Loss 2.9954   LearningRate 0.0060   Epoch: 15   Global Step: 76430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:11:17,074-Speed 5580.61 samples/sec   Loss 2.9093   LearningRate 0.0060   Epoch: 15   Global Step: 76440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:18,915-Speed 5562.70 samples/sec   Loss 2.9848   LearningRate 0.0060   Epoch: 15   Global Step: 76450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:20,761-Speed 5549.87 samples/sec   Loss 2.9188   LearningRate 0.0060   Epoch: 15   Global Step: 76460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:22,610-Speed 5539.92 samples/sec   Loss 2.9369   LearningRate 0.0060   Epoch: 15   Global Step: 76470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:24,458-Speed 5544.29 samples/sec   Loss 2.9652   LearningRate 0.0060   Epoch: 15   Global Step: 76480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:26,334-Speed 5459.12 samples/sec   Loss 2.9418   LearningRate 0.0059   Epoch: 15   Global Step: 76490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:28,182-Speed 5544.32 samples/sec   Loss 2.9531   LearningRate 0.0059   Epoch: 15   Global Step: 76500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:30,028-Speed 5549.17 samples/sec   Loss 2.9471   LearningRate 0.0059   Epoch: 15   Global Step: 76510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:31,878-Speed 5536.94 samples/sec   Loss 2.9733   LearningRate 0.0059   Epoch: 15   Global Step: 76520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:33,726-Speed 5541.23 samples/sec   Loss 3.0073   LearningRate 0.0059   Epoch: 15   Global Step: 76530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:35,589-Speed 5500.91 samples/sec   Loss 2.9363   LearningRate 0.0059   Epoch: 15   Global Step: 76540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:11:37,438-Speed 5539.91 samples/sec   Loss 2.9736   LearningRate 0.0059   Epoch: 15   Global Step: 76550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:39,286-Speed 5544.66 samples/sec   Loss 2.9220   LearningRate 0.0059   Epoch: 15   Global Step: 76560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:41,131-Speed 5550.28 samples/sec   Loss 2.8487   LearningRate 0.0059   Epoch: 15   Global Step: 76570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:42,979-Speed 5545.04 samples/sec   Loss 2.9284   LearningRate 0.0059   Epoch: 15   Global Step: 76580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:44,822-Speed 5557.55 samples/sec   Loss 2.9567   LearningRate 0.0059   Epoch: 15   Global Step: 76590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:46,671-Speed 5540.39 samples/sec   Loss 2.9403   LearningRate 0.0059   Epoch: 15   Global Step: 76600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:48,520-Speed 5539.33 samples/sec   Loss 2.9155   LearningRate 0.0059   Epoch: 15   Global Step: 76610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:50,377-Speed 5514.59 samples/sec   Loss 3.0326   LearningRate 0.0059   Epoch: 15   Global Step: 76620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:52,305-Speed 5314.40 samples/sec   Loss 2.9421   LearningRate 0.0059   Epoch: 15   Global Step: 76630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:54,169-Speed 5496.24 samples/sec   Loss 2.8912   LearningRate 0.0059   Epoch: 15   Global Step: 76640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:11:56,015-Speed 5548.31 samples/sec   Loss 2.9475   LearningRate 0.0059   Epoch: 15   Global Step: 76650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:11:57,869-Speed 5525.53 samples/sec   Loss 2.9749   LearningRate 0.0059   Epoch: 15   Global Step: 76660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:11:59,716-Speed 5546.76 samples/sec   Loss 2.8457   LearningRate 0.0059   Epoch: 15   Global Step: 76670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:01,579-Speed 5499.00 samples/sec   Loss 2.9540   LearningRate 0.0059   Epoch: 15   Global Step: 76680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:03,439-Speed 5508.63 samples/sec   Loss 2.9591   LearningRate 0.0059   Epoch: 15   Global Step: 76690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:05,360-Speed 5331.56 samples/sec   Loss 2.9974   LearningRate 0.0058   Epoch: 15   Global Step: 76700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:07,217-Speed 5516.54 samples/sec   Loss 2.9372   LearningRate 0.0058   Epoch: 15   Global Step: 76710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:09,066-Speed 5539.87 samples/sec   Loss 2.9851   LearningRate 0.0058   Epoch: 15   Global Step: 76720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:10,921-Speed 5522.18 samples/sec   Loss 2.9604   LearningRate 0.0058   Epoch: 15   Global Step: 76730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:12,769-Speed 5541.47 samples/sec   Loss 2.9912   LearningRate 0.0058   Epoch: 15   Global Step: 76740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:14,654-Speed 5433.59 samples/sec   Loss 2.9410   LearningRate 0.0058   Epoch: 15   Global Step: 76750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:16,521-Speed 5487.66 samples/sec   Loss 3.0951   LearningRate 0.0058   Epoch: 15   Global Step: 76760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:18,382-Speed 5504.24 samples/sec   Loss 2.9920   LearningRate 0.0058   Epoch: 15   Global Step: 76770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:20,229-Speed 5547.96 samples/sec   Loss 2.9921   LearningRate 0.0058   Epoch: 15   Global Step: 76780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:22,081-Speed 5530.59 samples/sec   Loss 2.9890   LearningRate 0.0058   Epoch: 15   Global Step: 76790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:23,929-Speed 5545.13 samples/sec   Loss 3.0180   LearningRate 0.0058   Epoch: 15   Global Step: 76800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:25,786-Speed 5515.77 samples/sec   Loss 2.9854   LearningRate 0.0058   Epoch: 15   Global Step: 76810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:27,646-Speed 5506.67 samples/sec   Loss 3.0525   LearningRate 0.0058   Epoch: 15   Global Step: 76820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:29,492-Speed 5547.51 samples/sec   Loss 2.8824   LearningRate 0.0058   Epoch: 15   Global Step: 76830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:31,387-Speed 5408.79 samples/sec   Loss 2.9772   LearningRate 0.0058   Epoch: 15   Global Step: 76840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:33,234-Speed 5543.55 samples/sec   Loss 2.9974   LearningRate 0.0058   Epoch: 15   Global Step: 76850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:35,081-Speed 5546.36 samples/sec   Loss 3.0456   LearningRate 0.0058   Epoch: 15   Global Step: 76860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:36,938-Speed 5518.01 samples/sec   Loss 2.9265   LearningRate 0.0058   Epoch: 15   Global Step: 76870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:38,799-Speed 5503.74 samples/sec   Loss 3.0041   LearningRate 0.0058   Epoch: 15   Global Step: 76880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:40,667-Speed 5484.36 samples/sec   Loss 2.9753   LearningRate 0.0058   Epoch: 15   Global Step: 76890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:42,524-Speed 5517.04 samples/sec   Loss 3.0365   LearningRate 0.0058   Epoch: 15   Global Step: 76900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:44,387-Speed 5498.02 samples/sec   Loss 2.9443   LearningRate 0.0057   Epoch: 15   Global Step: 76910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:46,240-Speed 5530.42 samples/sec   Loss 3.0777   LearningRate 0.0057   Epoch: 15   Global Step: 76920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:12:48,078-Speed 5570.26 samples/sec   Loss 2.9337   LearningRate 0.0057   Epoch: 15   Global Step: 76930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:49,929-Speed 5536.11 samples/sec   Loss 3.0609   LearningRate 0.0057   Epoch: 15   Global Step: 76940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:51,798-Speed 5480.75 samples/sec   Loss 2.9568   LearningRate 0.0057   Epoch: 15   Global Step: 76950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:53,658-Speed 5506.65 samples/sec   Loss 2.9468   LearningRate 0.0057   Epoch: 15   Global Step: 76960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:55,517-Speed 5509.07 samples/sec   Loss 2.9990   LearningRate 0.0057   Epoch: 15   Global Step: 76970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:57,363-Speed 5550.34 samples/sec   Loss 2.9609   LearningRate 0.0057   Epoch: 15   Global Step: 76980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:12:59,212-Speed 5541.37 samples/sec   Loss 3.0534   LearningRate 0.0057   Epoch: 15   Global Step: 76990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:01,119-Speed 5373.33 samples/sec   Loss 3.0563   LearningRate 0.0057   Epoch: 15   Global Step: 77000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:03,023-Speed 5377.81 samples/sec   Loss 3.0476   LearningRate 0.0057   Epoch: 15   Global Step: 77010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:04,888-Speed 5494.08 samples/sec   Loss 3.0185   LearningRate 0.0057   Epoch: 15   Global Step: 77020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:06,755-Speed 5487.24 samples/sec   Loss 2.9983   LearningRate 0.0057   Epoch: 15   Global Step: 77030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:13:08,632-Speed 5455.61 samples/sec   Loss 3.0698   LearningRate 0.0057   Epoch: 15   Global Step: 77040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:13:10,493-Speed 5505.63 samples/sec   Loss 3.1937   LearningRate 0.0057   Epoch: 15   Global Step: 77050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:13:12,409-Speed 5345.53 samples/sec   Loss 3.0202   LearningRate 0.0057   Epoch: 15   Global Step: 77060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:13:14,313-Speed 5380.37 samples/sec   Loss 3.0660   LearningRate 0.0057   Epoch: 15   Global Step: 77070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:13:16,164-Speed 5536.22 samples/sec   Loss 3.0120   LearningRate 0.0057   Epoch: 15   Global Step: 77080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 15:13:18,024-Speed 5505.63 samples/sec   Loss 2.9757   LearningRate 0.0057   Epoch: 15   Global Step: 77090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:19,873-Speed 5540.38 samples/sec   Loss 2.9721   LearningRate 0.0057   Epoch: 15   Global Step: 77100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:21,733-Speed 5509.88 samples/sec   Loss 3.0155   LearningRate 0.0057   Epoch: 15   Global Step: 77110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:23,607-Speed 5466.60 samples/sec   Loss 3.0626   LearningRate 0.0056   Epoch: 15   Global Step: 77120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:25,474-Speed 5484.01 samples/sec   Loss 3.0022   LearningRate 0.0056   Epoch: 15   Global Step: 77130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:27,386-Speed 5358.18 samples/sec   Loss 2.9578   LearningRate 0.0056   Epoch: 15   Global Step: 77140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:29,251-Speed 5492.73 samples/sec   Loss 3.0516   LearningRate 0.0056   Epoch: 15   Global Step: 77150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:13:31,136-Speed 5434.78 samples/sec   Loss 3.0644   LearningRate 0.0056   Epoch: 15   Global Step: 77160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:13:32,996-Speed 5508.24 samples/sec   Loss 3.0597   LearningRate 0.0056   Epoch: 15   Global Step: 77170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:13:34,845-Speed 5540.53 samples/sec   Loss 3.0235   LearningRate 0.0056   Epoch: 15   Global Step: 77180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:13:36,735-Speed 5419.27 samples/sec   Loss 2.9979   LearningRate 0.0056   Epoch: 15   Global Step: 77190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:13:38,580-Speed 5553.07 samples/sec   Loss 3.0393   LearningRate 0.0056   Epoch: 15   Global Step: 77200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:13:40,456-Speed 5459.15 samples/sec   Loss 3.1135   LearningRate 0.0056   Epoch: 15   Global Step: 77210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:13:42,336-Speed 5448.54 samples/sec   Loss 3.0942   LearningRate 0.0056   Epoch: 15   Global Step: 77220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:13:44,184-Speed 5543.68 samples/sec   Loss 3.0867   LearningRate 0.0056   Epoch: 15   Global Step: 77230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:13:46,037-Speed 5530.97 samples/sec   Loss 3.0197   LearningRate 0.0056   Epoch: 15   Global Step: 77240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:13:47,894-Speed 5514.86 samples/sec   Loss 3.0582   LearningRate 0.0056   Epoch: 15   Global Step: 77250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:49,777-Speed 5440.41 samples/sec   Loss 3.0947   LearningRate 0.0056   Epoch: 15   Global Step: 77260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:51,647-Speed 5476.65 samples/sec   Loss 3.0380   LearningRate 0.0056   Epoch: 15   Global Step: 77270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:53,509-Speed 5501.54 samples/sec   Loss 3.0031   LearningRate 0.0056   Epoch: 15   Global Step: 77280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:55,364-Speed 5522.92 samples/sec   Loss 2.9841   LearningRate 0.0056   Epoch: 15   Global Step: 77290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:57,221-Speed 5515.55 samples/sec   Loss 3.1834   LearningRate 0.0056   Epoch: 15   Global Step: 77300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:13:59,069-Speed 5543.49 samples/sec   Loss 3.0020   LearningRate 0.0056   Epoch: 15   Global Step: 77310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:00,921-Speed 5530.98 samples/sec   Loss 2.9481   LearningRate 0.0056   Epoch: 15   Global Step: 77320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:02,779-Speed 5515.24 samples/sec   Loss 3.1077   LearningRate 0.0055   Epoch: 15   Global Step: 77330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:04,652-Speed 5469.47 samples/sec   Loss 3.0633   LearningRate 0.0055   Epoch: 15   Global Step: 77340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:06,494-Speed 5562.32 samples/sec   Loss 3.1393   LearningRate 0.0055   Epoch: 15   Global Step: 77350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:08,346-Speed 5530.65 samples/sec   Loss 3.0731   LearningRate 0.0055   Epoch: 15   Global Step: 77360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:10,209-Speed 5497.15 samples/sec   Loss 2.9309   LearningRate 0.0055   Epoch: 15   Global Step: 77370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:12,107-Speed 5398.71 samples/sec   Loss 3.0679   LearningRate 0.0055   Epoch: 15   Global Step: 77380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:13,972-Speed 5492.58 samples/sec   Loss 3.0352   LearningRate 0.0055   Epoch: 15   Global Step: 77390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:15,832-Speed 5506.89 samples/sec   Loss 2.9564   LearningRate 0.0055   Epoch: 15   Global Step: 77400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:17,691-Speed 5509.19 samples/sec   Loss 3.0237   LearningRate 0.0055   Epoch: 15   Global Step: 77410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:19,547-Speed 5520.07 samples/sec   Loss 2.9887   LearningRate 0.0055   Epoch: 15   Global Step: 77420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:21,413-Speed 5488.87 samples/sec   Loss 3.0368   LearningRate 0.0055   Epoch: 15   Global Step: 77430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:23,275-Speed 5501.56 samples/sec   Loss 3.1010   LearningRate 0.0055   Epoch: 15   Global Step: 77440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:25,129-Speed 5526.97 samples/sec   Loss 3.0542   LearningRate 0.0055   Epoch: 15   Global Step: 77450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:27,016-Speed 5428.98 samples/sec   Loss 3.0429   LearningRate 0.0055   Epoch: 15   Global Step: 77460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:28,861-Speed 5550.29 samples/sec   Loss 2.9228   LearningRate 0.0055   Epoch: 15   Global Step: 77470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:30,714-Speed 5529.28 samples/sec   Loss 3.1464   LearningRate 0.0055   Epoch: 15   Global Step: 77480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:32,567-Speed 5529.22 samples/sec   Loss 3.0912   LearningRate 0.0055   Epoch: 15   Global Step: 77490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:34,411-Speed 5553.96 samples/sec   Loss 3.0366   LearningRate 0.0055   Epoch: 15   Global Step: 77500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:36,300-Speed 5424.85 samples/sec   Loss 3.1299   LearningRate 0.0055   Epoch: 15   Global Step: 77510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:38,148-Speed 5542.57 samples/sec   Loss 3.0720   LearningRate 0.0055   Epoch: 15   Global Step: 77520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:40,000-Speed 5530.58 samples/sec   Loss 3.1120   LearningRate 0.0055   Epoch: 15   Global Step: 77530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:41,864-Speed 5495.43 samples/sec   Loss 2.9806   LearningRate 0.0055   Epoch: 15   Global Step: 77540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:43,726-Speed 5500.38 samples/sec   Loss 2.9533   LearningRate 0.0054   Epoch: 15   Global Step: 77550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:45,579-Speed 5527.97 samples/sec   Loss 3.0176   LearningRate 0.0054   Epoch: 15   Global Step: 77560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:47,446-Speed 5489.25 samples/sec   Loss 3.0688   LearningRate 0.0054   Epoch: 15   Global Step: 77570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 15:14:49,298-Speed 5531.26 samples/sec   Loss 3.0472   LearningRate 0.0054   Epoch: 15   Global Step: 77580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-11 15:14:51,149-Speed 5533.00 samples/sec   Loss 2.9988   LearningRate 0.0054   Epoch: 15   Global Step: 77590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:14:53,002-Speed 5527.35 samples/sec   Loss 3.0412   LearningRate 0.0054   Epoch: 15   Global Step: 77600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:14:54,861-Speed 5511.72 samples/sec   Loss 2.9801   LearningRate 0.0054   Epoch: 15   Global Step: 77610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:14:56,733-Speed 5473.48 samples/sec   Loss 3.1367   LearningRate 0.0054   Epoch: 15   Global Step: 77620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:14:58,600-Speed 5485.66 samples/sec   Loss 3.0675   LearningRate 0.0054   Epoch: 15   Global Step: 77630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:00,486-Speed 5429.75 samples/sec   Loss 3.0931   LearningRate 0.0054   Epoch: 15   Global Step: 77640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:02,360-Speed 5467.58 samples/sec   Loss 3.0733   LearningRate 0.0054   Epoch: 15   Global Step: 77650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:04,220-Speed 5506.20 samples/sec   Loss 3.0653   LearningRate 0.0054   Epoch: 15   Global Step: 77660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:06,076-Speed 5521.16 samples/sec   Loss 3.0354   LearningRate 0.0054   Epoch: 15   Global Step: 77670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:07,929-Speed 5527.42 samples/sec   Loss 3.0430   LearningRate 0.0054   Epoch: 15   Global Step: 77680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:09,785-Speed 5519.50 samples/sec   Loss 2.9559   LearningRate 0.0054   Epoch: 15   Global Step: 77690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:11,645-Speed 5507.54 samples/sec   Loss 2.9946   LearningRate 0.0054   Epoch: 15   Global Step: 77700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:13,541-Speed 5402.54 samples/sec   Loss 3.0442   LearningRate 0.0054   Epoch: 15   Global Step: 77710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:15,397-Speed 5519.14 samples/sec   Loss 3.0894   LearningRate 0.0054   Epoch: 15   Global Step: 77720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:17,246-Speed 5542.59 samples/sec   Loss 3.0249   LearningRate 0.0054   Epoch: 15   Global Step: 77730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:19,103-Speed 5515.05 samples/sec   Loss 3.0811   LearningRate 0.0054   Epoch: 15   Global Step: 77740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:20,953-Speed 5537.20 samples/sec   Loss 3.0832   LearningRate 0.0054   Epoch: 15   Global Step: 77750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:22,812-Speed 5509.58 samples/sec   Loss 3.1343   LearningRate 0.0054   Epoch: 15   Global Step: 77760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:24,675-Speed 5497.75 samples/sec   Loss 3.0979   LearningRate 0.0053   Epoch: 15   Global Step: 77770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:26,532-Speed 5518.08 samples/sec   Loss 3.1314   LearningRate 0.0053   Epoch: 15   Global Step: 77780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:28,382-Speed 5537.77 samples/sec   Loss 2.9793   LearningRate 0.0053   Epoch: 15   Global Step: 77790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:30,250-Speed 5484.07 samples/sec   Loss 3.0428   LearningRate 0.0053   Epoch: 15   Global Step: 77800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:15:32,095-Speed 5553.98 samples/sec   Loss 2.9435   LearningRate 0.0053   Epoch: 15   Global Step: 77810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:33,953-Speed 5511.88 samples/sec   Loss 3.0064   LearningRate 0.0053   Epoch: 15   Global Step: 77820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:35,820-Speed 5486.44 samples/sec   Loss 3.1422   LearningRate 0.0053   Epoch: 15   Global Step: 77830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:37,681-Speed 5505.19 samples/sec   Loss 3.0719   LearningRate 0.0053   Epoch: 15   Global Step: 77840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:39,544-Speed 5498.75 samples/sec   Loss 3.1168   LearningRate 0.0053   Epoch: 15   Global Step: 77850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:41,401-Speed 5515.17 samples/sec   Loss 3.0785   LearningRate 0.0053   Epoch: 15   Global Step: 77860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:43,263-Speed 5502.27 samples/sec   Loss 3.0281   LearningRate 0.0053   Epoch: 15   Global Step: 77870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:45,111-Speed 5543.59 samples/sec   Loss 2.9961   LearningRate 0.0053   Epoch: 15   Global Step: 77880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:46,964-Speed 5527.77 samples/sec   Loss 2.9803   LearningRate 0.0053   Epoch: 15   Global Step: 77890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:48,823-Speed 5510.94 samples/sec   Loss 3.0419   LearningRate 0.0053   Epoch: 15   Global Step: 77900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:50,660-Speed 5575.71 samples/sec   Loss 3.0352   LearningRate 0.0053   Epoch: 15   Global Step: 77910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:52,518-Speed 5514.68 samples/sec   Loss 3.0410   LearningRate 0.0053   Epoch: 15   Global Step: 77920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:54,377-Speed 5510.02 samples/sec   Loss 3.0198   LearningRate 0.0053   Epoch: 15   Global Step: 77930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:56,229-Speed 5530.37 samples/sec   Loss 3.0897   LearningRate 0.0053   Epoch: 15   Global Step: 77940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:58,077-Speed 5543.11 samples/sec   Loss 3.1617   LearningRate 0.0053   Epoch: 15   Global Step: 77950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:15:59,933-Speed 5520.69 samples/sec   Loss 3.1506   LearningRate 0.0053   Epoch: 15   Global Step: 77960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:16:01,784-Speed 5533.10 samples/sec   Loss 3.1071   LearningRate 0.0053   Epoch: 15   Global Step: 77970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:16:03,659-Speed 5463.37 samples/sec   Loss 3.0596   LearningRate 0.0053   Epoch: 15   Global Step: 77980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:16:05,518-Speed 5509.23 samples/sec   Loss 3.0366   LearningRate 0.0052   Epoch: 15   Global Step: 77990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:16:07,388-Speed 5478.90 samples/sec   Loss 3.1170   LearningRate 0.0052   Epoch: 15   Global Step: 78000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:16:33,998-[lfw][78000]XNorm: 23.235217
Training: 2022-04-11 15:16:33,999-[lfw][78000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-04-11 15:16:34,000-[lfw][78000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:17:04,622-[cfp_fp][78000]XNorm: 21.598676
Training: 2022-04-11 15:17:04,623-[cfp_fp][78000]Accuracy-Flip: 0.98157+-0.00757
Training: 2022-04-11 15:17:04,623-[cfp_fp][78000]Accuracy-Highest: 0.98157
Training: 2022-04-11 15:17:31,096-[agedb_30][78000]XNorm: 23.207164
Training: 2022-04-11 15:17:31,097-[agedb_30][78000]Accuracy-Flip: 0.97950+-0.00723
Training: 2022-04-11 15:17:31,097-[agedb_30][78000]Accuracy-Highest: 0.98267
Training: 2022-04-11 15:17:32,953-Speed 119.68 samples/sec   Loss 3.1293   LearningRate 0.0052   Epoch: 15   Global Step: 78010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:17:34,798-Speed 5552.12 samples/sec   Loss 3.0485   LearningRate 0.0052   Epoch: 15   Global Step: 78020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:17:36,622-Speed 5616.18 samples/sec   Loss 3.0384   LearningRate 0.0052   Epoch: 15   Global Step: 78030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:17:38,452-Speed 5597.79 samples/sec   Loss 3.0227   LearningRate 0.0052   Epoch: 15   Global Step: 78040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:17:40,288-Speed 5579.40 samples/sec   Loss 3.1949   LearningRate 0.0052   Epoch: 15   Global Step: 78050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:17:42,128-Speed 5567.50 samples/sec   Loss 3.1347   LearningRate 0.0052   Epoch: 15   Global Step: 78060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:17:43,950-Speed 5620.86 samples/sec   Loss 3.1305   LearningRate 0.0052   Epoch: 15   Global Step: 78070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:17:45,774-Speed 5617.11 samples/sec   Loss 3.0919   LearningRate 0.0052   Epoch: 15   Global Step: 78080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:17:47,616-Speed 5561.57 samples/sec   Loss 3.0798   LearningRate 0.0052   Epoch: 15   Global Step: 78090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:17:49,447-Speed 5595.40 samples/sec   Loss 3.1009   LearningRate 0.0052   Epoch: 15   Global Step: 78100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:17:51,306-Speed 5511.35 samples/sec   Loss 3.0278   LearningRate 0.0052   Epoch: 15   Global Step: 78110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:17:53,144-Speed 5573.77 samples/sec   Loss 2.8683   LearningRate 0.0052   Epoch: 15   Global Step: 78120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:17:54,972-Speed 5604.01 samples/sec   Loss 3.2062   LearningRate 0.0052   Epoch: 15   Global Step: 78130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:17:56,800-Speed 5603.07 samples/sec   Loss 3.1088   LearningRate 0.0052   Epoch: 15   Global Step: 78140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:17:58,630-Speed 5597.25 samples/sec   Loss 3.1370   LearningRate 0.0052   Epoch: 15   Global Step: 78150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:00,446-Speed 5641.60 samples/sec   Loss 3.1085   LearningRate 0.0052   Epoch: 15   Global Step: 78160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:02,272-Speed 5609.20 samples/sec   Loss 3.0761   LearningRate 0.0052   Epoch: 15   Global Step: 78170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:04,110-Speed 5575.02 samples/sec   Loss 3.2011   LearningRate 0.0052   Epoch: 15   Global Step: 78180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:05,950-Speed 5565.87 samples/sec   Loss 3.0625   LearningRate 0.0052   Epoch: 15   Global Step: 78190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:07,777-Speed 5606.61 samples/sec   Loss 3.0644   LearningRate 0.0052   Epoch: 15   Global Step: 78200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:09,607-Speed 5596.31 samples/sec   Loss 3.1058   LearningRate 0.0051   Epoch: 15   Global Step: 78210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:11,452-Speed 5553.74 samples/sec   Loss 3.0844   LearningRate 0.0051   Epoch: 15   Global Step: 78220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:13,290-Speed 5572.14 samples/sec   Loss 3.2531   LearningRate 0.0051   Epoch: 15   Global Step: 78230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:15,128-Speed 5576.29 samples/sec   Loss 3.1194   LearningRate 0.0051   Epoch: 15   Global Step: 78240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:16,955-Speed 5605.98 samples/sec   Loss 3.0814   LearningRate 0.0051   Epoch: 15   Global Step: 78250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:18,801-Speed 5548.43 samples/sec   Loss 3.0068   LearningRate 0.0051   Epoch: 15   Global Step: 78260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:18:20,623-Speed 5624.62 samples/sec   Loss 3.1956   LearningRate 0.0051   Epoch: 15   Global Step: 78270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:22,451-Speed 5603.94 samples/sec   Loss 3.0903   LearningRate 0.0051   Epoch: 15   Global Step: 78280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:24,286-Speed 5581.61 samples/sec   Loss 3.0076   LearningRate 0.0051   Epoch: 15   Global Step: 78290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:26,126-Speed 5567.13 samples/sec   Loss 3.1662   LearningRate 0.0051   Epoch: 15   Global Step: 78300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:27,959-Speed 5590.89 samples/sec   Loss 3.1563   LearningRate 0.0051   Epoch: 15   Global Step: 78310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:29,791-Speed 5588.79 samples/sec   Loss 2.9984   LearningRate 0.0051   Epoch: 15   Global Step: 78320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:31,628-Speed 5577.11 samples/sec   Loss 3.1548   LearningRate 0.0051   Epoch: 15   Global Step: 78330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:33,459-Speed 5595.09 samples/sec   Loss 3.1559   LearningRate 0.0051   Epoch: 15   Global Step: 78340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:35,296-Speed 5577.29 samples/sec   Loss 3.0980   LearningRate 0.0051   Epoch: 15   Global Step: 78350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:37,141-Speed 5552.82 samples/sec   Loss 3.0897   LearningRate 0.0051   Epoch: 15   Global Step: 78360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:38,975-Speed 5585.53 samples/sec   Loss 3.1337   LearningRate 0.0051   Epoch: 15   Global Step: 78370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:18:40,805-Speed 5595.46 samples/sec   Loss 2.9497   LearningRate 0.0051   Epoch: 15   Global Step: 78380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:18:42,655-Speed 5538.89 samples/sec   Loss 3.2754   LearningRate 0.0051   Epoch: 15   Global Step: 78390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:18:44,495-Speed 5567.72 samples/sec   Loss 3.2126   LearningRate 0.0051   Epoch: 15   Global Step: 78400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:18:46,326-Speed 5594.18 samples/sec   Loss 3.0949   LearningRate 0.0051   Epoch: 15   Global Step: 78410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:48,157-Speed 5595.75 samples/sec   Loss 3.0379   LearningRate 0.0051   Epoch: 15   Global Step: 78420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:49,991-Speed 5584.74 samples/sec   Loss 2.9798   LearningRate 0.0050   Epoch: 15   Global Step: 78430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:51,883-Speed 5414.64 samples/sec   Loss 3.0962   LearningRate 0.0050   Epoch: 15   Global Step: 78440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:53,726-Speed 5560.01 samples/sec   Loss 3.0795   LearningRate 0.0050   Epoch: 15   Global Step: 78450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:55,557-Speed 5593.32 samples/sec   Loss 3.0475   LearningRate 0.0050   Epoch: 15   Global Step: 78460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:57,404-Speed 5544.71 samples/sec   Loss 3.0510   LearningRate 0.0050   Epoch: 15   Global Step: 78470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:18:59,241-Speed 5576.99 samples/sec   Loss 3.1677   LearningRate 0.0050   Epoch: 15   Global Step: 78480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:01,076-Speed 5583.83 samples/sec   Loss 3.0492   LearningRate 0.0050   Epoch: 15   Global Step: 78490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:02,908-Speed 5592.23 samples/sec   Loss 3.0786   LearningRate 0.0050   Epoch: 15   Global Step: 78500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:04,742-Speed 5584.73 samples/sec   Loss 2.9712   LearningRate 0.0050   Epoch: 15   Global Step: 78510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:06,597-Speed 5523.90 samples/sec   Loss 3.0766   LearningRate 0.0050   Epoch: 15   Global Step: 78520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:08,427-Speed 5598.28 samples/sec   Loss 3.0676   LearningRate 0.0050   Epoch: 15   Global Step: 78530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:10,271-Speed 5554.56 samples/sec   Loss 3.1089   LearningRate 0.0050   Epoch: 15   Global Step: 78540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:12,111-Speed 5566.97 samples/sec   Loss 2.9658   LearningRate 0.0050   Epoch: 15   Global Step: 78550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:13,960-Speed 5540.66 samples/sec   Loss 3.0896   LearningRate 0.0050   Epoch: 15   Global Step: 78560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:15,865-Speed 5375.40 samples/sec   Loss 3.1389   LearningRate 0.0050   Epoch: 15   Global Step: 78570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:17,709-Speed 5556.28 samples/sec   Loss 2.9380   LearningRate 0.0050   Epoch: 15   Global Step: 78580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:19,540-Speed 5594.10 samples/sec   Loss 2.9932   LearningRate 0.0050   Epoch: 15   Global Step: 78590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:21,386-Speed 5551.08 samples/sec   Loss 2.9898   LearningRate 0.0050   Epoch: 15   Global Step: 78600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:23,220-Speed 5585.54 samples/sec   Loss 2.9889   LearningRate 0.0050   Epoch: 15   Global Step: 78610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:19:25,041-Speed 5625.35 samples/sec   Loss 3.0494   LearningRate 0.0050   Epoch: 15   Global Step: 78620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:26,880-Speed 5571.25 samples/sec   Loss 3.0703   LearningRate 0.0050   Epoch: 15   Global Step: 78630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:28,715-Speed 5581.89 samples/sec   Loss 2.9931   LearningRate 0.0050   Epoch: 15   Global Step: 78640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:30,544-Speed 5600.21 samples/sec   Loss 3.0727   LearningRate 0.0050   Epoch: 15   Global Step: 78650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:32,393-Speed 5539.04 samples/sec   Loss 3.1425   LearningRate 0.0049   Epoch: 15   Global Step: 78660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:34,226-Speed 5590.42 samples/sec   Loss 3.0148   LearningRate 0.0049   Epoch: 15   Global Step: 78670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:36,083-Speed 5513.85 samples/sec   Loss 3.1366   LearningRate 0.0049   Epoch: 15   Global Step: 78680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:37,940-Speed 5517.69 samples/sec   Loss 3.0248   LearningRate 0.0049   Epoch: 15   Global Step: 78690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:39,770-Speed 5598.25 samples/sec   Loss 3.1400   LearningRate 0.0049   Epoch: 15   Global Step: 78700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:41,607-Speed 5577.45 samples/sec   Loss 3.1139   LearningRate 0.0049   Epoch: 15   Global Step: 78710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:43,443-Speed 5579.16 samples/sec   Loss 3.1004   LearningRate 0.0049   Epoch: 15   Global Step: 78720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:19:45,262-Speed 5630.22 samples/sec   Loss 3.1300   LearningRate 0.0049   Epoch: 15   Global Step: 78730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:47,108-Speed 5549.22 samples/sec   Loss 3.0853   LearningRate 0.0049   Epoch: 15   Global Step: 78740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:48,938-Speed 5599.33 samples/sec   Loss 2.9865   LearningRate 0.0049   Epoch: 15   Global Step: 78750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:50,798-Speed 5505.47 samples/sec   Loss 3.0204   LearningRate 0.0049   Epoch: 15   Global Step: 78760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:52,637-Speed 5570.54 samples/sec   Loss 3.0103   LearningRate 0.0049   Epoch: 15   Global Step: 78770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:54,472-Speed 5582.29 samples/sec   Loss 3.1004   LearningRate 0.0049   Epoch: 15   Global Step: 78780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:56,302-Speed 5600.71 samples/sec   Loss 3.0969   LearningRate 0.0049   Epoch: 15   Global Step: 78790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:58,131-Speed 5598.99 samples/sec   Loss 3.1596   LearningRate 0.0049   Epoch: 15   Global Step: 78800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:19:59,964-Speed 5588.18 samples/sec   Loss 3.0313   LearningRate 0.0049   Epoch: 15   Global Step: 78810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:01,807-Speed 5560.46 samples/sec   Loss 3.0651   LearningRate 0.0049   Epoch: 15   Global Step: 78820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:03,639-Speed 5592.20 samples/sec   Loss 3.0514   LearningRate 0.0049   Epoch: 15   Global Step: 78830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:05,470-Speed 5594.88 samples/sec   Loss 3.0636   LearningRate 0.0049   Epoch: 15   Global Step: 78840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:07,301-Speed 5592.64 samples/sec   Loss 2.9551   LearningRate 0.0049   Epoch: 15   Global Step: 78850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:09,130-Speed 5600.56 samples/sec   Loss 3.0725   LearningRate 0.0049   Epoch: 15   Global Step: 78860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:10,971-Speed 5563.53 samples/sec   Loss 3.1512   LearningRate 0.0049   Epoch: 15   Global Step: 78870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:12,806-Speed 5585.19 samples/sec   Loss 3.1189   LearningRate 0.0049   Epoch: 15   Global Step: 78880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:14,640-Speed 5585.22 samples/sec   Loss 3.1240   LearningRate 0.0048   Epoch: 15   Global Step: 78890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:16,485-Speed 5552.51 samples/sec   Loss 3.0332   LearningRate 0.0048   Epoch: 15   Global Step: 78900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:18,336-Speed 5533.52 samples/sec   Loss 3.0293   LearningRate 0.0048   Epoch: 15   Global Step: 78910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:20,158-Speed 5622.47 samples/sec   Loss 3.1434   LearningRate 0.0048   Epoch: 15   Global Step: 78920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:20:21,989-Speed 5595.10 samples/sec   Loss 3.1669   LearningRate 0.0048   Epoch: 15   Global Step: 78930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:20:23,834-Speed 5553.81 samples/sec   Loss 3.0995   LearningRate 0.0048   Epoch: 15   Global Step: 78940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:20:25,665-Speed 5593.42 samples/sec   Loss 3.0535   LearningRate 0.0048   Epoch: 15   Global Step: 78950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:20:27,493-Speed 5604.88 samples/sec   Loss 3.1318   LearningRate 0.0048   Epoch: 15   Global Step: 78960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:20:29,327-Speed 5585.60 samples/sec   Loss 3.1999   LearningRate 0.0048   Epoch: 15   Global Step: 78970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:20:31,163-Speed 5576.11 samples/sec   Loss 3.1674   LearningRate 0.0048   Epoch: 15   Global Step: 78980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:20:33,010-Speed 5547.74 samples/sec   Loss 3.1469   LearningRate 0.0048   Epoch: 15   Global Step: 78990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:20:34,877-Speed 5485.68 samples/sec   Loss 2.9757   LearningRate 0.0048   Epoch: 15   Global Step: 79000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:20:36,723-Speed 5550.33 samples/sec   Loss 3.1917   LearningRate 0.0048   Epoch: 15   Global Step: 79010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:20:38,608-Speed 5435.47 samples/sec   Loss 3.0288   LearningRate 0.0048   Epoch: 15   Global Step: 79020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:40,454-Speed 5548.29 samples/sec   Loss 3.0938   LearningRate 0.0048   Epoch: 15   Global Step: 79030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:42,291-Speed 5576.88 samples/sec   Loss 3.0522   LearningRate 0.0048   Epoch: 15   Global Step: 79040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:44,132-Speed 5565.21 samples/sec   Loss 3.0750   LearningRate 0.0048   Epoch: 15   Global Step: 79050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:45,968-Speed 5578.82 samples/sec   Loss 2.9221   LearningRate 0.0048   Epoch: 15   Global Step: 79060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:47,811-Speed 5560.40 samples/sec   Loss 3.1093   LearningRate 0.0048   Epoch: 15   Global Step: 79070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:49,653-Speed 5558.40 samples/sec   Loss 3.0658   LearningRate 0.0048   Epoch: 15   Global Step: 79080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:51,484-Speed 5596.46 samples/sec   Loss 3.1300   LearningRate 0.0048   Epoch: 15   Global Step: 79090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:53,326-Speed 5560.34 samples/sec   Loss 3.0554   LearningRate 0.0048   Epoch: 15   Global Step: 79100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:55,157-Speed 5594.04 samples/sec   Loss 3.1118   LearningRate 0.0048   Epoch: 15   Global Step: 79110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:56,983-Speed 5613.27 samples/sec   Loss 3.1605   LearningRate 0.0047   Epoch: 15   Global Step: 79120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:20:58,870-Speed 5426.93 samples/sec   Loss 3.3090   LearningRate 0.0047   Epoch: 15   Global Step: 79130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:00,706-Speed 5581.68 samples/sec   Loss 3.1913   LearningRate 0.0047   Epoch: 15   Global Step: 79140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:02,562-Speed 5517.78 samples/sec   Loss 3.0481   LearningRate 0.0047   Epoch: 15   Global Step: 79150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:04,396-Speed 5588.26 samples/sec   Loss 3.0166   LearningRate 0.0047   Epoch: 15   Global Step: 79160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:06,233-Speed 5574.63 samples/sec   Loss 3.1591   LearningRate 0.0047   Epoch: 15   Global Step: 79170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:08,073-Speed 5569.21 samples/sec   Loss 3.1557   LearningRate 0.0047   Epoch: 15   Global Step: 79180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:09,905-Speed 5589.29 samples/sec   Loss 3.0788   LearningRate 0.0047   Epoch: 15   Global Step: 79190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:11,768-Speed 5512.09 samples/sec   Loss 3.0777   LearningRate 0.0047   Epoch: 15   Global Step: 79200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:13,612-Speed 5556.56 samples/sec   Loss 3.2802   LearningRate 0.0047   Epoch: 15   Global Step: 79210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:15,454-Speed 5560.90 samples/sec   Loss 3.0789   LearningRate 0.0047   Epoch: 15   Global Step: 79220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:21:17,292-Speed 5572.63 samples/sec   Loss 3.0801   LearningRate 0.0047   Epoch: 15   Global Step: 79230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:21:19,135-Speed 5558.79 samples/sec   Loss 3.1002   LearningRate 0.0047   Epoch: 15   Global Step: 79240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:20,973-Speed 5573.50 samples/sec   Loss 3.0997   LearningRate 0.0047   Epoch: 15   Global Step: 79250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:22,801-Speed 5605.37 samples/sec   Loss 2.9216   LearningRate 0.0047   Epoch: 15   Global Step: 79260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:24,635-Speed 5583.39 samples/sec   Loss 3.0732   LearningRate 0.0047   Epoch: 15   Global Step: 79270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:26,506-Speed 5475.07 samples/sec   Loss 3.0479   LearningRate 0.0047   Epoch: 15   Global Step: 79280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:28,337-Speed 5594.89 samples/sec   Loss 3.0850   LearningRate 0.0047   Epoch: 15   Global Step: 79290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:30,166-Speed 5599.94 samples/sec   Loss 3.0838   LearningRate 0.0047   Epoch: 15   Global Step: 79300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:32,000-Speed 5588.21 samples/sec   Loss 3.1351   LearningRate 0.0047   Epoch: 15   Global Step: 79310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:33,830-Speed 5597.32 samples/sec   Loss 3.2219   LearningRate 0.0047   Epoch: 15   Global Step: 79320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:35,668-Speed 5570.40 samples/sec   Loss 3.0926   LearningRate 0.0047   Epoch: 15   Global Step: 79330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:37,496-Speed 5604.25 samples/sec   Loss 3.1859   LearningRate 0.0047   Epoch: 15   Global Step: 79340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:39,337-Speed 5566.28 samples/sec   Loss 3.1233   LearningRate 0.0046   Epoch: 15   Global Step: 79350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:41,166-Speed 5598.87 samples/sec   Loss 2.9555   LearningRate 0.0046   Epoch: 15   Global Step: 79360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:42,994-Speed 5605.24 samples/sec   Loss 3.2222   LearningRate 0.0046   Epoch: 15   Global Step: 79370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:44,828-Speed 5585.25 samples/sec   Loss 3.1194   LearningRate 0.0046   Epoch: 15   Global Step: 79380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:46,657-Speed 5602.17 samples/sec   Loss 2.9793   LearningRate 0.0046   Epoch: 15   Global Step: 79390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:48,502-Speed 5551.58 samples/sec   Loss 3.0309   LearningRate 0.0046   Epoch: 15   Global Step: 79400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:50,335-Speed 5588.12 samples/sec   Loss 3.0369   LearningRate 0.0046   Epoch: 15   Global Step: 79410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:52,172-Speed 5576.24 samples/sec   Loss 3.0285   LearningRate 0.0046   Epoch: 15   Global Step: 79420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:54,002-Speed 5598.83 samples/sec   Loss 3.1162   LearningRate 0.0046   Epoch: 15   Global Step: 79430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:21:55,833-Speed 5594.81 samples/sec   Loss 3.1633   LearningRate 0.0046   Epoch: 15   Global Step: 79440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:21:57,672-Speed 5571.20 samples/sec   Loss 3.1024   LearningRate 0.0046   Epoch: 15   Global Step: 79450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:21:59,514-Speed 5559.15 samples/sec   Loss 3.0345   LearningRate 0.0046   Epoch: 15   Global Step: 79460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:22:01,369-Speed 5522.11 samples/sec   Loss 3.1284   LearningRate 0.0046   Epoch: 15   Global Step: 79470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:22:03,198-Speed 5603.52 samples/sec   Loss 3.0245   LearningRate 0.0046   Epoch: 15   Global Step: 79480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:05,038-Speed 5566.94 samples/sec   Loss 3.0701   LearningRate 0.0046   Epoch: 15   Global Step: 79490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:06,892-Speed 5526.64 samples/sec   Loss 3.0704   LearningRate 0.0046   Epoch: 15   Global Step: 79500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:08,722-Speed 5596.49 samples/sec   Loss 3.1270   LearningRate 0.0046   Epoch: 15   Global Step: 79510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:10,559-Speed 5576.63 samples/sec   Loss 3.0290   LearningRate 0.0046   Epoch: 15   Global Step: 79520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:12,394-Speed 5583.88 samples/sec   Loss 3.1848   LearningRate 0.0046   Epoch: 15   Global Step: 79530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:14,229-Speed 5579.88 samples/sec   Loss 3.0873   LearningRate 0.0046   Epoch: 15   Global Step: 79540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:16,083-Speed 5527.26 samples/sec   Loss 3.1832   LearningRate 0.0046   Epoch: 15   Global Step: 79550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:17,918-Speed 5581.67 samples/sec   Loss 3.1593   LearningRate 0.0046   Epoch: 15   Global Step: 79560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:19,749-Speed 5594.92 samples/sec   Loss 3.1269   LearningRate 0.0046   Epoch: 15   Global Step: 79570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:21,574-Speed 5615.28 samples/sec   Loss 3.0358   LearningRate 0.0046   Epoch: 15   Global Step: 79580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:22:23,412-Speed 5571.90 samples/sec   Loss 2.9896   LearningRate 0.0045   Epoch: 15   Global Step: 79590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:22:25,265-Speed 5527.59 samples/sec   Loss 3.0420   LearningRate 0.0045   Epoch: 15   Global Step: 79600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:22:27,105-Speed 5568.04 samples/sec   Loss 3.1072   LearningRate 0.0045   Epoch: 15   Global Step: 79610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:22:28,961-Speed 5518.64 samples/sec   Loss 3.0894   LearningRate 0.0045   Epoch: 15   Global Step: 79620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:22:30,800-Speed 5568.86 samples/sec   Loss 3.0001   LearningRate 0.0045   Epoch: 15   Global Step: 79630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:22:32,631-Speed 5598.53 samples/sec   Loss 3.0837   LearningRate 0.0045   Epoch: 15   Global Step: 79640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:22:34,463-Speed 5589.25 samples/sec   Loss 3.1040   LearningRate 0.0045   Epoch: 15   Global Step: 79650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:22:36,302-Speed 5569.03 samples/sec   Loss 3.1423   LearningRate 0.0045   Epoch: 15   Global Step: 79660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:22:38,136-Speed 5586.98 samples/sec   Loss 3.1401   LearningRate 0.0045   Epoch: 15   Global Step: 79670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:22:39,980-Speed 5556.73 samples/sec   Loss 3.2089   LearningRate 0.0045   Epoch: 15   Global Step: 79680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:41,813-Speed 5590.03 samples/sec   Loss 3.1374   LearningRate 0.0045   Epoch: 15   Global Step: 79690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:43,646-Speed 5586.62 samples/sec   Loss 3.1576   LearningRate 0.0045   Epoch: 15   Global Step: 79700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:45,478-Speed 5592.17 samples/sec   Loss 3.1834   LearningRate 0.0045   Epoch: 15   Global Step: 79710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:47,310-Speed 5591.98 samples/sec   Loss 2.9937   LearningRate 0.0045   Epoch: 15   Global Step: 79720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:49,144-Speed 5586.33 samples/sec   Loss 3.1155   LearningRate 0.0045   Epoch: 15   Global Step: 79730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:50,978-Speed 5583.01 samples/sec   Loss 3.1524   LearningRate 0.0045   Epoch: 15   Global Step: 79740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:52,811-Speed 5588.93 samples/sec   Loss 3.1086   LearningRate 0.0045   Epoch: 15   Global Step: 79750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:54,640-Speed 5601.34 samples/sec   Loss 3.1983   LearningRate 0.0045   Epoch: 15   Global Step: 79760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:56,471-Speed 5595.68 samples/sec   Loss 3.0329   LearningRate 0.0045   Epoch: 15   Global Step: 79770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:22:58,310-Speed 5570.06 samples/sec   Loss 3.0675   LearningRate 0.0045   Epoch: 15   Global Step: 79780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:23:00,164-Speed 5525.25 samples/sec   Loss 3.1024   LearningRate 0.0045   Epoch: 15   Global Step: 79790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:02,034-Speed 5480.06 samples/sec   Loss 3.0906   LearningRate 0.0045   Epoch: 15   Global Step: 79800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:03,869-Speed 5580.24 samples/sec   Loss 3.1775   LearningRate 0.0045   Epoch: 15   Global Step: 79810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:05,709-Speed 5569.30 samples/sec   Loss 3.1072   LearningRate 0.0045   Epoch: 15   Global Step: 79820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:07,551-Speed 5560.91 samples/sec   Loss 3.0548   LearningRate 0.0044   Epoch: 15   Global Step: 79830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:09,383-Speed 5590.67 samples/sec   Loss 2.9856   LearningRate 0.0044   Epoch: 15   Global Step: 79840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:11,214-Speed 5594.19 samples/sec   Loss 3.1550   LearningRate 0.0044   Epoch: 15   Global Step: 79850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:13,075-Speed 5506.89 samples/sec   Loss 3.1083   LearningRate 0.0044   Epoch: 15   Global Step: 79860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:14,908-Speed 5585.98 samples/sec   Loss 2.9291   LearningRate 0.0044   Epoch: 15   Global Step: 79870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:16,847-Speed 5285.29 samples/sec   Loss 3.0759   LearningRate 0.0044   Epoch: 15   Global Step: 79880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:18,688-Speed 5561.78 samples/sec   Loss 3.0639   LearningRate 0.0044   Epoch: 15   Global Step: 79890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:23:20,518-Speed 5598.44 samples/sec   Loss 3.0020   LearningRate 0.0044   Epoch: 15   Global Step: 79900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:22,347-Speed 5602.10 samples/sec   Loss 2.9572   LearningRate 0.0044   Epoch: 15   Global Step: 79910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:23:24,189-Speed 5559.52 samples/sec   Loss 3.2200   LearningRate 0.0044   Epoch: 15   Global Step: 79920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:23:26,025-Speed 5580.45 samples/sec   Loss 3.0910   LearningRate 0.0044   Epoch: 15   Global Step: 79930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:23:27,858-Speed 5591.04 samples/sec   Loss 3.1216   LearningRate 0.0044   Epoch: 15   Global Step: 79940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:23:29,695-Speed 5574.28 samples/sec   Loss 3.1511   LearningRate 0.0044   Epoch: 15   Global Step: 79950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:23:31,533-Speed 5573.94 samples/sec   Loss 3.0854   LearningRate 0.0044   Epoch: 15   Global Step: 79960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:23:33,366-Speed 5590.02 samples/sec   Loss 2.9992   LearningRate 0.0044   Epoch: 15   Global Step: 79970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:23:35,199-Speed 5586.53 samples/sec   Loss 3.2457   LearningRate 0.0044   Epoch: 15   Global Step: 79980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:23:37,035-Speed 5581.05 samples/sec   Loss 3.1564   LearningRate 0.0044   Epoch: 15   Global Step: 79990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:23:38,868-Speed 5587.43 samples/sec   Loss 3.1343   LearningRate 0.0044   Epoch: 15   Global Step: 80000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:24:05,393-[lfw][80000]XNorm: 22.622122
Training: 2022-04-11 15:24:05,394-[lfw][80000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-11 15:24:05,394-[lfw][80000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:24:36,103-[cfp_fp][80000]XNorm: 21.158114
Training: 2022-04-11 15:24:36,104-[cfp_fp][80000]Accuracy-Flip: 0.98086+-0.00683
Training: 2022-04-11 15:24:36,104-[cfp_fp][80000]Accuracy-Highest: 0.98157
Training: 2022-04-11 15:25:02,642-[agedb_30][80000]XNorm: 22.624514
Training: 2022-04-11 15:25:02,643-[agedb_30][80000]Accuracy-Flip: 0.98350+-0.00693
Training: 2022-04-11 15:25:02,643-[agedb_30][80000]Accuracy-Highest: 0.98350
Training: 2022-04-11 15:25:04,526-Speed 119.55 samples/sec   Loss 3.1165   LearningRate 0.0044   Epoch: 15   Global Step: 80010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:06,382-Speed 5518.12 samples/sec   Loss 3.1257   LearningRate 0.0044   Epoch: 15   Global Step: 80020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:08,228-Speed 5550.10 samples/sec   Loss 3.1735   LearningRate 0.0044   Epoch: 15   Global Step: 80030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:10,041-Speed 5648.93 samples/sec   Loss 3.0355   LearningRate 0.0044   Epoch: 15   Global Step: 80040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:11,879-Speed 5571.65 samples/sec   Loss 3.0633   LearningRate 0.0044   Epoch: 15   Global Step: 80050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:13,741-Speed 5504.14 samples/sec   Loss 3.0736   LearningRate 0.0044   Epoch: 15   Global Step: 80060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:15,621-Speed 5448.66 samples/sec   Loss 3.0122   LearningRate 0.0043   Epoch: 15   Global Step: 80070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:17,468-Speed 5544.42 samples/sec   Loss 2.9589   LearningRate 0.0043   Epoch: 15   Global Step: 80080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:19,322-Speed 5528.45 samples/sec   Loss 3.0902   LearningRate 0.0043   Epoch: 15   Global Step: 80090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:21,146-Speed 5615.92 samples/sec   Loss 3.1001   LearningRate 0.0043   Epoch: 15   Global Step: 80100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:22,979-Speed 5587.46 samples/sec   Loss 3.1114   LearningRate 0.0043   Epoch: 15   Global Step: 80110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:24,808-Speed 5601.00 samples/sec   Loss 3.1618   LearningRate 0.0043   Epoch: 15   Global Step: 80120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:26,638-Speed 5596.98 samples/sec   Loss 3.1711   LearningRate 0.0043   Epoch: 15   Global Step: 80130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:28,467-Speed 5600.22 samples/sec   Loss 3.0840   LearningRate 0.0043   Epoch: 15   Global Step: 80140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:30,296-Speed 5602.55 samples/sec   Loss 3.0022   LearningRate 0.0043   Epoch: 15   Global Step: 80150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:32,124-Speed 5603.70 samples/sec   Loss 2.9964   LearningRate 0.0043   Epoch: 15   Global Step: 80160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:33,965-Speed 5562.84 samples/sec   Loss 3.1237   LearningRate 0.0043   Epoch: 15   Global Step: 80170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:35,799-Speed 5584.70 samples/sec   Loss 3.1032   LearningRate 0.0043   Epoch: 15   Global Step: 80180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:37,627-Speed 5608.06 samples/sec   Loss 3.0226   LearningRate 0.0043   Epoch: 15   Global Step: 80190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:39,467-Speed 5566.74 samples/sec   Loss 3.1666   LearningRate 0.0043   Epoch: 15   Global Step: 80200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:41,298-Speed 5594.45 samples/sec   Loss 3.1076   LearningRate 0.0043   Epoch: 15   Global Step: 80210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:43,132-Speed 5583.73 samples/sec   Loss 3.0631   LearningRate 0.0043   Epoch: 15   Global Step: 80220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:44,967-Speed 5583.14 samples/sec   Loss 3.1475   LearningRate 0.0043   Epoch: 15   Global Step: 80230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:46,788-Speed 5623.85 samples/sec   Loss 3.1369   LearningRate 0.0043   Epoch: 15   Global Step: 80240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:48,622-Speed 5586.36 samples/sec   Loss 3.0572   LearningRate 0.0043   Epoch: 15   Global Step: 80250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:25:50,453-Speed 5594.57 samples/sec   Loss 3.1197   LearningRate 0.0043   Epoch: 15   Global Step: 80260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:52,289-Speed 5579.47 samples/sec   Loss 3.0364   LearningRate 0.0043   Epoch: 15   Global Step: 80270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:54,116-Speed 5607.50 samples/sec   Loss 3.0882   LearningRate 0.0043   Epoch: 15   Global Step: 80280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:55,950-Speed 5585.49 samples/sec   Loss 3.0875   LearningRate 0.0043   Epoch: 15   Global Step: 80290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:57,787-Speed 5576.62 samples/sec   Loss 3.0345   LearningRate 0.0043   Epoch: 15   Global Step: 80300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:25:59,618-Speed 5596.32 samples/sec   Loss 3.1267   LearningRate 0.0042   Epoch: 15   Global Step: 80310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:01,461-Speed 5556.94 samples/sec   Loss 3.1355   LearningRate 0.0042   Epoch: 15   Global Step: 80320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:03,306-Speed 5555.65 samples/sec   Loss 3.1199   LearningRate 0.0042   Epoch: 15   Global Step: 80330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:05,144-Speed 5572.53 samples/sec   Loss 3.0435   LearningRate 0.0042   Epoch: 15   Global Step: 80340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:06,977-Speed 5587.51 samples/sec   Loss 3.0703   LearningRate 0.0042   Epoch: 15   Global Step: 80350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:08,803-Speed 5610.64 samples/sec   Loss 3.0932   LearningRate 0.0042   Epoch: 15   Global Step: 80360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:10,630-Speed 5606.95 samples/sec   Loss 3.0170   LearningRate 0.0042   Epoch: 15   Global Step: 80370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:12,471-Speed 5563.27 samples/sec   Loss 3.0134   LearningRate 0.0042   Epoch: 15   Global Step: 80380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:14,298-Speed 5606.64 samples/sec   Loss 3.0217   LearningRate 0.0042   Epoch: 15   Global Step: 80390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:16,117-Speed 5630.43 samples/sec   Loss 3.1269   LearningRate 0.0042   Epoch: 15   Global Step: 80400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:17,958-Speed 5565.74 samples/sec   Loss 3.0846   LearningRate 0.0042   Epoch: 15   Global Step: 80410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:19,788-Speed 5597.51 samples/sec   Loss 3.1587   LearningRate 0.0042   Epoch: 15   Global Step: 80420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:21,621-Speed 5589.88 samples/sec   Loss 3.0869   LearningRate 0.0042   Epoch: 15   Global Step: 80430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:23,451-Speed 5596.79 samples/sec   Loss 2.9992   LearningRate 0.0042   Epoch: 15   Global Step: 80440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:25,285-Speed 5587.18 samples/sec   Loss 2.9340   LearningRate 0.0042   Epoch: 15   Global Step: 80450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:27,113-Speed 5602.25 samples/sec   Loss 3.1375   LearningRate 0.0042   Epoch: 15   Global Step: 80460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:28,945-Speed 5592.60 samples/sec   Loss 3.1117   LearningRate 0.0042   Epoch: 15   Global Step: 80470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:30,774-Speed 5601.51 samples/sec   Loss 3.0604   LearningRate 0.0042   Epoch: 15   Global Step: 80480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:32,604-Speed 5596.38 samples/sec   Loss 3.0598   LearningRate 0.0042   Epoch: 15   Global Step: 80490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:26:34,442-Speed 5573.63 samples/sec   Loss 3.0085   LearningRate 0.0042   Epoch: 15   Global Step: 80500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:36,289-Speed 5544.27 samples/sec   Loss 3.1089   LearningRate 0.0042   Epoch: 15   Global Step: 80510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:38,194-Speed 5379.79 samples/sec   Loss 3.2519   LearningRate 0.0042   Epoch: 15   Global Step: 80520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:40,046-Speed 5529.96 samples/sec   Loss 3.0749   LearningRate 0.0042   Epoch: 15   Global Step: 80530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:41,878-Speed 5592.78 samples/sec   Loss 3.1517   LearningRate 0.0042   Epoch: 15   Global Step: 80540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:43,713-Speed 5583.66 samples/sec   Loss 3.1444   LearningRate 0.0042   Epoch: 15   Global Step: 80550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:45,544-Speed 5592.62 samples/sec   Loss 3.0856   LearningRate 0.0041   Epoch: 15   Global Step: 80560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:47,377-Speed 5587.87 samples/sec   Loss 3.1112   LearningRate 0.0041   Epoch: 15   Global Step: 80570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:49,210-Speed 5589.54 samples/sec   Loss 3.0089   LearningRate 0.0041   Epoch: 15   Global Step: 80580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:51,043-Speed 5589.39 samples/sec   Loss 3.0668   LearningRate 0.0041   Epoch: 15   Global Step: 80590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:52,886-Speed 5558.16 samples/sec   Loss 3.0791   LearningRate 0.0041   Epoch: 15   Global Step: 80600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:26:54,728-Speed 5561.11 samples/sec   Loss 3.0244   LearningRate 0.0041   Epoch: 15   Global Step: 80610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:56,559-Speed 5592.97 samples/sec   Loss 3.0969   LearningRate 0.0041   Epoch: 15   Global Step: 80620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:26:58,394-Speed 5583.20 samples/sec   Loss 2.9973   LearningRate 0.0041   Epoch: 15   Global Step: 80630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:00,226-Speed 5593.83 samples/sec   Loss 3.0314   LearningRate 0.0041   Epoch: 15   Global Step: 80640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:02,065-Speed 5567.63 samples/sec   Loss 3.0279   LearningRate 0.0041   Epoch: 15   Global Step: 80650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:03,906-Speed 5566.51 samples/sec   Loss 3.0987   LearningRate 0.0041   Epoch: 15   Global Step: 80660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:05,750-Speed 5554.24 samples/sec   Loss 3.0707   LearningRate 0.0041   Epoch: 15   Global Step: 80670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:07,588-Speed 5573.96 samples/sec   Loss 3.0934   LearningRate 0.0041   Epoch: 15   Global Step: 80680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:09,421-Speed 5590.12 samples/sec   Loss 3.0132   LearningRate 0.0041   Epoch: 15   Global Step: 80690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:11,262-Speed 5562.09 samples/sec   Loss 2.9785   LearningRate 0.0041   Epoch: 15   Global Step: 80700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:13,114-Speed 5532.54 samples/sec   Loss 2.9707   LearningRate 0.0041   Epoch: 15   Global Step: 80710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:27:14,958-Speed 5554.42 samples/sec   Loss 3.1036   LearningRate 0.0041   Epoch: 15   Global Step: 80720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:27:16,795-Speed 5576.44 samples/sec   Loss 3.1207   LearningRate 0.0041   Epoch: 15   Global Step: 80730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:27:18,636-Speed 5565.79 samples/sec   Loss 3.0764   LearningRate 0.0041   Epoch: 15   Global Step: 80740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:27:20,461-Speed 5612.89 samples/sec   Loss 3.1131   LearningRate 0.0041   Epoch: 15   Global Step: 80750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:22,295-Speed 5583.82 samples/sec   Loss 3.1241   LearningRate 0.0041   Epoch: 15   Global Step: 80760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:24,127-Speed 5593.46 samples/sec   Loss 3.1311   LearningRate 0.0041   Epoch: 15   Global Step: 80770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:25,961-Speed 5586.40 samples/sec   Loss 3.0769   LearningRate 0.0041   Epoch: 15   Global Step: 80780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:27,793-Speed 5591.27 samples/sec   Loss 2.9936   LearningRate 0.0041   Epoch: 15   Global Step: 80790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:29,623-Speed 5596.54 samples/sec   Loss 3.0319   LearningRate 0.0041   Epoch: 15   Global Step: 80800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:31,452-Speed 5599.70 samples/sec   Loss 3.1505   LearningRate 0.0040   Epoch: 15   Global Step: 80810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:33,283-Speed 5596.23 samples/sec   Loss 2.9910   LearningRate 0.0040   Epoch: 15   Global Step: 80820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:35,113-Speed 5596.34 samples/sec   Loss 2.9993   LearningRate 0.0040   Epoch: 15   Global Step: 80830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:36,943-Speed 5600.16 samples/sec   Loss 3.0736   LearningRate 0.0040   Epoch: 15   Global Step: 80840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:27:38,764-Speed 5623.13 samples/sec   Loss 3.0596   LearningRate 0.0040   Epoch: 15   Global Step: 80850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:27:40,613-Speed 5542.47 samples/sec   Loss 3.0256   LearningRate 0.0040   Epoch: 15   Global Step: 80860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:27:42,449-Speed 5576.42 samples/sec   Loss 3.0210   LearningRate 0.0040   Epoch: 15   Global Step: 80870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:27:44,285-Speed 5584.05 samples/sec   Loss 3.1047   LearningRate 0.0040   Epoch: 15   Global Step: 80880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:27:46,131-Speed 5549.27 samples/sec   Loss 2.9614   LearningRate 0.0040   Epoch: 15   Global Step: 80890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:27:47,979-Speed 5543.61 samples/sec   Loss 3.0774   LearningRate 0.0040   Epoch: 15   Global Step: 80900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:27:49,811-Speed 5591.62 samples/sec   Loss 3.0625   LearningRate 0.0040   Epoch: 15   Global Step: 80910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:27:51,788-Speed 5179.84 samples/sec   Loss 3.0419   LearningRate 0.0040   Epoch: 15   Global Step: 80920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:02,598-Speed 947.40 samples/sec   Loss 2.8851   LearningRate 0.0040   Epoch: 16   Global Step: 80930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:04,465-Speed 5488.34 samples/sec   Loss 2.2752   LearningRate 0.0040   Epoch: 16   Global Step: 80940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:06,302-Speed 5578.93 samples/sec   Loss 2.4113   LearningRate 0.0040   Epoch: 16   Global Step: 80950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:08,148-Speed 5546.24 samples/sec   Loss 2.2226   LearningRate 0.0040   Epoch: 16   Global Step: 80960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:09,993-Speed 5553.58 samples/sec   Loss 2.3458   LearningRate 0.0040   Epoch: 16   Global Step: 80970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:11,839-Speed 5552.45 samples/sec   Loss 2.2910   LearningRate 0.0040   Epoch: 16   Global Step: 80980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:13,674-Speed 5580.60 samples/sec   Loss 2.3544   LearningRate 0.0040   Epoch: 16   Global Step: 80990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:15,521-Speed 5545.94 samples/sec   Loss 2.2961   LearningRate 0.0040   Epoch: 16   Global Step: 81000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:17,366-Speed 5550.73 samples/sec   Loss 2.3816   LearningRate 0.0040   Epoch: 16   Global Step: 81010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:19,209-Speed 5558.75 samples/sec   Loss 2.3259   LearningRate 0.0040   Epoch: 16   Global Step: 81020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:21,046-Speed 5576.97 samples/sec   Loss 2.3083   LearningRate 0.0040   Epoch: 16   Global Step: 81030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:22,877-Speed 5595.61 samples/sec   Loss 2.2758   LearningRate 0.0040   Epoch: 16   Global Step: 81040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:24,724-Speed 5547.66 samples/sec   Loss 2.3025   LearningRate 0.0040   Epoch: 16   Global Step: 81050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:26,567-Speed 5556.26 samples/sec   Loss 2.3394   LearningRate 0.0039   Epoch: 16   Global Step: 81060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:28,412-Speed 5554.17 samples/sec   Loss 2.3535   LearningRate 0.0039   Epoch: 16   Global Step: 81070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:30,255-Speed 5556.78 samples/sec   Loss 2.3430   LearningRate 0.0039   Epoch: 16   Global Step: 81080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:32,110-Speed 5520.73 samples/sec   Loss 2.2524   LearningRate 0.0039   Epoch: 16   Global Step: 81090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:33,942-Speed 5592.15 samples/sec   Loss 2.3326   LearningRate 0.0039   Epoch: 16   Global Step: 81100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:35,777-Speed 5582.15 samples/sec   Loss 2.4022   LearningRate 0.0039   Epoch: 16   Global Step: 81110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:37,619-Speed 5562.22 samples/sec   Loss 2.3414   LearningRate 0.0039   Epoch: 16   Global Step: 81120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:39,455-Speed 5581.08 samples/sec   Loss 2.2703   LearningRate 0.0039   Epoch: 16   Global Step: 81130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:28:41,305-Speed 5538.16 samples/sec   Loss 2.3345   LearningRate 0.0039   Epoch: 16   Global Step: 81140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:43,140-Speed 5581.86 samples/sec   Loss 2.2742   LearningRate 0.0039   Epoch: 16   Global Step: 81150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:44,978-Speed 5573.56 samples/sec   Loss 2.2190   LearningRate 0.0039   Epoch: 16   Global Step: 81160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:46,811-Speed 5586.53 samples/sec   Loss 2.3384   LearningRate 0.0039   Epoch: 16   Global Step: 81170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:48,669-Speed 5515.53 samples/sec   Loss 2.3614   LearningRate 0.0039   Epoch: 16   Global Step: 81180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:50,507-Speed 5571.83 samples/sec   Loss 2.3388   LearningRate 0.0039   Epoch: 16   Global Step: 81190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:52,358-Speed 5534.48 samples/sec   Loss 2.3010   LearningRate 0.0039   Epoch: 16   Global Step: 81200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:54,208-Speed 5538.54 samples/sec   Loss 2.3869   LearningRate 0.0039   Epoch: 16   Global Step: 81210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:56,041-Speed 5586.12 samples/sec   Loss 2.3426   LearningRate 0.0039   Epoch: 16   Global Step: 81220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:57,877-Speed 5580.81 samples/sec   Loss 2.3058   LearningRate 0.0039   Epoch: 16   Global Step: 81230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:28:59,727-Speed 5537.65 samples/sec   Loss 2.2785   LearningRate 0.0039   Epoch: 16   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:29:01,580-Speed 5528.77 samples/sec   Loss 2.3999   LearningRate 0.0039   Epoch: 16   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:29:03,403-Speed 5620.29 samples/sec   Loss 2.3146   LearningRate 0.0039   Epoch: 16   Global Step: 81260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:05,260-Speed 5514.44 samples/sec   Loss 2.3592   LearningRate 0.0039   Epoch: 16   Global Step: 81270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:07,101-Speed 5565.56 samples/sec   Loss 2.3088   LearningRate 0.0039   Epoch: 16   Global Step: 81280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:08,946-Speed 5553.37 samples/sec   Loss 2.4392   LearningRate 0.0039   Epoch: 16   Global Step: 81290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:10,786-Speed 5565.21 samples/sec   Loss 2.3811   LearningRate 0.0039   Epoch: 16   Global Step: 81300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:12,638-Speed 5533.07 samples/sec   Loss 2.2940   LearningRate 0.0039   Epoch: 16   Global Step: 81310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:14,513-Speed 5461.72 samples/sec   Loss 2.4837   LearningRate 0.0038   Epoch: 16   Global Step: 81320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:16,376-Speed 5499.42 samples/sec   Loss 2.3816   LearningRate 0.0038   Epoch: 16   Global Step: 81330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:18,212-Speed 5577.75 samples/sec   Loss 2.3262   LearningRate 0.0038   Epoch: 16   Global Step: 81340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:20,037-Speed 5614.85 samples/sec   Loss 2.3949   LearningRate 0.0038   Epoch: 16   Global Step: 81350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:29:21,885-Speed 5543.84 samples/sec   Loss 2.5112   LearningRate 0.0038   Epoch: 16   Global Step: 81360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:29:23,760-Speed 5462.31 samples/sec   Loss 2.4339   LearningRate 0.0038   Epoch: 16   Global Step: 81370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:29:25,620-Speed 5507.68 samples/sec   Loss 2.3894   LearningRate 0.0038   Epoch: 16   Global Step: 81380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:29:27,466-Speed 5550.54 samples/sec   Loss 2.3240   LearningRate 0.0038   Epoch: 16   Global Step: 81390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:29:29,313-Speed 5545.37 samples/sec   Loss 2.3745   LearningRate 0.0038   Epoch: 16   Global Step: 81400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:29:31,165-Speed 5532.76 samples/sec   Loss 2.2947   LearningRate 0.0038   Epoch: 16   Global Step: 81410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:29:32,995-Speed 5597.55 samples/sec   Loss 2.4295   LearningRate 0.0038   Epoch: 16   Global Step: 81420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:29:34,831-Speed 5577.93 samples/sec   Loss 2.3534   LearningRate 0.0038   Epoch: 16   Global Step: 81430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:29:36,682-Speed 5535.24 samples/sec   Loss 2.3951   LearningRate 0.0038   Epoch: 16   Global Step: 81440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:29:38,529-Speed 5546.32 samples/sec   Loss 2.4257   LearningRate 0.0038   Epoch: 16   Global Step: 81450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:40,384-Speed 5523.10 samples/sec   Loss 2.3327   LearningRate 0.0038   Epoch: 16   Global Step: 81460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:42,235-Speed 5534.83 samples/sec   Loss 2.4514   LearningRate 0.0038   Epoch: 16   Global Step: 81470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:44,068-Speed 5587.55 samples/sec   Loss 2.2961   LearningRate 0.0038   Epoch: 16   Global Step: 81480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:45,903-Speed 5581.96 samples/sec   Loss 2.4489   LearningRate 0.0038   Epoch: 16   Global Step: 81490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:47,748-Speed 5552.89 samples/sec   Loss 2.4476   LearningRate 0.0038   Epoch: 16   Global Step: 81500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:49,586-Speed 5574.67 samples/sec   Loss 2.3704   LearningRate 0.0038   Epoch: 16   Global Step: 81510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:51,425-Speed 5568.99 samples/sec   Loss 2.3596   LearningRate 0.0038   Epoch: 16   Global Step: 81520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:53,272-Speed 5545.12 samples/sec   Loss 2.3239   LearningRate 0.0038   Epoch: 16   Global Step: 81530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:55,108-Speed 5582.48 samples/sec   Loss 2.3821   LearningRate 0.0038   Epoch: 16   Global Step: 81540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:29:56,943-Speed 5580.32 samples/sec   Loss 2.4447   LearningRate 0.0038   Epoch: 16   Global Step: 81550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:29:58,767-Speed 5616.80 samples/sec   Loss 2.4730   LearningRate 0.0038   Epoch: 16   Global Step: 81560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:00,603-Speed 5578.06 samples/sec   Loss 2.4248   LearningRate 0.0038   Epoch: 16   Global Step: 81570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:02,460-Speed 5515.86 samples/sec   Loss 2.4042   LearningRate 0.0037   Epoch: 16   Global Step: 81580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:04,322-Speed 5504.45 samples/sec   Loss 2.4067   LearningRate 0.0037   Epoch: 16   Global Step: 81590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:06,160-Speed 5573.51 samples/sec   Loss 2.4499   LearningRate 0.0037   Epoch: 16   Global Step: 81600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:07,996-Speed 5580.06 samples/sec   Loss 2.4218   LearningRate 0.0037   Epoch: 16   Global Step: 81610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:09,830-Speed 5585.84 samples/sec   Loss 2.3721   LearningRate 0.0037   Epoch: 16   Global Step: 81620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:11,677-Speed 5546.24 samples/sec   Loss 2.3281   LearningRate 0.0037   Epoch: 16   Global Step: 81630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:13,516-Speed 5568.80 samples/sec   Loss 2.4703   LearningRate 0.0037   Epoch: 16   Global Step: 81640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:15,356-Speed 5568.92 samples/sec   Loss 2.4279   LearningRate 0.0037   Epoch: 16   Global Step: 81650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:17,190-Speed 5584.02 samples/sec   Loss 2.4236   LearningRate 0.0037   Epoch: 16   Global Step: 81660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:30:19,030-Speed 5567.43 samples/sec   Loss 2.4120   LearningRate 0.0037   Epoch: 16   Global Step: 81670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:30:20,864-Speed 5585.42 samples/sec   Loss 2.5338   LearningRate 0.0037   Epoch: 16   Global Step: 81680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:30:22,699-Speed 5582.44 samples/sec   Loss 2.5101   LearningRate 0.0037   Epoch: 16   Global Step: 81690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:30:24,532-Speed 5588.69 samples/sec   Loss 2.4490   LearningRate 0.0037   Epoch: 16   Global Step: 81700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:30:26,362-Speed 5597.35 samples/sec   Loss 2.3371   LearningRate 0.0037   Epoch: 16   Global Step: 81710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:28,201-Speed 5571.32 samples/sec   Loss 2.5223   LearningRate 0.0037   Epoch: 16   Global Step: 81720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:30,036-Speed 5584.36 samples/sec   Loss 2.4370   LearningRate 0.0037   Epoch: 16   Global Step: 81730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:31,870-Speed 5582.57 samples/sec   Loss 2.5405   LearningRate 0.0037   Epoch: 16   Global Step: 81740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:33,704-Speed 5586.24 samples/sec   Loss 2.4623   LearningRate 0.0037   Epoch: 16   Global Step: 81750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:35,547-Speed 5560.39 samples/sec   Loss 2.5358   LearningRate 0.0037   Epoch: 16   Global Step: 81760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:37,387-Speed 5566.50 samples/sec   Loss 2.5200   LearningRate 0.0037   Epoch: 16   Global Step: 81770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:39,224-Speed 5575.51 samples/sec   Loss 2.4261   LearningRate 0.0037   Epoch: 16   Global Step: 81780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:41,069-Speed 5553.42 samples/sec   Loss 2.4388   LearningRate 0.0037   Epoch: 16   Global Step: 81790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:42,917-Speed 5542.02 samples/sec   Loss 2.4349   LearningRate 0.0037   Epoch: 16   Global Step: 81800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:44,751-Speed 5584.97 samples/sec   Loss 2.3445   LearningRate 0.0037   Epoch: 16   Global Step: 81810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:30:46,584-Speed 5589.25 samples/sec   Loss 2.4997   LearningRate 0.0037   Epoch: 16   Global Step: 81820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:48,421-Speed 5577.11 samples/sec   Loss 2.4641   LearningRate 0.0037   Epoch: 16   Global Step: 81830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:50,267-Speed 5548.62 samples/sec   Loss 2.5001   LearningRate 0.0036   Epoch: 16   Global Step: 81840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:52,106-Speed 5571.54 samples/sec   Loss 2.4678   LearningRate 0.0036   Epoch: 16   Global Step: 81850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:53,963-Speed 5517.96 samples/sec   Loss 2.4271   LearningRate 0.0036   Epoch: 16   Global Step: 81860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:55,799-Speed 5576.39 samples/sec   Loss 2.4668   LearningRate 0.0036   Epoch: 16   Global Step: 81870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:57,641-Speed 5561.74 samples/sec   Loss 2.4610   LearningRate 0.0036   Epoch: 16   Global Step: 81880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:30:59,478-Speed 5576.07 samples/sec   Loss 2.4737   LearningRate 0.0036   Epoch: 16   Global Step: 81890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:31:01,323-Speed 5554.36 samples/sec   Loss 2.4766   LearningRate 0.0036   Epoch: 16   Global Step: 81900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:31:03,161-Speed 5572.12 samples/sec   Loss 2.4550   LearningRate 0.0036   Epoch: 16   Global Step: 81910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:31:05,004-Speed 5558.79 samples/sec   Loss 2.4926   LearningRate 0.0036   Epoch: 16   Global Step: 81920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:31:06,845-Speed 5564.14 samples/sec   Loss 2.4870   LearningRate 0.0036   Epoch: 16   Global Step: 81930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:31:08,677-Speed 5590.74 samples/sec   Loss 2.4258   LearningRate 0.0036   Epoch: 16   Global Step: 81940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:31:10,519-Speed 5562.39 samples/sec   Loss 2.4587   LearningRate 0.0036   Epoch: 16   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:31:12,357-Speed 5574.50 samples/sec   Loss 2.5216   LearningRate 0.0036   Epoch: 16   Global Step: 81960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:31:14,185-Speed 5603.09 samples/sec   Loss 2.5543   LearningRate 0.0036   Epoch: 16   Global Step: 81970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:31:16,032-Speed 5546.57 samples/sec   Loss 2.4627   LearningRate 0.0036   Epoch: 16   Global Step: 81980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:31:17,883-Speed 5534.53 samples/sec   Loss 2.4687   LearningRate 0.0036   Epoch: 16   Global Step: 81990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:31:19,715-Speed 5591.05 samples/sec   Loss 2.5109   LearningRate 0.0036   Epoch: 16   Global Step: 82000   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-11 15:31:46,180-[lfw][82000]XNorm: 23.026910
Training: 2022-04-11 15:31:46,181-[lfw][82000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 15:31:46,181-[lfw][82000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:32:16,837-[cfp_fp][82000]XNorm: 21.433771
Training: 2022-04-11 15:32:16,838-[cfp_fp][82000]Accuracy-Flip: 0.98243+-0.00661
Training: 2022-04-11 15:32:16,838-[cfp_fp][82000]Accuracy-Highest: 0.98243
Training: 2022-04-11 15:32:43,158-[agedb_30][82000]XNorm: 22.909013
Training: 2022-04-11 15:32:43,159-[agedb_30][82000]Accuracy-Flip: 0.98183+-0.00747
Training: 2022-04-11 15:32:43,159-[agedb_30][82000]Accuracy-Highest: 0.98350
Training: 2022-04-11 15:32:45,002-Speed 120.07 samples/sec   Loss 2.4726   LearningRate 0.0036   Epoch: 16   Global Step: 82010   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-11 15:32:46,829-Speed 5608.58 samples/sec   Loss 2.4404   LearningRate 0.0036   Epoch: 16   Global Step: 82020   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-11 15:32:48,665-Speed 5577.76 samples/sec   Loss 2.5184   LearningRate 0.0036   Epoch: 16   Global Step: 82030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-11 15:32:50,503-Speed 5572.15 samples/sec   Loss 2.5487   LearningRate 0.0036   Epoch: 16   Global Step: 82040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-11 15:32:52,340-Speed 5578.47 samples/sec   Loss 2.3272   LearningRate 0.0036   Epoch: 16   Global Step: 82050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-11 15:32:54,178-Speed 5571.84 samples/sec   Loss 2.4269   LearningRate 0.0036   Epoch: 16   Global Step: 82060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-11 15:32:56,013-Speed 5581.82 samples/sec   Loss 2.4715   LearningRate 0.0036   Epoch: 16   Global Step: 82070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-11 15:32:57,837-Speed 5616.54 samples/sec   Loss 2.6585   LearningRate 0.0036   Epoch: 16   Global Step: 82080   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-11 15:32:59,670-Speed 5590.94 samples/sec   Loss 2.4951   LearningRate 0.0036   Epoch: 16   Global Step: 82090   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-11 15:33:01,520-Speed 5534.93 samples/sec   Loss 2.4669   LearningRate 0.0035   Epoch: 16   Global Step: 82100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:33:03,460-Speed 5282.62 samples/sec   Loss 2.4787   LearningRate 0.0035   Epoch: 16   Global Step: 82110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:33:05,309-Speed 5537.91 samples/sec   Loss 2.3645   LearningRate 0.0035   Epoch: 16   Global Step: 82120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:33:07,151-Speed 5560.86 samples/sec   Loss 2.5229   LearningRate 0.0035   Epoch: 16   Global Step: 82130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:33:08,980-Speed 5603.40 samples/sec   Loss 2.4566   LearningRate 0.0035   Epoch: 16   Global Step: 82140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:33:10,823-Speed 5559.09 samples/sec   Loss 2.5155   LearningRate 0.0035   Epoch: 16   Global Step: 82150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:33:12,676-Speed 5527.67 samples/sec   Loss 2.5272   LearningRate 0.0035   Epoch: 16   Global Step: 82160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:33:14,504-Speed 5604.01 samples/sec   Loss 2.4941   LearningRate 0.0035   Epoch: 16   Global Step: 82170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:33:16,336-Speed 5592.14 samples/sec   Loss 2.5504   LearningRate 0.0035   Epoch: 16   Global Step: 82180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:33:18,168-Speed 5589.95 samples/sec   Loss 2.5173   LearningRate 0.0035   Epoch: 16   Global Step: 82190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:33:20,010-Speed 5560.88 samples/sec   Loss 2.5551   LearningRate 0.0035   Epoch: 16   Global Step: 82200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:21,848-Speed 5573.65 samples/sec   Loss 2.4290   LearningRate 0.0035   Epoch: 16   Global Step: 82210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:23,683-Speed 5583.57 samples/sec   Loss 2.5304   LearningRate 0.0035   Epoch: 16   Global Step: 82220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:25,517-Speed 5585.19 samples/sec   Loss 2.3725   LearningRate 0.0035   Epoch: 16   Global Step: 82230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:27,359-Speed 5559.37 samples/sec   Loss 2.5328   LearningRate 0.0035   Epoch: 16   Global Step: 82240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:29,217-Speed 5516.42 samples/sec   Loss 2.4994   LearningRate 0.0035   Epoch: 16   Global Step: 82250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:31,053-Speed 5579.65 samples/sec   Loss 2.5892   LearningRate 0.0035   Epoch: 16   Global Step: 82260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:32,887-Speed 5583.03 samples/sec   Loss 2.5834   LearningRate 0.0035   Epoch: 16   Global Step: 82270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:34,726-Speed 5573.52 samples/sec   Loss 2.4939   LearningRate 0.0035   Epoch: 16   Global Step: 82280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:36,557-Speed 5593.16 samples/sec   Loss 2.4735   LearningRate 0.0035   Epoch: 16   Global Step: 82290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:38,401-Speed 5557.30 samples/sec   Loss 2.5423   LearningRate 0.0035   Epoch: 16   Global Step: 82300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:33:40,239-Speed 5570.94 samples/sec   Loss 2.5121   LearningRate 0.0035   Epoch: 16   Global Step: 82310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:33:42,068-Speed 5601.68 samples/sec   Loss 2.6535   LearningRate 0.0035   Epoch: 16   Global Step: 82320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:43,901-Speed 5590.20 samples/sec   Loss 2.6011   LearningRate 0.0035   Epoch: 16   Global Step: 82330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:45,734-Speed 5586.24 samples/sec   Loss 2.4630   LearningRate 0.0035   Epoch: 16   Global Step: 82340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:47,584-Speed 5536.60 samples/sec   Loss 2.4056   LearningRate 0.0035   Epoch: 16   Global Step: 82350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:49,436-Speed 5533.56 samples/sec   Loss 2.4550   LearningRate 0.0035   Epoch: 16   Global Step: 82360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:51,281-Speed 5549.78 samples/sec   Loss 2.4639   LearningRate 0.0035   Epoch: 16   Global Step: 82370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:53,149-Speed 5484.53 samples/sec   Loss 2.5409   LearningRate 0.0034   Epoch: 16   Global Step: 82380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:54,994-Speed 5554.91 samples/sec   Loss 2.4913   LearningRate 0.0034   Epoch: 16   Global Step: 82390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:56,827-Speed 5588.06 samples/sec   Loss 2.4742   LearningRate 0.0034   Epoch: 16   Global Step: 82400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:33:58,665-Speed 5574.08 samples/sec   Loss 2.4536   LearningRate 0.0034   Epoch: 16   Global Step: 82410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:00,489-Speed 5615.83 samples/sec   Loss 2.4789   LearningRate 0.0034   Epoch: 16   Global Step: 82420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:02,328-Speed 5567.98 samples/sec   Loss 2.3849   LearningRate 0.0034   Epoch: 16   Global Step: 82430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:04,166-Speed 5576.42 samples/sec   Loss 2.5408   LearningRate 0.0034   Epoch: 16   Global Step: 82440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:06,000-Speed 5583.43 samples/sec   Loss 2.4467   LearningRate 0.0034   Epoch: 16   Global Step: 82450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:07,831-Speed 5595.62 samples/sec   Loss 2.4318   LearningRate 0.0034   Epoch: 16   Global Step: 82460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:34:09,676-Speed 5550.71 samples/sec   Loss 2.4449   LearningRate 0.0034   Epoch: 16   Global Step: 82470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:34:11,513-Speed 5577.51 samples/sec   Loss 2.4858   LearningRate 0.0034   Epoch: 16   Global Step: 82480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:34:13,358-Speed 5552.35 samples/sec   Loss 2.5064   LearningRate 0.0034   Epoch: 16   Global Step: 82490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:34:15,196-Speed 5573.10 samples/sec   Loss 2.5588   LearningRate 0.0034   Epoch: 16   Global Step: 82500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:34:17,038-Speed 5563.48 samples/sec   Loss 2.4440   LearningRate 0.0034   Epoch: 16   Global Step: 82510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:34:18,870-Speed 5591.79 samples/sec   Loss 2.5730   LearningRate 0.0034   Epoch: 16   Global Step: 82520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:34:20,704-Speed 5583.16 samples/sec   Loss 2.5229   LearningRate 0.0034   Epoch: 16   Global Step: 82530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:34:22,552-Speed 5542.57 samples/sec   Loss 2.5694   LearningRate 0.0034   Epoch: 16   Global Step: 82540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:34:24,394-Speed 5563.95 samples/sec   Loss 2.5057   LearningRate 0.0034   Epoch: 16   Global Step: 82550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:34:26,224-Speed 5597.76 samples/sec   Loss 2.5628   LearningRate 0.0034   Epoch: 16   Global Step: 82560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:28,057-Speed 5585.76 samples/sec   Loss 2.4979   LearningRate 0.0034   Epoch: 16   Global Step: 82570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:29,894-Speed 5578.12 samples/sec   Loss 2.4954   LearningRate 0.0034   Epoch: 16   Global Step: 82580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:31,737-Speed 5558.34 samples/sec   Loss 2.5539   LearningRate 0.0034   Epoch: 16   Global Step: 82590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:33,572-Speed 5581.14 samples/sec   Loss 2.4884   LearningRate 0.0034   Epoch: 16   Global Step: 82600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:35,420-Speed 5545.81 samples/sec   Loss 2.6136   LearningRate 0.0034   Epoch: 16   Global Step: 82610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:37,276-Speed 5519.83 samples/sec   Loss 2.4419   LearningRate 0.0034   Epoch: 16   Global Step: 82620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:39,113-Speed 5574.78 samples/sec   Loss 2.5546   LearningRate 0.0034   Epoch: 16   Global Step: 82630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:40,954-Speed 5564.57 samples/sec   Loss 2.4588   LearningRate 0.0034   Epoch: 16   Global Step: 82640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:42,792-Speed 5572.53 samples/sec   Loss 2.5684   LearningRate 0.0033   Epoch: 16   Global Step: 82650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:44,625-Speed 5590.79 samples/sec   Loss 2.5423   LearningRate 0.0033   Epoch: 16   Global Step: 82660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:34:46,455-Speed 5597.66 samples/sec   Loss 2.5593   LearningRate 0.0033   Epoch: 16   Global Step: 82670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:48,284-Speed 5601.30 samples/sec   Loss 2.3767   LearningRate 0.0033   Epoch: 16   Global Step: 82680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:50,113-Speed 5598.62 samples/sec   Loss 2.5342   LearningRate 0.0033   Epoch: 16   Global Step: 82690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:51,962-Speed 5539.26 samples/sec   Loss 2.5413   LearningRate 0.0033   Epoch: 16   Global Step: 82700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:53,805-Speed 5559.02 samples/sec   Loss 2.5046   LearningRate 0.0033   Epoch: 16   Global Step: 82710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:55,647-Speed 5562.12 samples/sec   Loss 2.5380   LearningRate 0.0033   Epoch: 16   Global Step: 82720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:57,487-Speed 5567.32 samples/sec   Loss 2.5713   LearningRate 0.0033   Epoch: 16   Global Step: 82730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:34:59,326-Speed 5570.57 samples/sec   Loss 2.5544   LearningRate 0.0033   Epoch: 16   Global Step: 82740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:01,184-Speed 5514.95 samples/sec   Loss 2.4000   LearningRate 0.0033   Epoch: 16   Global Step: 82750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:03,028-Speed 5552.75 samples/sec   Loss 2.4246   LearningRate 0.0033   Epoch: 16   Global Step: 82760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:04,878-Speed 5539.07 samples/sec   Loss 2.5565   LearningRate 0.0033   Epoch: 16   Global Step: 82770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:06,732-Speed 5524.09 samples/sec   Loss 2.5228   LearningRate 0.0033   Epoch: 16   Global Step: 82780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:08,564-Speed 5592.09 samples/sec   Loss 2.5267   LearningRate 0.0033   Epoch: 16   Global Step: 82790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:10,427-Speed 5498.05 samples/sec   Loss 2.4869   LearningRate 0.0033   Epoch: 16   Global Step: 82800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:35:12,262-Speed 5584.54 samples/sec   Loss 2.4514   LearningRate 0.0033   Epoch: 16   Global Step: 82810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:35:14,125-Speed 5496.81 samples/sec   Loss 2.5166   LearningRate 0.0033   Epoch: 16   Global Step: 82820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:35:15,965-Speed 5568.17 samples/sec   Loss 2.4902   LearningRate 0.0033   Epoch: 16   Global Step: 82830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:35:17,797-Speed 5593.71 samples/sec   Loss 2.5899   LearningRate 0.0033   Epoch: 16   Global Step: 82840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:35:19,631-Speed 5583.83 samples/sec   Loss 2.5122   LearningRate 0.0033   Epoch: 16   Global Step: 82850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:35:21,469-Speed 5574.69 samples/sec   Loss 2.5677   LearningRate 0.0033   Epoch: 16   Global Step: 82860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:35:23,328-Speed 5509.75 samples/sec   Loss 2.5945   LearningRate 0.0033   Epoch: 16   Global Step: 82870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:35:25,175-Speed 5546.97 samples/sec   Loss 2.5890   LearningRate 0.0033   Epoch: 16   Global Step: 82880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:35:27,008-Speed 5586.62 samples/sec   Loss 2.5391   LearningRate 0.0033   Epoch: 16   Global Step: 82890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:35:28,839-Speed 5596.11 samples/sec   Loss 2.5043   LearningRate 0.0033   Epoch: 16   Global Step: 82900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:30,675-Speed 5579.86 samples/sec   Loss 2.5338   LearningRate 0.0033   Epoch: 16   Global Step: 82910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:32,508-Speed 5588.55 samples/sec   Loss 2.5693   LearningRate 0.0033   Epoch: 16   Global Step: 82920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:34,349-Speed 5565.78 samples/sec   Loss 2.4171   LearningRate 0.0032   Epoch: 16   Global Step: 82930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:36,185-Speed 5577.27 samples/sec   Loss 2.5561   LearningRate 0.0032   Epoch: 16   Global Step: 82940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:38,022-Speed 5578.36 samples/sec   Loss 2.5073   LearningRate 0.0032   Epoch: 16   Global Step: 82950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:39,853-Speed 5594.91 samples/sec   Loss 2.4888   LearningRate 0.0032   Epoch: 16   Global Step: 82960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:41,692-Speed 5569.39 samples/sec   Loss 2.5899   LearningRate 0.0032   Epoch: 16   Global Step: 82970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:43,520-Speed 5603.83 samples/sec   Loss 2.6715   LearningRate 0.0032   Epoch: 16   Global Step: 82980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:45,360-Speed 5567.35 samples/sec   Loss 2.6035   LearningRate 0.0032   Epoch: 16   Global Step: 82990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:47,195-Speed 5581.51 samples/sec   Loss 2.4804   LearningRate 0.0032   Epoch: 16   Global Step: 83000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:35:49,031-Speed 5582.19 samples/sec   Loss 2.4699   LearningRate 0.0032   Epoch: 16   Global Step: 83010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:35:50,891-Speed 5505.15 samples/sec   Loss 2.6646   LearningRate 0.0032   Epoch: 16   Global Step: 83020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:35:52,735-Speed 5554.38 samples/sec   Loss 2.5560   LearningRate 0.0032   Epoch: 16   Global Step: 83030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:54,578-Speed 5559.12 samples/sec   Loss 2.5374   LearningRate 0.0032   Epoch: 16   Global Step: 83040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:56,424-Speed 5550.76 samples/sec   Loss 2.5602   LearningRate 0.0032   Epoch: 16   Global Step: 83050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:35:58,259-Speed 5582.30 samples/sec   Loss 2.5600   LearningRate 0.0032   Epoch: 16   Global Step: 83060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:00,091-Speed 5592.44 samples/sec   Loss 2.4932   LearningRate 0.0032   Epoch: 16   Global Step: 83070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:01,926-Speed 5580.74 samples/sec   Loss 2.5281   LearningRate 0.0032   Epoch: 16   Global Step: 83080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:03,766-Speed 5570.54 samples/sec   Loss 2.5335   LearningRate 0.0032   Epoch: 16   Global Step: 83090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:05,600-Speed 5584.42 samples/sec   Loss 2.5946   LearningRate 0.0032   Epoch: 16   Global Step: 83100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:07,431-Speed 5594.10 samples/sec   Loss 2.5151   LearningRate 0.0032   Epoch: 16   Global Step: 83110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:09,266-Speed 5583.62 samples/sec   Loss 2.5214   LearningRate 0.0032   Epoch: 16   Global Step: 83120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:36:11,102-Speed 5577.50 samples/sec   Loss 2.4819   LearningRate 0.0032   Epoch: 16   Global Step: 83130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:36:12,942-Speed 5566.32 samples/sec   Loss 2.4919   LearningRate 0.0032   Epoch: 16   Global Step: 83140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:36:14,783-Speed 5565.70 samples/sec   Loss 2.4987   LearningRate 0.0032   Epoch: 16   Global Step: 83150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:36:16,641-Speed 5512.88 samples/sec   Loss 2.5314   LearningRate 0.0032   Epoch: 16   Global Step: 83160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:36:18,477-Speed 5580.68 samples/sec   Loss 2.5077   LearningRate 0.0032   Epoch: 16   Global Step: 83170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:36:20,310-Speed 5589.72 samples/sec   Loss 2.5284   LearningRate 0.0032   Epoch: 16   Global Step: 83180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:36:22,143-Speed 5587.73 samples/sec   Loss 2.5926   LearningRate 0.0032   Epoch: 16   Global Step: 83190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:36:23,990-Speed 5546.33 samples/sec   Loss 2.5021   LearningRate 0.0032   Epoch: 16   Global Step: 83200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:36:25,832-Speed 5561.67 samples/sec   Loss 2.5583   LearningRate 0.0031   Epoch: 16   Global Step: 83210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:36:27,667-Speed 5581.55 samples/sec   Loss 2.6502   LearningRate 0.0031   Epoch: 16   Global Step: 83220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:29,499-Speed 5592.71 samples/sec   Loss 2.5383   LearningRate 0.0031   Epoch: 16   Global Step: 83230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:31,331-Speed 5590.32 samples/sec   Loss 2.4953   LearningRate 0.0031   Epoch: 16   Global Step: 83240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:33,177-Speed 5547.30 samples/sec   Loss 2.3903   LearningRate 0.0031   Epoch: 16   Global Step: 83250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:35,014-Speed 5579.46 samples/sec   Loss 2.5497   LearningRate 0.0031   Epoch: 16   Global Step: 83260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:36,846-Speed 5591.04 samples/sec   Loss 2.5464   LearningRate 0.0031   Epoch: 16   Global Step: 83270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:38,683-Speed 5577.37 samples/sec   Loss 2.5297   LearningRate 0.0031   Epoch: 16   Global Step: 83280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:40,531-Speed 5542.55 samples/sec   Loss 2.5205   LearningRate 0.0031   Epoch: 16   Global Step: 83290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:42,365-Speed 5583.42 samples/sec   Loss 2.5452   LearningRate 0.0031   Epoch: 16   Global Step: 83300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:44,205-Speed 5568.37 samples/sec   Loss 2.6124   LearningRate 0.0031   Epoch: 16   Global Step: 83310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:46,035-Speed 5597.03 samples/sec   Loss 2.5144   LearningRate 0.0031   Epoch: 16   Global Step: 83320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:36:47,868-Speed 5589.87 samples/sec   Loss 2.4885   LearningRate 0.0031   Epoch: 16   Global Step: 83330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:36:49,705-Speed 5574.92 samples/sec   Loss 2.5468   LearningRate 0.0031   Epoch: 16   Global Step: 83340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:36:51,541-Speed 5581.04 samples/sec   Loss 2.5819   LearningRate 0.0031   Epoch: 16   Global Step: 83350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:53,378-Speed 5575.34 samples/sec   Loss 2.4722   LearningRate 0.0031   Epoch: 16   Global Step: 83360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:55,212-Speed 5586.23 samples/sec   Loss 2.5878   LearningRate 0.0031   Epoch: 16   Global Step: 83370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:57,044-Speed 5591.98 samples/sec   Loss 2.4785   LearningRate 0.0031   Epoch: 16   Global Step: 83380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:36:58,889-Speed 5554.46 samples/sec   Loss 2.4129   LearningRate 0.0031   Epoch: 16   Global Step: 83390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:00,756-Speed 5486.42 samples/sec   Loss 2.5878   LearningRate 0.0031   Epoch: 16   Global Step: 83400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:02,634-Speed 5451.53 samples/sec   Loss 2.5539   LearningRate 0.0031   Epoch: 16   Global Step: 83410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:04,469-Speed 5585.87 samples/sec   Loss 2.4881   LearningRate 0.0031   Epoch: 16   Global Step: 83420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:06,309-Speed 5566.79 samples/sec   Loss 2.5925   LearningRate 0.0031   Epoch: 16   Global Step: 83430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:08,145-Speed 5576.92 samples/sec   Loss 2.6171   LearningRate 0.0031   Epoch: 16   Global Step: 83440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:09,983-Speed 5576.17 samples/sec   Loss 2.4207   LearningRate 0.0031   Epoch: 16   Global Step: 83450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:11,831-Speed 5541.77 samples/sec   Loss 2.5586   LearningRate 0.0031   Epoch: 16   Global Step: 83460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:13,694-Speed 5498.20 samples/sec   Loss 2.5597   LearningRate 0.0031   Epoch: 16   Global Step: 83470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:15,534-Speed 5569.60 samples/sec   Loss 2.5899   LearningRate 0.0031   Epoch: 16   Global Step: 83480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:17,378-Speed 5553.88 samples/sec   Loss 2.4934   LearningRate 0.0031   Epoch: 16   Global Step: 83490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:19,216-Speed 5575.94 samples/sec   Loss 2.5469   LearningRate 0.0030   Epoch: 16   Global Step: 83500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:21,048-Speed 5590.27 samples/sec   Loss 2.5742   LearningRate 0.0030   Epoch: 16   Global Step: 83510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:22,882-Speed 5585.76 samples/sec   Loss 2.4577   LearningRate 0.0030   Epoch: 16   Global Step: 83520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:24,721-Speed 5569.67 samples/sec   Loss 2.4514   LearningRate 0.0030   Epoch: 16   Global Step: 83530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:26,556-Speed 5581.70 samples/sec   Loss 2.4947   LearningRate 0.0030   Epoch: 16   Global Step: 83540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:28,401-Speed 5551.03 samples/sec   Loss 2.4609   LearningRate 0.0030   Epoch: 16   Global Step: 83550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:30,238-Speed 5578.12 samples/sec   Loss 2.6306   LearningRate 0.0030   Epoch: 16   Global Step: 83560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:32,072-Speed 5586.51 samples/sec   Loss 2.5633   LearningRate 0.0030   Epoch: 16   Global Step: 83570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:33,910-Speed 5571.98 samples/sec   Loss 2.6453   LearningRate 0.0030   Epoch: 16   Global Step: 83580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:35,742-Speed 5593.15 samples/sec   Loss 2.5962   LearningRate 0.0030   Epoch: 16   Global Step: 83590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:37:37,601-Speed 5509.54 samples/sec   Loss 2.6716   LearningRate 0.0030   Epoch: 16   Global Step: 83600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:37:39,439-Speed 5575.84 samples/sec   Loss 2.4481   LearningRate 0.0030   Epoch: 16   Global Step: 83610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:37:41,277-Speed 5573.09 samples/sec   Loss 2.5061   LearningRate 0.0030   Epoch: 16   Global Step: 83620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:43,119-Speed 5560.90 samples/sec   Loss 2.5007   LearningRate 0.0030   Epoch: 16   Global Step: 83630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:44,955-Speed 5580.64 samples/sec   Loss 2.5590   LearningRate 0.0030   Epoch: 16   Global Step: 83640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:46,812-Speed 5513.90 samples/sec   Loss 2.5580   LearningRate 0.0030   Epoch: 16   Global Step: 83650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:48,647-Speed 5583.93 samples/sec   Loss 2.5885   LearningRate 0.0030   Epoch: 16   Global Step: 83660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:50,484-Speed 5574.35 samples/sec   Loss 2.5697   LearningRate 0.0030   Epoch: 16   Global Step: 83670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:52,324-Speed 5568.30 samples/sec   Loss 2.4997   LearningRate 0.0030   Epoch: 16   Global Step: 83680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:54,159-Speed 5583.98 samples/sec   Loss 2.4648   LearningRate 0.0030   Epoch: 16   Global Step: 83690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:55,996-Speed 5574.28 samples/sec   Loss 2.3730   LearningRate 0.0030   Epoch: 16   Global Step: 83700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:37:57,829-Speed 5587.86 samples/sec   Loss 2.5645   LearningRate 0.0030   Epoch: 16   Global Step: 83710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:37:59,668-Speed 5571.54 samples/sec   Loss 2.5599   LearningRate 0.0030   Epoch: 16   Global Step: 83720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:01,499-Speed 5595.93 samples/sec   Loss 2.4715   LearningRate 0.0030   Epoch: 16   Global Step: 83730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:03,337-Speed 5574.13 samples/sec   Loss 2.5393   LearningRate 0.0030   Epoch: 16   Global Step: 83740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:05,174-Speed 5575.63 samples/sec   Loss 2.5842   LearningRate 0.0030   Epoch: 16   Global Step: 83750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:07,012-Speed 5574.46 samples/sec   Loss 2.4920   LearningRate 0.0030   Epoch: 16   Global Step: 83760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:08,846-Speed 5585.12 samples/sec   Loss 2.5493   LearningRate 0.0030   Epoch: 16   Global Step: 83770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:10,678-Speed 5591.25 samples/sec   Loss 2.6101   LearningRate 0.0030   Epoch: 16   Global Step: 83780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:12,529-Speed 5532.69 samples/sec   Loss 2.5850   LearningRate 0.0029   Epoch: 16   Global Step: 83790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:14,372-Speed 5560.50 samples/sec   Loss 2.5812   LearningRate 0.0029   Epoch: 16   Global Step: 83800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:16,212-Speed 5566.70 samples/sec   Loss 2.4754   LearningRate 0.0029   Epoch: 16   Global Step: 83810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:38:18,052-Speed 5568.08 samples/sec   Loss 2.4815   LearningRate 0.0029   Epoch: 16   Global Step: 83820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:38:19,876-Speed 5617.85 samples/sec   Loss 2.6301   LearningRate 0.0029   Epoch: 16   Global Step: 83830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:21,713-Speed 5576.43 samples/sec   Loss 2.4872   LearningRate 0.0029   Epoch: 16   Global Step: 83840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:23,546-Speed 5589.19 samples/sec   Loss 2.5060   LearningRate 0.0029   Epoch: 16   Global Step: 83850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:25,414-Speed 5482.60 samples/sec   Loss 2.4942   LearningRate 0.0029   Epoch: 16   Global Step: 83860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:27,250-Speed 5578.55 samples/sec   Loss 2.4900   LearningRate 0.0029   Epoch: 16   Global Step: 83870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:29,093-Speed 5558.80 samples/sec   Loss 2.6070   LearningRate 0.0029   Epoch: 16   Global Step: 83880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:30,945-Speed 5530.66 samples/sec   Loss 2.6585   LearningRate 0.0029   Epoch: 16   Global Step: 83890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:32,785-Speed 5566.76 samples/sec   Loss 2.5679   LearningRate 0.0029   Epoch: 16   Global Step: 83900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:34,624-Speed 5569.58 samples/sec   Loss 2.5026   LearningRate 0.0029   Epoch: 16   Global Step: 83910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:36,470-Speed 5548.71 samples/sec   Loss 2.5938   LearningRate 0.0029   Epoch: 16   Global Step: 83920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:38,311-Speed 5565.04 samples/sec   Loss 2.5839   LearningRate 0.0029   Epoch: 16   Global Step: 83930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:38:40,153-Speed 5564.80 samples/sec   Loss 2.5503   LearningRate 0.0029   Epoch: 16   Global Step: 83940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:38:41,978-Speed 5612.19 samples/sec   Loss 2.5337   LearningRate 0.0029   Epoch: 16   Global Step: 83950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:43,810-Speed 5591.91 samples/sec   Loss 2.5422   LearningRate 0.0029   Epoch: 16   Global Step: 83960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:45,646-Speed 5580.39 samples/sec   Loss 2.5460   LearningRate 0.0029   Epoch: 16   Global Step: 83970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:47,502-Speed 5516.74 samples/sec   Loss 2.5877   LearningRate 0.0029   Epoch: 16   Global Step: 83980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:49,362-Speed 5510.10 samples/sec   Loss 2.5625   LearningRate 0.0029   Epoch: 16   Global Step: 83990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:38:51,226-Speed 5495.48 samples/sec   Loss 2.6409   LearningRate 0.0029   Epoch: 16   Global Step: 84000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:39:17,696-[lfw][84000]XNorm: 22.333789
Training: 2022-04-11 15:39:17,697-[lfw][84000]Accuracy-Flip: 0.99800+-0.00287
Training: 2022-04-11 15:39:17,697-[lfw][84000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:39:48,300-[cfp_fp][84000]XNorm: 20.998311
Training: 2022-04-11 15:39:48,301-[cfp_fp][84000]Accuracy-Flip: 0.98186+-0.00723
Training: 2022-04-11 15:39:48,302-[cfp_fp][84000]Accuracy-Highest: 0.98243
Training: 2022-04-11 15:40:14,643-[agedb_30][84000]XNorm: 22.086878
Training: 2022-04-11 15:40:14,644-[agedb_30][84000]Accuracy-Flip: 0.98100+-0.00723
Training: 2022-04-11 15:40:14,644-[agedb_30][84000]Accuracy-Highest: 0.98350
Training: 2022-04-11 15:40:16,504-Speed 120.08 samples/sec   Loss 2.7031   LearningRate 0.0029   Epoch: 16   Global Step: 84010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:40:18,337-Speed 5589.82 samples/sec   Loss 2.5498   LearningRate 0.0029   Epoch: 16   Global Step: 84020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:40:20,163-Speed 5609.97 samples/sec   Loss 2.5650   LearningRate 0.0029   Epoch: 16   Global Step: 84030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:40:22,002-Speed 5568.19 samples/sec   Loss 2.5536   LearningRate 0.0029   Epoch: 16   Global Step: 84040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:40:23,851-Speed 5539.83 samples/sec   Loss 2.6660   LearningRate 0.0029   Epoch: 16   Global Step: 84050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:25,681-Speed 5599.29 samples/sec   Loss 2.6107   LearningRate 0.0029   Epoch: 16   Global Step: 84060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:27,523-Speed 5559.62 samples/sec   Loss 2.5476   LearningRate 0.0029   Epoch: 16   Global Step: 84070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:29,355-Speed 5592.22 samples/sec   Loss 2.4689   LearningRate 0.0029   Epoch: 16   Global Step: 84080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:31,195-Speed 5566.19 samples/sec   Loss 2.4982   LearningRate 0.0028   Epoch: 16   Global Step: 84090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:33,031-Speed 5581.27 samples/sec   Loss 2.6658   LearningRate 0.0028   Epoch: 16   Global Step: 84100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:34,870-Speed 5569.90 samples/sec   Loss 2.6137   LearningRate 0.0028   Epoch: 16   Global Step: 84110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:36,704-Speed 5583.90 samples/sec   Loss 2.4902   LearningRate 0.0028   Epoch: 16   Global Step: 84120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:38,531-Speed 5605.96 samples/sec   Loss 2.4538   LearningRate 0.0028   Epoch: 16   Global Step: 84130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:40,362-Speed 5597.48 samples/sec   Loss 2.5876   LearningRate 0.0028   Epoch: 16   Global Step: 84140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:42,201-Speed 5569.12 samples/sec   Loss 2.5593   LearningRate 0.0028   Epoch: 16   Global Step: 84150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:40:44,023-Speed 5623.35 samples/sec   Loss 2.5635   LearningRate 0.0028   Epoch: 16   Global Step: 84160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:45,855-Speed 5590.64 samples/sec   Loss 2.5527   LearningRate 0.0028   Epoch: 16   Global Step: 84170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:47,702-Speed 5547.51 samples/sec   Loss 2.4517   LearningRate 0.0028   Epoch: 16   Global Step: 84180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:49,536-Speed 5584.29 samples/sec   Loss 2.6163   LearningRate 0.0028   Epoch: 16   Global Step: 84190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:51,369-Speed 5588.95 samples/sec   Loss 2.6010   LearningRate 0.0028   Epoch: 16   Global Step: 84200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:53,213-Speed 5555.23 samples/sec   Loss 2.5462   LearningRate 0.0028   Epoch: 16   Global Step: 84210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:55,060-Speed 5545.71 samples/sec   Loss 2.4956   LearningRate 0.0028   Epoch: 16   Global Step: 84220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:56,893-Speed 5590.10 samples/sec   Loss 2.5864   LearningRate 0.0028   Epoch: 16   Global Step: 84230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:40:58,722-Speed 5598.07 samples/sec   Loss 2.5964   LearningRate 0.0028   Epoch: 16   Global Step: 84240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:00,570-Speed 5546.46 samples/sec   Loss 2.4879   LearningRate 0.0028   Epoch: 16   Global Step: 84250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:02,407-Speed 5576.67 samples/sec   Loss 2.5913   LearningRate 0.0028   Epoch: 16   Global Step: 84260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:04,249-Speed 5559.45 samples/sec   Loss 2.5325   LearningRate 0.0028   Epoch: 16   Global Step: 84270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:06,084-Speed 5582.45 samples/sec   Loss 2.5053   LearningRate 0.0028   Epoch: 16   Global Step: 84280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:07,934-Speed 5538.37 samples/sec   Loss 2.4832   LearningRate 0.0028   Epoch: 16   Global Step: 84290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:09,777-Speed 5559.47 samples/sec   Loss 2.5318   LearningRate 0.0028   Epoch: 16   Global Step: 84300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:11,629-Speed 5530.97 samples/sec   Loss 2.5075   LearningRate 0.0028   Epoch: 16   Global Step: 84310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:13,469-Speed 5566.70 samples/sec   Loss 2.5582   LearningRate 0.0028   Epoch: 16   Global Step: 84320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:15,310-Speed 5563.07 samples/sec   Loss 2.6294   LearningRate 0.0028   Epoch: 16   Global Step: 84330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:17,146-Speed 5580.31 samples/sec   Loss 2.5982   LearningRate 0.0028   Epoch: 16   Global Step: 84340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:18,993-Speed 5545.43 samples/sec   Loss 2.5141   LearningRate 0.0028   Epoch: 16   Global Step: 84350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:20,827-Speed 5586.81 samples/sec   Loss 2.5169   LearningRate 0.0028   Epoch: 16   Global Step: 84360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:41:22,653-Speed 5611.82 samples/sec   Loss 2.5265   LearningRate 0.0028   Epoch: 16   Global Step: 84370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:24,502-Speed 5538.39 samples/sec   Loss 2.5668   LearningRate 0.0028   Epoch: 16   Global Step: 84380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:26,345-Speed 5559.54 samples/sec   Loss 2.5507   LearningRate 0.0027   Epoch: 16   Global Step: 84390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:28,191-Speed 5548.87 samples/sec   Loss 2.4747   LearningRate 0.0027   Epoch: 16   Global Step: 84400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:30,023-Speed 5592.84 samples/sec   Loss 2.5384   LearningRate 0.0027   Epoch: 16   Global Step: 84410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:31,856-Speed 5585.76 samples/sec   Loss 2.5948   LearningRate 0.0027   Epoch: 16   Global Step: 84420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:33,694-Speed 5574.91 samples/sec   Loss 2.5978   LearningRate 0.0027   Epoch: 16   Global Step: 84430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:35,530-Speed 5579.93 samples/sec   Loss 2.5228   LearningRate 0.0027   Epoch: 16   Global Step: 84440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:37,370-Speed 5565.32 samples/sec   Loss 2.5778   LearningRate 0.0027   Epoch: 16   Global Step: 84450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:39,203-Speed 5590.44 samples/sec   Loss 2.6101   LearningRate 0.0027   Epoch: 16   Global Step: 84460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:41,039-Speed 5579.26 samples/sec   Loss 2.5228   LearningRate 0.0027   Epoch: 16   Global Step: 84470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:42,874-Speed 5583.32 samples/sec   Loss 2.5550   LearningRate 0.0027   Epoch: 16   Global Step: 84480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:44,706-Speed 5590.48 samples/sec   Loss 2.4496   LearningRate 0.0027   Epoch: 16   Global Step: 84490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:46,544-Speed 5573.88 samples/sec   Loss 2.5564   LearningRate 0.0027   Epoch: 16   Global Step: 84500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:48,408-Speed 5496.29 samples/sec   Loss 2.6324   LearningRate 0.0027   Epoch: 16   Global Step: 84510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:50,262-Speed 5525.36 samples/sec   Loss 2.4521   LearningRate 0.0027   Epoch: 16   Global Step: 84520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:52,129-Speed 5485.00 samples/sec   Loss 2.5780   LearningRate 0.0027   Epoch: 16   Global Step: 84530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:53,999-Speed 5479.67 samples/sec   Loss 2.6432   LearningRate 0.0027   Epoch: 16   Global Step: 84540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:55,842-Speed 5558.60 samples/sec   Loss 2.5009   LearningRate 0.0027   Epoch: 16   Global Step: 84550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:57,675-Speed 5587.41 samples/sec   Loss 2.5983   LearningRate 0.0027   Epoch: 16   Global Step: 84560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:41:59,517-Speed 5561.11 samples/sec   Loss 2.5115   LearningRate 0.0027   Epoch: 16   Global Step: 84570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:01,376-Speed 5512.45 samples/sec   Loss 2.5376   LearningRate 0.0027   Epoch: 16   Global Step: 84580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:03,255-Speed 5451.08 samples/sec   Loss 2.5051   LearningRate 0.0027   Epoch: 16   Global Step: 84590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:05,119-Speed 5494.77 samples/sec   Loss 2.6524   LearningRate 0.0027   Epoch: 16   Global Step: 84600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:06,961-Speed 5562.31 samples/sec   Loss 2.4692   LearningRate 0.0027   Epoch: 16   Global Step: 84610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:08,793-Speed 5591.01 samples/sec   Loss 2.6177   LearningRate 0.0027   Epoch: 16   Global Step: 84620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:10,631-Speed 5573.79 samples/sec   Loss 2.5899   LearningRate 0.0027   Epoch: 16   Global Step: 84630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:12,471-Speed 5565.49 samples/sec   Loss 2.6619   LearningRate 0.0027   Epoch: 16   Global Step: 84640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:14,320-Speed 5540.35 samples/sec   Loss 2.5340   LearningRate 0.0027   Epoch: 16   Global Step: 84650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:16,209-Speed 5422.71 samples/sec   Loss 2.5323   LearningRate 0.0027   Epoch: 16   Global Step: 84660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:18,052-Speed 5559.20 samples/sec   Loss 2.5673   LearningRate 0.0027   Epoch: 16   Global Step: 84670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:42:19,879-Speed 5607.22 samples/sec   Loss 2.5017   LearningRate 0.0027   Epoch: 16   Global Step: 84680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:21,716-Speed 5578.18 samples/sec   Loss 2.4994   LearningRate 0.0027   Epoch: 16   Global Step: 84690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:23,548-Speed 5591.58 samples/sec   Loss 2.4984   LearningRate 0.0026   Epoch: 16   Global Step: 84700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:25,381-Speed 5588.14 samples/sec   Loss 2.5212   LearningRate 0.0026   Epoch: 16   Global Step: 84710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:27,218-Speed 5576.32 samples/sec   Loss 2.5089   LearningRate 0.0026   Epoch: 16   Global Step: 84720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:29,053-Speed 5582.07 samples/sec   Loss 2.5307   LearningRate 0.0026   Epoch: 16   Global Step: 84730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:30,884-Speed 5594.18 samples/sec   Loss 2.4716   LearningRate 0.0026   Epoch: 16   Global Step: 84740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:32,723-Speed 5570.98 samples/sec   Loss 2.4939   LearningRate 0.0026   Epoch: 16   Global Step: 84750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:34,559-Speed 5579.52 samples/sec   Loss 2.5125   LearningRate 0.0026   Epoch: 16   Global Step: 84760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:36,406-Speed 5544.94 samples/sec   Loss 2.5883   LearningRate 0.0026   Epoch: 16   Global Step: 84770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:38,250-Speed 5556.35 samples/sec   Loss 2.5442   LearningRate 0.0026   Epoch: 16   Global Step: 84780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:42:40,099-Speed 5539.43 samples/sec   Loss 2.5098   LearningRate 0.0026   Epoch: 16   Global Step: 84790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:41,943-Speed 5556.24 samples/sec   Loss 2.4843   LearningRate 0.0026   Epoch: 16   Global Step: 84800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:43,783-Speed 5566.53 samples/sec   Loss 2.6409   LearningRate 0.0026   Epoch: 16   Global Step: 84810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:45,623-Speed 5569.43 samples/sec   Loss 2.5265   LearningRate 0.0026   Epoch: 16   Global Step: 84820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:47,455-Speed 5590.15 samples/sec   Loss 2.6081   LearningRate 0.0026   Epoch: 16   Global Step: 84830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:49,295-Speed 5567.50 samples/sec   Loss 2.4996   LearningRate 0.0026   Epoch: 16   Global Step: 84840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:51,130-Speed 5582.81 samples/sec   Loss 2.4659   LearningRate 0.0026   Epoch: 16   Global Step: 84850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:52,968-Speed 5573.38 samples/sec   Loss 2.6147   LearningRate 0.0026   Epoch: 16   Global Step: 84860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:54,807-Speed 5570.92 samples/sec   Loss 2.5995   LearningRate 0.0026   Epoch: 16   Global Step: 84870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:56,641-Speed 5583.61 samples/sec   Loss 2.5863   LearningRate 0.0026   Epoch: 16   Global Step: 84880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:42:58,500-Speed 5511.70 samples/sec   Loss 2.4705   LearningRate 0.0026   Epoch: 16   Global Step: 84890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:00,341-Speed 5564.13 samples/sec   Loss 2.5381   LearningRate 0.0026   Epoch: 16   Global Step: 84900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:02,199-Speed 5514.76 samples/sec   Loss 2.5285   LearningRate 0.0026   Epoch: 16   Global Step: 84910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:04,029-Speed 5595.32 samples/sec   Loss 2.6482   LearningRate 0.0026   Epoch: 16   Global Step: 84920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:05,874-Speed 5553.27 samples/sec   Loss 2.7424   LearningRate 0.0026   Epoch: 16   Global Step: 84930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:07,711-Speed 5577.67 samples/sec   Loss 2.4594   LearningRate 0.0026   Epoch: 16   Global Step: 84940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:09,545-Speed 5584.88 samples/sec   Loss 2.4991   LearningRate 0.0026   Epoch: 16   Global Step: 84950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:11,404-Speed 5509.18 samples/sec   Loss 2.5684   LearningRate 0.0026   Epoch: 16   Global Step: 84960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:13,244-Speed 5566.21 samples/sec   Loss 2.5109   LearningRate 0.0026   Epoch: 16   Global Step: 84970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:15,086-Speed 5562.10 samples/sec   Loss 2.6311   LearningRate 0.0026   Epoch: 16   Global Step: 84980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:16,926-Speed 5569.16 samples/sec   Loss 2.5062   LearningRate 0.0026   Epoch: 16   Global Step: 84990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:18,762-Speed 5578.29 samples/sec   Loss 2.5671   LearningRate 0.0026   Epoch: 16   Global Step: 85000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:20,603-Speed 5564.09 samples/sec   Loss 2.5523   LearningRate 0.0025   Epoch: 16   Global Step: 85010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:22,437-Speed 5587.58 samples/sec   Loss 2.5850   LearningRate 0.0025   Epoch: 16   Global Step: 85020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:24,282-Speed 5549.94 samples/sec   Loss 2.4859   LearningRate 0.0025   Epoch: 16   Global Step: 85030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:26,128-Speed 5550.93 samples/sec   Loss 2.4840   LearningRate 0.0025   Epoch: 16   Global Step: 85040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:27,972-Speed 5555.93 samples/sec   Loss 2.4995   LearningRate 0.0025   Epoch: 16   Global Step: 85050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:29,810-Speed 5572.39 samples/sec   Loss 2.4252   LearningRate 0.0025   Epoch: 16   Global Step: 85060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:31,641-Speed 5596.10 samples/sec   Loss 2.4588   LearningRate 0.0025   Epoch: 16   Global Step: 85070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:33,475-Speed 5584.58 samples/sec   Loss 2.5588   LearningRate 0.0025   Epoch: 16   Global Step: 85080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:35,317-Speed 5561.29 samples/sec   Loss 2.5719   LearningRate 0.0025   Epoch: 16   Global Step: 85090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:43:37,153-Speed 5578.01 samples/sec   Loss 2.5228   LearningRate 0.0025   Epoch: 16   Global Step: 85100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:43:38,986-Speed 5590.13 samples/sec   Loss 2.6051   LearningRate 0.0025   Epoch: 16   Global Step: 85110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:40,833-Speed 5547.62 samples/sec   Loss 2.4514   LearningRate 0.0025   Epoch: 16   Global Step: 85120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:42,681-Speed 5540.80 samples/sec   Loss 2.6758   LearningRate 0.0025   Epoch: 16   Global Step: 85130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:44,514-Speed 5590.68 samples/sec   Loss 2.4748   LearningRate 0.0025   Epoch: 16   Global Step: 85140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:46,348-Speed 5586.36 samples/sec   Loss 2.5163   LearningRate 0.0025   Epoch: 16   Global Step: 85150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:48,181-Speed 5588.20 samples/sec   Loss 2.6435   LearningRate 0.0025   Epoch: 16   Global Step: 85160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:50,038-Speed 5517.39 samples/sec   Loss 2.5973   LearningRate 0.0025   Epoch: 16   Global Step: 85170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:43:51,872-Speed 5583.29 samples/sec   Loss 2.6147   LearningRate 0.0025   Epoch: 16   Global Step: 85180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:53,711-Speed 5569.65 samples/sec   Loss 2.4618   LearningRate 0.0025   Epoch: 16   Global Step: 85190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:55,555-Speed 5557.08 samples/sec   Loss 2.5057   LearningRate 0.0025   Epoch: 16   Global Step: 85200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:57,389-Speed 5586.23 samples/sec   Loss 2.3526   LearningRate 0.0025   Epoch: 16   Global Step: 85210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:43:59,221-Speed 5591.16 samples/sec   Loss 2.5596   LearningRate 0.0025   Epoch: 16   Global Step: 85220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:01,070-Speed 5540.25 samples/sec   Loss 2.6003   LearningRate 0.0025   Epoch: 16   Global Step: 85230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:02,914-Speed 5556.42 samples/sec   Loss 2.4876   LearningRate 0.0025   Epoch: 16   Global Step: 85240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:04,752-Speed 5572.82 samples/sec   Loss 2.4983   LearningRate 0.0025   Epoch: 16   Global Step: 85250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:06,589-Speed 5575.99 samples/sec   Loss 2.5824   LearningRate 0.0025   Epoch: 16   Global Step: 85260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:08,423-Speed 5584.75 samples/sec   Loss 2.5465   LearningRate 0.0025   Epoch: 16   Global Step: 85270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:10,258-Speed 5583.58 samples/sec   Loss 2.4987   LearningRate 0.0025   Epoch: 16   Global Step: 85280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:12,098-Speed 5568.49 samples/sec   Loss 2.4917   LearningRate 0.0025   Epoch: 16   Global Step: 85290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:13,944-Speed 5548.68 samples/sec   Loss 2.5253   LearningRate 0.0025   Epoch: 16   Global Step: 85300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:15,804-Speed 5508.06 samples/sec   Loss 2.5362   LearningRate 0.0025   Epoch: 16   Global Step: 85310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:17,635-Speed 5593.19 samples/sec   Loss 2.4407   LearningRate 0.0025   Epoch: 16   Global Step: 85320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:19,470-Speed 5581.00 samples/sec   Loss 2.4504   LearningRate 0.0024   Epoch: 16   Global Step: 85330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:21,308-Speed 5574.35 samples/sec   Loss 2.5011   LearningRate 0.0024   Epoch: 16   Global Step: 85340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:23,141-Speed 5587.80 samples/sec   Loss 2.5899   LearningRate 0.0024   Epoch: 16   Global Step: 85350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:24,976-Speed 5584.39 samples/sec   Loss 2.4285   LearningRate 0.0024   Epoch: 16   Global Step: 85360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:26,829-Speed 5529.80 samples/sec   Loss 2.5460   LearningRate 0.0024   Epoch: 16   Global Step: 85370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:28,680-Speed 5533.06 samples/sec   Loss 2.5377   LearningRate 0.0024   Epoch: 16   Global Step: 85380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:30,519-Speed 5570.61 samples/sec   Loss 2.5866   LearningRate 0.0024   Epoch: 16   Global Step: 85390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:32,355-Speed 5579.86 samples/sec   Loss 2.5742   LearningRate 0.0024   Epoch: 16   Global Step: 85400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:34,195-Speed 5566.44 samples/sec   Loss 2.4734   LearningRate 0.0024   Epoch: 16   Global Step: 85410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:36,031-Speed 5580.65 samples/sec   Loss 2.5683   LearningRate 0.0024   Epoch: 16   Global Step: 85420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:37,876-Speed 5551.61 samples/sec   Loss 2.5468   LearningRate 0.0024   Epoch: 16   Global Step: 85430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:39,714-Speed 5571.59 samples/sec   Loss 2.6313   LearningRate 0.0024   Epoch: 16   Global Step: 85440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:41,577-Speed 5498.78 samples/sec   Loss 2.5775   LearningRate 0.0024   Epoch: 16   Global Step: 85450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:43,411-Speed 5585.11 samples/sec   Loss 2.5703   LearningRate 0.0024   Epoch: 16   Global Step: 85460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:45,245-Speed 5588.47 samples/sec   Loss 2.5714   LearningRate 0.0024   Epoch: 16   Global Step: 85470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:47,094-Speed 5539.08 samples/sec   Loss 2.6359   LearningRate 0.0024   Epoch: 16   Global Step: 85480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:48,936-Speed 5562.76 samples/sec   Loss 2.6382   LearningRate 0.0024   Epoch: 16   Global Step: 85490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:50,780-Speed 5553.69 samples/sec   Loss 2.6018   LearningRate 0.0024   Epoch: 16   Global Step: 85500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:52,618-Speed 5575.67 samples/sec   Loss 2.5330   LearningRate 0.0024   Epoch: 16   Global Step: 85510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:54,453-Speed 5580.56 samples/sec   Loss 2.5956   LearningRate 0.0024   Epoch: 16   Global Step: 85520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:56,291-Speed 5575.59 samples/sec   Loss 2.5207   LearningRate 0.0024   Epoch: 16   Global Step: 85530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:44:58,119-Speed 5603.33 samples/sec   Loss 2.5720   LearningRate 0.0024   Epoch: 16   Global Step: 85540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:44:59,953-Speed 5585.99 samples/sec   Loss 2.5084   LearningRate 0.0024   Epoch: 16   Global Step: 85550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:01,789-Speed 5579.18 samples/sec   Loss 2.6042   LearningRate 0.0024   Epoch: 16   Global Step: 85560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:03,641-Speed 5530.57 samples/sec   Loss 2.4402   LearningRate 0.0024   Epoch: 16   Global Step: 85570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:05,501-Speed 5508.47 samples/sec   Loss 2.5837   LearningRate 0.0024   Epoch: 16   Global Step: 85580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:07,369-Speed 5480.82 samples/sec   Loss 2.4607   LearningRate 0.0024   Epoch: 16   Global Step: 85590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:09,215-Speed 5552.77 samples/sec   Loss 2.5046   LearningRate 0.0024   Epoch: 16   Global Step: 85600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:11,052-Speed 5576.92 samples/sec   Loss 2.4605   LearningRate 0.0024   Epoch: 16   Global Step: 85610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:12,896-Speed 5554.86 samples/sec   Loss 2.5489   LearningRate 0.0024   Epoch: 16   Global Step: 85620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:14,732-Speed 5577.28 samples/sec   Loss 2.6414   LearningRate 0.0024   Epoch: 16   Global Step: 85630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:16,573-Speed 5564.66 samples/sec   Loss 2.4809   LearningRate 0.0024   Epoch: 16   Global Step: 85640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:18,410-Speed 5578.07 samples/sec   Loss 2.5548   LearningRate 0.0024   Epoch: 16   Global Step: 85650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:20,243-Speed 5586.74 samples/sec   Loss 2.5824   LearningRate 0.0023   Epoch: 16   Global Step: 85660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:22,081-Speed 5574.88 samples/sec   Loss 2.5504   LearningRate 0.0023   Epoch: 16   Global Step: 85670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:23,969-Speed 5424.86 samples/sec   Loss 2.3725   LearningRate 0.0023   Epoch: 16   Global Step: 85680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:25,819-Speed 5536.42 samples/sec   Loss 2.5180   LearningRate 0.0023   Epoch: 16   Global Step: 85690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:27,662-Speed 5559.68 samples/sec   Loss 2.3729   LearningRate 0.0023   Epoch: 16   Global Step: 85700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:29,503-Speed 5562.06 samples/sec   Loss 2.5120   LearningRate 0.0023   Epoch: 16   Global Step: 85710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:31,355-Speed 5535.11 samples/sec   Loss 2.5966   LearningRate 0.0023   Epoch: 16   Global Step: 85720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:33,195-Speed 5567.05 samples/sec   Loss 2.5092   LearningRate 0.0023   Epoch: 16   Global Step: 85730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:35,028-Speed 5587.15 samples/sec   Loss 2.4933   LearningRate 0.0023   Epoch: 16   Global Step: 85740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:36,864-Speed 5580.82 samples/sec   Loss 2.4255   LearningRate 0.0023   Epoch: 16   Global Step: 85750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:38,702-Speed 5572.74 samples/sec   Loss 2.5397   LearningRate 0.0023   Epoch: 16   Global Step: 85760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:40,534-Speed 5590.29 samples/sec   Loss 2.5110   LearningRate 0.0023   Epoch: 16   Global Step: 85770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:42,384-Speed 5538.18 samples/sec   Loss 2.3998   LearningRate 0.0023   Epoch: 16   Global Step: 85780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:44,223-Speed 5569.84 samples/sec   Loss 2.4810   LearningRate 0.0023   Epoch: 16   Global Step: 85790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:45:46,063-Speed 5568.47 samples/sec   Loss 2.5840   LearningRate 0.0023   Epoch: 16   Global Step: 85800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:47,910-Speed 5543.79 samples/sec   Loss 2.4736   LearningRate 0.0023   Epoch: 16   Global Step: 85810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:49,752-Speed 5561.84 samples/sec   Loss 2.5520   LearningRate 0.0023   Epoch: 16   Global Step: 85820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:51,599-Speed 5546.72 samples/sec   Loss 2.5879   LearningRate 0.0023   Epoch: 16   Global Step: 85830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:53,455-Speed 5519.11 samples/sec   Loss 2.6130   LearningRate 0.0023   Epoch: 16   Global Step: 85840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:55,292-Speed 5578.61 samples/sec   Loss 2.5543   LearningRate 0.0023   Epoch: 16   Global Step: 85850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:57,128-Speed 5577.68 samples/sec   Loss 2.6232   LearningRate 0.0023   Epoch: 16   Global Step: 85860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:45:58,970-Speed 5562.58 samples/sec   Loss 2.4845   LearningRate 0.0023   Epoch: 16   Global Step: 85870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:00,813-Speed 5558.79 samples/sec   Loss 2.5017   LearningRate 0.0023   Epoch: 16   Global Step: 85880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:02,651-Speed 5570.61 samples/sec   Loss 2.5123   LearningRate 0.0023   Epoch: 16   Global Step: 85890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:04,492-Speed 5565.23 samples/sec   Loss 2.4952   LearningRate 0.0023   Epoch: 16   Global Step: 85900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:46:06,327-Speed 5584.42 samples/sec   Loss 2.6726   LearningRate 0.0023   Epoch: 16   Global Step: 85910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:08,164-Speed 5574.81 samples/sec   Loss 2.5375   LearningRate 0.0023   Epoch: 16   Global Step: 85920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:10,005-Speed 5565.15 samples/sec   Loss 2.5456   LearningRate 0.0023   Epoch: 16   Global Step: 85930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:11,867-Speed 5500.57 samples/sec   Loss 2.4701   LearningRate 0.0023   Epoch: 16   Global Step: 85940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:13,761-Speed 5408.78 samples/sec   Loss 2.5492   LearningRate 0.0023   Epoch: 16   Global Step: 85950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:15,611-Speed 5539.16 samples/sec   Loss 2.5191   LearningRate 0.0023   Epoch: 16   Global Step: 85960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:17,469-Speed 5511.97 samples/sec   Loss 2.4539   LearningRate 0.0023   Epoch: 16   Global Step: 85970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:19,397-Speed 5312.93 samples/sec   Loss 2.7140   LearningRate 0.0023   Epoch: 16   Global Step: 85980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:30,032-Speed 962.97 samples/sec   Loss 2.2654   LearningRate 0.0022   Epoch: 17   Global Step: 85990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:31,892-Speed 5508.66 samples/sec   Loss 1.9763   LearningRate 0.0022   Epoch: 17   Global Step: 86000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:46:58,452-[lfw][86000]XNorm: 22.249283
Training: 2022-04-11 15:46:58,453-[lfw][86000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 15:46:58,453-[lfw][86000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:47:28,991-[cfp_fp][86000]XNorm: 20.879434
Training: 2022-04-11 15:47:28,992-[cfp_fp][86000]Accuracy-Flip: 0.98386+-0.00586
Training: 2022-04-11 15:47:28,992-[cfp_fp][86000]Accuracy-Highest: 0.98386
Training: 2022-04-11 15:47:55,324-[agedb_30][86000]XNorm: 22.412029
Training: 2022-04-11 15:47:55,325-[agedb_30][86000]Accuracy-Flip: 0.98283+-0.00654
Training: 2022-04-11 15:47:55,326-[agedb_30][86000]Accuracy-Highest: 0.98350
Training: 2022-04-11 15:47:57,181-Speed 120.06 samples/sec   Loss 1.9425   LearningRate 0.0022   Epoch: 17   Global Step: 86010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:47:58,999-Speed 5632.54 samples/sec   Loss 1.8503   LearningRate 0.0022   Epoch: 17   Global Step: 86020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:00,838-Speed 5570.26 samples/sec   Loss 1.9287   LearningRate 0.0022   Epoch: 17   Global Step: 86030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:02,685-Speed 5545.32 samples/sec   Loss 1.8377   LearningRate 0.0022   Epoch: 17   Global Step: 86040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:04,518-Speed 5587.81 samples/sec   Loss 2.0067   LearningRate 0.0022   Epoch: 17   Global Step: 86050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:06,390-Speed 5473.12 samples/sec   Loss 1.9368   LearningRate 0.0022   Epoch: 17   Global Step: 86060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:08,223-Speed 5590.32 samples/sec   Loss 1.9248   LearningRate 0.0022   Epoch: 17   Global Step: 86070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:10,063-Speed 5564.88 samples/sec   Loss 1.9789   LearningRate 0.0022   Epoch: 17   Global Step: 86080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:11,928-Speed 5494.85 samples/sec   Loss 1.9257   LearningRate 0.0022   Epoch: 17   Global Step: 86090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:13,847-Speed 5337.35 samples/sec   Loss 1.9140   LearningRate 0.0022   Epoch: 17   Global Step: 86100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:15,752-Speed 5379.25 samples/sec   Loss 1.8696   LearningRate 0.0022   Epoch: 17   Global Step: 86110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:17,718-Speed 5210.93 samples/sec   Loss 1.9394   LearningRate 0.0022   Epoch: 17   Global Step: 86120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:48:19,556-Speed 5571.20 samples/sec   Loss 1.9802   LearningRate 0.0022   Epoch: 17   Global Step: 86130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:48:21,399-Speed 5560.91 samples/sec   Loss 1.9666   LearningRate 0.0022   Epoch: 17   Global Step: 86140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:48:23,237-Speed 5571.05 samples/sec   Loss 1.9388   LearningRate 0.0022   Epoch: 17   Global Step: 86150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:48:25,068-Speed 5596.73 samples/sec   Loss 1.9588   LearningRate 0.0022   Epoch: 17   Global Step: 86160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:48:26,891-Speed 5617.95 samples/sec   Loss 1.8306   LearningRate 0.0022   Epoch: 17   Global Step: 86170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:28,734-Speed 5559.60 samples/sec   Loss 1.9011   LearningRate 0.0022   Epoch: 17   Global Step: 86180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:30,579-Speed 5551.81 samples/sec   Loss 1.9532   LearningRate 0.0022   Epoch: 17   Global Step: 86190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:32,426-Speed 5545.34 samples/sec   Loss 1.9534   LearningRate 0.0022   Epoch: 17   Global Step: 86200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:34,266-Speed 5566.45 samples/sec   Loss 1.8496   LearningRate 0.0022   Epoch: 17   Global Step: 86210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:36,114-Speed 5545.54 samples/sec   Loss 1.8912   LearningRate 0.0022   Epoch: 17   Global Step: 86220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:37,957-Speed 5557.61 samples/sec   Loss 1.9362   LearningRate 0.0022   Epoch: 17   Global Step: 86230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:39,840-Speed 5441.01 samples/sec   Loss 1.9801   LearningRate 0.0022   Epoch: 17   Global Step: 86240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:41,692-Speed 5530.12 samples/sec   Loss 2.0438   LearningRate 0.0022   Epoch: 17   Global Step: 86250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:43,526-Speed 5586.05 samples/sec   Loss 1.9539   LearningRate 0.0022   Epoch: 17   Global Step: 86260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:45,359-Speed 5587.93 samples/sec   Loss 1.8820   LearningRate 0.0022   Epoch: 17   Global Step: 86270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:48:47,201-Speed 5562.54 samples/sec   Loss 1.9101   LearningRate 0.0022   Epoch: 17   Global Step: 86280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:48:49,035-Speed 5584.22 samples/sec   Loss 1.9638   LearningRate 0.0022   Epoch: 17   Global Step: 86290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:48:50,871-Speed 5580.47 samples/sec   Loss 1.9651   LearningRate 0.0022   Epoch: 17   Global Step: 86300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:48:52,703-Speed 5591.19 samples/sec   Loss 1.9664   LearningRate 0.0022   Epoch: 17   Global Step: 86310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:54,560-Speed 5516.42 samples/sec   Loss 1.9880   LearningRate 0.0022   Epoch: 17   Global Step: 86320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:56,405-Speed 5552.24 samples/sec   Loss 1.9270   LearningRate 0.0021   Epoch: 17   Global Step: 86330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:48:58,246-Speed 5564.66 samples/sec   Loss 1.9542   LearningRate 0.0021   Epoch: 17   Global Step: 86340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:00,087-Speed 5564.14 samples/sec   Loss 1.9500   LearningRate 0.0021   Epoch: 17   Global Step: 86350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:01,944-Speed 5516.52 samples/sec   Loss 1.9598   LearningRate 0.0021   Epoch: 17   Global Step: 86360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:03,788-Speed 5555.85 samples/sec   Loss 1.9304   LearningRate 0.0021   Epoch: 17   Global Step: 86370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:05,630-Speed 5560.44 samples/sec   Loss 1.8729   LearningRate 0.0021   Epoch: 17   Global Step: 86380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:07,478-Speed 5543.11 samples/sec   Loss 1.9249   LearningRate 0.0021   Epoch: 17   Global Step: 86390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:09,315-Speed 5578.24 samples/sec   Loss 1.9415   LearningRate 0.0021   Epoch: 17   Global Step: 86400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:11,156-Speed 5564.17 samples/sec   Loss 1.8970   LearningRate 0.0021   Epoch: 17   Global Step: 86410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:49:13,018-Speed 5500.49 samples/sec   Loss 1.9568   LearningRate 0.0021   Epoch: 17   Global Step: 86420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:14,886-Speed 5483.54 samples/sec   Loss 1.9527   LearningRate 0.0021   Epoch: 17   Global Step: 86430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:16,738-Speed 5533.63 samples/sec   Loss 1.8708   LearningRate 0.0021   Epoch: 17   Global Step: 86440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:18,580-Speed 5560.52 samples/sec   Loss 1.9852   LearningRate 0.0021   Epoch: 17   Global Step: 86450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:20,422-Speed 5562.87 samples/sec   Loss 2.0335   LearningRate 0.0021   Epoch: 17   Global Step: 86460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:22,280-Speed 5512.02 samples/sec   Loss 1.9244   LearningRate 0.0021   Epoch: 17   Global Step: 86470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:24,165-Speed 5433.93 samples/sec   Loss 1.9815   LearningRate 0.0021   Epoch: 17   Global Step: 86480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:26,044-Speed 5453.26 samples/sec   Loss 2.0127   LearningRate 0.0021   Epoch: 17   Global Step: 86490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:27,885-Speed 5563.69 samples/sec   Loss 2.0204   LearningRate 0.0021   Epoch: 17   Global Step: 86500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:29,718-Speed 5588.60 samples/sec   Loss 1.9595   LearningRate 0.0021   Epoch: 17   Global Step: 86510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:31,555-Speed 5576.88 samples/sec   Loss 2.0657   LearningRate 0.0021   Epoch: 17   Global Step: 86520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:49:33,398-Speed 5557.08 samples/sec   Loss 2.0456   LearningRate 0.0021   Epoch: 17   Global Step: 86530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:49:35,242-Speed 5557.28 samples/sec   Loss 1.9543   LearningRate 0.0021   Epoch: 17   Global Step: 86540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:49:37,083-Speed 5565.33 samples/sec   Loss 1.9926   LearningRate 0.0021   Epoch: 17   Global Step: 86550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:49:38,918-Speed 5581.20 samples/sec   Loss 2.0175   LearningRate 0.0021   Epoch: 17   Global Step: 86560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:49:40,751-Speed 5587.25 samples/sec   Loss 2.0676   LearningRate 0.0021   Epoch: 17   Global Step: 86570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:49:42,584-Speed 5588.19 samples/sec   Loss 2.0721   LearningRate 0.0021   Epoch: 17   Global Step: 86580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:49:44,426-Speed 5561.07 samples/sec   Loss 1.9226   LearningRate 0.0021   Epoch: 17   Global Step: 86590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:49:46,263-Speed 5576.74 samples/sec   Loss 2.0191   LearningRate 0.0021   Epoch: 17   Global Step: 86600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:49:48,110-Speed 5545.75 samples/sec   Loss 1.9567   LearningRate 0.0021   Epoch: 17   Global Step: 86610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:49:49,987-Speed 5458.16 samples/sec   Loss 2.0406   LearningRate 0.0021   Epoch: 17   Global Step: 86620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:51,859-Speed 5472.36 samples/sec   Loss 2.0058   LearningRate 0.0021   Epoch: 17   Global Step: 86630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:53,704-Speed 5554.45 samples/sec   Loss 1.8994   LearningRate 0.0021   Epoch: 17   Global Step: 86640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:55,538-Speed 5586.61 samples/sec   Loss 1.9513   LearningRate 0.0021   Epoch: 17   Global Step: 86650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:57,370-Speed 5590.31 samples/sec   Loss 1.9769   LearningRate 0.0021   Epoch: 17   Global Step: 86660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:49:59,206-Speed 5578.99 samples/sec   Loss 2.0602   LearningRate 0.0021   Epoch: 17   Global Step: 86670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:01,048-Speed 5561.63 samples/sec   Loss 2.1086   LearningRate 0.0020   Epoch: 17   Global Step: 86680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:02,936-Speed 5425.76 samples/sec   Loss 2.0209   LearningRate 0.0020   Epoch: 17   Global Step: 86690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:04,769-Speed 5590.76 samples/sec   Loss 2.0277   LearningRate 0.0020   Epoch: 17   Global Step: 86700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:06,615-Speed 5548.19 samples/sec   Loss 2.0138   LearningRate 0.0020   Epoch: 17   Global Step: 86710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:08,454-Speed 5570.44 samples/sec   Loss 2.0359   LearningRate 0.0020   Epoch: 17   Global Step: 86720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:50:10,294-Speed 5568.19 samples/sec   Loss 2.0821   LearningRate 0.0020   Epoch: 17   Global Step: 86730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:50:12,139-Speed 5551.50 samples/sec   Loss 1.9597   LearningRate 0.0020   Epoch: 17   Global Step: 86740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:50:14,007-Speed 5485.14 samples/sec   Loss 1.9560   LearningRate 0.0020   Epoch: 17   Global Step: 86750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:50:15,887-Speed 5448.17 samples/sec   Loss 2.0882   LearningRate 0.0020   Epoch: 17   Global Step: 86760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:17,730-Speed 5560.78 samples/sec   Loss 2.0326   LearningRate 0.0020   Epoch: 17   Global Step: 86770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:19,563-Speed 5585.91 samples/sec   Loss 2.0158   LearningRate 0.0020   Epoch: 17   Global Step: 86780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:21,409-Speed 5551.07 samples/sec   Loss 1.9777   LearningRate 0.0020   Epoch: 17   Global Step: 86790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:23,246-Speed 5576.49 samples/sec   Loss 2.0183   LearningRate 0.0020   Epoch: 17   Global Step: 86800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:25,090-Speed 5555.00 samples/sec   Loss 1.9987   LearningRate 0.0020   Epoch: 17   Global Step: 86810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:26,937-Speed 5545.69 samples/sec   Loss 1.9686   LearningRate 0.0020   Epoch: 17   Global Step: 86820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:28,773-Speed 5578.13 samples/sec   Loss 2.0531   LearningRate 0.0020   Epoch: 17   Global Step: 86830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:30,624-Speed 5536.18 samples/sec   Loss 1.9922   LearningRate 0.0020   Epoch: 17   Global Step: 86840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:32,471-Speed 5546.14 samples/sec   Loss 2.0636   LearningRate 0.0020   Epoch: 17   Global Step: 86850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:34,298-Speed 5607.06 samples/sec   Loss 1.9937   LearningRate 0.0020   Epoch: 17   Global Step: 86860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:36,136-Speed 5573.74 samples/sec   Loss 2.0008   LearningRate 0.0020   Epoch: 17   Global Step: 86870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:37,974-Speed 5573.78 samples/sec   Loss 2.0439   LearningRate 0.0020   Epoch: 17   Global Step: 86880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:50:39,800-Speed 5607.47 samples/sec   Loss 1.9787   LearningRate 0.0020   Epoch: 17   Global Step: 86890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:50:41,637-Speed 5577.87 samples/sec   Loss 1.8836   LearningRate 0.0020   Epoch: 17   Global Step: 86900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:50:43,472-Speed 5580.93 samples/sec   Loss 1.9305   LearningRate 0.0020   Epoch: 17   Global Step: 86910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:50:45,311-Speed 5570.53 samples/sec   Loss 1.9598   LearningRate 0.0020   Epoch: 17   Global Step: 86920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:50:47,154-Speed 5560.29 samples/sec   Loss 2.0308   LearningRate 0.0020   Epoch: 17   Global Step: 86930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:50:48,997-Speed 5557.57 samples/sec   Loss 1.9635   LearningRate 0.0020   Epoch: 17   Global Step: 86940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:50:50,829-Speed 5591.20 samples/sec   Loss 1.9877   LearningRate 0.0020   Epoch: 17   Global Step: 86950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:50:52,689-Speed 5508.30 samples/sec   Loss 2.0144   LearningRate 0.0020   Epoch: 17   Global Step: 86960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:50:54,525-Speed 5578.66 samples/sec   Loss 1.9840   LearningRate 0.0020   Epoch: 17   Global Step: 86970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:50:56,359-Speed 5587.03 samples/sec   Loss 2.0803   LearningRate 0.0020   Epoch: 17   Global Step: 86980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:50:58,207-Speed 5541.32 samples/sec   Loss 2.0171   LearningRate 0.0020   Epoch: 17   Global Step: 86990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:00,042-Speed 5583.98 samples/sec   Loss 1.9678   LearningRate 0.0020   Epoch: 17   Global Step: 87000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:01,884-Speed 5562.91 samples/sec   Loss 2.0570   LearningRate 0.0020   Epoch: 17   Global Step: 87010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:03,745-Speed 5503.32 samples/sec   Loss 2.0419   LearningRate 0.0020   Epoch: 17   Global Step: 87020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:05,612-Speed 5485.88 samples/sec   Loss 1.9503   LearningRate 0.0020   Epoch: 17   Global Step: 87030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:07,456-Speed 5556.99 samples/sec   Loss 2.0014   LearningRate 0.0019   Epoch: 17   Global Step: 87040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:09,307-Speed 5535.08 samples/sec   Loss 2.0703   LearningRate 0.0019   Epoch: 17   Global Step: 87050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:11,167-Speed 5506.30 samples/sec   Loss 2.0284   LearningRate 0.0019   Epoch: 17   Global Step: 87060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:13,014-Speed 5546.71 samples/sec   Loss 1.9929   LearningRate 0.0019   Epoch: 17   Global Step: 87070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:14,873-Speed 5509.29 samples/sec   Loss 2.0024   LearningRate 0.0019   Epoch: 17   Global Step: 87080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:16,715-Speed 5561.84 samples/sec   Loss 1.9900   LearningRate 0.0019   Epoch: 17   Global Step: 87090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:51:18,549-Speed 5583.83 samples/sec   Loss 1.9504   LearningRate 0.0019   Epoch: 17   Global Step: 87100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:20,386-Speed 5576.09 samples/sec   Loss 1.9611   LearningRate 0.0019   Epoch: 17   Global Step: 87110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:22,229-Speed 5558.90 samples/sec   Loss 2.0282   LearningRate 0.0019   Epoch: 17   Global Step: 87120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:24,079-Speed 5538.71 samples/sec   Loss 2.1339   LearningRate 0.0019   Epoch: 17   Global Step: 87130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:25,950-Speed 5474.07 samples/sec   Loss 1.9668   LearningRate 0.0019   Epoch: 17   Global Step: 87140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:27,793-Speed 5558.22 samples/sec   Loss 1.9889   LearningRate 0.0019   Epoch: 17   Global Step: 87150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:29,660-Speed 5488.93 samples/sec   Loss 2.0223   LearningRate 0.0019   Epoch: 17   Global Step: 87160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:31,501-Speed 5565.53 samples/sec   Loss 2.0141   LearningRate 0.0019   Epoch: 17   Global Step: 87170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:33,335-Speed 5583.92 samples/sec   Loss 2.1029   LearningRate 0.0019   Epoch: 17   Global Step: 87180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:35,171-Speed 5578.21 samples/sec   Loss 1.9143   LearningRate 0.0019   Epoch: 17   Global Step: 87190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:37,015-Speed 5565.12 samples/sec   Loss 2.0239   LearningRate 0.0019   Epoch: 17   Global Step: 87200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:51:38,860-Speed 5553.00 samples/sec   Loss 1.9916   LearningRate 0.0019   Epoch: 17   Global Step: 87210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:40,711-Speed 5535.00 samples/sec   Loss 2.0319   LearningRate 0.0019   Epoch: 17   Global Step: 87220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:42,545-Speed 5583.01 samples/sec   Loss 2.0156   LearningRate 0.0019   Epoch: 17   Global Step: 87230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:44,392-Speed 5546.37 samples/sec   Loss 1.9810   LearningRate 0.0019   Epoch: 17   Global Step: 87240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:46,232-Speed 5569.42 samples/sec   Loss 1.9339   LearningRate 0.0019   Epoch: 17   Global Step: 87250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:48,074-Speed 5562.29 samples/sec   Loss 2.0913   LearningRate 0.0019   Epoch: 17   Global Step: 87260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:49,914-Speed 5565.39 samples/sec   Loss 2.0018   LearningRate 0.0019   Epoch: 17   Global Step: 87270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:51,748-Speed 5585.71 samples/sec   Loss 2.0305   LearningRate 0.0019   Epoch: 17   Global Step: 87280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:53,593-Speed 5552.32 samples/sec   Loss 1.9756   LearningRate 0.0019   Epoch: 17   Global Step: 87290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:55,440-Speed 5545.98 samples/sec   Loss 2.1968   LearningRate 0.0019   Epoch: 17   Global Step: 87300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:51:57,279-Speed 5570.28 samples/sec   Loss 1.9723   LearningRate 0.0019   Epoch: 17   Global Step: 87310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:51:59,127-Speed 5544.16 samples/sec   Loss 2.0176   LearningRate 0.0019   Epoch: 17   Global Step: 87320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:52:00,964-Speed 5579.50 samples/sec   Loss 2.0293   LearningRate 0.0019   Epoch: 17   Global Step: 87330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:52:02,809-Speed 5550.60 samples/sec   Loss 2.0422   LearningRate 0.0019   Epoch: 17   Global Step: 87340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:52:04,655-Speed 5547.22 samples/sec   Loss 2.0222   LearningRate 0.0019   Epoch: 17   Global Step: 87350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:52:06,517-Speed 5504.30 samples/sec   Loss 1.9513   LearningRate 0.0019   Epoch: 17   Global Step: 87360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:08,356-Speed 5570.14 samples/sec   Loss 2.0213   LearningRate 0.0019   Epoch: 17   Global Step: 87370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:10,192-Speed 5578.01 samples/sec   Loss 2.0255   LearningRate 0.0019   Epoch: 17   Global Step: 87380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:12,036-Speed 5557.80 samples/sec   Loss 2.0775   LearningRate 0.0019   Epoch: 17   Global Step: 87390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:13,910-Speed 5465.63 samples/sec   Loss 1.9305   LearningRate 0.0019   Epoch: 17   Global Step: 87400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:15,785-Speed 5463.41 samples/sec   Loss 1.9961   LearningRate 0.0018   Epoch: 17   Global Step: 87410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:17,625-Speed 5568.08 samples/sec   Loss 2.1164   LearningRate 0.0018   Epoch: 17   Global Step: 87420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:19,466-Speed 5562.69 samples/sec   Loss 2.0539   LearningRate 0.0018   Epoch: 17   Global Step: 87430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:21,302-Speed 5580.46 samples/sec   Loss 2.0071   LearningRate 0.0018   Epoch: 17   Global Step: 87440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:23,139-Speed 5577.79 samples/sec   Loss 1.9409   LearningRate 0.0018   Epoch: 17   Global Step: 87450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:24,977-Speed 5572.46 samples/sec   Loss 2.1278   LearningRate 0.0018   Epoch: 17   Global Step: 87460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:52:26,799-Speed 5621.51 samples/sec   Loss 2.0513   LearningRate 0.0018   Epoch: 17   Global Step: 87470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:28,642-Speed 5558.23 samples/sec   Loss 2.1483   LearningRate 0.0018   Epoch: 17   Global Step: 87480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:30,477-Speed 5582.74 samples/sec   Loss 2.0782   LearningRate 0.0018   Epoch: 17   Global Step: 87490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:32,324-Speed 5548.22 samples/sec   Loss 1.9568   LearningRate 0.0018   Epoch: 17   Global Step: 87500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:34,165-Speed 5564.47 samples/sec   Loss 1.9524   LearningRate 0.0018   Epoch: 17   Global Step: 87510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:36,014-Speed 5539.78 samples/sec   Loss 2.0441   LearningRate 0.0018   Epoch: 17   Global Step: 87520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:37,868-Speed 5525.28 samples/sec   Loss 2.0639   LearningRate 0.0018   Epoch: 17   Global Step: 87530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:39,720-Speed 5530.21 samples/sec   Loss 2.1373   LearningRate 0.0018   Epoch: 17   Global Step: 87540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:41,565-Speed 5553.82 samples/sec   Loss 2.1038   LearningRate 0.0018   Epoch: 17   Global Step: 87550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:43,398-Speed 5589.79 samples/sec   Loss 2.0182   LearningRate 0.0018   Epoch: 17   Global Step: 87560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:45,254-Speed 5518.35 samples/sec   Loss 2.0562   LearningRate 0.0018   Epoch: 17   Global Step: 87570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:52:47,092-Speed 5574.30 samples/sec   Loss 2.0638   LearningRate 0.0018   Epoch: 17   Global Step: 87580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:52:48,934-Speed 5561.72 samples/sec   Loss 2.1783   LearningRate 0.0018   Epoch: 17   Global Step: 87590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:52:50,775-Speed 5563.85 samples/sec   Loss 2.0165   LearningRate 0.0018   Epoch: 17   Global Step: 87600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:52:52,623-Speed 5544.84 samples/sec   Loss 2.0130   LearningRate 0.0018   Epoch: 17   Global Step: 87610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:54,461-Speed 5574.06 samples/sec   Loss 2.0800   LearningRate 0.0018   Epoch: 17   Global Step: 87620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:56,314-Speed 5527.41 samples/sec   Loss 2.0441   LearningRate 0.0018   Epoch: 17   Global Step: 87630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:58,152-Speed 5574.48 samples/sec   Loss 2.1337   LearningRate 0.0018   Epoch: 17   Global Step: 87640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:52:59,984-Speed 5590.50 samples/sec   Loss 2.0721   LearningRate 0.0018   Epoch: 17   Global Step: 87650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:01,834-Speed 5536.15 samples/sec   Loss 2.0544   LearningRate 0.0018   Epoch: 17   Global Step: 87660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:03,700-Speed 5491.84 samples/sec   Loss 2.0772   LearningRate 0.0018   Epoch: 17   Global Step: 87670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:05,557-Speed 5515.40 samples/sec   Loss 2.0457   LearningRate 0.0018   Epoch: 17   Global Step: 87680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:07,398-Speed 5563.56 samples/sec   Loss 2.1629   LearningRate 0.0018   Epoch: 17   Global Step: 87690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:09,240-Speed 5562.26 samples/sec   Loss 2.0402   LearningRate 0.0018   Epoch: 17   Global Step: 87700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:11,090-Speed 5537.76 samples/sec   Loss 2.0941   LearningRate 0.0018   Epoch: 17   Global Step: 87710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:53:12,934-Speed 5556.37 samples/sec   Loss 2.1187   LearningRate 0.0018   Epoch: 17   Global Step: 87720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:53:14,779-Speed 5552.72 samples/sec   Loss 2.1944   LearningRate 0.0018   Epoch: 17   Global Step: 87730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:53:16,620-Speed 5564.64 samples/sec   Loss 2.0016   LearningRate 0.0018   Epoch: 17   Global Step: 87740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:18,455-Speed 5581.88 samples/sec   Loss 2.1822   LearningRate 0.0018   Epoch: 17   Global Step: 87750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:20,300-Speed 5549.75 samples/sec   Loss 2.0692   LearningRate 0.0018   Epoch: 17   Global Step: 87760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:22,142-Speed 5564.12 samples/sec   Loss 1.9279   LearningRate 0.0018   Epoch: 17   Global Step: 87770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:23,977-Speed 5581.39 samples/sec   Loss 2.0101   LearningRate 0.0017   Epoch: 17   Global Step: 87780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:25,829-Speed 5532.44 samples/sec   Loss 2.0444   LearningRate 0.0017   Epoch: 17   Global Step: 87790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:27,678-Speed 5539.35 samples/sec   Loss 2.0174   LearningRate 0.0017   Epoch: 17   Global Step: 87800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:29,539-Speed 5504.21 samples/sec   Loss 2.0215   LearningRate 0.0017   Epoch: 17   Global Step: 87810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:31,390-Speed 5534.74 samples/sec   Loss 2.0328   LearningRate 0.0017   Epoch: 17   Global Step: 87820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:33,226-Speed 5579.68 samples/sec   Loss 1.9925   LearningRate 0.0017   Epoch: 17   Global Step: 87830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:35,060-Speed 5583.76 samples/sec   Loss 2.1055   LearningRate 0.0017   Epoch: 17   Global Step: 87840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:36,901-Speed 5566.38 samples/sec   Loss 2.0869   LearningRate 0.0017   Epoch: 17   Global Step: 87850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:38,740-Speed 5568.12 samples/sec   Loss 2.0941   LearningRate 0.0017   Epoch: 17   Global Step: 87860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:40,579-Speed 5571.09 samples/sec   Loss 2.0801   LearningRate 0.0017   Epoch: 17   Global Step: 87870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:42,417-Speed 5573.90 samples/sec   Loss 2.1144   LearningRate 0.0017   Epoch: 17   Global Step: 87880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:44,253-Speed 5580.28 samples/sec   Loss 2.0925   LearningRate 0.0017   Epoch: 17   Global Step: 87890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:46,097-Speed 5552.95 samples/sec   Loss 2.0737   LearningRate 0.0017   Epoch: 17   Global Step: 87900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:47,953-Speed 5520.16 samples/sec   Loss 2.0716   LearningRate 0.0017   Epoch: 17   Global Step: 87910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:49,789-Speed 5579.05 samples/sec   Loss 2.0029   LearningRate 0.0017   Epoch: 17   Global Step: 87920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:53:51,619-Speed 5600.07 samples/sec   Loss 2.0781   LearningRate 0.0017   Epoch: 17   Global Step: 87930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:53:53,466-Speed 5544.28 samples/sec   Loss 2.0202   LearningRate 0.0017   Epoch: 17   Global Step: 87940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:53:55,322-Speed 5519.43 samples/sec   Loss 2.0362   LearningRate 0.0017   Epoch: 17   Global Step: 87950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:53:57,161-Speed 5570.46 samples/sec   Loss 2.0202   LearningRate 0.0017   Epoch: 17   Global Step: 87960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:53:59,009-Speed 5542.80 samples/sec   Loss 2.0364   LearningRate 0.0017   Epoch: 17   Global Step: 87970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:54:00,849-Speed 5569.64 samples/sec   Loss 2.0722   LearningRate 0.0017   Epoch: 17   Global Step: 87980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:54:02,688-Speed 5568.81 samples/sec   Loss 2.0318   LearningRate 0.0017   Epoch: 17   Global Step: 87990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:54:04,528-Speed 5568.93 samples/sec   Loss 1.9986   LearningRate 0.0017   Epoch: 17   Global Step: 88000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:54:31,115-[lfw][88000]XNorm: 22.533454
Training: 2022-04-11 15:54:31,115-[lfw][88000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-04-11 15:54:31,116-[lfw][88000]Accuracy-Highest: 0.99817
Training: 2022-04-11 15:55:01,718-[cfp_fp][88000]XNorm: 21.454129
Training: 2022-04-11 15:55:01,719-[cfp_fp][88000]Accuracy-Flip: 0.98329+-0.00515
Training: 2022-04-11 15:55:01,720-[cfp_fp][88000]Accuracy-Highest: 0.98386
Training: 2022-04-11 15:55:28,147-[agedb_30][88000]XNorm: 22.700543
Training: 2022-04-11 15:55:28,148-[agedb_30][88000]Accuracy-Flip: 0.98350+-0.00754
Training: 2022-04-11 15:55:28,148-[agedb_30][88000]Accuracy-Highest: 0.98350
Training: 2022-04-11 15:55:30,061-Speed 119.72 samples/sec   Loss 2.0418   LearningRate 0.0017   Epoch: 17   Global Step: 88010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:55:31,943-Speed 5442.28 samples/sec   Loss 2.0396   LearningRate 0.0017   Epoch: 17   Global Step: 88020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:55:33,772-Speed 5599.41 samples/sec   Loss 1.9940   LearningRate 0.0017   Epoch: 17   Global Step: 88030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:35,630-Speed 5512.63 samples/sec   Loss 2.1146   LearningRate 0.0017   Epoch: 17   Global Step: 88040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:37,489-Speed 5513.97 samples/sec   Loss 2.0368   LearningRate 0.0017   Epoch: 17   Global Step: 88050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:39,334-Speed 5551.77 samples/sec   Loss 2.0364   LearningRate 0.0017   Epoch: 17   Global Step: 88060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:41,176-Speed 5560.12 samples/sec   Loss 2.0637   LearningRate 0.0017   Epoch: 17   Global Step: 88070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:43,041-Speed 5495.94 samples/sec   Loss 2.0656   LearningRate 0.0017   Epoch: 17   Global Step: 88080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:44,874-Speed 5587.30 samples/sec   Loss 2.0790   LearningRate 0.0017   Epoch: 17   Global Step: 88090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:46,704-Speed 5595.83 samples/sec   Loss 2.0473   LearningRate 0.0017   Epoch: 17   Global Step: 88100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:48,549-Speed 5554.45 samples/sec   Loss 1.9448   LearningRate 0.0017   Epoch: 17   Global Step: 88110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:50,394-Speed 5551.01 samples/sec   Loss 2.0300   LearningRate 0.0017   Epoch: 17   Global Step: 88120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:52,231-Speed 5578.51 samples/sec   Loss 2.0622   LearningRate 0.0017   Epoch: 17   Global Step: 88130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:54,076-Speed 5552.85 samples/sec   Loss 2.0675   LearningRate 0.0017   Epoch: 17   Global Step: 88140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:55,918-Speed 5561.52 samples/sec   Loss 2.0044   LearningRate 0.0017   Epoch: 17   Global Step: 88150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:57,747-Speed 5600.21 samples/sec   Loss 1.9557   LearningRate 0.0017   Epoch: 17   Global Step: 88160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:55:59,581-Speed 5585.96 samples/sec   Loss 2.0503   LearningRate 0.0016   Epoch: 17   Global Step: 88170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:01,412-Speed 5592.17 samples/sec   Loss 2.1044   LearningRate 0.0016   Epoch: 17   Global Step: 88180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:56:03,245-Speed 5589.64 samples/sec   Loss 2.0794   LearningRate 0.0016   Epoch: 17   Global Step: 88190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:56:05,075-Speed 5598.89 samples/sec   Loss 2.0052   LearningRate 0.0016   Epoch: 17   Global Step: 88200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:56:06,932-Speed 5516.43 samples/sec   Loss 1.9613   LearningRate 0.0016   Epoch: 17   Global Step: 88210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:56:08,763-Speed 5595.57 samples/sec   Loss 2.1317   LearningRate 0.0016   Epoch: 17   Global Step: 88220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:56:10,605-Speed 5561.79 samples/sec   Loss 1.9555   LearningRate 0.0016   Epoch: 17   Global Step: 88230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:56:12,445-Speed 5567.46 samples/sec   Loss 2.0971   LearningRate 0.0016   Epoch: 17   Global Step: 88240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:56:14,297-Speed 5531.15 samples/sec   Loss 2.0067   LearningRate 0.0016   Epoch: 17   Global Step: 88250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:56:16,137-Speed 5567.68 samples/sec   Loss 2.2034   LearningRate 0.0016   Epoch: 17   Global Step: 88260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:56:17,971-Speed 5586.22 samples/sec   Loss 2.0466   LearningRate 0.0016   Epoch: 17   Global Step: 88270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:56:19,824-Speed 5527.49 samples/sec   Loss 2.0637   LearningRate 0.0016   Epoch: 17   Global Step: 88280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:21,666-Speed 5562.78 samples/sec   Loss 2.0735   LearningRate 0.0016   Epoch: 17   Global Step: 88290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:23,503-Speed 5576.52 samples/sec   Loss 2.0688   LearningRate 0.0016   Epoch: 17   Global Step: 88300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:25,341-Speed 5573.15 samples/sec   Loss 2.0403   LearningRate 0.0016   Epoch: 17   Global Step: 88310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:27,186-Speed 5552.47 samples/sec   Loss 2.0215   LearningRate 0.0016   Epoch: 17   Global Step: 88320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:29,037-Speed 5537.12 samples/sec   Loss 2.0177   LearningRate 0.0016   Epoch: 17   Global Step: 88330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:30,870-Speed 5586.89 samples/sec   Loss 2.0002   LearningRate 0.0016   Epoch: 17   Global Step: 88340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:32,700-Speed 5597.65 samples/sec   Loss 2.0494   LearningRate 0.0016   Epoch: 17   Global Step: 88350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:34,535-Speed 5584.45 samples/sec   Loss 2.0827   LearningRate 0.0016   Epoch: 17   Global Step: 88360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:36,385-Speed 5537.34 samples/sec   Loss 1.9968   LearningRate 0.0016   Epoch: 17   Global Step: 88370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:38,229-Speed 5553.98 samples/sec   Loss 2.0579   LearningRate 0.0016   Epoch: 17   Global Step: 88380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:56:40,077-Speed 5544.95 samples/sec   Loss 2.0812   LearningRate 0.0016   Epoch: 17   Global Step: 88390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:56:41,918-Speed 5562.37 samples/sec   Loss 2.1291   LearningRate 0.0016   Epoch: 17   Global Step: 88400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:56:43,759-Speed 5565.16 samples/sec   Loss 2.0606   LearningRate 0.0016   Epoch: 17   Global Step: 88410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:56:45,596-Speed 5577.07 samples/sec   Loss 2.0549   LearningRate 0.0016   Epoch: 17   Global Step: 88420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:56:47,443-Speed 5547.66 samples/sec   Loss 2.0293   LearningRate 0.0016   Epoch: 17   Global Step: 88430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:56:49,273-Speed 5596.15 samples/sec   Loss 2.0681   LearningRate 0.0016   Epoch: 17   Global Step: 88440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:51,108-Speed 5582.35 samples/sec   Loss 2.0639   LearningRate 0.0016   Epoch: 17   Global Step: 88450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:52,954-Speed 5550.36 samples/sec   Loss 2.0577   LearningRate 0.0016   Epoch: 17   Global Step: 88460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:54,797-Speed 5558.38 samples/sec   Loss 2.0771   LearningRate 0.0016   Epoch: 17   Global Step: 88470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:56,628-Speed 5594.88 samples/sec   Loss 2.1385   LearningRate 0.0016   Epoch: 17   Global Step: 88480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:56:58,459-Speed 5595.77 samples/sec   Loss 2.0415   LearningRate 0.0016   Epoch: 17   Global Step: 88490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:00,289-Speed 5597.61 samples/sec   Loss 2.1105   LearningRate 0.0016   Epoch: 17   Global Step: 88500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:02,131-Speed 5561.10 samples/sec   Loss 2.0222   LearningRate 0.0016   Epoch: 17   Global Step: 88510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:03,973-Speed 5562.10 samples/sec   Loss 2.0209   LearningRate 0.0016   Epoch: 17   Global Step: 88520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:05,824-Speed 5535.46 samples/sec   Loss 2.0750   LearningRate 0.0016   Epoch: 17   Global Step: 88530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:07,657-Speed 5586.02 samples/sec   Loss 1.9557   LearningRate 0.0016   Epoch: 17   Global Step: 88540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:09,490-Speed 5591.16 samples/sec   Loss 2.1287   LearningRate 0.0016   Epoch: 17   Global Step: 88550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:11,336-Speed 5548.09 samples/sec   Loss 2.0222   LearningRate 0.0016   Epoch: 17   Global Step: 88560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:13,186-Speed 5539.53 samples/sec   Loss 2.0485   LearningRate 0.0015   Epoch: 17   Global Step: 88570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:15,030-Speed 5554.03 samples/sec   Loss 2.0609   LearningRate 0.0015   Epoch: 17   Global Step: 88580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:16,881-Speed 5534.98 samples/sec   Loss 2.0347   LearningRate 0.0015   Epoch: 17   Global Step: 88590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:18,722-Speed 5564.09 samples/sec   Loss 1.9930   LearningRate 0.0015   Epoch: 17   Global Step: 88600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:20,556-Speed 5584.13 samples/sec   Loss 2.0300   LearningRate 0.0015   Epoch: 17   Global Step: 88610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:22,395-Speed 5571.80 samples/sec   Loss 2.1028   LearningRate 0.0015   Epoch: 17   Global Step: 88620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:24,247-Speed 5531.61 samples/sec   Loss 2.1492   LearningRate 0.0015   Epoch: 17   Global Step: 88630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:26,092-Speed 5553.49 samples/sec   Loss 2.0089   LearningRate 0.0015   Epoch: 17   Global Step: 88640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:27,935-Speed 5558.43 samples/sec   Loss 2.1357   LearningRate 0.0015   Epoch: 17   Global Step: 88650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:29,771-Speed 5578.04 samples/sec   Loss 1.9635   LearningRate 0.0015   Epoch: 17   Global Step: 88660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:31,607-Speed 5580.31 samples/sec   Loss 2.0031   LearningRate 0.0015   Epoch: 17   Global Step: 88670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:33,443-Speed 5578.77 samples/sec   Loss 1.9981   LearningRate 0.0015   Epoch: 17   Global Step: 88680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:35,275-Speed 5591.52 samples/sec   Loss 2.0595   LearningRate 0.0015   Epoch: 17   Global Step: 88690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:57:37,120-Speed 5551.15 samples/sec   Loss 1.9716   LearningRate 0.0015   Epoch: 17   Global Step: 88700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:57:38,967-Speed 5547.01 samples/sec   Loss 2.1026   LearningRate 0.0015   Epoch: 17   Global Step: 88710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:57:40,809-Speed 5560.53 samples/sec   Loss 2.0758   LearningRate 0.0015   Epoch: 17   Global Step: 88720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:57:42,668-Speed 5511.47 samples/sec   Loss 2.0764   LearningRate 0.0015   Epoch: 17   Global Step: 88730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:57:44,505-Speed 5578.40 samples/sec   Loss 2.1442   LearningRate 0.0015   Epoch: 17   Global Step: 88740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:57:46,377-Speed 5471.75 samples/sec   Loss 2.0599   LearningRate 0.0015   Epoch: 17   Global Step: 88750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:57:48,216-Speed 5569.22 samples/sec   Loss 2.0245   LearningRate 0.0015   Epoch: 17   Global Step: 88760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:57:50,065-Speed 5541.38 samples/sec   Loss 2.0984   LearningRate 0.0015   Epoch: 17   Global Step: 88770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:57:51,912-Speed 5545.99 samples/sec   Loss 1.9629   LearningRate 0.0015   Epoch: 17   Global Step: 88780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 15:57:53,762-Speed 5537.96 samples/sec   Loss 2.0501   LearningRate 0.0015   Epoch: 17   Global Step: 88790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:55,603-Speed 5562.84 samples/sec   Loss 2.0600   LearningRate 0.0015   Epoch: 17   Global Step: 88800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:57,438-Speed 5581.18 samples/sec   Loss 2.0664   LearningRate 0.0015   Epoch: 17   Global Step: 88810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:57:59,284-Speed 5549.17 samples/sec   Loss 2.0784   LearningRate 0.0015   Epoch: 17   Global Step: 88820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:01,131-Speed 5546.39 samples/sec   Loss 2.0854   LearningRate 0.0015   Epoch: 17   Global Step: 88830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:02,985-Speed 5526.28 samples/sec   Loss 2.1572   LearningRate 0.0015   Epoch: 17   Global Step: 88840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:04,836-Speed 5535.61 samples/sec   Loss 2.0103   LearningRate 0.0015   Epoch: 17   Global Step: 88850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:06,688-Speed 5529.65 samples/sec   Loss 2.0349   LearningRate 0.0015   Epoch: 17   Global Step: 88860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:08,524-Speed 5580.99 samples/sec   Loss 2.1418   LearningRate 0.0015   Epoch: 17   Global Step: 88870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:10,353-Speed 5598.92 samples/sec   Loss 2.0716   LearningRate 0.0015   Epoch: 17   Global Step: 88880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:12,200-Speed 5547.19 samples/sec   Loss 1.9863   LearningRate 0.0015   Epoch: 17   Global Step: 88890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:58:14,055-Speed 5522.87 samples/sec   Loss 1.9661   LearningRate 0.0015   Epoch: 17   Global Step: 88900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:15,901-Speed 5547.29 samples/sec   Loss 2.0625   LearningRate 0.0015   Epoch: 17   Global Step: 88910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:17,757-Speed 5519.31 samples/sec   Loss 2.1479   LearningRate 0.0015   Epoch: 17   Global Step: 88920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:19,598-Speed 5564.40 samples/sec   Loss 2.0656   LearningRate 0.0015   Epoch: 17   Global Step: 88930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:21,434-Speed 5581.85 samples/sec   Loss 2.0363   LearningRate 0.0015   Epoch: 17   Global Step: 88940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:23,276-Speed 5561.09 samples/sec   Loss 2.1603   LearningRate 0.0015   Epoch: 17   Global Step: 88950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:25,129-Speed 5529.58 samples/sec   Loss 2.0895   LearningRate 0.0015   Epoch: 17   Global Step: 88960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:26,977-Speed 5540.82 samples/sec   Loss 2.0845   LearningRate 0.0015   Epoch: 17   Global Step: 88970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:28,816-Speed 5572.16 samples/sec   Loss 2.1913   LearningRate 0.0014   Epoch: 17   Global Step: 88980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:30,655-Speed 5571.04 samples/sec   Loss 2.1233   LearningRate 0.0014   Epoch: 17   Global Step: 88990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:32,490-Speed 5584.18 samples/sec   Loss 1.9575   LearningRate 0.0014   Epoch: 17   Global Step: 89000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:34,336-Speed 5547.60 samples/sec   Loss 2.0817   LearningRate 0.0014   Epoch: 17   Global Step: 89010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:36,185-Speed 5542.89 samples/sec   Loss 2.0744   LearningRate 0.0014   Epoch: 17   Global Step: 89020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:38,033-Speed 5542.60 samples/sec   Loss 2.0417   LearningRate 0.0014   Epoch: 17   Global Step: 89030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:39,879-Speed 5548.30 samples/sec   Loss 2.1218   LearningRate 0.0014   Epoch: 17   Global Step: 89040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:41,745-Speed 5489.20 samples/sec   Loss 2.1072   LearningRate 0.0014   Epoch: 17   Global Step: 89050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:43,601-Speed 5520.30 samples/sec   Loss 2.0484   LearningRate 0.0014   Epoch: 17   Global Step: 89060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:45,433-Speed 5591.94 samples/sec   Loss 2.0079   LearningRate 0.0014   Epoch: 17   Global Step: 89070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:47,269-Speed 5580.99 samples/sec   Loss 2.1371   LearningRate 0.0014   Epoch: 17   Global Step: 89080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:49,105-Speed 5578.64 samples/sec   Loss 2.1780   LearningRate 0.0014   Epoch: 17   Global Step: 89090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:50,944-Speed 5569.98 samples/sec   Loss 2.1143   LearningRate 0.0014   Epoch: 17   Global Step: 89100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:58:52,781-Speed 5576.72 samples/sec   Loss 2.0696   LearningRate 0.0014   Epoch: 17   Global Step: 89110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:54,637-Speed 5518.99 samples/sec   Loss 2.0442   LearningRate 0.0014   Epoch: 17   Global Step: 89120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:56,479-Speed 5562.51 samples/sec   Loss 2.0395   LearningRate 0.0014   Epoch: 17   Global Step: 89130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:58:58,319-Speed 5567.02 samples/sec   Loss 2.0954   LearningRate 0.0014   Epoch: 17   Global Step: 89140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:00,158-Speed 5568.31 samples/sec   Loss 1.8729   LearningRate 0.0014   Epoch: 17   Global Step: 89150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:01,997-Speed 5571.10 samples/sec   Loss 1.9924   LearningRate 0.0014   Epoch: 17   Global Step: 89160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:03,842-Speed 5552.82 samples/sec   Loss 2.0912   LearningRate 0.0014   Epoch: 17   Global Step: 89170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:05,691-Speed 5541.33 samples/sec   Loss 2.1536   LearningRate 0.0014   Epoch: 17   Global Step: 89180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:07,541-Speed 5536.57 samples/sec   Loss 2.1062   LearningRate 0.0014   Epoch: 17   Global Step: 89190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:09,378-Speed 5577.33 samples/sec   Loss 2.0716   LearningRate 0.0014   Epoch: 17   Global Step: 89200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:11,209-Speed 5592.81 samples/sec   Loss 2.0910   LearningRate 0.0014   Epoch: 17   Global Step: 89210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:13,064-Speed 5521.90 samples/sec   Loss 2.0778   LearningRate 0.0014   Epoch: 17   Global Step: 89220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:14,900-Speed 5581.08 samples/sec   Loss 1.9929   LearningRate 0.0014   Epoch: 17   Global Step: 89230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:16,744-Speed 5554.59 samples/sec   Loss 1.9486   LearningRate 0.0014   Epoch: 17   Global Step: 89240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:18,579-Speed 5583.11 samples/sec   Loss 2.1218   LearningRate 0.0014   Epoch: 17   Global Step: 89250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:20,422-Speed 5559.38 samples/sec   Loss 2.0342   LearningRate 0.0014   Epoch: 17   Global Step: 89260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:22,256-Speed 5586.00 samples/sec   Loss 2.1136   LearningRate 0.0014   Epoch: 17   Global Step: 89270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:24,093-Speed 5575.36 samples/sec   Loss 2.1474   LearningRate 0.0014   Epoch: 17   Global Step: 89280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:25,938-Speed 5551.32 samples/sec   Loss 1.9590   LearningRate 0.0014   Epoch: 17   Global Step: 89290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:27,787-Speed 5541.08 samples/sec   Loss 2.0293   LearningRate 0.0014   Epoch: 17   Global Step: 89300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:29,630-Speed 5559.95 samples/sec   Loss 2.0029   LearningRate 0.0014   Epoch: 17   Global Step: 89310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:59:31,465-Speed 5580.88 samples/sec   Loss 2.1510   LearningRate 0.0014   Epoch: 17   Global Step: 89320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 15:59:33,302-Speed 5576.58 samples/sec   Loss 2.0369   LearningRate 0.0014   Epoch: 17   Global Step: 89330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:35,141-Speed 5572.42 samples/sec   Loss 2.0454   LearningRate 0.0014   Epoch: 17   Global Step: 89340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:36,981-Speed 5567.13 samples/sec   Loss 2.1119   LearningRate 0.0014   Epoch: 17   Global Step: 89350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:38,826-Speed 5551.51 samples/sec   Loss 2.0299   LearningRate 0.0014   Epoch: 17   Global Step: 89360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:40,667-Speed 5566.63 samples/sec   Loss 2.0405   LearningRate 0.0014   Epoch: 17   Global Step: 89370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:42,508-Speed 5563.54 samples/sec   Loss 2.1150   LearningRate 0.0014   Epoch: 17   Global Step: 89380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:44,348-Speed 5567.41 samples/sec   Loss 2.0466   LearningRate 0.0014   Epoch: 17   Global Step: 89390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:46,180-Speed 5592.02 samples/sec   Loss 2.0641   LearningRate 0.0014   Epoch: 17   Global Step: 89400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:48,019-Speed 5569.79 samples/sec   Loss 2.0640   LearningRate 0.0013   Epoch: 17   Global Step: 89410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:49,854-Speed 5584.06 samples/sec   Loss 1.9415   LearningRate 0.0013   Epoch: 17   Global Step: 89420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:51,680-Speed 5608.31 samples/sec   Loss 2.0341   LearningRate 0.0013   Epoch: 17   Global Step: 89430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:53,517-Speed 5578.91 samples/sec   Loss 1.9672   LearningRate 0.0013   Epoch: 17   Global Step: 89440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:55,357-Speed 5566.79 samples/sec   Loss 2.0190   LearningRate 0.0013   Epoch: 17   Global Step: 89450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:57,191-Speed 5583.98 samples/sec   Loss 2.1210   LearningRate 0.0013   Epoch: 17   Global Step: 89460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 15:59:59,031-Speed 5569.16 samples/sec   Loss 2.0861   LearningRate 0.0013   Epoch: 17   Global Step: 89470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:00,867-Speed 5579.46 samples/sec   Loss 2.0474   LearningRate 0.0013   Epoch: 17   Global Step: 89480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:02,708-Speed 5562.53 samples/sec   Loss 2.1337   LearningRate 0.0013   Epoch: 17   Global Step: 89490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:04,552-Speed 5558.06 samples/sec   Loss 1.9682   LearningRate 0.0013   Epoch: 17   Global Step: 89500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:06,390-Speed 5570.73 samples/sec   Loss 2.0472   LearningRate 0.0013   Epoch: 17   Global Step: 89510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:08,222-Speed 5590.98 samples/sec   Loss 2.0437   LearningRate 0.0013   Epoch: 17   Global Step: 89520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:10,059-Speed 5578.93 samples/sec   Loss 2.0459   LearningRate 0.0013   Epoch: 17   Global Step: 89530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:00:11,920-Speed 5504.19 samples/sec   Loss 2.0595   LearningRate 0.0013   Epoch: 17   Global Step: 89540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:13,765-Speed 5553.45 samples/sec   Loss 1.9595   LearningRate 0.0013   Epoch: 17   Global Step: 89550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:15,628-Speed 5497.02 samples/sec   Loss 2.0633   LearningRate 0.0013   Epoch: 17   Global Step: 89560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:17,474-Speed 5550.39 samples/sec   Loss 2.0572   LearningRate 0.0013   Epoch: 17   Global Step: 89570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:19,315-Speed 5565.19 samples/sec   Loss 2.0870   LearningRate 0.0013   Epoch: 17   Global Step: 89580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:21,148-Speed 5586.99 samples/sec   Loss 2.0537   LearningRate 0.0013   Epoch: 17   Global Step: 89590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:22,998-Speed 5536.43 samples/sec   Loss 1.9786   LearningRate 0.0013   Epoch: 17   Global Step: 89600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:24,835-Speed 5579.08 samples/sec   Loss 2.0921   LearningRate 0.0013   Epoch: 17   Global Step: 89610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:26,673-Speed 5572.25 samples/sec   Loss 2.0510   LearningRate 0.0013   Epoch: 17   Global Step: 89620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:28,515-Speed 5562.92 samples/sec   Loss 2.0453   LearningRate 0.0013   Epoch: 17   Global Step: 89630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:30,349-Speed 5583.98 samples/sec   Loss 2.0343   LearningRate 0.0013   Epoch: 17   Global Step: 89640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:00:32,180-Speed 5593.92 samples/sec   Loss 2.0609   LearningRate 0.0013   Epoch: 17   Global Step: 89650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:34,017-Speed 5578.25 samples/sec   Loss 1.9558   LearningRate 0.0013   Epoch: 17   Global Step: 89660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:35,861-Speed 5554.27 samples/sec   Loss 2.1058   LearningRate 0.0013   Epoch: 17   Global Step: 89670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:37,705-Speed 5554.91 samples/sec   Loss 2.0910   LearningRate 0.0013   Epoch: 17   Global Step: 89680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:39,568-Speed 5501.05 samples/sec   Loss 2.0938   LearningRate 0.0013   Epoch: 17   Global Step: 89690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:41,428-Speed 5506.97 samples/sec   Loss 2.1437   LearningRate 0.0013   Epoch: 17   Global Step: 89700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:43,272-Speed 5555.24 samples/sec   Loss 2.1150   LearningRate 0.0013   Epoch: 17   Global Step: 89710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:45,110-Speed 5571.74 samples/sec   Loss 2.0630   LearningRate 0.0013   Epoch: 17   Global Step: 89720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:46,958-Speed 5543.92 samples/sec   Loss 2.1514   LearningRate 0.0013   Epoch: 17   Global Step: 89730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:48,807-Speed 5542.12 samples/sec   Loss 2.0494   LearningRate 0.0013   Epoch: 17   Global Step: 89740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:50,640-Speed 5585.96 samples/sec   Loss 2.0198   LearningRate 0.0013   Epoch: 17   Global Step: 89750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:00:52,497-Speed 5517.44 samples/sec   Loss 2.1382   LearningRate 0.0013   Epoch: 17   Global Step: 89760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:00:54,374-Speed 5459.06 samples/sec   Loss 1.9906   LearningRate 0.0013   Epoch: 17   Global Step: 89770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:56,226-Speed 5530.40 samples/sec   Loss 2.1148   LearningRate 0.0013   Epoch: 17   Global Step: 89780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:58,073-Speed 5545.44 samples/sec   Loss 2.0922   LearningRate 0.0013   Epoch: 17   Global Step: 89790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:00:59,916-Speed 5560.73 samples/sec   Loss 2.0498   LearningRate 0.0013   Epoch: 17   Global Step: 89800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:01,754-Speed 5570.28 samples/sec   Loss 2.0693   LearningRate 0.0013   Epoch: 17   Global Step: 89810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:03,591-Speed 5578.62 samples/sec   Loss 2.0507   LearningRate 0.0013   Epoch: 17   Global Step: 89820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:05,432-Speed 5563.42 samples/sec   Loss 2.1143   LearningRate 0.0013   Epoch: 17   Global Step: 89830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:07,292-Speed 5507.76 samples/sec   Loss 2.1187   LearningRate 0.0013   Epoch: 17   Global Step: 89840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:09,126-Speed 5584.43 samples/sec   Loss 2.0367   LearningRate 0.0012   Epoch: 17   Global Step: 89850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:10,968-Speed 5563.36 samples/sec   Loss 2.0756   LearningRate 0.0012   Epoch: 17   Global Step: 89860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:12,820-Speed 5530.21 samples/sec   Loss 2.0803   LearningRate 0.0012   Epoch: 17   Global Step: 89870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:14,672-Speed 5533.50 samples/sec   Loss 2.0416   LearningRate 0.0012   Epoch: 17   Global Step: 89880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:16,514-Speed 5560.73 samples/sec   Loss 2.0376   LearningRate 0.0012   Epoch: 17   Global Step: 89890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:18,359-Speed 5550.04 samples/sec   Loss 2.0818   LearningRate 0.0012   Epoch: 17   Global Step: 89900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:20,195-Speed 5580.49 samples/sec   Loss 2.1450   LearningRate 0.0012   Epoch: 17   Global Step: 89910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:01:22,024-Speed 5600.82 samples/sec   Loss 2.1407   LearningRate 0.0012   Epoch: 17   Global Step: 89920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:01:23,864-Speed 5566.28 samples/sec   Loss 2.0301   LearningRate 0.0012   Epoch: 17   Global Step: 89930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:01:25,713-Speed 5540.76 samples/sec   Loss 2.1012   LearningRate 0.0012   Epoch: 17   Global Step: 89940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:01:27,569-Speed 5520.63 samples/sec   Loss 2.0577   LearningRate 0.0012   Epoch: 17   Global Step: 89950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:01:29,421-Speed 5530.10 samples/sec   Loss 2.1568   LearningRate 0.0012   Epoch: 17   Global Step: 89960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:01:31,260-Speed 5569.86 samples/sec   Loss 2.0078   LearningRate 0.0012   Epoch: 17   Global Step: 89970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:01:33,118-Speed 5515.48 samples/sec   Loss 1.9957   LearningRate 0.0012   Epoch: 17   Global Step: 89980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:01:34,964-Speed 5549.13 samples/sec   Loss 2.1145   LearningRate 0.0012   Epoch: 17   Global Step: 89990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:01:36,804-Speed 5568.72 samples/sec   Loss 2.0924   LearningRate 0.0012   Epoch: 17   Global Step: 90000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:02:03,514-[lfw][90000]XNorm: 22.050924
Training: 2022-04-11 16:02:03,515-[lfw][90000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-04-11 16:02:03,515-[lfw][90000]Accuracy-Highest: 0.99817
Training: 2022-04-11 16:02:34,328-[cfp_fp][90000]XNorm: 21.039067
Training: 2022-04-11 16:02:34,329-[cfp_fp][90000]Accuracy-Flip: 0.98414+-0.00548
Training: 2022-04-11 16:02:34,329-[cfp_fp][90000]Accuracy-Highest: 0.98414
Training: 2022-04-11 16:03:00,895-[agedb_30][90000]XNorm: 22.194027
Training: 2022-04-11 16:03:00,896-[agedb_30][90000]Accuracy-Flip: 0.98217+-0.00610
Training: 2022-04-11 16:03:00,896-[agedb_30][90000]Accuracy-Highest: 0.98350
Training: 2022-04-11 16:03:02,755-Speed 119.14 samples/sec   Loss 2.1028   LearningRate 0.0012   Epoch: 17   Global Step: 90010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:03:04,605-Speed 5536.14 samples/sec   Loss 2.1079   LearningRate 0.0012   Epoch: 17   Global Step: 90020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:06,449-Speed 5554.97 samples/sec   Loss 1.9305   LearningRate 0.0012   Epoch: 17   Global Step: 90030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:08,280-Speed 5593.20 samples/sec   Loss 2.0863   LearningRate 0.0012   Epoch: 17   Global Step: 90040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:10,123-Speed 5559.48 samples/sec   Loss 2.0027   LearningRate 0.0012   Epoch: 17   Global Step: 90050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:11,974-Speed 5534.68 samples/sec   Loss 2.0735   LearningRate 0.0012   Epoch: 17   Global Step: 90060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:13,819-Speed 5551.27 samples/sec   Loss 2.0249   LearningRate 0.0012   Epoch: 17   Global Step: 90070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:15,676-Speed 5515.85 samples/sec   Loss 2.0807   LearningRate 0.0012   Epoch: 17   Global Step: 90080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:17,535-Speed 5512.17 samples/sec   Loss 2.2197   LearningRate 0.0012   Epoch: 17   Global Step: 90090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:19,368-Speed 5587.00 samples/sec   Loss 2.0466   LearningRate 0.0012   Epoch: 17   Global Step: 90100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:21,210-Speed 5563.67 samples/sec   Loss 2.1572   LearningRate 0.0012   Epoch: 17   Global Step: 90110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:23,057-Speed 5544.33 samples/sec   Loss 2.0435   LearningRate 0.0012   Epoch: 17   Global Step: 90120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:03:24,934-Speed 5459.02 samples/sec   Loss 2.0965   LearningRate 0.0012   Epoch: 17   Global Step: 90130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:03:26,763-Speed 5601.99 samples/sec   Loss 2.0115   LearningRate 0.0012   Epoch: 17   Global Step: 90140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:28,602-Speed 5567.51 samples/sec   Loss 2.0995   LearningRate 0.0012   Epoch: 17   Global Step: 90150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:30,438-Speed 5580.75 samples/sec   Loss 2.0002   LearningRate 0.0012   Epoch: 17   Global Step: 90160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:32,279-Speed 5564.80 samples/sec   Loss 1.9809   LearningRate 0.0012   Epoch: 17   Global Step: 90170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:34,112-Speed 5587.22 samples/sec   Loss 2.0350   LearningRate 0.0012   Epoch: 17   Global Step: 90180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:35,950-Speed 5576.28 samples/sec   Loss 2.1370   LearningRate 0.0012   Epoch: 17   Global Step: 90190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:37,789-Speed 5567.90 samples/sec   Loss 1.9961   LearningRate 0.0012   Epoch: 17   Global Step: 90200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:39,621-Speed 5593.15 samples/sec   Loss 2.0411   LearningRate 0.0012   Epoch: 17   Global Step: 90210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:41,462-Speed 5566.00 samples/sec   Loss 2.1257   LearningRate 0.0012   Epoch: 17   Global Step: 90220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:43,337-Speed 5462.45 samples/sec   Loss 2.1026   LearningRate 0.0012   Epoch: 17   Global Step: 90230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:03:45,172-Speed 5580.34 samples/sec   Loss 2.0829   LearningRate 0.0012   Epoch: 17   Global Step: 90240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:03:47,026-Speed 5526.47 samples/sec   Loss 2.0733   LearningRate 0.0012   Epoch: 17   Global Step: 90250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:03:48,861-Speed 5583.96 samples/sec   Loss 2.0955   LearningRate 0.0012   Epoch: 17   Global Step: 90260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:03:50,720-Speed 5510.03 samples/sec   Loss 2.0445   LearningRate 0.0012   Epoch: 17   Global Step: 90270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:03:52,616-Speed 5401.69 samples/sec   Loss 2.0916   LearningRate 0.0012   Epoch: 17   Global Step: 90280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:03:54,506-Speed 5420.57 samples/sec   Loss 2.0349   LearningRate 0.0012   Epoch: 17   Global Step: 90290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:03:56,349-Speed 5560.28 samples/sec   Loss 2.0290   LearningRate 0.0012   Epoch: 17   Global Step: 90300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:03:58,186-Speed 5576.89 samples/sec   Loss 2.1238   LearningRate 0.0012   Epoch: 17   Global Step: 90310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:04:00,031-Speed 5550.83 samples/sec   Loss 2.0886   LearningRate 0.0011   Epoch: 17   Global Step: 90320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:01,873-Speed 5562.24 samples/sec   Loss 2.0811   LearningRate 0.0011   Epoch: 17   Global Step: 90330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:03,716-Speed 5560.68 samples/sec   Loss 2.0669   LearningRate 0.0011   Epoch: 17   Global Step: 90340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:05,568-Speed 5529.51 samples/sec   Loss 2.0756   LearningRate 0.0011   Epoch: 17   Global Step: 90350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:07,421-Speed 5528.54 samples/sec   Loss 2.0542   LearningRate 0.0011   Epoch: 17   Global Step: 90360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:09,262-Speed 5565.02 samples/sec   Loss 2.0797   LearningRate 0.0011   Epoch: 17   Global Step: 90370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:11,106-Speed 5556.24 samples/sec   Loss 2.1446   LearningRate 0.0011   Epoch: 17   Global Step: 90380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:12,944-Speed 5571.56 samples/sec   Loss 2.0412   LearningRate 0.0011   Epoch: 17   Global Step: 90390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:14,782-Speed 5574.45 samples/sec   Loss 2.0215   LearningRate 0.0011   Epoch: 17   Global Step: 90400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:16,637-Speed 5523.04 samples/sec   Loss 2.0889   LearningRate 0.0011   Epoch: 17   Global Step: 90410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:18,492-Speed 5520.10 samples/sec   Loss 2.1176   LearningRate 0.0011   Epoch: 17   Global Step: 90420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:04:20,325-Speed 5588.72 samples/sec   Loss 2.0848   LearningRate 0.0011   Epoch: 17   Global Step: 90430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:04:22,155-Speed 5599.27 samples/sec   Loss 1.9652   LearningRate 0.0011   Epoch: 17   Global Step: 90440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:23,994-Speed 5569.81 samples/sec   Loss 2.0823   LearningRate 0.0011   Epoch: 17   Global Step: 90450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:25,837-Speed 5559.04 samples/sec   Loss 2.0604   LearningRate 0.0011   Epoch: 17   Global Step: 90460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:27,676-Speed 5570.90 samples/sec   Loss 2.1839   LearningRate 0.0011   Epoch: 17   Global Step: 90470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:29,514-Speed 5572.09 samples/sec   Loss 2.0914   LearningRate 0.0011   Epoch: 17   Global Step: 90480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:31,367-Speed 5530.47 samples/sec   Loss 2.1434   LearningRate 0.0011   Epoch: 17   Global Step: 90490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:33,203-Speed 5577.62 samples/sec   Loss 2.1909   LearningRate 0.0011   Epoch: 17   Global Step: 90500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:35,051-Speed 5544.04 samples/sec   Loss 2.0429   LearningRate 0.0011   Epoch: 17   Global Step: 90510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:36,891-Speed 5566.60 samples/sec   Loss 2.1893   LearningRate 0.0011   Epoch: 17   Global Step: 90520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:38,751-Speed 5506.04 samples/sec   Loss 1.9920   LearningRate 0.0011   Epoch: 17   Global Step: 90530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:40,589-Speed 5573.30 samples/sec   Loss 2.0865   LearningRate 0.0011   Epoch: 17   Global Step: 90540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:04:42,435-Speed 5551.16 samples/sec   Loss 2.1367   LearningRate 0.0011   Epoch: 17   Global Step: 90550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:04:44,266-Speed 5594.19 samples/sec   Loss 2.1208   LearningRate 0.0011   Epoch: 17   Global Step: 90560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:04:46,096-Speed 5599.46 samples/sec   Loss 2.0462   LearningRate 0.0011   Epoch: 17   Global Step: 90570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:47,949-Speed 5528.23 samples/sec   Loss 2.0489   LearningRate 0.0011   Epoch: 17   Global Step: 90580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:49,781-Speed 5591.64 samples/sec   Loss 2.1251   LearningRate 0.0011   Epoch: 17   Global Step: 90590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:51,618-Speed 5575.78 samples/sec   Loss 2.1614   LearningRate 0.0011   Epoch: 17   Global Step: 90600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:53,460-Speed 5563.02 samples/sec   Loss 2.0894   LearningRate 0.0011   Epoch: 17   Global Step: 90610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:55,297-Speed 5574.97 samples/sec   Loss 2.1995   LearningRate 0.0011   Epoch: 17   Global Step: 90620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:57,140-Speed 5559.42 samples/sec   Loss 2.0447   LearningRate 0.0011   Epoch: 17   Global Step: 90630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:04:58,978-Speed 5571.25 samples/sec   Loss 2.0681   LearningRate 0.0011   Epoch: 17   Global Step: 90640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:00,813-Speed 5582.97 samples/sec   Loss 2.0584   LearningRate 0.0011   Epoch: 17   Global Step: 90650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:02,658-Speed 5552.47 samples/sec   Loss 1.9644   LearningRate 0.0011   Epoch: 17   Global Step: 90660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:04,494-Speed 5581.60 samples/sec   Loss 2.0504   LearningRate 0.0011   Epoch: 17   Global Step: 90670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:05:06,319-Speed 5613.04 samples/sec   Loss 2.1271   LearningRate 0.0011   Epoch: 17   Global Step: 90680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:08,170-Speed 5534.99 samples/sec   Loss 2.1778   LearningRate 0.0011   Epoch: 17   Global Step: 90690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:10,012-Speed 5558.35 samples/sec   Loss 2.0875   LearningRate 0.0011   Epoch: 17   Global Step: 90700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:11,866-Speed 5527.50 samples/sec   Loss 2.0215   LearningRate 0.0011   Epoch: 17   Global Step: 90710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:13,736-Speed 5476.86 samples/sec   Loss 2.0801   LearningRate 0.0011   Epoch: 17   Global Step: 90720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:15,580-Speed 5554.27 samples/sec   Loss 2.0869   LearningRate 0.0011   Epoch: 17   Global Step: 90730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:17,426-Speed 5552.25 samples/sec   Loss 2.0449   LearningRate 0.0011   Epoch: 17   Global Step: 90740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:19,268-Speed 5560.55 samples/sec   Loss 2.0387   LearningRate 0.0011   Epoch: 17   Global Step: 90750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:21,113-Speed 5551.27 samples/sec   Loss 2.0385   LearningRate 0.0011   Epoch: 17   Global Step: 90760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:22,963-Speed 5538.31 samples/sec   Loss 2.0037   LearningRate 0.0011   Epoch: 17   Global Step: 90770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:24,801-Speed 5572.48 samples/sec   Loss 1.9750   LearningRate 0.0011   Epoch: 17   Global Step: 90780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:05:26,643-Speed 5562.69 samples/sec   Loss 2.1519   LearningRate 0.0011   Epoch: 17   Global Step: 90790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:05:28,479-Speed 5578.91 samples/sec   Loss 2.1404   LearningRate 0.0010   Epoch: 17   Global Step: 90800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:05:30,329-Speed 5537.45 samples/sec   Loss 2.0856   LearningRate 0.0010   Epoch: 17   Global Step: 90810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:05:32,169-Speed 5569.36 samples/sec   Loss 2.0373   LearningRate 0.0010   Epoch: 17   Global Step: 90820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:05:34,003-Speed 5582.81 samples/sec   Loss 2.0654   LearningRate 0.0010   Epoch: 17   Global Step: 90830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:05:35,863-Speed 5509.48 samples/sec   Loss 2.0286   LearningRate 0.0010   Epoch: 17   Global Step: 90840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:05:37,689-Speed 5610.42 samples/sec   Loss 2.1191   LearningRate 0.0010   Epoch: 17   Global Step: 90850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:39,536-Speed 5545.09 samples/sec   Loss 2.0857   LearningRate 0.0010   Epoch: 17   Global Step: 90860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:41,378-Speed 5562.73 samples/sec   Loss 2.1456   LearningRate 0.0010   Epoch: 17   Global Step: 90870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:43,209-Speed 5594.67 samples/sec   Loss 2.0429   LearningRate 0.0010   Epoch: 17   Global Step: 90880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:45,053-Speed 5553.87 samples/sec   Loss 2.0569   LearningRate 0.0010   Epoch: 17   Global Step: 90890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:46,900-Speed 5548.92 samples/sec   Loss 2.0013   LearningRate 0.0010   Epoch: 17   Global Step: 90900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:48,741-Speed 5562.68 samples/sec   Loss 2.0271   LearningRate 0.0010   Epoch: 17   Global Step: 90910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:50,580-Speed 5569.51 samples/sec   Loss 2.0079   LearningRate 0.0010   Epoch: 17   Global Step: 90920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:52,430-Speed 5539.47 samples/sec   Loss 2.0974   LearningRate 0.0010   Epoch: 17   Global Step: 90930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:54,275-Speed 5551.36 samples/sec   Loss 2.1140   LearningRate 0.0010   Epoch: 17   Global Step: 90940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:05:56,114-Speed 5570.26 samples/sec   Loss 2.0172   LearningRate 0.0010   Epoch: 17   Global Step: 90950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:05:57,958-Speed 5554.08 samples/sec   Loss 2.0504   LearningRate 0.0010   Epoch: 17   Global Step: 90960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:05:59,853-Speed 5406.42 samples/sec   Loss 1.9566   LearningRate 0.0010   Epoch: 17   Global Step: 90970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:06:01,727-Speed 5467.53 samples/sec   Loss 2.0424   LearningRate 0.0010   Epoch: 17   Global Step: 90980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:06:03,554-Speed 5608.02 samples/sec   Loss 2.0075   LearningRate 0.0010   Epoch: 17   Global Step: 90990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:05,392-Speed 5571.39 samples/sec   Loss 2.1756   LearningRate 0.0010   Epoch: 17   Global Step: 91000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:07,232-Speed 5567.68 samples/sec   Loss 2.1747   LearningRate 0.0010   Epoch: 17   Global Step: 91010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:09,078-Speed 5550.04 samples/sec   Loss 2.1073   LearningRate 0.0010   Epoch: 17   Global Step: 91020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:10,924-Speed 5549.94 samples/sec   Loss 2.0791   LearningRate 0.0010   Epoch: 17   Global Step: 91030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:12,821-Speed 5399.76 samples/sec   Loss 2.0105   LearningRate 0.0010   Epoch: 17   Global Step: 91040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:24,176-Speed 901.85 samples/sec   Loss 1.7908   LearningRate 0.0010   Epoch: 18   Global Step: 91050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:26,048-Speed 5475.35 samples/sec   Loss 1.5925   LearningRate 0.0010   Epoch: 18   Global Step: 91060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:28,043-Speed 5134.62 samples/sec   Loss 1.6665   LearningRate 0.0010   Epoch: 18   Global Step: 91070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:29,911-Speed 5483.57 samples/sec   Loss 1.6445   LearningRate 0.0010   Epoch: 18   Global Step: 91080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:31,768-Speed 5514.21 samples/sec   Loss 1.7378   LearningRate 0.0010   Epoch: 18   Global Step: 91090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:06:33,603-Speed 5584.00 samples/sec   Loss 1.5850   LearningRate 0.0010   Epoch: 18   Global Step: 91100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:35,461-Speed 5511.72 samples/sec   Loss 1.7458   LearningRate 0.0010   Epoch: 18   Global Step: 91110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:37,318-Speed 5518.02 samples/sec   Loss 1.5713   LearningRate 0.0010   Epoch: 18   Global Step: 91120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:39,169-Speed 5532.71 samples/sec   Loss 1.6983   LearningRate 0.0010   Epoch: 18   Global Step: 91130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:41,012-Speed 5557.53 samples/sec   Loss 1.6516   LearningRate 0.0010   Epoch: 18   Global Step: 91140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:42,864-Speed 5532.49 samples/sec   Loss 1.6835   LearningRate 0.0010   Epoch: 18   Global Step: 91150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:44,704-Speed 5566.62 samples/sec   Loss 1.6414   LearningRate 0.0010   Epoch: 18   Global Step: 91160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:46,569-Speed 5491.72 samples/sec   Loss 1.6604   LearningRate 0.0010   Epoch: 18   Global Step: 91170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:48,423-Speed 5525.30 samples/sec   Loss 1.7600   LearningRate 0.0010   Epoch: 18   Global Step: 91180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:50,273-Speed 5540.52 samples/sec   Loss 1.6710   LearningRate 0.0010   Epoch: 18   Global Step: 91190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:06:52,125-Speed 5532.21 samples/sec   Loss 1.6525   LearningRate 0.0010   Epoch: 18   Global Step: 91200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:06:53,971-Speed 5547.06 samples/sec   Loss 1.6703   LearningRate 0.0010   Epoch: 18   Global Step: 91210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:06:55,808-Speed 5578.06 samples/sec   Loss 1.5907   LearningRate 0.0010   Epoch: 18   Global Step: 91220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:06:57,650-Speed 5562.07 samples/sec   Loss 1.6547   LearningRate 0.0010   Epoch: 18   Global Step: 91230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:06:59,478-Speed 5601.29 samples/sec   Loss 1.7225   LearningRate 0.0010   Epoch: 18   Global Step: 91240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:01,326-Speed 5544.05 samples/sec   Loss 1.6379   LearningRate 0.0010   Epoch: 18   Global Step: 91250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:03,177-Speed 5535.63 samples/sec   Loss 1.6503   LearningRate 0.0010   Epoch: 18   Global Step: 91260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:05,034-Speed 5514.68 samples/sec   Loss 1.7440   LearningRate 0.0010   Epoch: 18   Global Step: 91270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:06,911-Speed 5459.28 samples/sec   Loss 1.6765   LearningRate 0.0010   Epoch: 18   Global Step: 91280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:08,756-Speed 5550.13 samples/sec   Loss 1.6201   LearningRate 0.0010   Epoch: 18   Global Step: 91290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:10,600-Speed 5556.98 samples/sec   Loss 1.6741   LearningRate 0.0010   Epoch: 18   Global Step: 91300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:12,454-Speed 5524.82 samples/sec   Loss 1.7272   LearningRate 0.0009   Epoch: 18   Global Step: 91310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:14,309-Speed 5522.49 samples/sec   Loss 1.7218   LearningRate 0.0009   Epoch: 18   Global Step: 91320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:16,155-Speed 5547.99 samples/sec   Loss 1.7620   LearningRate 0.0009   Epoch: 18   Global Step: 91330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:18,013-Speed 5513.90 samples/sec   Loss 1.7044   LearningRate 0.0009   Epoch: 18   Global Step: 91340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:07:19,843-Speed 5598.32 samples/sec   Loss 1.7311   LearningRate 0.0009   Epoch: 18   Global Step: 91350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:21,710-Speed 5485.33 samples/sec   Loss 1.7198   LearningRate 0.0009   Epoch: 18   Global Step: 91360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:23,568-Speed 5514.83 samples/sec   Loss 1.6058   LearningRate 0.0009   Epoch: 18   Global Step: 91370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:25,411-Speed 5557.45 samples/sec   Loss 1.6709   LearningRate 0.0009   Epoch: 18   Global Step: 91380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:27,275-Speed 5496.27 samples/sec   Loss 1.6697   LearningRate 0.0009   Epoch: 18   Global Step: 91390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:29,143-Speed 5482.86 samples/sec   Loss 1.7078   LearningRate 0.0009   Epoch: 18   Global Step: 91400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:30,989-Speed 5550.40 samples/sec   Loss 1.6930   LearningRate 0.0009   Epoch: 18   Global Step: 91410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:32,841-Speed 5530.88 samples/sec   Loss 1.7601   LearningRate 0.0009   Epoch: 18   Global Step: 91420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:34,683-Speed 5560.23 samples/sec   Loss 1.5586   LearningRate 0.0009   Epoch: 18   Global Step: 91430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:36,523-Speed 5569.05 samples/sec   Loss 1.7070   LearningRate 0.0009   Epoch: 18   Global Step: 91440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:38,371-Speed 5543.49 samples/sec   Loss 1.6770   LearningRate 0.0009   Epoch: 18   Global Step: 91450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:07:40,215-Speed 5555.85 samples/sec   Loss 1.6820   LearningRate 0.0009   Epoch: 18   Global Step: 91460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:07:42,051-Speed 5577.67 samples/sec   Loss 1.7545   LearningRate 0.0009   Epoch: 18   Global Step: 91470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:43,901-Speed 5537.41 samples/sec   Loss 1.6303   LearningRate 0.0009   Epoch: 18   Global Step: 91480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:45,744-Speed 5556.56 samples/sec   Loss 1.6447   LearningRate 0.0009   Epoch: 18   Global Step: 91490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:47,605-Speed 5506.30 samples/sec   Loss 1.6687   LearningRate 0.0009   Epoch: 18   Global Step: 91500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:49,470-Speed 5493.00 samples/sec   Loss 1.5954   LearningRate 0.0009   Epoch: 18   Global Step: 91510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:51,315-Speed 5551.76 samples/sec   Loss 1.5900   LearningRate 0.0009   Epoch: 18   Global Step: 91520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:53,154-Speed 5572.62 samples/sec   Loss 1.7395   LearningRate 0.0009   Epoch: 18   Global Step: 91530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:54,998-Speed 5552.45 samples/sec   Loss 1.7522   LearningRate 0.0009   Epoch: 18   Global Step: 91540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:56,846-Speed 5543.98 samples/sec   Loss 1.6881   LearningRate 0.0009   Epoch: 18   Global Step: 91550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:07:58,690-Speed 5563.22 samples/sec   Loss 1.6661   LearningRate 0.0009   Epoch: 18   Global Step: 91560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:00,523-Speed 5587.42 samples/sec   Loss 1.7333   LearningRate 0.0009   Epoch: 18   Global Step: 91570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:02,367-Speed 5557.03 samples/sec   Loss 1.6646   LearningRate 0.0009   Epoch: 18   Global Step: 91580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:04,209-Speed 5561.82 samples/sec   Loss 1.6073   LearningRate 0.0009   Epoch: 18   Global Step: 91590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:06,055-Speed 5547.29 samples/sec   Loss 1.6327   LearningRate 0.0009   Epoch: 18   Global Step: 91600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:07,896-Speed 5564.81 samples/sec   Loss 1.6823   LearningRate 0.0009   Epoch: 18   Global Step: 91610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:09,734-Speed 5573.27 samples/sec   Loss 1.6937   LearningRate 0.0009   Epoch: 18   Global Step: 91620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:11,578-Speed 5558.21 samples/sec   Loss 1.6562   LearningRate 0.0009   Epoch: 18   Global Step: 91630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:13,429-Speed 5533.66 samples/sec   Loss 1.6501   LearningRate 0.0009   Epoch: 18   Global Step: 91640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:15,317-Speed 5423.77 samples/sec   Loss 1.7789   LearningRate 0.0009   Epoch: 18   Global Step: 91650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:17,160-Speed 5561.57 samples/sec   Loss 1.7022   LearningRate 0.0009   Epoch: 18   Global Step: 91660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:19,004-Speed 5553.02 samples/sec   Loss 1.6459   LearningRate 0.0009   Epoch: 18   Global Step: 91670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:08:20,837-Speed 5589.42 samples/sec   Loss 1.6844   LearningRate 0.0009   Epoch: 18   Global Step: 91680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:22,695-Speed 5513.41 samples/sec   Loss 1.7707   LearningRate 0.0009   Epoch: 18   Global Step: 91690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:24,534-Speed 5570.92 samples/sec   Loss 1.7086   LearningRate 0.0009   Epoch: 18   Global Step: 91700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:26,390-Speed 5518.66 samples/sec   Loss 1.7397   LearningRate 0.0009   Epoch: 18   Global Step: 91710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:28,234-Speed 5554.25 samples/sec   Loss 1.6245   LearningRate 0.0009   Epoch: 18   Global Step: 91720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:30,081-Speed 5548.36 samples/sec   Loss 1.6807   LearningRate 0.0009   Epoch: 18   Global Step: 91730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:31,924-Speed 5559.10 samples/sec   Loss 1.7155   LearningRate 0.0009   Epoch: 18   Global Step: 91740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:08:33,762-Speed 5571.03 samples/sec   Loss 1.6340   LearningRate 0.0009   Epoch: 18   Global Step: 91750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:08:35,600-Speed 5573.54 samples/sec   Loss 1.6111   LearningRate 0.0009   Epoch: 18   Global Step: 91760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:08:37,451-Speed 5535.67 samples/sec   Loss 1.7359   LearningRate 0.0009   Epoch: 18   Global Step: 91770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:08:39,301-Speed 5537.22 samples/sec   Loss 1.7212   LearningRate 0.0009   Epoch: 18   Global Step: 91780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:08:41,157-Speed 5519.07 samples/sec   Loss 1.6616   LearningRate 0.0009   Epoch: 18   Global Step: 91790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:08:43,006-Speed 5540.02 samples/sec   Loss 1.6319   LearningRate 0.0009   Epoch: 18   Global Step: 91800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:08:44,843-Speed 5575.47 samples/sec   Loss 1.6774   LearningRate 0.0009   Epoch: 18   Global Step: 91810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:08:46,685-Speed 5562.54 samples/sec   Loss 1.7858   LearningRate 0.0009   Epoch: 18   Global Step: 91820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:08:48,526-Speed 5562.56 samples/sec   Loss 1.7315   LearningRate 0.0009   Epoch: 18   Global Step: 91830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:08:50,384-Speed 5513.17 samples/sec   Loss 1.6792   LearningRate 0.0008   Epoch: 18   Global Step: 91840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:52,231-Speed 5548.15 samples/sec   Loss 1.7607   LearningRate 0.0008   Epoch: 18   Global Step: 91850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:54,090-Speed 5509.01 samples/sec   Loss 1.5944   LearningRate 0.0008   Epoch: 18   Global Step: 91860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:55,952-Speed 5501.84 samples/sec   Loss 1.6399   LearningRate 0.0008   Epoch: 18   Global Step: 91870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:57,800-Speed 5545.63 samples/sec   Loss 1.7281   LearningRate 0.0008   Epoch: 18   Global Step: 91880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:08:59,651-Speed 5531.83 samples/sec   Loss 1.6754   LearningRate 0.0008   Epoch: 18   Global Step: 91890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:01,566-Speed 5348.90 samples/sec   Loss 1.6271   LearningRate 0.0008   Epoch: 18   Global Step: 91900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:03,412-Speed 5550.80 samples/sec   Loss 1.7073   LearningRate 0.0008   Epoch: 18   Global Step: 91910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:05,256-Speed 5553.65 samples/sec   Loss 1.6675   LearningRate 0.0008   Epoch: 18   Global Step: 91920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:07,116-Speed 5507.54 samples/sec   Loss 1.6220   LearningRate 0.0008   Epoch: 18   Global Step: 91930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:08,949-Speed 5587.73 samples/sec   Loss 1.7275   LearningRate 0.0008   Epoch: 18   Global Step: 91940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:10,809-Speed 5508.96 samples/sec   Loss 1.6270   LearningRate 0.0008   Epoch: 18   Global Step: 91950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:12,684-Speed 5463.21 samples/sec   Loss 1.7448   LearningRate 0.0008   Epoch: 18   Global Step: 91960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:14,532-Speed 5545.15 samples/sec   Loss 1.7538   LearningRate 0.0008   Epoch: 18   Global Step: 91970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:16,371-Speed 5569.16 samples/sec   Loss 1.6840   LearningRate 0.0008   Epoch: 18   Global Step: 91980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:18,215-Speed 5555.71 samples/sec   Loss 1.7528   LearningRate 0.0008   Epoch: 18   Global Step: 91990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:20,056-Speed 5562.56 samples/sec   Loss 1.6257   LearningRate 0.0008   Epoch: 18   Global Step: 92000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:09:46,705-[lfw][92000]XNorm: 22.401583
Training: 2022-04-11 16:09:46,706-[lfw][92000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 16:09:46,706-[lfw][92000]Accuracy-Highest: 0.99817
Training: 2022-04-11 16:10:17,588-[cfp_fp][92000]XNorm: 21.515875
Training: 2022-04-11 16:10:17,589-[cfp_fp][92000]Accuracy-Flip: 0.98471+-0.00546
Training: 2022-04-11 16:10:17,589-[cfp_fp][92000]Accuracy-Highest: 0.98471
Training: 2022-04-11 16:10:44,264-[agedb_30][92000]XNorm: 22.596551
Training: 2022-04-11 16:10:44,265-[agedb_30][92000]Accuracy-Flip: 0.98267+-0.00775
Training: 2022-04-11 16:10:44,265-[agedb_30][92000]Accuracy-Highest: 0.98350
Training: 2022-04-11 16:10:46,110-Speed 119.00 samples/sec   Loss 1.7382   LearningRate 0.0008   Epoch: 18   Global Step: 92010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:10:47,947-Speed 5576.70 samples/sec   Loss 1.7208   LearningRate 0.0008   Epoch: 18   Global Step: 92020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:10:49,794-Speed 5546.14 samples/sec   Loss 1.6546   LearningRate 0.0008   Epoch: 18   Global Step: 92030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:10:51,638-Speed 5554.66 samples/sec   Loss 1.7049   LearningRate 0.0008   Epoch: 18   Global Step: 92040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:10:53,469-Speed 5594.66 samples/sec   Loss 1.7332   LearningRate 0.0008   Epoch: 18   Global Step: 92050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:10:55,311-Speed 5561.18 samples/sec   Loss 1.7266   LearningRate 0.0008   Epoch: 18   Global Step: 92060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:10:57,152-Speed 5563.64 samples/sec   Loss 1.7365   LearningRate 0.0008   Epoch: 18   Global Step: 92070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:10:58,984-Speed 5591.59 samples/sec   Loss 1.6848   LearningRate 0.0008   Epoch: 18   Global Step: 92080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:00,824-Speed 5567.64 samples/sec   Loss 1.6494   LearningRate 0.0008   Epoch: 18   Global Step: 92090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:02,666-Speed 5561.89 samples/sec   Loss 1.6625   LearningRate 0.0008   Epoch: 18   Global Step: 92100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:04,505-Speed 5570.05 samples/sec   Loss 1.7070   LearningRate 0.0008   Epoch: 18   Global Step: 92110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:06,344-Speed 5570.24 samples/sec   Loss 1.6246   LearningRate 0.0008   Epoch: 18   Global Step: 92120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:08,179-Speed 5580.78 samples/sec   Loss 1.6678   LearningRate 0.0008   Epoch: 18   Global Step: 92130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:10,008-Speed 5601.36 samples/sec   Loss 1.7348   LearningRate 0.0008   Epoch: 18   Global Step: 92140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:11,851-Speed 5558.33 samples/sec   Loss 1.6382   LearningRate 0.0008   Epoch: 18   Global Step: 92150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:13,723-Speed 5474.05 samples/sec   Loss 1.6664   LearningRate 0.0008   Epoch: 18   Global Step: 92160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:15,573-Speed 5538.62 samples/sec   Loss 1.6875   LearningRate 0.0008   Epoch: 18   Global Step: 92170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:17,409-Speed 5577.81 samples/sec   Loss 1.7372   LearningRate 0.0008   Epoch: 18   Global Step: 92180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:19,244-Speed 5583.91 samples/sec   Loss 1.7369   LearningRate 0.0008   Epoch: 18   Global Step: 92190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:21,083-Speed 5570.64 samples/sec   Loss 1.7234   LearningRate 0.0008   Epoch: 18   Global Step: 92200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:22,919-Speed 5576.80 samples/sec   Loss 1.7259   LearningRate 0.0008   Epoch: 18   Global Step: 92210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:24,772-Speed 5531.14 samples/sec   Loss 1.7372   LearningRate 0.0008   Epoch: 18   Global Step: 92220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:26,600-Speed 5603.23 samples/sec   Loss 1.7254   LearningRate 0.0008   Epoch: 18   Global Step: 92230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:28,463-Speed 5497.85 samples/sec   Loss 1.6892   LearningRate 0.0008   Epoch: 18   Global Step: 92240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:30,310-Speed 5545.07 samples/sec   Loss 1.6972   LearningRate 0.0008   Epoch: 18   Global Step: 92250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:32,152-Speed 5562.90 samples/sec   Loss 1.7258   LearningRate 0.0008   Epoch: 18   Global Step: 92260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:33,987-Speed 5583.55 samples/sec   Loss 1.7316   LearningRate 0.0008   Epoch: 18   Global Step: 92270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:35,829-Speed 5560.14 samples/sec   Loss 1.6969   LearningRate 0.0008   Epoch: 18   Global Step: 92280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:37,670-Speed 5566.36 samples/sec   Loss 1.6493   LearningRate 0.0008   Epoch: 18   Global Step: 92290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:39,510-Speed 5567.98 samples/sec   Loss 1.7174   LearningRate 0.0008   Epoch: 18   Global Step: 92300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:41,343-Speed 5586.00 samples/sec   Loss 1.7226   LearningRate 0.0008   Epoch: 18   Global Step: 92310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:43,187-Speed 5554.44 samples/sec   Loss 1.6214   LearningRate 0.0008   Epoch: 18   Global Step: 92320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:45,025-Speed 5574.62 samples/sec   Loss 1.7710   LearningRate 0.0008   Epoch: 18   Global Step: 92330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:46,871-Speed 5548.56 samples/sec   Loss 1.7483   LearningRate 0.0008   Epoch: 18   Global Step: 92340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:48,709-Speed 5574.57 samples/sec   Loss 1.6360   LearningRate 0.0008   Epoch: 18   Global Step: 92350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:11:50,544-Speed 5582.52 samples/sec   Loss 1.7182   LearningRate 0.0008   Epoch: 18   Global Step: 92360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:52,383-Speed 5568.25 samples/sec   Loss 1.7561   LearningRate 0.0008   Epoch: 18   Global Step: 92370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:54,223-Speed 5570.54 samples/sec   Loss 1.7047   LearningRate 0.0008   Epoch: 18   Global Step: 92380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:56,056-Speed 5586.93 samples/sec   Loss 1.6260   LearningRate 0.0008   Epoch: 18   Global Step: 92390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:57,903-Speed 5547.83 samples/sec   Loss 1.7763   LearningRate 0.0007   Epoch: 18   Global Step: 92400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:11:59,746-Speed 5556.10 samples/sec   Loss 1.7240   LearningRate 0.0007   Epoch: 18   Global Step: 92410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:01,607-Speed 5506.93 samples/sec   Loss 1.8217   LearningRate 0.0007   Epoch: 18   Global Step: 92420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:03,454-Speed 5544.00 samples/sec   Loss 1.7048   LearningRate 0.0007   Epoch: 18   Global Step: 92430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:05,293-Speed 5571.55 samples/sec   Loss 1.6341   LearningRate 0.0007   Epoch: 18   Global Step: 92440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:07,136-Speed 5558.36 samples/sec   Loss 1.7892   LearningRate 0.0007   Epoch: 18   Global Step: 92450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:08,980-Speed 5554.84 samples/sec   Loss 1.7282   LearningRate 0.0007   Epoch: 18   Global Step: 92460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:12:10,817-Speed 5575.67 samples/sec   Loss 1.7449   LearningRate 0.0007   Epoch: 18   Global Step: 92470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:12:12,667-Speed 5538.08 samples/sec   Loss 1.7168   LearningRate 0.0007   Epoch: 18   Global Step: 92480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:12:14,538-Speed 5476.49 samples/sec   Loss 1.7445   LearningRate 0.0007   Epoch: 18   Global Step: 92490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:12:16,384-Speed 5549.76 samples/sec   Loss 1.7580   LearningRate 0.0007   Epoch: 18   Global Step: 92500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:12:18,225-Speed 5562.65 samples/sec   Loss 1.7265   LearningRate 0.0007   Epoch: 18   Global Step: 92510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:12:20,060-Speed 5583.93 samples/sec   Loss 1.7155   LearningRate 0.0007   Epoch: 18   Global Step: 92520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:12:21,907-Speed 5546.24 samples/sec   Loss 1.7239   LearningRate 0.0007   Epoch: 18   Global Step: 92530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:12:23,736-Speed 5599.67 samples/sec   Loss 1.7294   LearningRate 0.0007   Epoch: 18   Global Step: 92540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:25,578-Speed 5560.45 samples/sec   Loss 1.6378   LearningRate 0.0007   Epoch: 18   Global Step: 92550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:27,435-Speed 5518.17 samples/sec   Loss 1.6693   LearningRate 0.0007   Epoch: 18   Global Step: 92560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:29,274-Speed 5570.91 samples/sec   Loss 1.6936   LearningRate 0.0007   Epoch: 18   Global Step: 92570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:31,127-Speed 5526.52 samples/sec   Loss 1.6877   LearningRate 0.0007   Epoch: 18   Global Step: 92580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:32,967-Speed 5567.21 samples/sec   Loss 1.7657   LearningRate 0.0007   Epoch: 18   Global Step: 92590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:34,810-Speed 5559.00 samples/sec   Loss 1.8042   LearningRate 0.0007   Epoch: 18   Global Step: 92600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:36,660-Speed 5537.46 samples/sec   Loss 1.7771   LearningRate 0.0007   Epoch: 18   Global Step: 92610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:38,509-Speed 5539.89 samples/sec   Loss 1.6487   LearningRate 0.0007   Epoch: 18   Global Step: 92620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:40,345-Speed 5581.64 samples/sec   Loss 1.6766   LearningRate 0.0007   Epoch: 18   Global Step: 92630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:42,215-Speed 5475.89 samples/sec   Loss 1.7296   LearningRate 0.0007   Epoch: 18   Global Step: 92640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:12:44,054-Speed 5570.94 samples/sec   Loss 1.6446   LearningRate 0.0007   Epoch: 18   Global Step: 92650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:12:45,886-Speed 5592.51 samples/sec   Loss 1.6652   LearningRate 0.0007   Epoch: 18   Global Step: 92660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:47,726-Speed 5565.05 samples/sec   Loss 1.8083   LearningRate 0.0007   Epoch: 18   Global Step: 92670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:49,562-Speed 5582.20 samples/sec   Loss 1.7546   LearningRate 0.0007   Epoch: 18   Global Step: 92680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:51,401-Speed 5567.63 samples/sec   Loss 1.7160   LearningRate 0.0007   Epoch: 18   Global Step: 92690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:53,243-Speed 5562.63 samples/sec   Loss 1.6299   LearningRate 0.0007   Epoch: 18   Global Step: 92700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:55,093-Speed 5536.16 samples/sec   Loss 1.7143   LearningRate 0.0007   Epoch: 18   Global Step: 92710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:56,929-Speed 5581.19 samples/sec   Loss 1.6892   LearningRate 0.0007   Epoch: 18   Global Step: 92720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:12:58,780-Speed 5533.83 samples/sec   Loss 1.8093   LearningRate 0.0007   Epoch: 18   Global Step: 92730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:00,641-Speed 5505.39 samples/sec   Loss 1.7441   LearningRate 0.0007   Epoch: 18   Global Step: 92740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:02,503-Speed 5499.87 samples/sec   Loss 1.6457   LearningRate 0.0007   Epoch: 18   Global Step: 92750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:04,372-Speed 5483.00 samples/sec   Loss 1.6851   LearningRate 0.0007   Epoch: 18   Global Step: 92760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:13:06,215-Speed 5558.39 samples/sec   Loss 1.7104   LearningRate 0.0007   Epoch: 18   Global Step: 92770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:13:08,058-Speed 5556.88 samples/sec   Loss 1.7813   LearningRate 0.0007   Epoch: 18   Global Step: 92780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:13:09,885-Speed 5608.35 samples/sec   Loss 1.8169   LearningRate 0.0007   Epoch: 18   Global Step: 92790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:11,734-Speed 5539.48 samples/sec   Loss 1.6818   LearningRate 0.0007   Epoch: 18   Global Step: 92800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:13,629-Speed 5403.21 samples/sec   Loss 1.7631   LearningRate 0.0007   Epoch: 18   Global Step: 92810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:15,482-Speed 5528.84 samples/sec   Loss 1.7516   LearningRate 0.0007   Epoch: 18   Global Step: 92820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:17,337-Speed 5522.50 samples/sec   Loss 1.7357   LearningRate 0.0007   Epoch: 18   Global Step: 92830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:19,184-Speed 5548.70 samples/sec   Loss 1.8795   LearningRate 0.0007   Epoch: 18   Global Step: 92840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:21,023-Speed 5570.73 samples/sec   Loss 1.7866   LearningRate 0.0007   Epoch: 18   Global Step: 92850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:22,860-Speed 5578.57 samples/sec   Loss 1.6383   LearningRate 0.0007   Epoch: 18   Global Step: 92860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:13:24,709-Speed 5542.11 samples/sec   Loss 1.7514   LearningRate 0.0007   Epoch: 18   Global Step: 92870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:13:26,573-Speed 5493.49 samples/sec   Loss 1.7378   LearningRate 0.0007   Epoch: 18   Global Step: 92880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:13:28,433-Speed 5507.35 samples/sec   Loss 1.6441   LearningRate 0.0007   Epoch: 18   Global Step: 92890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:13:30,274-Speed 5565.21 samples/sec   Loss 1.7721   LearningRate 0.0007   Epoch: 18   Global Step: 92900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:13:32,113-Speed 5571.41 samples/sec   Loss 1.7813   LearningRate 0.0007   Epoch: 18   Global Step: 92910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:13:33,964-Speed 5531.76 samples/sec   Loss 1.6250   LearningRate 0.0007   Epoch: 18   Global Step: 92920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:13:35,803-Speed 5571.50 samples/sec   Loss 1.6958   LearningRate 0.0007   Epoch: 18   Global Step: 92930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:13:37,653-Speed 5536.09 samples/sec   Loss 1.7531   LearningRate 0.0007   Epoch: 18   Global Step: 92940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:13:39,507-Speed 5526.60 samples/sec   Loss 1.7055   LearningRate 0.0007   Epoch: 18   Global Step: 92950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:13:41,358-Speed 5533.85 samples/sec   Loss 1.6304   LearningRate 0.0007   Epoch: 18   Global Step: 92960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:43,209-Speed 5536.47 samples/sec   Loss 1.6870   LearningRate 0.0007   Epoch: 18   Global Step: 92970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:45,053-Speed 5555.49 samples/sec   Loss 1.7398   LearningRate 0.0007   Epoch: 18   Global Step: 92980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:46,888-Speed 5579.95 samples/sec   Loss 1.6988   LearningRate 0.0007   Epoch: 18   Global Step: 92990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:48,746-Speed 5514.91 samples/sec   Loss 1.6509   LearningRate 0.0007   Epoch: 18   Global Step: 93000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:50,602-Speed 5518.53 samples/sec   Loss 1.7744   LearningRate 0.0006   Epoch: 18   Global Step: 93010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:52,460-Speed 5513.50 samples/sec   Loss 1.6538   LearningRate 0.0006   Epoch: 18   Global Step: 93020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:54,313-Speed 5525.99 samples/sec   Loss 1.7187   LearningRate 0.0006   Epoch: 18   Global Step: 93030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:56,166-Speed 5530.24 samples/sec   Loss 1.8288   LearningRate 0.0006   Epoch: 18   Global Step: 93040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:58,010-Speed 5554.03 samples/sec   Loss 1.7205   LearningRate 0.0006   Epoch: 18   Global Step: 93050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:13:59,940-Speed 5308.14 samples/sec   Loss 1.7449   LearningRate 0.0006   Epoch: 18   Global Step: 93060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:14:01,782-Speed 5562.62 samples/sec   Loss 1.7232   LearningRate 0.0006   Epoch: 18   Global Step: 93070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:14:03,627-Speed 5552.53 samples/sec   Loss 1.8108   LearningRate 0.0006   Epoch: 18   Global Step: 93080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:14:05,467-Speed 5567.20 samples/sec   Loss 1.7099   LearningRate 0.0006   Epoch: 18   Global Step: 93090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 16:14:07,306-Speed 5571.25 samples/sec   Loss 1.6513   LearningRate 0.0006   Epoch: 18   Global Step: 93100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:14:09,146-Speed 5565.09 samples/sec   Loss 1.6666   LearningRate 0.0006   Epoch: 18   Global Step: 93110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:14:10,982-Speed 5580.14 samples/sec   Loss 1.6691   LearningRate 0.0006   Epoch: 18   Global Step: 93120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:14:12,862-Speed 5447.92 samples/sec   Loss 1.6931   LearningRate 0.0006   Epoch: 18   Global Step: 93130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:14:14,727-Speed 5495.68 samples/sec   Loss 1.7056   LearningRate 0.0006   Epoch: 18   Global Step: 93140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:14:16,581-Speed 5525.17 samples/sec   Loss 1.7821   LearningRate 0.0006   Epoch: 18   Global Step: 93150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:14:18,460-Speed 5451.11 samples/sec   Loss 1.6632   LearningRate 0.0006   Epoch: 18   Global Step: 93160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:14:20,297-Speed 5575.89 samples/sec   Loss 1.7245   LearningRate 0.0006   Epoch: 18   Global Step: 93170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:14:22,126-Speed 5601.65 samples/sec   Loss 1.6881   LearningRate 0.0006   Epoch: 18   Global Step: 93180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:14:23,967-Speed 5563.18 samples/sec   Loss 1.7013   LearningRate 0.0006   Epoch: 18   Global Step: 93190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:14:25,808-Speed 5564.70 samples/sec   Loss 1.6584   LearningRate 0.0006   Epoch: 18   Global Step: 93200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:14:27,655-Speed 5547.17 samples/sec   Loss 1.7128   LearningRate 0.0006   Epoch: 18   Global Step: 93210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:14:29,502-Speed 5545.93 samples/sec   Loss 1.8186   LearningRate 0.0006   Epoch: 18   Global Step: 93220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:14:31,339-Speed 5575.46 samples/sec   Loss 1.6999   LearningRate 0.0006   Epoch: 18   Global Step: 93230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:14:33,176-Speed 5577.62 samples/sec   Loss 1.6641   LearningRate 0.0006   Epoch: 18   Global Step: 93240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:14:35,038-Speed 5501.53 samples/sec   Loss 1.6468   LearningRate 0.0006   Epoch: 18   Global Step: 93250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:14:36,892-Speed 5525.13 samples/sec   Loss 1.7021   LearningRate 0.0006   Epoch: 18   Global Step: 93260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:14:38,757-Speed 5492.28 samples/sec   Loss 1.7779   LearningRate 0.0006   Epoch: 18   Global Step: 93270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-11 16:14:40,610-Speed 5529.30 samples/sec   Loss 1.8076   LearningRate 0.0006   Epoch: 18   Global Step: 93280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:14:42,484-Speed 5465.15 samples/sec   Loss 1.7489   LearningRate 0.0006   Epoch: 18   Global Step: 93290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 16:14:44,327-Speed 5557.42 samples/sec   Loss 1.6522   LearningRate 0.0006   Epoch: 18   Global Step: 93300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:14:46,167-Speed 5570.55 samples/sec   Loss 1.7361   LearningRate 0.0006   Epoch: 18   Global Step: 93310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:14:48,011-Speed 5552.35 samples/sec   Loss 1.7230   LearningRate 0.0006   Epoch: 18   Global Step: 93320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:14:49,853-Speed 5562.18 samples/sec   Loss 1.7427   LearningRate 0.0006   Epoch: 18   Global Step: 93330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:14:51,697-Speed 5555.98 samples/sec   Loss 1.7416   LearningRate 0.0006   Epoch: 18   Global Step: 93340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:14:53,538-Speed 5562.39 samples/sec   Loss 1.7122   LearningRate 0.0006   Epoch: 18   Global Step: 93350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:14:55,378-Speed 5568.39 samples/sec   Loss 1.7689   LearningRate 0.0006   Epoch: 18   Global Step: 93360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:14:57,223-Speed 5551.85 samples/sec   Loss 1.7658   LearningRate 0.0006   Epoch: 18   Global Step: 93370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:14:59,054-Speed 5596.31 samples/sec   Loss 1.7162   LearningRate 0.0006   Epoch: 18   Global Step: 93380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:00,894-Speed 5569.02 samples/sec   Loss 1.6111   LearningRate 0.0006   Epoch: 18   Global Step: 93390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:02,750-Speed 5517.22 samples/sec   Loss 1.7072   LearningRate 0.0006   Epoch: 18   Global Step: 93400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:04,600-Speed 5540.02 samples/sec   Loss 1.7096   LearningRate 0.0006   Epoch: 18   Global Step: 93410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:06,459-Speed 5509.23 samples/sec   Loss 1.8028   LearningRate 0.0006   Epoch: 18   Global Step: 93420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:08,307-Speed 5540.74 samples/sec   Loss 1.7496   LearningRate 0.0006   Epoch: 18   Global Step: 93430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:10,151-Speed 5557.16 samples/sec   Loss 1.7125   LearningRate 0.0006   Epoch: 18   Global Step: 93440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:12,005-Speed 5524.99 samples/sec   Loss 1.6890   LearningRate 0.0006   Epoch: 18   Global Step: 93450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:13,874-Speed 5480.33 samples/sec   Loss 1.7420   LearningRate 0.0006   Epoch: 18   Global Step: 93460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:15,725-Speed 5535.34 samples/sec   Loss 1.6191   LearningRate 0.0006   Epoch: 18   Global Step: 93470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:17,572-Speed 5545.59 samples/sec   Loss 1.6934   LearningRate 0.0006   Epoch: 18   Global Step: 93480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:15:19,411-Speed 5569.26 samples/sec   Loss 1.7544   LearningRate 0.0006   Epoch: 18   Global Step: 93490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:15:21,241-Speed 5599.15 samples/sec   Loss 1.6464   LearningRate 0.0006   Epoch: 18   Global Step: 93500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:23,083-Speed 5560.97 samples/sec   Loss 1.7080   LearningRate 0.0006   Epoch: 18   Global Step: 93510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:24,940-Speed 5518.06 samples/sec   Loss 1.8150   LearningRate 0.0006   Epoch: 18   Global Step: 93520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:26,788-Speed 5542.00 samples/sec   Loss 1.7431   LearningRate 0.0006   Epoch: 18   Global Step: 93530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:28,683-Speed 5407.06 samples/sec   Loss 1.7794   LearningRate 0.0006   Epoch: 18   Global Step: 93540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:30,517-Speed 5586.12 samples/sec   Loss 1.6913   LearningRate 0.0006   Epoch: 18   Global Step: 93550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:15:32,366-Speed 5538.12 samples/sec   Loss 1.7743   LearningRate 0.0006   Epoch: 18   Global Step: 93560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:15:34,210-Speed 5555.93 samples/sec   Loss 1.7678   LearningRate 0.0006   Epoch: 18   Global Step: 93570   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:15:36,047-Speed 5576.84 samples/sec   Loss 1.7573   LearningRate 0.0006   Epoch: 18   Global Step: 93580   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:15:37,890-Speed 5557.38 samples/sec   Loss 1.6117   LearningRate 0.0006   Epoch: 18   Global Step: 93590   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:15:39,727-Speed 5578.21 samples/sec   Loss 1.7719   LearningRate 0.0006   Epoch: 18   Global Step: 93600   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:15:41,566-Speed 5568.94 samples/sec   Loss 1.7256   LearningRate 0.0006   Epoch: 18   Global Step: 93610   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:15:43,408-Speed 5562.75 samples/sec   Loss 1.6831   LearningRate 0.0006   Epoch: 18   Global Step: 93620   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:15:45,243-Speed 5581.73 samples/sec   Loss 1.7555   LearningRate 0.0006   Epoch: 18   Global Step: 93630   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:15:47,090-Speed 5546.94 samples/sec   Loss 1.7197   LearningRate 0.0006   Epoch: 18   Global Step: 93640   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:15:48,938-Speed 5542.05 samples/sec   Loss 1.7401   LearningRate 0.0006   Epoch: 18   Global Step: 93650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:50,787-Speed 5541.83 samples/sec   Loss 1.7511   LearningRate 0.0005   Epoch: 18   Global Step: 93660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:52,630-Speed 5558.29 samples/sec   Loss 1.7014   LearningRate 0.0005   Epoch: 18   Global Step: 93670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:54,519-Speed 5421.57 samples/sec   Loss 1.6876   LearningRate 0.0005   Epoch: 18   Global Step: 93680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:56,372-Speed 5528.56 samples/sec   Loss 1.7101   LearningRate 0.0005   Epoch: 18   Global Step: 93690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:15:58,215-Speed 5557.89 samples/sec   Loss 1.6504   LearningRate 0.0005   Epoch: 18   Global Step: 93700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:00,053-Speed 5572.89 samples/sec   Loss 1.8078   LearningRate 0.0005   Epoch: 18   Global Step: 93710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:01,893-Speed 5568.20 samples/sec   Loss 1.8174   LearningRate 0.0005   Epoch: 18   Global Step: 93720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:03,742-Speed 5541.21 samples/sec   Loss 1.7780   LearningRate 0.0005   Epoch: 18   Global Step: 93730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:05,586-Speed 5556.42 samples/sec   Loss 1.7615   LearningRate 0.0005   Epoch: 18   Global Step: 93740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:07,422-Speed 5577.91 samples/sec   Loss 1.7325   LearningRate 0.0005   Epoch: 18   Global Step: 93750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:09,262-Speed 5567.01 samples/sec   Loss 1.6885   LearningRate 0.0005   Epoch: 18   Global Step: 93760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:11,107-Speed 5553.25 samples/sec   Loss 1.8094   LearningRate 0.0005   Epoch: 18   Global Step: 93770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:12,955-Speed 5540.94 samples/sec   Loss 1.7195   LearningRate 0.0005   Epoch: 18   Global Step: 93780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:14,797-Speed 5562.64 samples/sec   Loss 1.6832   LearningRate 0.0005   Epoch: 18   Global Step: 93790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:16,653-Speed 5520.19 samples/sec   Loss 1.6898   LearningRate 0.0005   Epoch: 18   Global Step: 93800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:18,503-Speed 5535.09 samples/sec   Loss 1.7484   LearningRate 0.0005   Epoch: 18   Global Step: 93810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:20,344-Speed 5564.50 samples/sec   Loss 1.8185   LearningRate 0.0005   Epoch: 18   Global Step: 93820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:22,188-Speed 5554.82 samples/sec   Loss 1.7601   LearningRate 0.0005   Epoch: 18   Global Step: 93830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:24,071-Speed 5440.86 samples/sec   Loss 1.7417   LearningRate 0.0005   Epoch: 18   Global Step: 93840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:25,918-Speed 5547.89 samples/sec   Loss 1.7140   LearningRate 0.0005   Epoch: 18   Global Step: 93850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:16:27,843-Speed 5320.17 samples/sec   Loss 1.6503   LearningRate 0.0005   Epoch: 18   Global Step: 93860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:16:29,746-Speed 5384.12 samples/sec   Loss 1.7630   LearningRate 0.0005   Epoch: 18   Global Step: 93870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:16:31,588-Speed 5562.42 samples/sec   Loss 1.7125   LearningRate 0.0005   Epoch: 18   Global Step: 93880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:16:33,437-Speed 5538.87 samples/sec   Loss 1.7453   LearningRate 0.0005   Epoch: 18   Global Step: 93890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:16:35,279-Speed 5560.44 samples/sec   Loss 1.6688   LearningRate 0.0005   Epoch: 18   Global Step: 93900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:16:37,126-Speed 5547.29 samples/sec   Loss 1.7454   LearningRate 0.0005   Epoch: 18   Global Step: 93910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:16:38,979-Speed 5526.74 samples/sec   Loss 1.7905   LearningRate 0.0005   Epoch: 18   Global Step: 93920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:40,834-Speed 5524.21 samples/sec   Loss 1.7214   LearningRate 0.0005   Epoch: 18   Global Step: 93930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:42,701-Speed 5484.97 samples/sec   Loss 1.6771   LearningRate 0.0005   Epoch: 18   Global Step: 93940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:44,542-Speed 5564.81 samples/sec   Loss 1.6248   LearningRate 0.0005   Epoch: 18   Global Step: 93950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:46,382-Speed 5567.75 samples/sec   Loss 1.8158   LearningRate 0.0005   Epoch: 18   Global Step: 93960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:48,236-Speed 5525.65 samples/sec   Loss 1.7124   LearningRate 0.0005   Epoch: 18   Global Step: 93970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:50,082-Speed 5551.38 samples/sec   Loss 1.8401   LearningRate 0.0005   Epoch: 18   Global Step: 93980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:51,930-Speed 5541.26 samples/sec   Loss 1.7337   LearningRate 0.0005   Epoch: 18   Global Step: 93990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:16:53,778-Speed 5542.62 samples/sec   Loss 1.7487   LearningRate 0.0005   Epoch: 18   Global Step: 94000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:17:20,185-[lfw][94000]XNorm: 22.356079
Training: 2022-04-11 16:17:20,186-[lfw][94000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 16:17:20,186-[lfw][94000]Accuracy-Highest: 0.99817
Training: 2022-04-11 16:17:50,830-[cfp_fp][94000]XNorm: 21.361798
Training: 2022-04-11 16:17:50,830-[cfp_fp][94000]Accuracy-Flip: 0.98500+-0.00488
Training: 2022-04-11 16:17:50,831-[cfp_fp][94000]Accuracy-Highest: 0.98500
Training: 2022-04-11 16:18:17,527-[agedb_30][94000]XNorm: 22.452351
Training: 2022-04-11 16:18:17,528-[agedb_30][94000]Accuracy-Flip: 0.98150+-0.00621
Training: 2022-04-11 16:18:17,528-[agedb_30][94000]Accuracy-Highest: 0.98350
Training: 2022-04-11 16:18:19,378-Speed 119.63 samples/sec   Loss 1.6988   LearningRate 0.0005   Epoch: 18   Global Step: 94010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:21,194-Speed 5639.84 samples/sec   Loss 1.7415   LearningRate 0.0005   Epoch: 18   Global Step: 94020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:23,027-Speed 5587.25 samples/sec   Loss 1.7727   LearningRate 0.0005   Epoch: 18   Global Step: 94030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:24,889-Speed 5500.98 samples/sec   Loss 1.7324   LearningRate 0.0005   Epoch: 18   Global Step: 94040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:26,745-Speed 5520.01 samples/sec   Loss 1.7772   LearningRate 0.0005   Epoch: 18   Global Step: 94050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:28,574-Speed 5601.70 samples/sec   Loss 1.7370   LearningRate 0.0005   Epoch: 18   Global Step: 94060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:30,413-Speed 5570.31 samples/sec   Loss 1.6955   LearningRate 0.0005   Epoch: 18   Global Step: 94070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:32,250-Speed 5573.21 samples/sec   Loss 1.7015   LearningRate 0.0005   Epoch: 18   Global Step: 94080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:34,079-Speed 5601.81 samples/sec   Loss 1.7874   LearningRate 0.0005   Epoch: 18   Global Step: 94090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:35,907-Speed 5603.35 samples/sec   Loss 1.7043   LearningRate 0.0005   Epoch: 18   Global Step: 94100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:37,738-Speed 5595.54 samples/sec   Loss 1.8611   LearningRate 0.0005   Epoch: 18   Global Step: 94110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:39,584-Speed 5548.71 samples/sec   Loss 1.7610   LearningRate 0.0005   Epoch: 18   Global Step: 94120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:18:41,420-Speed 5580.04 samples/sec   Loss 1.7794   LearningRate 0.0005   Epoch: 18   Global Step: 94130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:43,260-Speed 5567.54 samples/sec   Loss 1.6832   LearningRate 0.0005   Epoch: 18   Global Step: 94140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:45,097-Speed 5575.45 samples/sec   Loss 1.7333   LearningRate 0.0005   Epoch: 18   Global Step: 94150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:46,934-Speed 5577.44 samples/sec   Loss 1.6918   LearningRate 0.0005   Epoch: 18   Global Step: 94160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:48,768-Speed 5587.34 samples/sec   Loss 1.7229   LearningRate 0.0005   Epoch: 18   Global Step: 94170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:50,600-Speed 5590.76 samples/sec   Loss 1.7508   LearningRate 0.0005   Epoch: 18   Global Step: 94180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:52,436-Speed 5579.72 samples/sec   Loss 1.6791   LearningRate 0.0005   Epoch: 18   Global Step: 94190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:54,289-Speed 5528.69 samples/sec   Loss 1.7028   LearningRate 0.0005   Epoch: 18   Global Step: 94200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:56,134-Speed 5551.45 samples/sec   Loss 1.7343   LearningRate 0.0005   Epoch: 18   Global Step: 94210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:57,970-Speed 5579.89 samples/sec   Loss 1.7830   LearningRate 0.0005   Epoch: 18   Global Step: 94220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:18:59,825-Speed 5522.46 samples/sec   Loss 1.7176   LearningRate 0.0005   Epoch: 18   Global Step: 94230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:19:01,680-Speed 5522.83 samples/sec   Loss 1.6571   LearningRate 0.0005   Epoch: 18   Global Step: 94240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:03,533-Speed 5527.55 samples/sec   Loss 1.6836   LearningRate 0.0005   Epoch: 18   Global Step: 94250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:05,381-Speed 5543.46 samples/sec   Loss 1.7327   LearningRate 0.0005   Epoch: 18   Global Step: 94260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:07,221-Speed 5566.97 samples/sec   Loss 1.8212   LearningRate 0.0005   Epoch: 18   Global Step: 94270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:09,064-Speed 5559.81 samples/sec   Loss 1.7194   LearningRate 0.0005   Epoch: 18   Global Step: 94280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:10,918-Speed 5525.57 samples/sec   Loss 1.8078   LearningRate 0.0005   Epoch: 18   Global Step: 94290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:12,773-Speed 5520.69 samples/sec   Loss 1.6501   LearningRate 0.0005   Epoch: 18   Global Step: 94300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:14,650-Speed 5459.01 samples/sec   Loss 1.6408   LearningRate 0.0005   Epoch: 18   Global Step: 94310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:16,558-Speed 5369.08 samples/sec   Loss 1.6951   LearningRate 0.0005   Epoch: 18   Global Step: 94320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:18,410-Speed 5531.39 samples/sec   Loss 1.6827   LearningRate 0.0005   Epoch: 18   Global Step: 94330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:20,256-Speed 5546.48 samples/sec   Loss 1.6938   LearningRate 0.0005   Epoch: 18   Global Step: 94340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:19:22,103-Speed 5547.99 samples/sec   Loss 1.7884   LearningRate 0.0005   Epoch: 18   Global Step: 94350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:19:23,933-Speed 5598.75 samples/sec   Loss 1.7948   LearningRate 0.0005   Epoch: 18   Global Step: 94360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:25,784-Speed 5532.99 samples/sec   Loss 1.7238   LearningRate 0.0005   Epoch: 18   Global Step: 94370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:27,667-Speed 5441.66 samples/sec   Loss 1.7015   LearningRate 0.0004   Epoch: 18   Global Step: 94380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:29,512-Speed 5551.61 samples/sec   Loss 1.7921   LearningRate 0.0004   Epoch: 18   Global Step: 94390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:31,353-Speed 5564.29 samples/sec   Loss 1.6675   LearningRate 0.0004   Epoch: 18   Global Step: 94400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:33,199-Speed 5549.21 samples/sec   Loss 1.6353   LearningRate 0.0004   Epoch: 18   Global Step: 94410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:35,041-Speed 5561.66 samples/sec   Loss 1.7226   LearningRate 0.0004   Epoch: 18   Global Step: 94420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:36,888-Speed 5543.87 samples/sec   Loss 1.7278   LearningRate 0.0004   Epoch: 18   Global Step: 94430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:38,751-Speed 5501.92 samples/sec   Loss 1.7202   LearningRate 0.0004   Epoch: 18   Global Step: 94440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:40,598-Speed 5544.67 samples/sec   Loss 1.6014   LearningRate 0.0004   Epoch: 18   Global Step: 94450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:42,441-Speed 5557.48 samples/sec   Loss 1.7249   LearningRate 0.0004   Epoch: 18   Global Step: 94460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:19:44,281-Speed 5568.81 samples/sec   Loss 1.6782   LearningRate 0.0004   Epoch: 18   Global Step: 94470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:19:46,121-Speed 5568.57 samples/sec   Loss 1.6951   LearningRate 0.0004   Epoch: 18   Global Step: 94480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:47,965-Speed 5554.99 samples/sec   Loss 1.7873   LearningRate 0.0004   Epoch: 18   Global Step: 94490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:49,831-Speed 5488.73 samples/sec   Loss 1.6216   LearningRate 0.0004   Epoch: 18   Global Step: 94500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:51,688-Speed 5516.91 samples/sec   Loss 1.6980   LearningRate 0.0004   Epoch: 18   Global Step: 94510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:53,535-Speed 5545.97 samples/sec   Loss 1.7935   LearningRate 0.0004   Epoch: 18   Global Step: 94520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:55,375-Speed 5565.88 samples/sec   Loss 1.7529   LearningRate 0.0004   Epoch: 18   Global Step: 94530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:57,220-Speed 5554.09 samples/sec   Loss 1.6298   LearningRate 0.0004   Epoch: 18   Global Step: 94540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:19:59,056-Speed 5580.11 samples/sec   Loss 1.7852   LearningRate 0.0004   Epoch: 18   Global Step: 94550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:00,905-Speed 5539.72 samples/sec   Loss 1.7238   LearningRate 0.0004   Epoch: 18   Global Step: 94560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:02,741-Speed 5578.70 samples/sec   Loss 1.6833   LearningRate 0.0004   Epoch: 18   Global Step: 94570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:04,576-Speed 5583.54 samples/sec   Loss 1.7033   LearningRate 0.0004   Epoch: 18   Global Step: 94580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:20:06,404-Speed 5604.79 samples/sec   Loss 1.6958   LearningRate 0.0004   Epoch: 18   Global Step: 94590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:08,239-Speed 5581.46 samples/sec   Loss 1.7923   LearningRate 0.0004   Epoch: 18   Global Step: 94600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:10,075-Speed 5580.76 samples/sec   Loss 1.7388   LearningRate 0.0004   Epoch: 18   Global Step: 94610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:11,933-Speed 5511.48 samples/sec   Loss 1.7252   LearningRate 0.0004   Epoch: 18   Global Step: 94620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:13,779-Speed 5551.15 samples/sec   Loss 1.6816   LearningRate 0.0004   Epoch: 18   Global Step: 94630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:15,613-Speed 5583.72 samples/sec   Loss 1.7293   LearningRate 0.0004   Epoch: 18   Global Step: 94640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:17,461-Speed 5544.90 samples/sec   Loss 1.7094   LearningRate 0.0004   Epoch: 18   Global Step: 94650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:19,301-Speed 5566.84 samples/sec   Loss 1.6402   LearningRate 0.0004   Epoch: 18   Global Step: 94660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:21,153-Speed 5531.93 samples/sec   Loss 1.7001   LearningRate 0.0004   Epoch: 18   Global Step: 94670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:22,993-Speed 5569.10 samples/sec   Loss 1.7385   LearningRate 0.0004   Epoch: 18   Global Step: 94680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:24,836-Speed 5556.52 samples/sec   Loss 1.7435   LearningRate 0.0004   Epoch: 18   Global Step: 94690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:20:26,661-Speed 5613.70 samples/sec   Loss 1.6379   LearningRate 0.0004   Epoch: 18   Global Step: 94700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:28,492-Speed 5595.90 samples/sec   Loss 1.6982   LearningRate 0.0004   Epoch: 18   Global Step: 94710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:20:30,349-Speed 5515.02 samples/sec   Loss 1.7244   LearningRate 0.0004   Epoch: 18   Global Step: 94720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:20:32,188-Speed 5568.84 samples/sec   Loss 1.7365   LearningRate 0.0004   Epoch: 18   Global Step: 94730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:20:34,031-Speed 5560.92 samples/sec   Loss 1.7122   LearningRate 0.0004   Epoch: 18   Global Step: 94740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:20:35,877-Speed 5548.00 samples/sec   Loss 1.6970   LearningRate 0.0004   Epoch: 18   Global Step: 94750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:20:37,726-Speed 5540.03 samples/sec   Loss 1.6495   LearningRate 0.0004   Epoch: 18   Global Step: 94760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:20:39,563-Speed 5576.03 samples/sec   Loss 1.6980   LearningRate 0.0004   Epoch: 18   Global Step: 94770   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:20:41,420-Speed 5517.09 samples/sec   Loss 1.7530   LearningRate 0.0004   Epoch: 18   Global Step: 94780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:20:43,260-Speed 5566.56 samples/sec   Loss 1.6593   LearningRate 0.0004   Epoch: 18   Global Step: 94790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:20:45,106-Speed 5549.06 samples/sec   Loss 1.6704   LearningRate 0.0004   Epoch: 18   Global Step: 94800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:20:46,941-Speed 5582.02 samples/sec   Loss 1.7047   LearningRate 0.0004   Epoch: 18   Global Step: 94810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:48,774-Speed 5591.12 samples/sec   Loss 1.8218   LearningRate 0.0004   Epoch: 18   Global Step: 94820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:50,610-Speed 5577.39 samples/sec   Loss 1.7318   LearningRate 0.0004   Epoch: 18   Global Step: 94830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:52,451-Speed 5563.95 samples/sec   Loss 1.6882   LearningRate 0.0004   Epoch: 18   Global Step: 94840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:54,298-Speed 5546.97 samples/sec   Loss 1.6572   LearningRate 0.0004   Epoch: 18   Global Step: 94850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:56,132-Speed 5586.27 samples/sec   Loss 1.6071   LearningRate 0.0004   Epoch: 18   Global Step: 94860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:57,969-Speed 5574.79 samples/sec   Loss 1.8373   LearningRate 0.0004   Epoch: 18   Global Step: 94870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:20:59,805-Speed 5581.19 samples/sec   Loss 1.6901   LearningRate 0.0004   Epoch: 18   Global Step: 94880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:01,654-Speed 5539.52 samples/sec   Loss 1.6510   LearningRate 0.0004   Epoch: 18   Global Step: 94890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:21:03,511-Speed 5517.76 samples/sec   Loss 1.7170   LearningRate 0.0004   Epoch: 18   Global Step: 94900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:21:05,358-Speed 5546.29 samples/sec   Loss 1.7176   LearningRate 0.0004   Epoch: 18   Global Step: 94910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:21:07,200-Speed 5563.54 samples/sec   Loss 1.7559   LearningRate 0.0004   Epoch: 18   Global Step: 94920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:21:09,036-Speed 5578.76 samples/sec   Loss 1.6673   LearningRate 0.0004   Epoch: 18   Global Step: 94930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:21:10,874-Speed 5573.44 samples/sec   Loss 1.6362   LearningRate 0.0004   Epoch: 18   Global Step: 94940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:21:12,722-Speed 5542.28 samples/sec   Loss 1.6828   LearningRate 0.0004   Epoch: 18   Global Step: 94950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:21:14,613-Speed 5418.48 samples/sec   Loss 1.7246   LearningRate 0.0004   Epoch: 18   Global Step: 94960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:21:16,453-Speed 5566.91 samples/sec   Loss 1.6546   LearningRate 0.0004   Epoch: 18   Global Step: 94970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:21:18,322-Speed 5480.11 samples/sec   Loss 1.6897   LearningRate 0.0004   Epoch: 18   Global Step: 94980   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:21:20,182-Speed 5508.42 samples/sec   Loss 1.7376   LearningRate 0.0004   Epoch: 18   Global Step: 94990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:22,021-Speed 5568.21 samples/sec   Loss 1.6309   LearningRate 0.0004   Epoch: 18   Global Step: 95000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:23,856-Speed 5584.25 samples/sec   Loss 1.7191   LearningRate 0.0004   Epoch: 18   Global Step: 95010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:25,735-Speed 5452.24 samples/sec   Loss 1.6312   LearningRate 0.0004   Epoch: 18   Global Step: 95020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:27,576-Speed 5563.67 samples/sec   Loss 1.7187   LearningRate 0.0004   Epoch: 18   Global Step: 95030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:29,410-Speed 5586.06 samples/sec   Loss 1.7195   LearningRate 0.0004   Epoch: 18   Global Step: 95040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:31,244-Speed 5583.91 samples/sec   Loss 1.6956   LearningRate 0.0004   Epoch: 18   Global Step: 95050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:33,081-Speed 5578.10 samples/sec   Loss 1.7373   LearningRate 0.0004   Epoch: 18   Global Step: 95060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:34,918-Speed 5575.11 samples/sec   Loss 1.6628   LearningRate 0.0004   Epoch: 18   Global Step: 95070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:36,773-Speed 5523.03 samples/sec   Loss 1.8141   LearningRate 0.0004   Epoch: 18   Global Step: 95080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:38,610-Speed 5578.10 samples/sec   Loss 1.7567   LearningRate 0.0004   Epoch: 18   Global Step: 95090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:40,452-Speed 5559.23 samples/sec   Loss 1.6539   LearningRate 0.0004   Epoch: 18   Global Step: 95100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:42,304-Speed 5530.44 samples/sec   Loss 1.8315   LearningRate 0.0004   Epoch: 18   Global Step: 95110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:44,150-Speed 5550.90 samples/sec   Loss 1.7952   LearningRate 0.0004   Epoch: 18   Global Step: 95120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:45,982-Speed 5591.59 samples/sec   Loss 1.7584   LearningRate 0.0004   Epoch: 18   Global Step: 95130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:47,831-Speed 5542.54 samples/sec   Loss 1.7148   LearningRate 0.0004   Epoch: 18   Global Step: 95140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:49,704-Speed 5466.41 samples/sec   Loss 1.6926   LearningRate 0.0004   Epoch: 18   Global Step: 95150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:51,567-Speed 5499.72 samples/sec   Loss 1.8588   LearningRate 0.0004   Epoch: 18   Global Step: 95160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:53,406-Speed 5570.28 samples/sec   Loss 1.8265   LearningRate 0.0004   Epoch: 18   Global Step: 95170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:55,243-Speed 5577.85 samples/sec   Loss 1.7059   LearningRate 0.0003   Epoch: 18   Global Step: 95180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:21:57,077-Speed 5583.13 samples/sec   Loss 1.7451   LearningRate 0.0003   Epoch: 18   Global Step: 95190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:21:58,920-Speed 5559.09 samples/sec   Loss 1.7815   LearningRate 0.0003   Epoch: 18   Global Step: 95200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:22:00,751-Speed 5594.96 samples/sec   Loss 1.8254   LearningRate 0.0003   Epoch: 18   Global Step: 95210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:02,608-Speed 5514.65 samples/sec   Loss 1.6443   LearningRate 0.0003   Epoch: 18   Global Step: 95220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:04,453-Speed 5552.99 samples/sec   Loss 1.7911   LearningRate 0.0003   Epoch: 18   Global Step: 95230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:06,295-Speed 5563.82 samples/sec   Loss 1.7031   LearningRate 0.0003   Epoch: 18   Global Step: 95240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:22:08,132-Speed 5576.34 samples/sec   Loss 1.6915   LearningRate 0.0003   Epoch: 18   Global Step: 95250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:22:09,967-Speed 5582.50 samples/sec   Loss 1.6708   LearningRate 0.0003   Epoch: 18   Global Step: 95260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:22:11,806-Speed 5568.69 samples/sec   Loss 1.7522   LearningRate 0.0003   Epoch: 18   Global Step: 95270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:22:13,686-Speed 5449.84 samples/sec   Loss 1.6990   LearningRate 0.0003   Epoch: 18   Global Step: 95280   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:22:15,547-Speed 5503.34 samples/sec   Loss 1.7476   LearningRate 0.0003   Epoch: 18   Global Step: 95290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:22:17,420-Speed 5470.51 samples/sec   Loss 1.6889   LearningRate 0.0003   Epoch: 18   Global Step: 95300   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:22:19,254-Speed 5585.40 samples/sec   Loss 1.6613   LearningRate 0.0003   Epoch: 18   Global Step: 95310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:22:21,088-Speed 5584.23 samples/sec   Loss 1.6118   LearningRate 0.0003   Epoch: 18   Global Step: 95320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:22:22,935-Speed 5547.97 samples/sec   Loss 1.6892   LearningRate 0.0003   Epoch: 18   Global Step: 95330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:22:24,777-Speed 5560.31 samples/sec   Loss 1.8240   LearningRate 0.0003   Epoch: 18   Global Step: 95340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:26,617-Speed 5566.67 samples/sec   Loss 1.6743   LearningRate 0.0003   Epoch: 18   Global Step: 95350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:28,455-Speed 5574.03 samples/sec   Loss 1.7499   LearningRate 0.0003   Epoch: 18   Global Step: 95360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:30,296-Speed 5565.51 samples/sec   Loss 1.6288   LearningRate 0.0003   Epoch: 18   Global Step: 95370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:32,142-Speed 5547.65 samples/sec   Loss 1.7236   LearningRate 0.0003   Epoch: 18   Global Step: 95380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:33,983-Speed 5574.40 samples/sec   Loss 1.6928   LearningRate 0.0003   Epoch: 18   Global Step: 95390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:35,815-Speed 5591.51 samples/sec   Loss 1.6367   LearningRate 0.0003   Epoch: 18   Global Step: 95400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:37,666-Speed 5533.00 samples/sec   Loss 1.7567   LearningRate 0.0003   Epoch: 18   Global Step: 95410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:39,510-Speed 5554.40 samples/sec   Loss 1.7021   LearningRate 0.0003   Epoch: 18   Global Step: 95420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:41,351-Speed 5564.81 samples/sec   Loss 1.7064   LearningRate 0.0003   Epoch: 18   Global Step: 95430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:43,209-Speed 5514.12 samples/sec   Loss 1.7556   LearningRate 0.0003   Epoch: 18   Global Step: 95440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:22:45,037-Speed 5603.23 samples/sec   Loss 1.8782   LearningRate 0.0003   Epoch: 18   Global Step: 95450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:46,875-Speed 5573.04 samples/sec   Loss 1.7410   LearningRate 0.0003   Epoch: 18   Global Step: 95460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:48,724-Speed 5542.10 samples/sec   Loss 1.7313   LearningRate 0.0003   Epoch: 18   Global Step: 95470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:50,558-Speed 5584.61 samples/sec   Loss 1.7172   LearningRate 0.0003   Epoch: 18   Global Step: 95480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:52,398-Speed 5569.27 samples/sec   Loss 1.7580   LearningRate 0.0003   Epoch: 18   Global Step: 95490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:54,239-Speed 5562.72 samples/sec   Loss 1.7766   LearningRate 0.0003   Epoch: 18   Global Step: 95500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:56,089-Speed 5538.07 samples/sec   Loss 1.7145   LearningRate 0.0003   Epoch: 18   Global Step: 95510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:57,927-Speed 5573.16 samples/sec   Loss 1.7146   LearningRate 0.0003   Epoch: 18   Global Step: 95520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:22:59,774-Speed 5545.38 samples/sec   Loss 1.7226   LearningRate 0.0003   Epoch: 18   Global Step: 95530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:01,625-Speed 5536.02 samples/sec   Loss 1.8193   LearningRate 0.0003   Epoch: 18   Global Step: 95540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:03,495-Speed 5478.17 samples/sec   Loss 1.5902   LearningRate 0.0003   Epoch: 18   Global Step: 95550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:23:05,347-Speed 5531.66 samples/sec   Loss 1.6633   LearningRate 0.0003   Epoch: 18   Global Step: 95560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:23:07,184-Speed 5575.31 samples/sec   Loss 1.7138   LearningRate 0.0003   Epoch: 18   Global Step: 95570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:09,022-Speed 5573.37 samples/sec   Loss 1.6554   LearningRate 0.0003   Epoch: 18   Global Step: 95580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:10,869-Speed 5546.84 samples/sec   Loss 1.7846   LearningRate 0.0003   Epoch: 18   Global Step: 95590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:12,738-Speed 5482.16 samples/sec   Loss 1.6664   LearningRate 0.0003   Epoch: 18   Global Step: 95600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:14,598-Speed 5507.72 samples/sec   Loss 1.7221   LearningRate 0.0003   Epoch: 18   Global Step: 95610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:16,439-Speed 5563.03 samples/sec   Loss 1.6306   LearningRate 0.0003   Epoch: 18   Global Step: 95620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:18,280-Speed 5564.11 samples/sec   Loss 1.5820   LearningRate 0.0003   Epoch: 18   Global Step: 95630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:20,115-Speed 5582.99 samples/sec   Loss 1.7859   LearningRate 0.0003   Epoch: 18   Global Step: 95640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:21,956-Speed 5563.98 samples/sec   Loss 1.7794   LearningRate 0.0003   Epoch: 18   Global Step: 95650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:23,803-Speed 5546.57 samples/sec   Loss 1.6530   LearningRate 0.0003   Epoch: 18   Global Step: 95660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:25,694-Speed 5419.11 samples/sec   Loss 1.6828   LearningRate 0.0003   Epoch: 18   Global Step: 95670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:27,537-Speed 5555.11 samples/sec   Loss 1.7090   LearningRate 0.0003   Epoch: 18   Global Step: 95680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:29,383-Speed 5550.28 samples/sec   Loss 1.7249   LearningRate 0.0003   Epoch: 18   Global Step: 95690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:31,246-Speed 5497.81 samples/sec   Loss 1.6951   LearningRate 0.0003   Epoch: 18   Global Step: 95700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:33,097-Speed 5534.72 samples/sec   Loss 1.8077   LearningRate 0.0003   Epoch: 18   Global Step: 95710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:34,943-Speed 5550.00 samples/sec   Loss 1.6792   LearningRate 0.0003   Epoch: 18   Global Step: 95720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:36,784-Speed 5563.78 samples/sec   Loss 1.6317   LearningRate 0.0003   Epoch: 18   Global Step: 95730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:38,629-Speed 5554.76 samples/sec   Loss 1.7649   LearningRate 0.0003   Epoch: 18   Global Step: 95740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:40,470-Speed 5563.91 samples/sec   Loss 1.8371   LearningRate 0.0003   Epoch: 18   Global Step: 95750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:42,326-Speed 5518.46 samples/sec   Loss 1.6847   LearningRate 0.0003   Epoch: 18   Global Step: 95760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:23:44,161-Speed 5583.59 samples/sec   Loss 1.7723   LearningRate 0.0003   Epoch: 18   Global Step: 95770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:23:45,997-Speed 5580.31 samples/sec   Loss 1.6993   LearningRate 0.0003   Epoch: 18   Global Step: 95780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:23:47,837-Speed 5566.65 samples/sec   Loss 1.7067   LearningRate 0.0003   Epoch: 18   Global Step: 95790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:23:49,663-Speed 5608.12 samples/sec   Loss 1.7339   LearningRate 0.0003   Epoch: 18   Global Step: 95800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:23:51,522-Speed 5511.21 samples/sec   Loss 1.7682   LearningRate 0.0003   Epoch: 18   Global Step: 95810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:23:53,374-Speed 5531.92 samples/sec   Loss 1.7474   LearningRate 0.0003   Epoch: 18   Global Step: 95820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:23:55,227-Speed 5527.51 samples/sec   Loss 1.7121   LearningRate 0.0003   Epoch: 18   Global Step: 95830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:23:57,063-Speed 5580.78 samples/sec   Loss 1.6532   LearningRate 0.0003   Epoch: 18   Global Step: 95840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:23:58,902-Speed 5568.68 samples/sec   Loss 1.6766   LearningRate 0.0003   Epoch: 18   Global Step: 95850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:24:00,742-Speed 5566.96 samples/sec   Loss 1.7047   LearningRate 0.0003   Epoch: 18   Global Step: 95860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:24:02,591-Speed 5540.23 samples/sec   Loss 1.7445   LearningRate 0.0003   Epoch: 18   Global Step: 95870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:24:04,435-Speed 5558.44 samples/sec   Loss 1.8043   LearningRate 0.0003   Epoch: 18   Global Step: 95880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:24:06,271-Speed 5577.80 samples/sec   Loss 1.7631   LearningRate 0.0003   Epoch: 18   Global Step: 95890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:24:08,105-Speed 5587.85 samples/sec   Loss 1.7372   LearningRate 0.0003   Epoch: 18   Global Step: 95900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:24:09,945-Speed 5564.48 samples/sec   Loss 1.7694   LearningRate 0.0003   Epoch: 18   Global Step: 95910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:24:11,817-Speed 5471.69 samples/sec   Loss 1.7495   LearningRate 0.0003   Epoch: 18   Global Step: 95920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:24:13,689-Speed 5472.67 samples/sec   Loss 1.6386   LearningRate 0.0003   Epoch: 18   Global Step: 95930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:24:15,556-Speed 5487.66 samples/sec   Loss 1.8113   LearningRate 0.0003   Epoch: 18   Global Step: 95940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:24:17,422-Speed 5488.19 samples/sec   Loss 1.7306   LearningRate 0.0003   Epoch: 18   Global Step: 95950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:24:19,272-Speed 5539.63 samples/sec   Loss 1.7387   LearningRate 0.0003   Epoch: 18   Global Step: 95960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:24:21,113-Speed 5562.49 samples/sec   Loss 1.7172   LearningRate 0.0003   Epoch: 18   Global Step: 95970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:24:22,967-Speed 5528.12 samples/sec   Loss 1.7220   LearningRate 0.0003   Epoch: 18   Global Step: 95980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:24:24,816-Speed 5539.02 samples/sec   Loss 1.7026   LearningRate 0.0003   Epoch: 18   Global Step: 95990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:24:26,660-Speed 5555.51 samples/sec   Loss 1.6638   LearningRate 0.0003   Epoch: 18   Global Step: 96000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:24:53,347-[lfw][96000]XNorm: 22.436321
Training: 2022-04-11 16:24:53,347-[lfw][96000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 16:24:53,348-[lfw][96000]Accuracy-Highest: 0.99817
Training: 2022-04-11 16:25:24,198-[cfp_fp][96000]XNorm: 21.560779
Training: 2022-04-11 16:25:24,199-[cfp_fp][96000]Accuracy-Flip: 0.98386+-0.00593
Training: 2022-04-11 16:25:24,199-[cfp_fp][96000]Accuracy-Highest: 0.98500
Training: 2022-04-11 16:25:50,762-[agedb_30][96000]XNorm: 22.637707
Training: 2022-04-11 16:25:50,763-[agedb_30][96000]Accuracy-Flip: 0.98417+-0.00647
Training: 2022-04-11 16:25:50,763-[agedb_30][96000]Accuracy-Highest: 0.98417
Training: 2022-04-11 16:25:52,630-Speed 119.11 samples/sec   Loss 1.7052   LearningRate 0.0003   Epoch: 18   Global Step: 96010   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:25:54,535-Speed 5376.59 samples/sec   Loss 1.6769   LearningRate 0.0003   Epoch: 18   Global Step: 96020   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:25:56,366-Speed 5596.60 samples/sec   Loss 1.6879   LearningRate 0.0003   Epoch: 18   Global Step: 96030   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:25:58,199-Speed 5588.01 samples/sec   Loss 1.7293   LearningRate 0.0003   Epoch: 18   Global Step: 96040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:26:00,037-Speed 5574.61 samples/sec   Loss 1.7952   LearningRate 0.0003   Epoch: 18   Global Step: 96050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:26:01,873-Speed 5577.50 samples/sec   Loss 1.7215   LearningRate 0.0003   Epoch: 18   Global Step: 96060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:26:03,706-Speed 5590.05 samples/sec   Loss 1.6452   LearningRate 0.0003   Epoch: 18   Global Step: 96070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:26:05,544-Speed 5572.37 samples/sec   Loss 1.6685   LearningRate 0.0003   Epoch: 18   Global Step: 96080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:26:07,381-Speed 5576.70 samples/sec   Loss 1.7190   LearningRate 0.0003   Epoch: 18   Global Step: 96090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:09,272-Speed 5418.12 samples/sec   Loss 1.6523   LearningRate 0.0003   Epoch: 18   Global Step: 96100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:19,924-Speed 961.39 samples/sec   Loss 1.6288   LearningRate 0.0002   Epoch: 19   Global Step: 96110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:21,779-Speed 5523.72 samples/sec   Loss 1.5500   LearningRate 0.0002   Epoch: 19   Global Step: 96120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:23,614-Speed 5581.62 samples/sec   Loss 1.6171   LearningRate 0.0002   Epoch: 19   Global Step: 96130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:25,451-Speed 5575.15 samples/sec   Loss 1.5062   LearningRate 0.0002   Epoch: 19   Global Step: 96140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:27,548-Speed 4885.49 samples/sec   Loss 1.4894   LearningRate 0.0002   Epoch: 19   Global Step: 96150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:29,392-Speed 5554.56 samples/sec   Loss 1.5033   LearningRate 0.0002   Epoch: 19   Global Step: 96160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:31,233-Speed 5564.29 samples/sec   Loss 1.5522   LearningRate 0.0002   Epoch: 19   Global Step: 96170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:33,083-Speed 5536.52 samples/sec   Loss 1.5919   LearningRate 0.0002   Epoch: 19   Global Step: 96180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:34,931-Speed 5543.26 samples/sec   Loss 1.5657   LearningRate 0.0002   Epoch: 19   Global Step: 96190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:26:36,766-Speed 5583.79 samples/sec   Loss 1.4484   LearningRate 0.0002   Epoch: 19   Global Step: 96200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:26:38,617-Speed 5533.38 samples/sec   Loss 1.4633   LearningRate 0.0002   Epoch: 19   Global Step: 96210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:26:40,481-Speed 5495.59 samples/sec   Loss 1.5142   LearningRate 0.0002   Epoch: 19   Global Step: 96220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:42,329-Speed 5542.81 samples/sec   Loss 1.4413   LearningRate 0.0002   Epoch: 19   Global Step: 96230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:44,183-Speed 5525.56 samples/sec   Loss 1.4122   LearningRate 0.0002   Epoch: 19   Global Step: 96240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:46,024-Speed 5564.43 samples/sec   Loss 1.5488   LearningRate 0.0002   Epoch: 19   Global Step: 96250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:47,858-Speed 5585.82 samples/sec   Loss 1.5331   LearningRate 0.0002   Epoch: 19   Global Step: 96260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:49,696-Speed 5573.34 samples/sec   Loss 1.5835   LearningRate 0.0002   Epoch: 19   Global Step: 96270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:51,556-Speed 5506.05 samples/sec   Loss 1.5863   LearningRate 0.0002   Epoch: 19   Global Step: 96280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:53,400-Speed 5556.70 samples/sec   Loss 1.4578   LearningRate 0.0002   Epoch: 19   Global Step: 96290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:55,241-Speed 5561.91 samples/sec   Loss 1.5130   LearningRate 0.0002   Epoch: 19   Global Step: 96300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:57,074-Speed 5592.56 samples/sec   Loss 1.5679   LearningRate 0.0002   Epoch: 19   Global Step: 96310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:26:58,911-Speed 5574.78 samples/sec   Loss 1.5646   LearningRate 0.0002   Epoch: 19   Global Step: 96320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:27:00,763-Speed 5532.87 samples/sec   Loss 1.5487   LearningRate 0.0002   Epoch: 19   Global Step: 96330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:27:02,600-Speed 5574.17 samples/sec   Loss 1.5059   LearningRate 0.0002   Epoch: 19   Global Step: 96340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:27:04,442-Speed 5561.73 samples/sec   Loss 1.5336   LearningRate 0.0002   Epoch: 19   Global Step: 96350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:06,304-Speed 5503.68 samples/sec   Loss 1.5271   LearningRate 0.0002   Epoch: 19   Global Step: 96360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:08,148-Speed 5553.99 samples/sec   Loss 1.5063   LearningRate 0.0002   Epoch: 19   Global Step: 96370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:10,011-Speed 5499.16 samples/sec   Loss 1.4885   LearningRate 0.0002   Epoch: 19   Global Step: 96380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:11,898-Speed 5428.25 samples/sec   Loss 1.4566   LearningRate 0.0002   Epoch: 19   Global Step: 96390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:13,762-Speed 5495.23 samples/sec   Loss 1.6043   LearningRate 0.0002   Epoch: 19   Global Step: 96400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:15,600-Speed 5572.04 samples/sec   Loss 1.5409   LearningRate 0.0002   Epoch: 19   Global Step: 96410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:17,471-Speed 5477.54 samples/sec   Loss 1.5653   LearningRate 0.0002   Epoch: 19   Global Step: 96420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:19,303-Speed 5591.64 samples/sec   Loss 1.5728   LearningRate 0.0002   Epoch: 19   Global Step: 96430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:21,140-Speed 5576.39 samples/sec   Loss 1.4928   LearningRate 0.0002   Epoch: 19   Global Step: 96440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:22,968-Speed 5603.79 samples/sec   Loss 1.4695   LearningRate 0.0002   Epoch: 19   Global Step: 96450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:24,804-Speed 5577.68 samples/sec   Loss 1.4448   LearningRate 0.0002   Epoch: 19   Global Step: 96460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:26,660-Speed 5521.83 samples/sec   Loss 1.5919   LearningRate 0.0002   Epoch: 19   Global Step: 96470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:28,499-Speed 5568.24 samples/sec   Loss 1.5885   LearningRate 0.0002   Epoch: 19   Global Step: 96480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:30,332-Speed 5590.23 samples/sec   Loss 1.4721   LearningRate 0.0002   Epoch: 19   Global Step: 96490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:32,172-Speed 5567.46 samples/sec   Loss 1.5424   LearningRate 0.0002   Epoch: 19   Global Step: 96500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:34,007-Speed 5580.64 samples/sec   Loss 1.5276   LearningRate 0.0002   Epoch: 19   Global Step: 96510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:35,853-Speed 5550.44 samples/sec   Loss 1.4780   LearningRate 0.0002   Epoch: 19   Global Step: 96520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:37,709-Speed 5518.19 samples/sec   Loss 1.6754   LearningRate 0.0002   Epoch: 19   Global Step: 96530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:39,542-Speed 5591.03 samples/sec   Loss 1.4864   LearningRate 0.0002   Epoch: 19   Global Step: 96540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:41,374-Speed 5590.50 samples/sec   Loss 1.6298   LearningRate 0.0002   Epoch: 19   Global Step: 96550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:27:43,206-Speed 5591.75 samples/sec   Loss 1.6660   LearningRate 0.0002   Epoch: 19   Global Step: 96560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:27:45,038-Speed 5590.64 samples/sec   Loss 1.5445   LearningRate 0.0002   Epoch: 19   Global Step: 96570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:27:46,879-Speed 5565.47 samples/sec   Loss 1.5003   LearningRate 0.0002   Epoch: 19   Global Step: 96580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:27:48,709-Speed 5597.29 samples/sec   Loss 1.5549   LearningRate 0.0002   Epoch: 19   Global Step: 96590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:50,544-Speed 5582.13 samples/sec   Loss 1.4732   LearningRate 0.0002   Epoch: 19   Global Step: 96600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:52,401-Speed 5518.93 samples/sec   Loss 1.6774   LearningRate 0.0002   Epoch: 19   Global Step: 96610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:54,238-Speed 5574.88 samples/sec   Loss 1.5135   LearningRate 0.0002   Epoch: 19   Global Step: 96620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:56,068-Speed 5599.29 samples/sec   Loss 1.4696   LearningRate 0.0002   Epoch: 19   Global Step: 96630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:57,905-Speed 5575.91 samples/sec   Loss 1.5830   LearningRate 0.0002   Epoch: 19   Global Step: 96640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:27:59,742-Speed 5574.37 samples/sec   Loss 1.5306   LearningRate 0.0002   Epoch: 19   Global Step: 96650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:01,583-Speed 5564.90 samples/sec   Loss 1.5565   LearningRate 0.0002   Epoch: 19   Global Step: 96660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:03,431-Speed 5545.17 samples/sec   Loss 1.5637   LearningRate 0.0002   Epoch: 19   Global Step: 96670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:05,293-Speed 5501.90 samples/sec   Loss 1.5401   LearningRate 0.0002   Epoch: 19   Global Step: 96680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:07,125-Speed 5591.06 samples/sec   Loss 1.5558   LearningRate 0.0002   Epoch: 19   Global Step: 96690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:28:08,952-Speed 5606.48 samples/sec   Loss 1.4871   LearningRate 0.0002   Epoch: 19   Global Step: 96700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:10,793-Speed 5562.98 samples/sec   Loss 1.6032   LearningRate 0.0002   Epoch: 19   Global Step: 96710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:12,631-Speed 5574.98 samples/sec   Loss 1.5365   LearningRate 0.0002   Epoch: 19   Global Step: 96720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:14,467-Speed 5577.68 samples/sec   Loss 1.6064   LearningRate 0.0002   Epoch: 19   Global Step: 96730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:16,335-Speed 5485.33 samples/sec   Loss 1.5063   LearningRate 0.0002   Epoch: 19   Global Step: 96740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:18,166-Speed 5593.85 samples/sec   Loss 1.4849   LearningRate 0.0002   Epoch: 19   Global Step: 96750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:19,999-Speed 5587.84 samples/sec   Loss 1.6050   LearningRate 0.0002   Epoch: 19   Global Step: 96760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:21,842-Speed 5559.56 samples/sec   Loss 1.4584   LearningRate 0.0002   Epoch: 19   Global Step: 96770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:23,695-Speed 5530.28 samples/sec   Loss 1.6288   LearningRate 0.0002   Epoch: 19   Global Step: 96780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:25,529-Speed 5584.40 samples/sec   Loss 1.5922   LearningRate 0.0002   Epoch: 19   Global Step: 96790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:27,369-Speed 5567.25 samples/sec   Loss 1.5093   LearningRate 0.0002   Epoch: 19   Global Step: 96800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:29,200-Speed 5595.12 samples/sec   Loss 1.5577   LearningRate 0.0002   Epoch: 19   Global Step: 96810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:31,034-Speed 5586.77 samples/sec   Loss 1.5981   LearningRate 0.0002   Epoch: 19   Global Step: 96820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:32,888-Speed 5525.32 samples/sec   Loss 1.4584   LearningRate 0.0002   Epoch: 19   Global Step: 96830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:34,739-Speed 5531.46 samples/sec   Loss 1.5282   LearningRate 0.0002   Epoch: 19   Global Step: 96840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:36,577-Speed 5574.84 samples/sec   Loss 1.6049   LearningRate 0.0002   Epoch: 19   Global Step: 96850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:38,437-Speed 5507.26 samples/sec   Loss 1.5967   LearningRate 0.0002   Epoch: 19   Global Step: 96860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:40,277-Speed 5567.79 samples/sec   Loss 1.5644   LearningRate 0.0002   Epoch: 19   Global Step: 96870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:42,121-Speed 5553.59 samples/sec   Loss 1.4490   LearningRate 0.0002   Epoch: 19   Global Step: 96880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:43,964-Speed 5560.18 samples/sec   Loss 1.5956   LearningRate 0.0002   Epoch: 19   Global Step: 96890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:28:45,810-Speed 5547.50 samples/sec   Loss 1.5547   LearningRate 0.0002   Epoch: 19   Global Step: 96900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:28:47,646-Speed 5582.96 samples/sec   Loss 1.6425   LearningRate 0.0002   Epoch: 19   Global Step: 96910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:28:49,483-Speed 5575.51 samples/sec   Loss 1.5897   LearningRate 0.0002   Epoch: 19   Global Step: 96920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:28:51,328-Speed 5551.92 samples/sec   Loss 1.6153   LearningRate 0.0002   Epoch: 19   Global Step: 96930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:28:53,163-Speed 5581.04 samples/sec   Loss 1.5716   LearningRate 0.0002   Epoch: 19   Global Step: 96940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:28:54,997-Speed 5585.52 samples/sec   Loss 1.4954   LearningRate 0.0002   Epoch: 19   Global Step: 96950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:28:56,831-Speed 5586.14 samples/sec   Loss 1.5248   LearningRate 0.0002   Epoch: 19   Global Step: 96960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:28:58,681-Speed 5538.56 samples/sec   Loss 1.5471   LearningRate 0.0002   Epoch: 19   Global Step: 96970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:29:00,530-Speed 5539.67 samples/sec   Loss 1.4920   LearningRate 0.0002   Epoch: 19   Global Step: 96980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:29:02,371-Speed 5562.77 samples/sec   Loss 1.5504   LearningRate 0.0002   Epoch: 19   Global Step: 96990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:29:04,214-Speed 5558.43 samples/sec   Loss 1.5899   LearningRate 0.0002   Epoch: 19   Global Step: 97000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:29:06,055-Speed 5564.75 samples/sec   Loss 1.6381   LearningRate 0.0002   Epoch: 19   Global Step: 97010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:29:07,889-Speed 5586.22 samples/sec   Loss 1.5267   LearningRate 0.0002   Epoch: 19   Global Step: 97020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:29:09,714-Speed 5614.57 samples/sec   Loss 1.6197   LearningRate 0.0002   Epoch: 19   Global Step: 97030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:11,551-Speed 5577.39 samples/sec   Loss 1.5780   LearningRate 0.0002   Epoch: 19   Global Step: 97040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:13,389-Speed 5570.54 samples/sec   Loss 1.5218   LearningRate 0.0002   Epoch: 19   Global Step: 97050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:15,229-Speed 5567.35 samples/sec   Loss 1.4878   LearningRate 0.0002   Epoch: 19   Global Step: 97060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:17,070-Speed 5565.96 samples/sec   Loss 1.6445   LearningRate 0.0002   Epoch: 19   Global Step: 97070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:18,904-Speed 5585.23 samples/sec   Loss 1.5782   LearningRate 0.0002   Epoch: 19   Global Step: 97080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:20,736-Speed 5589.33 samples/sec   Loss 1.6008   LearningRate 0.0002   Epoch: 19   Global Step: 97090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:22,567-Speed 5595.08 samples/sec   Loss 1.5073   LearningRate 0.0002   Epoch: 19   Global Step: 97100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:24,416-Speed 5542.29 samples/sec   Loss 1.6279   LearningRate 0.0002   Epoch: 19   Global Step: 97110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:26,253-Speed 5575.44 samples/sec   Loss 1.6264   LearningRate 0.0002   Epoch: 19   Global Step: 97120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:28,102-Speed 5540.93 samples/sec   Loss 1.6026   LearningRate 0.0002   Epoch: 19   Global Step: 97130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:29,940-Speed 5574.06 samples/sec   Loss 1.5924   LearningRate 0.0002   Epoch: 19   Global Step: 97140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:31,775-Speed 5587.57 samples/sec   Loss 1.5293   LearningRate 0.0002   Epoch: 19   Global Step: 97150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:33,614-Speed 5571.94 samples/sec   Loss 1.5793   LearningRate 0.0002   Epoch: 19   Global Step: 97160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:35,451-Speed 5576.34 samples/sec   Loss 1.5088   LearningRate 0.0002   Epoch: 19   Global Step: 97170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:37,285-Speed 5585.33 samples/sec   Loss 1.5292   LearningRate 0.0002   Epoch: 19   Global Step: 97180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:39,116-Speed 5593.00 samples/sec   Loss 1.6175   LearningRate 0.0002   Epoch: 19   Global Step: 97190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:40,957-Speed 5565.23 samples/sec   Loss 1.5385   LearningRate 0.0002   Epoch: 19   Global Step: 97200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:42,793-Speed 5578.80 samples/sec   Loss 1.5468   LearningRate 0.0002   Epoch: 19   Global Step: 97210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:44,636-Speed 5558.07 samples/sec   Loss 1.5630   LearningRate 0.0002   Epoch: 19   Global Step: 97220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:46,483-Speed 5547.12 samples/sec   Loss 1.5253   LearningRate 0.0002   Epoch: 19   Global Step: 97230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:29:48,339-Speed 5519.30 samples/sec   Loss 1.5211   LearningRate 0.0002   Epoch: 19   Global Step: 97240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:50,191-Speed 5533.09 samples/sec   Loss 1.4622   LearningRate 0.0001   Epoch: 19   Global Step: 97250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:52,092-Speed 5388.71 samples/sec   Loss 1.5588   LearningRate 0.0001   Epoch: 19   Global Step: 97260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:53,953-Speed 5502.59 samples/sec   Loss 1.5096   LearningRate 0.0001   Epoch: 19   Global Step: 97270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:55,793-Speed 5568.28 samples/sec   Loss 1.4815   LearningRate 0.0001   Epoch: 19   Global Step: 97280   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:57,633-Speed 5568.74 samples/sec   Loss 1.5897   LearningRate 0.0001   Epoch: 19   Global Step: 97290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:29:59,490-Speed 5514.26 samples/sec   Loss 1.5691   LearningRate 0.0001   Epoch: 19   Global Step: 97300   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:30:01,351-Speed 5506.82 samples/sec   Loss 1.5362   LearningRate 0.0001   Epoch: 19   Global Step: 97310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:30:03,205-Speed 5523.64 samples/sec   Loss 1.5433   LearningRate 0.0001   Epoch: 19   Global Step: 97320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:30:05,038-Speed 5588.48 samples/sec   Loss 1.6234   LearningRate 0.0001   Epoch: 19   Global Step: 97330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:30:06,873-Speed 5585.02 samples/sec   Loss 1.4902   LearningRate 0.0001   Epoch: 19   Global Step: 97340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:08,709-Speed 5579.51 samples/sec   Loss 1.6154   LearningRate 0.0001   Epoch: 19   Global Step: 97350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:10,549-Speed 5564.96 samples/sec   Loss 1.5567   LearningRate 0.0001   Epoch: 19   Global Step: 97360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:12,387-Speed 5574.92 samples/sec   Loss 1.5868   LearningRate 0.0001   Epoch: 19   Global Step: 97370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:14,228-Speed 5562.34 samples/sec   Loss 1.5264   LearningRate 0.0001   Epoch: 19   Global Step: 97380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:16,064-Speed 5581.15 samples/sec   Loss 1.4759   LearningRate 0.0001   Epoch: 19   Global Step: 97390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:17,904-Speed 5566.51 samples/sec   Loss 1.5689   LearningRate 0.0001   Epoch: 19   Global Step: 97400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:19,738-Speed 5586.84 samples/sec   Loss 1.5749   LearningRate 0.0001   Epoch: 19   Global Step: 97410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:21,573-Speed 5581.96 samples/sec   Loss 1.5145   LearningRate 0.0001   Epoch: 19   Global Step: 97420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:23,407-Speed 5586.72 samples/sec   Loss 1.5260   LearningRate 0.0001   Epoch: 19   Global Step: 97430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:25,254-Speed 5546.01 samples/sec   Loss 1.5888   LearningRate 0.0001   Epoch: 19   Global Step: 97440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:30:27,089-Speed 5584.14 samples/sec   Loss 1.5807   LearningRate 0.0001   Epoch: 19   Global Step: 97450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:28,923-Speed 5584.83 samples/sec   Loss 1.5078   LearningRate 0.0001   Epoch: 19   Global Step: 97460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:30,758-Speed 5580.28 samples/sec   Loss 1.5475   LearningRate 0.0001   Epoch: 19   Global Step: 97470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:32,593-Speed 5583.19 samples/sec   Loss 1.5042   LearningRate 0.0001   Epoch: 19   Global Step: 97480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:34,435-Speed 5562.22 samples/sec   Loss 1.4937   LearningRate 0.0001   Epoch: 19   Global Step: 97490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:36,278-Speed 5559.00 samples/sec   Loss 1.5787   LearningRate 0.0001   Epoch: 19   Global Step: 97500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:38,134-Speed 5517.26 samples/sec   Loss 1.4576   LearningRate 0.0001   Epoch: 19   Global Step: 97510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:39,971-Speed 5576.68 samples/sec   Loss 1.6353   LearningRate 0.0001   Epoch: 19   Global Step: 97520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:41,821-Speed 5537.55 samples/sec   Loss 1.5839   LearningRate 0.0001   Epoch: 19   Global Step: 97530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:43,657-Speed 5580.41 samples/sec   Loss 1.3945   LearningRate 0.0001   Epoch: 19   Global Step: 97540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:45,478-Speed 5623.94 samples/sec   Loss 1.5067   LearningRate 0.0001   Epoch: 19   Global Step: 97550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:47,314-Speed 5579.99 samples/sec   Loss 1.5166   LearningRate 0.0001   Epoch: 19   Global Step: 97560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:49,153-Speed 5572.18 samples/sec   Loss 1.5265   LearningRate 0.0001   Epoch: 19   Global Step: 97570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:50,990-Speed 5573.70 samples/sec   Loss 1.5782   LearningRate 0.0001   Epoch: 19   Global Step: 97580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:52,827-Speed 5578.19 samples/sec   Loss 1.5006   LearningRate 0.0001   Epoch: 19   Global Step: 97590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:54,666-Speed 5568.76 samples/sec   Loss 1.5237   LearningRate 0.0001   Epoch: 19   Global Step: 97600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:56,510-Speed 5558.30 samples/sec   Loss 1.5298   LearningRate 0.0001   Epoch: 19   Global Step: 97610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:30:58,342-Speed 5590.37 samples/sec   Loss 1.4784   LearningRate 0.0001   Epoch: 19   Global Step: 97620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:00,178-Speed 5578.17 samples/sec   Loss 1.5498   LearningRate 0.0001   Epoch: 19   Global Step: 97630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:02,045-Speed 5486.87 samples/sec   Loss 1.5968   LearningRate 0.0001   Epoch: 19   Global Step: 97640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:03,924-Speed 5451.71 samples/sec   Loss 1.5192   LearningRate 0.0001   Epoch: 19   Global Step: 97650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:31:05,764-Speed 5569.90 samples/sec   Loss 1.5549   LearningRate 0.0001   Epoch: 19   Global Step: 97660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:07,606-Speed 5561.72 samples/sec   Loss 1.5322   LearningRate 0.0001   Epoch: 19   Global Step: 97670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:09,439-Speed 5589.17 samples/sec   Loss 1.4695   LearningRate 0.0001   Epoch: 19   Global Step: 97680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:11,281-Speed 5559.92 samples/sec   Loss 1.5265   LearningRate 0.0001   Epoch: 19   Global Step: 97690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:13,143-Speed 5501.40 samples/sec   Loss 1.5530   LearningRate 0.0001   Epoch: 19   Global Step: 97700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:14,984-Speed 5562.90 samples/sec   Loss 1.5590   LearningRate 0.0001   Epoch: 19   Global Step: 97710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:16,830-Speed 5551.51 samples/sec   Loss 1.5534   LearningRate 0.0001   Epoch: 19   Global Step: 97720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:18,674-Speed 5554.60 samples/sec   Loss 1.5098   LearningRate 0.0001   Epoch: 19   Global Step: 97730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:20,507-Speed 5588.21 samples/sec   Loss 1.5702   LearningRate 0.0001   Epoch: 19   Global Step: 97740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:22,353-Speed 5548.39 samples/sec   Loss 1.5424   LearningRate 0.0001   Epoch: 19   Global Step: 97750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:24,192-Speed 5570.84 samples/sec   Loss 1.5917   LearningRate 0.0001   Epoch: 19   Global Step: 97760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:31:26,053-Speed 5503.95 samples/sec   Loss 1.6055   LearningRate 0.0001   Epoch: 19   Global Step: 97770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:27,882-Speed 5602.83 samples/sec   Loss 1.5538   LearningRate 0.0001   Epoch: 19   Global Step: 97780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:31:29,721-Speed 5568.43 samples/sec   Loss 1.5859   LearningRate 0.0001   Epoch: 19   Global Step: 97790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:31:31,556-Speed 5584.61 samples/sec   Loss 1.5049   LearningRate 0.0001   Epoch: 19   Global Step: 97800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:31:33,394-Speed 5572.68 samples/sec   Loss 1.4840   LearningRate 0.0001   Epoch: 19   Global Step: 97810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:31:35,231-Speed 5576.78 samples/sec   Loss 1.5672   LearningRate 0.0001   Epoch: 19   Global Step: 97820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:31:37,073-Speed 5562.64 samples/sec   Loss 1.5252   LearningRate 0.0001   Epoch: 19   Global Step: 97830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:31:38,914-Speed 5562.65 samples/sec   Loss 1.6160   LearningRate 0.0001   Epoch: 19   Global Step: 97840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:31:40,748-Speed 5586.10 samples/sec   Loss 1.6245   LearningRate 0.0001   Epoch: 19   Global Step: 97850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:31:42,591-Speed 5557.70 samples/sec   Loss 1.5544   LearningRate 0.0001   Epoch: 19   Global Step: 97860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:31:44,429-Speed 5573.71 samples/sec   Loss 1.5147   LearningRate 0.0001   Epoch: 19   Global Step: 97870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:31:46,287-Speed 5514.43 samples/sec   Loss 1.5191   LearningRate 0.0001   Epoch: 19   Global Step: 97880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:48,134-Speed 5543.93 samples/sec   Loss 1.5384   LearningRate 0.0001   Epoch: 19   Global Step: 97890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:50,036-Speed 5386.55 samples/sec   Loss 1.5537   LearningRate 0.0001   Epoch: 19   Global Step: 97900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:51,888-Speed 5533.48 samples/sec   Loss 1.4853   LearningRate 0.0001   Epoch: 19   Global Step: 97910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:53,733-Speed 5552.84 samples/sec   Loss 1.5143   LearningRate 0.0001   Epoch: 19   Global Step: 97920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:55,571-Speed 5572.31 samples/sec   Loss 1.5523   LearningRate 0.0001   Epoch: 19   Global Step: 97930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:57,416-Speed 5552.61 samples/sec   Loss 1.5639   LearningRate 0.0001   Epoch: 19   Global Step: 97940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:31:59,251-Speed 5580.81 samples/sec   Loss 1.5381   LearningRate 0.0001   Epoch: 19   Global Step: 97950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:32:01,093-Speed 5562.36 samples/sec   Loss 1.5348   LearningRate 0.0001   Epoch: 19   Global Step: 97960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:32:02,929-Speed 5578.45 samples/sec   Loss 1.5863   LearningRate 0.0001   Epoch: 19   Global Step: 97970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:32:04,795-Speed 5489.16 samples/sec   Loss 1.5208   LearningRate 0.0001   Epoch: 19   Global Step: 97980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:32:06,640-Speed 5554.18 samples/sec   Loss 1.5683   LearningRate 0.0001   Epoch: 19   Global Step: 97990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:32:08,479-Speed 5570.41 samples/sec   Loss 1.5615   LearningRate 0.0001   Epoch: 19   Global Step: 98000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:32:34,903-[lfw][98000]XNorm: 22.373969
Training: 2022-04-11 16:32:34,904-[lfw][98000]Accuracy-Flip: 0.99817+-0.00229
Training: 2022-04-11 16:32:34,904-[lfw][98000]Accuracy-Highest: 0.99817
Training: 2022-04-11 16:33:05,524-[cfp_fp][98000]XNorm: 21.498835
Training: 2022-04-11 16:33:05,525-[cfp_fp][98000]Accuracy-Flip: 0.98529+-0.00527
Training: 2022-04-11 16:33:05,525-[cfp_fp][98000]Accuracy-Highest: 0.98529
Training: 2022-04-11 16:33:31,867-[agedb_30][98000]XNorm: 22.535583
Training: 2022-04-11 16:33:31,867-[agedb_30][98000]Accuracy-Flip: 0.98450+-0.00654
Training: 2022-04-11 16:33:31,868-[agedb_30][98000]Accuracy-Highest: 0.98450
Training: 2022-04-11 16:33:33,702-Speed 120.16 samples/sec   Loss 1.5491   LearningRate 0.0001   Epoch: 19   Global Step: 98010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:35,529-Speed 5608.09 samples/sec   Loss 1.5762   LearningRate 0.0001   Epoch: 19   Global Step: 98020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:37,357-Speed 5603.72 samples/sec   Loss 1.5689   LearningRate 0.0001   Epoch: 19   Global Step: 98030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:39,186-Speed 5598.55 samples/sec   Loss 1.5942   LearningRate 0.0001   Epoch: 19   Global Step: 98040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:41,017-Speed 5593.99 samples/sec   Loss 1.4326   LearningRate 0.0001   Epoch: 19   Global Step: 98050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:42,848-Speed 5596.31 samples/sec   Loss 1.5950   LearningRate 0.0001   Epoch: 19   Global Step: 98060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:44,683-Speed 5582.73 samples/sec   Loss 1.4537   LearningRate 0.0001   Epoch: 19   Global Step: 98070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:46,525-Speed 5561.45 samples/sec   Loss 1.4576   LearningRate 0.0001   Epoch: 19   Global Step: 98080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:48,365-Speed 5566.67 samples/sec   Loss 1.5298   LearningRate 0.0001   Epoch: 19   Global Step: 98090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:50,197-Speed 5592.07 samples/sec   Loss 1.5708   LearningRate 0.0001   Epoch: 19   Global Step: 98100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:52,031-Speed 5584.39 samples/sec   Loss 1.5289   LearningRate 0.0001   Epoch: 19   Global Step: 98110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:33:53,869-Speed 5574.65 samples/sec   Loss 1.5788   LearningRate 0.0001   Epoch: 19   Global Step: 98120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:55,707-Speed 5571.70 samples/sec   Loss 1.5150   LearningRate 0.0001   Epoch: 19   Global Step: 98130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:57,540-Speed 5590.95 samples/sec   Loss 1.5046   LearningRate 0.0001   Epoch: 19   Global Step: 98140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:33:59,381-Speed 5564.62 samples/sec   Loss 1.5839   LearningRate 0.0001   Epoch: 19   Global Step: 98150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:01,232-Speed 5532.97 samples/sec   Loss 1.5878   LearningRate 0.0001   Epoch: 19   Global Step: 98160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:03,066-Speed 5585.03 samples/sec   Loss 1.5751   LearningRate 0.0001   Epoch: 19   Global Step: 98170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:04,902-Speed 5578.71 samples/sec   Loss 1.5056   LearningRate 0.0001   Epoch: 19   Global Step: 98180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:06,738-Speed 5581.55 samples/sec   Loss 1.5489   LearningRate 0.0001   Epoch: 19   Global Step: 98190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:08,570-Speed 5589.88 samples/sec   Loss 1.5553   LearningRate 0.0001   Epoch: 19   Global Step: 98200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:10,409-Speed 5570.04 samples/sec   Loss 1.5398   LearningRate 0.0001   Epoch: 19   Global Step: 98210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:12,244-Speed 5584.64 samples/sec   Loss 1.6236   LearningRate 0.0001   Epoch: 19   Global Step: 98220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:34:14,083-Speed 5568.70 samples/sec   Loss 1.4875   LearningRate 0.0001   Epoch: 19   Global Step: 98230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:34:15,911-Speed 5606.13 samples/sec   Loss 1.5319   LearningRate 0.0001   Epoch: 19   Global Step: 98240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:17,756-Speed 5552.58 samples/sec   Loss 1.5957   LearningRate 0.0001   Epoch: 19   Global Step: 98250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:19,590-Speed 5585.45 samples/sec   Loss 1.5612   LearningRate 0.0001   Epoch: 19   Global Step: 98260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:21,430-Speed 5565.15 samples/sec   Loss 1.5654   LearningRate 0.0001   Epoch: 19   Global Step: 98270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:23,266-Speed 5580.15 samples/sec   Loss 1.5554   LearningRate 0.0001   Epoch: 19   Global Step: 98280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:25,124-Speed 5513.35 samples/sec   Loss 1.5819   LearningRate 0.0001   Epoch: 19   Global Step: 98290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:26,962-Speed 5575.78 samples/sec   Loss 1.5464   LearningRate 0.0001   Epoch: 19   Global Step: 98300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:28,789-Speed 5604.79 samples/sec   Loss 1.4836   LearningRate 0.0001   Epoch: 19   Global Step: 98310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:34:30,633-Speed 5556.37 samples/sec   Loss 1.5671   LearningRate 0.0001   Epoch: 19   Global Step: 98320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:34:32,469-Speed 5580.76 samples/sec   Loss 1.5847   LearningRate 0.0001   Epoch: 19   Global Step: 98330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:34:34,327-Speed 5512.84 samples/sec   Loss 1.5092   LearningRate 0.0001   Epoch: 19   Global Step: 98340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:34:36,177-Speed 5537.00 samples/sec   Loss 1.4904   LearningRate 0.0001   Epoch: 19   Global Step: 98350   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:34:38,041-Speed 5496.09 samples/sec   Loss 1.4984   LearningRate 0.0001   Epoch: 19   Global Step: 98360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:34:39,905-Speed 5496.95 samples/sec   Loss 1.4611   LearningRate 0.0001   Epoch: 19   Global Step: 98370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:34:41,752-Speed 5544.82 samples/sec   Loss 1.6575   LearningRate 0.0001   Epoch: 19   Global Step: 98380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:34:43,592-Speed 5568.99 samples/sec   Loss 1.5046   LearningRate 0.0001   Epoch: 19   Global Step: 98390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:34:45,432-Speed 5565.26 samples/sec   Loss 1.6495   LearningRate 0.0001   Epoch: 19   Global Step: 98400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:34:47,280-Speed 5543.59 samples/sec   Loss 1.5218   LearningRate 0.0001   Epoch: 19   Global Step: 98410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:49,120-Speed 5567.13 samples/sec   Loss 1.5482   LearningRate 0.0001   Epoch: 19   Global Step: 98420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:50,962-Speed 5561.63 samples/sec   Loss 1.5098   LearningRate 0.0001   Epoch: 19   Global Step: 98430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:52,802-Speed 5568.69 samples/sec   Loss 1.5826   LearningRate 0.0001   Epoch: 19   Global Step: 98440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:54,642-Speed 5565.03 samples/sec   Loss 1.5873   LearningRate 0.0001   Epoch: 19   Global Step: 98450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:56,493-Speed 5536.36 samples/sec   Loss 1.6411   LearningRate 0.0001   Epoch: 19   Global Step: 98460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:34:58,336-Speed 5559.02 samples/sec   Loss 1.6501   LearningRate 0.0001   Epoch: 19   Global Step: 98470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:00,174-Speed 5572.93 samples/sec   Loss 1.5320   LearningRate 0.0001   Epoch: 19   Global Step: 98480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:02,037-Speed 5497.01 samples/sec   Loss 1.4210   LearningRate 0.0001   Epoch: 19   Global Step: 98490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:03,903-Speed 5492.03 samples/sec   Loss 1.5486   LearningRate 0.0001   Epoch: 19   Global Step: 98500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:05,788-Speed 5432.91 samples/sec   Loss 1.5599   LearningRate 0.0001   Epoch: 19   Global Step: 98510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:35:07,639-Speed 5535.54 samples/sec   Loss 1.5070   LearningRate 0.0001   Epoch: 19   Global Step: 98520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:35:09,467-Speed 5601.46 samples/sec   Loss 1.6505   LearningRate 0.0001   Epoch: 19   Global Step: 98530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:11,308-Speed 5566.12 samples/sec   Loss 1.5561   LearningRate 0.0001   Epoch: 19   Global Step: 98540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:13,160-Speed 5532.54 samples/sec   Loss 1.5003   LearningRate 0.0001   Epoch: 19   Global Step: 98550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:15,017-Speed 5513.90 samples/sec   Loss 1.5340   LearningRate 0.0001   Epoch: 19   Global Step: 98560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:16,872-Speed 5523.22 samples/sec   Loss 1.4834   LearningRate 0.0001   Epoch: 19   Global Step: 98570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:18,713-Speed 5565.91 samples/sec   Loss 1.6588   LearningRate 0.0001   Epoch: 19   Global Step: 98580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:20,546-Speed 5587.12 samples/sec   Loss 1.5161   LearningRate 0.0001   Epoch: 19   Global Step: 98590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:22,380-Speed 5587.15 samples/sec   Loss 1.6025   LearningRate 0.0001   Epoch: 19   Global Step: 98600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:24,229-Speed 5539.81 samples/sec   Loss 1.5274   LearningRate 0.0001   Epoch: 19   Global Step: 98610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:26,064-Speed 5580.71 samples/sec   Loss 1.5029   LearningRate 0.0001   Epoch: 19   Global Step: 98620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:27,901-Speed 5578.37 samples/sec   Loss 1.5332   LearningRate 0.0001   Epoch: 19   Global Step: 98630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:35:29,741-Speed 5566.52 samples/sec   Loss 1.6062   LearningRate 0.0001   Epoch: 19   Global Step: 98640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:35:31,571-Speed 5595.69 samples/sec   Loss 1.6063   LearningRate 0.0001   Epoch: 19   Global Step: 98650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:33,405-Speed 5587.50 samples/sec   Loss 1.4816   LearningRate 0.0001   Epoch: 19   Global Step: 98660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:35,247-Speed 5560.95 samples/sec   Loss 1.5309   LearningRate 0.0001   Epoch: 19   Global Step: 98670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:37,079-Speed 5593.98 samples/sec   Loss 1.6012   LearningRate 0.0001   Epoch: 19   Global Step: 98680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:38,923-Speed 5553.73 samples/sec   Loss 1.5664   LearningRate 0.0001   Epoch: 19   Global Step: 98690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:40,777-Speed 5525.44 samples/sec   Loss 1.5748   LearningRate 0.0001   Epoch: 19   Global Step: 98700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:42,615-Speed 5573.85 samples/sec   Loss 1.5483   LearningRate 0.0001   Epoch: 19   Global Step: 98710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:44,456-Speed 5565.53 samples/sec   Loss 1.5462   LearningRate 0.0001   Epoch: 19   Global Step: 98720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:46,315-Speed 5507.54 samples/sec   Loss 1.6601   LearningRate 0.0001   Epoch: 19   Global Step: 98730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:48,150-Speed 5586.03 samples/sec   Loss 1.4587   LearningRate 0.0001   Epoch: 19   Global Step: 98740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:49,986-Speed 5576.48 samples/sec   Loss 1.6492   LearningRate 0.0001   Epoch: 19   Global Step: 98750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:35:51,833-Speed 5548.09 samples/sec   Loss 1.5210   LearningRate 0.0001   Epoch: 19   Global Step: 98760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:35:53,657-Speed 5614.32 samples/sec   Loss 1.5600   LearningRate 0.0001   Epoch: 19   Global Step: 98770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:55,494-Speed 5577.87 samples/sec   Loss 1.5662   LearningRate 0.0001   Epoch: 19   Global Step: 98780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:57,334-Speed 5566.74 samples/sec   Loss 1.5276   LearningRate 0.0001   Epoch: 19   Global Step: 98790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:35:59,169-Speed 5584.89 samples/sec   Loss 1.5091   LearningRate 0.0001   Epoch: 19   Global Step: 98800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:01,003-Speed 5585.25 samples/sec   Loss 1.5553   LearningRate 0.0001   Epoch: 19   Global Step: 98810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:02,849-Speed 5549.30 samples/sec   Loss 1.5425   LearningRate 0.0001   Epoch: 19   Global Step: 98820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:04,687-Speed 5573.97 samples/sec   Loss 1.5734   LearningRate 0.0001   Epoch: 19   Global Step: 98830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:06,530-Speed 5555.57 samples/sec   Loss 1.4738   LearningRate 0.0001   Epoch: 19   Global Step: 98840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:08,374-Speed 5558.69 samples/sec   Loss 1.5950   LearningRate 0.0001   Epoch: 19   Global Step: 98850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:10,205-Speed 5591.77 samples/sec   Loss 1.5419   LearningRate 0.0001   Epoch: 19   Global Step: 98860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:12,048-Speed 5561.06 samples/sec   Loss 1.6811   LearningRate 0.0001   Epoch: 19   Global Step: 98870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:36:13,889-Speed 5563.14 samples/sec   Loss 1.5677   LearningRate 0.0001   Epoch: 19   Global Step: 98880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:36:15,718-Speed 5598.55 samples/sec   Loss 1.5148   LearningRate 0.0001   Epoch: 19   Global Step: 98890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:17,559-Speed 5567.64 samples/sec   Loss 1.5549   LearningRate 0.0000   Epoch: 19   Global Step: 98900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:19,421-Speed 5502.03 samples/sec   Loss 1.5636   LearningRate 0.0000   Epoch: 19   Global Step: 98910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:21,255-Speed 5582.83 samples/sec   Loss 1.5323   LearningRate 0.0000   Epoch: 19   Global Step: 98920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:23,087-Speed 5594.11 samples/sec   Loss 1.5042   LearningRate 0.0000   Epoch: 19   Global Step: 98930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:24,938-Speed 5534.51 samples/sec   Loss 1.5433   LearningRate 0.0000   Epoch: 19   Global Step: 98940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:26,771-Speed 5586.74 samples/sec   Loss 1.5973   LearningRate 0.0000   Epoch: 19   Global Step: 98950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:28,608-Speed 5577.36 samples/sec   Loss 1.5377   LearningRate 0.0000   Epoch: 19   Global Step: 98960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:30,446-Speed 5573.69 samples/sec   Loss 1.5308   LearningRate 0.0000   Epoch: 19   Global Step: 98970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:32,282-Speed 5577.88 samples/sec   Loss 1.5760   LearningRate 0.0000   Epoch: 19   Global Step: 98980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:34,112-Speed 5598.52 samples/sec   Loss 1.5666   LearningRate 0.0000   Epoch: 19   Global Step: 98990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:35,943-Speed 5595.36 samples/sec   Loss 1.6505   LearningRate 0.0000   Epoch: 19   Global Step: 99000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:37,789-Speed 5546.83 samples/sec   Loss 1.5794   LearningRate 0.0000   Epoch: 19   Global Step: 99010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:39,630-Speed 5565.64 samples/sec   Loss 1.5711   LearningRate 0.0000   Epoch: 19   Global Step: 99020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:41,467-Speed 5576.37 samples/sec   Loss 1.5869   LearningRate 0.0000   Epoch: 19   Global Step: 99030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:43,305-Speed 5575.03 samples/sec   Loss 1.5488   LearningRate 0.0000   Epoch: 19   Global Step: 99040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:45,142-Speed 5576.23 samples/sec   Loss 1.4894   LearningRate 0.0000   Epoch: 19   Global Step: 99050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:46,981-Speed 5568.22 samples/sec   Loss 1.6079   LearningRate 0.0000   Epoch: 19   Global Step: 99060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:48,842-Speed 5507.08 samples/sec   Loss 1.5433   LearningRate 0.0000   Epoch: 19   Global Step: 99070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:50,680-Speed 5570.38 samples/sec   Loss 1.4866   LearningRate 0.0000   Epoch: 19   Global Step: 99080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:52,516-Speed 5581.58 samples/sec   Loss 1.6010   LearningRate 0.0000   Epoch: 19   Global Step: 99090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:36:54,351-Speed 5582.56 samples/sec   Loss 1.5711   LearningRate 0.0000   Epoch: 19   Global Step: 99100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:56,201-Speed 5535.07 samples/sec   Loss 1.5629   LearningRate 0.0000   Epoch: 19   Global Step: 99110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:58,037-Speed 5581.84 samples/sec   Loss 1.4897   LearningRate 0.0000   Epoch: 19   Global Step: 99120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:36:59,893-Speed 5519.59 samples/sec   Loss 1.5854   LearningRate 0.0000   Epoch: 19   Global Step: 99130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:01,730-Speed 5576.40 samples/sec   Loss 1.4849   LearningRate 0.0000   Epoch: 19   Global Step: 99140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:03,569-Speed 5570.04 samples/sec   Loss 1.4680   LearningRate 0.0000   Epoch: 19   Global Step: 99150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:05,413-Speed 5555.16 samples/sec   Loss 1.4905   LearningRate 0.0000   Epoch: 19   Global Step: 99160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:07,248-Speed 5584.35 samples/sec   Loss 1.5755   LearningRate 0.0000   Epoch: 19   Global Step: 99170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:09,083-Speed 5580.13 samples/sec   Loss 1.6214   LearningRate 0.0000   Epoch: 19   Global Step: 99180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:10,937-Speed 5526.49 samples/sec   Loss 1.5123   LearningRate 0.0000   Epoch: 19   Global Step: 99190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:12,793-Speed 5518.72 samples/sec   Loss 1.4790   LearningRate 0.0000   Epoch: 19   Global Step: 99200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:37:14,670-Speed 5457.80 samples/sec   Loss 1.5239   LearningRate 0.0000   Epoch: 19   Global Step: 99210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:37:16,546-Speed 5458.18 samples/sec   Loss 1.5783   LearningRate 0.0000   Epoch: 19   Global Step: 99220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:37:18,370-Speed 5623.88 samples/sec   Loss 1.5448   LearningRate 0.0000   Epoch: 19   Global Step: 99230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:20,196-Speed 5609.98 samples/sec   Loss 1.4737   LearningRate 0.0000   Epoch: 19   Global Step: 99240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:37:22,069-Speed 5468.37 samples/sec   Loss 1.5059   LearningRate 0.0000   Epoch: 19   Global Step: 99250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:37:23,928-Speed 5510.62 samples/sec   Loss 1.5371   LearningRate 0.0000   Epoch: 19   Global Step: 99260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:37:25,777-Speed 5541.96 samples/sec   Loss 1.6095   LearningRate 0.0000   Epoch: 19   Global Step: 99270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:37:27,633-Speed 5518.12 samples/sec   Loss 1.4999   LearningRate 0.0000   Epoch: 19   Global Step: 99280   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:37:29,500-Speed 5487.39 samples/sec   Loss 1.5028   LearningRate 0.0000   Epoch: 19   Global Step: 99290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:37:31,356-Speed 5520.42 samples/sec   Loss 1.5398   LearningRate 0.0000   Epoch: 19   Global Step: 99300   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:37:33,194-Speed 5571.17 samples/sec   Loss 1.4506   LearningRate 0.0000   Epoch: 19   Global Step: 99310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:37:35,052-Speed 5514.65 samples/sec   Loss 1.6072   LearningRate 0.0000   Epoch: 19   Global Step: 99320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:37:36,901-Speed 5539.79 samples/sec   Loss 1.5243   LearningRate 0.0000   Epoch: 19   Global Step: 99330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:37:38,749-Speed 5544.35 samples/sec   Loss 1.5526   LearningRate 0.0000   Epoch: 19   Global Step: 99340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:40,588-Speed 5570.15 samples/sec   Loss 1.5271   LearningRate 0.0000   Epoch: 19   Global Step: 99350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:42,420-Speed 5591.83 samples/sec   Loss 1.4992   LearningRate 0.0000   Epoch: 19   Global Step: 99360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:44,254-Speed 5586.23 samples/sec   Loss 1.6003   LearningRate 0.0000   Epoch: 19   Global Step: 99370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:46,099-Speed 5551.47 samples/sec   Loss 1.5242   LearningRate 0.0000   Epoch: 19   Global Step: 99380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:47,946-Speed 5546.84 samples/sec   Loss 1.5389   LearningRate 0.0000   Epoch: 19   Global Step: 99390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:49,782-Speed 5576.98 samples/sec   Loss 1.5509   LearningRate 0.0000   Epoch: 19   Global Step: 99400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:51,625-Speed 5559.51 samples/sec   Loss 1.5213   LearningRate 0.0000   Epoch: 19   Global Step: 99410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:53,463-Speed 5573.22 samples/sec   Loss 1.5248   LearningRate 0.0000   Epoch: 19   Global Step: 99420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:55,307-Speed 5556.17 samples/sec   Loss 1.4947   LearningRate 0.0000   Epoch: 19   Global Step: 99430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:37:57,145-Speed 5572.32 samples/sec   Loss 1.5268   LearningRate 0.0000   Epoch: 19   Global Step: 99440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:37:58,987-Speed 5562.34 samples/sec   Loss 1.4729   LearningRate 0.0000   Epoch: 19   Global Step: 99450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:38:00,823-Speed 5579.90 samples/sec   Loss 1.5950   LearningRate 0.0000   Epoch: 19   Global Step: 99460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:38:02,667-Speed 5554.26 samples/sec   Loss 1.5958   LearningRate 0.0000   Epoch: 19   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:38:04,501-Speed 5586.69 samples/sec   Loss 1.5574   LearningRate 0.0000   Epoch: 19   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:38:06,338-Speed 5575.65 samples/sec   Loss 1.5536   LearningRate 0.0000   Epoch: 19   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:38:08,171-Speed 5588.79 samples/sec   Loss 1.4935   LearningRate 0.0000   Epoch: 19   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:38:10,003-Speed 5591.11 samples/sec   Loss 1.5435   LearningRate 0.0000   Epoch: 19   Global Step: 99510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:11,863-Speed 5507.54 samples/sec   Loss 1.6333   LearningRate 0.0000   Epoch: 19   Global Step: 99520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:13,699-Speed 5578.45 samples/sec   Loss 1.6476   LearningRate 0.0000   Epoch: 19   Global Step: 99530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:15,543-Speed 5556.63 samples/sec   Loss 1.5138   LearningRate 0.0000   Epoch: 19   Global Step: 99540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:17,390-Speed 5547.51 samples/sec   Loss 1.5398   LearningRate 0.0000   Epoch: 19   Global Step: 99550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:19,240-Speed 5534.80 samples/sec   Loss 1.5649   LearningRate 0.0000   Epoch: 19   Global Step: 99560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:21,075-Speed 5583.58 samples/sec   Loss 1.5797   LearningRate 0.0000   Epoch: 19   Global Step: 99570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:22,915-Speed 5568.57 samples/sec   Loss 1.5230   LearningRate 0.0000   Epoch: 19   Global Step: 99580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:24,762-Speed 5544.60 samples/sec   Loss 1.5404   LearningRate 0.0000   Epoch: 19   Global Step: 99590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:26,606-Speed 5553.99 samples/sec   Loss 1.5551   LearningRate 0.0000   Epoch: 19   Global Step: 99600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:28,433-Speed 5609.93 samples/sec   Loss 1.5295   LearningRate 0.0000   Epoch: 19   Global Step: 99610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:30,265-Speed 5589.53 samples/sec   Loss 1.6056   LearningRate 0.0000   Epoch: 19   Global Step: 99620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:32,107-Speed 5560.63 samples/sec   Loss 1.5698   LearningRate 0.0000   Epoch: 19   Global Step: 99630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:33,943-Speed 5581.30 samples/sec   Loss 1.5406   LearningRate 0.0000   Epoch: 19   Global Step: 99640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:35,784-Speed 5564.58 samples/sec   Loss 1.4891   LearningRate 0.0000   Epoch: 19   Global Step: 99650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:37,622-Speed 5575.20 samples/sec   Loss 1.5352   LearningRate 0.0000   Epoch: 19   Global Step: 99660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:39,462-Speed 5565.52 samples/sec   Loss 1.6449   LearningRate 0.0000   Epoch: 19   Global Step: 99670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:41,305-Speed 5558.30 samples/sec   Loss 1.5306   LearningRate 0.0000   Epoch: 19   Global Step: 99680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:43,161-Speed 5520.85 samples/sec   Loss 1.5322   LearningRate 0.0000   Epoch: 19   Global Step: 99690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:44,997-Speed 5579.52 samples/sec   Loss 1.5114   LearningRate 0.0000   Epoch: 19   Global Step: 99700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:46,832-Speed 5581.96 samples/sec   Loss 1.5835   LearningRate 0.0000   Epoch: 19   Global Step: 99710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:38:48,659-Speed 5605.96 samples/sec   Loss 1.5428   LearningRate 0.0000   Epoch: 19   Global Step: 99720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:50,499-Speed 5568.38 samples/sec   Loss 1.6402   LearningRate 0.0000   Epoch: 19   Global Step: 99730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:52,336-Speed 5577.12 samples/sec   Loss 1.5448   LearningRate 0.0000   Epoch: 19   Global Step: 99740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:54,215-Speed 5451.04 samples/sec   Loss 1.4859   LearningRate 0.0000   Epoch: 19   Global Step: 99750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:56,050-Speed 5581.88 samples/sec   Loss 1.4122   LearningRate 0.0000   Epoch: 19   Global Step: 99760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:57,885-Speed 5582.99 samples/sec   Loss 1.5136   LearningRate 0.0000   Epoch: 19   Global Step: 99770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:38:59,721-Speed 5580.42 samples/sec   Loss 1.5696   LearningRate 0.0000   Epoch: 19   Global Step: 99780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:01,601-Speed 5447.22 samples/sec   Loss 1.6049   LearningRate 0.0000   Epoch: 19   Global Step: 99790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:03,458-Speed 5516.80 samples/sec   Loss 1.5307   LearningRate 0.0000   Epoch: 19   Global Step: 99800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:05,298-Speed 5567.31 samples/sec   Loss 1.5574   LearningRate 0.0000   Epoch: 19   Global Step: 99810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:07,137-Speed 5568.70 samples/sec   Loss 1.5579   LearningRate 0.0000   Epoch: 19   Global Step: 99820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:39:08,971-Speed 5586.92 samples/sec   Loss 1.5193   LearningRate 0.0000   Epoch: 19   Global Step: 99830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:39:10,798-Speed 5609.54 samples/sec   Loss 1.5862   LearningRate 0.0000   Epoch: 19   Global Step: 99840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:12,646-Speed 5543.63 samples/sec   Loss 1.4372   LearningRate 0.0000   Epoch: 19   Global Step: 99850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:14,500-Speed 5523.23 samples/sec   Loss 1.6092   LearningRate 0.0000   Epoch: 19   Global Step: 99860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:16,344-Speed 5555.73 samples/sec   Loss 1.5828   LearningRate 0.0000   Epoch: 19   Global Step: 99870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:18,199-Speed 5522.31 samples/sec   Loss 1.4493   LearningRate 0.0000   Epoch: 19   Global Step: 99880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:20,038-Speed 5571.18 samples/sec   Loss 1.5304   LearningRate 0.0000   Epoch: 19   Global Step: 99890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:21,883-Speed 5552.17 samples/sec   Loss 1.5508   LearningRate 0.0000   Epoch: 19   Global Step: 99900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:23,717-Speed 5584.39 samples/sec   Loss 1.5884   LearningRate 0.0000   Epoch: 19   Global Step: 99910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:25,550-Speed 5589.12 samples/sec   Loss 1.6476   LearningRate 0.0000   Epoch: 19   Global Step: 99920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:27,402-Speed 5531.65 samples/sec   Loss 1.5528   LearningRate 0.0000   Epoch: 19   Global Step: 99930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:29,263-Speed 5505.87 samples/sec   Loss 1.5267   LearningRate 0.0000   Epoch: 19   Global Step: 99940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:39:31,092-Speed 5598.94 samples/sec   Loss 1.5511   LearningRate 0.0000   Epoch: 19   Global Step: 99950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:32,932-Speed 5567.67 samples/sec   Loss 1.4862   LearningRate 0.0000   Epoch: 19   Global Step: 99960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:34,778-Speed 5550.21 samples/sec   Loss 1.6034   LearningRate 0.0000   Epoch: 19   Global Step: 99970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:36,626-Speed 5543.80 samples/sec   Loss 1.5801   LearningRate 0.0000   Epoch: 19   Global Step: 99980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:38,461-Speed 5581.24 samples/sec   Loss 1.4670   LearningRate 0.0000   Epoch: 19   Global Step: 99990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:39:40,305-Speed 5556.82 samples/sec   Loss 1.6267   LearningRate 0.0000   Epoch: 19   Global Step: 100000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:40:06,735-[lfw][100000]XNorm: 22.374677
Training: 2022-04-11 16:40:06,736-[lfw][100000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-11 16:40:06,736-[lfw][100000]Accuracy-Highest: 0.99833
Training: 2022-04-11 16:40:37,201-[cfp_fp][100000]XNorm: 21.519639
Training: 2022-04-11 16:40:37,202-[cfp_fp][100000]Accuracy-Flip: 0.98543+-0.00545
Training: 2022-04-11 16:40:37,202-[cfp_fp][100000]Accuracy-Highest: 0.98543
Training: 2022-04-11 16:41:03,659-[agedb_30][100000]XNorm: 22.519361
Training: 2022-04-11 16:41:03,659-[agedb_30][100000]Accuracy-Flip: 0.98167+-0.00767
Training: 2022-04-11 16:41:03,660-[agedb_30][100000]Accuracy-Highest: 0.98450
Training: 2022-04-11 16:41:05,512-Speed 120.18 samples/sec   Loss 1.5613   LearningRate 0.0000   Epoch: 19   Global Step: 100010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:07,363-Speed 5534.74 samples/sec   Loss 1.5364   LearningRate 0.0000   Epoch: 19   Global Step: 100020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:09,204-Speed 5563.70 samples/sec   Loss 1.5536   LearningRate 0.0000   Epoch: 19   Global Step: 100030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:11,085-Speed 5445.87 samples/sec   Loss 1.6470   LearningRate 0.0000   Epoch: 19   Global Step: 100040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:12,932-Speed 5545.32 samples/sec   Loss 1.5129   LearningRate 0.0000   Epoch: 19   Global Step: 100050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:41:14,767-Speed 5583.49 samples/sec   Loss 1.4756   LearningRate 0.0000   Epoch: 19   Global Step: 100060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:41:16,605-Speed 5570.85 samples/sec   Loss 1.5512   LearningRate 0.0000   Epoch: 19   Global Step: 100070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:18,446-Speed 5563.76 samples/sec   Loss 1.5128   LearningRate 0.0000   Epoch: 19   Global Step: 100080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:20,272-Speed 5610.41 samples/sec   Loss 1.5129   LearningRate 0.0000   Epoch: 19   Global Step: 100090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:22,104-Speed 5591.65 samples/sec   Loss 1.5173   LearningRate 0.0000   Epoch: 19   Global Step: 100100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:23,943-Speed 5572.18 samples/sec   Loss 1.5575   LearningRate 0.0000   Epoch: 19   Global Step: 100110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:25,780-Speed 5576.86 samples/sec   Loss 1.6114   LearningRate 0.0000   Epoch: 19   Global Step: 100120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:27,611-Speed 5593.08 samples/sec   Loss 1.6246   LearningRate 0.0000   Epoch: 19   Global Step: 100130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:29,439-Speed 5604.59 samples/sec   Loss 1.4899   LearningRate 0.0000   Epoch: 19   Global Step: 100140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:31,279-Speed 5567.35 samples/sec   Loss 1.4984   LearningRate 0.0000   Epoch: 19   Global Step: 100150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:33,114-Speed 5583.59 samples/sec   Loss 1.5484   LearningRate 0.0000   Epoch: 19   Global Step: 100160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:34,940-Speed 5609.03 samples/sec   Loss 1.5152   LearningRate 0.0000   Epoch: 19   Global Step: 100170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:36,771-Speed 5596.28 samples/sec   Loss 1.5305   LearningRate 0.0000   Epoch: 19   Global Step: 100180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:38,602-Speed 5593.80 samples/sec   Loss 1.6447   LearningRate 0.0000   Epoch: 19   Global Step: 100190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:40,433-Speed 5593.40 samples/sec   Loss 1.5478   LearningRate 0.0000   Epoch: 19   Global Step: 100200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:42,269-Speed 5581.70 samples/sec   Loss 1.5400   LearningRate 0.0000   Epoch: 19   Global Step: 100210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:44,100-Speed 5595.70 samples/sec   Loss 1.5053   LearningRate 0.0000   Epoch: 19   Global Step: 100220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:45,932-Speed 5589.48 samples/sec   Loss 1.5701   LearningRate 0.0000   Epoch: 19   Global Step: 100230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:47,767-Speed 5581.32 samples/sec   Loss 1.6357   LearningRate 0.0000   Epoch: 19   Global Step: 100240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:49,646-Speed 5454.79 samples/sec   Loss 1.5208   LearningRate 0.0000   Epoch: 19   Global Step: 100250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:51,490-Speed 5554.27 samples/sec   Loss 1.5365   LearningRate 0.0000   Epoch: 19   Global Step: 100260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:53,318-Speed 5603.69 samples/sec   Loss 1.5162   LearningRate 0.0000   Epoch: 19   Global Step: 100270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:55,153-Speed 5585.08 samples/sec   Loss 1.4981   LearningRate 0.0000   Epoch: 19   Global Step: 100280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:56,987-Speed 5585.02 samples/sec   Loss 1.5505   LearningRate 0.0000   Epoch: 19   Global Step: 100290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:41:58,822-Speed 5583.53 samples/sec   Loss 1.5597   LearningRate 0.0000   Epoch: 19   Global Step: 100300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:00,650-Speed 5601.68 samples/sec   Loss 1.4801   LearningRate 0.0000   Epoch: 19   Global Step: 100310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:42:02,483-Speed 5589.31 samples/sec   Loss 1.5708   LearningRate 0.0000   Epoch: 19   Global Step: 100320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:42:04,328-Speed 5553.57 samples/sec   Loss 1.6345   LearningRate 0.0000   Epoch: 19   Global Step: 100330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:42:06,160-Speed 5589.16 samples/sec   Loss 1.4848   LearningRate 0.0000   Epoch: 19   Global Step: 100340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:42:07,993-Speed 5590.67 samples/sec   Loss 1.5180   LearningRate 0.0000   Epoch: 19   Global Step: 100350   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:42:09,833-Speed 5566.34 samples/sec   Loss 1.5533   LearningRate 0.0000   Epoch: 19   Global Step: 100360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:42:11,681-Speed 5542.58 samples/sec   Loss 1.5494   LearningRate 0.0000   Epoch: 19   Global Step: 100370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:42:13,527-Speed 5549.71 samples/sec   Loss 1.6425   LearningRate 0.0000   Epoch: 19   Global Step: 100380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:42:15,374-Speed 5547.63 samples/sec   Loss 1.5772   LearningRate 0.0000   Epoch: 19   Global Step: 100390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:42:17,207-Speed 5586.91 samples/sec   Loss 1.5299   LearningRate 0.0000   Epoch: 19   Global Step: 100400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:42:19,040-Speed 5589.34 samples/sec   Loss 1.4655   LearningRate 0.0000   Epoch: 19   Global Step: 100410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:20,870-Speed 5599.38 samples/sec   Loss 1.4868   LearningRate 0.0000   Epoch: 19   Global Step: 100420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:22,705-Speed 5581.82 samples/sec   Loss 1.5557   LearningRate 0.0000   Epoch: 19   Global Step: 100430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:24,539-Speed 5585.06 samples/sec   Loss 1.5357   LearningRate 0.0000   Epoch: 19   Global Step: 100440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:26,389-Speed 5536.67 samples/sec   Loss 1.5620   LearningRate 0.0000   Epoch: 19   Global Step: 100450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:28,225-Speed 5580.68 samples/sec   Loss 1.5270   LearningRate 0.0000   Epoch: 19   Global Step: 100460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:30,059-Speed 5584.71 samples/sec   Loss 1.6223   LearningRate 0.0000   Epoch: 19   Global Step: 100470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:31,890-Speed 5593.53 samples/sec   Loss 1.4744   LearningRate 0.0000   Epoch: 19   Global Step: 100480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:33,721-Speed 5597.60 samples/sec   Loss 1.5056   LearningRate 0.0000   Epoch: 19   Global Step: 100490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:35,574-Speed 5526.92 samples/sec   Loss 1.4846   LearningRate 0.0000   Epoch: 19   Global Step: 100500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:37,405-Speed 5593.80 samples/sec   Loss 1.5588   LearningRate 0.0000   Epoch: 19   Global Step: 100510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:42:39,242-Speed 5578.72 samples/sec   Loss 1.5121   LearningRate 0.0000   Epoch: 19   Global Step: 100520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:42:41,088-Speed 5549.08 samples/sec   Loss 1.4202   LearningRate 0.0000   Epoch: 19   Global Step: 100530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:42,941-Speed 5527.77 samples/sec   Loss 1.5759   LearningRate 0.0000   Epoch: 19   Global Step: 100540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:44,773-Speed 5593.22 samples/sec   Loss 1.5173   LearningRate 0.0000   Epoch: 19   Global Step: 100550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:46,609-Speed 5576.99 samples/sec   Loss 1.5604   LearningRate 0.0000   Epoch: 19   Global Step: 100560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:48,453-Speed 5555.03 samples/sec   Loss 1.4869   LearningRate 0.0000   Epoch: 19   Global Step: 100570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:50,302-Speed 5542.52 samples/sec   Loss 1.5304   LearningRate 0.0000   Epoch: 19   Global Step: 100580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:52,140-Speed 5572.98 samples/sec   Loss 1.5888   LearningRate 0.0000   Epoch: 19   Global Step: 100590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:54,002-Speed 5503.27 samples/sec   Loss 1.5424   LearningRate 0.0000   Epoch: 19   Global Step: 100600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:55,848-Speed 5548.41 samples/sec   Loss 1.5421   LearningRate 0.0000   Epoch: 19   Global Step: 100610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:57,685-Speed 5574.42 samples/sec   Loss 1.6090   LearningRate 0.0000   Epoch: 19   Global Step: 100620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:42:59,510-Speed 5613.50 samples/sec   Loss 1.5747   LearningRate 0.0000   Epoch: 19   Global Step: 100630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:01,354-Speed 5556.15 samples/sec   Loss 1.4801   LearningRate 0.0000   Epoch: 19   Global Step: 100640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:03,195-Speed 5562.69 samples/sec   Loss 1.5378   LearningRate 0.0000   Epoch: 19   Global Step: 100650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:05,032-Speed 5577.43 samples/sec   Loss 1.5450   LearningRate 0.0000   Epoch: 19   Global Step: 100660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:06,868-Speed 5579.58 samples/sec   Loss 1.5391   LearningRate 0.0000   Epoch: 19   Global Step: 100670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:08,698-Speed 5598.59 samples/sec   Loss 1.5710   LearningRate 0.0000   Epoch: 19   Global Step: 100680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:10,551-Speed 5528.73 samples/sec   Loss 1.4891   LearningRate 0.0000   Epoch: 19   Global Step: 100690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:12,382-Speed 5596.29 samples/sec   Loss 1.6110   LearningRate 0.0000   Epoch: 19   Global Step: 100700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:14,231-Speed 5539.39 samples/sec   Loss 1.4947   LearningRate 0.0000   Epoch: 19   Global Step: 100710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:16,065-Speed 5584.73 samples/sec   Loss 1.6215   LearningRate 0.0000   Epoch: 19   Global Step: 100720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:17,917-Speed 5530.12 samples/sec   Loss 1.5123   LearningRate 0.0000   Epoch: 19   Global Step: 100730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:43:19,738-Speed 5627.34 samples/sec   Loss 1.4700   LearningRate 0.0000   Epoch: 19   Global Step: 100740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:21,569-Speed 5593.05 samples/sec   Loss 1.4786   LearningRate 0.0000   Epoch: 19   Global Step: 100750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:23,411-Speed 5560.56 samples/sec   Loss 1.5158   LearningRate 0.0000   Epoch: 19   Global Step: 100760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:25,256-Speed 5553.86 samples/sec   Loss 1.5067   LearningRate 0.0000   Epoch: 19   Global Step: 100770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:27,108-Speed 5532.32 samples/sec   Loss 1.6094   LearningRate 0.0000   Epoch: 19   Global Step: 100780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:28,953-Speed 5550.47 samples/sec   Loss 1.5322   LearningRate 0.0000   Epoch: 19   Global Step: 100790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:30,794-Speed 5567.16 samples/sec   Loss 1.5195   LearningRate 0.0000   Epoch: 19   Global Step: 100800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:32,624-Speed 5597.71 samples/sec   Loss 1.5206   LearningRate 0.0000   Epoch: 19   Global Step: 100810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:34,458-Speed 5586.01 samples/sec   Loss 1.4713   LearningRate 0.0000   Epoch: 19   Global Step: 100820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:36,289-Speed 5592.56 samples/sec   Loss 1.6088   LearningRate 0.0000   Epoch: 19   Global Step: 100830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:38,124-Speed 5581.88 samples/sec   Loss 1.4945   LearningRate 0.0000   Epoch: 19   Global Step: 100840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:39,961-Speed 5579.29 samples/sec   Loss 1.5250   LearningRate 0.0000   Epoch: 19   Global Step: 100850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:41,794-Speed 5586.83 samples/sec   Loss 1.5099   LearningRate 0.0000   Epoch: 19   Global Step: 100860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:43,625-Speed 5594.96 samples/sec   Loss 1.4279   LearningRate 0.0000   Epoch: 19   Global Step: 100870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:45,461-Speed 5578.02 samples/sec   Loss 1.5159   LearningRate 0.0000   Epoch: 19   Global Step: 100880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:47,306-Speed 5552.04 samples/sec   Loss 1.5473   LearningRate 0.0000   Epoch: 19   Global Step: 100890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:49,189-Speed 5442.03 samples/sec   Loss 1.5156   LearningRate 0.0000   Epoch: 19   Global Step: 100900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:51,080-Speed 5417.35 samples/sec   Loss 1.5026   LearningRate 0.0000   Epoch: 19   Global Step: 100910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:52,918-Speed 5574.29 samples/sec   Loss 1.5635   LearningRate 0.0000   Epoch: 19   Global Step: 100920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:54,756-Speed 5572.19 samples/sec   Loss 1.5136   LearningRate 0.0000   Epoch: 19   Global Step: 100930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:56,580-Speed 5618.16 samples/sec   Loss 1.4635   LearningRate 0.0000   Epoch: 19   Global Step: 100940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:43:58,413-Speed 5586.29 samples/sec   Loss 1.5180   LearningRate 0.0000   Epoch: 19   Global Step: 100950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:00,247-Speed 5585.81 samples/sec   Loss 1.5239   LearningRate 0.0000   Epoch: 19   Global Step: 100960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:02,087-Speed 5568.27 samples/sec   Loss 1.5212   LearningRate 0.0000   Epoch: 19   Global Step: 100970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:03,926-Speed 5570.40 samples/sec   Loss 1.5330   LearningRate 0.0000   Epoch: 19   Global Step: 100980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:05,762-Speed 5579.61 samples/sec   Loss 1.5464   LearningRate 0.0000   Epoch: 19   Global Step: 100990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:07,596-Speed 5583.72 samples/sec   Loss 1.5363   LearningRate 0.0000   Epoch: 19   Global Step: 101000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:09,438-Speed 5561.98 samples/sec   Loss 1.5348   LearningRate 0.0000   Epoch: 19   Global Step: 101010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:11,271-Speed 5590.68 samples/sec   Loss 1.5249   LearningRate 0.0000   Epoch: 19   Global Step: 101020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:13,120-Speed 5539.22 samples/sec   Loss 1.5277   LearningRate 0.0000   Epoch: 19   Global Step: 101030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:14,960-Speed 5567.56 samples/sec   Loss 1.5634   LearningRate 0.0000   Epoch: 19   Global Step: 101040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:44:16,800-Speed 5568.38 samples/sec   Loss 1.4445   LearningRate 0.0000   Epoch: 19   Global Step: 101050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:44:18,637-Speed 5574.81 samples/sec   Loss 1.5661   LearningRate 0.0000   Epoch: 19   Global Step: 101060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-11 16:44:20,475-Speed 5574.33 samples/sec   Loss 1.5339   LearningRate 0.0000   Epoch: 19   Global Step: 101070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:22,315-Speed 5567.23 samples/sec   Loss 1.6465   LearningRate 0.0000   Epoch: 19   Global Step: 101080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:24,156-Speed 5566.19 samples/sec   Loss 1.5718   LearningRate 0.0000   Epoch: 19   Global Step: 101090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-11 16:44:25,995-Speed 5569.65 samples/sec   Loss 1.5917   LearningRate 0.0000   Epoch: 19   Global Step: 101100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:44:27,839-Speed 5553.29 samples/sec   Loss 1.6061   LearningRate 0.0000   Epoch: 19   Global Step: 101110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:44:29,674-Speed 5582.25 samples/sec   Loss 1.4670   LearningRate 0.0000   Epoch: 19   Global Step: 101120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:44:31,516-Speed 5563.60 samples/sec   Loss 1.5379   LearningRate 0.0000   Epoch: 19   Global Step: 101130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:44:33,348-Speed 5592.15 samples/sec   Loss 1.5829   LearningRate 0.0000   Epoch: 19   Global Step: 101140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:44:35,261-Speed 5355.66 samples/sec   Loss 1.5560   LearningRate 0.0000   Epoch: 19   Global Step: 101150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-11 16:44:37,066-Speed 5673.23 samples/sec   Loss 1.5722   LearningRate 0.0000   Epoch: 19   Global Step: 101160   Fp16 Grad Scale: 16384   Required: -0 hours