Training: 2022-04-11 11:31:24,318-rank_id: 0
Training: 2022-04-11 11:31:51,754-: margin_list              [1.0, 0.0, 0.4]
Training: 2022-04-11 11:31:51,755-: network                  mbf
Training: 2022-04-11 11:31:51,755-: resume                   False
Training: 2022-04-11 11:31:51,755-: output                   work_dirs/glint360k_mbf
Training: 2022-04-11 11:31:51,755-: embedding_size           512
Training: 2022-04-11 11:31:51,755-: sample_rate              1.0
Training: 2022-04-11 11:31:51,755-: interclass_filtering_threshold 0
Training: 2022-04-11 11:31:51,755-: fp16                     True
Training: 2022-04-11 11:31:51,755-: batch_size               128
Training: 2022-04-11 11:31:51,756-: optimizer                sgd
Training: 2022-04-11 11:31:51,756-: lr                       0.1
Training: 2022-04-11 11:31:51,756-: momentum                 0.9
Training: 2022-04-11 11:31:51,756-: weight_decay             0.0001
Training: 2022-04-11 11:31:51,756-: verbose                  2000
Training: 2022-04-11 11:31:51,756-: frequent                 10
Training: 2022-04-11 11:31:51,756-: dali                     False
Training: 2022-04-11 11:31:51,756-: rec                      /train_tmp/glint360k
Training: 2022-04-11 11:31:51,756-: num_classes              360232
Training: 2022-04-11 11:31:51,756-: num_image                17091657
Training: 2022-04-11 11:31:51,756-: num_epoch                20
Training: 2022-04-11 11:31:51,756-: warmup_epoch             0
Training: 2022-04-11 11:31:51,756-: val_targets              ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-11 11:31:51,756-: total_batch_size         1024
Training: 2022-04-11 11:31:51,756-: warmup_step              0
Training: 2022-04-11 11:31:51,756-: total_step               333820
Training: 2022-04-11 11:33:15,630-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-11 11:33:17,713-Speed 8787.19 samples/sec   Loss 42.3996   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 4096   Required: 122 hours
Training: 2022-04-11 11:33:18,859-Speed 8943.89 samples/sec   Loss 42.5296   LearningRate 0.1000   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 4096   Required: 86 hours
Training: 2022-04-11 11:33:19,964-Speed 9273.25 samples/sec   Loss 42.6236   LearningRate 0.1000   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 4096   Required: 67 hours
Training: 2022-04-11 11:33:21,087-Speed 9116.49 samples/sec   Loss 42.7848   LearningRate 0.1000   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 4096   Required: 56 hours
Training: 2022-04-11 11:33:22,208-Speed 9140.96 samples/sec   Loss 42.9586   LearningRate 0.1000   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 4096   Required: 49 hours
Training: 2022-04-11 11:33:23,363-Speed 8880.50 samples/sec   Loss 43.0868   LearningRate 0.1000   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 4096   Required: 43 hours
Training: 2022-04-11 11:33:24,474-Speed 9219.27 samples/sec   Loss 43.2540   LearningRate 0.1000   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 4096   Required: 39 hours
Training: 2022-04-11 11:33:25,605-Speed 9063.97 samples/sec   Loss 43.1604   LearningRate 0.0999   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 4096   Required: 36 hours
Training: 2022-04-11 11:33:26,769-Speed 8797.35 samples/sec   Loss 43.0091   LearningRate 0.0999   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-04-11 11:33:27,892-Speed 9126.50 samples/sec   Loss 43.0253   LearningRate 0.0999   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 8192   Required: 32 hours
Training: 2022-04-11 11:33:28,960-Speed 9602.65 samples/sec   Loss 43.3264   LearningRate 0.0999   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 8192   Required: 30 hours
Training: 2022-04-11 11:33:30,198-Speed 8271.76 samples/sec   Loss 43.0532   LearningRate 0.0999   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 8192   Required: 28 hours
Training: 2022-04-11 11:33:31,227-Speed 9958.09 samples/sec   Loss 43.3245   LearningRate 0.0999   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 8192   Required: 27 hours
Training: 2022-04-11 11:33:32,344-Speed 9172.97 samples/sec   Loss 43.0899   LearningRate 0.0999   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 8192   Required: 26 hours
Training: 2022-04-11 11:33:33,489-Speed 8951.67 samples/sec   Loss 43.3627   LearningRate 0.0999   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 8192   Required: 25 hours
Training: 2022-04-11 11:33:34,629-Speed 8991.83 samples/sec   Loss 42.9427   LearningRate 0.0999   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 8192   Required: 24 hours
Training: 2022-04-11 11:33:35,735-Speed 9262.29 samples/sec   Loss 42.8606   LearningRate 0.0999   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 8192   Required: 23 hours
Training: 2022-04-11 11:33:36,817-Speed 9473.41 samples/sec   Loss 42.9219   LearningRate 0.0999   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 8192   Required: 23 hours
Training: 2022-04-11 11:33:37,905-Speed 9412.27 samples/sec   Loss 43.0300   LearningRate 0.0999   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 8192   Required: 22 hours
Training: 2022-04-11 11:33:39,044-Speed 8997.29 samples/sec   Loss 42.8388   LearningRate 0.0999   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-11 11:33:40,127-Speed 9460.20 samples/sec   Loss 42.7754   LearningRate 0.0999   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-11 11:33:41,237-Speed 9234.45 samples/sec   Loss 42.6151   LearningRate 0.0999   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-11 11:33:42,329-Speed 9383.76 samples/sec   Loss 42.6262   LearningRate 0.0999   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-11 11:33:43,419-Speed 9399.88 samples/sec   Loss 42.5744   LearningRate 0.0999   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-11 11:33:44,472-Speed 9729.27 samples/sec   Loss 42.6049   LearningRate 0.0998   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 11:33:45,529-Speed 9699.90 samples/sec   Loss 42.3859   LearningRate 0.0998   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 11:33:46,603-Speed 9540.58 samples/sec   Loss 42.2667   LearningRate 0.0998   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-11 11:33:47,693-Speed 9399.71 samples/sec   Loss 42.2248   LearningRate 0.0998   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-11 11:33:48,804-Speed 9221.25 samples/sec   Loss 42.0976   LearningRate 0.0998   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-11 11:33:49,874-Speed 9575.45 samples/sec   Loss 42.0218   LearningRate 0.0998   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 11:33:50,978-Speed 9282.55 samples/sec   Loss 41.9394   LearningRate 0.0998   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-11 11:33:52,084-Speed 9264.44 samples/sec   Loss 41.9499   LearningRate 0.0998   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 11:33:53,176-Speed 9379.86 samples/sec   Loss 41.8290   LearningRate 0.0998   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 11:33:54,248-Speed 9558.24 samples/sec   Loss 41.8004   LearningRate 0.0998   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 11:33:55,351-Speed 9293.91 samples/sec   Loss 41.7503   LearningRate 0.0998   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 11:33:56,449-Speed 9332.67 samples/sec   Loss 41.6263   LearningRate 0.0998   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-11 11:33:57,561-Speed 9208.24 samples/sec   Loss 41.5018   LearningRate 0.0998   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 11:33:58,621-Speed 9671.25 samples/sec   Loss 41.5668   LearningRate 0.0998   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 11:33:59,684-Speed 9632.53 samples/sec   Loss 41.4467   LearningRate 0.0998   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-11 11:34:00,803-Speed 9169.17 samples/sec   Loss 41.4013   LearningRate 0.0998   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 11:34:01,924-Speed 9140.94 samples/sec   Loss 41.2523   LearningRate 0.0997   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 11:34:03,026-Speed 9302.61 samples/sec   Loss 41.3443   LearningRate 0.0997   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 11:34:04,096-Speed 9569.18 samples/sec   Loss 41.3764   LearningRate 0.0997   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-11 11:34:05,185-Speed 9415.58 samples/sec   Loss 41.2882   LearningRate 0.0997   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 11:34:06,279-Speed 9366.75 samples/sec   Loss 41.2735   LearningRate 0.0997   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 11:34:07,447-Speed 8768.55 samples/sec   Loss 41.1040   LearningRate 0.0997   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 11:34:08,568-Speed 9145.66 samples/sec   Loss 41.0232   LearningRate 0.0997   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 11:34:09,637-Speed 9584.77 samples/sec   Loss 41.0009   LearningRate 0.0997   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 11:34:10,700-Speed 9646.95 samples/sec   Loss 40.8902   LearningRate 0.0997   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-11 11:34:11,786-Speed 9431.41 samples/sec   Loss 40.7500   LearningRate 0.0997   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 11:34:12,894-Speed 9245.96 samples/sec   Loss 40.7766   LearningRate 0.0997   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 11:34:13,937-Speed 9829.37 samples/sec   Loss 40.7121   LearningRate 0.0997   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 11:34:15,048-Speed 9226.87 samples/sec   Loss 40.6090   LearningRate 0.0997   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-11 11:34:16,108-Speed 9666.96 samples/sec   Loss 40.5855   LearningRate 0.0997   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:34:17,171-Speed 9635.27 samples/sec   Loss 40.5653   LearningRate 0.0997   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:34:18,268-Speed 9343.05 samples/sec   Loss 40.4747   LearningRate 0.0997   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:34:19,391-Speed 9121.43 samples/sec   Loss 40.3522   LearningRate 0.0997   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:34:20,456-Speed 9622.83 samples/sec   Loss 40.3887   LearningRate 0.0996   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:34:21,592-Speed 9019.80 samples/sec   Loss 40.2842   LearningRate 0.0996   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:34:22,695-Speed 9287.78 samples/sec   Loss 40.1904   LearningRate 0.0996   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:34:23,840-Speed 8948.52 samples/sec   Loss 40.0410   LearningRate 0.0996   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:34:24,917-Speed 9523.70 samples/sec   Loss 40.1066   LearningRate 0.0996   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:34:25,971-Speed 9722.78 samples/sec   Loss 40.0238   LearningRate 0.0996   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:34:27,052-Speed 9478.21 samples/sec   Loss 39.9026   LearningRate 0.0996   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:34:28,109-Speed 9691.78 samples/sec   Loss 40.0262   LearningRate 0.0996   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:34:29,179-Speed 9576.41 samples/sec   Loss 39.8761   LearningRate 0.0996   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:34:30,265-Speed 9438.76 samples/sec   Loss 39.7657   LearningRate 0.0996   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:34:31,317-Speed 9738.25 samples/sec   Loss 39.7470   LearningRate 0.0996   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:34:32,374-Speed 9695.30 samples/sec   Loss 39.5880   LearningRate 0.0996   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:33,431-Speed 9693.54 samples/sec   Loss 39.6011   LearningRate 0.0996   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:34,515-Speed 9445.64 samples/sec   Loss 39.4537   LearningRate 0.0996   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:35,622-Speed 9260.37 samples/sec   Loss 39.4143   LearningRate 0.0996   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:36,753-Speed 9062.49 samples/sec   Loss 39.4386   LearningRate 0.0996   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:37,852-Speed 9315.90 samples/sec   Loss 39.2995   LearningRate 0.0996   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:38,898-Speed 9797.14 samples/sec   Loss 39.2248   LearningRate 0.0995   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:39,952-Speed 9726.92 samples/sec   Loss 39.2457   LearningRate 0.0995   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:40,982-Speed 9951.72 samples/sec   Loss 39.1575   LearningRate 0.0995   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:42,015-Speed 9916.22 samples/sec   Loss 39.0314   LearningRate 0.0995   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:43,081-Speed 9615.79 samples/sec   Loss 39.0970   LearningRate 0.0995   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:44,124-Speed 9821.94 samples/sec   Loss 39.0166   LearningRate 0.0995   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:45,174-Speed 9758.53 samples/sec   Loss 39.0490   LearningRate 0.0995   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:46,240-Speed 9607.50 samples/sec   Loss 38.7588   LearningRate 0.0995   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:47,344-Speed 9281.40 samples/sec   Loss 38.7828   LearningRate 0.0995   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:48,442-Speed 9330.52 samples/sec   Loss 38.7587   LearningRate 0.0995   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:49,530-Speed 9417.79 samples/sec   Loss 38.7793   LearningRate 0.0995   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:50,629-Speed 9324.35 samples/sec   Loss 38.5799   LearningRate 0.0995   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:51,699-Speed 9573.77 samples/sec   Loss 38.5452   LearningRate 0.0995   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:52,814-Speed 9192.81 samples/sec   Loss 38.5473   LearningRate 0.0995   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:53,899-Speed 9441.89 samples/sec   Loss 38.3906   LearningRate 0.0995   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:54,939-Speed 9855.11 samples/sec   Loss 38.3346   LearningRate 0.0995   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:56,064-Speed 9109.98 samples/sec   Loss 38.2635   LearningRate 0.0994   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:57,209-Speed 8942.94 samples/sec   Loss 38.2549   LearningRate 0.0994   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:58,337-Speed 9084.14 samples/sec   Loss 38.1453   LearningRate 0.0994   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:34:59,425-Speed 9422.30 samples/sec   Loss 38.1507   LearningRate 0.0994   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:35:00,526-Speed 9311.86 samples/sec   Loss 37.9890   LearningRate 0.0994   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:35:01,615-Speed 9404.35 samples/sec   Loss 38.0052   LearningRate 0.0994   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:35:02,660-Speed 9804.46 samples/sec   Loss 37.9400   LearningRate 0.0994   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:03,742-Speed 9465.14 samples/sec   Loss 37.8194   LearningRate 0.0994   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:04,846-Speed 9286.35 samples/sec   Loss 37.7295   LearningRate 0.0994   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:05,934-Speed 9418.60 samples/sec   Loss 37.6577   LearningRate 0.0994   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 524288   Required: 12 hours
Training: 2022-04-11 11:35:07,015-Speed 9479.56 samples/sec   Loss 37.6427   LearningRate 0.0994   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:08,098-Speed 9457.02 samples/sec   Loss 37.7354   LearningRate 0.0994   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:09,183-Speed 9445.79 samples/sec   Loss 37.5299   LearningRate 0.0994   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:10,303-Speed 9151.65 samples/sec   Loss 37.5562   LearningRate 0.0994   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:11,393-Speed 9398.64 samples/sec   Loss 37.3747   LearningRate 0.0994   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:12,477-Speed 9450.83 samples/sec   Loss 37.3130   LearningRate 0.0994   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:13,575-Speed 9327.09 samples/sec   Loss 37.2564   LearningRate 0.0994   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:14,714-Speed 9001.19 samples/sec   Loss 37.3192   LearningRate 0.0993   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:15,820-Speed 9267.50 samples/sec   Loss 37.1651   LearningRate 0.0993   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:16,883-Speed 9636.97 samples/sec   Loss 37.0751   LearningRate 0.0993   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:17,947-Speed 9626.88 samples/sec   Loss 36.9957   LearningRate 0.0993   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:19,076-Speed 9080.30 samples/sec   Loss 37.0190   LearningRate 0.0993   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:20,184-Speed 9251.19 samples/sec   Loss 36.9417   LearningRate 0.0993   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:21,226-Speed 9830.29 samples/sec   Loss 36.8264   LearningRate 0.0993   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:22,308-Speed 9471.05 samples/sec   Loss 36.8428   LearningRate 0.0993   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:23,433-Speed 9103.34 samples/sec   Loss 36.6717   LearningRate 0.0993   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:24,508-Speed 9530.40 samples/sec   Loss 36.7140   LearningRate 0.0993   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:25,580-Speed 9559.44 samples/sec   Loss 36.6191   LearningRate 0.0993   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:26,626-Speed 9798.49 samples/sec   Loss 36.5196   LearningRate 0.0993   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:27,730-Speed 9281.89 samples/sec   Loss 36.3867   LearningRate 0.0993   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:28,815-Speed 9442.52 samples/sec   Loss 36.5123   LearningRate 0.0993   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:29,888-Speed 9545.75 samples/sec   Loss 36.3375   LearningRate 0.0993   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:30,972-Speed 9457.84 samples/sec   Loss 36.3926   LearningRate 0.0993   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:32,039-Speed 9596.86 samples/sec   Loss 36.2685   LearningRate 0.0993   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:33,129-Speed 9401.41 samples/sec   Loss 36.1549   LearningRate 0.0992   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:34,200-Speed 9567.13 samples/sec   Loss 36.1515   LearningRate 0.0992   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:35,265-Speed 9623.66 samples/sec   Loss 36.0292   LearningRate 0.0992   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:36,345-Speed 9486.40 samples/sec   Loss 35.9985   LearningRate 0.0992   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:37,450-Speed 9273.37 samples/sec   Loss 35.9019   LearningRate 0.0992   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:38,506-Speed 9707.68 samples/sec   Loss 35.8413   LearningRate 0.0992   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:39,548-Speed 9837.54 samples/sec   Loss 35.7904   LearningRate 0.0992   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:40,593-Speed 9805.59 samples/sec   Loss 35.7528   LearningRate 0.0992   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:41,672-Speed 9491.06 samples/sec   Loss 35.7582   LearningRate 0.0992   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:42,773-Speed 9310.69 samples/sec   Loss 35.5771   LearningRate 0.0992   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:43,862-Speed 9410.46 samples/sec   Loss 35.4164   LearningRate 0.0992   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:44,955-Speed 9374.11 samples/sec   Loss 35.4513   LearningRate 0.0992   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:46,031-Speed 9515.81 samples/sec   Loss 35.4516   LearningRate 0.0992   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:47,127-Speed 9346.49 samples/sec   Loss 35.2492   LearningRate 0.0992   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:48,227-Speed 9314.60 samples/sec   Loss 35.4169   LearningRate 0.0992   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:35:49,247-Speed 10050.07 samples/sec   Loss 35.3666   LearningRate 0.0992   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:50,367-Speed 9150.26 samples/sec   Loss 35.2298   LearningRate 0.0992   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:51,412-Speed 9802.42 samples/sec   Loss 35.1054   LearningRate 0.0991   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:52,452-Speed 9852.02 samples/sec   Loss 34.9619   LearningRate 0.0991   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:53,558-Speed 9260.97 samples/sec   Loss 35.0683   LearningRate 0.0991   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:54,665-Speed 9255.69 samples/sec   Loss 34.9896   LearningRate 0.0991   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:55,788-Speed 9124.10 samples/sec   Loss 34.5901   LearningRate 0.0991   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:56,883-Speed 9360.61 samples/sec   Loss 34.7220   LearningRate 0.0991   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:57,979-Speed 9346.61 samples/sec   Loss 34.6491   LearningRate 0.0991   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:35:59,098-Speed 9155.53 samples/sec   Loss 34.7042   LearningRate 0.0991   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:36:00,166-Speed 9594.95 samples/sec   Loss 34.6305   LearningRate 0.0991   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:01,269-Speed 9295.83 samples/sec   Loss 34.5935   LearningRate 0.0991   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:02,380-Speed 9220.66 samples/sec   Loss 34.4681   LearningRate 0.0991   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:03,437-Speed 9693.33 samples/sec   Loss 34.4340   LearningRate 0.0991   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:04,492-Speed 9703.19 samples/sec   Loss 34.3773   LearningRate 0.0991   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:05,574-Speed 9473.03 samples/sec   Loss 34.3005   LearningRate 0.0991   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:06,644-Speed 9575.16 samples/sec   Loss 34.2316   LearningRate 0.0991   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:07,737-Speed 9376.85 samples/sec   Loss 34.1170   LearningRate 0.0991   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:08,824-Speed 9424.79 samples/sec   Loss 34.1382   LearningRate 0.0990   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:09,926-Speed 9303.70 samples/sec   Loss 34.0386   LearningRate 0.0990   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:11,012-Speed 9432.02 samples/sec   Loss 33.9903   LearningRate 0.0990   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:36:12,109-Speed 9343.99 samples/sec   Loss 33.7938   LearningRate 0.0990   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:13,181-Speed 9557.72 samples/sec   Loss 33.8424   LearningRate 0.0990   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:14,273-Speed 9376.10 samples/sec   Loss 33.6816   LearningRate 0.0990   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:15,344-Speed 9567.98 samples/sec   Loss 33.7177   LearningRate 0.0990   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:16,455-Speed 9221.22 samples/sec   Loss 33.6289   LearningRate 0.0990   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:17,545-Speed 9403.30 samples/sec   Loss 33.5413   LearningRate 0.0990   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:18,601-Speed 9703.03 samples/sec   Loss 33.4596   LearningRate 0.0990   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:19,717-Speed 9182.20 samples/sec   Loss 33.3738   LearningRate 0.0990   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:20,821-Speed 9283.74 samples/sec   Loss 33.4612   LearningRate 0.0990   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:21,921-Speed 9313.25 samples/sec   Loss 33.3738   LearningRate 0.0990   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:23,002-Speed 9485.04 samples/sec   Loss 33.4399   LearningRate 0.0990   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:24,062-Speed 9660.11 samples/sec   Loss 33.2144   LearningRate 0.0990   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:25,133-Speed 9564.03 samples/sec   Loss 33.1580   LearningRate 0.0990   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:26,234-Speed 9305.92 samples/sec   Loss 32.9919   LearningRate 0.0990   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:27,350-Speed 9187.71 samples/sec   Loss 32.9931   LearningRate 0.0989   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:28,452-Speed 9298.98 samples/sec   Loss 33.0209   LearningRate 0.0989   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:29,565-Speed 9200.66 samples/sec   Loss 32.7947   LearningRate 0.0989   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:30,663-Speed 9336.27 samples/sec   Loss 32.8844   LearningRate 0.0989   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:31,760-Speed 9337.08 samples/sec   Loss 32.7610   LearningRate 0.0989   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:32,878-Speed 9166.64 samples/sec   Loss 32.7524   LearningRate 0.0989   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 524288   Required: 11 hours
Training: 2022-04-11 11:36:33,996-Speed 9169.01 samples/sec   Loss 32.5941   LearningRate 0.0989   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:35,073-Speed 9507.41 samples/sec   Loss 32.6007   LearningRate 0.0989   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:36,129-Speed 9700.83 samples/sec   Loss 32.5418   LearningRate 0.0989   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:37,155-Speed 9992.45 samples/sec   Loss 32.6607   LearningRate 0.0989   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 11:36:38,207-Speed 9734.39 samples/sec   Loss 32.3708   LearningRate 0.0989   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 11:36:39,257-Speed 9760.87 samples/sec   Loss 32.3397   LearningRate 0.0989   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 11:36:40,335-Speed 9507.46 samples/sec   Loss 32.3313   LearningRate 0.0989   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 11:36:41,429-Speed 9368.00 samples/sec   Loss 32.3307   LearningRate 0.0989   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 11:36:42,526-Speed 9337.89 samples/sec   Loss 32.2800   LearningRate 0.0989   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 11:36:43,587-Speed 9657.83 samples/sec   Loss 32.1815   LearningRate 0.0989   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 11:36:44,663-Speed 9520.41 samples/sec   Loss 32.1612   LearningRate 0.0989   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 11:36:45,759-Speed 9355.43 samples/sec   Loss 31.9223   LearningRate 0.0988   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 11:36:46,843-Speed 9450.90 samples/sec   Loss 31.9629   LearningRate 0.0988   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 11:36:47,940-Speed 9339.64 samples/sec   Loss 31.9720   LearningRate 0.0988   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:49,028-Speed 9418.06 samples/sec   Loss 31.8727   LearningRate 0.0988   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:50,119-Speed 9395.04 samples/sec   Loss 31.7402   LearningRate 0.0988   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:51,171-Speed 9736.71 samples/sec   Loss 31.6748   LearningRate 0.0988   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:52,272-Speed 9304.66 samples/sec   Loss 31.6774   LearningRate 0.0988   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:36:53,338-Speed 9615.02 samples/sec   Loss 31.8078   LearningRate 0.0988   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 11:37:15,243-[lfw][2000]XNorm: 21.302492
Training: 2022-04-11 11:37:15,244-[lfw][2000]Accuracy-Flip: 0.94083+-0.01044
Training: 2022-04-11 11:37:15,244-[lfw][2000]Accuracy-Highest: 0.94083
Training: 2022-04-11 11:37:40,497-[cfp_fp][2000]XNorm: 20.083306
Training: 2022-04-11 11:37:40,497-[cfp_fp][2000]Accuracy-Flip: 0.73143+-0.01352
Training: 2022-04-11 11:37:40,498-[cfp_fp][2000]Accuracy-Highest: 0.73143
Training: 2022-04-11 11:38:02,464-[agedb_30][2000]XNorm: 19.667832
Training: 2022-04-11 11:38:02,465-[agedb_30][2000]Accuracy-Flip: 0.74450+-0.02003
Training: 2022-04-11 11:38:02,465-[agedb_30][2000]Accuracy-Highest: 0.74450
Training: 2022-04-11 11:38:03,542-Speed 145.86 samples/sec   Loss 31.6493   LearningRate 0.0988   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:04,633-Speed 9396.38 samples/sec   Loss 31.4240   LearningRate 0.0988   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:05,731-Speed 9329.03 samples/sec   Loss 31.3945   LearningRate 0.0988   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:06,783-Speed 9738.87 samples/sec   Loss 31.4165   LearningRate 0.0988   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:07,830-Speed 9783.14 samples/sec   Loss 31.4865   LearningRate 0.0988   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:08,925-Speed 9361.46 samples/sec   Loss 31.1778   LearningRate 0.0988   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:09,988-Speed 9637.98 samples/sec   Loss 31.2587   LearningRate 0.0988   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:11,085-Speed 9344.99 samples/sec   Loss 31.2665   LearningRate 0.0988   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:12,163-Speed 9504.96 samples/sec   Loss 31.0855   LearningRate 0.0988   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:13,282-Speed 9160.56 samples/sec   Loss 30.9602   LearningRate 0.0987   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:14,367-Speed 9438.18 samples/sec   Loss 31.0281   LearningRate 0.0987   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:15,432-Speed 9620.37 samples/sec   Loss 30.9674   LearningRate 0.0987   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:16,530-Speed 9337.18 samples/sec   Loss 30.9391   LearningRate 0.0987   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:17,644-Speed 9199.14 samples/sec   Loss 30.8284   LearningRate 0.0987   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:18,751-Speed 9252.19 samples/sec   Loss 30.8331   LearningRate 0.0987   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:19,840-Speed 9402.93 samples/sec   Loss 30.8226   LearningRate 0.0987   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:20,911-Speed 9573.57 samples/sec   Loss 30.5586   LearningRate 0.0987   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:22,007-Speed 9350.71 samples/sec   Loss 30.5003   LearningRate 0.0987   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:23,090-Speed 9464.59 samples/sec   Loss 30.7346   LearningRate 0.0987   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:24,216-Speed 9098.24 samples/sec   Loss 30.3343   LearningRate 0.0987   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:25,286-Speed 9573.57 samples/sec   Loss 30.4900   LearningRate 0.0987   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:26,388-Speed 9299.88 samples/sec   Loss 30.5228   LearningRate 0.0987   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:27,465-Speed 9509.59 samples/sec   Loss 30.2343   LearningRate 0.0987   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:28,563-Speed 9331.44 samples/sec   Loss 30.1264   LearningRate 0.0987   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:29,633-Speed 9579.75 samples/sec   Loss 30.1347   LearningRate 0.0987   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:30,713-Speed 9481.36 samples/sec   Loss 30.0713   LearningRate 0.0987   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:31,771-Speed 9689.52 samples/sec   Loss 30.0619   LearningRate 0.0986   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:32,816-Speed 9802.15 samples/sec   Loss 30.1006   LearningRate 0.0986   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:33,895-Speed 9496.54 samples/sec   Loss 29.8213   LearningRate 0.0986   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:34,988-Speed 9371.36 samples/sec   Loss 29.9877   LearningRate 0.0986   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:36,066-Speed 9507.48 samples/sec   Loss 29.8158   LearningRate 0.0986   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:37,161-Speed 9355.25 samples/sec   Loss 29.8065   LearningRate 0.0986   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:38,239-Speed 9507.44 samples/sec   Loss 29.7894   LearningRate 0.0986   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:39,331-Speed 9377.40 samples/sec   Loss 29.6188   LearningRate 0.0986   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:40,412-Speed 9492.49 samples/sec   Loss 29.7129   LearningRate 0.0986   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:41,505-Speed 9375.06 samples/sec   Loss 29.4236   LearningRate 0.0986   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:42,593-Speed 9409.32 samples/sec   Loss 29.5886   LearningRate 0.0986   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:38:43,692-Speed 9326.12 samples/sec   Loss 29.2220   LearningRate 0.0986   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:44,797-Speed 9274.61 samples/sec   Loss 29.4497   LearningRate 0.0986   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:45,924-Speed 9091.93 samples/sec   Loss 29.4958   LearningRate 0.0986   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:47,011-Speed 9426.79 samples/sec   Loss 29.3085   LearningRate 0.0986   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:48,087-Speed 9520.95 samples/sec   Loss 29.3025   LearningRate 0.0986   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:49,154-Speed 9600.91 samples/sec   Loss 29.3635   LearningRate 0.0985   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:50,233-Speed 9492.13 samples/sec   Loss 29.1482   LearningRate 0.0985   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:51,280-Speed 9795.03 samples/sec   Loss 29.0147   LearningRate 0.0985   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:52,349-Speed 9575.79 samples/sec   Loss 29.1862   LearningRate 0.0985   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:53,395-Speed 9802.06 samples/sec   Loss 28.8512   LearningRate 0.0985   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:54,463-Speed 9594.88 samples/sec   Loss 28.9959   LearningRate 0.0985   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:55,546-Speed 9456.99 samples/sec   Loss 28.8837   LearningRate 0.0985   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:38:56,654-Speed 9247.07 samples/sec   Loss 28.9614   LearningRate 0.0985   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:38:57,740-Speed 9440.23 samples/sec   Loss 28.7684   LearningRate 0.0985   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:38:58,824-Speed 9449.20 samples/sec   Loss 28.7972   LearningRate 0.0985   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:38:59,917-Speed 9380.55 samples/sec   Loss 28.6234   LearningRate 0.0985   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:01,054-Speed 9012.10 samples/sec   Loss 28.7611   LearningRate 0.0985   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:02,124-Speed 9575.43 samples/sec   Loss 28.6291   LearningRate 0.0985   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:03,232-Speed 9244.25 samples/sec   Loss 28.5612   LearningRate 0.0985   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:04,279-Speed 9781.87 samples/sec   Loss 28.5539   LearningRate 0.0985   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:05,401-Speed 9136.73 samples/sec   Loss 28.5903   LearningRate 0.0985   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:06,447-Speed 9794.01 samples/sec   Loss 28.5194   LearningRate 0.0985   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:07,533-Speed 9434.32 samples/sec   Loss 28.4560   LearningRate 0.0984   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:08,641-Speed 9241.18 samples/sec   Loss 28.3076   LearningRate 0.0984   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:09,704-Speed 9645.98 samples/sec   Loss 28.2020   LearningRate 0.0984   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:10,799-Speed 9354.14 samples/sec   Loss 28.2341   LearningRate 0.0984   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:11,869-Speed 9575.73 samples/sec   Loss 28.0553   LearningRate 0.0984   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:12,933-Speed 9632.53 samples/sec   Loss 28.0487   LearningRate 0.0984   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:14,060-Speed 9086.66 samples/sec   Loss 28.0750   LearningRate 0.0984   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:15,206-Speed 8941.79 samples/sec   Loss 27.8443   LearningRate 0.0984   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:16,326-Speed 9154.13 samples/sec   Loss 27.9489   LearningRate 0.0984   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:17,410-Speed 9446.34 samples/sec   Loss 28.0030   LearningRate 0.0984   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:18,528-Speed 9167.71 samples/sec   Loss 27.9432   LearningRate 0.0984   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:19,634-Speed 9259.79 samples/sec   Loss 27.9509   LearningRate 0.0984   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:20,750-Speed 9187.54 samples/sec   Loss 27.8057   LearningRate 0.0984   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:21,815-Speed 9617.32 samples/sec   Loss 27.7531   LearningRate 0.0984   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:22,931-Speed 9182.47 samples/sec   Loss 27.6869   LearningRate 0.0984   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:24,006-Speed 9526.62 samples/sec   Loss 27.5178   LearningRate 0.0984   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:25,029-Speed 10015.83 samples/sec   Loss 27.5282   LearningRate 0.0984   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:26,073-Speed 9820.85 samples/sec   Loss 27.5394   LearningRate 0.0983   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:27,146-Speed 9553.98 samples/sec   Loss 27.5231   LearningRate 0.0983   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 524288   Required: 13 hours
Training: 2022-04-11 11:39:28,255-Speed 9234.42 samples/sec   Loss 27.5008   LearningRate 0.0983   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:29,352-Speed 9338.33 samples/sec   Loss 27.4459   LearningRate 0.0983   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:30,453-Speed 9306.79 samples/sec   Loss 27.1967   LearningRate 0.0983   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:31,510-Speed 9690.65 samples/sec   Loss 27.3284   LearningRate 0.0983   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:32,575-Speed 9626.86 samples/sec   Loss 27.2653   LearningRate 0.0983   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:33,611-Speed 9886.26 samples/sec   Loss 27.1966   LearningRate 0.0983   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:34,649-Speed 9874.45 samples/sec   Loss 27.1315   LearningRate 0.0983   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:35,718-Speed 9587.95 samples/sec   Loss 27.1248   LearningRate 0.0983   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:36,829-Speed 9221.84 samples/sec   Loss 27.1346   LearningRate 0.0983   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:37,915-Speed 9437.15 samples/sec   Loss 26.9945   LearningRate 0.0983   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:39,000-Speed 9440.38 samples/sec   Loss 26.9756   LearningRate 0.0983   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:40,097-Speed 9344.81 samples/sec   Loss 26.8477   LearningRate 0.0983   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:41,171-Speed 9539.52 samples/sec   Loss 26.8870   LearningRate 0.0983   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:42,258-Speed 9427.45 samples/sec   Loss 26.6652   LearningRate 0.0983   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:43,316-Speed 9684.71 samples/sec   Loss 26.5680   LearningRate 0.0983   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:39:44,374-Speed 9677.62 samples/sec   Loss 26.6828   LearningRate 0.0982   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:39:45,452-Speed 9511.06 samples/sec   Loss 26.6652   LearningRate 0.0982   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:39:46,569-Speed 9169.44 samples/sec   Loss 26.4338   LearningRate 0.0982   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:39:47,637-Speed 9589.44 samples/sec   Loss 26.4327   LearningRate 0.0982   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:39:48,730-Speed 9378.20 samples/sec   Loss 26.4036   LearningRate 0.0982   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:39:49,828-Speed 9324.49 samples/sec   Loss 26.3230   LearningRate 0.0982   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:39:50,937-Speed 9248.72 samples/sec   Loss 26.4164   LearningRate 0.0982   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:39:52,048-Speed 9221.44 samples/sec   Loss 26.2399   LearningRate 0.0982   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:39:53,136-Speed 9411.06 samples/sec   Loss 26.3143   LearningRate 0.0982   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:39:54,230-Speed 9367.21 samples/sec   Loss 26.2874   LearningRate 0.0982   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:55,322-Speed 9389.51 samples/sec   Loss 26.2631   LearningRate 0.0982   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:56,426-Speed 9276.65 samples/sec   Loss 26.0831   LearningRate 0.0982   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:57,515-Speed 9416.20 samples/sec   Loss 25.9044   LearningRate 0.0982   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:58,592-Speed 9512.59 samples/sec   Loss 26.0520   LearningRate 0.0982   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:39:59,670-Speed 9507.19 samples/sec   Loss 26.0527   LearningRate 0.0982   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:00,747-Speed 9512.21 samples/sec   Loss 26.0106   LearningRate 0.0982   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:01,835-Speed 9419.05 samples/sec   Loss 26.0056   LearningRate 0.0982   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:02,886-Speed 9745.99 samples/sec   Loss 25.8853   LearningRate 0.0981   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:04,010-Speed 9112.70 samples/sec   Loss 25.8001   LearningRate 0.0981   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:05,095-Speed 9447.08 samples/sec   Loss 25.8179   LearningRate 0.0981   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:06,184-Speed 9407.05 samples/sec   Loss 25.7573   LearningRate 0.0981   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:07,294-Speed 9228.03 samples/sec   Loss 25.7964   LearningRate 0.0981   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:08,392-Speed 9338.44 samples/sec   Loss 25.5342   LearningRate 0.0981   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:09,482-Speed 9392.22 samples/sec   Loss 25.5955   LearningRate 0.0981   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:10,587-Speed 9275.48 samples/sec   Loss 25.6632   LearningRate 0.0981   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:11,668-Speed 9477.49 samples/sec   Loss 25.6035   LearningRate 0.0981   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:12,736-Speed 9597.40 samples/sec   Loss 25.4692   LearningRate 0.0981   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:13,827-Speed 9387.46 samples/sec   Loss 25.4972   LearningRate 0.0981   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:14,929-Speed 9301.07 samples/sec   Loss 25.3767   LearningRate 0.0981   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:16,005-Speed 9526.43 samples/sec   Loss 25.4692   LearningRate 0.0981   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:17,070-Speed 9619.72 samples/sec   Loss 25.3869   LearningRate 0.0981   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:18,167-Speed 9339.55 samples/sec   Loss 25.3993   LearningRate 0.0981   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:19,212-Speed 9801.86 samples/sec   Loss 25.3644   LearningRate 0.0981   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:20,323-Speed 9225.96 samples/sec   Loss 25.3812   LearningRate 0.0981   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:21,395-Speed 9591.06 samples/sec   Loss 25.0313   LearningRate 0.0980   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:22,469-Speed 9543.98 samples/sec   Loss 25.1362   LearningRate 0.0980   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:23,554-Speed 9435.29 samples/sec   Loss 25.1974   LearningRate 0.0980   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:24,696-Speed 8972.28 samples/sec   Loss 24.9678   LearningRate 0.0980   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:25,779-Speed 9460.17 samples/sec   Loss 25.1927   LearningRate 0.0980   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:26,871-Speed 9383.44 samples/sec   Loss 24.7233   LearningRate 0.0980   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:27,935-Speed 9636.86 samples/sec   Loss 24.9191   LearningRate 0.0980   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:29,027-Speed 9380.91 samples/sec   Loss 24.8747   LearningRate 0.0980   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:40:30,099-Speed 9554.38 samples/sec   Loss 24.7252   LearningRate 0.0980   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:31,181-Speed 9473.79 samples/sec   Loss 24.7704   LearningRate 0.0980   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:32,242-Speed 9648.62 samples/sec   Loss 24.7138   LearningRate 0.0980   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:33,352-Speed 9232.60 samples/sec   Loss 24.7365   LearningRate 0.0980   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:34,475-Speed 9122.57 samples/sec   Loss 24.8114   LearningRate 0.0980   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:35,526-Speed 9750.10 samples/sec   Loss 24.4197   LearningRate 0.0980   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:36,593-Speed 9607.15 samples/sec   Loss 24.4058   LearningRate 0.0980   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:37,627-Speed 9909.12 samples/sec   Loss 24.5961   LearningRate 0.0980   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:40:38,689-Speed 9645.79 samples/sec   Loss 24.4074   LearningRate 0.0979   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:39,742-Speed 9730.23 samples/sec   Loss 24.5136   LearningRate 0.0979   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:40,835-Speed 9378.08 samples/sec   Loss 24.4123   LearningRate 0.0979   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:41,976-Speed 8979.70 samples/sec   Loss 24.2460   LearningRate 0.0979   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:43,039-Speed 9636.98 samples/sec   Loss 24.3612   LearningRate 0.0979   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:44,127-Speed 9419.95 samples/sec   Loss 24.2599   LearningRate 0.0979   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:45,215-Speed 9411.79 samples/sec   Loss 24.3556   LearningRate 0.0979   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:46,322-Speed 9258.64 samples/sec   Loss 24.1862   LearningRate 0.0979   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:47,394-Speed 9554.79 samples/sec   Loss 24.2493   LearningRate 0.0979   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:48,490-Speed 9349.15 samples/sec   Loss 24.1878   LearningRate 0.0979   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:49,556-Speed 9614.34 samples/sec   Loss 24.0111   LearningRate 0.0979   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:40:50,627-Speed 9565.01 samples/sec   Loss 23.9228   LearningRate 0.0979   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:40:51,739-Speed 9216.69 samples/sec   Loss 23.9314   LearningRate 0.0979   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:40:52,773-Speed 9909.68 samples/sec   Loss 23.9337   LearningRate 0.0979   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:40:53,862-Speed 9402.73 samples/sec   Loss 23.9387   LearningRate 0.0979   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:40:54,973-Speed 9227.95 samples/sec   Loss 23.7981   LearningRate 0.0979   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:40:56,089-Speed 9179.20 samples/sec   Loss 23.9713   LearningRate 0.0979   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:40:57,208-Speed 9157.64 samples/sec   Loss 23.9104   LearningRate 0.0978   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:40:58,311-Speed 9293.94 samples/sec   Loss 23.7161   LearningRate 0.0978   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:40:59,401-Speed 9393.31 samples/sec   Loss 23.8549   LearningRate 0.0978   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:00,462-Speed 9659.49 samples/sec   Loss 23.7585   LearningRate 0.0978   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:01,528-Speed 9614.76 samples/sec   Loss 23.6619   LearningRate 0.0978   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:02,607-Speed 9488.70 samples/sec   Loss 23.5985   LearningRate 0.0978   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:03,676-Speed 9590.58 samples/sec   Loss 23.5304   LearningRate 0.0978   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:04,767-Speed 9383.92 samples/sec   Loss 23.4844   LearningRate 0.0978   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:05,889-Speed 9136.40 samples/sec   Loss 23.5951   LearningRate 0.0978   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:06,977-Speed 9419.57 samples/sec   Loss 23.5941   LearningRate 0.0978   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:08,100-Speed 9124.25 samples/sec   Loss 23.5238   LearningRate 0.0978   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:09,226-Speed 9101.63 samples/sec   Loss 23.5394   LearningRate 0.0978   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:10,267-Speed 9842.09 samples/sec   Loss 23.4746   LearningRate 0.0978   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:11,368-Speed 9306.08 samples/sec   Loss 23.4492   LearningRate 0.0978   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:12,484-Speed 9175.36 samples/sec   Loss 23.3254   LearningRate 0.0978   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:13,575-Speed 9393.64 samples/sec   Loss 23.4848   LearningRate 0.0978   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:14,621-Speed 9797.33 samples/sec   Loss 23.4905   LearningRate 0.0978   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:15,715-Speed 9371.64 samples/sec   Loss 23.2275   LearningRate 0.0977   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:16,785-Speed 9570.87 samples/sec   Loss 23.1681   LearningRate 0.0977   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:17,875-Speed 9402.67 samples/sec   Loss 23.1944   LearningRate 0.0977   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:19,000-Speed 9107.73 samples/sec   Loss 23.1811   LearningRate 0.0977   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:20,097-Speed 9342.89 samples/sec   Loss 23.1209   LearningRate 0.0977   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:21,153-Speed 9701.62 samples/sec   Loss 23.0708   LearningRate 0.0977   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:22,232-Speed 9500.73 samples/sec   Loss 23.1382   LearningRate 0.0977   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:23,310-Speed 9495.88 samples/sec   Loss 23.1086   LearningRate 0.0977   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:24,452-Speed 8975.53 samples/sec   Loss 23.0812   LearningRate 0.0977   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:25,592-Speed 8990.64 samples/sec   Loss 22.8504   LearningRate 0.0977   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:26,750-Speed 8846.44 samples/sec   Loss 22.9490   LearningRate 0.0977   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:27,832-Speed 9471.54 samples/sec   Loss 22.9126   LearningRate 0.0977   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:41:28,903-Speed 9566.29 samples/sec   Loss 22.8571   LearningRate 0.0977   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:29,980-Speed 9510.94 samples/sec   Loss 22.7082   LearningRate 0.0977   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:31,073-Speed 9377.33 samples/sec   Loss 22.9047   LearningRate 0.0977   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:32,161-Speed 9415.93 samples/sec   Loss 22.5243   LearningRate 0.0977   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:33,249-Speed 9413.58 samples/sec   Loss 22.7221   LearningRate 0.0977   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:34,342-Speed 9369.79 samples/sec   Loss 22.4994   LearningRate 0.0976   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:35,419-Speed 9522.06 samples/sec   Loss 22.5968   LearningRate 0.0976   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:36,519-Speed 9313.16 samples/sec   Loss 22.6006   LearningRate 0.0976   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:37,628-Speed 9235.46 samples/sec   Loss 22.6227   LearningRate 0.0976   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:38,724-Speed 9348.41 samples/sec   Loss 22.5620   LearningRate 0.0976   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:41:39,815-Speed 9393.46 samples/sec   Loss 22.4115   LearningRate 0.0976   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:42:01,886-[lfw][4000]XNorm: 18.150763
Training: 2022-04-11 11:42:01,887-[lfw][4000]Accuracy-Flip: 0.97550+-0.00633
Training: 2022-04-11 11:42:01,887-[lfw][4000]Accuracy-Highest: 0.97550
Training: 2022-04-11 11:42:27,367-[cfp_fp][4000]XNorm: 15.910640
Training: 2022-04-11 11:42:27,368-[cfp_fp][4000]Accuracy-Flip: 0.81714+-0.01902
Training: 2022-04-11 11:42:27,368-[cfp_fp][4000]Accuracy-Highest: 0.81714
Training: 2022-04-11 11:42:49,334-[agedb_30][4000]XNorm: 17.389619
Training: 2022-04-11 11:42:49,334-[agedb_30][4000]Accuracy-Flip: 0.84050+-0.02569
Training: 2022-04-11 11:42:49,334-[agedb_30][4000]Accuracy-Highest: 0.84050
Training: 2022-04-11 11:42:50,402-Speed 145.07 samples/sec   Loss 22.2854   LearningRate 0.0976   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:42:51,432-Speed 9949.98 samples/sec   Loss 22.3653   LearningRate 0.0976   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:42:52,497-Speed 9616.18 samples/sec   Loss 22.4593   LearningRate 0.0976   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:42:53,579-Speed 9466.73 samples/sec   Loss 22.2937   LearningRate 0.0976   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:42:54,715-Speed 9023.90 samples/sec   Loss 22.4776   LearningRate 0.0976   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:42:55,794-Speed 9497.56 samples/sec   Loss 22.0867   LearningRate 0.0976   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:42:56,910-Speed 9174.70 samples/sec   Loss 21.9746   LearningRate 0.0976   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:42:57,947-Speed 9888.07 samples/sec   Loss 22.1232   LearningRate 0.0976   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:42:59,030-Speed 9460.29 samples/sec   Loss 21.9763   LearningRate 0.0976   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:43:00,099-Speed 9582.60 samples/sec   Loss 22.2258   LearningRate 0.0976   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:43:01,198-Speed 9324.19 samples/sec   Loss 22.0691   LearningRate 0.0976   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:43:02,290-Speed 9384.79 samples/sec   Loss 22.0069   LearningRate 0.0975   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:43:03,374-Speed 9450.77 samples/sec   Loss 21.9752   LearningRate 0.0975   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:43:04,483-Speed 9243.09 samples/sec   Loss 22.0534   LearningRate 0.0975   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:43:05,569-Speed 9435.82 samples/sec   Loss 21.9970   LearningRate 0.0975   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:43:06,629-Speed 9665.16 samples/sec   Loss 21.6982   LearningRate 0.0975   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:43:07,733-Speed 9279.84 samples/sec   Loss 21.8319   LearningRate 0.0975   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:43:08,795-Speed 9644.53 samples/sec   Loss 21.9921   LearningRate 0.0975   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:43:09,875-Speed 9484.39 samples/sec   Loss 21.9641   LearningRate 0.0975   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:43:10,926-Speed 9746.21 samples/sec   Loss 22.1073   LearningRate 0.0975   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-11 11:43:12,000-Speed 9545.32 samples/sec   Loss 21.9464   LearningRate 0.0975   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:43:13,082-Speed 9467.92 samples/sec   Loss 21.8877   LearningRate 0.0975   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:43:14,182-Speed 9314.88 samples/sec   Loss 21.7187   LearningRate 0.0975   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 262144   Required: 14 hours
Training: 2022-04-11 11:43:15,245-Speed 9642.91 samples/sec   Loss 21.7246   LearningRate 0.0975   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:16,323-Speed 9501.18 samples/sec   Loss 21.7000   LearningRate 0.0975   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:17,425-Speed 9299.60 samples/sec   Loss 21.6902   LearningRate 0.0975   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:18,521-Speed 9343.96 samples/sec   Loss 21.6160   LearningRate 0.0975   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:19,593-Speed 9557.25 samples/sec   Loss 21.6972   LearningRate 0.0975   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:20,679-Speed 9434.27 samples/sec   Loss 21.8595   LearningRate 0.0974   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:21,819-Speed 8991.96 samples/sec   Loss 21.4877   LearningRate 0.0974   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:22,954-Speed 9020.67 samples/sec   Loss 21.5318   LearningRate 0.0974   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:24,023-Speed 9587.70 samples/sec   Loss 21.4337   LearningRate 0.0974   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:25,145-Speed 9129.34 samples/sec   Loss 21.3612   LearningRate 0.0974   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:26,308-Speed 8811.24 samples/sec   Loss 21.4196   LearningRate 0.0974   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:27,456-Speed 8924.52 samples/sec   Loss 21.2804   LearningRate 0.0974   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:28,531-Speed 9533.75 samples/sec   Loss 21.5462   LearningRate 0.0974   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:29,628-Speed 9341.88 samples/sec   Loss 21.2936   LearningRate 0.0974   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:30,729-Speed 9313.53 samples/sec   Loss 21.2395   LearningRate 0.0974   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:31,843-Speed 9193.94 samples/sec   Loss 21.1103   LearningRate 0.0974   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:32,930-Speed 9431.57 samples/sec   Loss 21.1466   LearningRate 0.0974   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:34,036-Speed 9257.76 samples/sec   Loss 21.3131   LearningRate 0.0974   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:35,147-Speed 9222.46 samples/sec   Loss 21.1059   LearningRate 0.0974   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:36,257-Speed 9231.40 samples/sec   Loss 21.1085   LearningRate 0.0974   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:37,345-Speed 9420.07 samples/sec   Loss 21.1556   LearningRate 0.0974   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:38,412-Speed 9608.78 samples/sec   Loss 21.1984   LearningRate 0.0974   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:39,512-Speed 9306.00 samples/sec   Loss 21.1935   LearningRate 0.0973   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:40,570-Speed 9688.94 samples/sec   Loss 20.9696   LearningRate 0.0973   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:41,619-Speed 9763.21 samples/sec   Loss 20.8218   LearningRate 0.0973   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:42,694-Speed 9536.73 samples/sec   Loss 21.0455   LearningRate 0.0973   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:43,752-Speed 9681.25 samples/sec   Loss 21.0737   LearningRate 0.0973   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:44,811-Speed 9678.19 samples/sec   Loss 20.9992   LearningRate 0.0973   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:45,876-Speed 9617.55 samples/sec   Loss 20.9504   LearningRate 0.0973   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:46,966-Speed 9398.36 samples/sec   Loss 21.0031   LearningRate 0.0973   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:48,051-Speed 9450.94 samples/sec   Loss 20.6416   LearningRate 0.0973   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:43:49,085-Speed 9906.81 samples/sec   Loss 20.8955   LearningRate 0.0973   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:50,141-Speed 9703.82 samples/sec   Loss 20.7509   LearningRate 0.0973   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:51,243-Speed 9301.55 samples/sec   Loss 20.7338   LearningRate 0.0973   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:52,338-Speed 9359.91 samples/sec   Loss 20.6263   LearningRate 0.0973   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:53,401-Speed 9634.75 samples/sec   Loss 20.6812   LearningRate 0.0973   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:54,520-Speed 9161.08 samples/sec   Loss 20.7516   LearningRate 0.0973   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:55,583-Speed 9636.22 samples/sec   Loss 20.6207   LearningRate 0.0973   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:56,643-Speed 9666.24 samples/sec   Loss 20.5009   LearningRate 0.0973   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:57,695-Speed 9739.87 samples/sec   Loss 20.5345   LearningRate 0.0972   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:58,773-Speed 9501.43 samples/sec   Loss 20.6136   LearningRate 0.0972   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:43:59,850-Speed 9510.92 samples/sec   Loss 20.4605   LearningRate 0.0972   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:00,928-Speed 9514.52 samples/sec   Loss 20.5138   LearningRate 0.0972   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:02,038-Speed 9225.31 samples/sec   Loss 20.5248   LearningRate 0.0972   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:03,145-Speed 9256.52 samples/sec   Loss 20.5428   LearningRate 0.0972   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:04,241-Speed 9344.41 samples/sec   Loss 20.4880   LearningRate 0.0972   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:05,300-Speed 9682.11 samples/sec   Loss 20.5168   LearningRate 0.0972   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:06,358-Speed 9678.90 samples/sec   Loss 20.5719   LearningRate 0.0972   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:07,461-Speed 9296.54 samples/sec   Loss 20.4287   LearningRate 0.0972   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:08,544-Speed 9459.21 samples/sec   Loss 20.2676   LearningRate 0.0972   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:09,611-Speed 9607.57 samples/sec   Loss 20.3449   LearningRate 0.0972   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:10,679-Speed 9594.52 samples/sec   Loss 20.2511   LearningRate 0.0972   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:11,755-Speed 9520.25 samples/sec   Loss 20.2724   LearningRate 0.0972   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:12,863-Speed 9247.80 samples/sec   Loss 20.1414   LearningRate 0.0972   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:13,956-Speed 9366.95 samples/sec   Loss 20.3054   LearningRate 0.0972   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:15,124-Speed 8772.62 samples/sec   Loss 20.2283   LearningRate 0.0972   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:16,199-Speed 9532.45 samples/sec   Loss 19.9648   LearningRate 0.0971   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:17,280-Speed 9477.86 samples/sec   Loss 20.1779   LearningRate 0.0971   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:18,346-Speed 9609.47 samples/sec   Loss 20.0078   LearningRate 0.0971   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:19,456-Speed 9236.22 samples/sec   Loss 20.0908   LearningRate 0.0971   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:20,550-Speed 9365.72 samples/sec   Loss 20.1457   LearningRate 0.0971   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:21,619-Speed 9582.12 samples/sec   Loss 20.0080   LearningRate 0.0971   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:22,711-Speed 9386.65 samples/sec   Loss 20.1607   LearningRate 0.0971   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:23,770-Speed 9676.85 samples/sec   Loss 19.9143   LearningRate 0.0971   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:24,947-Speed 8700.89 samples/sec   Loss 19.8807   LearningRate 0.0971   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:26,056-Speed 9241.87 samples/sec   Loss 20.0876   LearningRate 0.0971   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:27,160-Speed 9274.12 samples/sec   Loss 20.0098   LearningRate 0.0971   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:28,223-Speed 9645.08 samples/sec   Loss 20.0129   LearningRate 0.0971   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:29,280-Speed 9696.33 samples/sec   Loss 19.8824   LearningRate 0.0971   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:30,342-Speed 9646.94 samples/sec   Loss 19.9389   LearningRate 0.0971   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:31,442-Speed 9315.80 samples/sec   Loss 19.8909   LearningRate 0.0971   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:32,547-Speed 9281.08 samples/sec   Loss 19.9501   LearningRate 0.0971   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 524288   Required: 13 hours
Training: 2022-04-11 11:44:33,644-Speed 9341.99 samples/sec   Loss 19.7934   LearningRate 0.0971   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:34,735-Speed 9388.10 samples/sec   Loss 19.7875   LearningRate 0.0970   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:35,788-Speed 9735.73 samples/sec   Loss 19.7300   LearningRate 0.0970   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:44:36,877-Speed 9406.79 samples/sec   Loss 19.6381   LearningRate 0.0970   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:44:38,000-Speed 9123.06 samples/sec   Loss 19.7158   LearningRate 0.0970   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:44:39,072-Speed 9561.97 samples/sec   Loss 19.7361   LearningRate 0.0970   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:44:40,139-Speed 9602.95 samples/sec   Loss 19.6004   LearningRate 0.0970   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:44:41,232-Speed 9368.34 samples/sec   Loss 19.6818   LearningRate 0.0970   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:44:42,329-Speed 9345.09 samples/sec   Loss 19.6162   LearningRate 0.0970   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:44:43,389-Speed 9660.81 samples/sec   Loss 19.6476   LearningRate 0.0970   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:44:44,526-Speed 9007.59 samples/sec   Loss 19.5601   LearningRate 0.0970   Epoch: 0   Global Step: 5060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:44:45,656-Speed 9066.39 samples/sec   Loss 19.4445   LearningRate 0.0970   Epoch: 0   Global Step: 5070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:44:46,749-Speed 9381.76 samples/sec   Loss 19.5585   LearningRate 0.0970   Epoch: 0   Global Step: 5080   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:47,817-Speed 9587.81 samples/sec   Loss 19.5550   LearningRate 0.0970   Epoch: 0   Global Step: 5090   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:48,881-Speed 9632.19 samples/sec   Loss 19.6959   LearningRate 0.0970   Epoch: 0   Global Step: 5100   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:49,956-Speed 9532.71 samples/sec   Loss 19.4494   LearningRate 0.0970   Epoch: 0   Global Step: 5110   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:51,046-Speed 9407.53 samples/sec   Loss 19.4698   LearningRate 0.0970   Epoch: 0   Global Step: 5120   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:52,174-Speed 9077.77 samples/sec   Loss 19.4328   LearningRate 0.0970   Epoch: 0   Global Step: 5130   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:53,236-Speed 9652.72 samples/sec   Loss 19.3848   LearningRate 0.0969   Epoch: 0   Global Step: 5140   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:54,311-Speed 9526.48 samples/sec   Loss 19.4890   LearningRate 0.0969   Epoch: 0   Global Step: 5150   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:55,401-Speed 9401.04 samples/sec   Loss 19.4728   LearningRate 0.0969   Epoch: 0   Global Step: 5160   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:56,492-Speed 9393.68 samples/sec   Loss 19.3903   LearningRate 0.0969   Epoch: 0   Global Step: 5170   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:57,539-Speed 9780.88 samples/sec   Loss 19.4691   LearningRate 0.0969   Epoch: 0   Global Step: 5180   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:58,633-Speed 9371.86 samples/sec   Loss 19.1470   LearningRate 0.0969   Epoch: 0   Global Step: 5190   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:44:59,731-Speed 9330.43 samples/sec   Loss 19.3418   LearningRate 0.0969   Epoch: 0   Global Step: 5200   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:00,863-Speed 9045.81 samples/sec   Loss 19.1907   LearningRate 0.0969   Epoch: 0   Global Step: 5210   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:01,925-Speed 9658.06 samples/sec   Loss 19.2618   LearningRate 0.0969   Epoch: 0   Global Step: 5220   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:03,033-Speed 9246.90 samples/sec   Loss 19.2372   LearningRate 0.0969   Epoch: 0   Global Step: 5230   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:04,140-Speed 9255.33 samples/sec   Loss 19.2026   LearningRate 0.0969   Epoch: 0   Global Step: 5240   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:05,216-Speed 9518.54 samples/sec   Loss 19.1667   LearningRate 0.0969   Epoch: 0   Global Step: 5250   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:06,295-Speed 9496.13 samples/sec   Loss 19.0466   LearningRate 0.0969   Epoch: 0   Global Step: 5260   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:07,396-Speed 9305.96 samples/sec   Loss 18.9775   LearningRate 0.0969   Epoch: 0   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:08,490-Speed 9368.65 samples/sec   Loss 19.2514   LearningRate 0.0969   Epoch: 0   Global Step: 5280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:09,578-Speed 9417.71 samples/sec   Loss 19.1179   LearningRate 0.0969   Epoch: 0   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:10,689-Speed 9221.53 samples/sec   Loss 18.9584   LearningRate 0.0968   Epoch: 0   Global Step: 5300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:11,781-Speed 9390.33 samples/sec   Loss 19.0272   LearningRate 0.0968   Epoch: 0   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:12,854-Speed 9547.31 samples/sec   Loss 19.0288   LearningRate 0.0968   Epoch: 0   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:13,990-Speed 9018.60 samples/sec   Loss 18.8454   LearningRate 0.0968   Epoch: 0   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:15,109-Speed 9151.00 samples/sec   Loss 18.9004   LearningRate 0.0968   Epoch: 0   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:16,156-Speed 9789.00 samples/sec   Loss 18.9816   LearningRate 0.0968   Epoch: 0   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:17,243-Speed 9428.96 samples/sec   Loss 18.9874   LearningRate 0.0968   Epoch: 0   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:18,355-Speed 9209.02 samples/sec   Loss 18.9047   LearningRate 0.0968   Epoch: 0   Global Step: 5370   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:19,423-Speed 9598.08 samples/sec   Loss 18.7873   LearningRate 0.0968   Epoch: 0   Global Step: 5380   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:20,507-Speed 9449.40 samples/sec   Loss 18.7929   LearningRate 0.0968   Epoch: 0   Global Step: 5390   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:21,609-Speed 9302.10 samples/sec   Loss 18.8171   LearningRate 0.0968   Epoch: 0   Global Step: 5400   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:22,678-Speed 9585.54 samples/sec   Loss 18.8697   LearningRate 0.0968   Epoch: 0   Global Step: 5410   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:23,719-Speed 9839.37 samples/sec   Loss 18.7860   LearningRate 0.0968   Epoch: 0   Global Step: 5420   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:24,812-Speed 9377.17 samples/sec   Loss 18.7701   LearningRate 0.0968   Epoch: 0   Global Step: 5430   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:25,892-Speed 9483.29 samples/sec   Loss 18.7940   LearningRate 0.0968   Epoch: 0   Global Step: 5440   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:26,996-Speed 9285.09 samples/sec   Loss 18.5525   LearningRate 0.0968   Epoch: 0   Global Step: 5450   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:28,061-Speed 9618.28 samples/sec   Loss 18.8522   LearningRate 0.0968   Epoch: 0   Global Step: 5460   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:29,137-Speed 9522.48 samples/sec   Loss 18.7236   LearningRate 0.0967   Epoch: 0   Global Step: 5470   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:30,202-Speed 9625.34 samples/sec   Loss 18.6760   LearningRate 0.0967   Epoch: 0   Global Step: 5480   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:31,285-Speed 9459.82 samples/sec   Loss 18.7610   LearningRate 0.0967   Epoch: 0   Global Step: 5490   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:32,352-Speed 9608.46 samples/sec   Loss 18.5223   LearningRate 0.0967   Epoch: 0   Global Step: 5500   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:33,436-Speed 9445.83 samples/sec   Loss 18.7246   LearningRate 0.0967   Epoch: 0   Global Step: 5510   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:34,518-Speed 9471.39 samples/sec   Loss 18.7294   LearningRate 0.0967   Epoch: 0   Global Step: 5520   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:35,684-Speed 8788.38 samples/sec   Loss 18.6238   LearningRate 0.0967   Epoch: 0   Global Step: 5530   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:36,790-Speed 9264.41 samples/sec   Loss 18.4025   LearningRate 0.0967   Epoch: 0   Global Step: 5540   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:37,876-Speed 9433.10 samples/sec   Loss 18.5196   LearningRate 0.0967   Epoch: 0   Global Step: 5550   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:38,960-Speed 9448.17 samples/sec   Loss 18.4728   LearningRate 0.0967   Epoch: 0   Global Step: 5560   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:39,989-Speed 9958.66 samples/sec   Loss 18.5642   LearningRate 0.0967   Epoch: 0   Global Step: 5570   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:41,126-Speed 9010.03 samples/sec   Loss 18.5164   LearningRate 0.0967   Epoch: 0   Global Step: 5580   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:42,184-Speed 9688.44 samples/sec   Loss 18.3858   LearningRate 0.0967   Epoch: 0   Global Step: 5590   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:43,248-Speed 9626.62 samples/sec   Loss 18.3577   LearningRate 0.0967   Epoch: 0   Global Step: 5600   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:44,354-Speed 9260.80 samples/sec   Loss 18.5251   LearningRate 0.0967   Epoch: 0   Global Step: 5610   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:45,410-Speed 9705.51 samples/sec   Loss 18.4723   LearningRate 0.0967   Epoch: 0   Global Step: 5620   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:46,462-Speed 9737.46 samples/sec   Loss 18.2085   LearningRate 0.0967   Epoch: 0   Global Step: 5630   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:47,542-Speed 9489.51 samples/sec   Loss 18.2877   LearningRate 0.0966   Epoch: 0   Global Step: 5640   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:48,604-Speed 9642.10 samples/sec   Loss 18.2462   LearningRate 0.0966   Epoch: 0   Global Step: 5650   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:49,660-Speed 9708.98 samples/sec   Loss 18.5288   LearningRate 0.0966   Epoch: 0   Global Step: 5660   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:50,742-Speed 9477.43 samples/sec   Loss 18.4966   LearningRate 0.0966   Epoch: 0   Global Step: 5670   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:51,817-Speed 9529.50 samples/sec   Loss 18.1736   LearningRate 0.0966   Epoch: 0   Global Step: 5680   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:52,908-Speed 9386.44 samples/sec   Loss 18.2846   LearningRate 0.0966   Epoch: 0   Global Step: 5690   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:53,985-Speed 9516.09 samples/sec   Loss 18.3580   LearningRate 0.0966   Epoch: 0   Global Step: 5700   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:45:55,071-Speed 9435.13 samples/sec   Loss 18.2558   LearningRate 0.0966   Epoch: 0   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:56,158-Speed 9429.41 samples/sec   Loss 17.9494   LearningRate 0.0966   Epoch: 0   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:57,258-Speed 9311.03 samples/sec   Loss 18.2213   LearningRate 0.0966   Epoch: 0   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:58,325-Speed 9604.50 samples/sec   Loss 18.2671   LearningRate 0.0966   Epoch: 0   Global Step: 5740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:45:59,451-Speed 9091.87 samples/sec   Loss 18.0655   LearningRate 0.0966   Epoch: 0   Global Step: 5750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:46:00,549-Speed 9338.11 samples/sec   Loss 18.1874   LearningRate 0.0966   Epoch: 0   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:46:01,652-Speed 9288.04 samples/sec   Loss 18.1885   LearningRate 0.0966   Epoch: 0   Global Step: 5770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:02,749-Speed 9336.62 samples/sec   Loss 18.2172   LearningRate 0.0966   Epoch: 0   Global Step: 5780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:03,842-Speed 9375.49 samples/sec   Loss 18.1127   LearningRate 0.0966   Epoch: 0   Global Step: 5790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:04,941-Speed 9322.06 samples/sec   Loss 18.0833   LearningRate 0.0966   Epoch: 0   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:06,006-Speed 9621.28 samples/sec   Loss 18.2995   LearningRate 0.0965   Epoch: 0   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:07,075-Speed 9588.68 samples/sec   Loss 18.0242   LearningRate 0.0965   Epoch: 0   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:08,120-Speed 9810.07 samples/sec   Loss 18.0551   LearningRate 0.0965   Epoch: 0   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:09,250-Speed 9067.31 samples/sec   Loss 17.9733   LearningRate 0.0965   Epoch: 0   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:10,353-Speed 9290.91 samples/sec   Loss 17.9848   LearningRate 0.0965   Epoch: 0   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:11,419-Speed 9605.88 samples/sec   Loss 18.1109   LearningRate 0.0965   Epoch: 0   Global Step: 5860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:12,474-Speed 9708.32 samples/sec   Loss 17.9610   LearningRate 0.0965   Epoch: 0   Global Step: 5870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:13,514-Speed 9854.96 samples/sec   Loss 17.9215   LearningRate 0.0965   Epoch: 0   Global Step: 5880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:14,587-Speed 9545.04 samples/sec   Loss 18.1456   LearningRate 0.0965   Epoch: 0   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:15,677-Speed 9400.11 samples/sec   Loss 18.1284   LearningRate 0.0965   Epoch: 0   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:46:16,792-Speed 9191.32 samples/sec   Loss 18.0347   LearningRate 0.0965   Epoch: 0   Global Step: 5910   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:46:17,860-Speed 9591.86 samples/sec   Loss 17.8141   LearningRate 0.0965   Epoch: 0   Global Step: 5920   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:46:18,942-Speed 9471.31 samples/sec   Loss 17.7210   LearningRate 0.0965   Epoch: 0   Global Step: 5930   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:46:20,021-Speed 9493.01 samples/sec   Loss 17.7901   LearningRate 0.0965   Epoch: 0   Global Step: 5940   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:46:21,064-Speed 9830.80 samples/sec   Loss 17.9212   LearningRate 0.0965   Epoch: 0   Global Step: 5950   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:46:22,114-Speed 9759.04 samples/sec   Loss 17.9150   LearningRate 0.0965   Epoch: 0   Global Step: 5960   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:46:23,166-Speed 9736.27 samples/sec   Loss 17.8616   LearningRate 0.0965   Epoch: 0   Global Step: 5970   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:46:24,228-Speed 9648.05 samples/sec   Loss 17.6481   LearningRate 0.0964   Epoch: 0   Global Step: 5980   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:46:25,327-Speed 9323.36 samples/sec   Loss 17.6401   LearningRate 0.0964   Epoch: 0   Global Step: 5990   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:46:26,464-Speed 9014.60 samples/sec   Loss 17.7010   LearningRate 0.0964   Epoch: 0   Global Step: 6000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:46:48,276-[lfw][6000]XNorm: 16.679557
Training: 2022-04-11 11:46:48,277-[lfw][6000]Accuracy-Flip: 0.98383+-0.00606
Training: 2022-04-11 11:46:48,277-[lfw][6000]Accuracy-Highest: 0.98383
Training: 2022-04-11 11:47:13,524-[cfp_fp][6000]XNorm: 14.667061
Training: 2022-04-11 11:47:13,525-[cfp_fp][6000]Accuracy-Flip: 0.84529+-0.01912
Training: 2022-04-11 11:47:13,525-[cfp_fp][6000]Accuracy-Highest: 0.84529
Training: 2022-04-11 11:47:35,305-[agedb_30][6000]XNorm: 16.206444
Training: 2022-04-11 11:47:35,306-[agedb_30][6000]Accuracy-Flip: 0.87917+-0.01664
Training: 2022-04-11 11:47:35,307-[agedb_30][6000]Accuracy-Highest: 0.87917
Training: 2022-04-11 11:47:36,385-Speed 146.45 samples/sec   Loss 17.5008   LearningRate 0.0964   Epoch: 0   Global Step: 6010   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:37,452-Speed 9598.74 samples/sec   Loss 17.7172   LearningRate 0.0964   Epoch: 0   Global Step: 6020   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:38,524-Speed 9556.98 samples/sec   Loss 17.8318   LearningRate 0.0964   Epoch: 0   Global Step: 6030   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:39,582-Speed 9684.99 samples/sec   Loss 17.7027   LearningRate 0.0964   Epoch: 0   Global Step: 6040   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:40,686-Speed 9281.07 samples/sec   Loss 17.6464   LearningRate 0.0964   Epoch: 0   Global Step: 6050   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:41,746-Speed 9666.90 samples/sec   Loss 17.6286   LearningRate 0.0964   Epoch: 0   Global Step: 6060   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:42,799-Speed 9724.41 samples/sec   Loss 17.6727   LearningRate 0.0964   Epoch: 0   Global Step: 6070   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:43,873-Speed 9545.52 samples/sec   Loss 17.6142   LearningRate 0.0964   Epoch: 0   Global Step: 6080   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:44,999-Speed 9098.83 samples/sec   Loss 17.6770   LearningRate 0.0964   Epoch: 0   Global Step: 6090   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:46,080-Speed 9475.74 samples/sec   Loss 17.5960   LearningRate 0.0964   Epoch: 0   Global Step: 6100   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:47,175-Speed 9360.01 samples/sec   Loss 17.5396   LearningRate 0.0964   Epoch: 0   Global Step: 6110   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:47:48,270-Speed 9362.37 samples/sec   Loss 17.5554   LearningRate 0.0964   Epoch: 0   Global Step: 6120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:47:49,351-Speed 9474.00 samples/sec   Loss 17.5602   LearningRate 0.0964   Epoch: 0   Global Step: 6130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:47:50,431-Speed 9486.83 samples/sec   Loss 17.4436   LearningRate 0.0964   Epoch: 0   Global Step: 6140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:47:51,538-Speed 9253.54 samples/sec   Loss 17.4730   LearningRate 0.0963   Epoch: 0   Global Step: 6150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:47:52,692-Speed 8877.83 samples/sec   Loss 17.4105   LearningRate 0.0963   Epoch: 0   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:47:53,753-Speed 9659.82 samples/sec   Loss 17.3556   LearningRate 0.0963   Epoch: 0   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:47:54,839-Speed 9437.72 samples/sec   Loss 17.4628   LearningRate 0.0963   Epoch: 0   Global Step: 6180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:47:55,936-Speed 9331.70 samples/sec   Loss 17.3087   LearningRate 0.0963   Epoch: 0   Global Step: 6190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:47:57,003-Speed 9606.91 samples/sec   Loss 17.4685   LearningRate 0.0963   Epoch: 0   Global Step: 6200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:47:58,082-Speed 9501.80 samples/sec   Loss 17.5520   LearningRate 0.0963   Epoch: 0   Global Step: 6210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:47:59,150-Speed 9588.23 samples/sec   Loss 17.4761   LearningRate 0.0963   Epoch: 0   Global Step: 6220   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:00,274-Speed 9120.17 samples/sec   Loss 17.4283   LearningRate 0.0963   Epoch: 0   Global Step: 6230   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:01,357-Speed 9461.98 samples/sec   Loss 17.1740   LearningRate 0.0963   Epoch: 0   Global Step: 6240   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:02,436-Speed 9493.01 samples/sec   Loss 17.4032   LearningRate 0.0963   Epoch: 0   Global Step: 6250   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:03,534-Speed 9327.02 samples/sec   Loss 17.1687   LearningRate 0.0963   Epoch: 0   Global Step: 6260   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:04,615-Speed 9481.43 samples/sec   Loss 17.2191   LearningRate 0.0963   Epoch: 0   Global Step: 6270   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:05,733-Speed 9167.81 samples/sec   Loss 17.2318   LearningRate 0.0963   Epoch: 0   Global Step: 6280   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:06,824-Speed 9387.61 samples/sec   Loss 17.3612   LearningRate 0.0963   Epoch: 0   Global Step: 6290   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:07,931-Speed 9264.65 samples/sec   Loss 17.2101   LearningRate 0.0963   Epoch: 0   Global Step: 6300   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:09,039-Speed 9242.54 samples/sec   Loss 17.1371   LearningRate 0.0963   Epoch: 0   Global Step: 6310   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:10,142-Speed 9292.87 samples/sec   Loss 17.1368   LearningRate 0.0962   Epoch: 0   Global Step: 6320   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:11,264-Speed 9126.10 samples/sec   Loss 17.3025   LearningRate 0.0962   Epoch: 0   Global Step: 6330   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:12,388-Speed 9120.82 samples/sec   Loss 17.2962   LearningRate 0.0962   Epoch: 0   Global Step: 6340   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:13,521-Speed 9037.21 samples/sec   Loss 17.1724   LearningRate 0.0962   Epoch: 0   Global Step: 6350   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:14,577-Speed 9703.90 samples/sec   Loss 17.0951   LearningRate 0.0962   Epoch: 0   Global Step: 6360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:15,655-Speed 9509.80 samples/sec   Loss 17.2601   LearningRate 0.0962   Epoch: 0   Global Step: 6370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:16,709-Speed 9720.56 samples/sec   Loss 17.1461   LearningRate 0.0962   Epoch: 0   Global Step: 6380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:17,796-Speed 9423.53 samples/sec   Loss 17.1726   LearningRate 0.0962   Epoch: 0   Global Step: 6390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:18,886-Speed 9406.45 samples/sec   Loss 17.1888   LearningRate 0.0962   Epoch: 0   Global Step: 6400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:19,937-Speed 9747.20 samples/sec   Loss 16.9550   LearningRate 0.0962   Epoch: 0   Global Step: 6410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:20,997-Speed 9671.61 samples/sec   Loss 17.1542   LearningRate 0.0962   Epoch: 0   Global Step: 6420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:22,094-Speed 9338.55 samples/sec   Loss 17.1342   LearningRate 0.0962   Epoch: 0   Global Step: 6430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:23,174-Speed 9489.02 samples/sec   Loss 17.0609   LearningRate 0.0962   Epoch: 0   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:24,229-Speed 9711.99 samples/sec   Loss 16.8454   LearningRate 0.0962   Epoch: 0   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:25,335-Speed 9263.48 samples/sec   Loss 17.1135   LearningRate 0.0962   Epoch: 0   Global Step: 6460   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:26,438-Speed 9289.38 samples/sec   Loss 17.0617   LearningRate 0.0962   Epoch: 0   Global Step: 6470   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:27,499-Speed 9654.36 samples/sec   Loss 17.1604   LearningRate 0.0962   Epoch: 0   Global Step: 6480   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:28,577-Speed 9510.55 samples/sec   Loss 17.0028   LearningRate 0.0961   Epoch: 0   Global Step: 6490   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:29,672-Speed 9353.09 samples/sec   Loss 16.8402   LearningRate 0.0961   Epoch: 0   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:30,733-Speed 9658.50 samples/sec   Loss 16.9588   LearningRate 0.0961   Epoch: 0   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:31,800-Speed 9596.09 samples/sec   Loss 17.0587   LearningRate 0.0961   Epoch: 0   Global Step: 6520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:32,854-Speed 9727.62 samples/sec   Loss 16.8722   LearningRate 0.0961   Epoch: 0   Global Step: 6530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:33,899-Speed 9797.89 samples/sec   Loss 16.7385   LearningRate 0.0961   Epoch: 0   Global Step: 6540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:34,945-Speed 9800.53 samples/sec   Loss 16.9695   LearningRate 0.0961   Epoch: 0   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:36,066-Speed 9139.18 samples/sec   Loss 16.9160   LearningRate 0.0961   Epoch: 0   Global Step: 6560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:37,147-Speed 9485.05 samples/sec   Loss 16.8015   LearningRate 0.0961   Epoch: 0   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:38,226-Speed 9490.39 samples/sec   Loss 16.9650   LearningRate 0.0961   Epoch: 0   Global Step: 6580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:39,343-Speed 9180.98 samples/sec   Loss 16.7433   LearningRate 0.0961   Epoch: 0   Global Step: 6590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:40,428-Speed 9436.42 samples/sec   Loss 16.7425   LearningRate 0.0961   Epoch: 0   Global Step: 6600   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:41,508-Speed 9493.10 samples/sec   Loss 16.9101   LearningRate 0.0961   Epoch: 0   Global Step: 6610   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:42,624-Speed 9178.37 samples/sec   Loss 16.7501   LearningRate 0.0961   Epoch: 0   Global Step: 6620   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:43,708-Speed 9451.55 samples/sec   Loss 16.7721   LearningRate 0.0961   Epoch: 0   Global Step: 6630   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:48:44,827-Speed 9152.68 samples/sec   Loss 16.7537   LearningRate 0.0961   Epoch: 0   Global Step: 6640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:48:45,933-Speed 9267.26 samples/sec   Loss 16.7479   LearningRate 0.0961   Epoch: 0   Global Step: 6650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:48:47,007-Speed 9534.24 samples/sec   Loss 16.8425   LearningRate 0.0960   Epoch: 0   Global Step: 6660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:48:48,075-Speed 9603.63 samples/sec   Loss 16.8204   LearningRate 0.0960   Epoch: 0   Global Step: 6670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:48:49,113-Speed 9864.46 samples/sec   Loss 16.7974   LearningRate 0.0960   Epoch: 0   Global Step: 6680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:48:50,151-Speed 9875.94 samples/sec   Loss 16.7632   LearningRate 0.0960   Epoch: 0   Global Step: 6690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:48:51,237-Speed 9431.68 samples/sec   Loss 16.9259   LearningRate 0.0960   Epoch: 0   Global Step: 6700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:48:52,331-Speed 9367.61 samples/sec   Loss 16.8108   LearningRate 0.0960   Epoch: 0   Global Step: 6710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:48:53,459-Speed 9085.17 samples/sec   Loss 16.5744   LearningRate 0.0960   Epoch: 0   Global Step: 6720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:48:54,571-Speed 9216.09 samples/sec   Loss 16.6875   LearningRate 0.0960   Epoch: 0   Global Step: 6730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:48:55,665-Speed 9368.37 samples/sec   Loss 16.5989   LearningRate 0.0960   Epoch: 0   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:56,812-Speed 8931.26 samples/sec   Loss 16.6209   LearningRate 0.0960   Epoch: 0   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:57,877-Speed 9617.80 samples/sec   Loss 16.5377   LearningRate 0.0960   Epoch: 0   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:48:58,965-Speed 9421.55 samples/sec   Loss 16.5793   LearningRate 0.0960   Epoch: 0   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:00,017-Speed 9737.60 samples/sec   Loss 16.4705   LearningRate 0.0960   Epoch: 0   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:01,054-Speed 9881.72 samples/sec   Loss 16.5168   LearningRate 0.0960   Epoch: 0   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:02,134-Speed 9489.14 samples/sec   Loss 16.4327   LearningRate 0.0960   Epoch: 0   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:03,210-Speed 9516.38 samples/sec   Loss 16.6456   LearningRate 0.0960   Epoch: 0   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:04,306-Speed 9353.23 samples/sec   Loss 16.4754   LearningRate 0.0960   Epoch: 0   Global Step: 6820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:05,421-Speed 9183.16 samples/sec   Loss 16.5452   LearningRate 0.0959   Epoch: 0   Global Step: 6830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:06,523-Speed 9299.63 samples/sec   Loss 16.3983   LearningRate 0.0959   Epoch: 0   Global Step: 6840   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:07,596-Speed 9553.34 samples/sec   Loss 16.4772   LearningRate 0.0959   Epoch: 0   Global Step: 6850   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:08,709-Speed 9207.90 samples/sec   Loss 16.6114   LearningRate 0.0959   Epoch: 0   Global Step: 6860   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:09,800-Speed 9391.37 samples/sec   Loss 16.2884   LearningRate 0.0959   Epoch: 0   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:10,902-Speed 9294.47 samples/sec   Loss 16.4442   LearningRate 0.0959   Epoch: 0   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:11,985-Speed 9460.57 samples/sec   Loss 16.4762   LearningRate 0.0959   Epoch: 0   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:13,076-Speed 9395.34 samples/sec   Loss 16.6455   LearningRate 0.0959   Epoch: 0   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:14,172-Speed 9346.61 samples/sec   Loss 16.4600   LearningRate 0.0959   Epoch: 0   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:15,238-Speed 9607.89 samples/sec   Loss 16.5215   LearningRate 0.0959   Epoch: 0   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:16,359-Speed 9139.01 samples/sec   Loss 16.1868   LearningRate 0.0959   Epoch: 0   Global Step: 6930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:17,448-Speed 9413.73 samples/sec   Loss 16.3410   LearningRate 0.0959   Epoch: 0   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:18,549-Speed 9303.98 samples/sec   Loss 16.2221   LearningRate 0.0959   Epoch: 0   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:19,657-Speed 9248.72 samples/sec   Loss 16.5358   LearningRate 0.0959   Epoch: 0   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:20,763-Speed 9261.76 samples/sec   Loss 16.3804   LearningRate 0.0959   Epoch: 0   Global Step: 6970   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:21,837-Speed 9539.47 samples/sec   Loss 16.4595   LearningRate 0.0959   Epoch: 0   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:22,936-Speed 9327.36 samples/sec   Loss 16.4410   LearningRate 0.0959   Epoch: 0   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:24,019-Speed 9458.17 samples/sec   Loss 16.4244   LearningRate 0.0959   Epoch: 0   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:25,083-Speed 9634.37 samples/sec   Loss 16.2831   LearningRate 0.0958   Epoch: 0   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:26,158-Speed 9531.21 samples/sec   Loss 16.2303   LearningRate 0.0958   Epoch: 0   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:27,239-Speed 9474.68 samples/sec   Loss 16.4015   LearningRate 0.0958   Epoch: 0   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:28,339-Speed 9317.98 samples/sec   Loss 16.3464   LearningRate 0.0958   Epoch: 0   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:29,407-Speed 9592.53 samples/sec   Loss 16.4169   LearningRate 0.0958   Epoch: 0   Global Step: 7050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:30,486-Speed 9492.65 samples/sec   Loss 16.4705   LearningRate 0.0958   Epoch: 0   Global Step: 7060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:31,580-Speed 9365.36 samples/sec   Loss 16.2408   LearningRate 0.0958   Epoch: 0   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:32,676-Speed 9357.75 samples/sec   Loss 16.1687   LearningRate 0.0958   Epoch: 0   Global Step: 7080   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:33,790-Speed 9199.89 samples/sec   Loss 16.3010   LearningRate 0.0958   Epoch: 0   Global Step: 7090   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:34,866-Speed 9518.49 samples/sec   Loss 16.3324   LearningRate 0.0958   Epoch: 0   Global Step: 7100   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:35,968-Speed 9296.22 samples/sec   Loss 16.1534   LearningRate 0.0958   Epoch: 0   Global Step: 7110   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:37,062-Speed 9365.04 samples/sec   Loss 16.1953   LearningRate 0.0958   Epoch: 0   Global Step: 7120   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:38,172-Speed 9231.29 samples/sec   Loss 16.2129   LearningRate 0.0958   Epoch: 0   Global Step: 7130   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:39,239-Speed 9617.85 samples/sec   Loss 16.2317   LearningRate 0.0958   Epoch: 0   Global Step: 7140   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:40,327-Speed 9416.80 samples/sec   Loss 16.1564   LearningRate 0.0958   Epoch: 0   Global Step: 7150   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:41,444-Speed 9171.52 samples/sec   Loss 16.0538   LearningRate 0.0958   Epoch: 0   Global Step: 7160   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:42,484-Speed 9845.65 samples/sec   Loss 16.0014   LearningRate 0.0958   Epoch: 0   Global Step: 7170   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:43,536-Speed 9743.55 samples/sec   Loss 16.0453   LearningRate 0.0957   Epoch: 0   Global Step: 7180   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:44,605-Speed 9578.30 samples/sec   Loss 15.9771   LearningRate 0.0957   Epoch: 0   Global Step: 7190   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:45,665-Speed 9667.93 samples/sec   Loss 16.1116   LearningRate 0.0957   Epoch: 0   Global Step: 7200   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:46,741-Speed 9527.50 samples/sec   Loss 16.0942   LearningRate 0.0957   Epoch: 0   Global Step: 7210   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:47,807-Speed 9610.49 samples/sec   Loss 16.1601   LearningRate 0.0957   Epoch: 0   Global Step: 7220   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:48,882-Speed 9531.76 samples/sec   Loss 15.9627   LearningRate 0.0957   Epoch: 0   Global Step: 7230   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:49,946-Speed 9631.79 samples/sec   Loss 16.0249   LearningRate 0.0957   Epoch: 0   Global Step: 7240   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:51,025-Speed 9495.93 samples/sec   Loss 16.0340   LearningRate 0.0957   Epoch: 0   Global Step: 7250   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:52,141-Speed 9176.54 samples/sec   Loss 15.9855   LearningRate 0.0957   Epoch: 0   Global Step: 7260   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:53,242-Speed 9313.60 samples/sec   Loss 15.9557   LearningRate 0.0957   Epoch: 0   Global Step: 7270   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:54,333-Speed 9387.11 samples/sec   Loss 15.8949   LearningRate 0.0957   Epoch: 0   Global Step: 7280   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:55,424-Speed 9394.29 samples/sec   Loss 16.1444   LearningRate 0.0957   Epoch: 0   Global Step: 7290   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:49:56,499-Speed 9533.76 samples/sec   Loss 15.9049   LearningRate 0.0957   Epoch: 0   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:57,573-Speed 9538.54 samples/sec   Loss 15.8453   LearningRate 0.0957   Epoch: 0   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:58,681-Speed 9250.44 samples/sec   Loss 16.0681   LearningRate 0.0957   Epoch: 0   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:49:59,764-Speed 9464.14 samples/sec   Loss 15.9797   LearningRate 0.0957   Epoch: 0   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:00,848-Speed 9446.16 samples/sec   Loss 16.0411   LearningRate 0.0957   Epoch: 0   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:01,941-Speed 9375.35 samples/sec   Loss 15.9272   LearningRate 0.0956   Epoch: 0   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:03,040-Speed 9320.94 samples/sec   Loss 15.9009   LearningRate 0.0956   Epoch: 0   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:04,090-Speed 9754.11 samples/sec   Loss 15.9812   LearningRate 0.0956   Epoch: 0   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:05,146-Speed 9706.18 samples/sec   Loss 15.8851   LearningRate 0.0956   Epoch: 0   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:06,231-Speed 9439.52 samples/sec   Loss 15.8638   LearningRate 0.0956   Epoch: 0   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:07,300-Speed 9583.58 samples/sec   Loss 15.8123   LearningRate 0.0956   Epoch: 0   Global Step: 7400   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:08,376-Speed 9533.33 samples/sec   Loss 15.8232   LearningRate 0.0956   Epoch: 0   Global Step: 7410   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:09,463-Speed 9421.76 samples/sec   Loss 15.7331   LearningRate 0.0956   Epoch: 0   Global Step: 7420   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:10,520-Speed 9695.65 samples/sec   Loss 16.0093   LearningRate 0.0956   Epoch: 0   Global Step: 7430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:11,622-Speed 9298.26 samples/sec   Loss 15.9563   LearningRate 0.0956   Epoch: 0   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:12,741-Speed 9150.99 samples/sec   Loss 15.6756   LearningRate 0.0956   Epoch: 0   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:13,797-Speed 9708.70 samples/sec   Loss 15.8312   LearningRate 0.0956   Epoch: 0   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:14,866-Speed 9585.44 samples/sec   Loss 15.7785   LearningRate 0.0956   Epoch: 0   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:15,963-Speed 9341.07 samples/sec   Loss 15.9013   LearningRate 0.0956   Epoch: 0   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:17,043-Speed 9486.86 samples/sec   Loss 15.8060   LearningRate 0.0956   Epoch: 0   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:18,110-Speed 9603.13 samples/sec   Loss 15.6741   LearningRate 0.0956   Epoch: 0   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:19,226-Speed 9180.30 samples/sec   Loss 15.7917   LearningRate 0.0956   Epoch: 0   Global Step: 7510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:20,276-Speed 9755.61 samples/sec   Loss 15.6208   LearningRate 0.0955   Epoch: 0   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:21,328-Speed 9744.90 samples/sec   Loss 15.7513   LearningRate 0.0955   Epoch: 0   Global Step: 7530   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:22,452-Speed 9110.82 samples/sec   Loss 15.7954   LearningRate 0.0955   Epoch: 0   Global Step: 7540   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:23,528-Speed 9523.47 samples/sec   Loss 15.8151   LearningRate 0.0955   Epoch: 0   Global Step: 7550   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:24,625-Speed 9345.43 samples/sec   Loss 15.7733   LearningRate 0.0955   Epoch: 0   Global Step: 7560   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:25,716-Speed 9384.33 samples/sec   Loss 15.8369   LearningRate 0.0955   Epoch: 0   Global Step: 7570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:26,830-Speed 9201.25 samples/sec   Loss 15.6917   LearningRate 0.0955   Epoch: 0   Global Step: 7580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:27,878-Speed 9780.39 samples/sec   Loss 15.6704   LearningRate 0.0955   Epoch: 0   Global Step: 7590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:28,963-Speed 9444.44 samples/sec   Loss 15.6117   LearningRate 0.0955   Epoch: 0   Global Step: 7600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:30,040-Speed 9506.13 samples/sec   Loss 15.5319   LearningRate 0.0955   Epoch: 0   Global Step: 7610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:31,106-Speed 9609.15 samples/sec   Loss 15.5709   LearningRate 0.0955   Epoch: 0   Global Step: 7620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:32,183-Speed 9513.84 samples/sec   Loss 15.5791   LearningRate 0.0955   Epoch: 0   Global Step: 7630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:33,277-Speed 9367.40 samples/sec   Loss 15.5446   LearningRate 0.0955   Epoch: 0   Global Step: 7640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:34,339-Speed 9651.03 samples/sec   Loss 15.5619   LearningRate 0.0955   Epoch: 0   Global Step: 7650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:35,377-Speed 9870.05 samples/sec   Loss 15.5141   LearningRate 0.0955   Epoch: 0   Global Step: 7660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:36,425-Speed 9781.15 samples/sec   Loss 15.7960   LearningRate 0.0955   Epoch: 0   Global Step: 7670   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:37,463-Speed 9871.33 samples/sec   Loss 15.6440   LearningRate 0.0955   Epoch: 0   Global Step: 7680   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:38,549-Speed 9431.66 samples/sec   Loss 15.6327   LearningRate 0.0954   Epoch: 0   Global Step: 7690   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:39,667-Speed 9167.39 samples/sec   Loss 15.4566   LearningRate 0.0954   Epoch: 0   Global Step: 7700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:40,755-Speed 9419.30 samples/sec   Loss 15.4343   LearningRate 0.0954   Epoch: 0   Global Step: 7710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:41,831-Speed 9514.78 samples/sec   Loss 15.4710   LearningRate 0.0954   Epoch: 0   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:42,892-Speed 9663.60 samples/sec   Loss 15.6013   LearningRate 0.0954   Epoch: 0   Global Step: 7730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:43,977-Speed 9442.05 samples/sec   Loss 15.5167   LearningRate 0.0954   Epoch: 0   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:45,029-Speed 9732.59 samples/sec   Loss 15.5458   LearningRate 0.0954   Epoch: 0   Global Step: 7750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:46,104-Speed 9533.02 samples/sec   Loss 15.4970   LearningRate 0.0954   Epoch: 0   Global Step: 7760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:47,164-Speed 9669.63 samples/sec   Loss 15.4526   LearningRate 0.0954   Epoch: 0   Global Step: 7770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:48,247-Speed 9463.67 samples/sec   Loss 15.3404   LearningRate 0.0954   Epoch: 0   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:49,309-Speed 9647.45 samples/sec   Loss 15.5775   LearningRate 0.0954   Epoch: 0   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:50:50,342-Speed 9916.82 samples/sec   Loss 15.5025   LearningRate 0.0954   Epoch: 0   Global Step: 7800   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:51,393-Speed 9744.84 samples/sec   Loss 15.4618   LearningRate 0.0954   Epoch: 0   Global Step: 7810   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:52,482-Speed 9408.95 samples/sec   Loss 15.3603   LearningRate 0.0954   Epoch: 0   Global Step: 7820   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:53,527-Speed 9814.07 samples/sec   Loss 15.5566   LearningRate 0.0954   Epoch: 0   Global Step: 7830   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:54,583-Speed 9708.58 samples/sec   Loss 15.3929   LearningRate 0.0954   Epoch: 0   Global Step: 7840   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:55,655-Speed 9560.23 samples/sec   Loss 15.4494   LearningRate 0.0954   Epoch: 0   Global Step: 7850   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:56,718-Speed 9636.05 samples/sec   Loss 15.7313   LearningRate 0.0953   Epoch: 0   Global Step: 7860   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:57,807-Speed 9408.56 samples/sec   Loss 15.3551   LearningRate 0.0953   Epoch: 0   Global Step: 7870   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:50:58,944-Speed 9014.42 samples/sec   Loss 15.3278   LearningRate 0.0953   Epoch: 0   Global Step: 7880   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:51:00,032-Speed 9417.02 samples/sec   Loss 15.4437   LearningRate 0.0953   Epoch: 0   Global Step: 7890   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:51:01,083-Speed 9742.02 samples/sec   Loss 15.3176   LearningRate 0.0953   Epoch: 0   Global Step: 7900   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:51:02,164-Speed 9485.74 samples/sec   Loss 15.2191   LearningRate 0.0953   Epoch: 0   Global Step: 7910   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 11:51:03,195-Speed 9936.19 samples/sec   Loss 15.3220   LearningRate 0.0953   Epoch: 0   Global Step: 7920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:51:04,247-Speed 9731.80 samples/sec   Loss 15.3347   LearningRate 0.0953   Epoch: 0   Global Step: 7930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:51:05,342-Speed 9355.81 samples/sec   Loss 15.2523   LearningRate 0.0953   Epoch: 0   Global Step: 7940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:51:06,416-Speed 9547.22 samples/sec   Loss 15.2218   LearningRate 0.0953   Epoch: 0   Global Step: 7950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:51:07,476-Speed 9664.42 samples/sec   Loss 15.1459   LearningRate 0.0953   Epoch: 0   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:51:08,582-Speed 9261.72 samples/sec   Loss 15.3113   LearningRate 0.0953   Epoch: 0   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:51:09,656-Speed 9544.29 samples/sec   Loss 15.2284   LearningRate 0.0953   Epoch: 0   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:51:10,709-Speed 9723.87 samples/sec   Loss 15.1489   LearningRate 0.0953   Epoch: 0   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:51:11,826-Speed 9179.69 samples/sec   Loss 15.2488   LearningRate 0.0953   Epoch: 0   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:51:33,638-[lfw][8000]XNorm: 15.520002
Training: 2022-04-11 11:51:33,639-[lfw][8000]Accuracy-Flip: 0.98833+-0.00563
Training: 2022-04-11 11:51:33,639-[lfw][8000]Accuracy-Highest: 0.98833
Training: 2022-04-11 11:51:58,837-[cfp_fp][8000]XNorm: 13.494573
Training: 2022-04-11 11:51:58,838-[cfp_fp][8000]Accuracy-Flip: 0.86786+-0.01484
Training: 2022-04-11 11:51:58,838-[cfp_fp][8000]Accuracy-Highest: 0.86786
Training: 2022-04-11 11:52:20,581-[agedb_30][8000]XNorm: 14.844895
Training: 2022-04-11 11:52:20,581-[agedb_30][8000]Accuracy-Flip: 0.89850+-0.01985
Training: 2022-04-11 11:52:20,582-[agedb_30][8000]Accuracy-Highest: 0.89850
Training: 2022-04-11 11:52:21,676-Speed 146.60 samples/sec   Loss 15.0222   LearningRate 0.0953   Epoch: 0   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:22,766-Speed 9401.00 samples/sec   Loss 15.2775   LearningRate 0.0953   Epoch: 0   Global Step: 8020   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:52:23,838-Speed 9552.76 samples/sec   Loss 15.2405   LearningRate 0.0952   Epoch: 0   Global Step: 8030   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:52:24,880-Speed 9839.74 samples/sec   Loss 15.0844   LearningRate 0.0952   Epoch: 0   Global Step: 8040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:25,999-Speed 9150.91 samples/sec   Loss 15.1817   LearningRate 0.0952   Epoch: 0   Global Step: 8050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:27,121-Speed 9133.47 samples/sec   Loss 15.1577   LearningRate 0.0952   Epoch: 0   Global Step: 8060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:28,224-Speed 9296.77 samples/sec   Loss 15.3948   LearningRate 0.0952   Epoch: 0   Global Step: 8070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:29,307-Speed 9456.54 samples/sec   Loss 15.3087   LearningRate 0.0952   Epoch: 0   Global Step: 8080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:30,446-Speed 8992.65 samples/sec   Loss 15.0670   LearningRate 0.0952   Epoch: 0   Global Step: 8090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:31,532-Speed 9437.26 samples/sec   Loss 15.2515   LearningRate 0.0952   Epoch: 0   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:32,645-Speed 9201.76 samples/sec   Loss 15.2359   LearningRate 0.0952   Epoch: 0   Global Step: 8110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:33,727-Speed 9473.79 samples/sec   Loss 15.0842   LearningRate 0.0952   Epoch: 0   Global Step: 8120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:34,798-Speed 9562.26 samples/sec   Loss 15.1574   LearningRate 0.0952   Epoch: 0   Global Step: 8130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:35,875-Speed 9515.15 samples/sec   Loss 15.0796   LearningRate 0.0952   Epoch: 0   Global Step: 8140   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:52:36,935-Speed 9665.56 samples/sec   Loss 15.0916   LearningRate 0.0952   Epoch: 0   Global Step: 8150   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:52:37,987-Speed 9745.19 samples/sec   Loss 15.1117   LearningRate 0.0952   Epoch: 0   Global Step: 8160   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:52:39,056-Speed 9585.07 samples/sec   Loss 15.0558   LearningRate 0.0952   Epoch: 0   Global Step: 8170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:40,143-Speed 9422.06 samples/sec   Loss 15.1955   LearningRate 0.0952   Epoch: 0   Global Step: 8180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:41,251-Speed 9255.72 samples/sec   Loss 15.1030   LearningRate 0.0952   Epoch: 0   Global Step: 8190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:42,314-Speed 9636.04 samples/sec   Loss 15.2572   LearningRate 0.0951   Epoch: 0   Global Step: 8200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:43,407-Speed 9375.22 samples/sec   Loss 15.0293   LearningRate 0.0951   Epoch: 0   Global Step: 8210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:44,476-Speed 9591.43 samples/sec   Loss 15.1566   LearningRate 0.0951   Epoch: 0   Global Step: 8220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:45,567-Speed 9389.58 samples/sec   Loss 15.1310   LearningRate 0.0951   Epoch: 0   Global Step: 8230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:46,635-Speed 9593.84 samples/sec   Loss 15.2285   LearningRate 0.0951   Epoch: 0   Global Step: 8240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:47,749-Speed 9196.07 samples/sec   Loss 15.0746   LearningRate 0.0951   Epoch: 0   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:48,804-Speed 9709.18 samples/sec   Loss 15.1088   LearningRate 0.0951   Epoch: 0   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:49,876-Speed 9561.02 samples/sec   Loss 14.9655   LearningRate 0.0951   Epoch: 0   Global Step: 8270   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:52:50,994-Speed 9162.25 samples/sec   Loss 15.1881   LearningRate 0.0951   Epoch: 0   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:52,092-Speed 9335.11 samples/sec   Loss 14.9222   LearningRate 0.0951   Epoch: 0   Global Step: 8290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:53,150-Speed 9676.10 samples/sec   Loss 14.9933   LearningRate 0.0951   Epoch: 0   Global Step: 8300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:54,220-Speed 9579.18 samples/sec   Loss 15.0484   LearningRate 0.0951   Epoch: 0   Global Step: 8310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:55,302-Speed 9465.69 samples/sec   Loss 14.9019   LearningRate 0.0951   Epoch: 0   Global Step: 8320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:56,395-Speed 9381.76 samples/sec   Loss 14.8229   LearningRate 0.0951   Epoch: 0   Global Step: 8330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:57,467-Speed 9562.90 samples/sec   Loss 15.0150   LearningRate 0.0951   Epoch: 0   Global Step: 8340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:58,552-Speed 9449.39 samples/sec   Loss 14.8976   LearningRate 0.0951   Epoch: 0   Global Step: 8350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:52:59,640-Speed 9410.52 samples/sec   Loss 14.8662   LearningRate 0.0951   Epoch: 0   Global Step: 8360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:00,704-Speed 9630.88 samples/sec   Loss 14.9685   LearningRate 0.0950   Epoch: 0   Global Step: 8370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:01,788-Speed 9458.40 samples/sec   Loss 15.0106   LearningRate 0.0950   Epoch: 0   Global Step: 8380   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:02,869-Speed 9475.54 samples/sec   Loss 15.0083   LearningRate 0.0950   Epoch: 0   Global Step: 8390   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:03,999-Speed 9068.47 samples/sec   Loss 14.9393   LearningRate 0.0950   Epoch: 0   Global Step: 8400   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:05,091-Speed 9383.46 samples/sec   Loss 14.8000   LearningRate 0.0950   Epoch: 0   Global Step: 8410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:06,193-Speed 9294.56 samples/sec   Loss 14.8640   LearningRate 0.0950   Epoch: 0   Global Step: 8420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:07,286-Speed 9369.61 samples/sec   Loss 14.9343   LearningRate 0.0950   Epoch: 0   Global Step: 8430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:08,348-Speed 9661.85 samples/sec   Loss 14.9694   LearningRate 0.0950   Epoch: 0   Global Step: 8440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:09,469-Speed 9142.36 samples/sec   Loss 14.9183   LearningRate 0.0950   Epoch: 0   Global Step: 8450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:10,553-Speed 9443.82 samples/sec   Loss 14.9703   LearningRate 0.0950   Epoch: 0   Global Step: 8460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:11,670-Speed 9172.85 samples/sec   Loss 14.8755   LearningRate 0.0950   Epoch: 0   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:12,767-Speed 9346.77 samples/sec   Loss 14.8454   LearningRate 0.0950   Epoch: 0   Global Step: 8480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:13,867-Speed 9314.09 samples/sec   Loss 14.9873   LearningRate 0.0950   Epoch: 0   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:14,988-Speed 9141.80 samples/sec   Loss 14.6725   LearningRate 0.0950   Epoch: 0   Global Step: 8500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:16,042-Speed 9722.15 samples/sec   Loss 14.8946   LearningRate 0.0950   Epoch: 0   Global Step: 8510   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:17,129-Speed 9426.21 samples/sec   Loss 14.7996   LearningRate 0.0950   Epoch: 0   Global Step: 8520   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:18,205-Speed 9518.84 samples/sec   Loss 14.8122   LearningRate 0.0950   Epoch: 0   Global Step: 8530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:19,307-Speed 9300.91 samples/sec   Loss 14.7667   LearningRate 0.0949   Epoch: 0   Global Step: 8540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:20,379-Speed 9559.78 samples/sec   Loss 14.9513   LearningRate 0.0949   Epoch: 0   Global Step: 8550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:21,469-Speed 9399.89 samples/sec   Loss 14.8031   LearningRate 0.0949   Epoch: 0   Global Step: 8560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:22,544-Speed 9528.73 samples/sec   Loss 14.7933   LearningRate 0.0949   Epoch: 0   Global Step: 8570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:23,587-Speed 9830.32 samples/sec   Loss 14.7071   LearningRate 0.0949   Epoch: 0   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:24,658-Speed 9560.58 samples/sec   Loss 14.7208   LearningRate 0.0949   Epoch: 0   Global Step: 8590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:25,755-Speed 9342.46 samples/sec   Loss 14.7633   LearningRate 0.0949   Epoch: 0   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:26,867-Speed 9213.22 samples/sec   Loss 14.8649   LearningRate 0.0949   Epoch: 0   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:27,958-Speed 9388.58 samples/sec   Loss 14.7308   LearningRate 0.0949   Epoch: 0   Global Step: 8620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:29,051-Speed 9379.20 samples/sec   Loss 14.6444   LearningRate 0.0949   Epoch: 0   Global Step: 8630   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:30,182-Speed 9063.92 samples/sec   Loss 14.7099   LearningRate 0.0949   Epoch: 0   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:31,280-Speed 9329.78 samples/sec   Loss 14.6920   LearningRate 0.0949   Epoch: 0   Global Step: 8650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:32,346-Speed 9606.95 samples/sec   Loss 14.8013   LearningRate 0.0949   Epoch: 0   Global Step: 8660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:33,421-Speed 9530.63 samples/sec   Loss 14.6737   LearningRate 0.0949   Epoch: 0   Global Step: 8670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:34,532-Speed 9220.80 samples/sec   Loss 14.6479   LearningRate 0.0949   Epoch: 0   Global Step: 8680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:35,607-Speed 9536.95 samples/sec   Loss 14.6136   LearningRate 0.0949   Epoch: 0   Global Step: 8690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:36,722-Speed 9188.57 samples/sec   Loss 14.5834   LearningRate 0.0949   Epoch: 0   Global Step: 8700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:37,817-Speed 9360.19 samples/sec   Loss 14.6875   LearningRate 0.0948   Epoch: 0   Global Step: 8710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:38,936-Speed 9153.79 samples/sec   Loss 14.6546   LearningRate 0.0948   Epoch: 0   Global Step: 8720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:40,021-Speed 9443.02 samples/sec   Loss 14.6563   LearningRate 0.0948   Epoch: 0   Global Step: 8730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:41,104-Speed 9461.59 samples/sec   Loss 14.5757   LearningRate 0.0948   Epoch: 0   Global Step: 8740   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:42,179-Speed 9531.81 samples/sec   Loss 14.6187   LearningRate 0.0948   Epoch: 0   Global Step: 8750   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:43,290-Speed 9222.39 samples/sec   Loss 14.4769   LearningRate 0.0948   Epoch: 0   Global Step: 8760   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:44,417-Speed 9092.22 samples/sec   Loss 14.5844   LearningRate 0.0948   Epoch: 0   Global Step: 8770   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:45,515-Speed 9335.70 samples/sec   Loss 14.6638   LearningRate 0.0948   Epoch: 0   Global Step: 8780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:46,586-Speed 9570.90 samples/sec   Loss 14.6435   LearningRate 0.0948   Epoch: 0   Global Step: 8790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:47,690-Speed 9275.92 samples/sec   Loss 14.7833   LearningRate 0.0948   Epoch: 0   Global Step: 8800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:48,802-Speed 9214.44 samples/sec   Loss 14.6483   LearningRate 0.0948   Epoch: 0   Global Step: 8810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:49,886-Speed 9453.12 samples/sec   Loss 14.5539   LearningRate 0.0948   Epoch: 0   Global Step: 8820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:50,980-Speed 9364.64 samples/sec   Loss 14.6425   LearningRate 0.0948   Epoch: 0   Global Step: 8830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:52,046-Speed 9616.41 samples/sec   Loss 14.5090   LearningRate 0.0948   Epoch: 0   Global Step: 8840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:53,165-Speed 9155.04 samples/sec   Loss 14.5300   LearningRate 0.0948   Epoch: 0   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:54,206-Speed 9845.80 samples/sec   Loss 14.6407   LearningRate 0.0948   Epoch: 0   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:55,299-Speed 9373.53 samples/sec   Loss 14.4673   LearningRate 0.0948   Epoch: 0   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:56,387-Speed 9415.88 samples/sec   Loss 14.4904   LearningRate 0.0948   Epoch: 0   Global Step: 8880   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:57,462-Speed 9523.19 samples/sec   Loss 14.5097   LearningRate 0.0947   Epoch: 0   Global Step: 8890   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:53:58,537-Speed 9536.32 samples/sec   Loss 14.4103   LearningRate 0.0947   Epoch: 0   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:53:59,657-Speed 9154.86 samples/sec   Loss 14.5347   LearningRate 0.0947   Epoch: 0   Global Step: 8910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:00,757-Speed 9310.50 samples/sec   Loss 14.5774   LearningRate 0.0947   Epoch: 0   Global Step: 8920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:01,873-Speed 9182.22 samples/sec   Loss 14.6534   LearningRate 0.0947   Epoch: 0   Global Step: 8930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:02,971-Speed 9331.64 samples/sec   Loss 14.6237   LearningRate 0.0947   Epoch: 0   Global Step: 8940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:04,058-Speed 9426.11 samples/sec   Loss 14.6331   LearningRate 0.0947   Epoch: 0   Global Step: 8950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:05,125-Speed 9605.63 samples/sec   Loss 14.4673   LearningRate 0.0947   Epoch: 0   Global Step: 8960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:06,186-Speed 9656.13 samples/sec   Loss 14.3469   LearningRate 0.0947   Epoch: 0   Global Step: 8970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:07,295-Speed 9240.15 samples/sec   Loss 14.3912   LearningRate 0.0947   Epoch: 0   Global Step: 8980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:08,402-Speed 9256.53 samples/sec   Loss 14.5361   LearningRate 0.0947   Epoch: 0   Global Step: 8990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:09,489-Speed 9424.30 samples/sec   Loss 14.5439   LearningRate 0.0947   Epoch: 0   Global Step: 9000   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:54:10,568-Speed 9494.45 samples/sec   Loss 14.5701   LearningRate 0.0947   Epoch: 0   Global Step: 9010   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:54:11,642-Speed 9543.04 samples/sec   Loss 14.3791   LearningRate 0.0947   Epoch: 0   Global Step: 9020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:12,748-Speed 9265.20 samples/sec   Loss 14.4282   LearningRate 0.0947   Epoch: 0   Global Step: 9030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:13,827-Speed 9495.30 samples/sec   Loss 14.4555   LearningRate 0.0947   Epoch: 0   Global Step: 9040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:14,911-Speed 9453.77 samples/sec   Loss 14.3501   LearningRate 0.0947   Epoch: 0   Global Step: 9050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:15,997-Speed 9435.89 samples/sec   Loss 14.4889   LearningRate 0.0946   Epoch: 0   Global Step: 9060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:17,033-Speed 9889.67 samples/sec   Loss 14.5686   LearningRate 0.0946   Epoch: 0   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:18,104-Speed 9567.66 samples/sec   Loss 14.4444   LearningRate 0.0946   Epoch: 0   Global Step: 9080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:19,155-Speed 9744.89 samples/sec   Loss 14.4829   LearningRate 0.0946   Epoch: 0   Global Step: 9090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:20,255-Speed 9309.59 samples/sec   Loss 14.5255   LearningRate 0.0946   Epoch: 0   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:21,320-Speed 9626.38 samples/sec   Loss 14.3949   LearningRate 0.0946   Epoch: 0   Global Step: 9110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:22,427-Speed 9256.19 samples/sec   Loss 14.3023   LearningRate 0.0946   Epoch: 0   Global Step: 9120   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:54:23,529-Speed 9293.19 samples/sec   Loss 14.3447   LearningRate 0.0946   Epoch: 0   Global Step: 9130   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:54:24,591-Speed 9651.97 samples/sec   Loss 14.2210   LearningRate 0.0946   Epoch: 0   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:25,701-Speed 9226.04 samples/sec   Loss 14.3370   LearningRate 0.0946   Epoch: 0   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:26,758-Speed 9699.60 samples/sec   Loss 14.5415   LearningRate 0.0946   Epoch: 0   Global Step: 9160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:27,848-Speed 9397.44 samples/sec   Loss 14.3287   LearningRate 0.0946   Epoch: 0   Global Step: 9170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:28,903-Speed 9718.20 samples/sec   Loss 14.4068   LearningRate 0.0946   Epoch: 0   Global Step: 9180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:29,984-Speed 9474.40 samples/sec   Loss 14.3969   LearningRate 0.0946   Epoch: 0   Global Step: 9190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:31,069-Speed 9440.22 samples/sec   Loss 14.4114   LearningRate 0.0946   Epoch: 0   Global Step: 9200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:32,144-Speed 9536.46 samples/sec   Loss 14.3317   LearningRate 0.0946   Epoch: 0   Global Step: 9210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:33,261-Speed 9175.72 samples/sec   Loss 14.3303   LearningRate 0.0946   Epoch: 0   Global Step: 9220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:34,347-Speed 9426.01 samples/sec   Loss 14.4023   LearningRate 0.0945   Epoch: 0   Global Step: 9230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:35,398-Speed 9755.42 samples/sec   Loss 14.1657   LearningRate 0.0945   Epoch: 0   Global Step: 9240   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:54:36,434-Speed 9887.72 samples/sec   Loss 14.3222   LearningRate 0.0945   Epoch: 0   Global Step: 9250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:37,493-Speed 9671.66 samples/sec   Loss 14.4690   LearningRate 0.0945   Epoch: 0   Global Step: 9260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:38,605-Speed 9220.27 samples/sec   Loss 14.2836   LearningRate 0.0945   Epoch: 0   Global Step: 9270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:39,697-Speed 9384.31 samples/sec   Loss 14.1386   LearningRate 0.0945   Epoch: 0   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:40,800-Speed 9288.68 samples/sec   Loss 14.2956   LearningRate 0.0945   Epoch: 0   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:41,915-Speed 9181.42 samples/sec   Loss 14.4021   LearningRate 0.0945   Epoch: 0   Global Step: 9300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:43,006-Speed 9392.29 samples/sec   Loss 14.2632   LearningRate 0.0945   Epoch: 0   Global Step: 9310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:44,088-Speed 9476.21 samples/sec   Loss 14.2763   LearningRate 0.0945   Epoch: 0   Global Step: 9320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:45,142-Speed 9730.09 samples/sec   Loss 14.2442   LearningRate 0.0945   Epoch: 0   Global Step: 9330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:46,245-Speed 9289.55 samples/sec   Loss 14.2962   LearningRate 0.0945   Epoch: 0   Global Step: 9340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:47,340-Speed 9355.74 samples/sec   Loss 14.2791   LearningRate 0.0945   Epoch: 0   Global Step: 9350   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:54:48,401-Speed 9656.23 samples/sec   Loss 14.2043   LearningRate 0.0945   Epoch: 0   Global Step: 9360   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:54:49,458-Speed 9695.89 samples/sec   Loss 14.3050   LearningRate 0.0945   Epoch: 0   Global Step: 9370   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:54:50,503-Speed 9799.35 samples/sec   Loss 14.2533   LearningRate 0.0945   Epoch: 0   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:51,570-Speed 9607.63 samples/sec   Loss 14.3734   LearningRate 0.0945   Epoch: 0   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:52,652-Speed 9465.35 samples/sec   Loss 14.2330   LearningRate 0.0944   Epoch: 0   Global Step: 9400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:53,728-Speed 9525.93 samples/sec   Loss 13.9418   LearningRate 0.0944   Epoch: 0   Global Step: 9410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:54,797-Speed 9578.90 samples/sec   Loss 14.1254   LearningRate 0.0944   Epoch: 0   Global Step: 9420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:55,844-Speed 9789.29 samples/sec   Loss 14.0854   LearningRate 0.0944   Epoch: 0   Global Step: 9430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:56,917-Speed 9545.91 samples/sec   Loss 14.1763   LearningRate 0.0944   Epoch: 0   Global Step: 9440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:58,078-Speed 8829.16 samples/sec   Loss 14.1632   LearningRate 0.0944   Epoch: 0   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:54:59,198-Speed 9148.15 samples/sec   Loss 14.0741   LearningRate 0.0944   Epoch: 0   Global Step: 9460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:00,297-Speed 9322.03 samples/sec   Loss 14.1996   LearningRate 0.0944   Epoch: 0   Global Step: 9470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:01,376-Speed 9490.74 samples/sec   Loss 14.0939   LearningRate 0.0944   Epoch: 0   Global Step: 9480   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:55:02,427-Speed 9752.23 samples/sec   Loss 14.1523   LearningRate 0.0944   Epoch: 0   Global Step: 9490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:03,532-Speed 9269.35 samples/sec   Loss 14.1423   LearningRate 0.0944   Epoch: 0   Global Step: 9500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:04,660-Speed 9089.13 samples/sec   Loss 14.1426   LearningRate 0.0944   Epoch: 0   Global Step: 9510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:05,701-Speed 9842.96 samples/sec   Loss 14.1207   LearningRate 0.0944   Epoch: 0   Global Step: 9520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:06,764-Speed 9639.88 samples/sec   Loss 14.1092   LearningRate 0.0944   Epoch: 0   Global Step: 9530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:07,895-Speed 9051.68 samples/sec   Loss 14.1582   LearningRate 0.0944   Epoch: 0   Global Step: 9540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:08,993-Speed 9342.41 samples/sec   Loss 14.1210   LearningRate 0.0944   Epoch: 0   Global Step: 9550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:10,058-Speed 9618.02 samples/sec   Loss 14.1277   LearningRate 0.0944   Epoch: 0   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:11,173-Speed 9184.54 samples/sec   Loss 14.0517   LearningRate 0.0943   Epoch: 0   Global Step: 9570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:12,232-Speed 9676.86 samples/sec   Loss 14.1337   LearningRate 0.0943   Epoch: 0   Global Step: 9580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:13,292-Speed 9667.56 samples/sec   Loss 14.0319   LearningRate 0.0943   Epoch: 0   Global Step: 9590   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:55:14,390-Speed 9327.88 samples/sec   Loss 14.1079   LearningRate 0.0943   Epoch: 0   Global Step: 9600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:15,479-Speed 9417.91 samples/sec   Loss 14.1678   LearningRate 0.0943   Epoch: 0   Global Step: 9610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:16,554-Speed 9531.68 samples/sec   Loss 13.9427   LearningRate 0.0943   Epoch: 0   Global Step: 9620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:17,618-Speed 9625.78 samples/sec   Loss 14.0555   LearningRate 0.0943   Epoch: 0   Global Step: 9630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:18,712-Speed 9366.87 samples/sec   Loss 13.9973   LearningRate 0.0943   Epoch: 0   Global Step: 9640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:19,823-Speed 9215.93 samples/sec   Loss 14.0377   LearningRate 0.0943   Epoch: 0   Global Step: 9650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:20,891-Speed 9604.18 samples/sec   Loss 14.1020   LearningRate 0.0943   Epoch: 0   Global Step: 9660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:21,973-Speed 9467.93 samples/sec   Loss 14.0979   LearningRate 0.0943   Epoch: 0   Global Step: 9670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:23,032-Speed 9674.30 samples/sec   Loss 14.0212   LearningRate 0.0943   Epoch: 0   Global Step: 9680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:24,131-Speed 9324.70 samples/sec   Loss 13.9637   LearningRate 0.0943   Epoch: 0   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:25,199-Speed 9593.92 samples/sec   Loss 13.9928   LearningRate 0.0943   Epoch: 0   Global Step: 9700   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:55:26,300-Speed 9300.15 samples/sec   Loss 14.0360   LearningRate 0.0943   Epoch: 0   Global Step: 9710   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:55:27,415-Speed 9188.37 samples/sec   Loss 14.0094   LearningRate 0.0943   Epoch: 0   Global Step: 9720   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:55:28,491-Speed 9532.66 samples/sec   Loss 14.1481   LearningRate 0.0943   Epoch: 0   Global Step: 9730   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:55:29,582-Speed 9385.12 samples/sec   Loss 13.9599   LearningRate 0.0942   Epoch: 0   Global Step: 9740   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:55:30,679-Speed 9345.94 samples/sec   Loss 13.9150   LearningRate 0.0942   Epoch: 0   Global Step: 9750   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:55:31,736-Speed 9688.63 samples/sec   Loss 13.9560   LearningRate 0.0942   Epoch: 0   Global Step: 9760   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:55:32,788-Speed 9739.73 samples/sec   Loss 13.9750   LearningRate 0.0942   Epoch: 0   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:33,854-Speed 9610.63 samples/sec   Loss 14.0218   LearningRate 0.0942   Epoch: 0   Global Step: 9780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:34,955-Speed 9308.82 samples/sec   Loss 13.9575   LearningRate 0.0942   Epoch: 0   Global Step: 9790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:36,035-Speed 9479.82 samples/sec   Loss 13.8919   LearningRate 0.0942   Epoch: 0   Global Step: 9800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:37,106-Speed 9567.90 samples/sec   Loss 13.9690   LearningRate 0.0942   Epoch: 0   Global Step: 9810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:38,219-Speed 9214.97 samples/sec   Loss 13.8265   LearningRate 0.0942   Epoch: 0   Global Step: 9820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:55:39,334-Speed 9185.25 samples/sec   Loss 13.9871   LearningRate 0.0942   Epoch: 0   Global Step: 9830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:55:40,433-Speed 9328.53 samples/sec   Loss 13.8396   LearningRate 0.0942   Epoch: 0   Global Step: 9840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:55:41,496-Speed 9636.96 samples/sec   Loss 13.9508   LearningRate 0.0942   Epoch: 0   Global Step: 9850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:55:42,567-Speed 9559.78 samples/sec   Loss 13.8536   LearningRate 0.0942   Epoch: 0   Global Step: 9860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:55:43,685-Speed 9167.24 samples/sec   Loss 13.8362   LearningRate 0.0942   Epoch: 0   Global Step: 9870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:55:44,760-Speed 9534.57 samples/sec   Loss 13.9766   LearningRate 0.0942   Epoch: 0   Global Step: 9880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:55:45,803-Speed 9822.47 samples/sec   Loss 13.9342   LearningRate 0.0942   Epoch: 0   Global Step: 9890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:55:46,904-Speed 9304.37 samples/sec   Loss 13.9251   LearningRate 0.0942   Epoch: 0   Global Step: 9900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:55:48,021-Speed 9175.13 samples/sec   Loss 13.8554   LearningRate 0.0942   Epoch: 0   Global Step: 9910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:55:49,117-Speed 9349.10 samples/sec   Loss 13.9319   LearningRate 0.0941   Epoch: 0   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:50,200-Speed 9460.83 samples/sec   Loss 13.8180   LearningRate 0.0941   Epoch: 0   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:55:51,252-Speed 9738.72 samples/sec   Loss 13.8256   LearningRate 0.0941   Epoch: 0   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:55:52,393-Speed 8978.88 samples/sec   Loss 13.8198   LearningRate 0.0941   Epoch: 0   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:55:53,459-Speed 9612.11 samples/sec   Loss 13.8622   LearningRate 0.0941   Epoch: 0   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 11:55:54,503-Speed 9806.27 samples/sec   Loss 13.6730   LearningRate 0.0941   Epoch: 0   Global Step: 9970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 11:55:55,589-Speed 9437.68 samples/sec   Loss 13.7765   LearningRate 0.0941   Epoch: 0   Global Step: 9980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 11:55:56,677-Speed 9414.97 samples/sec   Loss 13.8868   LearningRate 0.0941   Epoch: 0   Global Step: 9990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 11:55:57,780-Speed 9297.00 samples/sec   Loss 13.8260   LearningRate 0.0941   Epoch: 0   Global Step: 10000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 11:56:19,588-[lfw][10000]XNorm: 14.822650
Training: 2022-04-11 11:56:19,589-[lfw][10000]Accuracy-Flip: 0.98850+-0.00580
Training: 2022-04-11 11:56:19,589-[lfw][10000]Accuracy-Highest: 0.98850
Training: 2022-04-11 11:56:44,775-[cfp_fp][10000]XNorm: 12.692100
Training: 2022-04-11 11:56:44,776-[cfp_fp][10000]Accuracy-Flip: 0.89357+-0.01505
Training: 2022-04-11 11:56:44,776-[cfp_fp][10000]Accuracy-Highest: 0.89357
Training: 2022-04-11 11:57:06,526-[agedb_30][10000]XNorm: 14.347011
Training: 2022-04-11 11:57:06,527-[agedb_30][10000]Accuracy-Flip: 0.91417+-0.01724
Training: 2022-04-11 11:57:06,527-[agedb_30][10000]Accuracy-Highest: 0.91417
Training: 2022-04-11 11:57:07,614-Speed 146.64 samples/sec   Loss 13.8117   LearningRate 0.0941   Epoch: 0   Global Step: 10010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:57:08,661-Speed 9792.04 samples/sec   Loss 13.6865   LearningRate 0.0941   Epoch: 0   Global Step: 10020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:57:09,691-Speed 9946.35 samples/sec   Loss 13.9296   LearningRate 0.0941   Epoch: 0   Global Step: 10030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:57:10,735-Speed 9810.93 samples/sec   Loss 13.7630   LearningRate 0.0941   Epoch: 0   Global Step: 10040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:57:11,806-Speed 9571.14 samples/sec   Loss 13.6834   LearningRate 0.0941   Epoch: 0   Global Step: 10050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:57:12,873-Speed 9595.53 samples/sec   Loss 13.9078   LearningRate 0.0941   Epoch: 0   Global Step: 10060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:57:13,976-Speed 9292.32 samples/sec   Loss 13.7805   LearningRate 0.0941   Epoch: 0   Global Step: 10070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:15,030-Speed 9721.49 samples/sec   Loss 13.7886   LearningRate 0.0941   Epoch: 0   Global Step: 10080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:16,104-Speed 9541.89 samples/sec   Loss 13.8436   LearningRate 0.0940   Epoch: 0   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:17,232-Speed 9082.67 samples/sec   Loss 13.7716   LearningRate 0.0940   Epoch: 0   Global Step: 10100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:18,303-Speed 9564.45 samples/sec   Loss 13.8955   LearningRate 0.0940   Epoch: 0   Global Step: 10110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:19,404-Speed 9308.63 samples/sec   Loss 13.5553   LearningRate 0.0940   Epoch: 0   Global Step: 10120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:20,494-Speed 9401.75 samples/sec   Loss 13.7221   LearningRate 0.0940   Epoch: 0   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:21,576-Speed 9469.33 samples/sec   Loss 13.8253   LearningRate 0.0940   Epoch: 0   Global Step: 10140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:22,675-Speed 9319.12 samples/sec   Loss 13.8168   LearningRate 0.0940   Epoch: 0   Global Step: 10150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:23,747-Speed 9554.69 samples/sec   Loss 13.6775   LearningRate 0.0940   Epoch: 0   Global Step: 10160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:24,810-Speed 9639.41 samples/sec   Loss 13.7506   LearningRate 0.0940   Epoch: 0   Global Step: 10170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:25,902-Speed 9386.87 samples/sec   Loss 13.6320   LearningRate 0.0940   Epoch: 0   Global Step: 10180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:27,010-Speed 9253.19 samples/sec   Loss 13.6417   LearningRate 0.0940   Epoch: 0   Global Step: 10190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:28,082-Speed 9557.58 samples/sec   Loss 13.7480   LearningRate 0.0940   Epoch: 0   Global Step: 10200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:29,155-Speed 9548.46 samples/sec   Loss 13.8239   LearningRate 0.0940   Epoch: 0   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:30,229-Speed 9544.92 samples/sec   Loss 13.8356   LearningRate 0.0940   Epoch: 0   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:31,323-Speed 9362.99 samples/sec   Loss 13.5969   LearningRate 0.0940   Epoch: 0   Global Step: 10230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:32,388-Speed 9615.98 samples/sec   Loss 13.6792   LearningRate 0.0940   Epoch: 0   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:33,440-Speed 9742.79 samples/sec   Loss 13.7257   LearningRate 0.0940   Epoch: 0   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:34,481-Speed 9841.00 samples/sec   Loss 13.6115   LearningRate 0.0939   Epoch: 0   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:35,582-Speed 9311.19 samples/sec   Loss 13.6101   LearningRate 0.0939   Epoch: 0   Global Step: 10270   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:57:36,674-Speed 9380.86 samples/sec   Loss 13.7199   LearningRate 0.0939   Epoch: 0   Global Step: 10280   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:57:37,764-Speed 9397.05 samples/sec   Loss 13.8257   LearningRate 0.0939   Epoch: 0   Global Step: 10290   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:57:38,868-Speed 9280.39 samples/sec   Loss 13.6122   LearningRate 0.0939   Epoch: 0   Global Step: 10300   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:57:39,947-Speed 9498.16 samples/sec   Loss 13.6817   LearningRate 0.0939   Epoch: 0   Global Step: 10310   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:57:41,028-Speed 9479.07 samples/sec   Loss 13.6109   LearningRate 0.0939   Epoch: 0   Global Step: 10320   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:57:42,085-Speed 9693.86 samples/sec   Loss 13.8151   LearningRate 0.0939   Epoch: 0   Global Step: 10330   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:57:43,178-Speed 9368.60 samples/sec   Loss 13.7769   LearningRate 0.0939   Epoch: 0   Global Step: 10340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:44,265-Speed 9430.40 samples/sec   Loss 13.6004   LearningRate 0.0939   Epoch: 0   Global Step: 10350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:45,304-Speed 9865.20 samples/sec   Loss 13.6354   LearningRate 0.0939   Epoch: 0   Global Step: 10360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:46,387-Speed 9464.54 samples/sec   Loss 13.6196   LearningRate 0.0939   Epoch: 0   Global Step: 10370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:47,496-Speed 9240.61 samples/sec   Loss 13.6462   LearningRate 0.0939   Epoch: 0   Global Step: 10380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:48,626-Speed 9061.92 samples/sec   Loss 13.6245   LearningRate 0.0939   Epoch: 0   Global Step: 10390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:49,700-Speed 9540.01 samples/sec   Loss 13.6304   LearningRate 0.0939   Epoch: 0   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:50,744-Speed 9817.20 samples/sec   Loss 13.5567   LearningRate 0.0939   Epoch: 0   Global Step: 10410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:51,810-Speed 9612.87 samples/sec   Loss 13.6514   LearningRate 0.0939   Epoch: 0   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:52,890-Speed 9484.26 samples/sec   Loss 13.4888   LearningRate 0.0938   Epoch: 0   Global Step: 10430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:53,969-Speed 9498.50 samples/sec   Loss 13.4899   LearningRate 0.0938   Epoch: 0   Global Step: 10440   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:57:55,085-Speed 9182.58 samples/sec   Loss 13.5914   LearningRate 0.0938   Epoch: 0   Global Step: 10450   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:57:56,193-Speed 9244.70 samples/sec   Loss 13.4860   LearningRate 0.0938   Epoch: 0   Global Step: 10460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:57,301-Speed 9242.74 samples/sec   Loss 13.5170   LearningRate 0.0938   Epoch: 0   Global Step: 10470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:58,379-Speed 9506.60 samples/sec   Loss 13.6101   LearningRate 0.0938   Epoch: 0   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:57:59,455-Speed 9527.97 samples/sec   Loss 13.4814   LearningRate 0.0938   Epoch: 0   Global Step: 10490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:00,570-Speed 9182.89 samples/sec   Loss 13.5444   LearningRate 0.0938   Epoch: 0   Global Step: 10500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:01,637-Speed 9605.54 samples/sec   Loss 13.6231   LearningRate 0.0938   Epoch: 0   Global Step: 10510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:02,737-Speed 9315.11 samples/sec   Loss 13.4149   LearningRate 0.0938   Epoch: 0   Global Step: 10520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:03,835-Speed 9328.59 samples/sec   Loss 13.4588   LearningRate 0.0938   Epoch: 0   Global Step: 10530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:04,927-Speed 9385.33 samples/sec   Loss 13.5180   LearningRate 0.0938   Epoch: 0   Global Step: 10540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:05,992-Speed 9616.64 samples/sec   Loss 13.6097   LearningRate 0.0938   Epoch: 0   Global Step: 10550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:07,131-Speed 8997.34 samples/sec   Loss 13.4665   LearningRate 0.0938   Epoch: 0   Global Step: 10560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:08,228-Speed 9340.52 samples/sec   Loss 13.5826   LearningRate 0.0938   Epoch: 0   Global Step: 10570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:09,305-Speed 9514.50 samples/sec   Loss 13.3947   LearningRate 0.0938   Epoch: 0   Global Step: 10580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:10,384-Speed 9496.78 samples/sec   Loss 13.5758   LearningRate 0.0938   Epoch: 0   Global Step: 10590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:11,478-Speed 9369.71 samples/sec   Loss 13.4992   LearningRate 0.0938   Epoch: 0   Global Step: 10600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:12,584-Speed 9265.96 samples/sec   Loss 13.5095   LearningRate 0.0937   Epoch: 0   Global Step: 10610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:13,649-Speed 9619.38 samples/sec   Loss 13.3801   LearningRate 0.0937   Epoch: 0   Global Step: 10620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:14,746-Speed 9339.84 samples/sec   Loss 13.4453   LearningRate 0.0937   Epoch: 0   Global Step: 10630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:15,842-Speed 9351.89 samples/sec   Loss 13.2912   LearningRate 0.0937   Epoch: 0   Global Step: 10640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:16,931-Speed 9407.13 samples/sec   Loss 13.3842   LearningRate 0.0937   Epoch: 0   Global Step: 10650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:18,052-Speed 9140.58 samples/sec   Loss 13.4956   LearningRate 0.0937   Epoch: 0   Global Step: 10660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:19,151-Speed 9317.10 samples/sec   Loss 13.4850   LearningRate 0.0937   Epoch: 0   Global Step: 10670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:20,205-Speed 9721.04 samples/sec   Loss 13.4888   LearningRate 0.0937   Epoch: 0   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:21,285-Speed 9494.91 samples/sec   Loss 13.4634   LearningRate 0.0937   Epoch: 0   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:22,363-Speed 9501.75 samples/sec   Loss 13.5067   LearningRate 0.0937   Epoch: 0   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:23,471-Speed 9244.70 samples/sec   Loss 13.4755   LearningRate 0.0937   Epoch: 0   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:24,571-Speed 9313.33 samples/sec   Loss 13.3568   LearningRate 0.0937   Epoch: 0   Global Step: 10720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:25,605-Speed 9910.73 samples/sec   Loss 13.5001   LearningRate 0.0937   Epoch: 0   Global Step: 10730   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:58:26,653-Speed 9776.31 samples/sec   Loss 13.3838   LearningRate 0.0937   Epoch: 0   Global Step: 10740   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:58:27,768-Speed 9192.10 samples/sec   Loss 13.4533   LearningRate 0.0937   Epoch: 0   Global Step: 10750   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:58:28,836-Speed 9597.33 samples/sec   Loss 13.3535   LearningRate 0.0937   Epoch: 0   Global Step: 10760   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:58:29,972-Speed 9019.85 samples/sec   Loss 13.3199   LearningRate 0.0937   Epoch: 0   Global Step: 10770   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:58:31,065-Speed 9369.86 samples/sec   Loss 13.3470   LearningRate 0.0936   Epoch: 0   Global Step: 10780   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:58:32,158-Speed 9374.92 samples/sec   Loss 13.4602   LearningRate 0.0936   Epoch: 0   Global Step: 10790   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:58:33,234-Speed 9530.16 samples/sec   Loss 13.3897   LearningRate 0.0936   Epoch: 0   Global Step: 10800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:34,314-Speed 9487.29 samples/sec   Loss 13.3459   LearningRate 0.0936   Epoch: 0   Global Step: 10810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:35,396-Speed 9462.40 samples/sec   Loss 13.4320   LearningRate 0.0936   Epoch: 0   Global Step: 10820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:36,519-Speed 9122.12 samples/sec   Loss 13.3965   LearningRate 0.0936   Epoch: 0   Global Step: 10830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:37,591-Speed 9564.64 samples/sec   Loss 13.2727   LearningRate 0.0936   Epoch: 0   Global Step: 10840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:38,663-Speed 9561.60 samples/sec   Loss 13.2606   LearningRate 0.0936   Epoch: 0   Global Step: 10850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:39,750-Speed 9424.23 samples/sec   Loss 13.3826   LearningRate 0.0936   Epoch: 0   Global Step: 10860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:40,849-Speed 9321.18 samples/sec   Loss 13.4596   LearningRate 0.0936   Epoch: 0   Global Step: 10870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:41,937-Speed 9420.36 samples/sec   Loss 13.2379   LearningRate 0.0936   Epoch: 0   Global Step: 10880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:43,032-Speed 9348.58 samples/sec   Loss 13.2307   LearningRate 0.0936   Epoch: 0   Global Step: 10890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:44,103-Speed 9574.31 samples/sec   Loss 13.3384   LearningRate 0.0936   Epoch: 0   Global Step: 10900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:45,216-Speed 9209.52 samples/sec   Loss 13.2638   LearningRate 0.0936   Epoch: 0   Global Step: 10910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:46,290-Speed 9535.49 samples/sec   Loss 13.3883   LearningRate 0.0936   Epoch: 0   Global Step: 10920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:47,369-Speed 9498.24 samples/sec   Loss 13.2771   LearningRate 0.0936   Epoch: 0   Global Step: 10930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:48,469-Speed 9317.48 samples/sec   Loss 13.3046   LearningRate 0.0936   Epoch: 0   Global Step: 10940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:49,583-Speed 9199.58 samples/sec   Loss 13.3710   LearningRate 0.0935   Epoch: 0   Global Step: 10950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:50,692-Speed 9235.20 samples/sec   Loss 13.3177   LearningRate 0.0935   Epoch: 0   Global Step: 10960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:51,782-Speed 9396.22 samples/sec   Loss 13.2272   LearningRate 0.0935   Epoch: 0   Global Step: 10970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:52,908-Speed 9099.05 samples/sec   Loss 13.3388   LearningRate 0.0935   Epoch: 0   Global Step: 10980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:54,025-Speed 9175.35 samples/sec   Loss 13.3432   LearningRate 0.0935   Epoch: 0   Global Step: 10990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:55,095-Speed 9571.15 samples/sec   Loss 13.1947   LearningRate 0.0935   Epoch: 0   Global Step: 11000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:56,189-Speed 9369.05 samples/sec   Loss 13.3308   LearningRate 0.0935   Epoch: 0   Global Step: 11010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:58:57,274-Speed 9453.33 samples/sec   Loss 13.4233   LearningRate 0.0935   Epoch: 0   Global Step: 11020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:58,352-Speed 9500.58 samples/sec   Loss 13.3589   LearningRate 0.0935   Epoch: 0   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:58:59,471-Speed 9160.29 samples/sec   Loss 13.3372   LearningRate 0.0935   Epoch: 0   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:00,562-Speed 9390.26 samples/sec   Loss 13.4123   LearningRate 0.0935   Epoch: 0   Global Step: 11050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:01,639-Speed 9510.30 samples/sec   Loss 13.2353   LearningRate 0.0935   Epoch: 0   Global Step: 11060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:02,724-Speed 9441.62 samples/sec   Loss 13.1576   LearningRate 0.0935   Epoch: 0   Global Step: 11070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:03,807-Speed 9459.56 samples/sec   Loss 13.2591   LearningRate 0.0935   Epoch: 0   Global Step: 11080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:04,922-Speed 9191.72 samples/sec   Loss 13.1777   LearningRate 0.0935   Epoch: 0   Global Step: 11090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:05,990-Speed 9598.36 samples/sec   Loss 13.2163   LearningRate 0.0935   Epoch: 0   Global Step: 11100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:07,062-Speed 9556.80 samples/sec   Loss 13.0921   LearningRate 0.0935   Epoch: 0   Global Step: 11110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:08,126-Speed 9626.43 samples/sec   Loss 13.2595   LearningRate 0.0934   Epoch: 0   Global Step: 11120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:09,202-Speed 9521.30 samples/sec   Loss 13.2789   LearningRate 0.0934   Epoch: 0   Global Step: 11130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:10,317-Speed 9194.02 samples/sec   Loss 13.3091   LearningRate 0.0934   Epoch: 0   Global Step: 11140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:11,395-Speed 9503.99 samples/sec   Loss 13.2040   LearningRate 0.0934   Epoch: 0   Global Step: 11150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:12,480-Speed 9441.01 samples/sec   Loss 13.2387   LearningRate 0.0934   Epoch: 0   Global Step: 11160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:13,568-Speed 9416.32 samples/sec   Loss 13.2121   LearningRate 0.0934   Epoch: 0   Global Step: 11170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:14,623-Speed 9711.85 samples/sec   Loss 13.2449   LearningRate 0.0934   Epoch: 0   Global Step: 11180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:15,668-Speed 9809.95 samples/sec   Loss 13.1522   LearningRate 0.0934   Epoch: 0   Global Step: 11190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:16,716-Speed 9777.29 samples/sec   Loss 13.2581   LearningRate 0.0934   Epoch: 0   Global Step: 11200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:17,803-Speed 9420.12 samples/sec   Loss 13.0787   LearningRate 0.0934   Epoch: 0   Global Step: 11210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:18,873-Speed 9577.86 samples/sec   Loss 13.2372   LearningRate 0.0934   Epoch: 0   Global Step: 11220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:19,897-Speed 10008.41 samples/sec   Loss 13.1641   LearningRate 0.0934   Epoch: 0   Global Step: 11230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 11:59:20,976-Speed 9495.70 samples/sec   Loss 13.3030   LearningRate 0.0934   Epoch: 0   Global Step: 11240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 11:59:22,037-Speed 9661.51 samples/sec   Loss 13.1580   LearningRate 0.0934   Epoch: 0   Global Step: 11250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 11:59:23,121-Speed 9449.94 samples/sec   Loss 13.0880   LearningRate 0.0934   Epoch: 0   Global Step: 11260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 11:59:24,217-Speed 9350.55 samples/sec   Loss 13.1713   LearningRate 0.0934   Epoch: 0   Global Step: 11270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 11:59:25,303-Speed 9433.51 samples/sec   Loss 13.1631   LearningRate 0.0934   Epoch: 0   Global Step: 11280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 11:59:26,418-Speed 9191.36 samples/sec   Loss 13.1579   LearningRate 0.0934   Epoch: 0   Global Step: 11290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 11:59:27,507-Speed 9404.81 samples/sec   Loss 13.2202   LearningRate 0.0933   Epoch: 0   Global Step: 11300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 11:59:28,566-Speed 9681.14 samples/sec   Loss 13.1730   LearningRate 0.0933   Epoch: 0   Global Step: 11310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 11:59:29,692-Speed 9092.07 samples/sec   Loss 13.1400   LearningRate 0.0933   Epoch: 0   Global Step: 11320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-11 11:59:30,773-Speed 9483.39 samples/sec   Loss 13.2696   LearningRate 0.0933   Epoch: 0   Global Step: 11330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:31,836-Speed 9635.38 samples/sec   Loss 13.0583   LearningRate 0.0933   Epoch: 0   Global Step: 11340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:32,898-Speed 9651.54 samples/sec   Loss 13.1067   LearningRate 0.0933   Epoch: 0   Global Step: 11350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:33,988-Speed 9399.40 samples/sec   Loss 13.0934   LearningRate 0.0933   Epoch: 0   Global Step: 11360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:35,036-Speed 9774.23 samples/sec   Loss 13.1968   LearningRate 0.0933   Epoch: 0   Global Step: 11370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:36,093-Speed 9694.05 samples/sec   Loss 13.2346   LearningRate 0.0933   Epoch: 0   Global Step: 11380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:37,144-Speed 9748.37 samples/sec   Loss 13.0503   LearningRate 0.0933   Epoch: 0   Global Step: 11390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:38,202-Speed 9684.53 samples/sec   Loss 13.1405   LearningRate 0.0933   Epoch: 0   Global Step: 11400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:39,236-Speed 9907.88 samples/sec   Loss 13.2056   LearningRate 0.0933   Epoch: 0   Global Step: 11410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:40,329-Speed 9373.17 samples/sec   Loss 13.1356   LearningRate 0.0933   Epoch: 0   Global Step: 11420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 11:59:41,406-Speed 9510.54 samples/sec   Loss 13.1380   LearningRate 0.0933   Epoch: 0   Global Step: 11430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:42,462-Speed 9706.65 samples/sec   Loss 13.0074   LearningRate 0.0933   Epoch: 0   Global Step: 11440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:43,538-Speed 9523.76 samples/sec   Loss 12.9947   LearningRate 0.0933   Epoch: 0   Global Step: 11450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:44,651-Speed 9203.28 samples/sec   Loss 13.1376   LearningRate 0.0933   Epoch: 0   Global Step: 11460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:45,745-Speed 9370.67 samples/sec   Loss 13.1798   LearningRate 0.0932   Epoch: 0   Global Step: 11470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:46,791-Speed 9801.14 samples/sec   Loss 13.0685   LearningRate 0.0932   Epoch: 0   Global Step: 11480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:47,930-Speed 8989.10 samples/sec   Loss 13.1962   LearningRate 0.0932   Epoch: 0   Global Step: 11490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:49,044-Speed 9202.26 samples/sec   Loss 13.0952   LearningRate 0.0932   Epoch: 0   Global Step: 11500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:50,175-Speed 9060.94 samples/sec   Loss 13.2808   LearningRate 0.0932   Epoch: 0   Global Step: 11510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:51,239-Speed 9625.89 samples/sec   Loss 13.0085   LearningRate 0.0932   Epoch: 0   Global Step: 11520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:52,316-Speed 9512.02 samples/sec   Loss 13.0292   LearningRate 0.0932   Epoch: 0   Global Step: 11530   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:59:53,415-Speed 9328.41 samples/sec   Loss 13.0972   LearningRate 0.0932   Epoch: 0   Global Step: 11540   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 11:59:54,532-Speed 9164.96 samples/sec   Loss 12.8728   LearningRate 0.0932   Epoch: 0   Global Step: 11550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:55,638-Speed 9266.00 samples/sec   Loss 12.9844   LearningRate 0.0932   Epoch: 0   Global Step: 11560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:56,710-Speed 9559.35 samples/sec   Loss 13.2074   LearningRate 0.0932   Epoch: 0   Global Step: 11570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:57,752-Speed 9832.97 samples/sec   Loss 13.1581   LearningRate 0.0932   Epoch: 0   Global Step: 11580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 11:59:58,887-Speed 9027.28 samples/sec   Loss 13.0190   LearningRate 0.0932   Epoch: 0   Global Step: 11590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:00,005-Speed 9165.10 samples/sec   Loss 13.0243   LearningRate 0.0932   Epoch: 0   Global Step: 11600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:01,096-Speed 9389.09 samples/sec   Loss 12.9247   LearningRate 0.0932   Epoch: 0   Global Step: 11610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:02,213-Speed 9176.56 samples/sec   Loss 13.0095   LearningRate 0.0932   Epoch: 0   Global Step: 11620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:03,302-Speed 9407.28 samples/sec   Loss 12.9081   LearningRate 0.0932   Epoch: 0   Global Step: 11630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:04,400-Speed 9326.97 samples/sec   Loss 13.1367   LearningRate 0.0931   Epoch: 0   Global Step: 11640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:00:05,472-Speed 9565.72 samples/sec   Loss 12.9357   LearningRate 0.0931   Epoch: 0   Global Step: 11650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:00:06,524-Speed 9738.14 samples/sec   Loss 13.0027   LearningRate 0.0931   Epoch: 0   Global Step: 11660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:00:07,585-Speed 9655.01 samples/sec   Loss 13.0084   LearningRate 0.0931   Epoch: 0   Global Step: 11670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:00:08,648-Speed 9639.88 samples/sec   Loss 13.0947   LearningRate 0.0931   Epoch: 0   Global Step: 11680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:00:09,779-Speed 9058.75 samples/sec   Loss 13.0559   LearningRate 0.0931   Epoch: 0   Global Step: 11690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:00:10,832-Speed 9726.27 samples/sec   Loss 12.8795   LearningRate 0.0931   Epoch: 0   Global Step: 11700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:00:11,930-Speed 9330.14 samples/sec   Loss 13.0638   LearningRate 0.0931   Epoch: 0   Global Step: 11710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:00:12,986-Speed 9702.07 samples/sec   Loss 12.9762   LearningRate 0.0931   Epoch: 0   Global Step: 11720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:00:14,094-Speed 9255.38 samples/sec   Loss 13.0909   LearningRate 0.0931   Epoch: 0   Global Step: 11730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:00:15,202-Speed 9242.21 samples/sec   Loss 12.9561   LearningRate 0.0931   Epoch: 0   Global Step: 11740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:16,285-Speed 9464.03 samples/sec   Loss 12.9493   LearningRate 0.0931   Epoch: 0   Global Step: 11750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:17,344-Speed 9674.00 samples/sec   Loss 13.0619   LearningRate 0.0931   Epoch: 0   Global Step: 11760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:18,418-Speed 9539.54 samples/sec   Loss 12.9873   LearningRate 0.0931   Epoch: 0   Global Step: 11770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:19,490-Speed 9556.51 samples/sec   Loss 12.9322   LearningRate 0.0931   Epoch: 0   Global Step: 11780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:20,571-Speed 9484.36 samples/sec   Loss 13.0992   LearningRate 0.0931   Epoch: 0   Global Step: 11790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:21,622-Speed 9744.22 samples/sec   Loss 12.8867   LearningRate 0.0931   Epoch: 0   Global Step: 11800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:22,741-Speed 9154.60 samples/sec   Loss 12.8418   LearningRate 0.0930   Epoch: 0   Global Step: 11810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:23,824-Speed 9465.83 samples/sec   Loss 12.9154   LearningRate 0.0930   Epoch: 0   Global Step: 11820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:24,898-Speed 9535.03 samples/sec   Loss 12.9473   LearningRate 0.0930   Epoch: 0   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:00:26,013-Speed 9191.84 samples/sec   Loss 12.7851   LearningRate 0.0930   Epoch: 0   Global Step: 11840   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:00:27,098-Speed 9443.95 samples/sec   Loss 13.0222   LearningRate 0.0930   Epoch: 0   Global Step: 11850   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:00:28,168-Speed 9575.05 samples/sec   Loss 12.7787   LearningRate 0.0930   Epoch: 0   Global Step: 11860   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:00:29,255-Speed 9429.12 samples/sec   Loss 12.9065   LearningRate 0.0930   Epoch: 0   Global Step: 11870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:00:30,347-Speed 9383.92 samples/sec   Loss 12.9979   LearningRate 0.0930   Epoch: 0   Global Step: 11880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:00:31,426-Speed 9493.66 samples/sec   Loss 12.9361   LearningRate 0.0930   Epoch: 0   Global Step: 11890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:00:32,480-Speed 9724.43 samples/sec   Loss 12.8256   LearningRate 0.0930   Epoch: 0   Global Step: 11900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:00:33,563-Speed 9463.13 samples/sec   Loss 12.8610   LearningRate 0.0930   Epoch: 0   Global Step: 11910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:00:34,659-Speed 9346.46 samples/sec   Loss 12.7382   LearningRate 0.0930   Epoch: 0   Global Step: 11920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:00:35,748-Speed 9408.23 samples/sec   Loss 12.9615   LearningRate 0.0930   Epoch: 0   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:00:36,846-Speed 9333.22 samples/sec   Loss 12.8015   LearningRate 0.0930   Epoch: 0   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:00:37,926-Speed 9480.63 samples/sec   Loss 13.0450   LearningRate 0.0930   Epoch: 0   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:00:39,038-Speed 9215.17 samples/sec   Loss 12.8651   LearningRate 0.0930   Epoch: 0   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:00:40,119-Speed 9480.64 samples/sec   Loss 12.9885   LearningRate 0.0930   Epoch: 0   Global Step: 11970   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:00:41,158-Speed 9855.73 samples/sec   Loss 12.8115   LearningRate 0.0930   Epoch: 0   Global Step: 11980   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:00:42,227-Speed 9583.40 samples/sec   Loss 12.9106   LearningRate 0.0929   Epoch: 0   Global Step: 11990   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:00:43,336-Speed 9237.69 samples/sec   Loss 12.7730   LearningRate 0.0929   Epoch: 0   Global Step: 12000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:01:04,984-[lfw][12000]XNorm: 14.524520
Training: 2022-04-11 12:01:04,985-[lfw][12000]Accuracy-Flip: 0.99250+-0.00449
Training: 2022-04-11 12:01:04,985-[lfw][12000]Accuracy-Highest: 0.99250
Training: 2022-04-11 12:01:30,027-[cfp_fp][12000]XNorm: 12.549293
Training: 2022-04-11 12:01:30,027-[cfp_fp][12000]Accuracy-Flip: 0.90814+-0.01084
Training: 2022-04-11 12:01:30,028-[cfp_fp][12000]Accuracy-Highest: 0.90814
Training: 2022-04-11 12:01:51,618-[agedb_30][12000]XNorm: 14.055729
Training: 2022-04-11 12:01:51,619-[agedb_30][12000]Accuracy-Flip: 0.91983+-0.01552
Training: 2022-04-11 12:01:51,620-[agedb_30][12000]Accuracy-Highest: 0.91983
Training: 2022-04-11 12:01:52,711-Speed 147.61 samples/sec   Loss 12.8340   LearningRate 0.0929   Epoch: 0   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:01:53,772-Speed 9651.64 samples/sec   Loss 12.8630   LearningRate 0.0929   Epoch: 0   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:01:54,852-Speed 9490.37 samples/sec   Loss 12.8801   LearningRate 0.0929   Epoch: 0   Global Step: 12030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:01:55,954-Speed 9296.69 samples/sec   Loss 12.8837   LearningRate 0.0929   Epoch: 0   Global Step: 12040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:01:57,024-Speed 9570.91 samples/sec   Loss 12.8698   LearningRate 0.0929   Epoch: 0   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:01:58,082-Speed 9681.11 samples/sec   Loss 12.8692   LearningRate 0.0929   Epoch: 0   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:01:59,141-Speed 9680.37 samples/sec   Loss 12.9020   LearningRate 0.0929   Epoch: 0   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:00,216-Speed 9535.36 samples/sec   Loss 12.6502   LearningRate 0.0929   Epoch: 0   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:01,314-Speed 9326.91 samples/sec   Loss 12.8896   LearningRate 0.0929   Epoch: 0   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:02,378-Speed 9629.61 samples/sec   Loss 12.8220   LearningRate 0.0929   Epoch: 0   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:03,502-Speed 9115.10 samples/sec   Loss 12.7903   LearningRate 0.0929   Epoch: 0   Global Step: 12110   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:04,578-Speed 9522.17 samples/sec   Loss 12.8962   LearningRate 0.0929   Epoch: 0   Global Step: 12120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:05,657-Speed 9495.88 samples/sec   Loss 12.7374   LearningRate 0.0929   Epoch: 0   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:06,700-Speed 9824.96 samples/sec   Loss 12.6687   LearningRate 0.0929   Epoch: 0   Global Step: 12140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:07,779-Speed 9493.39 samples/sec   Loss 12.7032   LearningRate 0.0929   Epoch: 0   Global Step: 12150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:08,878-Speed 9326.71 samples/sec   Loss 12.7360   LearningRate 0.0928   Epoch: 0   Global Step: 12160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:09,937-Speed 9675.73 samples/sec   Loss 12.7242   LearningRate 0.0928   Epoch: 0   Global Step: 12170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:10,974-Speed 9881.33 samples/sec   Loss 12.7418   LearningRate 0.0928   Epoch: 0   Global Step: 12180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:12,056-Speed 9468.80 samples/sec   Loss 12.8030   LearningRate 0.0928   Epoch: 0   Global Step: 12190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:13,099-Speed 9827.26 samples/sec   Loss 12.7961   LearningRate 0.0928   Epoch: 0   Global Step: 12200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:14,187-Speed 9415.64 samples/sec   Loss 12.8507   LearningRate 0.0928   Epoch: 0   Global Step: 12210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:15,283-Speed 9349.35 samples/sec   Loss 12.8995   LearningRate 0.0928   Epoch: 0   Global Step: 12220   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:16,334-Speed 9743.59 samples/sec   Loss 12.6023   LearningRate 0.0928   Epoch: 0   Global Step: 12230   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:17,400-Speed 9612.69 samples/sec   Loss 12.9008   LearningRate 0.0928   Epoch: 0   Global Step: 12240   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:18,493-Speed 9375.45 samples/sec   Loss 12.8437   LearningRate 0.0928   Epoch: 0   Global Step: 12250   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:19,591-Speed 9332.63 samples/sec   Loss 12.6981   LearningRate 0.0928   Epoch: 0   Global Step: 12260   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:20,678-Speed 9421.80 samples/sec   Loss 12.7391   LearningRate 0.0928   Epoch: 0   Global Step: 12270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:21,756-Speed 9511.00 samples/sec   Loss 12.8582   LearningRate 0.0928   Epoch: 0   Global Step: 12280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:22,815-Speed 9668.45 samples/sec   Loss 12.8479   LearningRate 0.0928   Epoch: 0   Global Step: 12290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:23,907-Speed 9381.71 samples/sec   Loss 12.7283   LearningRate 0.0928   Epoch: 0   Global Step: 12300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:24,983-Speed 9523.25 samples/sec   Loss 12.6446   LearningRate 0.0928   Epoch: 0   Global Step: 12310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:26,039-Speed 9709.94 samples/sec   Loss 12.6791   LearningRate 0.0928   Epoch: 0   Global Step: 12320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:27,150-Speed 9218.81 samples/sec   Loss 12.7706   LearningRate 0.0927   Epoch: 0   Global Step: 12330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:28,213-Speed 9635.32 samples/sec   Loss 12.8033   LearningRate 0.0927   Epoch: 0   Global Step: 12340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:29,267-Speed 9727.40 samples/sec   Loss 12.6068   LearningRate 0.0927   Epoch: 0   Global Step: 12350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:30,344-Speed 9516.66 samples/sec   Loss 12.7052   LearningRate 0.0927   Epoch: 0   Global Step: 12360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:31,444-Speed 9311.20 samples/sec   Loss 12.6010   LearningRate 0.0927   Epoch: 0   Global Step: 12370   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:32,492-Speed 9773.81 samples/sec   Loss 12.6863   LearningRate 0.0927   Epoch: 0   Global Step: 12380   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:33,589-Speed 9339.61 samples/sec   Loss 12.6741   LearningRate 0.0927   Epoch: 0   Global Step: 12390   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:34,633-Speed 9819.56 samples/sec   Loss 12.7076   LearningRate 0.0927   Epoch: 0   Global Step: 12400   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:35,692-Speed 9669.50 samples/sec   Loss 12.7454   LearningRate 0.0927   Epoch: 0   Global Step: 12410   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:36,712-Speed 10051.30 samples/sec   Loss 12.6901   LearningRate 0.0927   Epoch: 0   Global Step: 12420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:37,797-Speed 9441.27 samples/sec   Loss 12.6408   LearningRate 0.0927   Epoch: 0   Global Step: 12430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:38,905-Speed 9247.23 samples/sec   Loss 12.6322   LearningRate 0.0927   Epoch: 0   Global Step: 12440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:40,010-Speed 9275.92 samples/sec   Loss 12.5561   LearningRate 0.0927   Epoch: 0   Global Step: 12450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:41,104-Speed 9364.34 samples/sec   Loss 12.7031   LearningRate 0.0927   Epoch: 0   Global Step: 12460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:42,150-Speed 9797.71 samples/sec   Loss 12.7378   LearningRate 0.0927   Epoch: 0   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:43,239-Speed 9402.89 samples/sec   Loss 12.8337   LearningRate 0.0927   Epoch: 0   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:44,290-Speed 9751.64 samples/sec   Loss 12.6136   LearningRate 0.0927   Epoch: 0   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:45,349-Speed 9680.94 samples/sec   Loss 12.5764   LearningRate 0.0927   Epoch: 0   Global Step: 12500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:46,437-Speed 9414.26 samples/sec   Loss 12.5857   LearningRate 0.0926   Epoch: 0   Global Step: 12510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:47,528-Speed 9389.55 samples/sec   Loss 12.8426   LearningRate 0.0926   Epoch: 0   Global Step: 12520   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:02:48,608-Speed 9482.00 samples/sec   Loss 12.6352   LearningRate 0.0926   Epoch: 0   Global Step: 12530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:49,686-Speed 9511.43 samples/sec   Loss 12.5440   LearningRate 0.0926   Epoch: 0   Global Step: 12540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:50,768-Speed 9461.66 samples/sec   Loss 12.6327   LearningRate 0.0926   Epoch: 0   Global Step: 12550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:51,857-Speed 9408.28 samples/sec   Loss 12.6484   LearningRate 0.0926   Epoch: 0   Global Step: 12560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:52,975-Speed 9170.16 samples/sec   Loss 12.5944   LearningRate 0.0926   Epoch: 0   Global Step: 12570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:54,034-Speed 9673.76 samples/sec   Loss 12.6673   LearningRate 0.0926   Epoch: 0   Global Step: 12580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:55,102-Speed 9592.64 samples/sec   Loss 12.6545   LearningRate 0.0926   Epoch: 0   Global Step: 12590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:56,175-Speed 9553.68 samples/sec   Loss 12.4322   LearningRate 0.0926   Epoch: 0   Global Step: 12600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:57,241-Speed 9613.22 samples/sec   Loss 12.6872   LearningRate 0.0926   Epoch: 0   Global Step: 12610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:58,334-Speed 9374.16 samples/sec   Loss 12.5953   LearningRate 0.0926   Epoch: 0   Global Step: 12620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:02:59,372-Speed 9867.45 samples/sec   Loss 12.5456   LearningRate 0.0926   Epoch: 0   Global Step: 12630   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:00,427-Speed 9716.97 samples/sec   Loss 12.6085   LearningRate 0.0926   Epoch: 0   Global Step: 12640   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:01,514-Speed 9417.48 samples/sec   Loss 12.6472   LearningRate 0.0926   Epoch: 0   Global Step: 12650   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:02,544-Speed 9954.67 samples/sec   Loss 12.6460   LearningRate 0.0926   Epoch: 0   Global Step: 12660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:03,575-Speed 9937.05 samples/sec   Loss 12.6353   LearningRate 0.0926   Epoch: 0   Global Step: 12670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:04,720-Speed 8949.53 samples/sec   Loss 12.7286   LearningRate 0.0925   Epoch: 0   Global Step: 12680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:05,778-Speed 9680.63 samples/sec   Loss 12.6698   LearningRate 0.0925   Epoch: 0   Global Step: 12690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:06,850-Speed 9554.75 samples/sec   Loss 12.4817   LearningRate 0.0925   Epoch: 0   Global Step: 12700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:07,951-Speed 9309.86 samples/sec   Loss 12.6575   LearningRate 0.0925   Epoch: 0   Global Step: 12710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:09,012-Speed 9659.38 samples/sec   Loss 12.6429   LearningRate 0.0925   Epoch: 0   Global Step: 12720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:10,075-Speed 9631.48 samples/sec   Loss 12.5919   LearningRate 0.0925   Epoch: 0   Global Step: 12730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:11,171-Speed 9353.79 samples/sec   Loss 12.5849   LearningRate 0.0925   Epoch: 0   Global Step: 12740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:12,257-Speed 9428.35 samples/sec   Loss 12.6372   LearningRate 0.0925   Epoch: 0   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:13,366-Speed 9240.61 samples/sec   Loss 12.6055   LearningRate 0.0925   Epoch: 0   Global Step: 12760   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:14,419-Speed 9734.23 samples/sec   Loss 12.7091   LearningRate 0.0925   Epoch: 0   Global Step: 12770   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:15,464-Speed 9810.24 samples/sec   Loss 12.6457   LearningRate 0.0925   Epoch: 0   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:16,508-Speed 9813.66 samples/sec   Loss 12.5026   LearningRate 0.0925   Epoch: 0   Global Step: 12790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:17,563-Speed 9705.50 samples/sec   Loss 12.6652   LearningRate 0.0925   Epoch: 0   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:18,653-Speed 9401.11 samples/sec   Loss 12.6603   LearningRate 0.0925   Epoch: 0   Global Step: 12810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:19,725-Speed 9558.01 samples/sec   Loss 12.5191   LearningRate 0.0925   Epoch: 0   Global Step: 12820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:20,810-Speed 9449.85 samples/sec   Loss 12.5762   LearningRate 0.0925   Epoch: 0   Global Step: 12830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:21,892-Speed 9461.22 samples/sec   Loss 12.6681   LearningRate 0.0925   Epoch: 0   Global Step: 12840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:22,996-Speed 9279.60 samples/sec   Loss 12.5955   LearningRate 0.0924   Epoch: 0   Global Step: 12850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:24,086-Speed 9398.97 samples/sec   Loss 12.5997   LearningRate 0.0924   Epoch: 0   Global Step: 12860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:25,218-Speed 9054.20 samples/sec   Loss 12.6456   LearningRate 0.0924   Epoch: 0   Global Step: 12870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:26,266-Speed 9776.85 samples/sec   Loss 12.5949   LearningRate 0.0924   Epoch: 0   Global Step: 12880   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:27,336-Speed 9582.01 samples/sec   Loss 12.4659   LearningRate 0.0924   Epoch: 0   Global Step: 12890   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:28,430-Speed 9361.11 samples/sec   Loss 12.4890   LearningRate 0.0924   Epoch: 0   Global Step: 12900   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:29,512-Speed 9468.99 samples/sec   Loss 12.4189   LearningRate 0.0924   Epoch: 0   Global Step: 12910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:30,558-Speed 9794.71 samples/sec   Loss 12.5972   LearningRate 0.0924   Epoch: 0   Global Step: 12920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:31,653-Speed 9359.44 samples/sec   Loss 12.6682   LearningRate 0.0924   Epoch: 0   Global Step: 12930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:32,748-Speed 9358.70 samples/sec   Loss 12.4769   LearningRate 0.0924   Epoch: 0   Global Step: 12940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:33,874-Speed 9100.48 samples/sec   Loss 12.5518   LearningRate 0.0924   Epoch: 0   Global Step: 12950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:34,998-Speed 9117.73 samples/sec   Loss 12.5567   LearningRate 0.0924   Epoch: 0   Global Step: 12960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:36,087-Speed 9402.58 samples/sec   Loss 12.5242   LearningRate 0.0924   Epoch: 0   Global Step: 12970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:37,150-Speed 9639.61 samples/sec   Loss 12.5538   LearningRate 0.0924   Epoch: 0   Global Step: 12980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:38,215-Speed 9621.35 samples/sec   Loss 12.5736   LearningRate 0.0924   Epoch: 0   Global Step: 12990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:39,319-Speed 9283.11 samples/sec   Loss 12.4390   LearningRate 0.0924   Epoch: 0   Global Step: 13000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:40,429-Speed 9228.65 samples/sec   Loss 12.5126   LearningRate 0.0924   Epoch: 0   Global Step: 13010   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:41,490-Speed 9659.46 samples/sec   Loss 12.3492   LearningRate 0.0924   Epoch: 0   Global Step: 13020   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:42,547-Speed 9698.05 samples/sec   Loss 12.3923   LearningRate 0.0923   Epoch: 0   Global Step: 13030   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:03:43,630-Speed 9454.40 samples/sec   Loss 12.4301   LearningRate 0.0923   Epoch: 0   Global Step: 13040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:44,700-Speed 9575.96 samples/sec   Loss 12.4297   LearningRate 0.0923   Epoch: 0   Global Step: 13050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:45,767-Speed 9611.30 samples/sec   Loss 12.5151   LearningRate 0.0923   Epoch: 0   Global Step: 13060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:46,824-Speed 9689.85 samples/sec   Loss 12.3869   LearningRate 0.0923   Epoch: 0   Global Step: 13070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:03:47,957-Speed 9041.46 samples/sec   Loss 12.3746   LearningRate 0.0923   Epoch: 0   Global Step: 13080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:03:49,046-Speed 9415.46 samples/sec   Loss 12.3226   LearningRate 0.0923   Epoch: 0   Global Step: 13090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:03:50,131-Speed 9438.48 samples/sec   Loss 12.3378   LearningRate 0.0923   Epoch: 0   Global Step: 13100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:03:51,203-Speed 9557.71 samples/sec   Loss 12.4092   LearningRate 0.0923   Epoch: 0   Global Step: 13110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:03:52,265-Speed 9651.65 samples/sec   Loss 12.4583   LearningRate 0.0923   Epoch: 0   Global Step: 13120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:03:53,358-Speed 9370.66 samples/sec   Loss 12.4302   LearningRate 0.0923   Epoch: 0   Global Step: 13130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:03:54,423-Speed 9622.44 samples/sec   Loss 12.4398   LearningRate 0.0923   Epoch: 0   Global Step: 13140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:03:55,507-Speed 9448.85 samples/sec   Loss 12.3969   LearningRate 0.0923   Epoch: 0   Global Step: 13150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:03:56,605-Speed 9332.71 samples/sec   Loss 12.2608   LearningRate 0.0923   Epoch: 0   Global Step: 13160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:03:57,701-Speed 9348.55 samples/sec   Loss 12.3879   LearningRate 0.0923   Epoch: 0   Global Step: 13170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:58,805-Speed 9280.93 samples/sec   Loss 12.5216   LearningRate 0.0923   Epoch: 0   Global Step: 13180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:03:59,909-Speed 9286.64 samples/sec   Loss 12.4154   LearningRate 0.0923   Epoch: 0   Global Step: 13190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:01,013-Speed 9279.19 samples/sec   Loss 12.2881   LearningRate 0.0922   Epoch: 0   Global Step: 13200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:02,141-Speed 9084.82 samples/sec   Loss 12.5593   LearningRate 0.0922   Epoch: 0   Global Step: 13210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:03,223-Speed 9469.12 samples/sec   Loss 12.4628   LearningRate 0.0922   Epoch: 0   Global Step: 13220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:04,284-Speed 9664.17 samples/sec   Loss 12.4251   LearningRate 0.0922   Epoch: 0   Global Step: 13230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:05,333-Speed 9762.41 samples/sec   Loss 12.4314   LearningRate 0.0922   Epoch: 0   Global Step: 13240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:06,410-Speed 9510.47 samples/sec   Loss 12.4674   LearningRate 0.0922   Epoch: 0   Global Step: 13250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:07,527-Speed 9174.68 samples/sec   Loss 12.3817   LearningRate 0.0922   Epoch: 0   Global Step: 13260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:08,614-Speed 9426.97 samples/sec   Loss 12.4942   LearningRate 0.0922   Epoch: 0   Global Step: 13270   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:04:09,674-Speed 9664.94 samples/sec   Loss 12.4095   LearningRate 0.0922   Epoch: 0   Global Step: 13280   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:04:10,770-Speed 9347.39 samples/sec   Loss 12.4539   LearningRate 0.0922   Epoch: 0   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:11,870-Speed 9312.27 samples/sec   Loss 12.3728   LearningRate 0.0922   Epoch: 0   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:12,985-Speed 9194.80 samples/sec   Loss 12.5353   LearningRate 0.0922   Epoch: 0   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:14,067-Speed 9463.58 samples/sec   Loss 12.3878   LearningRate 0.0922   Epoch: 0   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:15,119-Speed 9746.22 samples/sec   Loss 12.4562   LearningRate 0.0922   Epoch: 0   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:16,198-Speed 9499.43 samples/sec   Loss 12.3794   LearningRate 0.0922   Epoch: 0   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:17,294-Speed 9348.68 samples/sec   Loss 12.3280   LearningRate 0.0922   Epoch: 0   Global Step: 13350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:18,406-Speed 9206.21 samples/sec   Loss 12.2635   LearningRate 0.0922   Epoch: 0   Global Step: 13360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:19,449-Speed 9832.62 samples/sec   Loss 12.2532   LearningRate 0.0922   Epoch: 0   Global Step: 13370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:20,538-Speed 9404.46 samples/sec   Loss 12.5140   LearningRate 0.0921   Epoch: 0   Global Step: 13380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:21,642-Speed 9282.71 samples/sec   Loss 12.3176   LearningRate 0.0921   Epoch: 0   Global Step: 13390   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:04:22,731-Speed 9405.27 samples/sec   Loss 12.2675   LearningRate 0.0921   Epoch: 0   Global Step: 13400   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:04:23,792-Speed 9655.80 samples/sec   Loss 12.3865   LearningRate 0.0921   Epoch: 0   Global Step: 13410   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:04:24,884-Speed 9383.47 samples/sec   Loss 12.2444   LearningRate 0.0921   Epoch: 0   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:25,964-Speed 9494.42 samples/sec   Loss 12.2592   LearningRate 0.0921   Epoch: 0   Global Step: 13430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:27,041-Speed 9512.86 samples/sec   Loss 12.3091   LearningRate 0.0921   Epoch: 0   Global Step: 13440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:28,114-Speed 9543.64 samples/sec   Loss 12.3033   LearningRate 0.0921   Epoch: 0   Global Step: 13450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:29,170-Speed 9704.74 samples/sec   Loss 12.3247   LearningRate 0.0921   Epoch: 0   Global Step: 13460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:30,228-Speed 9686.80 samples/sec   Loss 12.3368   LearningRate 0.0921   Epoch: 0   Global Step: 13470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:31,277-Speed 9769.81 samples/sec   Loss 12.1808   LearningRate 0.0921   Epoch: 0   Global Step: 13480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:32,351-Speed 9540.58 samples/sec   Loss 12.2779   LearningRate 0.0921   Epoch: 0   Global Step: 13490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:33,465-Speed 9192.21 samples/sec   Loss 12.1764   LearningRate 0.0921   Epoch: 0   Global Step: 13500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:34,563-Speed 9337.22 samples/sec   Loss 12.2099   LearningRate 0.0921   Epoch: 0   Global Step: 13510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:35,653-Speed 9398.97 samples/sec   Loss 12.2138   LearningRate 0.0921   Epoch: 0   Global Step: 13520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:36,699-Speed 9796.70 samples/sec   Loss 12.2397   LearningRate 0.0921   Epoch: 0   Global Step: 13530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:04:37,760-Speed 9650.02 samples/sec   Loss 12.3388   LearningRate 0.0921   Epoch: 0   Global Step: 13540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:04:38,836-Speed 9526.57 samples/sec   Loss 12.2845   LearningRate 0.0920   Epoch: 0   Global Step: 13550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:04:39,908-Speed 9560.48 samples/sec   Loss 12.3474   LearningRate 0.0920   Epoch: 0   Global Step: 13560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:04:40,954-Speed 9790.28 samples/sec   Loss 12.4895   LearningRate 0.0920   Epoch: 0   Global Step: 13570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:04:42,065-Speed 9223.94 samples/sec   Loss 12.3225   LearningRate 0.0920   Epoch: 0   Global Step: 13580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:04:43,150-Speed 9441.76 samples/sec   Loss 12.2434   LearningRate 0.0920   Epoch: 0   Global Step: 13590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:04:44,234-Speed 9456.35 samples/sec   Loss 12.2950   LearningRate 0.0920   Epoch: 0   Global Step: 13600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:04:45,277-Speed 9827.79 samples/sec   Loss 12.3657   LearningRate 0.0920   Epoch: 0   Global Step: 13610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:04:46,361-Speed 9447.55 samples/sec   Loss 12.2772   LearningRate 0.0920   Epoch: 0   Global Step: 13620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:04:47,492-Speed 9065.78 samples/sec   Loss 12.1833   LearningRate 0.0920   Epoch: 0   Global Step: 13630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:48,585-Speed 9373.29 samples/sec   Loss 12.1516   LearningRate 0.0920   Epoch: 0   Global Step: 13640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:49,630-Speed 9802.40 samples/sec   Loss 12.2556   LearningRate 0.0920   Epoch: 0   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:50,659-Speed 9951.83 samples/sec   Loss 12.2326   LearningRate 0.0920   Epoch: 0   Global Step: 13660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:51,740-Speed 9480.17 samples/sec   Loss 12.2588   LearningRate 0.0920   Epoch: 0   Global Step: 13670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:04:52,812-Speed 9558.90 samples/sec   Loss 12.2050   LearningRate 0.0920   Epoch: 0   Global Step: 13680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:04:53,912-Speed 9315.85 samples/sec   Loss 12.4608   LearningRate 0.0920   Epoch: 0   Global Step: 13690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:04:54,946-Speed 9908.13 samples/sec   Loss 12.2377   LearningRate 0.0920   Epoch: 0   Global Step: 13700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:04:55,988-Speed 9835.85 samples/sec   Loss 12.2080   LearningRate 0.0920   Epoch: 0   Global Step: 13710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:04:57,125-Speed 9010.09 samples/sec   Loss 12.3106   LearningRate 0.0919   Epoch: 0   Global Step: 13720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:04:58,252-Speed 9093.45 samples/sec   Loss 12.2522   LearningRate 0.0919   Epoch: 0   Global Step: 13730   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:04:59,311-Speed 9674.27 samples/sec   Loss 12.3416   LearningRate 0.0919   Epoch: 0   Global Step: 13740   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:05:00,437-Speed 9105.72 samples/sec   Loss 12.2248   LearningRate 0.0919   Epoch: 0   Global Step: 13750   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:05:01,501-Speed 9625.03 samples/sec   Loss 12.1997   LearningRate 0.0919   Epoch: 0   Global Step: 13760   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:05:02,603-Speed 9303.68 samples/sec   Loss 12.1260   LearningRate 0.0919   Epoch: 0   Global Step: 13770   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:05:03,693-Speed 9398.00 samples/sec   Loss 12.3195   LearningRate 0.0919   Epoch: 0   Global Step: 13780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:04,795-Speed 9301.63 samples/sec   Loss 12.2425   LearningRate 0.0919   Epoch: 0   Global Step: 13790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:05,897-Speed 9294.73 samples/sec   Loss 12.2130   LearningRate 0.0919   Epoch: 0   Global Step: 13800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:06,953-Speed 9699.81 samples/sec   Loss 12.2542   LearningRate 0.0919   Epoch: 0   Global Step: 13810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:08,050-Speed 9337.38 samples/sec   Loss 12.1573   LearningRate 0.0919   Epoch: 0   Global Step: 13820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:09,112-Speed 9650.57 samples/sec   Loss 12.2083   LearningRate 0.0919   Epoch: 0   Global Step: 13830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:10,191-Speed 9499.17 samples/sec   Loss 12.3091   LearningRate 0.0919   Epoch: 0   Global Step: 13840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:11,265-Speed 9539.54 samples/sec   Loss 12.2635   LearningRate 0.0919   Epoch: 0   Global Step: 13850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:05:12,357-Speed 9380.56 samples/sec   Loss 12.1053   LearningRate 0.0919   Epoch: 0   Global Step: 13860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:05:13,451-Speed 9366.60 samples/sec   Loss 12.2629   LearningRate 0.0919   Epoch: 0   Global Step: 13870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:05:14,547-Speed 9349.19 samples/sec   Loss 12.2397   LearningRate 0.0919   Epoch: 0   Global Step: 13880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:05:15,617-Speed 9575.21 samples/sec   Loss 12.2242   LearningRate 0.0919   Epoch: 0   Global Step: 13890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:05:16,689-Speed 9566.63 samples/sec   Loss 12.4194   LearningRate 0.0918   Epoch: 0   Global Step: 13900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:05:17,781-Speed 9377.92 samples/sec   Loss 12.1494   LearningRate 0.0918   Epoch: 0   Global Step: 13910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:05:18,867-Speed 9432.33 samples/sec   Loss 12.2492   LearningRate 0.0918   Epoch: 0   Global Step: 13920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:05:19,945-Speed 9512.05 samples/sec   Loss 12.2088   LearningRate 0.0918   Epoch: 0   Global Step: 13930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:05:21,019-Speed 9536.10 samples/sec   Loss 12.3089   LearningRate 0.0918   Epoch: 0   Global Step: 13940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:05:22,068-Speed 9768.38 samples/sec   Loss 12.2258   LearningRate 0.0918   Epoch: 0   Global Step: 13950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:23,153-Speed 9444.85 samples/sec   Loss 12.0991   LearningRate 0.0918   Epoch: 0   Global Step: 13960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:24,234-Speed 9473.66 samples/sec   Loss 12.0774   LearningRate 0.0918   Epoch: 0   Global Step: 13970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:25,300-Speed 9618.44 samples/sec   Loss 12.1688   LearningRate 0.0918   Epoch: 0   Global Step: 13980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:26,427-Speed 9093.56 samples/sec   Loss 12.2206   LearningRate 0.0918   Epoch: 0   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:27,540-Speed 9203.34 samples/sec   Loss 11.9706   LearningRate 0.0918   Epoch: 0   Global Step: 14000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:05:49,451-[lfw][14000]XNorm: 14.270774
Training: 2022-04-11 12:05:49,452-[lfw][14000]Accuracy-Flip: 0.99317+-0.00369
Training: 2022-04-11 12:05:49,452-[lfw][14000]Accuracy-Highest: 0.99317
Training: 2022-04-11 12:06:14,824-[cfp_fp][14000]XNorm: 12.275331
Training: 2022-04-11 12:06:14,824-[cfp_fp][14000]Accuracy-Flip: 0.91457+-0.01183
Training: 2022-04-11 12:06:14,825-[cfp_fp][14000]Accuracy-Highest: 0.91457
Training: 2022-04-11 12:06:36,719-[agedb_30][14000]XNorm: 13.830484
Training: 2022-04-11 12:06:36,719-[agedb_30][14000]Accuracy-Flip: 0.93083+-0.01480
Training: 2022-04-11 12:06:36,720-[agedb_30][14000]Accuracy-Highest: 0.93083
Training: 2022-04-11 12:06:37,792-Speed 145.76 samples/sec   Loss 12.1851   LearningRate 0.0918   Epoch: 0   Global Step: 14010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:38,878-Speed 9439.85 samples/sec   Loss 12.0886   LearningRate 0.0918   Epoch: 0   Global Step: 14020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:39,999-Speed 9136.15 samples/sec   Loss 12.1327   LearningRate 0.0918   Epoch: 0   Global Step: 14030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:41,096-Speed 9341.66 samples/sec   Loss 12.2180   LearningRate 0.0918   Epoch: 0   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:42,185-Speed 9414.59 samples/sec   Loss 12.1815   LearningRate 0.0918   Epoch: 0   Global Step: 14050   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:06:43,278-Speed 9374.53 samples/sec   Loss 12.1006   LearningRate 0.0918   Epoch: 0   Global Step: 14060   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:06:44,379-Speed 9298.68 samples/sec   Loss 12.0662   LearningRate 0.0917   Epoch: 0   Global Step: 14070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:45,443-Speed 9631.75 samples/sec   Loss 12.1563   LearningRate 0.0917   Epoch: 0   Global Step: 14080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:46,538-Speed 9357.25 samples/sec   Loss 12.1743   LearningRate 0.0917   Epoch: 0   Global Step: 14090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:47,637-Speed 9325.88 samples/sec   Loss 12.1596   LearningRate 0.0917   Epoch: 0   Global Step: 14100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:48,729-Speed 9377.14 samples/sec   Loss 12.2693   LearningRate 0.0917   Epoch: 0   Global Step: 14110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:49,813-Speed 9456.04 samples/sec   Loss 12.1877   LearningRate 0.0917   Epoch: 0   Global Step: 14120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:50,852-Speed 9864.61 samples/sec   Loss 12.1650   LearningRate 0.0917   Epoch: 0   Global Step: 14130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:06:51,919-Speed 9603.92 samples/sec   Loss 12.1866   LearningRate 0.0917   Epoch: 0   Global Step: 14140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:06:52,991-Speed 9550.23 samples/sec   Loss 12.1826   LearningRate 0.0917   Epoch: 0   Global Step: 14150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:06:54,083-Speed 9389.14 samples/sec   Loss 12.2008   LearningRate 0.0917   Epoch: 0   Global Step: 14160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:06:55,209-Speed 9092.22 samples/sec   Loss 12.0511   LearningRate 0.0917   Epoch: 0   Global Step: 14170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:06:56,266-Speed 9702.17 samples/sec   Loss 12.1455   LearningRate 0.0917   Epoch: 0   Global Step: 14180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:06:57,363-Speed 9337.49 samples/sec   Loss 12.2032   LearningRate 0.0917   Epoch: 0   Global Step: 14190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:06:58,427-Speed 9628.37 samples/sec   Loss 12.1471   LearningRate 0.0917   Epoch: 0   Global Step: 14200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:06:59,468-Speed 9846.12 samples/sec   Loss 12.1746   LearningRate 0.0917   Epoch: 0   Global Step: 14210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:00,575-Speed 9255.10 samples/sec   Loss 12.0544   LearningRate 0.0917   Epoch: 0   Global Step: 14220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:01,656-Speed 9483.47 samples/sec   Loss 12.0907   LearningRate 0.0917   Epoch: 0   Global Step: 14230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:02,723-Speed 9599.99 samples/sec   Loss 12.0712   LearningRate 0.0917   Epoch: 0   Global Step: 14240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:03,776-Speed 9733.64 samples/sec   Loss 12.0033   LearningRate 0.0916   Epoch: 0   Global Step: 14250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:04,846-Speed 9571.68 samples/sec   Loss 12.1213   LearningRate 0.0916   Epoch: 0   Global Step: 14260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:05,909-Speed 9636.49 samples/sec   Loss 11.9063   LearningRate 0.0916   Epoch: 0   Global Step: 14270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:06,951-Speed 9833.98 samples/sec   Loss 12.0486   LearningRate 0.0916   Epoch: 0   Global Step: 14280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:08,021-Speed 9577.69 samples/sec   Loss 12.0914   LearningRate 0.0916   Epoch: 0   Global Step: 14290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:09,092-Speed 9568.65 samples/sec   Loss 12.0334   LearningRate 0.0916   Epoch: 0   Global Step: 14300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:10,191-Speed 9326.13 samples/sec   Loss 12.1311   LearningRate 0.0916   Epoch: 0   Global Step: 14310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:11,298-Speed 9252.35 samples/sec   Loss 11.8749   LearningRate 0.0916   Epoch: 0   Global Step: 14320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:12,382-Speed 9455.57 samples/sec   Loss 11.9627   LearningRate 0.0916   Epoch: 0   Global Step: 14330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:13,517-Speed 9024.39 samples/sec   Loss 12.1669   LearningRate 0.0916   Epoch: 0   Global Step: 14340   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:07:14,652-Speed 9027.00 samples/sec   Loss 11.9712   LearningRate 0.0916   Epoch: 0   Global Step: 14350   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:07:15,762-Speed 9225.02 samples/sec   Loss 12.1143   LearningRate 0.0916   Epoch: 0   Global Step: 14360   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:07:16,841-Speed 9498.70 samples/sec   Loss 12.1133   LearningRate 0.0916   Epoch: 0   Global Step: 14370   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:07:17,918-Speed 9519.40 samples/sec   Loss 12.1661   LearningRate 0.0916   Epoch: 0   Global Step: 14380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:19,025-Speed 9259.16 samples/sec   Loss 11.9875   LearningRate 0.0916   Epoch: 0   Global Step: 14390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:20,116-Speed 9392.08 samples/sec   Loss 12.1295   LearningRate 0.0916   Epoch: 0   Global Step: 14400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:21,227-Speed 9224.15 samples/sec   Loss 12.0643   LearningRate 0.0916   Epoch: 0   Global Step: 14410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:22,270-Speed 9824.41 samples/sec   Loss 12.0729   LearningRate 0.0915   Epoch: 0   Global Step: 14420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:23,364-Speed 9360.01 samples/sec   Loss 12.0137   LearningRate 0.0915   Epoch: 0   Global Step: 14430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:24,424-Speed 9664.40 samples/sec   Loss 12.0708   LearningRate 0.0915   Epoch: 0   Global Step: 14440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:25,580-Speed 8868.37 samples/sec   Loss 12.0676   LearningRate 0.0915   Epoch: 0   Global Step: 14450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:26,642-Speed 9650.71 samples/sec   Loss 12.1367   LearningRate 0.0915   Epoch: 0   Global Step: 14460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:27,710-Speed 9592.83 samples/sec   Loss 12.1241   LearningRate 0.0915   Epoch: 0   Global Step: 14470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:28,766-Speed 9701.38 samples/sec   Loss 12.0971   LearningRate 0.0915   Epoch: 0   Global Step: 14480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:29,825-Speed 9669.07 samples/sec   Loss 12.1021   LearningRate 0.0915   Epoch: 0   Global Step: 14490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:30,862-Speed 9878.44 samples/sec   Loss 12.1350   LearningRate 0.0915   Epoch: 0   Global Step: 14500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:31,991-Speed 9081.52 samples/sec   Loss 12.1066   LearningRate 0.0915   Epoch: 0   Global Step: 14510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:33,090-Speed 9320.31 samples/sec   Loss 12.1014   LearningRate 0.0915   Epoch: 0   Global Step: 14520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:34,123-Speed 9916.79 samples/sec   Loss 11.9311   LearningRate 0.0915   Epoch: 0   Global Step: 14530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:35,270-Speed 8930.63 samples/sec   Loss 12.0781   LearningRate 0.0915   Epoch: 0   Global Step: 14540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:36,373-Speed 9290.52 samples/sec   Loss 12.1560   LearningRate 0.0915   Epoch: 0   Global Step: 14550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:37,471-Speed 9333.95 samples/sec   Loss 11.9866   LearningRate 0.0915   Epoch: 0   Global Step: 14560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:38,556-Speed 9491.90 samples/sec   Loss 11.9248   LearningRate 0.0915   Epoch: 0   Global Step: 14570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:39,636-Speed 9482.42 samples/sec   Loss 12.0374   LearningRate 0.0915   Epoch: 0   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:40,699-Speed 9640.81 samples/sec   Loss 12.0464   LearningRate 0.0914   Epoch: 0   Global Step: 14590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:41,800-Speed 9309.43 samples/sec   Loss 11.9838   LearningRate 0.0914   Epoch: 0   Global Step: 14600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:42,873-Speed 9549.12 samples/sec   Loss 11.9571   LearningRate 0.0914   Epoch: 0   Global Step: 14610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:43,974-Speed 9298.11 samples/sec   Loss 11.8206   LearningRate 0.0914   Epoch: 0   Global Step: 14620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:45,055-Speed 9482.97 samples/sec   Loss 11.9890   LearningRate 0.0914   Epoch: 0   Global Step: 14630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:46,108-Speed 9727.07 samples/sec   Loss 12.0158   LearningRate 0.0914   Epoch: 0   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:47,185-Speed 9513.05 samples/sec   Loss 12.0857   LearningRate 0.0914   Epoch: 0   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:07:48,244-Speed 9680.53 samples/sec   Loss 11.9804   LearningRate 0.0914   Epoch: 0   Global Step: 14660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:49,313-Speed 9583.33 samples/sec   Loss 11.9114   LearningRate 0.0914   Epoch: 0   Global Step: 14670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:50,417-Speed 9283.98 samples/sec   Loss 11.9844   LearningRate 0.0914   Epoch: 0   Global Step: 14680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:51,498-Speed 9478.06 samples/sec   Loss 12.0201   LearningRate 0.0914   Epoch: 0   Global Step: 14690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:52,552-Speed 9714.60 samples/sec   Loss 12.0135   LearningRate 0.0914   Epoch: 0   Global Step: 14700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:53,618-Speed 9611.91 samples/sec   Loss 11.9690   LearningRate 0.0914   Epoch: 0   Global Step: 14710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:54,706-Speed 9420.67 samples/sec   Loss 12.0423   LearningRate 0.0914   Epoch: 0   Global Step: 14720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:55,778-Speed 9567.49 samples/sec   Loss 12.0172   LearningRate 0.0914   Epoch: 0   Global Step: 14730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:56,850-Speed 9553.11 samples/sec   Loss 11.8900   LearningRate 0.0914   Epoch: 0   Global Step: 14740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:57,970-Speed 9150.76 samples/sec   Loss 12.0095   LearningRate 0.0914   Epoch: 0   Global Step: 14750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:07:59,075-Speed 9271.83 samples/sec   Loss 11.9094   LearningRate 0.0914   Epoch: 0   Global Step: 14760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:00,174-Speed 9320.80 samples/sec   Loss 11.9992   LearningRate 0.0913   Epoch: 0   Global Step: 14770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:01,259-Speed 9441.26 samples/sec   Loss 11.8873   LearningRate 0.0913   Epoch: 0   Global Step: 14780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:02,335-Speed 9520.54 samples/sec   Loss 11.9921   LearningRate 0.0913   Epoch: 0   Global Step: 14790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:03,396-Speed 9664.13 samples/sec   Loss 12.0396   LearningRate 0.0913   Epoch: 0   Global Step: 14800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:04,475-Speed 9491.78 samples/sec   Loss 11.9169   LearningRate 0.0913   Epoch: 0   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:05,539-Speed 9627.74 samples/sec   Loss 11.8847   LearningRate 0.0913   Epoch: 0   Global Step: 14820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:06,627-Speed 9421.50 samples/sec   Loss 11.9643   LearningRate 0.0913   Epoch: 0   Global Step: 14830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:07,677-Speed 9757.95 samples/sec   Loss 11.9326   LearningRate 0.0913   Epoch: 0   Global Step: 14840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:08,765-Speed 9414.88 samples/sec   Loss 12.0292   LearningRate 0.0913   Epoch: 0   Global Step: 14850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:09,864-Speed 9327.84 samples/sec   Loss 11.8849   LearningRate 0.0913   Epoch: 0   Global Step: 14860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:10,971-Speed 9252.86 samples/sec   Loss 11.8783   LearningRate 0.0913   Epoch: 0   Global Step: 14870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:12,097-Speed 9096.05 samples/sec   Loss 11.9811   LearningRate 0.0913   Epoch: 0   Global Step: 14880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:13,172-Speed 9531.82 samples/sec   Loss 12.0979   LearningRate 0.0913   Epoch: 0   Global Step: 14890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:14,240-Speed 9595.62 samples/sec   Loss 11.9962   LearningRate 0.0913   Epoch: 0   Global Step: 14900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:15,308-Speed 9590.79 samples/sec   Loss 11.8617   LearningRate 0.0913   Epoch: 0   Global Step: 14910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:16,407-Speed 9327.54 samples/sec   Loss 11.9993   LearningRate 0.0913   Epoch: 0   Global Step: 14920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:17,501-Speed 9361.25 samples/sec   Loss 11.7845   LearningRate 0.0913   Epoch: 0   Global Step: 14930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:18,550-Speed 9771.96 samples/sec   Loss 11.9204   LearningRate 0.0912   Epoch: 0   Global Step: 14940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:19,585-Speed 9896.14 samples/sec   Loss 11.9193   LearningRate 0.0912   Epoch: 0   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:20,667-Speed 9476.07 samples/sec   Loss 12.0000   LearningRate 0.0912   Epoch: 0   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:21,773-Speed 9263.56 samples/sec   Loss 11.9422   LearningRate 0.0912   Epoch: 0   Global Step: 14970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:22,834-Speed 9655.99 samples/sec   Loss 11.7482   LearningRate 0.0912   Epoch: 0   Global Step: 14980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:23,880-Speed 9794.66 samples/sec   Loss 11.9768   LearningRate 0.0912   Epoch: 0   Global Step: 14990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:24,975-Speed 9353.23 samples/sec   Loss 11.8133   LearningRate 0.0912   Epoch: 0   Global Step: 15000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:26,043-Speed 9602.16 samples/sec   Loss 11.8807   LearningRate 0.0912   Epoch: 0   Global Step: 15010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:27,144-Speed 9306.10 samples/sec   Loss 11.9748   LearningRate 0.0912   Epoch: 0   Global Step: 15020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:28,186-Speed 9829.62 samples/sec   Loss 11.8927   LearningRate 0.0912   Epoch: 0   Global Step: 15030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:29,355-Speed 8762.29 samples/sec   Loss 12.0114   LearningRate 0.0912   Epoch: 0   Global Step: 15040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:30,434-Speed 9498.33 samples/sec   Loss 11.8390   LearningRate 0.0912   Epoch: 0   Global Step: 15050   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:08:31,531-Speed 9344.76 samples/sec   Loss 11.9419   LearningRate 0.0912   Epoch: 0   Global Step: 15060   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:08:32,573-Speed 9829.69 samples/sec   Loss 11.9942   LearningRate 0.0912   Epoch: 0   Global Step: 15070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:33,666-Speed 9371.63 samples/sec   Loss 11.8170   LearningRate 0.0912   Epoch: 0   Global Step: 15080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:34,776-Speed 9236.08 samples/sec   Loss 11.8859   LearningRate 0.0912   Epoch: 0   Global Step: 15090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:35,839-Speed 9640.65 samples/sec   Loss 11.7871   LearningRate 0.0912   Epoch: 0   Global Step: 15100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:36,935-Speed 9350.39 samples/sec   Loss 11.8138   LearningRate 0.0912   Epoch: 0   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:38,018-Speed 9454.04 samples/sec   Loss 11.7433   LearningRate 0.0911   Epoch: 0   Global Step: 15120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:39,105-Speed 9436.33 samples/sec   Loss 11.9279   LearningRate 0.0911   Epoch: 0   Global Step: 15130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:40,158-Speed 9727.93 samples/sec   Loss 11.9021   LearningRate 0.0911   Epoch: 0   Global Step: 15140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:41,178-Speed 10043.41 samples/sec   Loss 11.7625   LearningRate 0.0911   Epoch: 0   Global Step: 15150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:42,246-Speed 9591.58 samples/sec   Loss 11.8297   LearningRate 0.0911   Epoch: 0   Global Step: 15160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:43,346-Speed 9310.99 samples/sec   Loss 11.6465   LearningRate 0.0911   Epoch: 0   Global Step: 15170   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:08:44,432-Speed 9440.04 samples/sec   Loss 11.7528   LearningRate 0.0911   Epoch: 0   Global Step: 15180   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:08:45,518-Speed 9435.08 samples/sec   Loss 11.7531   LearningRate 0.0911   Epoch: 0   Global Step: 15190   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:08:46,606-Speed 9414.81 samples/sec   Loss 11.8311   LearningRate 0.0911   Epoch: 0   Global Step: 15200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:47,707-Speed 9307.07 samples/sec   Loss 11.7752   LearningRate 0.0911   Epoch: 0   Global Step: 15210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:48,765-Speed 9681.07 samples/sec   Loss 11.8102   LearningRate 0.0911   Epoch: 0   Global Step: 15220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:08:49,889-Speed 9117.20 samples/sec   Loss 11.6840   LearningRate 0.0911   Epoch: 0   Global Step: 15230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:50,964-Speed 9533.02 samples/sec   Loss 11.9313   LearningRate 0.0911   Epoch: 0   Global Step: 15240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:52,037-Speed 9551.80 samples/sec   Loss 11.7891   LearningRate 0.0911   Epoch: 0   Global Step: 15250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:53,114-Speed 9505.80 samples/sec   Loss 11.9617   LearningRate 0.0911   Epoch: 0   Global Step: 15260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:54,188-Speed 9546.35 samples/sec   Loss 11.6927   LearningRate 0.0911   Epoch: 0   Global Step: 15270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:55,261-Speed 9542.66 samples/sec   Loss 11.8284   LearningRate 0.0911   Epoch: 0   Global Step: 15280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:56,405-Speed 8967.02 samples/sec   Loss 11.7464   LearningRate 0.0910   Epoch: 0   Global Step: 15290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:57,546-Speed 8974.64 samples/sec   Loss 11.9094   LearningRate 0.0910   Epoch: 0   Global Step: 15300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:58,618-Speed 9558.43 samples/sec   Loss 11.7717   LearningRate 0.0910   Epoch: 0   Global Step: 15310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:08:59,693-Speed 9529.37 samples/sec   Loss 11.8900   LearningRate 0.0910   Epoch: 0   Global Step: 15320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:09:00,777-Speed 9457.06 samples/sec   Loss 11.9040   LearningRate 0.0910   Epoch: 0   Global Step: 15330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:09:01,852-Speed 9533.45 samples/sec   Loss 11.7788   LearningRate 0.0910   Epoch: 0   Global Step: 15340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:09:02,955-Speed 9287.06 samples/sec   Loss 11.7600   LearningRate 0.0910   Epoch: 0   Global Step: 15350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:09:04,056-Speed 9299.72 samples/sec   Loss 11.8806   LearningRate 0.0910   Epoch: 0   Global Step: 15360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:09:05,174-Speed 9164.59 samples/sec   Loss 11.8045   LearningRate 0.0910   Epoch: 0   Global Step: 15370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:09:06,232-Speed 9684.43 samples/sec   Loss 11.7220   LearningRate 0.0910   Epoch: 0   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:09:07,273-Speed 9846.24 samples/sec   Loss 11.7334   LearningRate 0.0910   Epoch: 0   Global Step: 15390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:09:08,315-Speed 9837.85 samples/sec   Loss 11.7859   LearningRate 0.0910   Epoch: 0   Global Step: 15400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:09:09,401-Speed 9432.66 samples/sec   Loss 11.8469   LearningRate 0.0910   Epoch: 0   Global Step: 15410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:09:10,482-Speed 9481.51 samples/sec   Loss 11.7805   LearningRate 0.0910   Epoch: 0   Global Step: 15420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:09:11,547-Speed 9612.60 samples/sec   Loss 11.7125   LearningRate 0.0910   Epoch: 0   Global Step: 15430   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:09:12,601-Speed 9725.84 samples/sec   Loss 11.7514   LearningRate 0.0910   Epoch: 0   Global Step: 15440   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:09:13,646-Speed 9803.51 samples/sec   Loss 11.8582   LearningRate 0.0910   Epoch: 0   Global Step: 15450   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:09:14,745-Speed 9319.55 samples/sec   Loss 11.7563   LearningRate 0.0910   Epoch: 0   Global Step: 15460   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:09:15,837-Speed 9390.55 samples/sec   Loss 11.8585   LearningRate 0.0909   Epoch: 0   Global Step: 15470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:16,907-Speed 9576.72 samples/sec   Loss 11.8731   LearningRate 0.0909   Epoch: 0   Global Step: 15480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:17,957-Speed 9750.18 samples/sec   Loss 11.6198   LearningRate 0.0909   Epoch: 0   Global Step: 15490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:19,026-Speed 9586.69 samples/sec   Loss 11.6308   LearningRate 0.0909   Epoch: 0   Global Step: 15500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:20,147-Speed 9144.96 samples/sec   Loss 11.7820   LearningRate 0.0909   Epoch: 0   Global Step: 15510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:21,237-Speed 9402.97 samples/sec   Loss 11.7360   LearningRate 0.0909   Epoch: 0   Global Step: 15520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:22,285-Speed 9774.21 samples/sec   Loss 11.7495   LearningRate 0.0909   Epoch: 0   Global Step: 15530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:23,395-Speed 9230.41 samples/sec   Loss 11.6290   LearningRate 0.0909   Epoch: 0   Global Step: 15540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:24,471-Speed 9518.94 samples/sec   Loss 11.5500   LearningRate 0.0909   Epoch: 0   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:25,559-Speed 9422.05 samples/sec   Loss 11.6793   LearningRate 0.0909   Epoch: 0   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:26,641-Speed 9472.87 samples/sec   Loss 11.7836   LearningRate 0.0909   Epoch: 0   Global Step: 15570   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:09:27,713-Speed 9551.93 samples/sec   Loss 11.5937   LearningRate 0.0909   Epoch: 0   Global Step: 15580   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:09:28,800-Speed 9425.22 samples/sec   Loss 11.7034   LearningRate 0.0909   Epoch: 0   Global Step: 15590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:29,863-Speed 9638.29 samples/sec   Loss 11.8234   LearningRate 0.0909   Epoch: 0   Global Step: 15600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:09:30,952-Speed 9409.65 samples/sec   Loss 11.6990   LearningRate 0.0909   Epoch: 0   Global Step: 15610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:09:32,029-Speed 9513.59 samples/sec   Loss 11.6444   LearningRate 0.0909   Epoch: 0   Global Step: 15620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:09:33,088-Speed 9677.93 samples/sec   Loss 11.8032   LearningRate 0.0909   Epoch: 0   Global Step: 15630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:09:34,150-Speed 9640.80 samples/sec   Loss 11.6995   LearningRate 0.0908   Epoch: 0   Global Step: 15640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:09:35,314-Speed 8809.60 samples/sec   Loss 11.7043   LearningRate 0.0908   Epoch: 0   Global Step: 15650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:09:36,408-Speed 9363.95 samples/sec   Loss 11.7226   LearningRate 0.0908   Epoch: 0   Global Step: 15660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:09:37,530-Speed 9135.31 samples/sec   Loss 11.8005   LearningRate 0.0908   Epoch: 0   Global Step: 15670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:09:38,616-Speed 9433.07 samples/sec   Loss 11.7666   LearningRate 0.0908   Epoch: 0   Global Step: 15680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:09:39,695-Speed 9500.62 samples/sec   Loss 11.6816   LearningRate 0.0908   Epoch: 0   Global Step: 15690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:09:40,755-Speed 9667.19 samples/sec   Loss 11.7433   LearningRate 0.0908   Epoch: 0   Global Step: 15700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:41,806-Speed 9748.69 samples/sec   Loss 11.6522   LearningRate 0.0908   Epoch: 0   Global Step: 15710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:42,885-Speed 9497.56 samples/sec   Loss 11.5866   LearningRate 0.0908   Epoch: 0   Global Step: 15720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:43,975-Speed 9399.40 samples/sec   Loss 11.6595   LearningRate 0.0908   Epoch: 0   Global Step: 15730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:45,053-Speed 9504.05 samples/sec   Loss 11.7896   LearningRate 0.0908   Epoch: 0   Global Step: 15740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:46,121-Speed 9593.60 samples/sec   Loss 11.6122   LearningRate 0.0908   Epoch: 0   Global Step: 15750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:47,190-Speed 9585.49 samples/sec   Loss 11.6318   LearningRate 0.0908   Epoch: 0   Global Step: 15760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:48,286-Speed 9344.19 samples/sec   Loss 11.5868   LearningRate 0.0908   Epoch: 0   Global Step: 15770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:49,417-Speed 9064.90 samples/sec   Loss 11.7618   LearningRate 0.0908   Epoch: 0   Global Step: 15780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:50,483-Speed 9607.79 samples/sec   Loss 11.7001   LearningRate 0.0908   Epoch: 0   Global Step: 15790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:51,575-Speed 9383.82 samples/sec   Loss 11.7093   LearningRate 0.0908   Epoch: 0   Global Step: 15800   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:09:52,630-Speed 9706.38 samples/sec   Loss 11.6994   LearningRate 0.0908   Epoch: 0   Global Step: 15810   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:09:53,772-Speed 8978.41 samples/sec   Loss 11.5868   LearningRate 0.0907   Epoch: 0   Global Step: 15820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:54,895-Speed 9120.93 samples/sec   Loss 11.6608   LearningRate 0.0907   Epoch: 0   Global Step: 15830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:55,946-Speed 9755.28 samples/sec   Loss 11.7203   LearningRate 0.0907   Epoch: 0   Global Step: 15840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:57,035-Speed 9408.76 samples/sec   Loss 11.5887   LearningRate 0.0907   Epoch: 0   Global Step: 15850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:58,097-Speed 9643.99 samples/sec   Loss 11.5577   LearningRate 0.0907   Epoch: 0   Global Step: 15860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:09:59,180-Speed 9459.69 samples/sec   Loss 11.5677   LearningRate 0.0907   Epoch: 0   Global Step: 15870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:10:00,255-Speed 9533.60 samples/sec   Loss 11.6617   LearningRate 0.0907   Epoch: 0   Global Step: 15880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:10:01,304-Speed 9769.46 samples/sec   Loss 11.7082   LearningRate 0.0907   Epoch: 0   Global Step: 15890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:10:02,367-Speed 9634.32 samples/sec   Loss 11.7516   LearningRate 0.0907   Epoch: 0   Global Step: 15900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:10:03,450-Speed 9467.17 samples/sec   Loss 11.5323   LearningRate 0.0907   Epoch: 0   Global Step: 15910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:10:04,523-Speed 9542.61 samples/sec   Loss 11.6315   LearningRate 0.0907   Epoch: 0   Global Step: 15920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:10:05,592-Speed 9586.49 samples/sec   Loss 11.5522   LearningRate 0.0907   Epoch: 0   Global Step: 15930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:10:06,671-Speed 9490.80 samples/sec   Loss 11.6365   LearningRate 0.0907   Epoch: 0   Global Step: 15940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:10:07,738-Speed 9602.56 samples/sec   Loss 11.6783   LearningRate 0.0907   Epoch: 0   Global Step: 15950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:10:08,816-Speed 9511.37 samples/sec   Loss 11.5670   LearningRate 0.0907   Epoch: 0   Global Step: 15960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:10:09,920-Speed 9278.36 samples/sec   Loss 11.5745   LearningRate 0.0907   Epoch: 0   Global Step: 15970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:10:10,961-Speed 9850.14 samples/sec   Loss 11.7599   LearningRate 0.0907   Epoch: 0   Global Step: 15980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:10:12,047-Speed 9428.13 samples/sec   Loss 11.6846   LearningRate 0.0906   Epoch: 0   Global Step: 15990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:10:13,183-Speed 9024.81 samples/sec   Loss 11.4858   LearningRate 0.0906   Epoch: 0   Global Step: 16000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:10:35,454-[lfw][16000]XNorm: 14.390814
Training: 2022-04-11 12:10:35,455-[lfw][16000]Accuracy-Flip: 0.99267+-0.00448
Training: 2022-04-11 12:10:35,455-[lfw][16000]Accuracy-Highest: 0.99317
Training: 2022-04-11 12:11:01,059-[cfp_fp][16000]XNorm: 12.355993
Training: 2022-04-11 12:11:01,060-[cfp_fp][16000]Accuracy-Flip: 0.91700+-0.01248
Training: 2022-04-11 12:11:01,060-[cfp_fp][16000]Accuracy-Highest: 0.91700
Training: 2022-04-11 12:11:23,035-[agedb_30][16000]XNorm: 13.844527
Training: 2022-04-11 12:11:23,036-[agedb_30][16000]Accuracy-Flip: 0.92967+-0.01641
Training: 2022-04-11 12:11:23,037-[agedb_30][16000]Accuracy-Highest: 0.93083
Training: 2022-04-11 12:11:24,106-Speed 144.38 samples/sec   Loss 11.5254   LearningRate 0.0906   Epoch: 0   Global Step: 16010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:25,182-Speed 9518.83 samples/sec   Loss 11.5666   LearningRate 0.0906   Epoch: 0   Global Step: 16020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:26,258-Speed 9522.16 samples/sec   Loss 11.6676   LearningRate 0.0906   Epoch: 0   Global Step: 16030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:27,360-Speed 9297.44 samples/sec   Loss 11.5352   LearningRate 0.0906   Epoch: 0   Global Step: 16040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:28,436-Speed 9520.91 samples/sec   Loss 11.6476   LearningRate 0.0906   Epoch: 0   Global Step: 16050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:29,487-Speed 9748.89 samples/sec   Loss 11.6504   LearningRate 0.0906   Epoch: 0   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:30,600-Speed 9206.23 samples/sec   Loss 11.6341   LearningRate 0.0906   Epoch: 0   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:31,677-Speed 9516.10 samples/sec   Loss 11.6039   LearningRate 0.0906   Epoch: 0   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:32,718-Speed 9842.04 samples/sec   Loss 11.5709   LearningRate 0.0906   Epoch: 0   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:33,785-Speed 9605.92 samples/sec   Loss 11.7156   LearningRate 0.0906   Epoch: 0   Global Step: 16100   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:11:34,864-Speed 9490.71 samples/sec   Loss 11.5102   LearningRate 0.0906   Epoch: 0   Global Step: 16110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:11:35,958-Speed 9363.99 samples/sec   Loss 11.6546   LearningRate 0.0906   Epoch: 0   Global Step: 16120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:11:37,016-Speed 9684.65 samples/sec   Loss 11.7099   LearningRate 0.0906   Epoch: 0   Global Step: 16130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:11:38,100-Speed 9455.04 samples/sec   Loss 11.6172   LearningRate 0.0906   Epoch: 0   Global Step: 16140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:11:39,186-Speed 9431.37 samples/sec   Loss 11.7497   LearningRate 0.0906   Epoch: 0   Global Step: 16150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:11:40,270-Speed 9463.06 samples/sec   Loss 11.5953   LearningRate 0.0906   Epoch: 0   Global Step: 16160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:11:41,355-Speed 9439.49 samples/sec   Loss 11.6466   LearningRate 0.0905   Epoch: 0   Global Step: 16170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:11:42,475-Speed 9145.86 samples/sec   Loss 11.5153   LearningRate 0.0905   Epoch: 0   Global Step: 16180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:11:43,557-Speed 9478.36 samples/sec   Loss 11.5628   LearningRate 0.0905   Epoch: 0   Global Step: 16190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:11:44,608-Speed 9745.53 samples/sec   Loss 11.5774   LearningRate 0.0905   Epoch: 0   Global Step: 16200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:11:45,655-Speed 9782.54 samples/sec   Loss 11.5766   LearningRate 0.0905   Epoch: 0   Global Step: 16210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:46,733-Speed 9515.31 samples/sec   Loss 11.6465   LearningRate 0.0905   Epoch: 0   Global Step: 16220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:47,821-Speed 9412.28 samples/sec   Loss 11.5418   LearningRate 0.0905   Epoch: 0   Global Step: 16230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:48,886-Speed 9621.17 samples/sec   Loss 11.5928   LearningRate 0.0905   Epoch: 0   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:49,978-Speed 9383.35 samples/sec   Loss 11.6317   LearningRate 0.0905   Epoch: 0   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:51,065-Speed 9421.16 samples/sec   Loss 11.5658   LearningRate 0.0905   Epoch: 0   Global Step: 16260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:52,132-Speed 9605.23 samples/sec   Loss 11.6592   LearningRate 0.0905   Epoch: 0   Global Step: 16270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:53,195-Speed 9636.53 samples/sec   Loss 11.5533   LearningRate 0.0905   Epoch: 0   Global Step: 16280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:54,319-Speed 9119.30 samples/sec   Loss 11.5611   LearningRate 0.0905   Epoch: 0   Global Step: 16290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:55,363-Speed 9813.63 samples/sec   Loss 11.4909   LearningRate 0.0905   Epoch: 0   Global Step: 16300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:56,421-Speed 9683.71 samples/sec   Loss 11.5292   LearningRate 0.0905   Epoch: 0   Global Step: 16310   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:11:57,502-Speed 9473.01 samples/sec   Loss 11.4926   LearningRate 0.0905   Epoch: 0   Global Step: 16320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:58,614-Speed 9220.50 samples/sec   Loss 11.6181   LearningRate 0.0905   Epoch: 0   Global Step: 16330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:11:59,700-Speed 9433.08 samples/sec   Loss 11.5920   LearningRate 0.0904   Epoch: 0   Global Step: 16340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:00,778-Speed 9502.50 samples/sec   Loss 11.4260   LearningRate 0.0904   Epoch: 0   Global Step: 16350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:01,862-Speed 9455.71 samples/sec   Loss 11.4515   LearningRate 0.0904   Epoch: 0   Global Step: 16360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:02,955-Speed 9370.22 samples/sec   Loss 11.4889   LearningRate 0.0904   Epoch: 0   Global Step: 16370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:04,060-Speed 9274.30 samples/sec   Loss 11.5285   LearningRate 0.0904   Epoch: 0   Global Step: 16380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:05,133-Speed 9553.77 samples/sec   Loss 11.4439   LearningRate 0.0904   Epoch: 0   Global Step: 16390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:06,215-Speed 9468.09 samples/sec   Loss 11.5055   LearningRate 0.0904   Epoch: 0   Global Step: 16400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:07,280-Speed 9619.67 samples/sec   Loss 11.5291   LearningRate 0.0904   Epoch: 0   Global Step: 16410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:08,384-Speed 9278.28 samples/sec   Loss 11.3905   LearningRate 0.0904   Epoch: 0   Global Step: 16420   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:12:09,481-Speed 9341.17 samples/sec   Loss 11.5184   LearningRate 0.0904   Epoch: 0   Global Step: 16430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:10,569-Speed 9419.88 samples/sec   Loss 11.5696   LearningRate 0.0904   Epoch: 0   Global Step: 16440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:11,652-Speed 9463.88 samples/sec   Loss 11.6175   LearningRate 0.0904   Epoch: 0   Global Step: 16450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:12,766-Speed 9197.16 samples/sec   Loss 11.5146   LearningRate 0.0904   Epoch: 0   Global Step: 16460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:13,861-Speed 9359.92 samples/sec   Loss 11.5617   LearningRate 0.0904   Epoch: 0   Global Step: 16470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:14,977-Speed 9176.77 samples/sec   Loss 11.5747   LearningRate 0.0904   Epoch: 0   Global Step: 16480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:16,006-Speed 9956.12 samples/sec   Loss 11.5791   LearningRate 0.0904   Epoch: 0   Global Step: 16490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:17,059-Speed 9738.63 samples/sec   Loss 11.4911   LearningRate 0.0904   Epoch: 0   Global Step: 16500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:18,120-Speed 9657.56 samples/sec   Loss 11.5522   LearningRate 0.0904   Epoch: 0   Global Step: 16510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:19,189-Speed 9581.12 samples/sec   Loss 11.4091   LearningRate 0.0903   Epoch: 0   Global Step: 16520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:20,288-Speed 9322.55 samples/sec   Loss 11.6741   LearningRate 0.0903   Epoch: 0   Global Step: 16530   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:12:21,339-Speed 9748.22 samples/sec   Loss 11.5850   LearningRate 0.0903   Epoch: 0   Global Step: 16540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:22,375-Speed 9889.97 samples/sec   Loss 11.5221   LearningRate 0.0903   Epoch: 0   Global Step: 16550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:23,418-Speed 9827.46 samples/sec   Loss 11.4709   LearningRate 0.0903   Epoch: 0   Global Step: 16560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:12:24,453-Speed 9906.41 samples/sec   Loss 11.4413   LearningRate 0.0903   Epoch: 0   Global Step: 16570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:12:25,527-Speed 9537.41 samples/sec   Loss 11.4402   LearningRate 0.0903   Epoch: 0   Global Step: 16580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:12:26,602-Speed 9531.01 samples/sec   Loss 11.5501   LearningRate 0.0903   Epoch: 0   Global Step: 16590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:12:27,715-Speed 9199.18 samples/sec   Loss 11.4448   LearningRate 0.0903   Epoch: 0   Global Step: 16600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:12:28,824-Speed 9240.87 samples/sec   Loss 11.5353   LearningRate 0.0903   Epoch: 0   Global Step: 16610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:12:29,894-Speed 9580.61 samples/sec   Loss 11.3909   LearningRate 0.0903   Epoch: 0   Global Step: 16620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:12:30,969-Speed 9533.53 samples/sec   Loss 11.4310   LearningRate 0.0903   Epoch: 0   Global Step: 16630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:12:32,050-Speed 9476.80 samples/sec   Loss 11.5372   LearningRate 0.0903   Epoch: 0   Global Step: 16640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:12:33,136-Speed 9428.01 samples/sec   Loss 11.4440   LearningRate 0.0903   Epoch: 0   Global Step: 16650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:12:34,202-Speed 9612.94 samples/sec   Loss 11.5497   LearningRate 0.0903   Epoch: 0   Global Step: 16660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:35,315-Speed 9205.15 samples/sec   Loss 11.3278   LearningRate 0.0903   Epoch: 0   Global Step: 16670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:36,741-Speed 7182.54 samples/sec   Loss 11.5450   LearningRate 0.0903   Epoch: 0   Global Step: 16680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:12:37,790-Speed 9766.55 samples/sec   Loss 11.4412   LearningRate 0.0903   Epoch: 0   Global Step: 16690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:10,677-Speed 311.38 samples/sec   Loss 10.7104   LearningRate 0.0902   Epoch: 1   Global Step: 16700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:11,934-Speed 8155.30 samples/sec   Loss 10.5454   LearningRate 0.0902   Epoch: 1   Global Step: 16710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:13,080-Speed 8938.96 samples/sec   Loss 10.7086   LearningRate 0.0902   Epoch: 1   Global Step: 16720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:14,158-Speed 9505.45 samples/sec   Loss 10.5646   LearningRate 0.0902   Epoch: 1   Global Step: 16730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:15,304-Speed 8937.37 samples/sec   Loss 10.5241   LearningRate 0.0902   Epoch: 1   Global Step: 16740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:16,406-Speed 9303.24 samples/sec   Loss 10.6329   LearningRate 0.0902   Epoch: 1   Global Step: 16750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:18,516-Speed 4855.49 samples/sec   Loss 10.6392   LearningRate 0.0902   Epoch: 1   Global Step: 16760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:19,587-Speed 9572.08 samples/sec   Loss 10.5079   LearningRate 0.0902   Epoch: 1   Global Step: 16770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:20,693-Speed 9257.85 samples/sec   Loss 10.5909   LearningRate 0.0902   Epoch: 1   Global Step: 16780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:21,871-Speed 8697.68 samples/sec   Loss 10.5529   LearningRate 0.0902   Epoch: 1   Global Step: 16790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:23,040-Speed 8765.59 samples/sec   Loss 10.6659   LearningRate 0.0902   Epoch: 1   Global Step: 16800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:24,146-Speed 9260.82 samples/sec   Loss 10.6209   LearningRate 0.0902   Epoch: 1   Global Step: 16810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:25,223-Speed 9518.41 samples/sec   Loss 10.6567   LearningRate 0.0902   Epoch: 1   Global Step: 16820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:26,312-Speed 9407.30 samples/sec   Loss 10.5656   LearningRate 0.0902   Epoch: 1   Global Step: 16830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:27,379-Speed 9603.00 samples/sec   Loss 10.5475   LearningRate 0.0902   Epoch: 1   Global Step: 16840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:28,467-Speed 9414.81 samples/sec   Loss 10.5731   LearningRate 0.0902   Epoch: 1   Global Step: 16850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:29,528-Speed 9652.90 samples/sec   Loss 10.5232   LearningRate 0.0902   Epoch: 1   Global Step: 16860   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:13:30,615-Speed 9433.15 samples/sec   Loss 10.7433   LearningRate 0.0901   Epoch: 1   Global Step: 16870   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:13:31,740-Speed 9108.31 samples/sec   Loss 10.6933   LearningRate 0.0901   Epoch: 1   Global Step: 16880   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:13:32,829-Speed 9407.49 samples/sec   Loss 10.6025   LearningRate 0.0901   Epoch: 1   Global Step: 16890   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:13:33,881-Speed 9742.54 samples/sec   Loss 10.6044   LearningRate 0.0901   Epoch: 1   Global Step: 16900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:34,978-Speed 9334.45 samples/sec   Loss 10.5977   LearningRate 0.0901   Epoch: 1   Global Step: 16910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:36,094-Speed 9178.73 samples/sec   Loss 10.7549   LearningRate 0.0901   Epoch: 1   Global Step: 16920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:37,172-Speed 9511.84 samples/sec   Loss 10.6553   LearningRate 0.0901   Epoch: 1   Global Step: 16930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:38,590-Speed 7226.61 samples/sec   Loss 10.7400   LearningRate 0.0901   Epoch: 1   Global Step: 16940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:39,690-Speed 9313.23 samples/sec   Loss 10.6604   LearningRate 0.0901   Epoch: 1   Global Step: 16950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:40,774-Speed 9454.82 samples/sec   Loss 10.6586   LearningRate 0.0901   Epoch: 1   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:13:41,877-Speed 9286.12 samples/sec   Loss 10.6468   LearningRate 0.0901   Epoch: 1   Global Step: 16970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:13:42,966-Speed 9414.68 samples/sec   Loss 10.6722   LearningRate 0.0901   Epoch: 1   Global Step: 16980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:13:44,038-Speed 9555.80 samples/sec   Loss 10.7272   LearningRate 0.0901   Epoch: 1   Global Step: 16990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:13:45,152-Speed 9190.58 samples/sec   Loss 10.6701   LearningRate 0.0901   Epoch: 1   Global Step: 17000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:13:46,288-Speed 9025.11 samples/sec   Loss 10.5134   LearningRate 0.0901   Epoch: 1   Global Step: 17010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:13:47,367-Speed 9500.17 samples/sec   Loss 10.6492   LearningRate 0.0901   Epoch: 1   Global Step: 17020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:13:48,486-Speed 9149.29 samples/sec   Loss 10.5671   LearningRate 0.0901   Epoch: 1   Global Step: 17030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:13:49,571-Speed 9451.55 samples/sec   Loss 10.6251   LearningRate 0.0901   Epoch: 1   Global Step: 17040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:13:50,676-Speed 9264.72 samples/sec   Loss 10.5883   LearningRate 0.0900   Epoch: 1   Global Step: 17050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:13:51,786-Speed 9234.64 samples/sec   Loss 10.6533   LearningRate 0.0900   Epoch: 1   Global Step: 17060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:52,881-Speed 9352.49 samples/sec   Loss 10.5445   LearningRate 0.0900   Epoch: 1   Global Step: 17070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:53,952-Speed 9568.99 samples/sec   Loss 10.5837   LearningRate 0.0900   Epoch: 1   Global Step: 17080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:55,048-Speed 9354.99 samples/sec   Loss 10.7921   LearningRate 0.0900   Epoch: 1   Global Step: 17090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:56,108-Speed 9666.44 samples/sec   Loss 10.6724   LearningRate 0.0900   Epoch: 1   Global Step: 17100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:57,222-Speed 9197.89 samples/sec   Loss 10.6069   LearningRate 0.0900   Epoch: 1   Global Step: 17110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:58,306-Speed 9451.26 samples/sec   Loss 10.7484   LearningRate 0.0900   Epoch: 1   Global Step: 17120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:13:59,357-Speed 9747.28 samples/sec   Loss 10.7546   LearningRate 0.0900   Epoch: 1   Global Step: 17130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:00,488-Speed 9060.31 samples/sec   Loss 10.6446   LearningRate 0.0900   Epoch: 1   Global Step: 17140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:01,592-Speed 9275.17 samples/sec   Loss 10.8212   LearningRate 0.0900   Epoch: 1   Global Step: 17150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:02,703-Speed 9225.85 samples/sec   Loss 10.6174   LearningRate 0.0900   Epoch: 1   Global Step: 17160   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:03,765-Speed 9648.47 samples/sec   Loss 10.7752   LearningRate 0.0900   Epoch: 1   Global Step: 17170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:04,904-Speed 8993.65 samples/sec   Loss 10.7181   LearningRate 0.0900   Epoch: 1   Global Step: 17180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:05,985-Speed 9482.25 samples/sec   Loss 10.5637   LearningRate 0.0900   Epoch: 1   Global Step: 17190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:07,098-Speed 9203.62 samples/sec   Loss 10.7173   LearningRate 0.0900   Epoch: 1   Global Step: 17200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:08,213-Speed 9188.77 samples/sec   Loss 10.6381   LearningRate 0.0900   Epoch: 1   Global Step: 17210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:09,320-Speed 9260.93 samples/sec   Loss 10.6878   LearningRate 0.0899   Epoch: 1   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:10,427-Speed 9262.61 samples/sec   Loss 10.7537   LearningRate 0.0899   Epoch: 1   Global Step: 17230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:11,513-Speed 9436.93 samples/sec   Loss 10.6699   LearningRate 0.0899   Epoch: 1   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:12,613-Speed 9311.40 samples/sec   Loss 10.6071   LearningRate 0.0899   Epoch: 1   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:13,663-Speed 9757.75 samples/sec   Loss 10.7437   LearningRate 0.0899   Epoch: 1   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:14,766-Speed 9296.07 samples/sec   Loss 10.8205   LearningRate 0.0899   Epoch: 1   Global Step: 17270   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:15,817-Speed 9748.34 samples/sec   Loss 10.6715   LearningRate 0.0899   Epoch: 1   Global Step: 17280   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:16,862-Speed 9803.00 samples/sec   Loss 10.7199   LearningRate 0.0899   Epoch: 1   Global Step: 17290   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:17,934-Speed 9553.88 samples/sec   Loss 10.6814   LearningRate 0.0899   Epoch: 1   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:18,980-Speed 9795.96 samples/sec   Loss 10.7940   LearningRate 0.0899   Epoch: 1   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:20,037-Speed 9694.42 samples/sec   Loss 10.7808   LearningRate 0.0899   Epoch: 1   Global Step: 17320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:21,123-Speed 9432.31 samples/sec   Loss 10.8118   LearningRate 0.0899   Epoch: 1   Global Step: 17330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:22,201-Speed 9508.23 samples/sec   Loss 10.6500   LearningRate 0.0899   Epoch: 1   Global Step: 17340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:23,288-Speed 9534.57 samples/sec   Loss 10.7649   LearningRate 0.0899   Epoch: 1   Global Step: 17350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:24,351-Speed 9637.41 samples/sec   Loss 10.7126   LearningRate 0.0899   Epoch: 1   Global Step: 17360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:25,459-Speed 9246.96 samples/sec   Loss 10.7393   LearningRate 0.0899   Epoch: 1   Global Step: 17370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:26,575-Speed 9186.47 samples/sec   Loss 10.7498   LearningRate 0.0899   Epoch: 1   Global Step: 17380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:27,681-Speed 9263.19 samples/sec   Loss 10.7579   LearningRate 0.0899   Epoch: 1   Global Step: 17390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:28,755-Speed 9534.22 samples/sec   Loss 10.7184   LearningRate 0.0898   Epoch: 1   Global Step: 17400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:29,828-Speed 9566.20 samples/sec   Loss 10.7045   LearningRate 0.0898   Epoch: 1   Global Step: 17410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:30,908-Speed 9490.93 samples/sec   Loss 10.7346   LearningRate 0.0898   Epoch: 1   Global Step: 17420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:31,982-Speed 9531.87 samples/sec   Loss 10.7781   LearningRate 0.0898   Epoch: 1   Global Step: 17430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:33,048-Speed 9618.58 samples/sec   Loss 10.6559   LearningRate 0.0898   Epoch: 1   Global Step: 17440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:34,090-Speed 9826.33 samples/sec   Loss 10.6481   LearningRate 0.0898   Epoch: 1   Global Step: 17450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:35,210-Speed 9150.81 samples/sec   Loss 10.7002   LearningRate 0.0898   Epoch: 1   Global Step: 17460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:36,323-Speed 9202.09 samples/sec   Loss 10.9602   LearningRate 0.0898   Epoch: 1   Global Step: 17470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:37,416-Speed 9377.68 samples/sec   Loss 10.7387   LearningRate 0.0898   Epoch: 1   Global Step: 17480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:38,531-Speed 9185.26 samples/sec   Loss 10.8021   LearningRate 0.0898   Epoch: 1   Global Step: 17490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:39,601-Speed 9580.40 samples/sec   Loss 10.6833   LearningRate 0.0898   Epoch: 1   Global Step: 17500   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:40,694-Speed 9372.12 samples/sec   Loss 10.6411   LearningRate 0.0898   Epoch: 1   Global Step: 17510   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:41,773-Speed 9495.18 samples/sec   Loss 10.7294   LearningRate 0.0898   Epoch: 1   Global Step: 17520   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:42,869-Speed 9348.42 samples/sec   Loss 10.7815   LearningRate 0.0898   Epoch: 1   Global Step: 17530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:43,946-Speed 9514.92 samples/sec   Loss 10.7162   LearningRate 0.0898   Epoch: 1   Global Step: 17540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:45,038-Speed 9383.89 samples/sec   Loss 10.8865   LearningRate 0.0898   Epoch: 1   Global Step: 17550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:46,086-Speed 9777.29 samples/sec   Loss 10.6806   LearningRate 0.0898   Epoch: 1   Global Step: 17560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:47,185-Speed 9330.00 samples/sec   Loss 10.7117   LearningRate 0.0898   Epoch: 1   Global Step: 17570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:48,262-Speed 9512.26 samples/sec   Loss 10.7749   LearningRate 0.0897   Epoch: 1   Global Step: 17580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:49,347-Speed 9446.43 samples/sec   Loss 10.6617   LearningRate 0.0897   Epoch: 1   Global Step: 17590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:50,439-Speed 9378.70 samples/sec   Loss 10.8839   LearningRate 0.0897   Epoch: 1   Global Step: 17600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:51,524-Speed 9439.05 samples/sec   Loss 10.7569   LearningRate 0.0897   Epoch: 1   Global Step: 17610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:52,638-Speed 9201.56 samples/sec   Loss 10.7667   LearningRate 0.0897   Epoch: 1   Global Step: 17620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:14:53,691-Speed 9744.60 samples/sec   Loss 10.8386   LearningRate 0.0897   Epoch: 1   Global Step: 17630   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:54,745-Speed 9726.03 samples/sec   Loss 10.6433   LearningRate 0.0897   Epoch: 1   Global Step: 17640   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:55,839-Speed 9363.82 samples/sec   Loss 10.7146   LearningRate 0.0897   Epoch: 1   Global Step: 17650   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:56,919-Speed 9483.05 samples/sec   Loss 10.7236   LearningRate 0.0897   Epoch: 1   Global Step: 17660   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:58,001-Speed 9474.31 samples/sec   Loss 10.7369   LearningRate 0.0897   Epoch: 1   Global Step: 17670   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:14:59,129-Speed 9083.52 samples/sec   Loss 10.7287   LearningRate 0.0897   Epoch: 1   Global Step: 17680   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:15:00,230-Speed 9306.58 samples/sec   Loss 10.7028   LearningRate 0.0897   Epoch: 1   Global Step: 17690   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:15:01,335-Speed 9271.02 samples/sec   Loss 10.7478   LearningRate 0.0897   Epoch: 1   Global Step: 17700   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:15:02,403-Speed 9599.96 samples/sec   Loss 10.7061   LearningRate 0.0897   Epoch: 1   Global Step: 17710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:03,549-Speed 8933.24 samples/sec   Loss 10.6568   LearningRate 0.0897   Epoch: 1   Global Step: 17720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:04,625-Speed 9527.01 samples/sec   Loss 10.7015   LearningRate 0.0897   Epoch: 1   Global Step: 17730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:05,684-Speed 9675.92 samples/sec   Loss 10.6739   LearningRate 0.0897   Epoch: 1   Global Step: 17740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:06,778-Speed 9366.17 samples/sec   Loss 10.7983   LearningRate 0.0896   Epoch: 1   Global Step: 17750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:07,897-Speed 9158.73 samples/sec   Loss 10.7613   LearningRate 0.0896   Epoch: 1   Global Step: 17760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:09,045-Speed 8919.99 samples/sec   Loss 10.8561   LearningRate 0.0896   Epoch: 1   Global Step: 17770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:10,139-Speed 9370.41 samples/sec   Loss 10.6237   LearningRate 0.0896   Epoch: 1   Global Step: 17780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:11,230-Speed 9389.87 samples/sec   Loss 10.8128   LearningRate 0.0896   Epoch: 1   Global Step: 17790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:12,298-Speed 9596.19 samples/sec   Loss 10.6789   LearningRate 0.0896   Epoch: 1   Global Step: 17800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:13,377-Speed 9499.78 samples/sec   Loss 10.7996   LearningRate 0.0896   Epoch: 1   Global Step: 17810   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:15:14,450-Speed 9544.12 samples/sec   Loss 10.7511   LearningRate 0.0896   Epoch: 1   Global Step: 17820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:15,575-Speed 9113.34 samples/sec   Loss 10.9599   LearningRate 0.0896   Epoch: 1   Global Step: 17830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:16,643-Speed 9596.66 samples/sec   Loss 10.8447   LearningRate 0.0896   Epoch: 1   Global Step: 17840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:17,720-Speed 9514.75 samples/sec   Loss 10.7658   LearningRate 0.0896   Epoch: 1   Global Step: 17850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:18,799-Speed 9494.49 samples/sec   Loss 10.7086   LearningRate 0.0896   Epoch: 1   Global Step: 17860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:19,927-Speed 9078.99 samples/sec   Loss 10.7801   LearningRate 0.0896   Epoch: 1   Global Step: 17870   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:21,023-Speed 9349.13 samples/sec   Loss 10.7843   LearningRate 0.0896   Epoch: 1   Global Step: 17880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:22,385-Speed 7523.84 samples/sec   Loss 10.8288   LearningRate 0.0896   Epoch: 1   Global Step: 17890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:23,438-Speed 9733.05 samples/sec   Loss 10.6581   LearningRate 0.0896   Epoch: 1   Global Step: 17900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:24,558-Speed 9150.61 samples/sec   Loss 10.6091   LearningRate 0.0896   Epoch: 1   Global Step: 17910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:25,655-Speed 9337.21 samples/sec   Loss 10.7650   LearningRate 0.0896   Epoch: 1   Global Step: 17920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:26,760-Speed 9275.58 samples/sec   Loss 10.7901   LearningRate 0.0895   Epoch: 1   Global Step: 17930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:27,854-Speed 9361.36 samples/sec   Loss 10.6588   LearningRate 0.0895   Epoch: 1   Global Step: 17940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:29,333-Speed 6932.18 samples/sec   Loss 10.7497   LearningRate 0.0895   Epoch: 1   Global Step: 17950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:30,559-Speed 8355.71 samples/sec   Loss 10.7167   LearningRate 0.0895   Epoch: 1   Global Step: 17960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:31,638-Speed 9489.84 samples/sec   Loss 10.7602   LearningRate 0.0895   Epoch: 1   Global Step: 17970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:32,735-Speed 9337.43 samples/sec   Loss 10.5699   LearningRate 0.0895   Epoch: 1   Global Step: 17980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:33,821-Speed 9441.66 samples/sec   Loss 10.7921   LearningRate 0.0895   Epoch: 1   Global Step: 17990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:15:34,912-Speed 9386.98 samples/sec   Loss 10.8397   LearningRate 0.0895   Epoch: 1   Global Step: 18000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:15:57,021-[lfw][18000]XNorm: 13.881175
Training: 2022-04-11 12:15:57,022-[lfw][18000]Accuracy-Flip: 0.99167+-0.00408
Training: 2022-04-11 12:15:57,022-[lfw][18000]Accuracy-Highest: 0.99317
Training: 2022-04-11 12:16:22,537-[cfp_fp][18000]XNorm: 11.811630
Training: 2022-04-11 12:16:22,538-[cfp_fp][18000]Accuracy-Flip: 0.92114+-0.01359
Training: 2022-04-11 12:16:22,538-[cfp_fp][18000]Accuracy-Highest: 0.92114
Training: 2022-04-11 12:16:44,568-[agedb_30][18000]XNorm: 13.412701
Training: 2022-04-11 12:16:44,569-[agedb_30][18000]Accuracy-Flip: 0.93433+-0.01104
Training: 2022-04-11 12:16:44,570-[agedb_30][18000]Accuracy-Highest: 0.93433
Training: 2022-04-11 12:16:45,690-Speed 144.68 samples/sec   Loss 10.7648   LearningRate 0.0895   Epoch: 1   Global Step: 18010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:46,752-Speed 9642.72 samples/sec   Loss 10.8985   LearningRate 0.0895   Epoch: 1   Global Step: 18020   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:16:47,812-Speed 9667.42 samples/sec   Loss 10.8695   LearningRate 0.0895   Epoch: 1   Global Step: 18030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:48,873-Speed 9653.67 samples/sec   Loss 10.7784   LearningRate 0.0895   Epoch: 1   Global Step: 18040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:49,943-Speed 9578.69 samples/sec   Loss 10.7272   LearningRate 0.0895   Epoch: 1   Global Step: 18050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:51,020-Speed 9512.72 samples/sec   Loss 10.7445   LearningRate 0.0895   Epoch: 1   Global Step: 18060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:52,118-Speed 9332.90 samples/sec   Loss 10.7755   LearningRate 0.0895   Epoch: 1   Global Step: 18070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:53,223-Speed 9269.41 samples/sec   Loss 10.8234   LearningRate 0.0895   Epoch: 1   Global Step: 18080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:54,280-Speed 9694.02 samples/sec   Loss 10.7611   LearningRate 0.0895   Epoch: 1   Global Step: 18090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:55,392-Speed 9212.14 samples/sec   Loss 10.7489   LearningRate 0.0894   Epoch: 1   Global Step: 18100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:56,507-Speed 9197.05 samples/sec   Loss 10.7562   LearningRate 0.0894   Epoch: 1   Global Step: 18110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:57,613-Speed 9263.45 samples/sec   Loss 10.8251   LearningRate 0.0894   Epoch: 1   Global Step: 18120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:16:58,736-Speed 9119.76 samples/sec   Loss 10.7783   LearningRate 0.0894   Epoch: 1   Global Step: 18130   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:16:59,831-Speed 9356.81 samples/sec   Loss 10.7964   LearningRate 0.0894   Epoch: 1   Global Step: 18140   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:17:00,891-Speed 9671.11 samples/sec   Loss 10.7908   LearningRate 0.0894   Epoch: 1   Global Step: 18150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:01,999-Speed 9258.08 samples/sec   Loss 10.6389   LearningRate 0.0894   Epoch: 1   Global Step: 18160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:03,090-Speed 9392.59 samples/sec   Loss 10.7936   LearningRate 0.0894   Epoch: 1   Global Step: 18170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:04,179-Speed 9409.84 samples/sec   Loss 10.5934   LearningRate 0.0894   Epoch: 1   Global Step: 18180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:05,293-Speed 9198.77 samples/sec   Loss 10.7449   LearningRate 0.0894   Epoch: 1   Global Step: 18190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:06,375-Speed 9473.07 samples/sec   Loss 10.7884   LearningRate 0.0894   Epoch: 1   Global Step: 18200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:07,475-Speed 9315.12 samples/sec   Loss 10.6524   LearningRate 0.0894   Epoch: 1   Global Step: 18210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:08,563-Speed 9412.50 samples/sec   Loss 10.7620   LearningRate 0.0894   Epoch: 1   Global Step: 18220   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:09,705-Speed 8971.56 samples/sec   Loss 10.8025   LearningRate 0.0894   Epoch: 1   Global Step: 18230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:10,800-Speed 9361.99 samples/sec   Loss 10.8515   LearningRate 0.0894   Epoch: 1   Global Step: 18240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:11,849-Speed 9762.22 samples/sec   Loss 10.8577   LearningRate 0.0894   Epoch: 1   Global Step: 18250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:12,957-Speed 9250.06 samples/sec   Loss 10.7544   LearningRate 0.0894   Epoch: 1   Global Step: 18260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:14,053-Speed 9353.83 samples/sec   Loss 10.7826   LearningRate 0.0894   Epoch: 1   Global Step: 18270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:15,105-Speed 9737.11 samples/sec   Loss 10.8343   LearningRate 0.0893   Epoch: 1   Global Step: 18280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:16,216-Speed 9221.64 samples/sec   Loss 10.7500   LearningRate 0.0893   Epoch: 1   Global Step: 18290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:17,347-Speed 9066.66 samples/sec   Loss 10.7348   LearningRate 0.0893   Epoch: 1   Global Step: 18300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:18,419-Speed 9551.62 samples/sec   Loss 10.7707   LearningRate 0.0893   Epoch: 1   Global Step: 18310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:17:19,535-Speed 9181.65 samples/sec   Loss 10.8129   LearningRate 0.0893   Epoch: 1   Global Step: 18320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:17:20,620-Speed 9446.53 samples/sec   Loss 10.7385   LearningRate 0.0893   Epoch: 1   Global Step: 18330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:17:21,741-Speed 9144.01 samples/sec   Loss 10.8042   LearningRate 0.0893   Epoch: 1   Global Step: 18340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:17:22,785-Speed 9813.36 samples/sec   Loss 10.6808   LearningRate 0.0893   Epoch: 1   Global Step: 18350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:17:23,958-Speed 8733.83 samples/sec   Loss 10.8229   LearningRate 0.0893   Epoch: 1   Global Step: 18360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:17:25,042-Speed 9451.75 samples/sec   Loss 10.7219   LearningRate 0.0893   Epoch: 1   Global Step: 18370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:17:26,117-Speed 9531.08 samples/sec   Loss 10.8298   LearningRate 0.0893   Epoch: 1   Global Step: 18380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:17:27,206-Speed 9403.56 samples/sec   Loss 10.7938   LearningRate 0.0893   Epoch: 1   Global Step: 18390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:17:28,281-Speed 9535.85 samples/sec   Loss 10.7535   LearningRate 0.0893   Epoch: 1   Global Step: 18400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:17:29,347-Speed 9609.71 samples/sec   Loss 10.7682   LearningRate 0.0893   Epoch: 1   Global Step: 18410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:30,443-Speed 9353.85 samples/sec   Loss 10.6428   LearningRate 0.0893   Epoch: 1   Global Step: 18420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:31,533-Speed 9403.17 samples/sec   Loss 10.7851   LearningRate 0.0893   Epoch: 1   Global Step: 18430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:32,619-Speed 9430.98 samples/sec   Loss 10.6875   LearningRate 0.0893   Epoch: 1   Global Step: 18440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:33,702-Speed 9465.98 samples/sec   Loss 10.9015   LearningRate 0.0893   Epoch: 1   Global Step: 18450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:34,781-Speed 9492.79 samples/sec   Loss 10.6635   LearningRate 0.0892   Epoch: 1   Global Step: 18460   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:35,844-Speed 9641.69 samples/sec   Loss 10.8039   LearningRate 0.0892   Epoch: 1   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:36,905-Speed 9654.10 samples/sec   Loss 10.7454   LearningRate 0.0892   Epoch: 1   Global Step: 18480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:38,012-Speed 9258.60 samples/sec   Loss 10.7924   LearningRate 0.0892   Epoch: 1   Global Step: 18490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:39,064-Speed 9739.35 samples/sec   Loss 10.7296   LearningRate 0.0892   Epoch: 1   Global Step: 18500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:40,135-Speed 9566.90 samples/sec   Loss 10.7681   LearningRate 0.0892   Epoch: 1   Global Step: 18510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:41,201-Speed 9620.74 samples/sec   Loss 10.7388   LearningRate 0.0892   Epoch: 1   Global Step: 18520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:42,262-Speed 9652.89 samples/sec   Loss 10.7238   LearningRate 0.0892   Epoch: 1   Global Step: 18530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:43,337-Speed 9529.06 samples/sec   Loss 10.7184   LearningRate 0.0892   Epoch: 1   Global Step: 18540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:44,419-Speed 9475.13 samples/sec   Loss 10.8271   LearningRate 0.0892   Epoch: 1   Global Step: 18550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:45,492-Speed 9549.15 samples/sec   Loss 10.8114   LearningRate 0.0892   Epoch: 1   Global Step: 18560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:46,561-Speed 9580.89 samples/sec   Loss 10.7836   LearningRate 0.0892   Epoch: 1   Global Step: 18570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:47,610-Speed 9768.96 samples/sec   Loss 10.8046   LearningRate 0.0892   Epoch: 1   Global Step: 18580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:48,689-Speed 9501.80 samples/sec   Loss 10.8640   LearningRate 0.0892   Epoch: 1   Global Step: 18590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:49,781-Speed 9376.63 samples/sec   Loss 10.7711   LearningRate 0.0892   Epoch: 1   Global Step: 18600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:50,846-Speed 9625.76 samples/sec   Loss 10.8198   LearningRate 0.0892   Epoch: 1   Global Step: 18610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:51,951-Speed 9268.25 samples/sec   Loss 10.7694   LearningRate 0.0892   Epoch: 1   Global Step: 18620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:53,018-Speed 9611.06 samples/sec   Loss 10.8065   LearningRate 0.0891   Epoch: 1   Global Step: 18630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:54,110-Speed 9381.23 samples/sec   Loss 10.8746   LearningRate 0.0891   Epoch: 1   Global Step: 18640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:55,213-Speed 9285.69 samples/sec   Loss 10.7582   LearningRate 0.0891   Epoch: 1   Global Step: 18650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:56,305-Speed 9384.27 samples/sec   Loss 10.7459   LearningRate 0.0891   Epoch: 1   Global Step: 18660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:57,396-Speed 9390.48 samples/sec   Loss 10.8122   LearningRate 0.0891   Epoch: 1   Global Step: 18670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:58,485-Speed 9406.17 samples/sec   Loss 10.7963   LearningRate 0.0891   Epoch: 1   Global Step: 18680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:17:59,584-Speed 9322.98 samples/sec   Loss 10.7727   LearningRate 0.0891   Epoch: 1   Global Step: 18690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:00,665-Speed 9483.59 samples/sec   Loss 10.7413   LearningRate 0.0891   Epoch: 1   Global Step: 18700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:01,746-Speed 9476.19 samples/sec   Loss 10.7090   LearningRate 0.0891   Epoch: 1   Global Step: 18710   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:18:02,824-Speed 9509.37 samples/sec   Loss 10.6695   LearningRate 0.0891   Epoch: 1   Global Step: 18720   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:18:03,858-Speed 9909.10 samples/sec   Loss 10.7877   LearningRate 0.0891   Epoch: 1   Global Step: 18730   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:18:04,969-Speed 9222.56 samples/sec   Loss 10.6971   LearningRate 0.0891   Epoch: 1   Global Step: 18740   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:06,046-Speed 9505.25 samples/sec   Loss 10.7900   LearningRate 0.0891   Epoch: 1   Global Step: 18750   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:07,147-Speed 9312.80 samples/sec   Loss 10.6809   LearningRate 0.0891   Epoch: 1   Global Step: 18760   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:08,254-Speed 9251.25 samples/sec   Loss 10.6463   LearningRate 0.0891   Epoch: 1   Global Step: 18770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:09,309-Speed 9708.03 samples/sec   Loss 10.8416   LearningRate 0.0891   Epoch: 1   Global Step: 18780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:10,379-Speed 9589.76 samples/sec   Loss 10.8635   LearningRate 0.0891   Epoch: 1   Global Step: 18790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:11,458-Speed 9497.63 samples/sec   Loss 10.7948   LearningRate 0.0891   Epoch: 1   Global Step: 18800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:12,565-Speed 9254.55 samples/sec   Loss 10.7881   LearningRate 0.0890   Epoch: 1   Global Step: 18810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:13,666-Speed 9303.34 samples/sec   Loss 10.7646   LearningRate 0.0890   Epoch: 1   Global Step: 18820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:14,766-Speed 9311.31 samples/sec   Loss 10.8125   LearningRate 0.0890   Epoch: 1   Global Step: 18830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:15,881-Speed 9188.43 samples/sec   Loss 10.7233   LearningRate 0.0890   Epoch: 1   Global Step: 18840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:16,971-Speed 9408.06 samples/sec   Loss 10.6236   LearningRate 0.0890   Epoch: 1   Global Step: 18850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:18,076-Speed 9269.75 samples/sec   Loss 10.7893   LearningRate 0.0890   Epoch: 1   Global Step: 18860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:19,229-Speed 8888.72 samples/sec   Loss 10.7349   LearningRate 0.0890   Epoch: 1   Global Step: 18870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:20,312-Speed 9459.49 samples/sec   Loss 10.8793   LearningRate 0.0890   Epoch: 1   Global Step: 18880   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:21,384-Speed 9559.35 samples/sec   Loss 10.8265   LearningRate 0.0890   Epoch: 1   Global Step: 18890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:22,462-Speed 9504.39 samples/sec   Loss 10.7562   LearningRate 0.0890   Epoch: 1   Global Step: 18900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:23,532-Speed 9576.88 samples/sec   Loss 10.8779   LearningRate 0.0890   Epoch: 1   Global Step: 18910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:24,624-Speed 9379.79 samples/sec   Loss 10.8022   LearningRate 0.0890   Epoch: 1   Global Step: 18920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:25,694-Speed 9572.47 samples/sec   Loss 10.7339   LearningRate 0.0890   Epoch: 1   Global Step: 18930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:26,821-Speed 9093.94 samples/sec   Loss 10.7065   LearningRate 0.0890   Epoch: 1   Global Step: 18940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:27,914-Speed 9377.87 samples/sec   Loss 10.7215   LearningRate 0.0890   Epoch: 1   Global Step: 18950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:28,976-Speed 9651.07 samples/sec   Loss 10.7712   LearningRate 0.0890   Epoch: 1   Global Step: 18960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:30,074-Speed 9333.24 samples/sec   Loss 10.7761   LearningRate 0.0890   Epoch: 1   Global Step: 18970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:31,162-Speed 9415.38 samples/sec   Loss 10.7703   LearningRate 0.0890   Epoch: 1   Global Step: 18980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:18:32,285-Speed 9127.37 samples/sec   Loss 10.6911   LearningRate 0.0889   Epoch: 1   Global Step: 18990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:33,362-Speed 9508.06 samples/sec   Loss 10.7797   LearningRate 0.0889   Epoch: 1   Global Step: 19000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:34,415-Speed 9730.67 samples/sec   Loss 10.8568   LearningRate 0.0889   Epoch: 1   Global Step: 19010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:35,490-Speed 9534.97 samples/sec   Loss 10.7805   LearningRate 0.0889   Epoch: 1   Global Step: 19020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:36,558-Speed 9587.58 samples/sec   Loss 10.6699   LearningRate 0.0889   Epoch: 1   Global Step: 19030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:37,652-Speed 9366.95 samples/sec   Loss 10.7952   LearningRate 0.0889   Epoch: 1   Global Step: 19040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:38,766-Speed 9199.69 samples/sec   Loss 10.7535   LearningRate 0.0889   Epoch: 1   Global Step: 19050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:39,852-Speed 9434.61 samples/sec   Loss 10.7693   LearningRate 0.0889   Epoch: 1   Global Step: 19060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:40,962-Speed 9228.08 samples/sec   Loss 10.6477   LearningRate 0.0889   Epoch: 1   Global Step: 19070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:42,080-Speed 9170.62 samples/sec   Loss 10.7889   LearningRate 0.0889   Epoch: 1   Global Step: 19080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:43,161-Speed 9474.60 samples/sec   Loss 10.7729   LearningRate 0.0889   Epoch: 1   Global Step: 19090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:44,244-Speed 9463.19 samples/sec   Loss 10.7642   LearningRate 0.0889   Epoch: 1   Global Step: 19100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:45,340-Speed 9352.10 samples/sec   Loss 10.6663   LearningRate 0.0889   Epoch: 1   Global Step: 19110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:46,412-Speed 9554.73 samples/sec   Loss 10.7662   LearningRate 0.0889   Epoch: 1   Global Step: 19120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:47,486-Speed 9545.79 samples/sec   Loss 10.6726   LearningRate 0.0889   Epoch: 1   Global Step: 19130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:48,571-Speed 9441.63 samples/sec   Loss 10.7560   LearningRate 0.0889   Epoch: 1   Global Step: 19140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:49,623-Speed 9737.26 samples/sec   Loss 10.7658   LearningRate 0.0889   Epoch: 1   Global Step: 19150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:50,677-Speed 9727.73 samples/sec   Loss 10.8267   LearningRate 0.0889   Epoch: 1   Global Step: 19160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:51,727-Speed 9755.99 samples/sec   Loss 10.6676   LearningRate 0.0888   Epoch: 1   Global Step: 19170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:52,815-Speed 9419.16 samples/sec   Loss 10.7561   LearningRate 0.0888   Epoch: 1   Global Step: 19180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:53,874-Speed 9676.54 samples/sec   Loss 10.9518   LearningRate 0.0888   Epoch: 1   Global Step: 19190   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:18:54,930-Speed 9697.64 samples/sec   Loss 10.7417   LearningRate 0.0888   Epoch: 1   Global Step: 19200   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:18:56,018-Speed 9420.84 samples/sec   Loss 10.8675   LearningRate 0.0888   Epoch: 1   Global Step: 19210   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:18:57,125-Speed 9254.24 samples/sec   Loss 10.7876   LearningRate 0.0888   Epoch: 1   Global Step: 19220   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:18:58,186-Speed 9654.83 samples/sec   Loss 10.8671   LearningRate 0.0888   Epoch: 1   Global Step: 19230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:18:59,243-Speed 9699.74 samples/sec   Loss 10.7873   LearningRate 0.0888   Epoch: 1   Global Step: 19240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:00,335-Speed 9386.29 samples/sec   Loss 10.7893   LearningRate 0.0888   Epoch: 1   Global Step: 19250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:01,458-Speed 9115.78 samples/sec   Loss 10.7994   LearningRate 0.0888   Epoch: 1   Global Step: 19260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:02,511-Speed 9737.40 samples/sec   Loss 10.7695   LearningRate 0.0888   Epoch: 1   Global Step: 19270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:03,632-Speed 9133.88 samples/sec   Loss 10.7937   LearningRate 0.0888   Epoch: 1   Global Step: 19280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:04,689-Speed 9692.75 samples/sec   Loss 10.8030   LearningRate 0.0888   Epoch: 1   Global Step: 19290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:05,750-Speed 9659.94 samples/sec   Loss 10.7087   LearningRate 0.0888   Epoch: 1   Global Step: 19300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:06,815-Speed 9622.66 samples/sec   Loss 10.7739   LearningRate 0.0888   Epoch: 1   Global Step: 19310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:07,889-Speed 9540.84 samples/sec   Loss 10.7178   LearningRate 0.0888   Epoch: 1   Global Step: 19320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:08,983-Speed 9360.54 samples/sec   Loss 10.7433   LearningRate 0.0888   Epoch: 1   Global Step: 19330   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:19:10,080-Speed 9342.22 samples/sec   Loss 10.7484   LearningRate 0.0887   Epoch: 1   Global Step: 19340   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:19:11,183-Speed 9295.43 samples/sec   Loss 10.8462   LearningRate 0.0887   Epoch: 1   Global Step: 19350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:12,265-Speed 9463.07 samples/sec   Loss 10.7893   LearningRate 0.0887   Epoch: 1   Global Step: 19360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:13,346-Speed 9482.73 samples/sec   Loss 10.8337   LearningRate 0.0887   Epoch: 1   Global Step: 19370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:19:14,495-Speed 8916.55 samples/sec   Loss 10.7566   LearningRate 0.0887   Epoch: 1   Global Step: 19380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:19:15,621-Speed 9096.58 samples/sec   Loss 10.6387   LearningRate 0.0887   Epoch: 1   Global Step: 19390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:19:16,687-Speed 9613.27 samples/sec   Loss 10.7331   LearningRate 0.0887   Epoch: 1   Global Step: 19400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:19:17,800-Speed 9219.50 samples/sec   Loss 10.7802   LearningRate 0.0887   Epoch: 1   Global Step: 19410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:19:18,900-Speed 9312.90 samples/sec   Loss 10.6722   LearningRate 0.0887   Epoch: 1   Global Step: 19420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:19:20,026-Speed 9097.30 samples/sec   Loss 10.6167   LearningRate 0.0887   Epoch: 1   Global Step: 19430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:19:21,135-Speed 9240.02 samples/sec   Loss 10.7549   LearningRate 0.0887   Epoch: 1   Global Step: 19440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:19:22,264-Speed 9075.82 samples/sec   Loss 10.6715   LearningRate 0.0887   Epoch: 1   Global Step: 19450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:19:23,389-Speed 9100.59 samples/sec   Loss 10.8036   LearningRate 0.0887   Epoch: 1   Global Step: 19460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:19:24,488-Speed 9325.09 samples/sec   Loss 10.7606   LearningRate 0.0887   Epoch: 1   Global Step: 19470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:25,554-Speed 9615.37 samples/sec   Loss 10.7314   LearningRate 0.0887   Epoch: 1   Global Step: 19480   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:26,623-Speed 9583.86 samples/sec   Loss 10.7554   LearningRate 0.0887   Epoch: 1   Global Step: 19490   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:27,676-Speed 9729.78 samples/sec   Loss 10.7534   LearningRate 0.0887   Epoch: 1   Global Step: 19500   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:28,768-Speed 9389.42 samples/sec   Loss 10.7790   LearningRate 0.0887   Epoch: 1   Global Step: 19510   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:29,867-Speed 9324.82 samples/sec   Loss 10.7053   LearningRate 0.0886   Epoch: 1   Global Step: 19520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:30,978-Speed 9216.68 samples/sec   Loss 10.7521   LearningRate 0.0886   Epoch: 1   Global Step: 19530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:32,058-Speed 9490.41 samples/sec   Loss 10.8390   LearningRate 0.0886   Epoch: 1   Global Step: 19540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:33,167-Speed 9241.25 samples/sec   Loss 10.6409   LearningRate 0.0886   Epoch: 1   Global Step: 19550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:34,282-Speed 9188.14 samples/sec   Loss 10.6438   LearningRate 0.0886   Epoch: 1   Global Step: 19560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:35,383-Speed 9305.96 samples/sec   Loss 10.7333   LearningRate 0.0886   Epoch: 1   Global Step: 19570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:36,495-Speed 9213.83 samples/sec   Loss 10.7123   LearningRate 0.0886   Epoch: 1   Global Step: 19580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:37,613-Speed 9164.81 samples/sec   Loss 10.8527   LearningRate 0.0886   Epoch: 1   Global Step: 19590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:38,711-Speed 9330.78 samples/sec   Loss 10.6695   LearningRate 0.0886   Epoch: 1   Global Step: 19600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:39,791-Speed 9479.95 samples/sec   Loss 10.7843   LearningRate 0.0886   Epoch: 1   Global Step: 19610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:40,907-Speed 9190.87 samples/sec   Loss 10.7466   LearningRate 0.0886   Epoch: 1   Global Step: 19620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:42,000-Speed 9373.64 samples/sec   Loss 10.7201   LearningRate 0.0886   Epoch: 1   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:19:43,109-Speed 9238.22 samples/sec   Loss 10.7362   LearningRate 0.0886   Epoch: 1   Global Step: 19640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:44,189-Speed 9484.83 samples/sec   Loss 10.5849   LearningRate 0.0886   Epoch: 1   Global Step: 19650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:45,314-Speed 9106.59 samples/sec   Loss 10.7296   LearningRate 0.0886   Epoch: 1   Global Step: 19660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:46,403-Speed 9414.63 samples/sec   Loss 10.8111   LearningRate 0.0886   Epoch: 1   Global Step: 19670   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:19:47,479-Speed 9519.55 samples/sec   Loss 10.8104   LearningRate 0.0886   Epoch: 1   Global Step: 19680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:48,569-Speed 9403.96 samples/sec   Loss 10.7312   LearningRate 0.0886   Epoch: 1   Global Step: 19690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:49,657-Speed 9419.49 samples/sec   Loss 10.7839   LearningRate 0.0885   Epoch: 1   Global Step: 19700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:50,743-Speed 9434.24 samples/sec   Loss 10.7490   LearningRate 0.0885   Epoch: 1   Global Step: 19710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:51,793-Speed 9755.70 samples/sec   Loss 10.7593   LearningRate 0.0885   Epoch: 1   Global Step: 19720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:52,880-Speed 9428.24 samples/sec   Loss 10.7427   LearningRate 0.0885   Epoch: 1   Global Step: 19730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:54,008-Speed 9082.30 samples/sec   Loss 10.7352   LearningRate 0.0885   Epoch: 1   Global Step: 19740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:55,092-Speed 9450.35 samples/sec   Loss 10.6681   LearningRate 0.0885   Epoch: 1   Global Step: 19750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:56,185-Speed 9379.32 samples/sec   Loss 10.7835   LearningRate 0.0885   Epoch: 1   Global Step: 19760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:57,304-Speed 9153.65 samples/sec   Loss 10.8730   LearningRate 0.0885   Epoch: 1   Global Step: 19770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:58,413-Speed 9235.73 samples/sec   Loss 10.7490   LearningRate 0.0885   Epoch: 1   Global Step: 19780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:19:59,512-Speed 9324.19 samples/sec   Loss 10.7979   LearningRate 0.0885   Epoch: 1   Global Step: 19790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:00,632-Speed 9142.59 samples/sec   Loss 10.6528   LearningRate 0.0885   Epoch: 1   Global Step: 19800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:01,706-Speed 9543.68 samples/sec   Loss 10.8222   LearningRate 0.0885   Epoch: 1   Global Step: 19810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:02,799-Speed 9375.99 samples/sec   Loss 10.6647   LearningRate 0.0885   Epoch: 1   Global Step: 19820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:03,859-Speed 9667.53 samples/sec   Loss 10.8203   LearningRate 0.0885   Epoch: 1   Global Step: 19830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:04,927-Speed 9599.74 samples/sec   Loss 10.6521   LearningRate 0.0885   Epoch: 1   Global Step: 19840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:06,095-Speed 8770.15 samples/sec   Loss 10.7133   LearningRate 0.0885   Epoch: 1   Global Step: 19850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:07,181-Speed 9435.54 samples/sec   Loss 10.7909   LearningRate 0.0885   Epoch: 1   Global Step: 19860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:08,313-Speed 9050.20 samples/sec   Loss 10.7242   LearningRate 0.0884   Epoch: 1   Global Step: 19870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:09,385-Speed 9560.06 samples/sec   Loss 10.5222   LearningRate 0.0884   Epoch: 1   Global Step: 19880   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:20:10,462-Speed 9507.28 samples/sec   Loss 10.8281   LearningRate 0.0884   Epoch: 1   Global Step: 19890   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:20:11,611-Speed 8915.04 samples/sec   Loss 10.6846   LearningRate 0.0884   Epoch: 1   Global Step: 19900   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:20:12,713-Speed 9300.04 samples/sec   Loss 10.6810   LearningRate 0.0884   Epoch: 1   Global Step: 19910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:13,830-Speed 9171.29 samples/sec   Loss 10.6147   LearningRate 0.0884   Epoch: 1   Global Step: 19920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:14,923-Speed 9378.19 samples/sec   Loss 10.6906   LearningRate 0.0884   Epoch: 1   Global Step: 19930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:16,020-Speed 9337.40 samples/sec   Loss 10.6311   LearningRate 0.0884   Epoch: 1   Global Step: 19940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:17,096-Speed 9521.39 samples/sec   Loss 10.6934   LearningRate 0.0884   Epoch: 1   Global Step: 19950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:18,153-Speed 9699.96 samples/sec   Loss 10.6639   LearningRate 0.0884   Epoch: 1   Global Step: 19960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:19,255-Speed 9293.17 samples/sec   Loss 10.7168   LearningRate 0.0884   Epoch: 1   Global Step: 19970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:20,375-Speed 9151.86 samples/sec   Loss 10.7195   LearningRate 0.0884   Epoch: 1   Global Step: 19980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:21,455-Speed 9486.40 samples/sec   Loss 10.6113   LearningRate 0.0884   Epoch: 1   Global Step: 19990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:22,568-Speed 9211.05 samples/sec   Loss 10.5440   LearningRate 0.0884   Epoch: 1   Global Step: 20000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:20:44,654-[lfw][20000]XNorm: 13.832026
Training: 2022-04-11 12:20:44,655-[lfw][20000]Accuracy-Flip: 0.99383+-0.00373
Training: 2022-04-11 12:20:44,655-[lfw][20000]Accuracy-Highest: 0.99383
Training: 2022-04-11 12:21:10,148-[cfp_fp][20000]XNorm: 11.748980
Training: 2022-04-11 12:21:10,149-[cfp_fp][20000]Accuracy-Flip: 0.92257+-0.01751
Training: 2022-04-11 12:21:10,150-[cfp_fp][20000]Accuracy-Highest: 0.92257
Training: 2022-04-11 12:21:32,199-[agedb_30][20000]XNorm: 13.340283
Training: 2022-04-11 12:21:32,200-[agedb_30][20000]Accuracy-Flip: 0.93417+-0.01375
Training: 2022-04-11 12:21:32,201-[agedb_30][20000]Accuracy-Highest: 0.93433
Training: 2022-04-11 12:21:33,294-Speed 144.79 samples/sec   Loss 10.7104   LearningRate 0.0884   Epoch: 1   Global Step: 20010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:34,389-Speed 9358.17 samples/sec   Loss 10.5733   LearningRate 0.0884   Epoch: 1   Global Step: 20020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:35,464-Speed 9526.09 samples/sec   Loss 10.7952   LearningRate 0.0884   Epoch: 1   Global Step: 20030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:36,535-Speed 9567.56 samples/sec   Loss 10.5917   LearningRate 0.0884   Epoch: 1   Global Step: 20040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:37,625-Speed 9402.65 samples/sec   Loss 10.8281   LearningRate 0.0883   Epoch: 1   Global Step: 20050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:38,711-Speed 9432.96 samples/sec   Loss 10.7512   LearningRate 0.0883   Epoch: 1   Global Step: 20060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:39,779-Speed 9600.10 samples/sec   Loss 10.6674   LearningRate 0.0883   Epoch: 1   Global Step: 20070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:40,822-Speed 9820.31 samples/sec   Loss 10.6337   LearningRate 0.0883   Epoch: 1   Global Step: 20080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:41,889-Speed 9600.71 samples/sec   Loss 10.7269   LearningRate 0.0883   Epoch: 1   Global Step: 20090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:42,953-Speed 9632.33 samples/sec   Loss 10.6887   LearningRate 0.0883   Epoch: 1   Global Step: 20100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:44,020-Speed 9602.11 samples/sec   Loss 10.6978   LearningRate 0.0883   Epoch: 1   Global Step: 20110   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:21:45,124-Speed 9276.55 samples/sec   Loss 10.6848   LearningRate 0.0883   Epoch: 1   Global Step: 20120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:46,209-Speed 9444.66 samples/sec   Loss 10.7738   LearningRate 0.0883   Epoch: 1   Global Step: 20130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:47,294-Speed 9449.90 samples/sec   Loss 10.7653   LearningRate 0.0883   Epoch: 1   Global Step: 20140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:48,402-Speed 9241.77 samples/sec   Loss 10.7595   LearningRate 0.0883   Epoch: 1   Global Step: 20150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:49,543-Speed 8981.81 samples/sec   Loss 10.8360   LearningRate 0.0883   Epoch: 1   Global Step: 20160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:50,608-Speed 9620.90 samples/sec   Loss 10.7603   LearningRate 0.0883   Epoch: 1   Global Step: 20170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:51,707-Speed 9320.06 samples/sec   Loss 10.8175   LearningRate 0.0883   Epoch: 1   Global Step: 20180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:52,778-Speed 9569.44 samples/sec   Loss 10.5257   LearningRate 0.0883   Epoch: 1   Global Step: 20190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:53,873-Speed 9361.08 samples/sec   Loss 10.7392   LearningRate 0.0883   Epoch: 1   Global Step: 20200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:21:54,953-Speed 9491.20 samples/sec   Loss 10.7620   LearningRate 0.0883   Epoch: 1   Global Step: 20210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:21:56,038-Speed 9440.48 samples/sec   Loss 10.7555   LearningRate 0.0883   Epoch: 1   Global Step: 20220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:21:57,161-Speed 9121.05 samples/sec   Loss 10.5662   LearningRate 0.0882   Epoch: 1   Global Step: 20230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:21:58,299-Speed 9002.25 samples/sec   Loss 10.7779   LearningRate 0.0882   Epoch: 1   Global Step: 20240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:21:59,402-Speed 9292.56 samples/sec   Loss 10.7917   LearningRate 0.0882   Epoch: 1   Global Step: 20250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:00,455-Speed 9730.85 samples/sec   Loss 10.7439   LearningRate 0.0882   Epoch: 1   Global Step: 20260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:01,506-Speed 9742.01 samples/sec   Loss 10.6818   LearningRate 0.0882   Epoch: 1   Global Step: 20270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:02,560-Speed 9725.40 samples/sec   Loss 10.6588   LearningRate 0.0882   Epoch: 1   Global Step: 20280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:03,606-Speed 9789.53 samples/sec   Loss 10.8553   LearningRate 0.0882   Epoch: 1   Global Step: 20290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:04,647-Speed 9848.81 samples/sec   Loss 10.7270   LearningRate 0.0882   Epoch: 1   Global Step: 20300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:05,708-Speed 9655.86 samples/sec   Loss 10.7101   LearningRate 0.0882   Epoch: 1   Global Step: 20310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:06,770-Speed 9646.87 samples/sec   Loss 10.6262   LearningRate 0.0882   Epoch: 1   Global Step: 20320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:07,824-Speed 9721.94 samples/sec   Loss 10.5392   LearningRate 0.0882   Epoch: 1   Global Step: 20330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:08,921-Speed 9339.14 samples/sec   Loss 10.7871   LearningRate 0.0882   Epoch: 1   Global Step: 20340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:10,026-Speed 9272.90 samples/sec   Loss 10.8169   LearningRate 0.0882   Epoch: 1   Global Step: 20350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:11,133-Speed 9257.92 samples/sec   Loss 10.7157   LearningRate 0.0882   Epoch: 1   Global Step: 20360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:12,204-Speed 9571.56 samples/sec   Loss 10.6630   LearningRate 0.0882   Epoch: 1   Global Step: 20370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:13,259-Speed 9708.01 samples/sec   Loss 10.6168   LearningRate 0.0882   Epoch: 1   Global Step: 20380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:14,319-Speed 9670.34 samples/sec   Loss 10.6506   LearningRate 0.0882   Epoch: 1   Global Step: 20390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:15,379-Speed 9663.27 samples/sec   Loss 10.7491   LearningRate 0.0882   Epoch: 1   Global Step: 20400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:16,478-Speed 9330.35 samples/sec   Loss 10.7160   LearningRate 0.0881   Epoch: 1   Global Step: 20410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:17,543-Speed 9613.39 samples/sec   Loss 10.6581   LearningRate 0.0881   Epoch: 1   Global Step: 20420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:18,649-Speed 9269.98 samples/sec   Loss 10.7002   LearningRate 0.0881   Epoch: 1   Global Step: 20430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:19,739-Speed 9402.25 samples/sec   Loss 10.5835   LearningRate 0.0881   Epoch: 1   Global Step: 20440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:20,819-Speed 9479.57 samples/sec   Loss 10.7084   LearningRate 0.0881   Epoch: 1   Global Step: 20450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:21,910-Speed 9395.98 samples/sec   Loss 10.7959   LearningRate 0.0881   Epoch: 1   Global Step: 20460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:22,985-Speed 9537.59 samples/sec   Loss 10.8569   LearningRate 0.0881   Epoch: 1   Global Step: 20470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:24,090-Speed 9268.49 samples/sec   Loss 10.7308   LearningRate 0.0881   Epoch: 1   Global Step: 20480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:25,214-Speed 9117.21 samples/sec   Loss 10.7536   LearningRate 0.0881   Epoch: 1   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:26,285-Speed 9567.95 samples/sec   Loss 10.6756   LearningRate 0.0881   Epoch: 1   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:27,385-Speed 9313.38 samples/sec   Loss 10.8099   LearningRate 0.0881   Epoch: 1   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:22:28,457-Speed 9557.76 samples/sec   Loss 10.5507   LearningRate 0.0881   Epoch: 1   Global Step: 20520   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:29,551-Speed 9362.64 samples/sec   Loss 10.5772   LearningRate 0.0881   Epoch: 1   Global Step: 20530   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:30,624-Speed 9556.66 samples/sec   Loss 10.6399   LearningRate 0.0881   Epoch: 1   Global Step: 20540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:31,673-Speed 9762.39 samples/sec   Loss 10.7334   LearningRate 0.0881   Epoch: 1   Global Step: 20550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:32,783-Speed 9231.15 samples/sec   Loss 10.6409   LearningRate 0.0881   Epoch: 1   Global Step: 20560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:33,865-Speed 9469.29 samples/sec   Loss 10.7275   LearningRate 0.0881   Epoch: 1   Global Step: 20570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:34,937-Speed 9563.17 samples/sec   Loss 10.7162   LearningRate 0.0881   Epoch: 1   Global Step: 20580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:36,031-Speed 9369.80 samples/sec   Loss 10.7910   LearningRate 0.0880   Epoch: 1   Global Step: 20590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:37,099-Speed 9590.96 samples/sec   Loss 10.7013   LearningRate 0.0880   Epoch: 1   Global Step: 20600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:38,242-Speed 8964.08 samples/sec   Loss 10.6827   LearningRate 0.0880   Epoch: 1   Global Step: 20610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:39,328-Speed 9431.18 samples/sec   Loss 10.8137   LearningRate 0.0880   Epoch: 1   Global Step: 20620   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:22:40,430-Speed 9298.94 samples/sec   Loss 10.6835   LearningRate 0.0880   Epoch: 1   Global Step: 20630   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:22:41,499-Speed 9589.58 samples/sec   Loss 10.6843   LearningRate 0.0880   Epoch: 1   Global Step: 20640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:42,598-Speed 9321.82 samples/sec   Loss 10.6728   LearningRate 0.0880   Epoch: 1   Global Step: 20650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:43,684-Speed 9430.33 samples/sec   Loss 10.7580   LearningRate 0.0880   Epoch: 1   Global Step: 20660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:44,744-Speed 9667.88 samples/sec   Loss 10.5834   LearningRate 0.0880   Epoch: 1   Global Step: 20670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:45,829-Speed 9448.97 samples/sec   Loss 10.7169   LearningRate 0.0880   Epoch: 1   Global Step: 20680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:46,910-Speed 9473.43 samples/sec   Loss 10.5097   LearningRate 0.0880   Epoch: 1   Global Step: 20690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:47,995-Speed 9443.55 samples/sec   Loss 10.6973   LearningRate 0.0880   Epoch: 1   Global Step: 20700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:49,088-Speed 9372.09 samples/sec   Loss 10.6618   LearningRate 0.0880   Epoch: 1   Global Step: 20710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:50,181-Speed 9381.33 samples/sec   Loss 10.6414   LearningRate 0.0880   Epoch: 1   Global Step: 20720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:51,248-Speed 9595.21 samples/sec   Loss 10.6237   LearningRate 0.0880   Epoch: 1   Global Step: 20730   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:52,291-Speed 9825.96 samples/sec   Loss 10.7041   LearningRate 0.0880   Epoch: 1   Global Step: 20740   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:22:53,364-Speed 9556.88 samples/sec   Loss 10.6172   LearningRate 0.0880   Epoch: 1   Global Step: 20750   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:22:54,435-Speed 9562.41 samples/sec   Loss 10.5922   LearningRate 0.0879   Epoch: 1   Global Step: 20760   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:22:55,492-Speed 9694.21 samples/sec   Loss 10.6505   LearningRate 0.0879   Epoch: 1   Global Step: 20770   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:56,591-Speed 9325.25 samples/sec   Loss 10.7077   LearningRate 0.0879   Epoch: 1   Global Step: 20780   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:57,668-Speed 9514.31 samples/sec   Loss 10.5843   LearningRate 0.0879   Epoch: 1   Global Step: 20790   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:58,785-Speed 9169.09 samples/sec   Loss 10.7184   LearningRate 0.0879   Epoch: 1   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:22:59,851-Speed 9615.20 samples/sec   Loss 10.6391   LearningRate 0.0879   Epoch: 1   Global Step: 20810   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:00,920-Speed 9581.43 samples/sec   Loss 10.7262   LearningRate 0.0879   Epoch: 1   Global Step: 20820   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:02,010-Speed 9403.65 samples/sec   Loss 10.6754   LearningRate 0.0879   Epoch: 1   Global Step: 20830   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:03,110-Speed 9318.71 samples/sec   Loss 10.5945   LearningRate 0.0879   Epoch: 1   Global Step: 20840   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:04,245-Speed 9028.10 samples/sec   Loss 10.7485   LearningRate 0.0879   Epoch: 1   Global Step: 20850   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:05,290-Speed 9806.63 samples/sec   Loss 10.6739   LearningRate 0.0879   Epoch: 1   Global Step: 20860   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:06,353-Speed 9636.90 samples/sec   Loss 10.7512   LearningRate 0.0879   Epoch: 1   Global Step: 20870   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:23:07,411-Speed 9681.46 samples/sec   Loss 10.6768   LearningRate 0.0879   Epoch: 1   Global Step: 20880   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:23:08,546-Speed 9031.45 samples/sec   Loss 10.5162   LearningRate 0.0879   Epoch: 1   Global Step: 20890   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:09,671-Speed 9104.98 samples/sec   Loss 10.6642   LearningRate 0.0879   Epoch: 1   Global Step: 20900   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:10,740-Speed 9586.05 samples/sec   Loss 10.7696   LearningRate 0.0879   Epoch: 1   Global Step: 20910   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:11,828-Speed 9416.76 samples/sec   Loss 10.5781   LearningRate 0.0879   Epoch: 1   Global Step: 20920   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:12,990-Speed 8812.44 samples/sec   Loss 10.5929   LearningRate 0.0879   Epoch: 1   Global Step: 20930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:14,095-Speed 9278.26 samples/sec   Loss 10.6150   LearningRate 0.0878   Epoch: 1   Global Step: 20940   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:15,161-Speed 9613.75 samples/sec   Loss 10.7179   LearningRate 0.0878   Epoch: 1   Global Step: 20950   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:16,266-Speed 9276.92 samples/sec   Loss 10.6442   LearningRate 0.0878   Epoch: 1   Global Step: 20960   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:17,379-Speed 9201.86 samples/sec   Loss 10.6734   LearningRate 0.0878   Epoch: 1   Global Step: 20970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:18,450-Speed 9569.85 samples/sec   Loss 10.6120   LearningRate 0.0878   Epoch: 1   Global Step: 20980   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:19,528-Speed 9502.53 samples/sec   Loss 10.6470   LearningRate 0.0878   Epoch: 1   Global Step: 20990   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:20,614-Speed 9430.52 samples/sec   Loss 10.5393   LearningRate 0.0878   Epoch: 1   Global Step: 21000   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:21,669-Speed 9721.09 samples/sec   Loss 10.5581   LearningRate 0.0878   Epoch: 1   Global Step: 21010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:22,764-Speed 9358.34 samples/sec   Loss 10.6982   LearningRate 0.0878   Epoch: 1   Global Step: 21020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:23,889-Speed 9105.20 samples/sec   Loss 10.7087   LearningRate 0.0878   Epoch: 1   Global Step: 21030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:24,983-Speed 9366.11 samples/sec   Loss 10.6099   LearningRate 0.0878   Epoch: 1   Global Step: 21040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:26,071-Speed 9419.56 samples/sec   Loss 10.5759   LearningRate 0.0878   Epoch: 1   Global Step: 21050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:27,149-Speed 9498.01 samples/sec   Loss 10.6313   LearningRate 0.0878   Epoch: 1   Global Step: 21060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:28,281-Speed 9053.32 samples/sec   Loss 10.5412   LearningRate 0.0878   Epoch: 1   Global Step: 21070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:29,433-Speed 8890.88 samples/sec   Loss 10.6015   LearningRate 0.0878   Epoch: 1   Global Step: 21080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:30,504-Speed 9566.08 samples/sec   Loss 10.5520   LearningRate 0.0878   Epoch: 1   Global Step: 21090   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:23:31,562-Speed 9687.39 samples/sec   Loss 10.4695   LearningRate 0.0878   Epoch: 1   Global Step: 21100   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:23:32,700-Speed 9009.23 samples/sec   Loss 10.6797   LearningRate 0.0878   Epoch: 1   Global Step: 21110   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:23:33,804-Speed 9274.61 samples/sec   Loss 10.6909   LearningRate 0.0877   Epoch: 1   Global Step: 21120   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:23:34,857-Speed 9738.28 samples/sec   Loss 10.5583   LearningRate 0.0877   Epoch: 1   Global Step: 21130   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:23:35,936-Speed 9491.43 samples/sec   Loss 10.4924   LearningRate 0.0877   Epoch: 1   Global Step: 21140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:37,018-Speed 9474.62 samples/sec   Loss 10.4969   LearningRate 0.0877   Epoch: 1   Global Step: 21150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:38,095-Speed 9512.89 samples/sec   Loss 10.6115   LearningRate 0.0877   Epoch: 1   Global Step: 21160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:39,188-Speed 9372.97 samples/sec   Loss 10.5438   LearningRate 0.0877   Epoch: 1   Global Step: 21170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:40,285-Speed 9335.68 samples/sec   Loss 10.5456   LearningRate 0.0877   Epoch: 1   Global Step: 21180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:41,354-Speed 9587.28 samples/sec   Loss 10.5583   LearningRate 0.0877   Epoch: 1   Global Step: 21190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:42,428-Speed 9541.07 samples/sec   Loss 10.6985   LearningRate 0.0877   Epoch: 1   Global Step: 21200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:43,482-Speed 9714.79 samples/sec   Loss 10.4817   LearningRate 0.0877   Epoch: 1   Global Step: 21210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:23:44,570-Speed 9422.49 samples/sec   Loss 10.6604   LearningRate 0.0877   Epoch: 1   Global Step: 21220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:45,666-Speed 9343.48 samples/sec   Loss 10.5686   LearningRate 0.0877   Epoch: 1   Global Step: 21230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:46,757-Speed 9399.27 samples/sec   Loss 10.6045   LearningRate 0.0877   Epoch: 1   Global Step: 21240   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:23:47,823-Speed 9610.25 samples/sec   Loss 10.5822   LearningRate 0.0877   Epoch: 1   Global Step: 21250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:48,895-Speed 9559.15 samples/sec   Loss 10.6647   LearningRate 0.0877   Epoch: 1   Global Step: 21260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:49,964-Speed 9583.09 samples/sec   Loss 10.6536   LearningRate 0.0877   Epoch: 1   Global Step: 21270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:51,078-Speed 9196.98 samples/sec   Loss 10.7121   LearningRate 0.0877   Epoch: 1   Global Step: 21280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:52,199-Speed 9141.61 samples/sec   Loss 10.6063   LearningRate 0.0877   Epoch: 1   Global Step: 21290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:53,265-Speed 9611.22 samples/sec   Loss 10.7077   LearningRate 0.0876   Epoch: 1   Global Step: 21300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:54,333-Speed 9600.37 samples/sec   Loss 10.5154   LearningRate 0.0876   Epoch: 1   Global Step: 21310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:55,392-Speed 9670.93 samples/sec   Loss 10.5617   LearningRate 0.0876   Epoch: 1   Global Step: 21320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:56,450-Speed 9684.39 samples/sec   Loss 10.5770   LearningRate 0.0876   Epoch: 1   Global Step: 21330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:57,519-Speed 9580.98 samples/sec   Loss 10.6684   LearningRate 0.0876   Epoch: 1   Global Step: 21340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:58,600-Speed 9477.51 samples/sec   Loss 10.6837   LearningRate 0.0876   Epoch: 1   Global Step: 21350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:23:59,762-Speed 8822.39 samples/sec   Loss 10.5150   LearningRate 0.0876   Epoch: 1   Global Step: 21360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:00,861-Speed 9325.45 samples/sec   Loss 10.5159   LearningRate 0.0876   Epoch: 1   Global Step: 21370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:01,969-Speed 9246.54 samples/sec   Loss 10.7236   LearningRate 0.0876   Epoch: 1   Global Step: 21380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:03,008-Speed 9856.70 samples/sec   Loss 10.6637   LearningRate 0.0876   Epoch: 1   Global Step: 21390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:04,096-Speed 9417.27 samples/sec   Loss 10.5238   LearningRate 0.0876   Epoch: 1   Global Step: 21400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:05,199-Speed 9290.82 samples/sec   Loss 10.5995   LearningRate 0.0876   Epoch: 1   Global Step: 21410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:06,268-Speed 9588.75 samples/sec   Loss 10.5176   LearningRate 0.0876   Epoch: 1   Global Step: 21420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:07,366-Speed 9328.53 samples/sec   Loss 10.6418   LearningRate 0.0876   Epoch: 1   Global Step: 21430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:08,443-Speed 9520.21 samples/sec   Loss 10.4559   LearningRate 0.0876   Epoch: 1   Global Step: 21440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:09,555-Speed 9211.80 samples/sec   Loss 10.5420   LearningRate 0.0876   Epoch: 1   Global Step: 21450   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:24:10,626-Speed 9567.96 samples/sec   Loss 10.5625   LearningRate 0.0876   Epoch: 1   Global Step: 21460   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:24:11,685-Speed 9673.95 samples/sec   Loss 10.6341   LearningRate 0.0876   Epoch: 1   Global Step: 21470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:12,744-Speed 9680.05 samples/sec   Loss 10.7319   LearningRate 0.0875   Epoch: 1   Global Step: 21480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:13,838-Speed 9362.33 samples/sec   Loss 10.5825   LearningRate 0.0875   Epoch: 1   Global Step: 21490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:14,925-Speed 9425.06 samples/sec   Loss 10.5363   LearningRate 0.0875   Epoch: 1   Global Step: 21500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:15,978-Speed 9729.06 samples/sec   Loss 10.3902   LearningRate 0.0875   Epoch: 1   Global Step: 21510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:17,060-Speed 9469.24 samples/sec   Loss 10.5600   LearningRate 0.0875   Epoch: 1   Global Step: 21520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:18,144-Speed 9453.41 samples/sec   Loss 10.5266   LearningRate 0.0875   Epoch: 1   Global Step: 21530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:19,196-Speed 9739.49 samples/sec   Loss 10.6076   LearningRate 0.0875   Epoch: 1   Global Step: 21540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:20,285-Speed 9407.20 samples/sec   Loss 10.6040   LearningRate 0.0875   Epoch: 1   Global Step: 21550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:21,344-Speed 9678.54 samples/sec   Loss 10.4400   LearningRate 0.0875   Epoch: 1   Global Step: 21560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:22,457-Speed 9204.99 samples/sec   Loss 10.5324   LearningRate 0.0875   Epoch: 1   Global Step: 21570   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:24:23,540-Speed 9465.15 samples/sec   Loss 10.5145   LearningRate 0.0875   Epoch: 1   Global Step: 21580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:24,622-Speed 9472.46 samples/sec   Loss 10.6145   LearningRate 0.0875   Epoch: 1   Global Step: 21590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:25,664-Speed 9833.83 samples/sec   Loss 10.5065   LearningRate 0.0875   Epoch: 1   Global Step: 21600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:26,758-Speed 9366.00 samples/sec   Loss 10.5579   LearningRate 0.0875   Epoch: 1   Global Step: 21610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:27,822-Speed 9621.68 samples/sec   Loss 10.5528   LearningRate 0.0875   Epoch: 1   Global Step: 21620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:28,922-Speed 9315.80 samples/sec   Loss 10.6590   LearningRate 0.0875   Epoch: 1   Global Step: 21630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:30,038-Speed 9183.14 samples/sec   Loss 10.5925   LearningRate 0.0875   Epoch: 1   Global Step: 21640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:31,151-Speed 9206.13 samples/sec   Loss 10.6071   LearningRate 0.0874   Epoch: 1   Global Step: 21650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:32,221-Speed 9580.89 samples/sec   Loss 10.5458   LearningRate 0.0874   Epoch: 1   Global Step: 21660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:33,323-Speed 9292.69 samples/sec   Loss 10.5553   LearningRate 0.0874   Epoch: 1   Global Step: 21670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:34,408-Speed 9441.62 samples/sec   Loss 10.5002   LearningRate 0.0874   Epoch: 1   Global Step: 21680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:35,493-Speed 9452.32 samples/sec   Loss 10.5427   LearningRate 0.0874   Epoch: 1   Global Step: 21690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:36,600-Speed 9254.45 samples/sec   Loss 10.5575   LearningRate 0.0874   Epoch: 1   Global Step: 21700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:37,688-Speed 9413.65 samples/sec   Loss 10.5588   LearningRate 0.0874   Epoch: 1   Global Step: 21710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:38,727-Speed 9858.18 samples/sec   Loss 10.6535   LearningRate 0.0874   Epoch: 1   Global Step: 21720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:39,807-Speed 9487.58 samples/sec   Loss 10.5903   LearningRate 0.0874   Epoch: 1   Global Step: 21730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:40,915-Speed 9244.15 samples/sec   Loss 10.4848   LearningRate 0.0874   Epoch: 1   Global Step: 21740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:41,996-Speed 9486.32 samples/sec   Loss 10.5308   LearningRate 0.0874   Epoch: 1   Global Step: 21750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:43,140-Speed 8956.45 samples/sec   Loss 10.5724   LearningRate 0.0874   Epoch: 1   Global Step: 21760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:44,278-Speed 9000.32 samples/sec   Loss 10.5411   LearningRate 0.0874   Epoch: 1   Global Step: 21770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:45,361-Speed 9460.47 samples/sec   Loss 10.6020   LearningRate 0.0874   Epoch: 1   Global Step: 21780   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:24:46,427-Speed 9612.61 samples/sec   Loss 10.5885   LearningRate 0.0874   Epoch: 1   Global Step: 21790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:47,522-Speed 9364.40 samples/sec   Loss 10.6697   LearningRate 0.0874   Epoch: 1   Global Step: 21800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:24:48,636-Speed 9190.18 samples/sec   Loss 10.5559   LearningRate 0.0874   Epoch: 1   Global Step: 21810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:24:49,759-Speed 9124.74 samples/sec   Loss 10.5105   LearningRate 0.0874   Epoch: 1   Global Step: 21820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:24:50,853-Speed 9366.26 samples/sec   Loss 10.4907   LearningRate 0.0873   Epoch: 1   Global Step: 21830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:24:51,912-Speed 9672.61 samples/sec   Loss 10.5998   LearningRate 0.0873   Epoch: 1   Global Step: 21840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:24:52,977-Speed 9625.22 samples/sec   Loss 10.5149   LearningRate 0.0873   Epoch: 1   Global Step: 21850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:24:54,064-Speed 9428.23 samples/sec   Loss 10.5805   LearningRate 0.0873   Epoch: 1   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:24:55,166-Speed 9295.89 samples/sec   Loss 10.6261   LearningRate 0.0873   Epoch: 1   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:24:56,259-Speed 9375.04 samples/sec   Loss 10.4220   LearningRate 0.0873   Epoch: 1   Global Step: 21880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:24:57,367-Speed 9248.19 samples/sec   Loss 10.6654   LearningRate 0.0873   Epoch: 1   Global Step: 21890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:24:58,431-Speed 9627.02 samples/sec   Loss 10.7219   LearningRate 0.0873   Epoch: 1   Global Step: 21900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:24:59,524-Speed 9376.85 samples/sec   Loss 10.5424   LearningRate 0.0873   Epoch: 1   Global Step: 21910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:25:00,633-Speed 9241.12 samples/sec   Loss 10.6173   LearningRate 0.0873   Epoch: 1   Global Step: 21920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:25:01,730-Speed 9335.12 samples/sec   Loss 10.5134   LearningRate 0.0873   Epoch: 1   Global Step: 21930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:25:02,783-Speed 9729.91 samples/sec   Loss 10.5015   LearningRate 0.0873   Epoch: 1   Global Step: 21940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:25:03,873-Speed 9402.38 samples/sec   Loss 10.5379   LearningRate 0.0873   Epoch: 1   Global Step: 21950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:25:04,946-Speed 9549.53 samples/sec   Loss 10.5907   LearningRate 0.0873   Epoch: 1   Global Step: 21960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:25:06,017-Speed 9573.16 samples/sec   Loss 10.4607   LearningRate 0.0873   Epoch: 1   Global Step: 21970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:25:07,113-Speed 9346.40 samples/sec   Loss 10.5958   LearningRate 0.0873   Epoch: 1   Global Step: 21980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:25:08,210-Speed 9342.57 samples/sec   Loss 10.4098   LearningRate 0.0873   Epoch: 1   Global Step: 21990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:25:09,288-Speed 9498.10 samples/sec   Loss 10.5701   LearningRate 0.0873   Epoch: 1   Global Step: 22000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:25:31,191-[lfw][22000]XNorm: 14.017649
Training: 2022-04-11 12:25:31,192-[lfw][22000]Accuracy-Flip: 0.99267+-0.00410
Training: 2022-04-11 12:25:31,193-[lfw][22000]Accuracy-Highest: 0.99383
Training: 2022-04-11 12:25:56,456-[cfp_fp][22000]XNorm: 11.786355
Training: 2022-04-11 12:25:56,456-[cfp_fp][22000]Accuracy-Flip: 0.92629+-0.01027
Training: 2022-04-11 12:25:56,457-[cfp_fp][22000]Accuracy-Highest: 0.92629
Training: 2022-04-11 12:26:18,244-[agedb_30][22000]XNorm: 13.529340
Training: 2022-04-11 12:26:18,244-[agedb_30][22000]Accuracy-Flip: 0.92400+-0.01711
Training: 2022-04-11 12:26:18,245-[agedb_30][22000]Accuracy-Highest: 0.93433
Training: 2022-04-11 12:26:19,338-Speed 146.18 samples/sec   Loss 10.6185   LearningRate 0.0872   Epoch: 1   Global Step: 22010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:26:20,405-Speed 9601.90 samples/sec   Loss 10.5722   LearningRate 0.0872   Epoch: 1   Global Step: 22020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:21,526-Speed 9142.29 samples/sec   Loss 10.5398   LearningRate 0.0872   Epoch: 1   Global Step: 22030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:22,608-Speed 9470.54 samples/sec   Loss 10.4360   LearningRate 0.0872   Epoch: 1   Global Step: 22040   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:23,667-Speed 9671.48 samples/sec   Loss 10.5502   LearningRate 0.0872   Epoch: 1   Global Step: 22050   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:24,760-Speed 9374.10 samples/sec   Loss 10.4915   LearningRate 0.0872   Epoch: 1   Global Step: 22060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:25,843-Speed 9460.89 samples/sec   Loss 10.4926   LearningRate 0.0872   Epoch: 1   Global Step: 22070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:26,898-Speed 9713.26 samples/sec   Loss 10.5980   LearningRate 0.0872   Epoch: 1   Global Step: 22080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:27,950-Speed 9737.61 samples/sec   Loss 10.6281   LearningRate 0.0872   Epoch: 1   Global Step: 22090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:29,006-Speed 9703.04 samples/sec   Loss 10.5684   LearningRate 0.0872   Epoch: 1   Global Step: 22100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:30,087-Speed 9480.14 samples/sec   Loss 10.6833   LearningRate 0.0872   Epoch: 1   Global Step: 22110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:31,157-Speed 9575.90 samples/sec   Loss 10.5762   LearningRate 0.0872   Epoch: 1   Global Step: 22120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:32,230-Speed 9551.10 samples/sec   Loss 10.5693   LearningRate 0.0872   Epoch: 1   Global Step: 22130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:33,301-Speed 9569.62 samples/sec   Loss 10.5089   LearningRate 0.0872   Epoch: 1   Global Step: 22140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:34,365-Speed 9626.73 samples/sec   Loss 10.5593   LearningRate 0.0872   Epoch: 1   Global Step: 22150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:35,464-Speed 9319.52 samples/sec   Loss 10.5475   LearningRate 0.0872   Epoch: 1   Global Step: 22160   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:36,583-Speed 9159.43 samples/sec   Loss 10.4974   LearningRate 0.0872   Epoch: 1   Global Step: 22170   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:37,695-Speed 9212.44 samples/sec   Loss 10.4946   LearningRate 0.0872   Epoch: 1   Global Step: 22180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:38,739-Speed 9819.85 samples/sec   Loss 10.5188   LearningRate 0.0871   Epoch: 1   Global Step: 22190   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:40,057-Speed 7771.73 samples/sec   Loss 10.4372   LearningRate 0.0871   Epoch: 1   Global Step: 22200   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:41,209-Speed 8894.11 samples/sec   Loss 10.4903   LearningRate 0.0871   Epoch: 1   Global Step: 22210   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:42,314-Speed 9268.41 samples/sec   Loss 10.5758   LearningRate 0.0871   Epoch: 1   Global Step: 22220   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:26:43,419-Speed 9269.54 samples/sec   Loss 10.3990   LearningRate 0.0871   Epoch: 1   Global Step: 22230   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:44,507-Speed 9418.84 samples/sec   Loss 10.6372   LearningRate 0.0871   Epoch: 1   Global Step: 22240   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:45,573-Speed 9610.08 samples/sec   Loss 10.4960   LearningRate 0.0871   Epoch: 1   Global Step: 22250   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:46,643-Speed 9575.02 samples/sec   Loss 10.4013   LearningRate 0.0871   Epoch: 1   Global Step: 22260   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:47,716-Speed 9552.98 samples/sec   Loss 10.4872   LearningRate 0.0871   Epoch: 1   Global Step: 22270   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:48,858-Speed 8972.53 samples/sec   Loss 10.3963   LearningRate 0.0871   Epoch: 1   Global Step: 22280   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:49,919-Speed 9649.68 samples/sec   Loss 10.4794   LearningRate 0.0871   Epoch: 1   Global Step: 22290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:50,998-Speed 9502.10 samples/sec   Loss 10.5723   LearningRate 0.0871   Epoch: 1   Global Step: 22300   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:52,061-Speed 9632.34 samples/sec   Loss 10.4710   LearningRate 0.0871   Epoch: 1   Global Step: 22310   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:53,135-Speed 9548.93 samples/sec   Loss 10.4484   LearningRate 0.0871   Epoch: 1   Global Step: 22320   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:54,206-Speed 9561.71 samples/sec   Loss 10.5098   LearningRate 0.0871   Epoch: 1   Global Step: 22330   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:26:55,269-Speed 9644.83 samples/sec   Loss 10.4977   LearningRate 0.0871   Epoch: 1   Global Step: 22340   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:56,350-Speed 9484.88 samples/sec   Loss 10.4794   LearningRate 0.0871   Epoch: 1   Global Step: 22350   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:57,424-Speed 9544.63 samples/sec   Loss 10.3705   LearningRate 0.0871   Epoch: 1   Global Step: 22360   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:58,514-Speed 9397.11 samples/sec   Loss 10.4055   LearningRate 0.0870   Epoch: 1   Global Step: 22370   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:26:59,550-Speed 9895.21 samples/sec   Loss 10.4563   LearningRate 0.0870   Epoch: 1   Global Step: 22380   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:00,669-Speed 9162.84 samples/sec   Loss 10.5205   LearningRate 0.0870   Epoch: 1   Global Step: 22390   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:01,729-Speed 9665.90 samples/sec   Loss 10.4946   LearningRate 0.0870   Epoch: 1   Global Step: 22400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:02,797-Speed 9586.01 samples/sec   Loss 10.3338   LearningRate 0.0870   Epoch: 1   Global Step: 22410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:03,886-Speed 9411.88 samples/sec   Loss 10.4533   LearningRate 0.0870   Epoch: 1   Global Step: 22420   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:05,023-Speed 9015.70 samples/sec   Loss 10.4959   LearningRate 0.0870   Epoch: 1   Global Step: 22430   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:06,100-Speed 9507.49 samples/sec   Loss 10.4430   LearningRate 0.0870   Epoch: 1   Global Step: 22440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:27:07,182-Speed 9471.26 samples/sec   Loss 10.5298   LearningRate 0.0870   Epoch: 1   Global Step: 22450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:27:08,299-Speed 9171.74 samples/sec   Loss 10.5088   LearningRate 0.0870   Epoch: 1   Global Step: 22460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:27:09,434-Speed 9024.83 samples/sec   Loss 10.5200   LearningRate 0.0870   Epoch: 1   Global Step: 22470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:27:10,508-Speed 9538.51 samples/sec   Loss 10.4413   LearningRate 0.0870   Epoch: 1   Global Step: 22480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:27:11,567-Speed 9680.25 samples/sec   Loss 10.5200   LearningRate 0.0870   Epoch: 1   Global Step: 22490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:27:12,633-Speed 9653.08 samples/sec   Loss 10.4436   LearningRate 0.0870   Epoch: 1   Global Step: 22500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:27:13,718-Speed 9443.15 samples/sec   Loss 10.5221   LearningRate 0.0870   Epoch: 1   Global Step: 22510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:27:14,801-Speed 9471.04 samples/sec   Loss 10.4392   LearningRate 0.0870   Epoch: 1   Global Step: 22520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:27:15,840-Speed 9857.72 samples/sec   Loss 10.5514   LearningRate 0.0870   Epoch: 1   Global Step: 22530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-11 12:27:16,903-Speed 9638.39 samples/sec   Loss 10.5099   LearningRate 0.0870   Epoch: 1   Global Step: 22540   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:17,991-Speed 9420.41 samples/sec   Loss 10.4617   LearningRate 0.0869   Epoch: 1   Global Step: 22550   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:19,090-Speed 9322.59 samples/sec   Loss 10.4549   LearningRate 0.0869   Epoch: 1   Global Step: 22560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:20,212-Speed 9127.45 samples/sec   Loss 10.3980   LearningRate 0.0869   Epoch: 1   Global Step: 22570   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:21,287-Speed 9539.08 samples/sec   Loss 10.5740   LearningRate 0.0869   Epoch: 1   Global Step: 22580   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:22,362-Speed 9528.50 samples/sec   Loss 10.4127   LearningRate 0.0869   Epoch: 1   Global Step: 22590   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:23,436-Speed 9539.29 samples/sec   Loss 10.5041   LearningRate 0.0869   Epoch: 1   Global Step: 22600   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:24,530-Speed 9366.94 samples/sec   Loss 10.5665   LearningRate 0.0869   Epoch: 1   Global Step: 22610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:25,585-Speed 9715.13 samples/sec   Loss 10.4543   LearningRate 0.0869   Epoch: 1   Global Step: 22620   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:26,648-Speed 9636.80 samples/sec   Loss 10.4644   LearningRate 0.0869   Epoch: 1   Global Step: 22630   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:27,769-Speed 9138.08 samples/sec   Loss 10.5529   LearningRate 0.0869   Epoch: 1   Global Step: 22640   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:28,850-Speed 9479.90 samples/sec   Loss 10.4384   LearningRate 0.0869   Epoch: 1   Global Step: 22650   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:29,934-Speed 9455.11 samples/sec   Loss 10.5410   LearningRate 0.0869   Epoch: 1   Global Step: 22660   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:31,011-Speed 9508.98 samples/sec   Loss 10.4172   LearningRate 0.0869   Epoch: 1   Global Step: 22670   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:32,093-Speed 9469.39 samples/sec   Loss 10.4144   LearningRate 0.0869   Epoch: 1   Global Step: 22680   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:33,211-Speed 9165.84 samples/sec   Loss 10.4763   LearningRate 0.0869   Epoch: 1   Global Step: 22690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:34,326-Speed 9196.81 samples/sec   Loss 10.3885   LearningRate 0.0869   Epoch: 1   Global Step: 22700   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:35,401-Speed 9524.91 samples/sec   Loss 10.3788   LearningRate 0.0869   Epoch: 1   Global Step: 22710   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:27:36,496-Speed 9360.45 samples/sec   Loss 10.4153   LearningRate 0.0869   Epoch: 1   Global Step: 22720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:37,584-Speed 9417.91 samples/sec   Loss 10.4106   LearningRate 0.0868   Epoch: 1   Global Step: 22730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:38,658-Speed 9542.49 samples/sec   Loss 10.3428   LearningRate 0.0868   Epoch: 1   Global Step: 22740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:39,700-Speed 9830.88 samples/sec   Loss 10.5456   LearningRate 0.0868   Epoch: 1   Global Step: 22750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:40,798-Speed 9326.37 samples/sec   Loss 10.3381   LearningRate 0.0868   Epoch: 1   Global Step: 22760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:41,881-Speed 9466.40 samples/sec   Loss 10.3417   LearningRate 0.0868   Epoch: 1   Global Step: 22770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:42,965-Speed 9452.55 samples/sec   Loss 10.5312   LearningRate 0.0868   Epoch: 1   Global Step: 22780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:44,051-Speed 9431.80 samples/sec   Loss 10.3223   LearningRate 0.0868   Epoch: 1   Global Step: 22790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:45,151-Speed 9313.58 samples/sec   Loss 10.4199   LearningRate 0.0868   Epoch: 1   Global Step: 22800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:46,222-Speed 9568.79 samples/sec   Loss 10.4502   LearningRate 0.0868   Epoch: 1   Global Step: 22810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:47,335-Speed 9203.95 samples/sec   Loss 10.5242   LearningRate 0.0868   Epoch: 1   Global Step: 22820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:48,422-Speed 9428.87 samples/sec   Loss 10.4508   LearningRate 0.0868   Epoch: 1   Global Step: 22830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:49,539-Speed 9175.33 samples/sec   Loss 10.5605   LearningRate 0.0868   Epoch: 1   Global Step: 22840   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:27:50,583-Speed 9811.68 samples/sec   Loss 10.4451   LearningRate 0.0868   Epoch: 1   Global Step: 22850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:51,682-Speed 9326.83 samples/sec   Loss 10.4346   LearningRate 0.0868   Epoch: 1   Global Step: 22860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:52,743-Speed 9657.48 samples/sec   Loss 10.2977   LearningRate 0.0868   Epoch: 1   Global Step: 22870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:53,823-Speed 9487.75 samples/sec   Loss 10.4713   LearningRate 0.0868   Epoch: 1   Global Step: 22880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:54,935-Speed 9211.66 samples/sec   Loss 10.4657   LearningRate 0.0868   Epoch: 1   Global Step: 22890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:56,052-Speed 9174.59 samples/sec   Loss 10.3979   LearningRate 0.0868   Epoch: 1   Global Step: 22900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:57,129-Speed 9509.50 samples/sec   Loss 10.5161   LearningRate 0.0867   Epoch: 1   Global Step: 22910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:58,211-Speed 9472.26 samples/sec   Loss 10.5457   LearningRate 0.0867   Epoch: 1   Global Step: 22920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:27:59,286-Speed 9531.67 samples/sec   Loss 10.3816   LearningRate 0.0867   Epoch: 1   Global Step: 22930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:00,384-Speed 9334.81 samples/sec   Loss 10.4568   LearningRate 0.0867   Epoch: 1   Global Step: 22940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:01,469-Speed 9442.34 samples/sec   Loss 10.3981   LearningRate 0.0867   Epoch: 1   Global Step: 22950   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:28:02,549-Speed 9481.18 samples/sec   Loss 10.5083   LearningRate 0.0867   Epoch: 1   Global Step: 22960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:03,633-Speed 9459.29 samples/sec   Loss 10.3865   LearningRate 0.0867   Epoch: 1   Global Step: 22970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:04,687-Speed 9715.48 samples/sec   Loss 10.3331   LearningRate 0.0867   Epoch: 1   Global Step: 22980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:05,722-Speed 9899.25 samples/sec   Loss 10.3295   LearningRate 0.0867   Epoch: 1   Global Step: 22990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:06,807-Speed 9444.76 samples/sec   Loss 10.4023   LearningRate 0.0867   Epoch: 1   Global Step: 23000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:07,913-Speed 9264.52 samples/sec   Loss 10.4233   LearningRate 0.0867   Epoch: 1   Global Step: 23010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:08,991-Speed 9506.67 samples/sec   Loss 10.5148   LearningRate 0.0867   Epoch: 1   Global Step: 23020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:10,107-Speed 9179.82 samples/sec   Loss 10.4635   LearningRate 0.0867   Epoch: 1   Global Step: 23030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:11,183-Speed 9526.33 samples/sec   Loss 10.3380   LearningRate 0.0867   Epoch: 1   Global Step: 23040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:12,240-Speed 9696.96 samples/sec   Loss 10.6054   LearningRate 0.0867   Epoch: 1   Global Step: 23050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:13,282-Speed 9829.93 samples/sec   Loss 10.4399   LearningRate 0.0867   Epoch: 1   Global Step: 23060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:14,340-Speed 9685.42 samples/sec   Loss 10.4291   LearningRate 0.0867   Epoch: 1   Global Step: 23070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:15,418-Speed 9506.52 samples/sec   Loss 10.4541   LearningRate 0.0867   Epoch: 1   Global Step: 23080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:16,531-Speed 9205.34 samples/sec   Loss 10.3394   LearningRate 0.0866   Epoch: 1   Global Step: 23090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:17,599-Speed 9590.38 samples/sec   Loss 10.3851   LearningRate 0.0866   Epoch: 1   Global Step: 23100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:18,673-Speed 9543.22 samples/sec   Loss 10.4196   LearningRate 0.0866   Epoch: 1   Global Step: 23110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:19,758-Speed 9440.74 samples/sec   Loss 10.4841   LearningRate 0.0866   Epoch: 1   Global Step: 23120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:20,850-Speed 9381.42 samples/sec   Loss 10.4346   LearningRate 0.0866   Epoch: 1   Global Step: 23130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:21,940-Speed 9400.18 samples/sec   Loss 10.3589   LearningRate 0.0866   Epoch: 1   Global Step: 23140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:23,026-Speed 9448.18 samples/sec   Loss 10.4876   LearningRate 0.0866   Epoch: 1   Global Step: 23150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:24,136-Speed 9229.60 samples/sec   Loss 10.4672   LearningRate 0.0866   Epoch: 1   Global Step: 23160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:25,189-Speed 9732.52 samples/sec   Loss 10.3791   LearningRate 0.0866   Epoch: 1   Global Step: 23170   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:28:26,282-Speed 9369.41 samples/sec   Loss 10.2918   LearningRate 0.0866   Epoch: 1   Global Step: 23180   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:28:27,389-Speed 9259.65 samples/sec   Loss 10.3239   LearningRate 0.0866   Epoch: 1   Global Step: 23190   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:28:28,447-Speed 9680.82 samples/sec   Loss 10.4871   LearningRate 0.0866   Epoch: 1   Global Step: 23200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:29,547-Speed 9320.42 samples/sec   Loss 10.3638   LearningRate 0.0866   Epoch: 1   Global Step: 23210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:30,632-Speed 9445.21 samples/sec   Loss 10.4758   LearningRate 0.0866   Epoch: 1   Global Step: 23220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:31,717-Speed 9439.90 samples/sec   Loss 10.2838   LearningRate 0.0866   Epoch: 1   Global Step: 23230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:32,790-Speed 9549.67 samples/sec   Loss 10.4091   LearningRate 0.0866   Epoch: 1   Global Step: 23240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:33,941-Speed 8902.43 samples/sec   Loss 10.3885   LearningRate 0.0866   Epoch: 1   Global Step: 23250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:35,038-Speed 9344.99 samples/sec   Loss 10.3453   LearningRate 0.0865   Epoch: 1   Global Step: 23260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:36,162-Speed 9114.90 samples/sec   Loss 10.3331   LearningRate 0.0865   Epoch: 1   Global Step: 23270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:37,281-Speed 9154.13 samples/sec   Loss 10.4371   LearningRate 0.0865   Epoch: 1   Global Step: 23280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:38,329-Speed 9778.19 samples/sec   Loss 10.4002   LearningRate 0.0865   Epoch: 1   Global Step: 23290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:39,407-Speed 9504.06 samples/sec   Loss 10.3147   LearningRate 0.0865   Epoch: 1   Global Step: 23300   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:28:40,514-Speed 9249.04 samples/sec   Loss 10.4119   LearningRate 0.0865   Epoch: 1   Global Step: 23310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:41,604-Speed 9408.06 samples/sec   Loss 10.4838   LearningRate 0.0865   Epoch: 1   Global Step: 23320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:42,680-Speed 9523.25 samples/sec   Loss 10.3396   LearningRate 0.0865   Epoch: 1   Global Step: 23330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:43,818-Speed 9002.30 samples/sec   Loss 10.4057   LearningRate 0.0865   Epoch: 1   Global Step: 23340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:44,887-Speed 9584.57 samples/sec   Loss 10.4303   LearningRate 0.0865   Epoch: 1   Global Step: 23350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:45,988-Speed 9300.13 samples/sec   Loss 10.3897   LearningRate 0.0865   Epoch: 1   Global Step: 23360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:47,113-Speed 9106.95 samples/sec   Loss 10.4756   LearningRate 0.0865   Epoch: 1   Global Step: 23370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:48,211-Speed 9339.63 samples/sec   Loss 10.4864   LearningRate 0.0865   Epoch: 1   Global Step: 23380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:49,305-Speed 9369.88 samples/sec   Loss 10.4501   LearningRate 0.0865   Epoch: 1   Global Step: 23390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:50,386-Speed 9473.18 samples/sec   Loss 10.3423   LearningRate 0.0865   Epoch: 1   Global Step: 23400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:51,482-Speed 9351.04 samples/sec   Loss 10.4167   LearningRate 0.0865   Epoch: 1   Global Step: 23410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:52,573-Speed 9392.24 samples/sec   Loss 10.3492   LearningRate 0.0865   Epoch: 1   Global Step: 23420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:28:53,643-Speed 9578.36 samples/sec   Loss 10.2862   LearningRate 0.0865   Epoch: 1   Global Step: 23430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:54,801-Speed 8844.33 samples/sec   Loss 10.2811   LearningRate 0.0864   Epoch: 1   Global Step: 23440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:55,863-Speed 9648.99 samples/sec   Loss 10.4162   LearningRate 0.0864   Epoch: 1   Global Step: 23450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:56,907-Speed 9818.63 samples/sec   Loss 10.4633   LearningRate 0.0864   Epoch: 1   Global Step: 23460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:57,984-Speed 9512.20 samples/sec   Loss 10.3827   LearningRate 0.0864   Epoch: 1   Global Step: 23470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:28:59,046-Speed 9647.93 samples/sec   Loss 10.3232   LearningRate 0.0864   Epoch: 1   Global Step: 23480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:00,082-Speed 9888.83 samples/sec   Loss 10.4324   LearningRate 0.0864   Epoch: 1   Global Step: 23490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:01,138-Speed 9706.42 samples/sec   Loss 10.3440   LearningRate 0.0864   Epoch: 1   Global Step: 23500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:02,257-Speed 9152.18 samples/sec   Loss 10.4804   LearningRate 0.0864   Epoch: 1   Global Step: 23510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:03,357-Speed 9319.49 samples/sec   Loss 10.3063   LearningRate 0.0864   Epoch: 1   Global Step: 23520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:04,436-Speed 9493.82 samples/sec   Loss 10.4642   LearningRate 0.0864   Epoch: 1   Global Step: 23530   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:29:05,497-Speed 9660.11 samples/sec   Loss 10.2735   LearningRate 0.0864   Epoch: 1   Global Step: 23540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:06,592-Speed 9357.86 samples/sec   Loss 10.3615   LearningRate 0.0864   Epoch: 1   Global Step: 23550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:07,668-Speed 9526.74 samples/sec   Loss 10.3765   LearningRate 0.0864   Epoch: 1   Global Step: 23560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:08,723-Speed 9706.95 samples/sec   Loss 10.3680   LearningRate 0.0864   Epoch: 1   Global Step: 23570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:09,778-Speed 9715.47 samples/sec   Loss 10.4589   LearningRate 0.0864   Epoch: 1   Global Step: 23580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:10,878-Speed 9313.45 samples/sec   Loss 10.2711   LearningRate 0.0864   Epoch: 1   Global Step: 23590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:11,932-Speed 9721.54 samples/sec   Loss 10.3269   LearningRate 0.0864   Epoch: 1   Global Step: 23600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:13,008-Speed 9521.31 samples/sec   Loss 10.4981   LearningRate 0.0864   Epoch: 1   Global Step: 23610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:14,075-Speed 9604.75 samples/sec   Loss 10.2604   LearningRate 0.0863   Epoch: 1   Global Step: 23620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:15,209-Speed 9037.64 samples/sec   Loss 10.3717   LearningRate 0.0863   Epoch: 1   Global Step: 23630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:16,306-Speed 9338.96 samples/sec   Loss 10.3038   LearningRate 0.0863   Epoch: 1   Global Step: 23640   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:29:17,384-Speed 9505.35 samples/sec   Loss 10.2949   LearningRate 0.0863   Epoch: 1   Global Step: 23650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:18,458-Speed 9531.48 samples/sec   Loss 10.3498   LearningRate 0.0863   Epoch: 1   Global Step: 23660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:19,552-Speed 9367.49 samples/sec   Loss 10.3255   LearningRate 0.0863   Epoch: 1   Global Step: 23670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:20,650-Speed 9329.45 samples/sec   Loss 10.2648   LearningRate 0.0863   Epoch: 1   Global Step: 23680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:21,744-Speed 9366.11 samples/sec   Loss 10.2792   LearningRate 0.0863   Epoch: 1   Global Step: 23690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:22,848-Speed 9289.67 samples/sec   Loss 10.2730   LearningRate 0.0863   Epoch: 1   Global Step: 23700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:23,929-Speed 9475.70 samples/sec   Loss 10.4155   LearningRate 0.0863   Epoch: 1   Global Step: 23710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:25,011-Speed 9475.00 samples/sec   Loss 10.3391   LearningRate 0.0863   Epoch: 1   Global Step: 23720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:26,090-Speed 9498.74 samples/sec   Loss 10.4074   LearningRate 0.0863   Epoch: 1   Global Step: 23730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:27,188-Speed 9328.50 samples/sec   Loss 10.3183   LearningRate 0.0863   Epoch: 1   Global Step: 23740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:28,324-Speed 9022.00 samples/sec   Loss 10.2913   LearningRate 0.0863   Epoch: 1   Global Step: 23750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:29,427-Speed 9290.11 samples/sec   Loss 10.4577   LearningRate 0.0863   Epoch: 1   Global Step: 23760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:30,496-Speed 9584.92 samples/sec   Loss 10.3439   LearningRate 0.0863   Epoch: 1   Global Step: 23770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:31,575-Speed 9499.36 samples/sec   Loss 10.3414   LearningRate 0.0863   Epoch: 1   Global Step: 23780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:32,702-Speed 9087.09 samples/sec   Loss 10.2964   LearningRate 0.0863   Epoch: 1   Global Step: 23790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:33,787-Speed 9445.72 samples/sec   Loss 10.3920   LearningRate 0.0862   Epoch: 1   Global Step: 23800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:34,850-Speed 9635.24 samples/sec   Loss 10.2887   LearningRate 0.0862   Epoch: 1   Global Step: 23810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:35,929-Speed 9498.78 samples/sec   Loss 10.3817   LearningRate 0.0862   Epoch: 1   Global Step: 23820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:37,020-Speed 9388.65 samples/sec   Loss 10.4462   LearningRate 0.0862   Epoch: 1   Global Step: 23830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:38,056-Speed 9890.98 samples/sec   Loss 10.3404   LearningRate 0.0862   Epoch: 1   Global Step: 23840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:29:39,127-Speed 9561.93 samples/sec   Loss 10.3438   LearningRate 0.0862   Epoch: 1   Global Step: 23850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:29:40,239-Speed 9215.37 samples/sec   Loss 10.2881   LearningRate 0.0862   Epoch: 1   Global Step: 23860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:29:41,309-Speed 9582.41 samples/sec   Loss 10.3065   LearningRate 0.0862   Epoch: 1   Global Step: 23870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:29:42,385-Speed 9523.53 samples/sec   Loss 10.3817   LearningRate 0.0862   Epoch: 1   Global Step: 23880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:29:43,460-Speed 9529.93 samples/sec   Loss 10.4226   LearningRate 0.0862   Epoch: 1   Global Step: 23890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:29:44,525-Speed 9624.60 samples/sec   Loss 10.3662   LearningRate 0.0862   Epoch: 1   Global Step: 23900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:29:45,583-Speed 9687.20 samples/sec   Loss 10.2329   LearningRate 0.0862   Epoch: 1   Global Step: 23910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:29:46,647-Speed 9626.46 samples/sec   Loss 10.3397   LearningRate 0.0862   Epoch: 1   Global Step: 23920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:29:47,710-Speed 9637.74 samples/sec   Loss 10.3983   LearningRate 0.0862   Epoch: 1   Global Step: 23930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:29:48,821-Speed 9235.94 samples/sec   Loss 10.3939   LearningRate 0.0862   Epoch: 1   Global Step: 23940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:49,956-Speed 9024.90 samples/sec   Loss 10.1666   LearningRate 0.0862   Epoch: 1   Global Step: 23950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:51,042-Speed 9433.47 samples/sec   Loss 10.2557   LearningRate 0.0862   Epoch: 1   Global Step: 23960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:52,150-Speed 9250.61 samples/sec   Loss 10.5061   LearningRate 0.0862   Epoch: 1   Global Step: 23970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:53,254-Speed 9282.25 samples/sec   Loss 10.2282   LearningRate 0.0861   Epoch: 1   Global Step: 23980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:54,272-Speed 10060.13 samples/sec   Loss 10.3002   LearningRate 0.0861   Epoch: 1   Global Step: 23990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:29:55,363-Speed 9393.49 samples/sec   Loss 10.2970   LearningRate 0.0861   Epoch: 1   Global Step: 24000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:30:17,123-[lfw][24000]XNorm: 13.734053
Training: 2022-04-11 12:30:17,124-[lfw][24000]Accuracy-Flip: 0.99317+-0.00369
Training: 2022-04-11 12:30:17,124-[lfw][24000]Accuracy-Highest: 0.99383
Training: 2022-04-11 12:30:42,364-[cfp_fp][24000]XNorm: 11.648384
Training: 2022-04-11 12:30:42,365-[cfp_fp][24000]Accuracy-Flip: 0.93186+-0.01275
Training: 2022-04-11 12:30:42,366-[cfp_fp][24000]Accuracy-Highest: 0.93186
Training: 2022-04-11 12:31:04,111-[agedb_30][24000]XNorm: 13.287366
Training: 2022-04-11 12:31:04,112-[agedb_30][24000]Accuracy-Flip: 0.94283+-0.01145
Training: 2022-04-11 12:31:04,112-[agedb_30][24000]Accuracy-Highest: 0.94283
Training: 2022-04-11 12:31:05,203-Speed 146.62 samples/sec   Loss 10.2748   LearningRate 0.0861   Epoch: 1   Global Step: 24010   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:06,264-Speed 9661.05 samples/sec   Loss 10.3185   LearningRate 0.0861   Epoch: 1   Global Step: 24020   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:07,345-Speed 9478.34 samples/sec   Loss 10.3732   LearningRate 0.0861   Epoch: 1   Global Step: 24030   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:08,402-Speed 9691.85 samples/sec   Loss 10.3596   LearningRate 0.0861   Epoch: 1   Global Step: 24040   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:31:09,471-Speed 9579.90 samples/sec   Loss 10.3140   LearningRate 0.0861   Epoch: 1   Global Step: 24050   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:31:10,534-Speed 9638.30 samples/sec   Loss 10.2468   LearningRate 0.0861   Epoch: 1   Global Step: 24060   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:11,625-Speed 9400.64 samples/sec   Loss 10.2657   LearningRate 0.0861   Epoch: 1   Global Step: 24070   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:12,696-Speed 9567.33 samples/sec   Loss 10.3354   LearningRate 0.0861   Epoch: 1   Global Step: 24080   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:13,771-Speed 9528.62 samples/sec   Loss 10.3918   LearningRate 0.0861   Epoch: 1   Global Step: 24090   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:14,883-Speed 9209.88 samples/sec   Loss 10.3198   LearningRate 0.0861   Epoch: 1   Global Step: 24100   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:15,970-Speed 9426.02 samples/sec   Loss 10.2842   LearningRate 0.0861   Epoch: 1   Global Step: 24110   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:17,059-Speed 9416.36 samples/sec   Loss 10.2389   LearningRate 0.0861   Epoch: 1   Global Step: 24120   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:18,128-Speed 9585.28 samples/sec   Loss 10.3663   LearningRate 0.0861   Epoch: 1   Global Step: 24130   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:19,232-Speed 9276.70 samples/sec   Loss 10.3412   LearningRate 0.0861   Epoch: 1   Global Step: 24140   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:20,305-Speed 9548.54 samples/sec   Loss 10.4007   LearningRate 0.0861   Epoch: 1   Global Step: 24150   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-11 12:31:21,411-Speed 9265.08 samples/sec   Loss 10.1090   LearningRate 0.0860   Epoch: 1   Global Step: 24160   Fp16 Grad Scale: 262144   Required: 13 hours
Training: 2022-04-11 12:31:22,509-Speed 9331.14 samples/sec   Loss 10.3444   LearningRate 0.0860   Epoch: 1   Global Step: 24170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:23,570-Speed 9659.91 samples/sec   Loss 10.2378   LearningRate 0.0860   Epoch: 1   Global Step: 24180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:24,647-Speed 9509.16 samples/sec   Loss 10.2758   LearningRate 0.0860   Epoch: 1   Global Step: 24190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:25,739-Speed 9382.32 samples/sec   Loss 10.2772   LearningRate 0.0860   Epoch: 1   Global Step: 24200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:26,814-Speed 9534.64 samples/sec   Loss 10.2905   LearningRate 0.0860   Epoch: 1   Global Step: 24210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:27,905-Speed 9391.64 samples/sec   Loss 10.2088   LearningRate 0.0860   Epoch: 1   Global Step: 24220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:29,007-Speed 9295.45 samples/sec   Loss 10.1116   LearningRate 0.0860   Epoch: 1   Global Step: 24230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:30,066-Speed 9679.45 samples/sec   Loss 10.1976   LearningRate 0.0860   Epoch: 1   Global Step: 24240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:31,121-Speed 9713.94 samples/sec   Loss 10.1934   LearningRate 0.0860   Epoch: 1   Global Step: 24250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:32,158-Speed 9877.63 samples/sec   Loss 10.4111   LearningRate 0.0860   Epoch: 1   Global Step: 24260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:33,223-Speed 9618.16 samples/sec   Loss 10.3129   LearningRate 0.0860   Epoch: 1   Global Step: 24270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:34,331-Speed 9249.60 samples/sec   Loss 10.2311   LearningRate 0.0860   Epoch: 1   Global Step: 24280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:35,420-Speed 9417.52 samples/sec   Loss 10.2134   LearningRate 0.0860   Epoch: 1   Global Step: 24290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:36,518-Speed 9326.59 samples/sec   Loss 10.3142   LearningRate 0.0860   Epoch: 1   Global Step: 24300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:37,572-Speed 9721.99 samples/sec   Loss 10.2754   LearningRate 0.0860   Epoch: 1   Global Step: 24310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:38,683-Speed 9222.12 samples/sec   Loss 10.3866   LearningRate 0.0860   Epoch: 1   Global Step: 24320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:39,801-Speed 9162.64 samples/sec   Loss 10.2194   LearningRate 0.0860   Epoch: 1   Global Step: 24330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:40,931-Speed 9069.83 samples/sec   Loss 10.2737   LearningRate 0.0859   Epoch: 1   Global Step: 24340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:42,019-Speed 9414.71 samples/sec   Loss 10.3541   LearningRate 0.0859   Epoch: 1   Global Step: 24350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:43,104-Speed 9446.51 samples/sec   Loss 10.3393   LearningRate 0.0859   Epoch: 1   Global Step: 24360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:44,182-Speed 9504.11 samples/sec   Loss 10.2647   LearningRate 0.0859   Epoch: 1   Global Step: 24370   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:31:45,228-Speed 9796.67 samples/sec   Loss 10.3709   LearningRate 0.0859   Epoch: 1   Global Step: 24380   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:31:46,320-Speed 9384.87 samples/sec   Loss 10.3044   LearningRate 0.0859   Epoch: 1   Global Step: 24390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:47,441-Speed 9134.37 samples/sec   Loss 10.2129   LearningRate 0.0859   Epoch: 1   Global Step: 24400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:48,472-Speed 9938.51 samples/sec   Loss 10.3648   LearningRate 0.0859   Epoch: 1   Global Step: 24410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:49,563-Speed 9389.89 samples/sec   Loss 10.3042   LearningRate 0.0859   Epoch: 1   Global Step: 24420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:50,655-Speed 9384.18 samples/sec   Loss 10.2733   LearningRate 0.0859   Epoch: 1   Global Step: 24430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:51,739-Speed 9451.82 samples/sec   Loss 10.1586   LearningRate 0.0859   Epoch: 1   Global Step: 24440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:52,816-Speed 9517.95 samples/sec   Loss 10.2811   LearningRate 0.0859   Epoch: 1   Global Step: 24450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:53,922-Speed 9267.61 samples/sec   Loss 10.2797   LearningRate 0.0859   Epoch: 1   Global Step: 24460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:55,043-Speed 9140.98 samples/sec   Loss 10.1499   LearningRate 0.0859   Epoch: 1   Global Step: 24470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:56,090-Speed 9784.33 samples/sec   Loss 10.4613   LearningRate 0.0859   Epoch: 1   Global Step: 24480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:57,195-Speed 9274.38 samples/sec   Loss 10.1778   LearningRate 0.0859   Epoch: 1   Global Step: 24490   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:31:58,273-Speed 9509.88 samples/sec   Loss 10.2431   LearningRate 0.0859   Epoch: 1   Global Step: 24500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:31:59,348-Speed 9526.12 samples/sec   Loss 10.2044   LearningRate 0.0859   Epoch: 1   Global Step: 24510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:00,434-Speed 9433.89 samples/sec   Loss 10.4041   LearningRate 0.0858   Epoch: 1   Global Step: 24520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:01,506-Speed 9557.58 samples/sec   Loss 10.2936   LearningRate 0.0858   Epoch: 1   Global Step: 24530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:02,538-Speed 9930.26 samples/sec   Loss 10.1982   LearningRate 0.0858   Epoch: 1   Global Step: 24540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:03,635-Speed 9338.34 samples/sec   Loss 10.3205   LearningRate 0.0858   Epoch: 1   Global Step: 24550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:04,705-Speed 9589.59 samples/sec   Loss 10.4277   LearningRate 0.0858   Epoch: 1   Global Step: 24560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:05,791-Speed 9430.03 samples/sec   Loss 10.3227   LearningRate 0.0858   Epoch: 1   Global Step: 24570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:06,915-Speed 9122.41 samples/sec   Loss 10.2522   LearningRate 0.0858   Epoch: 1   Global Step: 24580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:08,043-Speed 9076.48 samples/sec   Loss 10.2981   LearningRate 0.0858   Epoch: 1   Global Step: 24590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:09,129-Speed 9435.59 samples/sec   Loss 10.2600   LearningRate 0.0858   Epoch: 1   Global Step: 24600   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:32:10,229-Speed 9313.56 samples/sec   Loss 10.2395   LearningRate 0.0858   Epoch: 1   Global Step: 24610   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:32:11,303-Speed 9542.45 samples/sec   Loss 10.3210   LearningRate 0.0858   Epoch: 1   Global Step: 24620   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:32:12,398-Speed 9358.50 samples/sec   Loss 10.1966   LearningRate 0.0858   Epoch: 1   Global Step: 24630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:13,467-Speed 9583.00 samples/sec   Loss 10.1968   LearningRate 0.0858   Epoch: 1   Global Step: 24640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:14,536-Speed 9587.96 samples/sec   Loss 10.1916   LearningRate 0.0858   Epoch: 1   Global Step: 24650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:15,611-Speed 9531.21 samples/sec   Loss 10.3482   LearningRate 0.0858   Epoch: 1   Global Step: 24660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:16,721-Speed 9229.96 samples/sec   Loss 10.2210   LearningRate 0.0858   Epoch: 1   Global Step: 24670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:17,815-Speed 9367.86 samples/sec   Loss 10.2028   LearningRate 0.0858   Epoch: 1   Global Step: 24680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:18,911-Speed 9344.33 samples/sec   Loss 10.3417   LearningRate 0.0858   Epoch: 1   Global Step: 24690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:19,996-Speed 9441.02 samples/sec   Loss 10.3139   LearningRate 0.0857   Epoch: 1   Global Step: 24700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:21,073-Speed 9515.51 samples/sec   Loss 10.2916   LearningRate 0.0857   Epoch: 1   Global Step: 24710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:22,129-Speed 9700.44 samples/sec   Loss 10.1658   LearningRate 0.0857   Epoch: 1   Global Step: 24720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:23,173-Speed 9818.08 samples/sec   Loss 10.2489   LearningRate 0.0857   Epoch: 1   Global Step: 24730   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:32:24,237-Speed 9631.22 samples/sec   Loss 10.2731   LearningRate 0.0857   Epoch: 1   Global Step: 24740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:25,304-Speed 9598.64 samples/sec   Loss 10.4324   LearningRate 0.0857   Epoch: 1   Global Step: 24750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:26,360-Speed 9703.40 samples/sec   Loss 10.2017   LearningRate 0.0857   Epoch: 1   Global Step: 24760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:27,408-Speed 9773.97 samples/sec   Loss 10.1716   LearningRate 0.0857   Epoch: 1   Global Step: 24770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:28,450-Speed 9843.69 samples/sec   Loss 10.1465   LearningRate 0.0857   Epoch: 1   Global Step: 24780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:29,549-Speed 9324.58 samples/sec   Loss 10.1483   LearningRate 0.0857   Epoch: 1   Global Step: 24790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:30,667-Speed 9163.62 samples/sec   Loss 10.3208   LearningRate 0.0857   Epoch: 1   Global Step: 24800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:31,724-Speed 9697.37 samples/sec   Loss 10.1908   LearningRate 0.0857   Epoch: 1   Global Step: 24810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:32,785-Speed 9657.80 samples/sec   Loss 10.1523   LearningRate 0.0857   Epoch: 1   Global Step: 24820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:33,861-Speed 9515.07 samples/sec   Loss 10.2906   LearningRate 0.0857   Epoch: 1   Global Step: 24830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:34,939-Speed 9510.60 samples/sec   Loss 10.2392   LearningRate 0.0857   Epoch: 1   Global Step: 24840   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:32:36,003-Speed 9631.75 samples/sec   Loss 10.2440   LearningRate 0.0857   Epoch: 1   Global Step: 24850   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:32:37,115-Speed 9215.29 samples/sec   Loss 10.1303   LearningRate 0.0857   Epoch: 1   Global Step: 24860   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:32:38,219-Speed 9273.94 samples/sec   Loss 10.2271   LearningRate 0.0857   Epoch: 1   Global Step: 24870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:39,327-Speed 9263.46 samples/sec   Loss 10.2352   LearningRate 0.0856   Epoch: 1   Global Step: 24880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:40,420-Speed 9375.66 samples/sec   Loss 10.2538   LearningRate 0.0856   Epoch: 1   Global Step: 24890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:41,546-Speed 9098.27 samples/sec   Loss 10.2217   LearningRate 0.0856   Epoch: 1   Global Step: 24900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:42,653-Speed 9257.10 samples/sec   Loss 10.4120   LearningRate 0.0856   Epoch: 1   Global Step: 24910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:43,725-Speed 9557.87 samples/sec   Loss 10.2801   LearningRate 0.0856   Epoch: 1   Global Step: 24920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:44,796-Speed 9565.59 samples/sec   Loss 10.1868   LearningRate 0.0856   Epoch: 1   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:32:45,872-Speed 9534.80 samples/sec   Loss 10.2288   LearningRate 0.0856   Epoch: 1   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:32:46,956-Speed 9457.44 samples/sec   Loss 10.1478   LearningRate 0.0856   Epoch: 1   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:32:48,028-Speed 9557.79 samples/sec   Loss 10.2918   LearningRate 0.0856   Epoch: 1   Global Step: 24960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:32:49,070-Speed 9828.10 samples/sec   Loss 10.2419   LearningRate 0.0856   Epoch: 1   Global Step: 24970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:32:50,190-Speed 9156.30 samples/sec   Loss 10.2173   LearningRate 0.0856   Epoch: 1   Global Step: 24980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:32:51,262-Speed 9555.55 samples/sec   Loss 10.3063   LearningRate 0.0856   Epoch: 1   Global Step: 24990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:32:52,368-Speed 9263.41 samples/sec   Loss 10.1940   LearningRate 0.0856   Epoch: 1   Global Step: 25000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:32:53,454-Speed 9437.82 samples/sec   Loss 10.2017   LearningRate 0.0856   Epoch: 1   Global Step: 25010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:32:54,530-Speed 9520.70 samples/sec   Loss 10.2602   LearningRate 0.0856   Epoch: 1   Global Step: 25020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:32:55,631-Speed 9302.44 samples/sec   Loss 10.1338   LearningRate 0.0856   Epoch: 1   Global Step: 25030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:56,728-Speed 9344.78 samples/sec   Loss 10.1588   LearningRate 0.0856   Epoch: 1   Global Step: 25040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:57,790-Speed 9642.34 samples/sec   Loss 10.2005   LearningRate 0.0856   Epoch: 1   Global Step: 25050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:58,848-Speed 9689.49 samples/sec   Loss 10.1808   LearningRate 0.0855   Epoch: 1   Global Step: 25060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:32:59,902-Speed 9720.17 samples/sec   Loss 10.1074   LearningRate 0.0855   Epoch: 1   Global Step: 25070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:01,025-Speed 9123.91 samples/sec   Loss 10.2182   LearningRate 0.0855   Epoch: 1   Global Step: 25080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:02,107-Speed 9467.28 samples/sec   Loss 10.0747   LearningRate 0.0855   Epoch: 1   Global Step: 25090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:03,215-Speed 9245.30 samples/sec   Loss 10.2481   LearningRate 0.0855   Epoch: 1   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:04,318-Speed 9293.52 samples/sec   Loss 10.1581   LearningRate 0.0855   Epoch: 1   Global Step: 25110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:05,387-Speed 9593.00 samples/sec   Loss 10.1250   LearningRate 0.0855   Epoch: 1   Global Step: 25120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:06,461-Speed 9536.43 samples/sec   Loss 10.1712   LearningRate 0.0855   Epoch: 1   Global Step: 25130   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:33:07,509-Speed 9775.48 samples/sec   Loss 10.1233   LearningRate 0.0855   Epoch: 1   Global Step: 25140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:08,604-Speed 9357.11 samples/sec   Loss 10.0820   LearningRate 0.0855   Epoch: 1   Global Step: 25150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:09,687-Speed 9462.15 samples/sec   Loss 10.2164   LearningRate 0.0855   Epoch: 1   Global Step: 25160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:10,762-Speed 9538.73 samples/sec   Loss 10.2448   LearningRate 0.0855   Epoch: 1   Global Step: 25170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:11,842-Speed 9483.02 samples/sec   Loss 10.1754   LearningRate 0.0855   Epoch: 1   Global Step: 25180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:12,938-Speed 9347.28 samples/sec   Loss 10.1369   LearningRate 0.0855   Epoch: 1   Global Step: 25190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:14,033-Speed 9355.42 samples/sec   Loss 10.1008   LearningRate 0.0855   Epoch: 1   Global Step: 25200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:15,104-Speed 9570.39 samples/sec   Loss 10.1534   LearningRate 0.0855   Epoch: 1   Global Step: 25210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:16,199-Speed 9355.45 samples/sec   Loss 10.1750   LearningRate 0.0855   Epoch: 1   Global Step: 25220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:17,307-Speed 9248.07 samples/sec   Loss 10.2709   LearningRate 0.0855   Epoch: 1   Global Step: 25230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:18,384-Speed 9519.26 samples/sec   Loss 10.1398   LearningRate 0.0854   Epoch: 1   Global Step: 25240   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:33:19,468-Speed 9450.60 samples/sec   Loss 10.1249   LearningRate 0.0854   Epoch: 1   Global Step: 25250   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:33:20,582-Speed 9191.99 samples/sec   Loss 10.1582   LearningRate 0.0854   Epoch: 1   Global Step: 25260   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:33:21,701-Speed 9157.09 samples/sec   Loss 10.1569   LearningRate 0.0854   Epoch: 1   Global Step: 25270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:22,809-Speed 9248.74 samples/sec   Loss 10.0831   LearningRate 0.0854   Epoch: 1   Global Step: 25280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:23,911-Speed 9297.04 samples/sec   Loss 10.2078   LearningRate 0.0854   Epoch: 1   Global Step: 25290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:24,986-Speed 9529.16 samples/sec   Loss 10.2295   LearningRate 0.0854   Epoch: 1   Global Step: 25300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:26,102-Speed 9188.86 samples/sec   Loss 9.9842   LearningRate 0.0854   Epoch: 1   Global Step: 25310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:27,220-Speed 9161.03 samples/sec   Loss 10.2942   LearningRate 0.0854   Epoch: 1   Global Step: 25320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:28,377-Speed 8861.37 samples/sec   Loss 10.2184   LearningRate 0.0854   Epoch: 1   Global Step: 25330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:29,459-Speed 9468.25 samples/sec   Loss 10.1595   LearningRate 0.0854   Epoch: 1   Global Step: 25340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:30,510-Speed 9742.63 samples/sec   Loss 10.2893   LearningRate 0.0854   Epoch: 1   Global Step: 25350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:31,553-Speed 9829.50 samples/sec   Loss 10.1897   LearningRate 0.0854   Epoch: 1   Global Step: 25360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:32,597-Speed 9806.76 samples/sec   Loss 10.1476   LearningRate 0.0854   Epoch: 1   Global Step: 25370   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:33:33,651-Speed 9725.13 samples/sec   Loss 10.2328   LearningRate 0.0854   Epoch: 1   Global Step: 25380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:34,770-Speed 9160.49 samples/sec   Loss 10.1433   LearningRate 0.0854   Epoch: 1   Global Step: 25390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:35,821-Speed 9746.02 samples/sec   Loss 10.2132   LearningRate 0.0854   Epoch: 1   Global Step: 25400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:36,888-Speed 9601.79 samples/sec   Loss 10.1114   LearningRate 0.0854   Epoch: 1   Global Step: 25410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:37,963-Speed 9531.59 samples/sec   Loss 10.2312   LearningRate 0.0854   Epoch: 1   Global Step: 25420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:39,099-Speed 9023.40 samples/sec   Loss 10.2477   LearningRate 0.0853   Epoch: 1   Global Step: 25430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:40,183-Speed 9447.93 samples/sec   Loss 10.1793   LearningRate 0.0853   Epoch: 1   Global Step: 25440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:41,251-Speed 9596.94 samples/sec   Loss 10.2038   LearningRate 0.0853   Epoch: 1   Global Step: 25450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:42,369-Speed 9156.51 samples/sec   Loss 10.1932   LearningRate 0.0853   Epoch: 1   Global Step: 25460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:43,474-Speed 9274.87 samples/sec   Loss 10.2018   LearningRate 0.0853   Epoch: 1   Global Step: 25470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:44,599-Speed 9108.56 samples/sec   Loss 10.2026   LearningRate 0.0853   Epoch: 1   Global Step: 25480   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:33:45,699-Speed 9318.93 samples/sec   Loss 10.2223   LearningRate 0.0853   Epoch: 1   Global Step: 25490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:46,787-Speed 9413.32 samples/sec   Loss 10.1370   LearningRate 0.0853   Epoch: 1   Global Step: 25500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:47,854-Speed 9602.96 samples/sec   Loss 10.1266   LearningRate 0.0853   Epoch: 1   Global Step: 25510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:48,928-Speed 9546.01 samples/sec   Loss 10.0250   LearningRate 0.0853   Epoch: 1   Global Step: 25520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:49,988-Speed 9663.55 samples/sec   Loss 10.0897   LearningRate 0.0853   Epoch: 1   Global Step: 25530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:51,116-Speed 9085.78 samples/sec   Loss 10.2813   LearningRate 0.0853   Epoch: 1   Global Step: 25540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:52,195-Speed 9493.41 samples/sec   Loss 10.2366   LearningRate 0.0853   Epoch: 1   Global Step: 25550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:53,251-Speed 9704.99 samples/sec   Loss 10.2582   LearningRate 0.0853   Epoch: 1   Global Step: 25560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:54,336-Speed 9443.43 samples/sec   Loss 10.0590   LearningRate 0.0853   Epoch: 1   Global Step: 25570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:55,432-Speed 9345.53 samples/sec   Loss 10.0920   LearningRate 0.0853   Epoch: 1   Global Step: 25580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:56,557-Speed 9101.36 samples/sec   Loss 10.1094   LearningRate 0.0853   Epoch: 1   Global Step: 25590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:57,606-Speed 9768.63 samples/sec   Loss 10.2430   LearningRate 0.0853   Epoch: 1   Global Step: 25600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:58,702-Speed 9358.48 samples/sec   Loss 10.0635   LearningRate 0.0852   Epoch: 1   Global Step: 25610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:33:59,795-Speed 9370.19 samples/sec   Loss 10.0968   LearningRate 0.0852   Epoch: 1   Global Step: 25620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:00,919-Speed 9115.61 samples/sec   Loss 10.1183   LearningRate 0.0852   Epoch: 1   Global Step: 25630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:02,005-Speed 9439.42 samples/sec   Loss 10.3032   LearningRate 0.0852   Epoch: 1   Global Step: 25640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:03,073-Speed 9599.50 samples/sec   Loss 10.1273   LearningRate 0.0852   Epoch: 1   Global Step: 25650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:04,149-Speed 9522.04 samples/sec   Loss 10.1185   LearningRate 0.0852   Epoch: 1   Global Step: 25660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:05,200-Speed 9752.59 samples/sec   Loss 10.0458   LearningRate 0.0852   Epoch: 1   Global Step: 25670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:06,287-Speed 9423.71 samples/sec   Loss 10.2324   LearningRate 0.0852   Epoch: 1   Global Step: 25680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:07,397-Speed 9228.92 samples/sec   Loss 10.0508   LearningRate 0.0852   Epoch: 1   Global Step: 25690   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:34:08,483-Speed 9434.52 samples/sec   Loss 10.2287   LearningRate 0.0852   Epoch: 1   Global Step: 25700   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:34:09,639-Speed 8868.78 samples/sec   Loss 10.0152   LearningRate 0.0852   Epoch: 1   Global Step: 25710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:10,735-Speed 9342.62 samples/sec   Loss 10.2103   LearningRate 0.0852   Epoch: 1   Global Step: 25720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:11,853-Speed 9170.18 samples/sec   Loss 10.2013   LearningRate 0.0852   Epoch: 1   Global Step: 25730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:12,936-Speed 9460.33 samples/sec   Loss 10.1466   LearningRate 0.0852   Epoch: 1   Global Step: 25740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:14,047-Speed 9217.61 samples/sec   Loss 10.0597   LearningRate 0.0852   Epoch: 1   Global Step: 25750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:15,146-Speed 9328.60 samples/sec   Loss 10.1323   LearningRate 0.0852   Epoch: 1   Global Step: 25760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:16,202-Speed 9699.69 samples/sec   Loss 9.9995   LearningRate 0.0852   Epoch: 1   Global Step: 25770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:17,280-Speed 9507.65 samples/sec   Loss 10.1769   LearningRate 0.0852   Epoch: 1   Global Step: 25780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:18,368-Speed 9417.04 samples/sec   Loss 10.1072   LearningRate 0.0851   Epoch: 1   Global Step: 25790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:19,448-Speed 9485.13 samples/sec   Loss 10.1634   LearningRate 0.0851   Epoch: 1   Global Step: 25800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:20,506-Speed 9683.94 samples/sec   Loss 10.1837   LearningRate 0.0851   Epoch: 1   Global Step: 25810   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:34:21,612-Speed 9266.48 samples/sec   Loss 10.0252   LearningRate 0.0851   Epoch: 1   Global Step: 25820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:22,672-Speed 9663.20 samples/sec   Loss 10.0243   LearningRate 0.0851   Epoch: 1   Global Step: 25830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:23,713-Speed 9844.59 samples/sec   Loss 10.1306   LearningRate 0.0851   Epoch: 1   Global Step: 25840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:24,785-Speed 9561.78 samples/sec   Loss 10.0585   LearningRate 0.0851   Epoch: 1   Global Step: 25850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:25,843-Speed 9686.50 samples/sec   Loss 10.1257   LearningRate 0.0851   Epoch: 1   Global Step: 25860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:26,908-Speed 9614.58 samples/sec   Loss 10.0814   LearningRate 0.0851   Epoch: 1   Global Step: 25870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:28,042-Speed 9039.55 samples/sec   Loss 9.9582   LearningRate 0.0851   Epoch: 1   Global Step: 25880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:29,119-Speed 9511.67 samples/sec   Loss 10.1009   LearningRate 0.0851   Epoch: 1   Global Step: 25890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:30,211-Speed 9388.49 samples/sec   Loss 10.1269   LearningRate 0.0851   Epoch: 1   Global Step: 25900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:31,313-Speed 9296.88 samples/sec   Loss 10.0865   LearningRate 0.0851   Epoch: 1   Global Step: 25910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:32,388-Speed 9523.88 samples/sec   Loss 10.1059   LearningRate 0.0851   Epoch: 1   Global Step: 25920   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:34:33,464-Speed 9529.88 samples/sec   Loss 10.1539   LearningRate 0.0851   Epoch: 1   Global Step: 25930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:34,556-Speed 9380.69 samples/sec   Loss 10.0344   LearningRate 0.0851   Epoch: 1   Global Step: 25940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:35,652-Speed 9349.85 samples/sec   Loss 10.0931   LearningRate 0.0851   Epoch: 1   Global Step: 25950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:36,754-Speed 9296.91 samples/sec   Loss 9.9897   LearningRate 0.0851   Epoch: 1   Global Step: 25960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:37,819-Speed 9621.92 samples/sec   Loss 9.9910   LearningRate 0.0850   Epoch: 1   Global Step: 25970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:38,925-Speed 9260.29 samples/sec   Loss 10.1812   LearningRate 0.0850   Epoch: 1   Global Step: 25980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:40,029-Speed 9283.06 samples/sec   Loss 9.9923   LearningRate 0.0850   Epoch: 1   Global Step: 25990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:34:41,120-Speed 9396.35 samples/sec   Loss 10.1747   LearningRate 0.0850   Epoch: 1   Global Step: 26000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:35:03,182-[lfw][26000]XNorm: 14.049655
Training: 2022-04-11 12:35:03,182-[lfw][26000]Accuracy-Flip: 0.99317+-0.00302
Training: 2022-04-11 12:35:03,183-[lfw][26000]Accuracy-Highest: 0.99383
Training: 2022-04-11 12:35:28,655-[cfp_fp][26000]XNorm: 11.619706
Training: 2022-04-11 12:35:28,655-[cfp_fp][26000]Accuracy-Flip: 0.93614+-0.01509
Training: 2022-04-11 12:35:28,656-[cfp_fp][26000]Accuracy-Highest: 0.93614
Training: 2022-04-11 12:35:50,672-[agedb_30][26000]XNorm: 13.422707
Training: 2022-04-11 12:35:50,673-[agedb_30][26000]Accuracy-Flip: 0.94400+-0.00892
Training: 2022-04-11 12:35:50,673-[agedb_30][26000]Accuracy-Highest: 0.94400
Training: 2022-04-11 12:35:51,795-Speed 144.89 samples/sec   Loss 10.0688   LearningRate 0.0850   Epoch: 1   Global Step: 26010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:35:52,848-Speed 9728.87 samples/sec   Loss 9.8276   LearningRate 0.0850   Epoch: 1   Global Step: 26020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:35:53,938-Speed 9404.96 samples/sec   Loss 10.0729   LearningRate 0.0850   Epoch: 1   Global Step: 26030   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:35:55,040-Speed 9292.66 samples/sec   Loss 10.0992   LearningRate 0.0850   Epoch: 1   Global Step: 26040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:35:56,104-Speed 9637.04 samples/sec   Loss 10.1112   LearningRate 0.0850   Epoch: 1   Global Step: 26050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:35:57,188-Speed 9447.12 samples/sec   Loss 10.1035   LearningRate 0.0850   Epoch: 1   Global Step: 26060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:35:58,288-Speed 9318.70 samples/sec   Loss 10.1875   LearningRate 0.0850   Epoch: 1   Global Step: 26070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:35:59,366-Speed 9505.46 samples/sec   Loss 10.0487   LearningRate 0.0850   Epoch: 1   Global Step: 26080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:00,413-Speed 9788.30 samples/sec   Loss 10.1959   LearningRate 0.0850   Epoch: 1   Global Step: 26090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:01,492-Speed 9488.36 samples/sec   Loss 9.9845   LearningRate 0.0850   Epoch: 1   Global Step: 26100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:02,561-Speed 9587.11 samples/sec   Loss 10.0784   LearningRate 0.0850   Epoch: 1   Global Step: 26110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:03,624-Speed 9640.27 samples/sec   Loss 10.0609   LearningRate 0.0850   Epoch: 1   Global Step: 26120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:04,743-Speed 9150.61 samples/sec   Loss 10.0633   LearningRate 0.0850   Epoch: 1   Global Step: 26130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:05,786-Speed 9827.08 samples/sec   Loss 10.1050   LearningRate 0.0850   Epoch: 1   Global Step: 26140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:06,841-Speed 9712.82 samples/sec   Loss 10.1604   LearningRate 0.0849   Epoch: 1   Global Step: 26150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:07,928-Speed 9433.79 samples/sec   Loss 10.2028   LearningRate 0.0849   Epoch: 1   Global Step: 26160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:09,014-Speed 9430.27 samples/sec   Loss 10.0521   LearningRate 0.0849   Epoch: 1   Global Step: 26170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:10,145-Speed 9063.55 samples/sec   Loss 10.1578   LearningRate 0.0849   Epoch: 1   Global Step: 26180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:11,229-Speed 9444.18 samples/sec   Loss 10.0532   LearningRate 0.0849   Epoch: 1   Global Step: 26190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:12,288-Speed 9677.03 samples/sec   Loss 10.0856   LearningRate 0.0849   Epoch: 1   Global Step: 26200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:13,371-Speed 9462.70 samples/sec   Loss 10.0665   LearningRate 0.0849   Epoch: 1   Global Step: 26210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:14,487-Speed 9183.38 samples/sec   Loss 10.0165   LearningRate 0.0849   Epoch: 1   Global Step: 26220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:15,580-Speed 9377.20 samples/sec   Loss 10.0652   LearningRate 0.0849   Epoch: 1   Global Step: 26230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:16,675-Speed 9354.59 samples/sec   Loss 10.0263   LearningRate 0.0849   Epoch: 1   Global Step: 26240   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:17,728-Speed 9730.19 samples/sec   Loss 9.9913   LearningRate 0.0849   Epoch: 1   Global Step: 26250   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:18,856-Speed 9080.91 samples/sec   Loss 10.0916   LearningRate 0.0849   Epoch: 1   Global Step: 26260   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:19,913-Speed 9699.49 samples/sec   Loss 10.0293   LearningRate 0.0849   Epoch: 1   Global Step: 26270   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:21,015-Speed 9291.29 samples/sec   Loss 10.0257   LearningRate 0.0849   Epoch: 1   Global Step: 26280   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:22,111-Speed 9348.43 samples/sec   Loss 10.1668   LearningRate 0.0849   Epoch: 1   Global Step: 26290   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:23,208-Speed 9340.42 samples/sec   Loss 10.0246   LearningRate 0.0849   Epoch: 1   Global Step: 26300   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:24,297-Speed 9406.35 samples/sec   Loss 9.9714   LearningRate 0.0849   Epoch: 1   Global Step: 26310   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:25,405-Speed 9249.97 samples/sec   Loss 9.9603   LearningRate 0.0849   Epoch: 1   Global Step: 26320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:26,504-Speed 9333.14 samples/sec   Loss 10.2102   LearningRate 0.0848   Epoch: 1   Global Step: 26330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:27,592-Speed 9421.16 samples/sec   Loss 9.9052   LearningRate 0.0848   Epoch: 1   Global Step: 26340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:28,714-Speed 9124.46 samples/sec   Loss 9.8990   LearningRate 0.0848   Epoch: 1   Global Step: 26350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:36:29,819-Speed 9278.65 samples/sec   Loss 9.9747   LearningRate 0.0848   Epoch: 1   Global Step: 26360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:36:30,936-Speed 9166.60 samples/sec   Loss 10.0562   LearningRate 0.0848   Epoch: 1   Global Step: 26370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:36:32,006-Speed 9574.37 samples/sec   Loss 9.9547   LearningRate 0.0848   Epoch: 1   Global Step: 26380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:36:33,104-Speed 9335.94 samples/sec   Loss 10.0615   LearningRate 0.0848   Epoch: 1   Global Step: 26390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:36:34,200-Speed 9349.07 samples/sec   Loss 9.9837   LearningRate 0.0848   Epoch: 1   Global Step: 26400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:36:35,286-Speed 9430.30 samples/sec   Loss 10.0306   LearningRate 0.0848   Epoch: 1   Global Step: 26410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:36:36,386-Speed 9311.17 samples/sec   Loss 10.0765   LearningRate 0.0848   Epoch: 1   Global Step: 26420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:36:37,486-Speed 9323.02 samples/sec   Loss 9.9343   LearningRate 0.0848   Epoch: 1   Global Step: 26430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:36:38,597-Speed 9215.09 samples/sec   Loss 10.0799   LearningRate 0.0848   Epoch: 1   Global Step: 26440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:36:39,686-Speed 9412.71 samples/sec   Loss 9.9477   LearningRate 0.0848   Epoch: 1   Global Step: 26450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:40,763-Speed 9513.22 samples/sec   Loss 10.1436   LearningRate 0.0848   Epoch: 1   Global Step: 26460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:41,821-Speed 9687.07 samples/sec   Loss 10.1571   LearningRate 0.0848   Epoch: 1   Global Step: 26470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:42,896-Speed 9530.91 samples/sec   Loss 9.9304   LearningRate 0.0848   Epoch: 1   Global Step: 26480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:43,968-Speed 9556.19 samples/sec   Loss 10.0393   LearningRate 0.0848   Epoch: 1   Global Step: 26490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:45,053-Speed 9444.46 samples/sec   Loss 9.9978   LearningRate 0.0848   Epoch: 1   Global Step: 26500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:46,125-Speed 9562.83 samples/sec   Loss 10.0830   LearningRate 0.0847   Epoch: 1   Global Step: 26510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:47,209-Speed 9446.71 samples/sec   Loss 10.0801   LearningRate 0.0847   Epoch: 1   Global Step: 26520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:48,286-Speed 9520.18 samples/sec   Loss 10.0059   LearningRate 0.0847   Epoch: 1   Global Step: 26530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:49,363-Speed 9516.18 samples/sec   Loss 9.9751   LearningRate 0.0847   Epoch: 1   Global Step: 26540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:50,425-Speed 9648.06 samples/sec   Loss 9.9776   LearningRate 0.0847   Epoch: 1   Global Step: 26550   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:51,529-Speed 9277.27 samples/sec   Loss 9.9661   LearningRate 0.0847   Epoch: 1   Global Step: 26560   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:52,608-Speed 9497.45 samples/sec   Loss 10.0462   LearningRate 0.0847   Epoch: 1   Global Step: 26570   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:53,703-Speed 9353.62 samples/sec   Loss 10.0591   LearningRate 0.0847   Epoch: 1   Global Step: 26580   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:36:54,766-Speed 9647.66 samples/sec   Loss 10.0799   LearningRate 0.0847   Epoch: 1   Global Step: 26590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:55,833-Speed 9607.98 samples/sec   Loss 10.0339   LearningRate 0.0847   Epoch: 1   Global Step: 26600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:56,891-Speed 9691.92 samples/sec   Loss 10.0778   LearningRate 0.0847   Epoch: 1   Global Step: 26610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:57,935-Speed 9810.53 samples/sec   Loss 10.1204   LearningRate 0.0847   Epoch: 1   Global Step: 26620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:36:59,000-Speed 9618.09 samples/sec   Loss 9.9669   LearningRate 0.0847   Epoch: 1   Global Step: 26630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:00,068-Speed 9594.32 samples/sec   Loss 9.9708   LearningRate 0.0847   Epoch: 1   Global Step: 26640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:01,162-Speed 9371.28 samples/sec   Loss 9.9764   LearningRate 0.0847   Epoch: 1   Global Step: 26650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:02,221-Speed 9673.86 samples/sec   Loss 10.0013   LearningRate 0.0847   Epoch: 1   Global Step: 26660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:03,269-Speed 9775.02 samples/sec   Loss 10.1088   LearningRate 0.0847   Epoch: 1   Global Step: 26670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:04,345-Speed 9521.98 samples/sec   Loss 10.0185   LearningRate 0.0847   Epoch: 1   Global Step: 26680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:05,396-Speed 9751.21 samples/sec   Loss 9.9978   LearningRate 0.0846   Epoch: 1   Global Step: 26690   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:37:06,475-Speed 9494.17 samples/sec   Loss 10.0263   LearningRate 0.0846   Epoch: 1   Global Step: 26700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:07,540-Speed 9628.41 samples/sec   Loss 9.9378   LearningRate 0.0846   Epoch: 1   Global Step: 26710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:08,621-Speed 9476.57 samples/sec   Loss 9.9922   LearningRate 0.0846   Epoch: 1   Global Step: 26720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:09,675-Speed 9718.71 samples/sec   Loss 9.9514   LearningRate 0.0846   Epoch: 1   Global Step: 26730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:10,711-Speed 9894.69 samples/sec   Loss 10.0078   LearningRate 0.0846   Epoch: 1   Global Step: 26740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:11,801-Speed 9399.73 samples/sec   Loss 10.0679   LearningRate 0.0846   Epoch: 1   Global Step: 26750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:12,927-Speed 9096.01 samples/sec   Loss 10.0237   LearningRate 0.0846   Epoch: 1   Global Step: 26760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:13,988-Speed 9659.78 samples/sec   Loss 10.0274   LearningRate 0.0846   Epoch: 1   Global Step: 26770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:15,043-Speed 9712.66 samples/sec   Loss 10.0140   LearningRate 0.0846   Epoch: 1   Global Step: 26780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:16,124-Speed 9480.36 samples/sec   Loss 10.1123   LearningRate 0.0846   Epoch: 1   Global Step: 26790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:17,196-Speed 9550.92 samples/sec   Loss 10.1631   LearningRate 0.0846   Epoch: 1   Global Step: 26800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:18,286-Speed 9400.25 samples/sec   Loss 10.1681   LearningRate 0.0846   Epoch: 1   Global Step: 26810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:19,377-Speed 9394.12 samples/sec   Loss 10.0142   LearningRate 0.0846   Epoch: 1   Global Step: 26820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:20,494-Speed 9170.18 samples/sec   Loss 10.0365   LearningRate 0.0846   Epoch: 1   Global Step: 26830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:21,594-Speed 9320.15 samples/sec   Loss 9.8891   LearningRate 0.0846   Epoch: 1   Global Step: 26840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:22,711-Speed 9169.72 samples/sec   Loss 9.9426   LearningRate 0.0846   Epoch: 1   Global Step: 26850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:23,796-Speed 9446.45 samples/sec   Loss 10.0418   LearningRate 0.0846   Epoch: 1   Global Step: 26860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:24,944-Speed 8929.45 samples/sec   Loss 10.0490   LearningRate 0.0845   Epoch: 1   Global Step: 26870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:26,019-Speed 9533.13 samples/sec   Loss 10.0492   LearningRate 0.0845   Epoch: 1   Global Step: 26880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:27,128-Speed 9233.42 samples/sec   Loss 9.9413   LearningRate 0.0845   Epoch: 1   Global Step: 26890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:28,154-Speed 9988.86 samples/sec   Loss 10.0283   LearningRate 0.0845   Epoch: 1   Global Step: 26900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:29,233-Speed 9496.77 samples/sec   Loss 9.8660   LearningRate 0.0845   Epoch: 1   Global Step: 26910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:30,312-Speed 9492.14 samples/sec   Loss 9.8875   LearningRate 0.0845   Epoch: 1   Global Step: 26920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:31,384-Speed 9562.57 samples/sec   Loss 9.9279   LearningRate 0.0845   Epoch: 1   Global Step: 26930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:32,483-Speed 9322.93 samples/sec   Loss 10.1063   LearningRate 0.0845   Epoch: 1   Global Step: 26940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:33,608-Speed 9100.21 samples/sec   Loss 9.9585   LearningRate 0.0845   Epoch: 1   Global Step: 26950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:34,700-Speed 9386.75 samples/sec   Loss 9.9617   LearningRate 0.0845   Epoch: 1   Global Step: 26960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:35,807-Speed 9259.32 samples/sec   Loss 10.0042   LearningRate 0.0845   Epoch: 1   Global Step: 26970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:36,898-Speed 9391.06 samples/sec   Loss 10.0912   LearningRate 0.0845   Epoch: 1   Global Step: 26980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:37,981-Speed 9462.79 samples/sec   Loss 9.9638   LearningRate 0.0845   Epoch: 1   Global Step: 26990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:39,051-Speed 9573.91 samples/sec   Loss 10.0546   LearningRate 0.0845   Epoch: 1   Global Step: 27000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:37:40,107-Speed 9705.46 samples/sec   Loss 9.9517   LearningRate 0.0845   Epoch: 1   Global Step: 27010   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:37:41,218-Speed 9218.72 samples/sec   Loss 9.9245   LearningRate 0.0845   Epoch: 1   Global Step: 27020   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:37:42,299-Speed 9486.92 samples/sec   Loss 10.0468   LearningRate 0.0845   Epoch: 1   Global Step: 27030   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:37:43,390-Speed 9388.15 samples/sec   Loss 9.9209   LearningRate 0.0845   Epoch: 1   Global Step: 27040   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:37:44,467-Speed 9524.99 samples/sec   Loss 9.9395   LearningRate 0.0845   Epoch: 1   Global Step: 27050   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:37:45,561-Speed 9363.85 samples/sec   Loss 10.0654   LearningRate 0.0844   Epoch: 1   Global Step: 27060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:46,664-Speed 9286.13 samples/sec   Loss 10.0459   LearningRate 0.0844   Epoch: 1   Global Step: 27070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:47,737-Speed 9548.75 samples/sec   Loss 9.8473   LearningRate 0.0844   Epoch: 1   Global Step: 27080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:48,856-Speed 9155.39 samples/sec   Loss 9.9951   LearningRate 0.0844   Epoch: 1   Global Step: 27090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:49,927-Speed 9570.20 samples/sec   Loss 9.9768   LearningRate 0.0844   Epoch: 1   Global Step: 27100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:51,028-Speed 9306.24 samples/sec   Loss 9.9888   LearningRate 0.0844   Epoch: 1   Global Step: 27110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:52,096-Speed 9587.97 samples/sec   Loss 9.9965   LearningRate 0.0844   Epoch: 1   Global Step: 27120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:53,227-Speed 9060.97 samples/sec   Loss 10.0133   LearningRate 0.0844   Epoch: 1   Global Step: 27130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:54,299-Speed 9554.45 samples/sec   Loss 10.0079   LearningRate 0.0844   Epoch: 1   Global Step: 27140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:55,369-Speed 9576.82 samples/sec   Loss 9.8037   LearningRate 0.0844   Epoch: 1   Global Step: 27150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:56,449-Speed 9481.87 samples/sec   Loss 9.8827   LearningRate 0.0844   Epoch: 1   Global Step: 27160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:57,536-Speed 9438.82 samples/sec   Loss 10.1171   LearningRate 0.0844   Epoch: 1   Global Step: 27170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:37:58,582-Speed 9787.68 samples/sec   Loss 9.9871   LearningRate 0.0844   Epoch: 1   Global Step: 27180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:37:59,687-Speed 9273.10 samples/sec   Loss 9.8889   LearningRate 0.0844   Epoch: 1   Global Step: 27190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:00,762-Speed 9538.15 samples/sec   Loss 9.9999   LearningRate 0.0844   Epoch: 1   Global Step: 27200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:01,842-Speed 9486.95 samples/sec   Loss 9.8751   LearningRate 0.0844   Epoch: 1   Global Step: 27210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:02,946-Speed 9279.68 samples/sec   Loss 9.8742   LearningRate 0.0844   Epoch: 1   Global Step: 27220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:04,028-Speed 9469.90 samples/sec   Loss 10.0553   LearningRate 0.0844   Epoch: 1   Global Step: 27230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:05,190-Speed 8819.30 samples/sec   Loss 10.0157   LearningRate 0.0843   Epoch: 1   Global Step: 27240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:06,304-Speed 9194.36 samples/sec   Loss 9.8818   LearningRate 0.0843   Epoch: 1   Global Step: 27250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:07,379-Speed 9533.85 samples/sec   Loss 10.0201   LearningRate 0.0843   Epoch: 1   Global Step: 27260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:08,455-Speed 9524.70 samples/sec   Loss 9.9848   LearningRate 0.0843   Epoch: 1   Global Step: 27270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:09,524-Speed 9583.97 samples/sec   Loss 10.0171   LearningRate 0.0843   Epoch: 1   Global Step: 27280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:10,586-Speed 9644.09 samples/sec   Loss 9.9775   LearningRate 0.0843   Epoch: 1   Global Step: 27290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:11,674-Speed 9417.06 samples/sec   Loss 9.8774   LearningRate 0.0843   Epoch: 1   Global Step: 27300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:12,787-Speed 9209.06 samples/sec   Loss 10.0837   LearningRate 0.0843   Epoch: 1   Global Step: 27310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:13,853-Speed 9612.89 samples/sec   Loss 9.9041   LearningRate 0.0843   Epoch: 1   Global Step: 27320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:14,963-Speed 9233.79 samples/sec   Loss 9.9356   LearningRate 0.0843   Epoch: 1   Global Step: 27330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:16,036-Speed 9550.46 samples/sec   Loss 9.9079   LearningRate 0.0843   Epoch: 1   Global Step: 27340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:17,096-Speed 9659.15 samples/sec   Loss 9.9574   LearningRate 0.0843   Epoch: 1   Global Step: 27350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:18,192-Speed 9350.65 samples/sec   Loss 10.0757   LearningRate 0.0843   Epoch: 1   Global Step: 27360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:19,283-Speed 9392.79 samples/sec   Loss 10.0034   LearningRate 0.0843   Epoch: 1   Global Step: 27370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:20,393-Speed 9231.39 samples/sec   Loss 9.9204   LearningRate 0.0843   Epoch: 1   Global Step: 27380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:21,493-Speed 9314.21 samples/sec   Loss 9.8857   LearningRate 0.0843   Epoch: 1   Global Step: 27390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:22,593-Speed 9323.59 samples/sec   Loss 9.8660   LearningRate 0.0843   Epoch: 1   Global Step: 27400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:23,708-Speed 9187.46 samples/sec   Loss 9.8000   LearningRate 0.0843   Epoch: 1   Global Step: 27410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:38:24,794-Speed 9427.55 samples/sec   Loss 9.9048   LearningRate 0.0842   Epoch: 1   Global Step: 27420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:25,858-Speed 9630.86 samples/sec   Loss 9.9282   LearningRate 0.0842   Epoch: 1   Global Step: 27430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:26,933-Speed 9535.89 samples/sec   Loss 10.1023   LearningRate 0.0842   Epoch: 1   Global Step: 27440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:28,017-Speed 9458.10 samples/sec   Loss 9.9186   LearningRate 0.0842   Epoch: 1   Global Step: 27450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:29,099-Speed 9464.68 samples/sec   Loss 9.9907   LearningRate 0.0842   Epoch: 1   Global Step: 27460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:30,219-Speed 9146.36 samples/sec   Loss 9.8790   LearningRate 0.0842   Epoch: 1   Global Step: 27470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:31,312-Speed 9374.19 samples/sec   Loss 9.9777   LearningRate 0.0842   Epoch: 1   Global Step: 27480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:32,406-Speed 9363.91 samples/sec   Loss 10.0124   LearningRate 0.0842   Epoch: 1   Global Step: 27490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:33,503-Speed 9345.98 samples/sec   Loss 9.9034   LearningRate 0.0842   Epoch: 1   Global Step: 27500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:34,570-Speed 9600.79 samples/sec   Loss 9.9298   LearningRate 0.0842   Epoch: 1   Global Step: 27510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:35,683-Speed 9205.04 samples/sec   Loss 9.9015   LearningRate 0.0842   Epoch: 1   Global Step: 27520   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:38:36,753-Speed 9577.25 samples/sec   Loss 9.9984   LearningRate 0.0842   Epoch: 1   Global Step: 27530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:37,852-Speed 9329.06 samples/sec   Loss 10.0656   LearningRate 0.0842   Epoch: 1   Global Step: 27540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:38,970-Speed 9165.15 samples/sec   Loss 10.0126   LearningRate 0.0842   Epoch: 1   Global Step: 27550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:40,043-Speed 9551.61 samples/sec   Loss 9.9536   LearningRate 0.0842   Epoch: 1   Global Step: 27560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:41,147-Speed 9279.99 samples/sec   Loss 9.9427   LearningRate 0.0842   Epoch: 1   Global Step: 27570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:42,237-Speed 9398.87 samples/sec   Loss 9.9984   LearningRate 0.0842   Epoch: 1   Global Step: 27580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:43,319-Speed 9469.75 samples/sec   Loss 9.9106   LearningRate 0.0842   Epoch: 1   Global Step: 27590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:44,413-Speed 9367.11 samples/sec   Loss 9.9183   LearningRate 0.0841   Epoch: 1   Global Step: 27600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:45,493-Speed 9488.60 samples/sec   Loss 9.9036   LearningRate 0.0841   Epoch: 1   Global Step: 27610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:46,587-Speed 9362.33 samples/sec   Loss 9.9777   LearningRate 0.0841   Epoch: 1   Global Step: 27620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:47,670-Speed 9465.22 samples/sec   Loss 9.7919   LearningRate 0.0841   Epoch: 1   Global Step: 27630   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:38:48,722-Speed 9739.20 samples/sec   Loss 9.8366   LearningRate 0.0841   Epoch: 1   Global Step: 27640   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:38:49,808-Speed 9432.95 samples/sec   Loss 9.9307   LearningRate 0.0841   Epoch: 1   Global Step: 27650   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:38:50,898-Speed 9396.05 samples/sec   Loss 9.8410   LearningRate 0.0841   Epoch: 1   Global Step: 27660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:51,973-Speed 9533.13 samples/sec   Loss 9.9027   LearningRate 0.0841   Epoch: 1   Global Step: 27670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:53,044-Speed 9570.39 samples/sec   Loss 9.9146   LearningRate 0.0841   Epoch: 1   Global Step: 27680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:54,180-Speed 9017.03 samples/sec   Loss 9.7807   LearningRate 0.0841   Epoch: 1   Global Step: 27690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:55,256-Speed 9520.20 samples/sec   Loss 9.9441   LearningRate 0.0841   Epoch: 1   Global Step: 27700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:56,356-Speed 9316.70 samples/sec   Loss 9.8531   LearningRate 0.0841   Epoch: 1   Global Step: 27710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:57,460-Speed 9282.03 samples/sec   Loss 9.8548   LearningRate 0.0841   Epoch: 1   Global Step: 27720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:58,567-Speed 9262.29 samples/sec   Loss 9.7866   LearningRate 0.0841   Epoch: 1   Global Step: 27730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:38:59,681-Speed 9193.54 samples/sec   Loss 9.8516   LearningRate 0.0841   Epoch: 1   Global Step: 27740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:00,775-Speed 9383.40 samples/sec   Loss 9.8570   LearningRate 0.0841   Epoch: 1   Global Step: 27750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:01,872-Speed 9336.10 samples/sec   Loss 9.9685   LearningRate 0.0841   Epoch: 1   Global Step: 27760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:02,974-Speed 9309.36 samples/sec   Loss 9.9362   LearningRate 0.0841   Epoch: 1   Global Step: 27770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:04,055-Speed 9484.24 samples/sec   Loss 9.8887   LearningRate 0.0840   Epoch: 1   Global Step: 27780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:05,114-Speed 9667.84 samples/sec   Loss 9.9470   LearningRate 0.0840   Epoch: 1   Global Step: 27790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:06,230-Speed 9184.27 samples/sec   Loss 9.8526   LearningRate 0.0840   Epoch: 1   Global Step: 27800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:07,339-Speed 9238.02 samples/sec   Loss 9.8043   LearningRate 0.0840   Epoch: 1   Global Step: 27810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:08,432-Speed 9384.79 samples/sec   Loss 9.8880   LearningRate 0.0840   Epoch: 1   Global Step: 27820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:09,534-Speed 9295.91 samples/sec   Loss 9.7993   LearningRate 0.0840   Epoch: 1   Global Step: 27830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:10,657-Speed 9123.11 samples/sec   Loss 9.7936   LearningRate 0.0840   Epoch: 1   Global Step: 27840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:11,739-Speed 9465.35 samples/sec   Loss 9.9932   LearningRate 0.0840   Epoch: 1   Global Step: 27850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:12,847-Speed 9250.35 samples/sec   Loss 9.9120   LearningRate 0.0840   Epoch: 1   Global Step: 27860   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:39:13,936-Speed 9410.09 samples/sec   Loss 9.9262   LearningRate 0.0840   Epoch: 1   Global Step: 27870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:15,057-Speed 9137.88 samples/sec   Loss 9.8769   LearningRate 0.0840   Epoch: 1   Global Step: 27880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:16,155-Speed 9338.09 samples/sec   Loss 9.9240   LearningRate 0.0840   Epoch: 1   Global Step: 27890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:17,224-Speed 9584.21 samples/sec   Loss 9.8329   LearningRate 0.0840   Epoch: 1   Global Step: 27900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:18,312-Speed 9414.42 samples/sec   Loss 9.8530   LearningRate 0.0840   Epoch: 1   Global Step: 27910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:19,388-Speed 9523.11 samples/sec   Loss 9.8343   LearningRate 0.0840   Epoch: 1   Global Step: 27920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:20,510-Speed 9137.17 samples/sec   Loss 9.8701   LearningRate 0.0840   Epoch: 1   Global Step: 27930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:21,626-Speed 9180.15 samples/sec   Loss 9.8124   LearningRate 0.0840   Epoch: 1   Global Step: 27940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:22,770-Speed 8956.35 samples/sec   Loss 9.9604   LearningRate 0.0840   Epoch: 1   Global Step: 27950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:23,863-Speed 9367.16 samples/sec   Loss 9.8978   LearningRate 0.0839   Epoch: 1   Global Step: 27960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:24,953-Speed 9405.20 samples/sec   Loss 9.9326   LearningRate 0.0839   Epoch: 1   Global Step: 27970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:26,024-Speed 9561.62 samples/sec   Loss 9.8772   LearningRate 0.0839   Epoch: 1   Global Step: 27980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:39:27,118-Speed 9370.26 samples/sec   Loss 9.9454   LearningRate 0.0839   Epoch: 1   Global Step: 27990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:39:28,229-Speed 9221.30 samples/sec   Loss 9.7739   LearningRate 0.0839   Epoch: 1   Global Step: 28000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:39:50,209-[lfw][28000]XNorm: 13.534033
Training: 2022-04-11 12:39:50,210-[lfw][28000]Accuracy-Flip: 0.99450+-0.00299
Training: 2022-04-11 12:39:50,211-[lfw][28000]Accuracy-Highest: 0.99450
Training: 2022-04-11 12:40:15,659-[cfp_fp][28000]XNorm: 11.307848
Training: 2022-04-11 12:40:15,660-[cfp_fp][28000]Accuracy-Flip: 0.93257+-0.01217
Training: 2022-04-11 12:40:15,660-[cfp_fp][28000]Accuracy-Highest: 0.93614
Training: 2022-04-11 12:40:37,543-[agedb_30][28000]XNorm: 13.094939
Training: 2022-04-11 12:40:37,544-[agedb_30][28000]Accuracy-Flip: 0.94133+-0.01273
Training: 2022-04-11 12:40:37,544-[agedb_30][28000]Accuracy-Highest: 0.94400
Training: 2022-04-11 12:40:38,618-Speed 145.48 samples/sec   Loss 9.9535   LearningRate 0.0839   Epoch: 1   Global Step: 28010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:40:39,687-Speed 9584.55 samples/sec   Loss 9.8609   LearningRate 0.0839   Epoch: 1   Global Step: 28020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:40:40,734-Speed 9785.39 samples/sec   Loss 9.8625   LearningRate 0.0839   Epoch: 1   Global Step: 28030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:40:41,818-Speed 9450.62 samples/sec   Loss 9.8539   LearningRate 0.0839   Epoch: 1   Global Step: 28040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:40:42,869-Speed 9745.79 samples/sec   Loss 9.7582   LearningRate 0.0839   Epoch: 1   Global Step: 28050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:40:43,915-Speed 9804.05 samples/sec   Loss 9.8632   LearningRate 0.0839   Epoch: 1   Global Step: 28060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:40:44,964-Speed 9766.87 samples/sec   Loss 9.7644   LearningRate 0.0839   Epoch: 1   Global Step: 28070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:40:46,045-Speed 9480.48 samples/sec   Loss 9.7484   LearningRate 0.0839   Epoch: 1   Global Step: 28080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:40:47,172-Speed 9086.41 samples/sec   Loss 9.8030   LearningRate 0.0839   Epoch: 1   Global Step: 28090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:48,270-Speed 9332.01 samples/sec   Loss 9.9332   LearningRate 0.0839   Epoch: 1   Global Step: 28100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:49,342-Speed 9560.92 samples/sec   Loss 9.8449   LearningRate 0.0839   Epoch: 1   Global Step: 28110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:50,444-Speed 9297.45 samples/sec   Loss 9.8789   LearningRate 0.0839   Epoch: 1   Global Step: 28120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:51,507-Speed 9635.25 samples/sec   Loss 9.8481   LearningRate 0.0839   Epoch: 1   Global Step: 28130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:52,632-Speed 9109.93 samples/sec   Loss 9.9771   LearningRate 0.0839   Epoch: 1   Global Step: 28140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:53,768-Speed 9017.49 samples/sec   Loss 9.7107   LearningRate 0.0838   Epoch: 1   Global Step: 28150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:54,850-Speed 9469.02 samples/sec   Loss 9.7940   LearningRate 0.0838   Epoch: 1   Global Step: 28160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:55,906-Speed 9702.64 samples/sec   Loss 9.8586   LearningRate 0.0838   Epoch: 1   Global Step: 28170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:56,977-Speed 9571.03 samples/sec   Loss 9.8399   LearningRate 0.0838   Epoch: 1   Global Step: 28180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:58,070-Speed 9370.52 samples/sec   Loss 9.8361   LearningRate 0.0838   Epoch: 1   Global Step: 28190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:40:59,179-Speed 9238.44 samples/sec   Loss 9.8645   LearningRate 0.0838   Epoch: 1   Global Step: 28200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:00,316-Speed 9015.30 samples/sec   Loss 9.7977   LearningRate 0.0838   Epoch: 1   Global Step: 28210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:01,423-Speed 9250.11 samples/sec   Loss 9.8031   LearningRate 0.0838   Epoch: 1   Global Step: 28220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:02,501-Speed 9509.27 samples/sec   Loss 9.7855   LearningRate 0.0838   Epoch: 1   Global Step: 28230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:03,680-Speed 8687.49 samples/sec   Loss 9.9462   LearningRate 0.0838   Epoch: 1   Global Step: 28240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:04,776-Speed 9349.88 samples/sec   Loss 9.8401   LearningRate 0.0838   Epoch: 1   Global Step: 28250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:05,854-Speed 9507.07 samples/sec   Loss 9.7712   LearningRate 0.0838   Epoch: 1   Global Step: 28260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:06,958-Speed 9286.11 samples/sec   Loss 9.8307   LearningRate 0.0838   Epoch: 1   Global Step: 28270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:08,055-Speed 9337.68 samples/sec   Loss 9.8938   LearningRate 0.0838   Epoch: 1   Global Step: 28280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:09,142-Speed 9425.36 samples/sec   Loss 9.8761   LearningRate 0.0838   Epoch: 1   Global Step: 28290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:10,246-Speed 9287.04 samples/sec   Loss 9.9945   LearningRate 0.0838   Epoch: 1   Global Step: 28300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:11,307-Speed 9656.50 samples/sec   Loss 10.0122   LearningRate 0.0838   Epoch: 1   Global Step: 28310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:12,411-Speed 9272.05 samples/sec   Loss 10.0450   LearningRate 0.0838   Epoch: 1   Global Step: 28320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:13,476-Speed 9624.71 samples/sec   Loss 9.8663   LearningRate 0.0837   Epoch: 1   Global Step: 28330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:14,587-Speed 9227.51 samples/sec   Loss 9.7223   LearningRate 0.0837   Epoch: 1   Global Step: 28340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:16,582-Speed 5133.30 samples/sec   Loss 9.7920   LearningRate 0.0837   Epoch: 1   Global Step: 28350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:17,685-Speed 9286.86 samples/sec   Loss 9.8426   LearningRate 0.0837   Epoch: 1   Global Step: 28360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:18,769-Speed 9455.93 samples/sec   Loss 9.8703   LearningRate 0.0837   Epoch: 1   Global Step: 28370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:19,874-Speed 9268.96 samples/sec   Loss 9.7349   LearningRate 0.0837   Epoch: 1   Global Step: 28380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:20,929-Speed 9717.51 samples/sec   Loss 9.8571   LearningRate 0.0837   Epoch: 1   Global Step: 28390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:21,990-Speed 9649.13 samples/sec   Loss 9.8200   LearningRate 0.0837   Epoch: 1   Global Step: 28400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:23,027-Speed 9881.83 samples/sec   Loss 9.8949   LearningRate 0.0837   Epoch: 1   Global Step: 28410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:24,125-Speed 9335.62 samples/sec   Loss 9.7652   LearningRate 0.0837   Epoch: 1   Global Step: 28420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:25,162-Speed 9873.62 samples/sec   Loss 9.9524   LearningRate 0.0837   Epoch: 1   Global Step: 28430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:26,223-Speed 9660.33 samples/sec   Loss 9.7685   LearningRate 0.0837   Epoch: 1   Global Step: 28440   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:41:27,312-Speed 9411.49 samples/sec   Loss 9.8903   LearningRate 0.0837   Epoch: 1   Global Step: 28450   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:41:28,388-Speed 9519.46 samples/sec   Loss 9.7405   LearningRate 0.0837   Epoch: 1   Global Step: 28460   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:41:29,461-Speed 9548.61 samples/sec   Loss 9.8726   LearningRate 0.0837   Epoch: 1   Global Step: 28470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:30,546-Speed 9440.93 samples/sec   Loss 9.7551   LearningRate 0.0837   Epoch: 1   Global Step: 28480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:31,600-Speed 9722.58 samples/sec   Loss 9.7869   LearningRate 0.0837   Epoch: 1   Global Step: 28490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:32,681-Speed 9482.56 samples/sec   Loss 9.6997   LearningRate 0.0837   Epoch: 1   Global Step: 28500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:33,808-Speed 9086.36 samples/sec   Loss 9.8193   LearningRate 0.0836   Epoch: 1   Global Step: 28510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:34,883-Speed 9530.67 samples/sec   Loss 9.7777   LearningRate 0.0836   Epoch: 1   Global Step: 28520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:36,029-Speed 8943.06 samples/sec   Loss 9.8635   LearningRate 0.0836   Epoch: 1   Global Step: 28530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:37,156-Speed 9092.14 samples/sec   Loss 9.7635   LearningRate 0.0836   Epoch: 1   Global Step: 28540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:38,254-Speed 9338.06 samples/sec   Loss 9.9154   LearningRate 0.0836   Epoch: 1   Global Step: 28550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:39,382-Speed 9082.09 samples/sec   Loss 9.8291   LearningRate 0.0836   Epoch: 1   Global Step: 28560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:40,495-Speed 9203.71 samples/sec   Loss 9.8292   LearningRate 0.0836   Epoch: 1   Global Step: 28570   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:41:41,567-Speed 9562.25 samples/sec   Loss 9.7328   LearningRate 0.0836   Epoch: 1   Global Step: 28580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:42,659-Speed 9379.69 samples/sec   Loss 9.6902   LearningRate 0.0836   Epoch: 1   Global Step: 28590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:43,732-Speed 9545.05 samples/sec   Loss 9.8115   LearningRate 0.0836   Epoch: 1   Global Step: 28600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:44,828-Speed 9361.55 samples/sec   Loss 9.8403   LearningRate 0.0836   Epoch: 1   Global Step: 28610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:45,878-Speed 9756.13 samples/sec   Loss 9.8243   LearningRate 0.0836   Epoch: 1   Global Step: 28620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:46,993-Speed 9190.98 samples/sec   Loss 9.8158   LearningRate 0.0836   Epoch: 1   Global Step: 28630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:48,111-Speed 9168.18 samples/sec   Loss 9.9877   LearningRate 0.0836   Epoch: 1   Global Step: 28640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:49,226-Speed 9187.91 samples/sec   Loss 9.8399   LearningRate 0.0836   Epoch: 1   Global Step: 28650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:50,285-Speed 9673.07 samples/sec   Loss 9.8902   LearningRate 0.0836   Epoch: 1   Global Step: 28660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:51,387-Speed 9292.82 samples/sec   Loss 9.7609   LearningRate 0.0836   Epoch: 1   Global Step: 28670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:52,458-Speed 9574.09 samples/sec   Loss 9.7296   LearningRate 0.0836   Epoch: 1   Global Step: 28680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:53,551-Speed 9376.23 samples/sec   Loss 9.7665   LearningRate 0.0835   Epoch: 1   Global Step: 28690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:54,637-Speed 9435.98 samples/sec   Loss 9.7391   LearningRate 0.0835   Epoch: 1   Global Step: 28700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:55,755-Speed 9165.28 samples/sec   Loss 9.8400   LearningRate 0.0835   Epoch: 1   Global Step: 28710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:56,849-Speed 9370.35 samples/sec   Loss 9.7430   LearningRate 0.0835   Epoch: 1   Global Step: 28720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:41:57,895-Speed 9794.10 samples/sec   Loss 9.8481   LearningRate 0.0835   Epoch: 1   Global Step: 28730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:41:58,968-Speed 9547.70 samples/sec   Loss 9.8500   LearningRate 0.0835   Epoch: 1   Global Step: 28740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:00,062-Speed 9369.27 samples/sec   Loss 9.7217   LearningRate 0.0835   Epoch: 1   Global Step: 28750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:01,131-Speed 9583.60 samples/sec   Loss 9.8039   LearningRate 0.0835   Epoch: 1   Global Step: 28760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:02,218-Speed 9426.02 samples/sec   Loss 9.7475   LearningRate 0.0835   Epoch: 1   Global Step: 28770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:03,300-Speed 9461.34 samples/sec   Loss 9.7692   LearningRate 0.0835   Epoch: 1   Global Step: 28780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:04,394-Speed 9381.20 samples/sec   Loss 9.8589   LearningRate 0.0835   Epoch: 1   Global Step: 28790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:05,497-Speed 9286.81 samples/sec   Loss 9.7850   LearningRate 0.0835   Epoch: 1   Global Step: 28800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:06,588-Speed 9391.21 samples/sec   Loss 9.7573   LearningRate 0.0835   Epoch: 1   Global Step: 28810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:07,680-Speed 9381.45 samples/sec   Loss 9.7311   LearningRate 0.0835   Epoch: 1   Global Step: 28820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:08,794-Speed 9199.78 samples/sec   Loss 9.7634   LearningRate 0.0835   Epoch: 1   Global Step: 28830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:09,866-Speed 9559.34 samples/sec   Loss 9.7618   LearningRate 0.0835   Epoch: 1   Global Step: 28840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:10,951-Speed 9445.04 samples/sec   Loss 9.7612   LearningRate 0.0835   Epoch: 1   Global Step: 28850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:12,044-Speed 9374.60 samples/sec   Loss 9.7169   LearningRate 0.0835   Epoch: 1   Global Step: 28860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:13,120-Speed 9520.99 samples/sec   Loss 9.8200   LearningRate 0.0835   Epoch: 1   Global Step: 28870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:14,211-Speed 9393.16 samples/sec   Loss 9.8483   LearningRate 0.0834   Epoch: 1   Global Step: 28880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:15,311-Speed 9322.03 samples/sec   Loss 9.9056   LearningRate 0.0834   Epoch: 1   Global Step: 28890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:16,394-Speed 9468.80 samples/sec   Loss 9.7909   LearningRate 0.0834   Epoch: 1   Global Step: 28900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:17,463-Speed 9584.30 samples/sec   Loss 9.7840   LearningRate 0.0834   Epoch: 1   Global Step: 28910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:18,572-Speed 9244.16 samples/sec   Loss 9.7927   LearningRate 0.0834   Epoch: 1   Global Step: 28920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:19,649-Speed 9510.48 samples/sec   Loss 9.7789   LearningRate 0.0834   Epoch: 1   Global Step: 28930   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:42:20,749-Speed 9320.82 samples/sec   Loss 9.7503   LearningRate 0.0834   Epoch: 1   Global Step: 28940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:21,854-Speed 9265.81 samples/sec   Loss 9.8231   LearningRate 0.0834   Epoch: 1   Global Step: 28950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:22,911-Speed 9700.17 samples/sec   Loss 9.8450   LearningRate 0.0834   Epoch: 1   Global Step: 28960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:24,027-Speed 9179.94 samples/sec   Loss 9.5556   LearningRate 0.0834   Epoch: 1   Global Step: 28970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:25,099-Speed 9559.14 samples/sec   Loss 9.8143   LearningRate 0.0834   Epoch: 1   Global Step: 28980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:26,149-Speed 9752.13 samples/sec   Loss 9.7728   LearningRate 0.0834   Epoch: 1   Global Step: 28990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:27,219-Speed 9575.67 samples/sec   Loss 9.6987   LearningRate 0.0834   Epoch: 1   Global Step: 29000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:28,293-Speed 9546.75 samples/sec   Loss 9.8010   LearningRate 0.0834   Epoch: 1   Global Step: 29010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:29,359-Speed 9614.40 samples/sec   Loss 9.7725   LearningRate 0.0834   Epoch: 1   Global Step: 29020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:30,428-Speed 9578.72 samples/sec   Loss 9.7744   LearningRate 0.0834   Epoch: 1   Global Step: 29030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:31,533-Speed 9271.99 samples/sec   Loss 9.9010   LearningRate 0.0834   Epoch: 1   Global Step: 29040   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:42:32,629-Speed 9351.61 samples/sec   Loss 9.8040   LearningRate 0.0834   Epoch: 1   Global Step: 29050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:33,749-Speed 9154.07 samples/sec   Loss 9.7437   LearningRate 0.0833   Epoch: 1   Global Step: 29060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:34,834-Speed 9438.05 samples/sec   Loss 9.7267   LearningRate 0.0833   Epoch: 1   Global Step: 29070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:35,900-Speed 9617.74 samples/sec   Loss 9.9015   LearningRate 0.0833   Epoch: 1   Global Step: 29080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:36,945-Speed 9800.36 samples/sec   Loss 9.6954   LearningRate 0.0833   Epoch: 1   Global Step: 29090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:38,038-Speed 9382.23 samples/sec   Loss 9.7252   LearningRate 0.0833   Epoch: 1   Global Step: 29100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:39,150-Speed 9214.37 samples/sec   Loss 9.7184   LearningRate 0.0833   Epoch: 1   Global Step: 29110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:40,199-Speed 9762.48 samples/sec   Loss 9.7969   LearningRate 0.0833   Epoch: 1   Global Step: 29120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:41,280-Speed 9477.66 samples/sec   Loss 9.6626   LearningRate 0.0833   Epoch: 1   Global Step: 29130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:42,346-Speed 9613.43 samples/sec   Loss 9.7126   LearningRate 0.0833   Epoch: 1   Global Step: 29140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:43,448-Speed 9302.15 samples/sec   Loss 9.7062   LearningRate 0.0833   Epoch: 1   Global Step: 29150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:44,594-Speed 8947.12 samples/sec   Loss 9.7326   LearningRate 0.0833   Epoch: 1   Global Step: 29160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:45,639-Speed 9808.83 samples/sec   Loss 9.7612   LearningRate 0.0833   Epoch: 1   Global Step: 29170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:46,705-Speed 9607.75 samples/sec   Loss 9.7020   LearningRate 0.0833   Epoch: 1   Global Step: 29180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:47,801-Speed 9343.46 samples/sec   Loss 9.7697   LearningRate 0.0833   Epoch: 1   Global Step: 29190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:48,973-Speed 8748.74 samples/sec   Loss 9.8007   LearningRate 0.0833   Epoch: 1   Global Step: 29200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:50,111-Speed 9006.36 samples/sec   Loss 9.8041   LearningRate 0.0833   Epoch: 1   Global Step: 29210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:51,217-Speed 9265.25 samples/sec   Loss 9.7313   LearningRate 0.0833   Epoch: 1   Global Step: 29220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:52,320-Speed 9285.89 samples/sec   Loss 9.7267   LearningRate 0.0833   Epoch: 1   Global Step: 29230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:42:53,376-Speed 9711.17 samples/sec   Loss 9.8131   LearningRate 0.0832   Epoch: 1   Global Step: 29240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:54,476-Speed 9311.84 samples/sec   Loss 9.7691   LearningRate 0.0832   Epoch: 1   Global Step: 29250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:55,588-Speed 9213.48 samples/sec   Loss 9.7768   LearningRate 0.0832   Epoch: 1   Global Step: 29260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:56,647-Speed 9672.84 samples/sec   Loss 9.8007   LearningRate 0.0832   Epoch: 1   Global Step: 29270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:57,742-Speed 9355.82 samples/sec   Loss 9.7375   LearningRate 0.0832   Epoch: 1   Global Step: 29280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:58,903-Speed 8821.91 samples/sec   Loss 9.7665   LearningRate 0.0832   Epoch: 1   Global Step: 29290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:42:59,999-Speed 9355.95 samples/sec   Loss 9.8489   LearningRate 0.0832   Epoch: 1   Global Step: 29300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:43:01,045-Speed 9799.90 samples/sec   Loss 9.8709   LearningRate 0.0832   Epoch: 1   Global Step: 29310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:43:02,136-Speed 9386.71 samples/sec   Loss 9.8000   LearningRate 0.0832   Epoch: 1   Global Step: 29320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:43:03,211-Speed 9536.28 samples/sec   Loss 9.7903   LearningRate 0.0832   Epoch: 1   Global Step: 29330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:43:04,276-Speed 9622.47 samples/sec   Loss 9.6465   LearningRate 0.0832   Epoch: 1   Global Step: 29340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:05,311-Speed 9898.42 samples/sec   Loss 9.8435   LearningRate 0.0832   Epoch: 1   Global Step: 29350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:06,401-Speed 9397.87 samples/sec   Loss 9.7817   LearningRate 0.0832   Epoch: 1   Global Step: 29360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:07,485-Speed 9451.13 samples/sec   Loss 9.8567   LearningRate 0.0832   Epoch: 1   Global Step: 29370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:08,602-Speed 9175.47 samples/sec   Loss 9.6748   LearningRate 0.0832   Epoch: 1   Global Step: 29380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:09,681-Speed 9493.72 samples/sec   Loss 9.7517   LearningRate 0.0832   Epoch: 1   Global Step: 29390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:10,757-Speed 9518.69 samples/sec   Loss 9.7257   LearningRate 0.0832   Epoch: 1   Global Step: 29400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:11,860-Speed 9292.92 samples/sec   Loss 9.7827   LearningRate 0.0832   Epoch: 1   Global Step: 29410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:12,922-Speed 9646.49 samples/sec   Loss 9.5943   LearningRate 0.0832   Epoch: 1   Global Step: 29420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:14,006-Speed 9453.94 samples/sec   Loss 9.6260   LearningRate 0.0831   Epoch: 1   Global Step: 29430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:15,097-Speed 9394.56 samples/sec   Loss 9.7436   LearningRate 0.0831   Epoch: 1   Global Step: 29440   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:16,198-Speed 9306.73 samples/sec   Loss 9.6777   LearningRate 0.0831   Epoch: 1   Global Step: 29450   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:17,323-Speed 9102.02 samples/sec   Loss 9.7604   LearningRate 0.0831   Epoch: 1   Global Step: 29460   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:18,399-Speed 9525.79 samples/sec   Loss 9.8156   LearningRate 0.0831   Epoch: 1   Global Step: 29470   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:19,467-Speed 9597.29 samples/sec   Loss 9.7528   LearningRate 0.0831   Epoch: 1   Global Step: 29480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:20,511-Speed 9813.26 samples/sec   Loss 9.8298   LearningRate 0.0831   Epoch: 1   Global Step: 29490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:21,566-Speed 9710.73 samples/sec   Loss 9.7847   LearningRate 0.0831   Epoch: 1   Global Step: 29500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:22,643-Speed 9520.70 samples/sec   Loss 9.6704   LearningRate 0.0831   Epoch: 1   Global Step: 29510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:23,726-Speed 9457.43 samples/sec   Loss 9.7301   LearningRate 0.0831   Epoch: 1   Global Step: 29520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:24,822-Speed 9349.32 samples/sec   Loss 9.7409   LearningRate 0.0831   Epoch: 1   Global Step: 29530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:25,919-Speed 9333.52 samples/sec   Loss 9.6738   LearningRate 0.0831   Epoch: 1   Global Step: 29540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:27,057-Speed 9008.84 samples/sec   Loss 9.7943   LearningRate 0.0831   Epoch: 1   Global Step: 29550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:28,172-Speed 9187.95 samples/sec   Loss 9.7315   LearningRate 0.0831   Epoch: 1   Global Step: 29560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:29,273-Speed 9305.72 samples/sec   Loss 9.8428   LearningRate 0.0831   Epoch: 1   Global Step: 29570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:30,354-Speed 9477.35 samples/sec   Loss 9.6967   LearningRate 0.0831   Epoch: 1   Global Step: 29580   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:31,441-Speed 9424.27 samples/sec   Loss 9.6767   LearningRate 0.0831   Epoch: 1   Global Step: 29590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:32,539-Speed 9333.13 samples/sec   Loss 9.6270   LearningRate 0.0831   Epoch: 1   Global Step: 29600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:33,617-Speed 9505.99 samples/sec   Loss 9.6554   LearningRate 0.0830   Epoch: 1   Global Step: 29610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:34,679-Speed 9650.09 samples/sec   Loss 9.8603   LearningRate 0.0830   Epoch: 1   Global Step: 29620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:35,748-Speed 9586.89 samples/sec   Loss 9.8332   LearningRate 0.0830   Epoch: 1   Global Step: 29630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:36,879-Speed 9057.50 samples/sec   Loss 9.8348   LearningRate 0.0830   Epoch: 1   Global Step: 29640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:37,989-Speed 9234.32 samples/sec   Loss 9.6102   LearningRate 0.0830   Epoch: 1   Global Step: 29650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:39,070-Speed 9474.33 samples/sec   Loss 9.7486   LearningRate 0.0830   Epoch: 1   Global Step: 29660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:40,143-Speed 9544.93 samples/sec   Loss 9.6035   LearningRate 0.0830   Epoch: 1   Global Step: 29670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:41,226-Speed 9464.67 samples/sec   Loss 9.6308   LearningRate 0.0830   Epoch: 1   Global Step: 29680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:42,344-Speed 9161.80 samples/sec   Loss 9.6572   LearningRate 0.0830   Epoch: 1   Global Step: 29690   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:43,458-Speed 9197.82 samples/sec   Loss 9.7068   LearningRate 0.0830   Epoch: 1   Global Step: 29700   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:44,545-Speed 9431.60 samples/sec   Loss 9.6981   LearningRate 0.0830   Epoch: 1   Global Step: 29710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:45,635-Speed 9394.96 samples/sec   Loss 9.7448   LearningRate 0.0830   Epoch: 1   Global Step: 29720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:46,728-Speed 9377.79 samples/sec   Loss 9.7888   LearningRate 0.0830   Epoch: 1   Global Step: 29730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:47,822-Speed 9369.62 samples/sec   Loss 9.6536   LearningRate 0.0830   Epoch: 1   Global Step: 29740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:48,929-Speed 9254.34 samples/sec   Loss 9.6750   LearningRate 0.0830   Epoch: 1   Global Step: 29750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:50,027-Speed 9355.37 samples/sec   Loss 9.6684   LearningRate 0.0830   Epoch: 1   Global Step: 29760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:51,094-Speed 9602.05 samples/sec   Loss 9.7117   LearningRate 0.0830   Epoch: 1   Global Step: 29770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:52,178-Speed 9450.19 samples/sec   Loss 9.6775   LearningRate 0.0830   Epoch: 1   Global Step: 29780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:53,316-Speed 8998.59 samples/sec   Loss 9.8390   LearningRate 0.0829   Epoch: 1   Global Step: 29790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:54,407-Speed 9396.88 samples/sec   Loss 9.6539   LearningRate 0.0829   Epoch: 1   Global Step: 29800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:43:55,458-Speed 9745.63 samples/sec   Loss 9.7128   LearningRate 0.0829   Epoch: 1   Global Step: 29810   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:56,550-Speed 9387.08 samples/sec   Loss 9.6710   LearningRate 0.0829   Epoch: 1   Global Step: 29820   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:57,655-Speed 9268.70 samples/sec   Loss 9.6924   LearningRate 0.0829   Epoch: 1   Global Step: 29830   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:58,786-Speed 9057.81 samples/sec   Loss 9.6304   LearningRate 0.0829   Epoch: 1   Global Step: 29840   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:43:59,916-Speed 9069.66 samples/sec   Loss 9.7562   LearningRate 0.0829   Epoch: 1   Global Step: 29850   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:44:00,985-Speed 9582.57 samples/sec   Loss 9.6394   LearningRate 0.0829   Epoch: 1   Global Step: 29860   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:44:02,066-Speed 9481.67 samples/sec   Loss 9.8580   LearningRate 0.0829   Epoch: 1   Global Step: 29870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:44:03,143-Speed 9511.79 samples/sec   Loss 9.7132   LearningRate 0.0829   Epoch: 1   Global Step: 29880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:44:04,212-Speed 9588.08 samples/sec   Loss 9.7023   LearningRate 0.0829   Epoch: 1   Global Step: 29890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:44:05,265-Speed 9729.61 samples/sec   Loss 9.5744   LearningRate 0.0829   Epoch: 1   Global Step: 29900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:44:06,350-Speed 9446.05 samples/sec   Loss 9.7133   LearningRate 0.0829   Epoch: 1   Global Step: 29910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:44:07,406-Speed 9699.34 samples/sec   Loss 9.7025   LearningRate 0.0829   Epoch: 1   Global Step: 29920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:44:08,487-Speed 9477.73 samples/sec   Loss 9.6392   LearningRate 0.0829   Epoch: 1   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:44:09,558-Speed 9568.32 samples/sec   Loss 9.7124   LearningRate 0.0829   Epoch: 1   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:44:10,681-Speed 9122.83 samples/sec   Loss 9.6752   LearningRate 0.0829   Epoch: 1   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:44:11,792-Speed 9218.21 samples/sec   Loss 9.6548   LearningRate 0.0829   Epoch: 1   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:44:12,886-Speed 9363.41 samples/sec   Loss 9.8001   LearningRate 0.0829   Epoch: 1   Global Step: 29970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:44:13,939-Speed 9738.81 samples/sec   Loss 9.6802   LearningRate 0.0828   Epoch: 1   Global Step: 29980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:44:15,019-Speed 9490.29 samples/sec   Loss 9.7430   LearningRate 0.0828   Epoch: 1   Global Step: 29990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:44:16,177-Speed 8844.85 samples/sec   Loss 9.9114   LearningRate 0.0828   Epoch: 1   Global Step: 30000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:44:38,088-[lfw][30000]XNorm: 13.702796
Training: 2022-04-11 12:44:38,089-[lfw][30000]Accuracy-Flip: 0.99517+-0.00320
Training: 2022-04-11 12:44:38,089-[lfw][30000]Accuracy-Highest: 0.99517
Training: 2022-04-11 12:45:03,386-[cfp_fp][30000]XNorm: 11.492991
Training: 2022-04-11 12:45:03,387-[cfp_fp][30000]Accuracy-Flip: 0.93014+-0.01568
Training: 2022-04-11 12:45:03,388-[cfp_fp][30000]Accuracy-Highest: 0.93614
Training: 2022-04-11 12:45:25,187-[agedb_30][30000]XNorm: 13.185564
Training: 2022-04-11 12:45:25,188-[agedb_30][30000]Accuracy-Flip: 0.94233+-0.01506
Training: 2022-04-11 12:45:25,189-[agedb_30][30000]Accuracy-Highest: 0.94400
Training: 2022-04-11 12:45:26,283-Speed 146.07 samples/sec   Loss 9.7293   LearningRate 0.0828   Epoch: 1   Global Step: 30010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:45:27,387-Speed 9280.98 samples/sec   Loss 9.7441   LearningRate 0.0828   Epoch: 1   Global Step: 30020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:45:28,450-Speed 9641.57 samples/sec   Loss 9.7787   LearningRate 0.0828   Epoch: 1   Global Step: 30030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:45:29,526-Speed 9522.94 samples/sec   Loss 9.5943   LearningRate 0.0828   Epoch: 1   Global Step: 30040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:45:30,627-Speed 9301.24 samples/sec   Loss 9.5887   LearningRate 0.0828   Epoch: 1   Global Step: 30050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:45:31,706-Speed 9496.82 samples/sec   Loss 9.7364   LearningRate 0.0828   Epoch: 1   Global Step: 30060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:45:32,794-Speed 9421.16 samples/sec   Loss 9.6745   LearningRate 0.0828   Epoch: 1   Global Step: 30070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:45:33,856-Speed 9653.50 samples/sec   Loss 9.6352   LearningRate 0.0828   Epoch: 1   Global Step: 30080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:34,925-Speed 9577.13 samples/sec   Loss 9.6950   LearningRate 0.0828   Epoch: 1   Global Step: 30090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:36,029-Speed 9286.00 samples/sec   Loss 9.7086   LearningRate 0.0828   Epoch: 1   Global Step: 30100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:37,100-Speed 9561.32 samples/sec   Loss 9.7223   LearningRate 0.0828   Epoch: 1   Global Step: 30110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:38,185-Speed 9445.45 samples/sec   Loss 9.5864   LearningRate 0.0828   Epoch: 1   Global Step: 30120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:39,247-Speed 9648.60 samples/sec   Loss 9.7443   LearningRate 0.0828   Epoch: 1   Global Step: 30130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:40,341-Speed 9360.33 samples/sec   Loss 9.7940   LearningRate 0.0828   Epoch: 1   Global Step: 30140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:41,415-Speed 9545.97 samples/sec   Loss 9.6196   LearningRate 0.0828   Epoch: 1   Global Step: 30150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:42,542-Speed 9090.14 samples/sec   Loss 9.6089   LearningRate 0.0827   Epoch: 1   Global Step: 30160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:43,630-Speed 9417.30 samples/sec   Loss 9.6102   LearningRate 0.0827   Epoch: 1   Global Step: 30170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:44,748-Speed 9167.78 samples/sec   Loss 9.7452   LearningRate 0.0827   Epoch: 1   Global Step: 30180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:45:45,888-Speed 8986.03 samples/sec   Loss 9.5893   LearningRate 0.0827   Epoch: 1   Global Step: 30190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:45:46,996-Speed 9248.41 samples/sec   Loss 9.5569   LearningRate 0.0827   Epoch: 1   Global Step: 30200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:45:48,132-Speed 9019.61 samples/sec   Loss 9.6371   LearningRate 0.0827   Epoch: 1   Global Step: 30210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:49,251-Speed 9159.88 samples/sec   Loss 9.7069   LearningRate 0.0827   Epoch: 1   Global Step: 30220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:50,358-Speed 9257.36 samples/sec   Loss 9.6330   LearningRate 0.0827   Epoch: 1   Global Step: 30230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:51,457-Speed 9321.48 samples/sec   Loss 9.6528   LearningRate 0.0827   Epoch: 1   Global Step: 30240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:52,562-Speed 9277.04 samples/sec   Loss 9.7428   LearningRate 0.0827   Epoch: 1   Global Step: 30250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:53,635-Speed 9546.04 samples/sec   Loss 9.7776   LearningRate 0.0827   Epoch: 1   Global Step: 30260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:54,701-Speed 9614.77 samples/sec   Loss 9.6906   LearningRate 0.0827   Epoch: 1   Global Step: 30270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:55,758-Speed 9689.61 samples/sec   Loss 9.6332   LearningRate 0.0827   Epoch: 1   Global Step: 30280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:56,804-Speed 9790.39 samples/sec   Loss 9.7548   LearningRate 0.0827   Epoch: 1   Global Step: 30290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:57,893-Speed 9412.86 samples/sec   Loss 9.6557   LearningRate 0.0827   Epoch: 1   Global Step: 30300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:45:58,989-Speed 9347.57 samples/sec   Loss 9.6360   LearningRate 0.0827   Epoch: 1   Global Step: 30310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:00,088-Speed 9327.20 samples/sec   Loss 9.6244   LearningRate 0.0827   Epoch: 1   Global Step: 30320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:01,161-Speed 9548.96 samples/sec   Loss 9.5412   LearningRate 0.0827   Epoch: 1   Global Step: 30330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:02,253-Speed 9382.48 samples/sec   Loss 9.7573   LearningRate 0.0826   Epoch: 1   Global Step: 30340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:03,319-Speed 9612.10 samples/sec   Loss 9.6654   LearningRate 0.0826   Epoch: 1   Global Step: 30350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:04,397-Speed 9502.09 samples/sec   Loss 9.6754   LearningRate 0.0826   Epoch: 1   Global Step: 30360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:05,489-Speed 9382.05 samples/sec   Loss 9.6137   LearningRate 0.0826   Epoch: 1   Global Step: 30370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:06,570-Speed 9486.94 samples/sec   Loss 9.5174   LearningRate 0.0826   Epoch: 1   Global Step: 30380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:07,686-Speed 9176.34 samples/sec   Loss 9.6223   LearningRate 0.0826   Epoch: 1   Global Step: 30390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:08,779-Speed 9374.00 samples/sec   Loss 9.6982   LearningRate 0.0826   Epoch: 1   Global Step: 30400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:09,919-Speed 8987.48 samples/sec   Loss 9.6874   LearningRate 0.0826   Epoch: 1   Global Step: 30410   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:10,989-Speed 9582.98 samples/sec   Loss 9.6137   LearningRate 0.0826   Epoch: 1   Global Step: 30420   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:12,065-Speed 9517.47 samples/sec   Loss 9.5499   LearningRate 0.0826   Epoch: 1   Global Step: 30430   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:13,151-Speed 9431.50 samples/sec   Loss 9.6270   LearningRate 0.0826   Epoch: 1   Global Step: 30440   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:14,228-Speed 9517.56 samples/sec   Loss 9.7888   LearningRate 0.0826   Epoch: 1   Global Step: 30450   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:15,323-Speed 9356.29 samples/sec   Loss 9.6322   LearningRate 0.0826   Epoch: 1   Global Step: 30460   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:16,396-Speed 9550.38 samples/sec   Loss 9.7969   LearningRate 0.0826   Epoch: 1   Global Step: 30470   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:17,493-Speed 9341.59 samples/sec   Loss 9.7136   LearningRate 0.0826   Epoch: 1   Global Step: 30480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:18,586-Speed 9373.57 samples/sec   Loss 9.7052   LearningRate 0.0826   Epoch: 1   Global Step: 30490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:19,686-Speed 9315.28 samples/sec   Loss 9.5591   LearningRate 0.0826   Epoch: 1   Global Step: 30500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:20,777-Speed 9393.64 samples/sec   Loss 9.5757   LearningRate 0.0826   Epoch: 1   Global Step: 30510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:21,846-Speed 9585.37 samples/sec   Loss 9.5667   LearningRate 0.0826   Epoch: 1   Global Step: 30520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:22,954-Speed 9242.98 samples/sec   Loss 9.6367   LearningRate 0.0825   Epoch: 1   Global Step: 30530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:24,051-Speed 9339.96 samples/sec   Loss 9.6576   LearningRate 0.0825   Epoch: 1   Global Step: 30540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:25,144-Speed 9378.25 samples/sec   Loss 9.5692   LearningRate 0.0825   Epoch: 1   Global Step: 30550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:26,218-Speed 9537.66 samples/sec   Loss 9.5821   LearningRate 0.0825   Epoch: 1   Global Step: 30560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:27,316-Speed 9336.83 samples/sec   Loss 9.5537   LearningRate 0.0825   Epoch: 1   Global Step: 30570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:28,426-Speed 9227.53 samples/sec   Loss 9.7803   LearningRate 0.0825   Epoch: 1   Global Step: 30580   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:29,523-Speed 9341.33 samples/sec   Loss 9.5627   LearningRate 0.0825   Epoch: 1   Global Step: 30590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:30,614-Speed 9391.22 samples/sec   Loss 9.6516   LearningRate 0.0825   Epoch: 1   Global Step: 30600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:33,157-Speed 4027.46 samples/sec   Loss 9.7050   LearningRate 0.0825   Epoch: 1   Global Step: 30610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:34,297-Speed 8990.91 samples/sec   Loss 9.5817   LearningRate 0.0825   Epoch: 1   Global Step: 30620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:35,366-Speed 9580.00 samples/sec   Loss 9.5828   LearningRate 0.0825   Epoch: 1   Global Step: 30630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:36,471-Speed 9282.23 samples/sec   Loss 9.5409   LearningRate 0.0825   Epoch: 1   Global Step: 30640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:37,560-Speed 9406.87 samples/sec   Loss 9.6068   LearningRate 0.0825   Epoch: 1   Global Step: 30650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:38,651-Speed 9386.71 samples/sec   Loss 9.5727   LearningRate 0.0825   Epoch: 1   Global Step: 30660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:39,759-Speed 9248.34 samples/sec   Loss 9.6786   LearningRate 0.0825   Epoch: 1   Global Step: 30670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:40,862-Speed 9293.29 samples/sec   Loss 9.6402   LearningRate 0.0825   Epoch: 1   Global Step: 30680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:41,969-Speed 9251.57 samples/sec   Loss 9.5708   LearningRate 0.0825   Epoch: 1   Global Step: 30690   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:43,053-Speed 9454.05 samples/sec   Loss 9.6594   LearningRate 0.0825   Epoch: 1   Global Step: 30700   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:44,165-Speed 9218.79 samples/sec   Loss 9.6001   LearningRate 0.0824   Epoch: 1   Global Step: 30710   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:45,228-Speed 9638.00 samples/sec   Loss 9.6239   LearningRate 0.0824   Epoch: 1   Global Step: 30720   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:46,312-Speed 9451.07 samples/sec   Loss 9.6021   LearningRate 0.0824   Epoch: 1   Global Step: 30730   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:47,395-Speed 9461.87 samples/sec   Loss 9.7018   LearningRate 0.0824   Epoch: 1   Global Step: 30740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:48,489-Speed 9362.83 samples/sec   Loss 9.7267   LearningRate 0.0824   Epoch: 1   Global Step: 30750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:49,539-Speed 9762.17 samples/sec   Loss 9.5699   LearningRate 0.0824   Epoch: 1   Global Step: 30760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:50,596-Speed 9689.40 samples/sec   Loss 9.7487   LearningRate 0.0824   Epoch: 1   Global Step: 30770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:51,686-Speed 9402.79 samples/sec   Loss 9.7694   LearningRate 0.0824   Epoch: 1   Global Step: 30780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:52,746-Speed 9664.67 samples/sec   Loss 9.6065   LearningRate 0.0824   Epoch: 1   Global Step: 30790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:53,817-Speed 9570.05 samples/sec   Loss 9.6883   LearningRate 0.0824   Epoch: 1   Global Step: 30800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:54,896-Speed 9488.76 samples/sec   Loss 9.6673   LearningRate 0.0824   Epoch: 1   Global Step: 30810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:55,980-Speed 9455.75 samples/sec   Loss 9.4960   LearningRate 0.0824   Epoch: 1   Global Step: 30820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:57,054-Speed 9539.50 samples/sec   Loss 9.5787   LearningRate 0.0824   Epoch: 1   Global Step: 30830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:46:58,090-Speed 9888.82 samples/sec   Loss 9.7050   LearningRate 0.0824   Epoch: 1   Global Step: 30840   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:46:59,156-Speed 9617.49 samples/sec   Loss 9.5735   LearningRate 0.0824   Epoch: 1   Global Step: 30850   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:00,257-Speed 9309.92 samples/sec   Loss 9.6880   LearningRate 0.0824   Epoch: 1   Global Step: 30860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:01,344-Speed 9429.33 samples/sec   Loss 9.6340   LearningRate 0.0824   Epoch: 1   Global Step: 30870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:02,417-Speed 9543.20 samples/sec   Loss 9.5691   LearningRate 0.0824   Epoch: 1   Global Step: 30880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:03,506-Speed 9406.74 samples/sec   Loss 9.6377   LearningRate 0.0823   Epoch: 1   Global Step: 30890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:04,652-Speed 8944.13 samples/sec   Loss 9.6245   LearningRate 0.0823   Epoch: 1   Global Step: 30900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:47:05,749-Speed 9336.77 samples/sec   Loss 9.5597   LearningRate 0.0823   Epoch: 1   Global Step: 30910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:47:06,808-Speed 9681.14 samples/sec   Loss 9.6541   LearningRate 0.0823   Epoch: 1   Global Step: 30920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:47:07,905-Speed 9344.85 samples/sec   Loss 9.5438   LearningRate 0.0823   Epoch: 1   Global Step: 30930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:47:09,009-Speed 9280.60 samples/sec   Loss 9.6314   LearningRate 0.0823   Epoch: 1   Global Step: 30940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:47:10,110-Speed 9303.28 samples/sec   Loss 9.6714   LearningRate 0.0823   Epoch: 1   Global Step: 30950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:47:11,240-Speed 9063.33 samples/sec   Loss 9.5705   LearningRate 0.0823   Epoch: 1   Global Step: 30960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:47:12,316-Speed 9526.41 samples/sec   Loss 9.6041   LearningRate 0.0823   Epoch: 1   Global Step: 30970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:47:13,413-Speed 9334.83 samples/sec   Loss 9.5565   LearningRate 0.0823   Epoch: 1   Global Step: 30980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:47:14,533-Speed 9153.23 samples/sec   Loss 9.5528   LearningRate 0.0823   Epoch: 1   Global Step: 30990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:47:15,624-Speed 9392.80 samples/sec   Loss 9.7179   LearningRate 0.0823   Epoch: 1   Global Step: 31000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:16,721-Speed 9334.70 samples/sec   Loss 9.5396   LearningRate 0.0823   Epoch: 1   Global Step: 31010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:17,851-Speed 9065.91 samples/sec   Loss 9.6765   LearningRate 0.0823   Epoch: 1   Global Step: 31020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:18,958-Speed 9262.21 samples/sec   Loss 9.5608   LearningRate 0.0823   Epoch: 1   Global Step: 31030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:20,028-Speed 9575.03 samples/sec   Loss 9.6627   LearningRate 0.0823   Epoch: 1   Global Step: 31040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:21,115-Speed 9427.40 samples/sec   Loss 9.5930   LearningRate 0.0823   Epoch: 1   Global Step: 31050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:22,223-Speed 9250.03 samples/sec   Loss 9.5270   LearningRate 0.0823   Epoch: 1   Global Step: 31060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:23,315-Speed 9377.04 samples/sec   Loss 9.4082   LearningRate 0.0823   Epoch: 1   Global Step: 31070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:24,408-Speed 9375.16 samples/sec   Loss 9.6637   LearningRate 0.0822   Epoch: 1   Global Step: 31080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:25,462-Speed 9722.62 samples/sec   Loss 9.5893   LearningRate 0.0822   Epoch: 1   Global Step: 31090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:26,591-Speed 9080.21 samples/sec   Loss 9.5665   LearningRate 0.0822   Epoch: 1   Global Step: 31100   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:27,675-Speed 9450.95 samples/sec   Loss 9.5596   LearningRate 0.0822   Epoch: 1   Global Step: 31110   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:28,753-Speed 9504.85 samples/sec   Loss 9.5241   LearningRate 0.0822   Epoch: 1   Global Step: 31120   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:29,792-Speed 9861.13 samples/sec   Loss 9.5224   LearningRate 0.0822   Epoch: 1   Global Step: 31130   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:30,929-Speed 9013.82 samples/sec   Loss 9.5318   LearningRate 0.0822   Epoch: 1   Global Step: 31140   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:31,998-Speed 9585.46 samples/sec   Loss 9.4987   LearningRate 0.0822   Epoch: 1   Global Step: 31150   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:33,126-Speed 9086.92 samples/sec   Loss 9.6014   LearningRate 0.0822   Epoch: 1   Global Step: 31160   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:34,207-Speed 9472.16 samples/sec   Loss 9.4925   LearningRate 0.0822   Epoch: 1   Global Step: 31170   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:35,307-Speed 9314.63 samples/sec   Loss 9.6451   LearningRate 0.0822   Epoch: 1   Global Step: 31180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:36,376-Speed 9584.71 samples/sec   Loss 9.6047   LearningRate 0.0822   Epoch: 1   Global Step: 31190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:37,431-Speed 9711.47 samples/sec   Loss 9.6703   LearningRate 0.0822   Epoch: 1   Global Step: 31200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:38,516-Speed 9451.50 samples/sec   Loss 9.5781   LearningRate 0.0822   Epoch: 1   Global Step: 31210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:39,580-Speed 9630.77 samples/sec   Loss 9.5924   LearningRate 0.0822   Epoch: 1   Global Step: 31220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:40,658-Speed 9501.56 samples/sec   Loss 9.6165   LearningRate 0.0822   Epoch: 1   Global Step: 31230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:41,722-Speed 9626.79 samples/sec   Loss 9.6122   LearningRate 0.0822   Epoch: 1   Global Step: 31240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:42,796-Speed 9542.11 samples/sec   Loss 9.7307   LearningRate 0.0822   Epoch: 1   Global Step: 31250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:43,869-Speed 9549.29 samples/sec   Loss 9.7020   LearningRate 0.0821   Epoch: 1   Global Step: 31260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:44,917-Speed 9774.16 samples/sec   Loss 9.6057   LearningRate 0.0821   Epoch: 1   Global Step: 31270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:45,948-Speed 9935.83 samples/sec   Loss 9.5719   LearningRate 0.0821   Epoch: 1   Global Step: 31280   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:46,993-Speed 9811.26 samples/sec   Loss 9.6461   LearningRate 0.0821   Epoch: 1   Global Step: 31290   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:47:48,091-Speed 9330.36 samples/sec   Loss 9.6435   LearningRate 0.0821   Epoch: 1   Global Step: 31300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:49,154-Speed 9635.65 samples/sec   Loss 9.7245   LearningRate 0.0821   Epoch: 1   Global Step: 31310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:50,254-Speed 9317.94 samples/sec   Loss 9.5194   LearningRate 0.0821   Epoch: 1   Global Step: 31320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:51,331-Speed 9514.29 samples/sec   Loss 9.5242   LearningRate 0.0821   Epoch: 1   Global Step: 31330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:52,414-Speed 9457.31 samples/sec   Loss 9.5316   LearningRate 0.0821   Epoch: 1   Global Step: 31340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:53,483-Speed 9585.33 samples/sec   Loss 9.6115   LearningRate 0.0821   Epoch: 1   Global Step: 31350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:54,545-Speed 9651.59 samples/sec   Loss 9.5054   LearningRate 0.0821   Epoch: 1   Global Step: 31360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:47:55,572-Speed 9977.95 samples/sec   Loss 9.5350   LearningRate 0.0821   Epoch: 1   Global Step: 31370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:00,002-Speed 2311.64 samples/sec   Loss 9.6068   LearningRate 0.0821   Epoch: 1   Global Step: 31380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:02,321-Speed 4418.75 samples/sec   Loss 9.7042   LearningRate 0.0821   Epoch: 1   Global Step: 31390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:03,396-Speed 9531.78 samples/sec   Loss 9.5769   LearningRate 0.0821   Epoch: 1   Global Step: 31400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:04,456-Speed 9658.54 samples/sec   Loss 9.6052   LearningRate 0.0821   Epoch: 1   Global Step: 31410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:05,548-Speed 9382.18 samples/sec   Loss 9.6086   LearningRate 0.0821   Epoch: 1   Global Step: 31420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:06,645-Speed 9343.85 samples/sec   Loss 9.6397   LearningRate 0.0821   Epoch: 1   Global Step: 31430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:07,708-Speed 9638.47 samples/sec   Loss 9.4660   LearningRate 0.0821   Epoch: 1   Global Step: 31440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:08,814-Speed 9266.94 samples/sec   Loss 9.5138   LearningRate 0.0820   Epoch: 1   Global Step: 31450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:09,901-Speed 9426.66 samples/sec   Loss 9.6906   LearningRate 0.0820   Epoch: 1   Global Step: 31460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:10,973-Speed 9554.21 samples/sec   Loss 9.4658   LearningRate 0.0820   Epoch: 1   Global Step: 31470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:12,025-Speed 9737.93 samples/sec   Loss 9.5392   LearningRate 0.0820   Epoch: 1   Global Step: 31480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:13,101-Speed 9529.30 samples/sec   Loss 9.5066   LearningRate 0.0820   Epoch: 1   Global Step: 31490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:14,195-Speed 9365.77 samples/sec   Loss 9.6105   LearningRate 0.0820   Epoch: 1   Global Step: 31500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:15,268-Speed 9551.23 samples/sec   Loss 9.5383   LearningRate 0.0820   Epoch: 1   Global Step: 31510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:16,340-Speed 9558.66 samples/sec   Loss 9.4757   LearningRate 0.0820   Epoch: 1   Global Step: 31520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:17,441-Speed 9300.65 samples/sec   Loss 9.5498   LearningRate 0.0820   Epoch: 1   Global Step: 31530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:18,528-Speed 9424.27 samples/sec   Loss 9.6103   LearningRate 0.0820   Epoch: 1   Global Step: 31540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:19,589-Speed 9662.66 samples/sec   Loss 9.5274   LearningRate 0.0820   Epoch: 1   Global Step: 31550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:20,719-Speed 9071.87 samples/sec   Loss 9.5509   LearningRate 0.0820   Epoch: 1   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:21,800-Speed 9481.29 samples/sec   Loss 9.4348   LearningRate 0.0820   Epoch: 1   Global Step: 31570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:22,827-Speed 9971.47 samples/sec   Loss 9.5458   LearningRate 0.0820   Epoch: 1   Global Step: 31580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:23,880-Speed 9731.30 samples/sec   Loss 9.5989   LearningRate 0.0820   Epoch: 1   Global Step: 31590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:24,922-Speed 9830.79 samples/sec   Loss 9.6205   LearningRate 0.0820   Epoch: 1   Global Step: 31600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:25,975-Speed 9742.82 samples/sec   Loss 9.5259   LearningRate 0.0820   Epoch: 1   Global Step: 31610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:27,039-Speed 9630.87 samples/sec   Loss 9.5542   LearningRate 0.0820   Epoch: 1   Global Step: 31620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:28,122-Speed 9455.19 samples/sec   Loss 9.5618   LearningRate 0.0819   Epoch: 1   Global Step: 31630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:29,249-Speed 9096.75 samples/sec   Loss 9.4037   LearningRate 0.0819   Epoch: 1   Global Step: 31640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:30,328-Speed 9498.51 samples/sec   Loss 9.7313   LearningRate 0.0819   Epoch: 1   Global Step: 31650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:31,440-Speed 9212.84 samples/sec   Loss 9.5357   LearningRate 0.0819   Epoch: 1   Global Step: 31660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:32,511-Speed 9570.25 samples/sec   Loss 9.4368   LearningRate 0.0819   Epoch: 1   Global Step: 31670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:33,587-Speed 9520.94 samples/sec   Loss 9.4772   LearningRate 0.0819   Epoch: 1   Global Step: 31680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:34,682-Speed 9359.13 samples/sec   Loss 9.6262   LearningRate 0.0819   Epoch: 1   Global Step: 31690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:35,765-Speed 9457.88 samples/sec   Loss 9.5317   LearningRate 0.0819   Epoch: 1   Global Step: 31700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:36,848-Speed 9463.28 samples/sec   Loss 9.5382   LearningRate 0.0819   Epoch: 1   Global Step: 31710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:37,939-Speed 9397.03 samples/sec   Loss 9.3932   LearningRate 0.0819   Epoch: 1   Global Step: 31720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:39,017-Speed 9496.44 samples/sec   Loss 9.6414   LearningRate 0.0819   Epoch: 1   Global Step: 31730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:40,082-Speed 9625.51 samples/sec   Loss 9.5224   LearningRate 0.0819   Epoch: 1   Global Step: 31740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:41,152-Speed 9570.38 samples/sec   Loss 9.6763   LearningRate 0.0819   Epoch: 1   Global Step: 31750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:42,234-Speed 9473.64 samples/sec   Loss 9.6227   LearningRate 0.0819   Epoch: 1   Global Step: 31760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:43,328-Speed 9365.61 samples/sec   Loss 9.5643   LearningRate 0.0819   Epoch: 1   Global Step: 31770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:44,396-Speed 9593.71 samples/sec   Loss 9.4931   LearningRate 0.0819   Epoch: 1   Global Step: 31780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:45,460-Speed 9623.44 samples/sec   Loss 9.4522   LearningRate 0.0819   Epoch: 1   Global Step: 31790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:46,551-Speed 9392.84 samples/sec   Loss 9.4563   LearningRate 0.0819   Epoch: 1   Global Step: 31800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:47,674-Speed 9130.06 samples/sec   Loss 9.4965   LearningRate 0.0818   Epoch: 1   Global Step: 31810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:48,779-Speed 9267.86 samples/sec   Loss 9.5018   LearningRate 0.0818   Epoch: 1   Global Step: 31820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:49,852-Speed 9557.24 samples/sec   Loss 9.5151   LearningRate 0.0818   Epoch: 1   Global Step: 31830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:50,915-Speed 9641.47 samples/sec   Loss 9.5954   LearningRate 0.0818   Epoch: 1   Global Step: 31840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:48:52,034-Speed 9156.46 samples/sec   Loss 9.5894   LearningRate 0.0818   Epoch: 1   Global Step: 31850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:53,139-Speed 9273.82 samples/sec   Loss 9.5308   LearningRate 0.0818   Epoch: 1   Global Step: 31860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:54,219-Speed 9486.99 samples/sec   Loss 9.5573   LearningRate 0.0818   Epoch: 1   Global Step: 31870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:55,328-Speed 9243.50 samples/sec   Loss 9.6679   LearningRate 0.0818   Epoch: 1   Global Step: 31880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:56,398-Speed 9572.58 samples/sec   Loss 9.5792   LearningRate 0.0818   Epoch: 1   Global Step: 31890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:57,488-Speed 9402.09 samples/sec   Loss 9.5271   LearningRate 0.0818   Epoch: 1   Global Step: 31900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:58,566-Speed 9503.87 samples/sec   Loss 9.4243   LearningRate 0.0818   Epoch: 1   Global Step: 31910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:48:59,639-Speed 9552.78 samples/sec   Loss 9.6382   LearningRate 0.0818   Epoch: 1   Global Step: 31920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:49:00,716-Speed 9513.77 samples/sec   Loss 9.5931   LearningRate 0.0818   Epoch: 1   Global Step: 31930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:49:01,818-Speed 9298.28 samples/sec   Loss 9.7007   LearningRate 0.0818   Epoch: 1   Global Step: 31940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:49:02,965-Speed 8927.65 samples/sec   Loss 9.5599   LearningRate 0.0818   Epoch: 1   Global Step: 31950   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:49:04,006-Speed 9844.36 samples/sec   Loss 9.5562   LearningRate 0.0818   Epoch: 1   Global Step: 31960   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:49:05,117-Speed 9224.99 samples/sec   Loss 9.5280   LearningRate 0.0818   Epoch: 1   Global Step: 31970   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:49:06,168-Speed 9747.90 samples/sec   Loss 9.5730   LearningRate 0.0818   Epoch: 1   Global Step: 31980   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:49:07,297-Speed 9075.79 samples/sec   Loss 9.4346   LearningRate 0.0818   Epoch: 1   Global Step: 31990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:49:08,379-Speed 9466.95 samples/sec   Loss 9.4637   LearningRate 0.0817   Epoch: 1   Global Step: 32000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:49:30,509-[lfw][32000]XNorm: 13.668435
Training: 2022-04-11 12:49:30,510-[lfw][32000]Accuracy-Flip: 0.99533+-0.00314
Training: 2022-04-11 12:49:30,510-[lfw][32000]Accuracy-Highest: 0.99533
Training: 2022-04-11 12:49:56,141-[cfp_fp][32000]XNorm: 11.320121
Training: 2022-04-11 12:49:56,141-[cfp_fp][32000]Accuracy-Flip: 0.93143+-0.01148
Training: 2022-04-11 12:49:56,142-[cfp_fp][32000]Accuracy-Highest: 0.93614
Training: 2022-04-11 12:50:18,199-[agedb_30][32000]XNorm: 13.186324
Training: 2022-04-11 12:50:18,200-[agedb_30][32000]Accuracy-Flip: 0.94550+-0.01138
Training: 2022-04-11 12:50:18,201-[agedb_30][32000]Accuracy-Highest: 0.94550
Training: 2022-04-11 12:50:19,308-Speed 144.37 samples/sec   Loss 9.5827   LearningRate 0.0817   Epoch: 1   Global Step: 32010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:20,367-Speed 9674.81 samples/sec   Loss 9.5916   LearningRate 0.0817   Epoch: 1   Global Step: 32020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:21,419-Speed 9739.36 samples/sec   Loss 9.4947   LearningRate 0.0817   Epoch: 1   Global Step: 32030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:22,524-Speed 9271.67 samples/sec   Loss 9.4691   LearningRate 0.0817   Epoch: 1   Global Step: 32040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:23,595-Speed 9565.43 samples/sec   Loss 9.6127   LearningRate 0.0817   Epoch: 1   Global Step: 32050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:24,659-Speed 9623.18 samples/sec   Loss 9.5454   LearningRate 0.0817   Epoch: 1   Global Step: 32060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:50:25,765-Speed 9278.36 samples/sec   Loss 9.4259   LearningRate 0.0817   Epoch: 1   Global Step: 32070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:50:26,841-Speed 9521.27 samples/sec   Loss 9.5783   LearningRate 0.0817   Epoch: 1   Global Step: 32080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:50:27,901-Speed 9666.68 samples/sec   Loss 9.4954   LearningRate 0.0817   Epoch: 1   Global Step: 32090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:50:28,956-Speed 9714.05 samples/sec   Loss 9.6590   LearningRate 0.0817   Epoch: 1   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:50:30,055-Speed 9322.29 samples/sec   Loss 9.5236   LearningRate 0.0817   Epoch: 1   Global Step: 32110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:50:31,169-Speed 9198.52 samples/sec   Loss 9.4549   LearningRate 0.0817   Epoch: 1   Global Step: 32120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:50:32,237-Speed 9593.95 samples/sec   Loss 9.6262   LearningRate 0.0817   Epoch: 1   Global Step: 32130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:50:33,354-Speed 9172.34 samples/sec   Loss 9.5882   LearningRate 0.0817   Epoch: 1   Global Step: 32140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:50:34,433-Speed 9493.85 samples/sec   Loss 9.4976   LearningRate 0.0817   Epoch: 1   Global Step: 32150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:50:35,512-Speed 9498.07 samples/sec   Loss 9.5197   LearningRate 0.0817   Epoch: 1   Global Step: 32160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:36,585-Speed 9552.73 samples/sec   Loss 9.4525   LearningRate 0.0817   Epoch: 1   Global Step: 32170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:37,663-Speed 9502.49 samples/sec   Loss 9.4229   LearningRate 0.0816   Epoch: 1   Global Step: 32180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:38,742-Speed 9498.16 samples/sec   Loss 9.4338   LearningRate 0.0816   Epoch: 1   Global Step: 32190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:39,805-Speed 9638.99 samples/sec   Loss 9.4483   LearningRate 0.0816   Epoch: 1   Global Step: 32200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:40,863-Speed 9687.24 samples/sec   Loss 9.4168   LearningRate 0.0816   Epoch: 1   Global Step: 32210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:41,951-Speed 9416.30 samples/sec   Loss 9.5056   LearningRate 0.0816   Epoch: 1   Global Step: 32220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:43,047-Speed 9344.09 samples/sec   Loss 9.5282   LearningRate 0.0816   Epoch: 1   Global Step: 32230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:44,074-Speed 9976.55 samples/sec   Loss 9.6586   LearningRate 0.0816   Epoch: 1   Global Step: 32240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:45,159-Speed 9446.53 samples/sec   Loss 9.4717   LearningRate 0.0816   Epoch: 1   Global Step: 32250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:46,248-Speed 9412.33 samples/sec   Loss 9.4497   LearningRate 0.0816   Epoch: 1   Global Step: 32260   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:50:47,297-Speed 9764.75 samples/sec   Loss 9.5075   LearningRate 0.0816   Epoch: 1   Global Step: 32270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:48,346-Speed 9761.13 samples/sec   Loss 9.4525   LearningRate 0.0816   Epoch: 1   Global Step: 32280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:49,408-Speed 9653.67 samples/sec   Loss 9.4302   LearningRate 0.0816   Epoch: 1   Global Step: 32290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:50,477-Speed 9582.44 samples/sec   Loss 9.4918   LearningRate 0.0816   Epoch: 1   Global Step: 32300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:51,528-Speed 9751.57 samples/sec   Loss 9.5155   LearningRate 0.0816   Epoch: 1   Global Step: 32310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:52,623-Speed 9355.77 samples/sec   Loss 9.4418   LearningRate 0.0816   Epoch: 1   Global Step: 32320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:53,696-Speed 9548.92 samples/sec   Loss 9.3628   LearningRate 0.0816   Epoch: 1   Global Step: 32330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:54,858-Speed 8819.10 samples/sec   Loss 9.4948   LearningRate 0.0816   Epoch: 1   Global Step: 32340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:55,939-Speed 9476.01 samples/sec   Loss 9.4711   LearningRate 0.0816   Epoch: 1   Global Step: 32350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:57,062-Speed 9122.57 samples/sec   Loss 9.5958   LearningRate 0.0816   Epoch: 1   Global Step: 32360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:50:58,150-Speed 9425.54 samples/sec   Loss 9.5784   LearningRate 0.0815   Epoch: 1   Global Step: 32370   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:50:59,235-Speed 9442.82 samples/sec   Loss 9.4804   LearningRate 0.0815   Epoch: 1   Global Step: 32380   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:00,332-Speed 9342.47 samples/sec   Loss 9.6315   LearningRate 0.0815   Epoch: 1   Global Step: 32390   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:01,388-Speed 9712.30 samples/sec   Loss 9.4392   LearningRate 0.0815   Epoch: 1   Global Step: 32400   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:02,508-Speed 9149.60 samples/sec   Loss 9.3912   LearningRate 0.0815   Epoch: 1   Global Step: 32410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:03,556-Speed 9778.29 samples/sec   Loss 9.4370   LearningRate 0.0815   Epoch: 1   Global Step: 32420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:04,631-Speed 9531.44 samples/sec   Loss 9.5638   LearningRate 0.0815   Epoch: 1   Global Step: 32430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:05,713-Speed 9473.72 samples/sec   Loss 9.4648   LearningRate 0.0815   Epoch: 1   Global Step: 32440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:06,780-Speed 9608.71 samples/sec   Loss 9.6420   LearningRate 0.0815   Epoch: 1   Global Step: 32450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:07,881-Speed 9302.40 samples/sec   Loss 9.5252   LearningRate 0.0815   Epoch: 1   Global Step: 32460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:09,028-Speed 8932.56 samples/sec   Loss 9.4540   LearningRate 0.0815   Epoch: 1   Global Step: 32470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:10,128-Speed 9315.96 samples/sec   Loss 9.6114   LearningRate 0.0815   Epoch: 1   Global Step: 32480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:11,204-Speed 9527.22 samples/sec   Loss 9.4863   LearningRate 0.0815   Epoch: 1   Global Step: 32490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:12,273-Speed 9584.32 samples/sec   Loss 9.5817   LearningRate 0.0815   Epoch: 1   Global Step: 32500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:13,383-Speed 9226.23 samples/sec   Loss 9.4970   LearningRate 0.0815   Epoch: 1   Global Step: 32510   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:14,478-Speed 9352.97 samples/sec   Loss 9.4856   LearningRate 0.0815   Epoch: 1   Global Step: 32520   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:15,532-Speed 9729.92 samples/sec   Loss 9.4889   LearningRate 0.0815   Epoch: 1   Global Step: 32530   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:16,588-Speed 9696.13 samples/sec   Loss 9.5101   LearningRate 0.0815   Epoch: 1   Global Step: 32540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:17,685-Speed 9345.04 samples/sec   Loss 9.5140   LearningRate 0.0814   Epoch: 1   Global Step: 32550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:18,817-Speed 9050.05 samples/sec   Loss 9.2994   LearningRate 0.0814   Epoch: 1   Global Step: 32560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:19,927-Speed 9231.70 samples/sec   Loss 9.4627   LearningRate 0.0814   Epoch: 1   Global Step: 32570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:21,079-Speed 8899.44 samples/sec   Loss 9.3938   LearningRate 0.0814   Epoch: 1   Global Step: 32580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:22,122-Speed 9823.20 samples/sec   Loss 9.5413   LearningRate 0.0814   Epoch: 1   Global Step: 32590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:23,198-Speed 9519.21 samples/sec   Loss 9.3691   LearningRate 0.0814   Epoch: 1   Global Step: 32600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:24,288-Speed 9397.49 samples/sec   Loss 9.3696   LearningRate 0.0814   Epoch: 1   Global Step: 32610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:25,399-Speed 9225.76 samples/sec   Loss 9.4535   LearningRate 0.0814   Epoch: 1   Global Step: 32620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:26,491-Speed 9384.93 samples/sec   Loss 9.4275   LearningRate 0.0814   Epoch: 1   Global Step: 32630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:27,591-Speed 9314.47 samples/sec   Loss 9.4110   LearningRate 0.0814   Epoch: 1   Global Step: 32640   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:28,676-Speed 9438.28 samples/sec   Loss 9.4410   LearningRate 0.0814   Epoch: 1   Global Step: 32650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:29,799-Speed 9122.20 samples/sec   Loss 9.6048   LearningRate 0.0814   Epoch: 1   Global Step: 32660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:30,903-Speed 9282.39 samples/sec   Loss 9.3545   LearningRate 0.0814   Epoch: 1   Global Step: 32670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:31,971-Speed 9598.79 samples/sec   Loss 9.4526   LearningRate 0.0814   Epoch: 1   Global Step: 32680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:33,054-Speed 9456.14 samples/sec   Loss 9.6651   LearningRate 0.0814   Epoch: 1   Global Step: 32690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:34,135-Speed 9482.62 samples/sec   Loss 9.5099   LearningRate 0.0814   Epoch: 1   Global Step: 32700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:35,209-Speed 9536.70 samples/sec   Loss 9.4914   LearningRate 0.0814   Epoch: 1   Global Step: 32710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:36,296-Speed 9427.47 samples/sec   Loss 9.5017   LearningRate 0.0814   Epoch: 1   Global Step: 32720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:37,384-Speed 9418.00 samples/sec   Loss 9.4755   LearningRate 0.0814   Epoch: 1   Global Step: 32730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:38,433-Speed 9775.21 samples/sec   Loss 9.4673   LearningRate 0.0813   Epoch: 1   Global Step: 32740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:39,484-Speed 9742.46 samples/sec   Loss 9.4458   LearningRate 0.0813   Epoch: 1   Global Step: 32750   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:40,585-Speed 9307.19 samples/sec   Loss 9.3607   LearningRate 0.0813   Epoch: 1   Global Step: 32760   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:41,683-Speed 9336.84 samples/sec   Loss 9.4395   LearningRate 0.0813   Epoch: 1   Global Step: 32770   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:42,788-Speed 9268.04 samples/sec   Loss 9.4680   LearningRate 0.0813   Epoch: 1   Global Step: 32780   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:43,854-Speed 9617.95 samples/sec   Loss 9.3317   LearningRate 0.0813   Epoch: 1   Global Step: 32790   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:44,917-Speed 9636.72 samples/sec   Loss 9.4554   LearningRate 0.0813   Epoch: 1   Global Step: 32800   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:45,973-Speed 9702.99 samples/sec   Loss 9.3862   LearningRate 0.0813   Epoch: 1   Global Step: 32810   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:47,028-Speed 9704.84 samples/sec   Loss 9.4378   LearningRate 0.0813   Epoch: 1   Global Step: 32820   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:48,137-Speed 9239.67 samples/sec   Loss 9.3672   LearningRate 0.0813   Epoch: 1   Global Step: 32830   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:49,227-Speed 9401.02 samples/sec   Loss 9.4289   LearningRate 0.0813   Epoch: 1   Global Step: 32840   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:50,324-Speed 9337.84 samples/sec   Loss 9.3893   LearningRate 0.0813   Epoch: 1   Global Step: 32850   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:51,434-Speed 9235.96 samples/sec   Loss 9.2880   LearningRate 0.0813   Epoch: 1   Global Step: 32860   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:51:52,529-Speed 9349.93 samples/sec   Loss 9.3504   LearningRate 0.0813   Epoch: 1   Global Step: 32870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:53,655-Speed 9104.62 samples/sec   Loss 9.5713   LearningRate 0.0813   Epoch: 1   Global Step: 32880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:54,742-Speed 9422.45 samples/sec   Loss 9.4527   LearningRate 0.0813   Epoch: 1   Global Step: 32890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:55,791-Speed 9776.49 samples/sec   Loss 9.4545   LearningRate 0.0813   Epoch: 1   Global Step: 32900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:56,954-Speed 8811.45 samples/sec   Loss 9.2636   LearningRate 0.0813   Epoch: 1   Global Step: 32910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:58,038-Speed 9449.67 samples/sec   Loss 9.4616   LearningRate 0.0812   Epoch: 1   Global Step: 32920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:51:59,141-Speed 9289.32 samples/sec   Loss 9.5086   LearningRate 0.0812   Epoch: 1   Global Step: 32930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:00,210-Speed 9581.19 samples/sec   Loss 9.4181   LearningRate 0.0812   Epoch: 1   Global Step: 32940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:01,317-Speed 9256.97 samples/sec   Loss 9.3925   LearningRate 0.0812   Epoch: 1   Global Step: 32950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:02,402-Speed 9447.46 samples/sec   Loss 9.3635   LearningRate 0.0812   Epoch: 1   Global Step: 32960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:03,508-Speed 9259.74 samples/sec   Loss 9.3522   LearningRate 0.0812   Epoch: 1   Global Step: 32970   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:52:04,594-Speed 9433.29 samples/sec   Loss 9.4077   LearningRate 0.0812   Epoch: 1   Global Step: 32980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:05,680-Speed 9446.83 samples/sec   Loss 9.4973   LearningRate 0.0812   Epoch: 1   Global Step: 32990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:06,802-Speed 9130.64 samples/sec   Loss 9.4121   LearningRate 0.0812   Epoch: 1   Global Step: 33000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:07,903-Speed 9308.57 samples/sec   Loss 9.3986   LearningRate 0.0812   Epoch: 1   Global Step: 33010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:08,978-Speed 9533.00 samples/sec   Loss 9.3563   LearningRate 0.0812   Epoch: 1   Global Step: 33020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:10,080-Speed 9293.00 samples/sec   Loss 9.3322   LearningRate 0.0812   Epoch: 1   Global Step: 33030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:11,152-Speed 9553.65 samples/sec   Loss 9.4277   LearningRate 0.0812   Epoch: 1   Global Step: 33040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:12,226-Speed 9551.04 samples/sec   Loss 9.4511   LearningRate 0.0812   Epoch: 1   Global Step: 33050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:13,289-Speed 9635.91 samples/sec   Loss 9.2886   LearningRate 0.0812   Epoch: 1   Global Step: 33060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:14,364-Speed 9537.40 samples/sec   Loss 9.3263   LearningRate 0.0812   Epoch: 1   Global Step: 33070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:15,444-Speed 9487.96 samples/sec   Loss 9.4438   LearningRate 0.0812   Epoch: 1   Global Step: 33080   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:52:16,508-Speed 9630.91 samples/sec   Loss 9.3068   LearningRate 0.0812   Epoch: 1   Global Step: 33090   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:52:17,623-Speed 9182.27 samples/sec   Loss 9.3506   LearningRate 0.0812   Epoch: 1   Global Step: 33100   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:52:18,730-Speed 9256.16 samples/sec   Loss 9.4511   LearningRate 0.0811   Epoch: 1   Global Step: 33110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:19,766-Speed 9890.34 samples/sec   Loss 9.5137   LearningRate 0.0811   Epoch: 1   Global Step: 33120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:20,844-Speed 9509.63 samples/sec   Loss 9.3503   LearningRate 0.0811   Epoch: 1   Global Step: 33130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:21,959-Speed 9188.08 samples/sec   Loss 9.3447   LearningRate 0.0811   Epoch: 1   Global Step: 33140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:23,060-Speed 9304.63 samples/sec   Loss 9.3398   LearningRate 0.0811   Epoch: 1   Global Step: 33150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:24,096-Speed 9891.99 samples/sec   Loss 9.4489   LearningRate 0.0811   Epoch: 1   Global Step: 33160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:25,194-Speed 9331.08 samples/sec   Loss 9.4166   LearningRate 0.0811   Epoch: 1   Global Step: 33170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:26,253-Speed 9680.41 samples/sec   Loss 9.4872   LearningRate 0.0811   Epoch: 1   Global Step: 33180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:27,336-Speed 9457.46 samples/sec   Loss 9.4438   LearningRate 0.0811   Epoch: 1   Global Step: 33190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:28,405-Speed 9581.09 samples/sec   Loss 9.5628   LearningRate 0.0811   Epoch: 1   Global Step: 33200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:29,462-Speed 9699.17 samples/sec   Loss 9.4219   LearningRate 0.0811   Epoch: 1   Global Step: 33210   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:52:30,535-Speed 9550.10 samples/sec   Loss 9.4436   LearningRate 0.0811   Epoch: 1   Global Step: 33220   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:52:31,627-Speed 9380.06 samples/sec   Loss 9.4443   LearningRate 0.0811   Epoch: 1   Global Step: 33230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:32,711-Speed 9447.60 samples/sec   Loss 9.4743   LearningRate 0.0811   Epoch: 1   Global Step: 33240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:33,772-Speed 9662.12 samples/sec   Loss 9.3936   LearningRate 0.0811   Epoch: 1   Global Step: 33250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:34,865-Speed 9376.86 samples/sec   Loss 9.4899   LearningRate 0.0811   Epoch: 1   Global Step: 33260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:35,967-Speed 9297.02 samples/sec   Loss 9.3525   LearningRate 0.0811   Epoch: 1   Global Step: 33270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:37,057-Speed 9407.22 samples/sec   Loss 9.3950   LearningRate 0.0811   Epoch: 1   Global Step: 33280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:38,140-Speed 9455.87 samples/sec   Loss 9.5069   LearningRate 0.0810   Epoch: 1   Global Step: 33290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:39,251-Speed 9224.72 samples/sec   Loss 9.5053   LearningRate 0.0810   Epoch: 1   Global Step: 33300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:40,331-Speed 9483.85 samples/sec   Loss 9.5029   LearningRate 0.0810   Epoch: 1   Global Step: 33310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:41,417-Speed 9431.54 samples/sec   Loss 9.3924   LearningRate 0.0810   Epoch: 1   Global Step: 33320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:42,481-Speed 9642.04 samples/sec   Loss 9.5133   LearningRate 0.0810   Epoch: 1   Global Step: 33330   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:52:43,547-Speed 9604.01 samples/sec   Loss 9.3729   LearningRate 0.0810   Epoch: 1   Global Step: 33340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:44,622-Speed 9537.47 samples/sec   Loss 9.3561   LearningRate 0.0810   Epoch: 1   Global Step: 33350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:45,698-Speed 9520.10 samples/sec   Loss 9.3005   LearningRate 0.0810   Epoch: 1   Global Step: 33360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:46,804-Speed 9260.28 samples/sec   Loss 9.3837   LearningRate 0.0810   Epoch: 1   Global Step: 33370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:52:48,101-Speed 7902.22 samples/sec   Loss 9.3482   LearningRate 0.0810   Epoch: 1   Global Step: 33380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:16,385-Speed 362.06 samples/sec   Loss 8.8359   LearningRate 0.0810   Epoch: 2   Global Step: 33390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:17,491-Speed 9270.30 samples/sec   Loss 8.6036   LearningRate 0.0810   Epoch: 2   Global Step: 33400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:18,593-Speed 9294.49 samples/sec   Loss 8.5436   LearningRate 0.0810   Epoch: 2   Global Step: 33410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:19,672-Speed 9497.50 samples/sec   Loss 8.6140   LearningRate 0.0810   Epoch: 2   Global Step: 33420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:21,086-Speed 7246.96 samples/sec   Loss 8.5390   LearningRate 0.0810   Epoch: 2   Global Step: 33430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:22,273-Speed 8630.00 samples/sec   Loss 8.5016   LearningRate 0.0810   Epoch: 2   Global Step: 33440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:23,375-Speed 9301.40 samples/sec   Loss 8.4967   LearningRate 0.0810   Epoch: 2   Global Step: 33450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:24,491-Speed 9182.05 samples/sec   Loss 8.6369   LearningRate 0.0810   Epoch: 2   Global Step: 33460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:25,551-Speed 9662.30 samples/sec   Loss 8.5898   LearningRate 0.0810   Epoch: 2   Global Step: 33470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:26,644-Speed 9375.10 samples/sec   Loss 8.6141   LearningRate 0.0809   Epoch: 2   Global Step: 33480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:27,761-Speed 9171.15 samples/sec   Loss 8.7718   LearningRate 0.0809   Epoch: 2   Global Step: 33490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:53:28,879-Speed 9170.59 samples/sec   Loss 8.6562   LearningRate 0.0809   Epoch: 2   Global Step: 33500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:53:29,953-Speed 9534.27 samples/sec   Loss 8.6487   LearningRate 0.0809   Epoch: 2   Global Step: 33510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:53:31,078-Speed 9110.84 samples/sec   Loss 8.6917   LearningRate 0.0809   Epoch: 2   Global Step: 33520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:53:32,121-Speed 9822.60 samples/sec   Loss 8.6543   LearningRate 0.0809   Epoch: 2   Global Step: 33530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:53:33,244-Speed 9123.34 samples/sec   Loss 8.6407   LearningRate 0.0809   Epoch: 2   Global Step: 33540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:53:34,355-Speed 9225.00 samples/sec   Loss 8.6084   LearningRate 0.0809   Epoch: 2   Global Step: 33550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:53:35,388-Speed 9920.91 samples/sec   Loss 8.7200   LearningRate 0.0809   Epoch: 2   Global Step: 33560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:53:36,456-Speed 9595.47 samples/sec   Loss 8.6380   LearningRate 0.0809   Epoch: 2   Global Step: 33570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:53:37,537-Speed 9477.69 samples/sec   Loss 8.7323   LearningRate 0.0809   Epoch: 2   Global Step: 33580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:53:38,625-Speed 9419.52 samples/sec   Loss 8.5866   LearningRate 0.0809   Epoch: 2   Global Step: 33590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:39,702-Speed 9515.13 samples/sec   Loss 8.6267   LearningRate 0.0809   Epoch: 2   Global Step: 33600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:40,743-Speed 9845.15 samples/sec   Loss 8.6528   LearningRate 0.0809   Epoch: 2   Global Step: 33610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:41,998-Speed 8166.59 samples/sec   Loss 8.6600   LearningRate 0.0809   Epoch: 2   Global Step: 33620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:43,221-Speed 8376.63 samples/sec   Loss 8.7400   LearningRate 0.0809   Epoch: 2   Global Step: 33630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:44,829-Speed 6370.04 samples/sec   Loss 8.7941   LearningRate 0.0809   Epoch: 2   Global Step: 33640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:45,898-Speed 9583.35 samples/sec   Loss 8.7318   LearningRate 0.0809   Epoch: 2   Global Step: 33650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:46,944-Speed 9795.96 samples/sec   Loss 8.6820   LearningRate 0.0809   Epoch: 2   Global Step: 33660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:48,003-Speed 9671.48 samples/sec   Loss 8.6380   LearningRate 0.0808   Epoch: 2   Global Step: 33670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:49,099-Speed 9353.68 samples/sec   Loss 8.5868   LearningRate 0.0808   Epoch: 2   Global Step: 33680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:50,171-Speed 9557.74 samples/sec   Loss 8.5969   LearningRate 0.0808   Epoch: 2   Global Step: 33690   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:53:51,255-Speed 9446.07 samples/sec   Loss 8.6889   LearningRate 0.0808   Epoch: 2   Global Step: 33700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:52,304-Speed 9769.36 samples/sec   Loss 8.6853   LearningRate 0.0808   Epoch: 2   Global Step: 33710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:53,390-Speed 9434.62 samples/sec   Loss 8.5546   LearningRate 0.0808   Epoch: 2   Global Step: 33720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:54,455-Speed 9620.72 samples/sec   Loss 8.7785   LearningRate 0.0808   Epoch: 2   Global Step: 33730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:55,543-Speed 9423.88 samples/sec   Loss 8.6119   LearningRate 0.0808   Epoch: 2   Global Step: 33740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:56,577-Speed 9902.41 samples/sec   Loss 8.7408   LearningRate 0.0808   Epoch: 2   Global Step: 33750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:57,658-Speed 9482.63 samples/sec   Loss 8.6124   LearningRate 0.0808   Epoch: 2   Global Step: 33760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:58,708-Speed 9758.18 samples/sec   Loss 8.7068   LearningRate 0.0808   Epoch: 2   Global Step: 33770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:53:59,755-Speed 9784.63 samples/sec   Loss 8.6629   LearningRate 0.0808   Epoch: 2   Global Step: 33780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:00,810-Speed 9709.61 samples/sec   Loss 8.7244   LearningRate 0.0808   Epoch: 2   Global Step: 33790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:01,888-Speed 9507.08 samples/sec   Loss 8.6575   LearningRate 0.0808   Epoch: 2   Global Step: 33800   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:54:02,942-Speed 9719.66 samples/sec   Loss 8.7483   LearningRate 0.0808   Epoch: 2   Global Step: 33810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:04,014-Speed 9559.43 samples/sec   Loss 8.7844   LearningRate 0.0808   Epoch: 2   Global Step: 33820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:05,075-Speed 9660.04 samples/sec   Loss 8.7815   LearningRate 0.0808   Epoch: 2   Global Step: 33830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:06,154-Speed 9495.33 samples/sec   Loss 8.7386   LearningRate 0.0808   Epoch: 2   Global Step: 33840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:07,235-Speed 9483.24 samples/sec   Loss 8.6899   LearningRate 0.0807   Epoch: 2   Global Step: 33850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:08,319-Speed 9448.55 samples/sec   Loss 8.7067   LearningRate 0.0807   Epoch: 2   Global Step: 33860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:09,381-Speed 9650.53 samples/sec   Loss 8.7873   LearningRate 0.0807   Epoch: 2   Global Step: 33870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:10,475-Speed 9365.45 samples/sec   Loss 8.7104   LearningRate 0.0807   Epoch: 2   Global Step: 33880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:11,546-Speed 9566.24 samples/sec   Loss 8.8003   LearningRate 0.0807   Epoch: 2   Global Step: 33890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:12,653-Speed 9258.40 samples/sec   Loss 8.6235   LearningRate 0.0807   Epoch: 2   Global Step: 33900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:13,762-Speed 9239.43 samples/sec   Loss 8.7390   LearningRate 0.0807   Epoch: 2   Global Step: 33910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:14,860-Speed 9329.02 samples/sec   Loss 8.7169   LearningRate 0.0807   Epoch: 2   Global Step: 33920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:15,985-Speed 9103.39 samples/sec   Loss 8.7377   LearningRate 0.0807   Epoch: 2   Global Step: 33930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:17,113-Speed 9088.56 samples/sec   Loss 8.8265   LearningRate 0.0807   Epoch: 2   Global Step: 33940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:18,216-Speed 9288.38 samples/sec   Loss 8.5822   LearningRate 0.0807   Epoch: 2   Global Step: 33950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:19,304-Speed 9422.03 samples/sec   Loss 8.6262   LearningRate 0.0807   Epoch: 2   Global Step: 33960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:20,346-Speed 9828.84 samples/sec   Loss 8.7826   LearningRate 0.0807   Epoch: 2   Global Step: 33970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:21,412-Speed 9615.88 samples/sec   Loss 8.6856   LearningRate 0.0807   Epoch: 2   Global Step: 33980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:22,483-Speed 9563.97 samples/sec   Loss 8.6193   LearningRate 0.0807   Epoch: 2   Global Step: 33990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:23,529-Speed 9797.45 samples/sec   Loss 8.8220   LearningRate 0.0807   Epoch: 2   Global Step: 34000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:54:45,433-[lfw][34000]XNorm: 13.287884
Training: 2022-04-11 12:54:45,434-[lfw][34000]Accuracy-Flip: 0.99467+-0.00306
Training: 2022-04-11 12:54:45,434-[lfw][34000]Accuracy-Highest: 0.99533
Training: 2022-04-11 12:55:10,772-[cfp_fp][34000]XNorm: 11.163665
Training: 2022-04-11 12:55:10,772-[cfp_fp][34000]Accuracy-Flip: 0.93943+-0.01171
Training: 2022-04-11 12:55:10,773-[cfp_fp][34000]Accuracy-Highest: 0.93943
Training: 2022-04-11 12:55:32,618-[agedb_30][34000]XNorm: 12.864343
Training: 2022-04-11 12:55:32,618-[agedb_30][34000]Accuracy-Flip: 0.94900+-0.01379
Training: 2022-04-11 12:55:32,619-[agedb_30][34000]Accuracy-Highest: 0.94900
Training: 2022-04-11 12:55:33,703-Speed 145.92 samples/sec   Loss 8.8143   LearningRate 0.0807   Epoch: 2   Global Step: 34010   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:55:34,760-Speed 9694.02 samples/sec   Loss 8.7910   LearningRate 0.0807   Epoch: 2   Global Step: 34020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:35,825-Speed 9622.24 samples/sec   Loss 8.7506   LearningRate 0.0807   Epoch: 2   Global Step: 34030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:36,884-Speed 9673.38 samples/sec   Loss 8.7495   LearningRate 0.0806   Epoch: 2   Global Step: 34040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:37,918-Speed 9908.64 samples/sec   Loss 8.7269   LearningRate 0.0806   Epoch: 2   Global Step: 34050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:55:38,946-Speed 9965.11 samples/sec   Loss 8.8047   LearningRate 0.0806   Epoch: 2   Global Step: 34060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:55:40,008-Speed 9659.38 samples/sec   Loss 8.7177   LearningRate 0.0806   Epoch: 2   Global Step: 34070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:55:41,059-Speed 9745.48 samples/sec   Loss 8.6673   LearningRate 0.0806   Epoch: 2   Global Step: 34080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:55:42,138-Speed 9495.77 samples/sec   Loss 8.8057   LearningRate 0.0806   Epoch: 2   Global Step: 34090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:55:43,178-Speed 9855.58 samples/sec   Loss 8.7438   LearningRate 0.0806   Epoch: 2   Global Step: 34100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:55:44,249-Speed 9557.97 samples/sec   Loss 8.7973   LearningRate 0.0806   Epoch: 2   Global Step: 34110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:55:45,290-Speed 9846.87 samples/sec   Loss 8.7788   LearningRate 0.0806   Epoch: 2   Global Step: 34120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:55:46,400-Speed 9227.15 samples/sec   Loss 8.6546   LearningRate 0.0806   Epoch: 2   Global Step: 34130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:55:47,456-Speed 9708.85 samples/sec   Loss 8.7115   LearningRate 0.0806   Epoch: 2   Global Step: 34140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:55:48,555-Speed 9316.45 samples/sec   Loss 8.6035   LearningRate 0.0806   Epoch: 2   Global Step: 34150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:49,652-Speed 9349.03 samples/sec   Loss 8.7494   LearningRate 0.0806   Epoch: 2   Global Step: 34160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:50,713-Speed 9654.58 samples/sec   Loss 8.8211   LearningRate 0.0806   Epoch: 2   Global Step: 34170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:51,769-Speed 9701.38 samples/sec   Loss 8.8355   LearningRate 0.0806   Epoch: 2   Global Step: 34180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:52,824-Speed 9711.92 samples/sec   Loss 8.7341   LearningRate 0.0806   Epoch: 2   Global Step: 34190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:53,921-Speed 9334.81 samples/sec   Loss 8.8113   LearningRate 0.0806   Epoch: 2   Global Step: 34200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:55,006-Speed 9448.01 samples/sec   Loss 8.6526   LearningRate 0.0806   Epoch: 2   Global Step: 34210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:56,108-Speed 9298.49 samples/sec   Loss 8.8752   LearningRate 0.0805   Epoch: 2   Global Step: 34220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:57,228-Speed 9144.56 samples/sec   Loss 8.7981   LearningRate 0.0805   Epoch: 2   Global Step: 34230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:58,318-Speed 9405.45 samples/sec   Loss 8.7572   LearningRate 0.0805   Epoch: 2   Global Step: 34240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:55:59,428-Speed 9232.51 samples/sec   Loss 8.8518   LearningRate 0.0805   Epoch: 2   Global Step: 34250   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:00,486-Speed 9683.92 samples/sec   Loss 8.8035   LearningRate 0.0805   Epoch: 2   Global Step: 34260   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:01,510-Speed 10004.95 samples/sec   Loss 8.7139   LearningRate 0.0805   Epoch: 2   Global Step: 34270   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:02,545-Speed 9897.97 samples/sec   Loss 8.9207   LearningRate 0.0805   Epoch: 2   Global Step: 34280   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:03,597-Speed 9744.58 samples/sec   Loss 8.7473   LearningRate 0.0805   Epoch: 2   Global Step: 34290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:04,663-Speed 9608.62 samples/sec   Loss 8.7117   LearningRate 0.0805   Epoch: 2   Global Step: 34300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:05,734-Speed 9565.87 samples/sec   Loss 8.8715   LearningRate 0.0805   Epoch: 2   Global Step: 34310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:06,808-Speed 9538.69 samples/sec   Loss 8.7832   LearningRate 0.0805   Epoch: 2   Global Step: 34320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:07,863-Speed 9709.96 samples/sec   Loss 8.5845   LearningRate 0.0805   Epoch: 2   Global Step: 34330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:08,958-Speed 9360.30 samples/sec   Loss 8.8952   LearningRate 0.0805   Epoch: 2   Global Step: 34340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:10,035-Speed 9512.86 samples/sec   Loss 8.8102   LearningRate 0.0805   Epoch: 2   Global Step: 34350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:11,126-Speed 9390.12 samples/sec   Loss 8.7245   LearningRate 0.0805   Epoch: 2   Global Step: 34360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:12,192-Speed 9612.67 samples/sec   Loss 8.6812   LearningRate 0.0805   Epoch: 2   Global Step: 34370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:13,237-Speed 9804.25 samples/sec   Loss 8.7611   LearningRate 0.0805   Epoch: 2   Global Step: 34380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:14,303-Speed 9617.37 samples/sec   Loss 8.8646   LearningRate 0.0805   Epoch: 2   Global Step: 34390   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:15,400-Speed 9333.52 samples/sec   Loss 8.8751   LearningRate 0.0805   Epoch: 2   Global Step: 34400   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:16,488-Speed 9421.50 samples/sec   Loss 8.8845   LearningRate 0.0804   Epoch: 2   Global Step: 34410   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:17,618-Speed 9065.08 samples/sec   Loss 8.9081   LearningRate 0.0804   Epoch: 2   Global Step: 34420   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:18,717-Speed 9321.59 samples/sec   Loss 8.7776   LearningRate 0.0804   Epoch: 2   Global Step: 34430   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:19,818-Speed 9312.45 samples/sec   Loss 8.8775   LearningRate 0.0804   Epoch: 2   Global Step: 34440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:20,859-Speed 9845.47 samples/sec   Loss 8.9185   LearningRate 0.0804   Epoch: 2   Global Step: 34450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:21,914-Speed 9712.38 samples/sec   Loss 8.8180   LearningRate 0.0804   Epoch: 2   Global Step: 34460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:23,002-Speed 9412.47 samples/sec   Loss 8.8103   LearningRate 0.0804   Epoch: 2   Global Step: 34470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:24,045-Speed 9830.14 samples/sec   Loss 8.9768   LearningRate 0.0804   Epoch: 2   Global Step: 34480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:25,076-Speed 9931.56 samples/sec   Loss 8.9270   LearningRate 0.0804   Epoch: 2   Global Step: 34490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:26,125-Speed 9769.52 samples/sec   Loss 8.8499   LearningRate 0.0804   Epoch: 2   Global Step: 34500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:27,186-Speed 9658.10 samples/sec   Loss 8.8239   LearningRate 0.0804   Epoch: 2   Global Step: 34510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:28,246-Speed 9665.27 samples/sec   Loss 8.9536   LearningRate 0.0804   Epoch: 2   Global Step: 34520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:29,310-Speed 9631.96 samples/sec   Loss 8.8421   LearningRate 0.0804   Epoch: 2   Global Step: 34530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:30,370-Speed 9664.66 samples/sec   Loss 8.8632   LearningRate 0.0804   Epoch: 2   Global Step: 34540   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:31,427-Speed 9690.49 samples/sec   Loss 8.8778   LearningRate 0.0804   Epoch: 2   Global Step: 34550   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:32,477-Speed 9761.50 samples/sec   Loss 8.8343   LearningRate 0.0804   Epoch: 2   Global Step: 34560   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:33,543-Speed 9611.57 samples/sec   Loss 9.0016   LearningRate 0.0804   Epoch: 2   Global Step: 34570   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:34,630-Speed 9427.92 samples/sec   Loss 8.7997   LearningRate 0.0804   Epoch: 2   Global Step: 34580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:35,735-Speed 9266.25 samples/sec   Loss 8.9755   LearningRate 0.0803   Epoch: 2   Global Step: 34590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:36,822-Speed 9431.18 samples/sec   Loss 8.8979   LearningRate 0.0803   Epoch: 2   Global Step: 34600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:37,887-Speed 9621.19 samples/sec   Loss 8.8267   LearningRate 0.0803   Epoch: 2   Global Step: 34610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:38,951-Speed 9631.64 samples/sec   Loss 8.8826   LearningRate 0.0803   Epoch: 2   Global Step: 34620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:39,997-Speed 9795.05 samples/sec   Loss 8.8029   LearningRate 0.0803   Epoch: 2   Global Step: 34630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:41,044-Speed 9787.45 samples/sec   Loss 8.8471   LearningRate 0.0803   Epoch: 2   Global Step: 34640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:42,081-Speed 9883.08 samples/sec   Loss 8.8569   LearningRate 0.0803   Epoch: 2   Global Step: 34650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:43,198-Speed 9172.20 samples/sec   Loss 8.7394   LearningRate 0.0803   Epoch: 2   Global Step: 34660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:44,262-Speed 9624.03 samples/sec   Loss 8.9160   LearningRate 0.0803   Epoch: 2   Global Step: 34670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:45,327-Speed 9628.08 samples/sec   Loss 8.8742   LearningRate 0.0803   Epoch: 2   Global Step: 34680   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:46,423-Speed 9345.33 samples/sec   Loss 8.8375   LearningRate 0.0803   Epoch: 2   Global Step: 34690   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:56:47,466-Speed 9820.91 samples/sec   Loss 8.7929   LearningRate 0.0803   Epoch: 2   Global Step: 34700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:48,519-Speed 9732.14 samples/sec   Loss 8.9416   LearningRate 0.0803   Epoch: 2   Global Step: 34710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:49,584-Speed 9615.73 samples/sec   Loss 8.8926   LearningRate 0.0803   Epoch: 2   Global Step: 34720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:50,685-Speed 9315.68 samples/sec   Loss 8.9595   LearningRate 0.0803   Epoch: 2   Global Step: 34730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:51,771-Speed 9436.01 samples/sec   Loss 8.8756   LearningRate 0.0803   Epoch: 2   Global Step: 34740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:52,859-Speed 9409.65 samples/sec   Loss 8.9481   LearningRate 0.0803   Epoch: 2   Global Step: 34750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:56:53,909-Speed 9762.34 samples/sec   Loss 8.9176   LearningRate 0.0803   Epoch: 2   Global Step: 34760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:56:54,979-Speed 9570.47 samples/sec   Loss 8.9118   LearningRate 0.0803   Epoch: 2   Global Step: 34770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:56:56,043-Speed 9637.21 samples/sec   Loss 8.8941   LearningRate 0.0802   Epoch: 2   Global Step: 34780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:56:57,136-Speed 9371.91 samples/sec   Loss 8.9109   LearningRate 0.0802   Epoch: 2   Global Step: 34790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:56:58,218-Speed 9478.83 samples/sec   Loss 8.9331   LearningRate 0.0802   Epoch: 2   Global Step: 34800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:56:59,255-Speed 9878.01 samples/sec   Loss 8.9183   LearningRate 0.0802   Epoch: 2   Global Step: 34810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:00,330-Speed 9525.42 samples/sec   Loss 8.8684   LearningRate 0.0802   Epoch: 2   Global Step: 34820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:01,443-Speed 9211.19 samples/sec   Loss 8.7919   LearningRate 0.0802   Epoch: 2   Global Step: 34830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:02,509-Speed 9606.79 samples/sec   Loss 8.9931   LearningRate 0.0802   Epoch: 2   Global Step: 34840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:03,586-Speed 9514.18 samples/sec   Loss 8.9182   LearningRate 0.0802   Epoch: 2   Global Step: 34850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:04,645-Speed 9673.76 samples/sec   Loss 9.0019   LearningRate 0.0802   Epoch: 2   Global Step: 34860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:05,693-Speed 9774.59 samples/sec   Loss 8.9758   LearningRate 0.0802   Epoch: 2   Global Step: 34870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:06,724-Speed 9940.70 samples/sec   Loss 8.9437   LearningRate 0.0802   Epoch: 2   Global Step: 34880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:07,779-Speed 9709.71 samples/sec   Loss 8.8150   LearningRate 0.0802   Epoch: 2   Global Step: 34890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:08,827-Speed 9782.74 samples/sec   Loss 8.7662   LearningRate 0.0802   Epoch: 2   Global Step: 34900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:09,910-Speed 9457.40 samples/sec   Loss 8.8368   LearningRate 0.0802   Epoch: 2   Global Step: 34910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:11,007-Speed 9341.66 samples/sec   Loss 8.8876   LearningRate 0.0802   Epoch: 2   Global Step: 34920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:12,088-Speed 9477.76 samples/sec   Loss 8.9375   LearningRate 0.0802   Epoch: 2   Global Step: 34930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:13,154-Speed 9613.19 samples/sec   Loss 8.8742   LearningRate 0.0802   Epoch: 2   Global Step: 34940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:14,212-Speed 9676.92 samples/sec   Loss 8.7999   LearningRate 0.0802   Epoch: 2   Global Step: 34950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:15,301-Speed 9415.08 samples/sec   Loss 8.8954   LearningRate 0.0802   Epoch: 2   Global Step: 34960   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:57:16,373-Speed 9553.83 samples/sec   Loss 9.0614   LearningRate 0.0801   Epoch: 2   Global Step: 34970   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:57:17,440-Speed 9607.57 samples/sec   Loss 8.9927   LearningRate 0.0801   Epoch: 2   Global Step: 34980   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:57:18,562-Speed 9135.16 samples/sec   Loss 8.8527   LearningRate 0.0801   Epoch: 2   Global Step: 34990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:19,628-Speed 9606.86 samples/sec   Loss 8.9622   LearningRate 0.0801   Epoch: 2   Global Step: 35000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:20,716-Speed 9421.57 samples/sec   Loss 8.8350   LearningRate 0.0801   Epoch: 2   Global Step: 35010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:21,777-Speed 9662.39 samples/sec   Loss 8.9390   LearningRate 0.0801   Epoch: 2   Global Step: 35020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:22,851-Speed 9534.93 samples/sec   Loss 8.8514   LearningRate 0.0801   Epoch: 2   Global Step: 35030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:23,978-Speed 9096.57 samples/sec   Loss 8.8750   LearningRate 0.0801   Epoch: 2   Global Step: 35040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:25,041-Speed 9641.14 samples/sec   Loss 8.9435   LearningRate 0.0801   Epoch: 2   Global Step: 35050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:26,133-Speed 9381.50 samples/sec   Loss 8.8721   LearningRate 0.0801   Epoch: 2   Global Step: 35060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:27,220-Speed 9425.57 samples/sec   Loss 8.9568   LearningRate 0.0801   Epoch: 2   Global Step: 35070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:28,288-Speed 9591.81 samples/sec   Loss 8.9450   LearningRate 0.0801   Epoch: 2   Global Step: 35080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:29,350-Speed 9645.29 samples/sec   Loss 8.8819   LearningRate 0.0801   Epoch: 2   Global Step: 35090   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:57:30,420-Speed 9575.14 samples/sec   Loss 8.9730   LearningRate 0.0801   Epoch: 2   Global Step: 35100   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:57:31,509-Speed 9412.62 samples/sec   Loss 8.9795   LearningRate 0.0801   Epoch: 2   Global Step: 35110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:32,611-Speed 9292.43 samples/sec   Loss 8.9402   LearningRate 0.0801   Epoch: 2   Global Step: 35120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:33,672-Speed 9662.15 samples/sec   Loss 8.9560   LearningRate 0.0801   Epoch: 2   Global Step: 35130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:34,747-Speed 9534.96 samples/sec   Loss 8.9472   LearningRate 0.0801   Epoch: 2   Global Step: 35140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:35,788-Speed 9837.75 samples/sec   Loss 8.9536   LearningRate 0.0800   Epoch: 2   Global Step: 35150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:36,871-Speed 9458.69 samples/sec   Loss 8.9141   LearningRate 0.0800   Epoch: 2   Global Step: 35160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:37,938-Speed 9605.95 samples/sec   Loss 9.0176   LearningRate 0.0800   Epoch: 2   Global Step: 35170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:39,015-Speed 9516.45 samples/sec   Loss 9.0068   LearningRate 0.0800   Epoch: 2   Global Step: 35180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:40,061-Speed 9793.54 samples/sec   Loss 8.8076   LearningRate 0.0800   Epoch: 2   Global Step: 35190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:41,141-Speed 9490.93 samples/sec   Loss 8.8944   LearningRate 0.0800   Epoch: 2   Global Step: 35200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:42,181-Speed 9846.19 samples/sec   Loss 8.7943   LearningRate 0.0800   Epoch: 2   Global Step: 35210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:43,229-Speed 9775.83 samples/sec   Loss 8.8213   LearningRate 0.0800   Epoch: 2   Global Step: 35220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:44,283-Speed 9724.42 samples/sec   Loss 8.7849   LearningRate 0.0800   Epoch: 2   Global Step: 35230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:45,339-Speed 9705.27 samples/sec   Loss 8.8837   LearningRate 0.0800   Epoch: 2   Global Step: 35240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:46,379-Speed 9849.21 samples/sec   Loss 9.0079   LearningRate 0.0800   Epoch: 2   Global Step: 35250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:47,412-Speed 9919.36 samples/sec   Loss 8.9085   LearningRate 0.0800   Epoch: 2   Global Step: 35260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:48,533-Speed 9141.80 samples/sec   Loss 8.9345   LearningRate 0.0800   Epoch: 2   Global Step: 35270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:49,586-Speed 9723.28 samples/sec   Loss 8.9135   LearningRate 0.0800   Epoch: 2   Global Step: 35280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:50,691-Speed 9278.84 samples/sec   Loss 8.9163   LearningRate 0.0800   Epoch: 2   Global Step: 35290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 12:57:51,745-Speed 9718.78 samples/sec   Loss 8.8876   LearningRate 0.0800   Epoch: 2   Global Step: 35300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:52,822-Speed 9513.98 samples/sec   Loss 8.9251   LearningRate 0.0800   Epoch: 2   Global Step: 35310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:53,861-Speed 9860.95 samples/sec   Loss 8.9330   LearningRate 0.0800   Epoch: 2   Global Step: 35320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:54,942-Speed 9480.87 samples/sec   Loss 8.9122   LearningRate 0.0800   Epoch: 2   Global Step: 35330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:55,996-Speed 9724.07 samples/sec   Loss 9.0874   LearningRate 0.0799   Epoch: 2   Global Step: 35340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:57,104-Speed 9244.44 samples/sec   Loss 8.8894   LearningRate 0.0799   Epoch: 2   Global Step: 35350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:58,199-Speed 9362.26 samples/sec   Loss 9.0026   LearningRate 0.0799   Epoch: 2   Global Step: 35360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:57:59,303-Speed 9273.99 samples/sec   Loss 8.8781   LearningRate 0.0799   Epoch: 2   Global Step: 35370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:00,405-Speed 9301.07 samples/sec   Loss 9.0783   LearningRate 0.0799   Epoch: 2   Global Step: 35380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:01,463-Speed 9685.76 samples/sec   Loss 8.9445   LearningRate 0.0799   Epoch: 2   Global Step: 35390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:02,516-Speed 9729.78 samples/sec   Loss 8.9639   LearningRate 0.0799   Epoch: 2   Global Step: 35400   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:03,573-Speed 9690.12 samples/sec   Loss 8.9445   LearningRate 0.0799   Epoch: 2   Global Step: 35410   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:04,670-Speed 9339.30 samples/sec   Loss 8.9948   LearningRate 0.0799   Epoch: 2   Global Step: 35420   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:05,734-Speed 9634.05 samples/sec   Loss 8.9486   LearningRate 0.0799   Epoch: 2   Global Step: 35430   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:06,819-Speed 9439.85 samples/sec   Loss 9.0189   LearningRate 0.0799   Epoch: 2   Global Step: 35440   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:07,857-Speed 9872.07 samples/sec   Loss 8.9522   LearningRate 0.0799   Epoch: 2   Global Step: 35450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:08,912-Speed 9709.47 samples/sec   Loss 9.0246   LearningRate 0.0799   Epoch: 2   Global Step: 35460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:10,014-Speed 9295.95 samples/sec   Loss 8.9203   LearningRate 0.0799   Epoch: 2   Global Step: 35470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:11,070-Speed 9710.27 samples/sec   Loss 9.0261   LearningRate 0.0799   Epoch: 2   Global Step: 35480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:12,159-Speed 9405.14 samples/sec   Loss 9.1103   LearningRate 0.0799   Epoch: 2   Global Step: 35490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:13,255-Speed 9353.38 samples/sec   Loss 9.0426   LearningRate 0.0799   Epoch: 2   Global Step: 35500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:14,324-Speed 9585.45 samples/sec   Loss 9.1036   LearningRate 0.0799   Epoch: 2   Global Step: 35510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:15,366-Speed 9832.43 samples/sec   Loss 9.0563   LearningRate 0.0799   Epoch: 2   Global Step: 35520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:16,429-Speed 9639.98 samples/sec   Loss 8.9742   LearningRate 0.0798   Epoch: 2   Global Step: 35530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:17,525-Speed 9346.21 samples/sec   Loss 9.0255   LearningRate 0.0798   Epoch: 2   Global Step: 35540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:18,547-Speed 10021.12 samples/sec   Loss 9.0409   LearningRate 0.0798   Epoch: 2   Global Step: 35550   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:19,609-Speed 9652.72 samples/sec   Loss 9.1184   LearningRate 0.0798   Epoch: 2   Global Step: 35560   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:20,686-Speed 9511.83 samples/sec   Loss 8.8920   LearningRate 0.0798   Epoch: 2   Global Step: 35570   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:21,750-Speed 9630.43 samples/sec   Loss 9.1130   LearningRate 0.0798   Epoch: 2   Global Step: 35580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:22,835-Speed 9443.09 samples/sec   Loss 8.9935   LearningRate 0.0798   Epoch: 2   Global Step: 35590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:23,918-Speed 9461.90 samples/sec   Loss 8.9953   LearningRate 0.0798   Epoch: 2   Global Step: 35600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:25,012-Speed 9367.85 samples/sec   Loss 8.9744   LearningRate 0.0798   Epoch: 2   Global Step: 35610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:26,057-Speed 9801.88 samples/sec   Loss 8.9579   LearningRate 0.0798   Epoch: 2   Global Step: 35620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:27,108-Speed 9751.61 samples/sec   Loss 8.8666   LearningRate 0.0798   Epoch: 2   Global Step: 35630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:28,173-Speed 9625.69 samples/sec   Loss 8.8815   LearningRate 0.0798   Epoch: 2   Global Step: 35640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:29,279-Speed 9259.95 samples/sec   Loss 9.0195   LearningRate 0.0798   Epoch: 2   Global Step: 35650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:30,347-Speed 9597.70 samples/sec   Loss 9.1255   LearningRate 0.0798   Epoch: 2   Global Step: 35660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:31,469-Speed 9129.05 samples/sec   Loss 8.9756   LearningRate 0.0798   Epoch: 2   Global Step: 35670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:32,521-Speed 9744.38 samples/sec   Loss 8.9217   LearningRate 0.0798   Epoch: 2   Global Step: 35680   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:33,595-Speed 9538.10 samples/sec   Loss 8.9267   LearningRate 0.0798   Epoch: 2   Global Step: 35690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:34,634-Speed 9858.48 samples/sec   Loss 9.1153   LearningRate 0.0798   Epoch: 2   Global Step: 35700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:35,711-Speed 9516.49 samples/sec   Loss 8.8875   LearningRate 0.0797   Epoch: 2   Global Step: 35710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:36,773-Speed 9649.83 samples/sec   Loss 8.8996   LearningRate 0.0797   Epoch: 2   Global Step: 35720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:37,882-Speed 9233.60 samples/sec   Loss 8.9421   LearningRate 0.0797   Epoch: 2   Global Step: 35730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:38,954-Speed 9558.73 samples/sec   Loss 9.0324   LearningRate 0.0797   Epoch: 2   Global Step: 35740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:40,016-Speed 9654.24 samples/sec   Loss 8.9048   LearningRate 0.0797   Epoch: 2   Global Step: 35750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:41,066-Speed 9754.08 samples/sec   Loss 8.9650   LearningRate 0.0797   Epoch: 2   Global Step: 35760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:42,165-Speed 9325.69 samples/sec   Loss 9.0279   LearningRate 0.0797   Epoch: 2   Global Step: 35770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:43,212-Speed 9787.10 samples/sec   Loss 8.9117   LearningRate 0.0797   Epoch: 2   Global Step: 35780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:44,267-Speed 9711.07 samples/sec   Loss 8.9128   LearningRate 0.0797   Epoch: 2   Global Step: 35790   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:45,320-Speed 9727.55 samples/sec   Loss 8.9128   LearningRate 0.0797   Epoch: 2   Global Step: 35800   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:46,368-Speed 9784.65 samples/sec   Loss 9.0628   LearningRate 0.0797   Epoch: 2   Global Step: 35810   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:47,453-Speed 9440.23 samples/sec   Loss 8.9478   LearningRate 0.0797   Epoch: 2   Global Step: 35820   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:48,530-Speed 9513.07 samples/sec   Loss 8.9365   LearningRate 0.0797   Epoch: 2   Global Step: 35830   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:58:49,607-Speed 9517.85 samples/sec   Loss 8.9108   LearningRate 0.0797   Epoch: 2   Global Step: 35840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:50,712-Speed 9273.86 samples/sec   Loss 8.8955   LearningRate 0.0797   Epoch: 2   Global Step: 35850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:51,773-Speed 9651.90 samples/sec   Loss 9.0093   LearningRate 0.0797   Epoch: 2   Global Step: 35860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:52,844-Speed 9567.90 samples/sec   Loss 8.9248   LearningRate 0.0797   Epoch: 2   Global Step: 35870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:53,911-Speed 9605.64 samples/sec   Loss 8.9815   LearningRate 0.0797   Epoch: 2   Global Step: 35880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:55,043-Speed 9048.87 samples/sec   Loss 9.0243   LearningRate 0.0797   Epoch: 2   Global Step: 35890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:56,134-Speed 9392.23 samples/sec   Loss 8.9424   LearningRate 0.0796   Epoch: 2   Global Step: 35900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:57,202-Speed 9596.95 samples/sec   Loss 8.9432   LearningRate 0.0796   Epoch: 2   Global Step: 35910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:58,280-Speed 9502.81 samples/sec   Loss 8.9727   LearningRate 0.0796   Epoch: 2   Global Step: 35920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:58:59,368-Speed 9413.12 samples/sec   Loss 9.0301   LearningRate 0.0796   Epoch: 2   Global Step: 35930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 12:59:00,471-Speed 9292.11 samples/sec   Loss 9.0853   LearningRate 0.0796   Epoch: 2   Global Step: 35940   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:59:01,587-Speed 9184.89 samples/sec   Loss 8.8599   LearningRate 0.0796   Epoch: 2   Global Step: 35950   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:59:02,657-Speed 9575.67 samples/sec   Loss 9.0722   LearningRate 0.0796   Epoch: 2   Global Step: 35960   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:59:03,768-Speed 9221.04 samples/sec   Loss 8.9520   LearningRate 0.0796   Epoch: 2   Global Step: 35970   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:59:04,886-Speed 9170.32 samples/sec   Loss 8.9953   LearningRate 0.0796   Epoch: 2   Global Step: 35980   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:59:05,956-Speed 9572.35 samples/sec   Loss 9.0154   LearningRate 0.0796   Epoch: 2   Global Step: 35990   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:59:07,017-Speed 9653.93 samples/sec   Loss 9.0127   LearningRate 0.0796   Epoch: 2   Global Step: 36000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 12:59:28,802-[lfw][36000]XNorm: 13.217900
Training: 2022-04-11 12:59:28,802-[lfw][36000]Accuracy-Flip: 0.99233+-0.00423
Training: 2022-04-11 12:59:28,803-[lfw][36000]Accuracy-Highest: 0.99533
Training: 2022-04-11 12:59:54,051-[cfp_fp][36000]XNorm: 11.195447
Training: 2022-04-11 12:59:54,052-[cfp_fp][36000]Accuracy-Flip: 0.93986+-0.01076
Training: 2022-04-11 12:59:54,052-[cfp_fp][36000]Accuracy-Highest: 0.93986
Training: 2022-04-11 13:00:15,821-[agedb_30][36000]XNorm: 12.856884
Training: 2022-04-11 13:00:15,822-[agedb_30][36000]Accuracy-Flip: 0.94950+-0.01108
Training: 2022-04-11 13:00:15,822-[agedb_30][36000]Accuracy-Highest: 0.94950
Training: 2022-04-11 13:00:16,892-Speed 146.55 samples/sec   Loss 9.0474   LearningRate 0.0796   Epoch: 2   Global Step: 36010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:17,953-Speed 9659.55 samples/sec   Loss 8.9576   LearningRate 0.0796   Epoch: 2   Global Step: 36020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:19,042-Speed 9402.02 samples/sec   Loss 8.9291   LearningRate 0.0796   Epoch: 2   Global Step: 36030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:20,138-Speed 9353.44 samples/sec   Loss 9.0817   LearningRate 0.0796   Epoch: 2   Global Step: 36040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:21,218-Speed 9490.42 samples/sec   Loss 8.9079   LearningRate 0.0796   Epoch: 2   Global Step: 36050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:22,296-Speed 9504.37 samples/sec   Loss 8.8977   LearningRate 0.0796   Epoch: 2   Global Step: 36060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:23,352-Speed 9708.82 samples/sec   Loss 9.0590   LearningRate 0.0796   Epoch: 2   Global Step: 36070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:24,416-Speed 9628.85 samples/sec   Loss 8.9535   LearningRate 0.0796   Epoch: 2   Global Step: 36080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:25,476-Speed 9661.30 samples/sec   Loss 9.1140   LearningRate 0.0795   Epoch: 2   Global Step: 36090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:26,534-Speed 9688.12 samples/sec   Loss 8.9867   LearningRate 0.0795   Epoch: 2   Global Step: 36100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:27,577-Speed 9819.66 samples/sec   Loss 8.9759   LearningRate 0.0795   Epoch: 2   Global Step: 36110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:28,625-Speed 9779.17 samples/sec   Loss 8.9939   LearningRate 0.0795   Epoch: 2   Global Step: 36120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:29,712-Speed 9421.82 samples/sec   Loss 9.0746   LearningRate 0.0795   Epoch: 2   Global Step: 36130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:30,785-Speed 9550.59 samples/sec   Loss 9.0484   LearningRate 0.0795   Epoch: 2   Global Step: 36140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:31,859-Speed 9546.50 samples/sec   Loss 8.9638   LearningRate 0.0795   Epoch: 2   Global Step: 36150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:32,948-Speed 9400.98 samples/sec   Loss 8.9858   LearningRate 0.0795   Epoch: 2   Global Step: 36160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:34,049-Speed 9310.65 samples/sec   Loss 9.0294   LearningRate 0.0795   Epoch: 2   Global Step: 36170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:35,129-Speed 9486.40 samples/sec   Loss 9.0535   LearningRate 0.0795   Epoch: 2   Global Step: 36180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:36,191-Speed 9642.74 samples/sec   Loss 9.0385   LearningRate 0.0795   Epoch: 2   Global Step: 36190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:37,280-Speed 9413.23 samples/sec   Loss 8.9956   LearningRate 0.0795   Epoch: 2   Global Step: 36200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:38,387-Speed 9250.33 samples/sec   Loss 9.0360   LearningRate 0.0795   Epoch: 2   Global Step: 36210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:39,487-Speed 9321.59 samples/sec   Loss 9.1179   LearningRate 0.0795   Epoch: 2   Global Step: 36220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:40,592-Speed 9272.37 samples/sec   Loss 9.0112   LearningRate 0.0795   Epoch: 2   Global Step: 36230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:41,696-Speed 9279.43 samples/sec   Loss 8.9221   LearningRate 0.0795   Epoch: 2   Global Step: 36240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:42,788-Speed 9377.72 samples/sec   Loss 9.1400   LearningRate 0.0795   Epoch: 2   Global Step: 36250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:43,905-Speed 9174.35 samples/sec   Loss 9.0618   LearningRate 0.0795   Epoch: 2   Global Step: 36260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:44,979-Speed 9536.91 samples/sec   Loss 9.0623   LearningRate 0.0795   Epoch: 2   Global Step: 36270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:46,046-Speed 9607.97 samples/sec   Loss 9.0125   LearningRate 0.0794   Epoch: 2   Global Step: 36280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:47,126-Speed 9483.98 samples/sec   Loss 8.9811   LearningRate 0.0794   Epoch: 2   Global Step: 36290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:48,185-Speed 9675.59 samples/sec   Loss 9.0893   LearningRate 0.0794   Epoch: 2   Global Step: 36300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:49,301-Speed 9187.76 samples/sec   Loss 9.0063   LearningRate 0.0794   Epoch: 2   Global Step: 36310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:50,369-Speed 9591.61 samples/sec   Loss 9.0084   LearningRate 0.0794   Epoch: 2   Global Step: 36320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:51,501-Speed 9053.02 samples/sec   Loss 8.9519   LearningRate 0.0794   Epoch: 2   Global Step: 36330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:52,563-Speed 9648.84 samples/sec   Loss 9.0453   LearningRate 0.0794   Epoch: 2   Global Step: 36340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:53,637-Speed 9541.29 samples/sec   Loss 9.0336   LearningRate 0.0794   Epoch: 2   Global Step: 36350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:54,741-Speed 9280.96 samples/sec   Loss 8.9602   LearningRate 0.0794   Epoch: 2   Global Step: 36360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:55,813-Speed 9552.63 samples/sec   Loss 9.0198   LearningRate 0.0794   Epoch: 2   Global Step: 36370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:56,894-Speed 9483.00 samples/sec   Loss 8.9406   LearningRate 0.0794   Epoch: 2   Global Step: 36380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:57,944-Speed 9762.22 samples/sec   Loss 8.8988   LearningRate 0.0794   Epoch: 2   Global Step: 36390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:00:59,082-Speed 8999.13 samples/sec   Loss 9.0030   LearningRate 0.0794   Epoch: 2   Global Step: 36400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:00,180-Speed 9330.04 samples/sec   Loss 9.0084   LearningRate 0.0794   Epoch: 2   Global Step: 36410   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:01:01,279-Speed 9329.15 samples/sec   Loss 9.0028   LearningRate 0.0794   Epoch: 2   Global Step: 36420   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:01:02,339-Speed 9662.98 samples/sec   Loss 8.9342   LearningRate 0.0794   Epoch: 2   Global Step: 36430   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:01:03,402-Speed 9633.76 samples/sec   Loss 9.0220   LearningRate 0.0794   Epoch: 2   Global Step: 36440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:04,507-Speed 9277.65 samples/sec   Loss 8.9949   LearningRate 0.0794   Epoch: 2   Global Step: 36450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:05,530-Speed 10009.92 samples/sec   Loss 8.9402   LearningRate 0.0793   Epoch: 2   Global Step: 36460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:06,610-Speed 9491.02 samples/sec   Loss 9.0547   LearningRate 0.0793   Epoch: 2   Global Step: 36470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:07,716-Speed 9268.28 samples/sec   Loss 8.9632   LearningRate 0.0793   Epoch: 2   Global Step: 36480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:08,806-Speed 9397.54 samples/sec   Loss 9.0883   LearningRate 0.0793   Epoch: 2   Global Step: 36490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:09,846-Speed 9849.54 samples/sec   Loss 9.0055   LearningRate 0.0793   Epoch: 2   Global Step: 36500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:10,896-Speed 9764.15 samples/sec   Loss 9.0475   LearningRate 0.0793   Epoch: 2   Global Step: 36510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:11,973-Speed 9507.03 samples/sec   Loss 8.8101   LearningRate 0.0793   Epoch: 2   Global Step: 36520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:13,061-Speed 9417.06 samples/sec   Loss 8.9322   LearningRate 0.0793   Epoch: 2   Global Step: 36530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:14,152-Speed 9394.13 samples/sec   Loss 9.0557   LearningRate 0.0793   Epoch: 2   Global Step: 36540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:15,234-Speed 9465.90 samples/sec   Loss 8.8593   LearningRate 0.0793   Epoch: 2   Global Step: 36550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:16,311-Speed 9514.86 samples/sec   Loss 9.1445   LearningRate 0.0793   Epoch: 2   Global Step: 36560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:17,434-Speed 9119.24 samples/sec   Loss 8.9231   LearningRate 0.0793   Epoch: 2   Global Step: 36570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:18,510-Speed 9528.07 samples/sec   Loss 8.9645   LearningRate 0.0793   Epoch: 2   Global Step: 36580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:19,559-Speed 9766.10 samples/sec   Loss 8.9154   LearningRate 0.0793   Epoch: 2   Global Step: 36590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:20,653-Speed 9364.00 samples/sec   Loss 9.0256   LearningRate 0.0793   Epoch: 2   Global Step: 36600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:21,775-Speed 9136.67 samples/sec   Loss 9.1624   LearningRate 0.0793   Epoch: 2   Global Step: 36610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:22,890-Speed 9193.07 samples/sec   Loss 9.0058   LearningRate 0.0793   Epoch: 2   Global Step: 36620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:24,249-Speed 7539.13 samples/sec   Loss 9.0002   LearningRate 0.0793   Epoch: 2   Global Step: 36630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:25,316-Speed 9610.63 samples/sec   Loss 9.0874   LearningRate 0.0793   Epoch: 2   Global Step: 36640   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:01:26,407-Speed 9385.68 samples/sec   Loss 9.0635   LearningRate 0.0792   Epoch: 2   Global Step: 36650   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:01:27,509-Speed 9303.24 samples/sec   Loss 9.0252   LearningRate 0.0792   Epoch: 2   Global Step: 36660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:28,579-Speed 9583.62 samples/sec   Loss 9.1095   LearningRate 0.0792   Epoch: 2   Global Step: 36670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:29,674-Speed 9350.01 samples/sec   Loss 9.0492   LearningRate 0.0792   Epoch: 2   Global Step: 36680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:30,772-Speed 9334.21 samples/sec   Loss 8.8967   LearningRate 0.0792   Epoch: 2   Global Step: 36690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:31,839-Speed 9598.23 samples/sec   Loss 9.0981   LearningRate 0.0792   Epoch: 2   Global Step: 36700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:32,914-Speed 9532.46 samples/sec   Loss 8.9801   LearningRate 0.0792   Epoch: 2   Global Step: 36710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:33,998-Speed 9454.58 samples/sec   Loss 9.1562   LearningRate 0.0792   Epoch: 2   Global Step: 36720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:35,086-Speed 9412.64 samples/sec   Loss 9.0839   LearningRate 0.0792   Epoch: 2   Global Step: 36730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:36,165-Speed 9500.91 samples/sec   Loss 9.0680   LearningRate 0.0792   Epoch: 2   Global Step: 36740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:37,232-Speed 9598.40 samples/sec   Loss 9.0636   LearningRate 0.0792   Epoch: 2   Global Step: 36750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:38,353-Speed 9139.24 samples/sec   Loss 9.0248   LearningRate 0.0792   Epoch: 2   Global Step: 36760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:39,433-Speed 9499.92 samples/sec   Loss 9.0367   LearningRate 0.0792   Epoch: 2   Global Step: 36770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:40,570-Speed 9013.23 samples/sec   Loss 9.0454   LearningRate 0.0792   Epoch: 2   Global Step: 36780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:41,697-Speed 9093.05 samples/sec   Loss 9.1089   LearningRate 0.0792   Epoch: 2   Global Step: 36790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:42,783-Speed 9442.56 samples/sec   Loss 8.9476   LearningRate 0.0792   Epoch: 2   Global Step: 36800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:43,830-Speed 9787.59 samples/sec   Loss 9.1052   LearningRate 0.0792   Epoch: 2   Global Step: 36810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:44,892-Speed 9651.84 samples/sec   Loss 9.0299   LearningRate 0.0792   Epoch: 2   Global Step: 36820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:45,910-Speed 10057.78 samples/sec   Loss 8.9701   LearningRate 0.0792   Epoch: 2   Global Step: 36830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:46,950-Speed 9858.24 samples/sec   Loss 9.0443   LearningRate 0.0791   Epoch: 2   Global Step: 36840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:47,987-Speed 9879.12 samples/sec   Loss 8.9434   LearningRate 0.0791   Epoch: 2   Global Step: 36850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:49,032-Speed 9799.97 samples/sec   Loss 9.0296   LearningRate 0.0791   Epoch: 2   Global Step: 36860   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:01:50,088-Speed 9706.80 samples/sec   Loss 9.0074   LearningRate 0.0791   Epoch: 2   Global Step: 36870   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:01:51,146-Speed 9689.30 samples/sec   Loss 9.1142   LearningRate 0.0791   Epoch: 2   Global Step: 36880   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:01:52,223-Speed 9514.48 samples/sec   Loss 9.0844   LearningRate 0.0791   Epoch: 2   Global Step: 36890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:53,275-Speed 9734.00 samples/sec   Loss 9.0061   LearningRate 0.0791   Epoch: 2   Global Step: 36900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:54,348-Speed 9554.13 samples/sec   Loss 9.0077   LearningRate 0.0791   Epoch: 2   Global Step: 36910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:55,374-Speed 9979.93 samples/sec   Loss 9.1047   LearningRate 0.0791   Epoch: 2   Global Step: 36920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:56,452-Speed 9505.76 samples/sec   Loss 8.9723   LearningRate 0.0791   Epoch: 2   Global Step: 36930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:57,519-Speed 9604.70 samples/sec   Loss 9.0433   LearningRate 0.0791   Epoch: 2   Global Step: 36940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:58,586-Speed 9604.37 samples/sec   Loss 9.0893   LearningRate 0.0791   Epoch: 2   Global Step: 36950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:01:59,646-Speed 9666.39 samples/sec   Loss 9.0271   LearningRate 0.0791   Epoch: 2   Global Step: 36960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:00,677-Speed 9932.89 samples/sec   Loss 8.8995   LearningRate 0.0791   Epoch: 2   Global Step: 36970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:01,767-Speed 9399.90 samples/sec   Loss 8.9446   LearningRate 0.0791   Epoch: 2   Global Step: 36980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:02,841-Speed 9553.01 samples/sec   Loss 9.0390   LearningRate 0.0791   Epoch: 2   Global Step: 36990   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:02:03,916-Speed 9527.07 samples/sec   Loss 8.9755   LearningRate 0.0791   Epoch: 2   Global Step: 37000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:02:04,975-Speed 9681.85 samples/sec   Loss 8.9523   LearningRate 0.0791   Epoch: 2   Global Step: 37010   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:02:06,054-Speed 9491.83 samples/sec   Loss 9.0587   LearningRate 0.0791   Epoch: 2   Global Step: 37020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:07,119-Speed 9626.04 samples/sec   Loss 9.1389   LearningRate 0.0790   Epoch: 2   Global Step: 37030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:08,194-Speed 9526.74 samples/sec   Loss 9.0202   LearningRate 0.0790   Epoch: 2   Global Step: 37040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:09,257-Speed 9641.92 samples/sec   Loss 8.9856   LearningRate 0.0790   Epoch: 2   Global Step: 37050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:10,410-Speed 8884.77 samples/sec   Loss 9.0157   LearningRate 0.0790   Epoch: 2   Global Step: 37060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:11,497-Speed 9430.27 samples/sec   Loss 9.0765   LearningRate 0.0790   Epoch: 2   Global Step: 37070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:12,576-Speed 9494.02 samples/sec   Loss 9.0667   LearningRate 0.0790   Epoch: 2   Global Step: 37080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:13,649-Speed 9556.64 samples/sec   Loss 9.0701   LearningRate 0.0790   Epoch: 2   Global Step: 37090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:14,702-Speed 9725.21 samples/sec   Loss 8.9451   LearningRate 0.0790   Epoch: 2   Global Step: 37100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:15,738-Speed 9890.46 samples/sec   Loss 8.9523   LearningRate 0.0790   Epoch: 2   Global Step: 37110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:16,783-Speed 9803.64 samples/sec   Loss 9.0650   LearningRate 0.0790   Epoch: 2   Global Step: 37120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:17,883-Speed 9314.03 samples/sec   Loss 8.9989   LearningRate 0.0790   Epoch: 2   Global Step: 37130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:18,991-Speed 9246.55 samples/sec   Loss 9.0423   LearningRate 0.0790   Epoch: 2   Global Step: 37140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:20,081-Speed 9397.57 samples/sec   Loss 8.8333   LearningRate 0.0790   Epoch: 2   Global Step: 37150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:21,195-Speed 9201.40 samples/sec   Loss 9.0161   LearningRate 0.0790   Epoch: 2   Global Step: 37160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:22,268-Speed 9560.72 samples/sec   Loss 9.1452   LearningRate 0.0790   Epoch: 2   Global Step: 37170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:23,337-Speed 9585.96 samples/sec   Loss 9.1845   LearningRate 0.0790   Epoch: 2   Global Step: 37180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:24,443-Speed 9264.37 samples/sec   Loss 9.0347   LearningRate 0.0790   Epoch: 2   Global Step: 37190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:25,533-Speed 9407.99 samples/sec   Loss 9.0508   LearningRate 0.0790   Epoch: 2   Global Step: 37200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:26,643-Speed 9230.80 samples/sec   Loss 9.0164   LearningRate 0.0789   Epoch: 2   Global Step: 37210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:27,695-Speed 9735.36 samples/sec   Loss 9.0288   LearningRate 0.0789   Epoch: 2   Global Step: 37220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:28,776-Speed 9484.15 samples/sec   Loss 8.9979   LearningRate 0.0789   Epoch: 2   Global Step: 37230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:29,871-Speed 9356.18 samples/sec   Loss 9.1038   LearningRate 0.0789   Epoch: 2   Global Step: 37240   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:02:30,948-Speed 9511.58 samples/sec   Loss 9.0282   LearningRate 0.0789   Epoch: 2   Global Step: 37250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:32,019-Speed 9562.00 samples/sec   Loss 9.1071   LearningRate 0.0789   Epoch: 2   Global Step: 37260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:33,091-Speed 9557.07 samples/sec   Loss 8.9428   LearningRate 0.0789   Epoch: 2   Global Step: 37270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:34,139-Speed 9779.20 samples/sec   Loss 9.0356   LearningRate 0.0789   Epoch: 2   Global Step: 37280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:35,220-Speed 9476.74 samples/sec   Loss 9.0561   LearningRate 0.0789   Epoch: 2   Global Step: 37290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:36,269-Speed 9771.93 samples/sec   Loss 8.9578   LearningRate 0.0789   Epoch: 2   Global Step: 37300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:37,340-Speed 9564.87 samples/sec   Loss 8.9849   LearningRate 0.0789   Epoch: 2   Global Step: 37310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:38,415-Speed 9531.86 samples/sec   Loss 8.9693   LearningRate 0.0789   Epoch: 2   Global Step: 37320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:39,521-Speed 9266.74 samples/sec   Loss 9.0403   LearningRate 0.0789   Epoch: 2   Global Step: 37330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:40,626-Speed 9272.92 samples/sec   Loss 8.8455   LearningRate 0.0789   Epoch: 2   Global Step: 37340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:02:41,725-Speed 9329.56 samples/sec   Loss 9.0450   LearningRate 0.0789   Epoch: 2   Global Step: 37350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:42,811-Speed 9432.91 samples/sec   Loss 9.0942   LearningRate 0.0789   Epoch: 2   Global Step: 37360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:43,875-Speed 9625.81 samples/sec   Loss 8.9358   LearningRate 0.0789   Epoch: 2   Global Step: 37370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:44,959-Speed 9451.45 samples/sec   Loss 9.0701   LearningRate 0.0789   Epoch: 2   Global Step: 37380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:46,040-Speed 9480.44 samples/sec   Loss 9.0275   LearningRate 0.0789   Epoch: 2   Global Step: 37390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:47,070-Speed 9950.07 samples/sec   Loss 9.0804   LearningRate 0.0788   Epoch: 2   Global Step: 37400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:48,142-Speed 9550.94 samples/sec   Loss 9.1316   LearningRate 0.0788   Epoch: 2   Global Step: 37410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:49,197-Speed 9714.23 samples/sec   Loss 8.9944   LearningRate 0.0788   Epoch: 2   Global Step: 37420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:50,226-Speed 9959.97 samples/sec   Loss 9.0244   LearningRate 0.0788   Epoch: 2   Global Step: 37430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:51,318-Speed 9386.20 samples/sec   Loss 8.9669   LearningRate 0.0788   Epoch: 2   Global Step: 37440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:52,411-Speed 9372.99 samples/sec   Loss 9.0671   LearningRate 0.0788   Epoch: 2   Global Step: 37450   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:02:53,525-Speed 9198.43 samples/sec   Loss 9.1388   LearningRate 0.0788   Epoch: 2   Global Step: 37460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:54,627-Speed 9292.64 samples/sec   Loss 9.0757   LearningRate 0.0788   Epoch: 2   Global Step: 37470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:55,706-Speed 9497.23 samples/sec   Loss 9.1355   LearningRate 0.0788   Epoch: 2   Global Step: 37480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:56,777-Speed 9569.49 samples/sec   Loss 9.0546   LearningRate 0.0788   Epoch: 2   Global Step: 37490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:57,845-Speed 9592.96 samples/sec   Loss 9.0698   LearningRate 0.0788   Epoch: 2   Global Step: 37500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:58,918-Speed 9553.18 samples/sec   Loss 9.0766   LearningRate 0.0788   Epoch: 2   Global Step: 37510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:02:59,997-Speed 9494.40 samples/sec   Loss 9.0122   LearningRate 0.0788   Epoch: 2   Global Step: 37520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:01,094-Speed 9346.46 samples/sec   Loss 8.9494   LearningRate 0.0788   Epoch: 2   Global Step: 37530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:02,131-Speed 9872.58 samples/sec   Loss 8.9992   LearningRate 0.0788   Epoch: 2   Global Step: 37540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:03,194-Speed 9642.05 samples/sec   Loss 9.0597   LearningRate 0.0788   Epoch: 2   Global Step: 37550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:04,294-Speed 9313.23 samples/sec   Loss 9.0213   LearningRate 0.0788   Epoch: 2   Global Step: 37560   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:03:05,322-Speed 9964.83 samples/sec   Loss 9.0749   LearningRate 0.0788   Epoch: 2   Global Step: 37570   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:03:06,399-Speed 9514.38 samples/sec   Loss 9.1401   LearningRate 0.0788   Epoch: 2   Global Step: 37580   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:03:07,441-Speed 9837.21 samples/sec   Loss 9.0169   LearningRate 0.0787   Epoch: 2   Global Step: 37590   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:03:08,479-Speed 9868.54 samples/sec   Loss 8.8961   LearningRate 0.0787   Epoch: 2   Global Step: 37600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:09,520-Speed 9839.89 samples/sec   Loss 9.0085   LearningRate 0.0787   Epoch: 2   Global Step: 37610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:10,597-Speed 9522.81 samples/sec   Loss 8.9871   LearningRate 0.0787   Epoch: 2   Global Step: 37620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:11,700-Speed 9282.92 samples/sec   Loss 9.0576   LearningRate 0.0787   Epoch: 2   Global Step: 37630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:12,774-Speed 9538.28 samples/sec   Loss 8.9760   LearningRate 0.0787   Epoch: 2   Global Step: 37640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:13,840-Speed 9616.17 samples/sec   Loss 8.8450   LearningRate 0.0787   Epoch: 2   Global Step: 37650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:14,931-Speed 9388.28 samples/sec   Loss 9.0729   LearningRate 0.0787   Epoch: 2   Global Step: 37660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:16,017-Speed 9433.38 samples/sec   Loss 9.0746   LearningRate 0.0787   Epoch: 2   Global Step: 37670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:17,066-Speed 9765.46 samples/sec   Loss 8.9919   LearningRate 0.0787   Epoch: 2   Global Step: 37680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:18,155-Speed 9416.80 samples/sec   Loss 8.8136   LearningRate 0.0787   Epoch: 2   Global Step: 37690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:19,199-Speed 9812.94 samples/sec   Loss 9.0042   LearningRate 0.0787   Epoch: 2   Global Step: 37700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:20,263-Speed 9625.30 samples/sec   Loss 9.1071   LearningRate 0.0787   Epoch: 2   Global Step: 37710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:21,360-Speed 9343.82 samples/sec   Loss 9.0399   LearningRate 0.0787   Epoch: 2   Global Step: 37720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:22,444-Speed 9455.83 samples/sec   Loss 9.0127   LearningRate 0.0787   Epoch: 2   Global Step: 37730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:23,541-Speed 9340.83 samples/sec   Loss 8.9573   LearningRate 0.0787   Epoch: 2   Global Step: 37740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:24,632-Speed 9392.56 samples/sec   Loss 8.9842   LearningRate 0.0787   Epoch: 2   Global Step: 37750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:25,702-Speed 9572.78 samples/sec   Loss 9.0717   LearningRate 0.0787   Epoch: 2   Global Step: 37760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:26,756-Speed 9719.88 samples/sec   Loss 8.9239   LearningRate 0.0787   Epoch: 2   Global Step: 37770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:27,810-Speed 9725.59 samples/sec   Loss 9.0303   LearningRate 0.0786   Epoch: 2   Global Step: 37780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:28,932-Speed 9130.65 samples/sec   Loss 9.0989   LearningRate 0.0786   Epoch: 2   Global Step: 37790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:30,011-Speed 9492.73 samples/sec   Loss 9.1439   LearningRate 0.0786   Epoch: 2   Global Step: 37800   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:03:31,043-Speed 9932.33 samples/sec   Loss 8.9891   LearningRate 0.0786   Epoch: 2   Global Step: 37810   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:03:32,115-Speed 9556.67 samples/sec   Loss 8.9333   LearningRate 0.0786   Epoch: 2   Global Step: 37820   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:03:33,194-Speed 9496.78 samples/sec   Loss 8.9646   LearningRate 0.0786   Epoch: 2   Global Step: 37830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:34,288-Speed 9365.39 samples/sec   Loss 8.8659   LearningRate 0.0786   Epoch: 2   Global Step: 37840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:35,387-Speed 9320.04 samples/sec   Loss 8.9612   LearningRate 0.0786   Epoch: 2   Global Step: 37850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:36,507-Speed 9154.77 samples/sec   Loss 9.0698   LearningRate 0.0786   Epoch: 2   Global Step: 37860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:37,582-Speed 9527.97 samples/sec   Loss 8.9910   LearningRate 0.0786   Epoch: 2   Global Step: 37870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:38,676-Speed 9365.21 samples/sec   Loss 9.0615   LearningRate 0.0786   Epoch: 2   Global Step: 37880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:39,751-Speed 9533.03 samples/sec   Loss 9.0274   LearningRate 0.0786   Epoch: 2   Global Step: 37890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:40,790-Speed 9868.05 samples/sec   Loss 9.0998   LearningRate 0.0786   Epoch: 2   Global Step: 37900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:41,900-Speed 9223.33 samples/sec   Loss 9.0930   LearningRate 0.0786   Epoch: 2   Global Step: 37910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:42,957-Speed 9693.23 samples/sec   Loss 8.9242   LearningRate 0.0786   Epoch: 2   Global Step: 37920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:03:44,008-Speed 9754.53 samples/sec   Loss 8.9647   LearningRate 0.0786   Epoch: 2   Global Step: 37930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:45,093-Speed 9442.34 samples/sec   Loss 9.0396   LearningRate 0.0786   Epoch: 2   Global Step: 37940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:46,185-Speed 9387.87 samples/sec   Loss 8.9657   LearningRate 0.0786   Epoch: 2   Global Step: 37950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:47,253-Speed 9590.78 samples/sec   Loss 9.0547   LearningRate 0.0786   Epoch: 2   Global Step: 37960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:48,325-Speed 9557.80 samples/sec   Loss 9.0636   LearningRate 0.0785   Epoch: 2   Global Step: 37970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:49,383-Speed 9683.73 samples/sec   Loss 8.9812   LearningRate 0.0785   Epoch: 2   Global Step: 37980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:50,468-Speed 9448.88 samples/sec   Loss 8.9998   LearningRate 0.0785   Epoch: 2   Global Step: 37990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:03:51,496-Speed 9967.16 samples/sec   Loss 8.8745   LearningRate 0.0785   Epoch: 2   Global Step: 38000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:04:13,481-[lfw][38000]XNorm: 13.333310
Training: 2022-04-11 13:04:13,482-[lfw][38000]Accuracy-Flip: 0.99533+-0.00256
Training: 2022-04-11 13:04:13,482-[lfw][38000]Accuracy-Highest: 0.99533
Training: 2022-04-11 13:04:38,859-[cfp_fp][38000]XNorm: 11.191384
Training: 2022-04-11 13:04:38,860-[cfp_fp][38000]Accuracy-Flip: 0.93886+-0.01285
Training: 2022-04-11 13:04:38,860-[cfp_fp][38000]Accuracy-Highest: 0.93986
Training: 2022-04-11 13:05:00,667-[agedb_30][38000]XNorm: 12.882927
Training: 2022-04-11 13:05:00,668-[agedb_30][38000]Accuracy-Flip: 0.95333+-0.01140
Training: 2022-04-11 13:05:00,669-[agedb_30][38000]Accuracy-Highest: 0.95333
Training: 2022-04-11 13:05:01,754-Speed 145.75 samples/sec   Loss 8.9804   LearningRate 0.0785   Epoch: 2   Global Step: 38010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:02,806-Speed 9735.86 samples/sec   Loss 8.9816   LearningRate 0.0785   Epoch: 2   Global Step: 38020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:03,852-Speed 9800.46 samples/sec   Loss 8.9943   LearningRate 0.0785   Epoch: 2   Global Step: 38030   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:04,884-Speed 9922.68 samples/sec   Loss 9.0138   LearningRate 0.0785   Epoch: 2   Global Step: 38040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:05,926-Speed 9836.59 samples/sec   Loss 8.9417   LearningRate 0.0785   Epoch: 2   Global Step: 38050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:07,002-Speed 9520.15 samples/sec   Loss 9.0427   LearningRate 0.0785   Epoch: 2   Global Step: 38060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:08,079-Speed 9516.31 samples/sec   Loss 9.0201   LearningRate 0.0785   Epoch: 2   Global Step: 38070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:09,183-Speed 9282.52 samples/sec   Loss 9.0218   LearningRate 0.0785   Epoch: 2   Global Step: 38080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:10,231-Speed 9774.01 samples/sec   Loss 8.9910   LearningRate 0.0785   Epoch: 2   Global Step: 38090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:11,289-Speed 9683.28 samples/sec   Loss 9.1933   LearningRate 0.0785   Epoch: 2   Global Step: 38100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:12,336-Speed 9794.15 samples/sec   Loss 9.0043   LearningRate 0.0785   Epoch: 2   Global Step: 38110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:13,362-Speed 9981.23 samples/sec   Loss 9.0363   LearningRate 0.0785   Epoch: 2   Global Step: 38120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:14,457-Speed 9356.36 samples/sec   Loss 9.1038   LearningRate 0.0785   Epoch: 2   Global Step: 38130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:15,533-Speed 9522.57 samples/sec   Loss 8.8659   LearningRate 0.0785   Epoch: 2   Global Step: 38140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:16,618-Speed 9443.31 samples/sec   Loss 8.8252   LearningRate 0.0784   Epoch: 2   Global Step: 38150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:17,705-Speed 9421.59 samples/sec   Loss 9.1835   LearningRate 0.0784   Epoch: 2   Global Step: 38160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:18,726-Speed 10049.74 samples/sec   Loss 9.1246   LearningRate 0.0784   Epoch: 2   Global Step: 38170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:19,803-Speed 9511.18 samples/sec   Loss 9.1013   LearningRate 0.0784   Epoch: 2   Global Step: 38180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:20,929-Speed 9106.07 samples/sec   Loss 9.0300   LearningRate 0.0784   Epoch: 2   Global Step: 38190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:21,971-Speed 9826.20 samples/sec   Loss 8.9327   LearningRate 0.0784   Epoch: 2   Global Step: 38200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:23,035-Speed 9636.24 samples/sec   Loss 8.9470   LearningRate 0.0784   Epoch: 2   Global Step: 38210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:24,120-Speed 9442.43 samples/sec   Loss 8.9805   LearningRate 0.0784   Epoch: 2   Global Step: 38220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:25,197-Speed 9514.71 samples/sec   Loss 8.9078   LearningRate 0.0784   Epoch: 2   Global Step: 38230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:26,230-Speed 9917.19 samples/sec   Loss 9.0737   LearningRate 0.0784   Epoch: 2   Global Step: 38240   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:27,289-Speed 9678.38 samples/sec   Loss 9.0451   LearningRate 0.0784   Epoch: 2   Global Step: 38250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:28,390-Speed 9311.68 samples/sec   Loss 9.0789   LearningRate 0.0784   Epoch: 2   Global Step: 38260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:29,496-Speed 9258.65 samples/sec   Loss 8.9823   LearningRate 0.0784   Epoch: 2   Global Step: 38270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:30,549-Speed 9728.22 samples/sec   Loss 9.0247   LearningRate 0.0784   Epoch: 2   Global Step: 38280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:31,590-Speed 9846.66 samples/sec   Loss 9.0113   LearningRate 0.0784   Epoch: 2   Global Step: 38290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:32,690-Speed 9309.44 samples/sec   Loss 8.9419   LearningRate 0.0784   Epoch: 2   Global Step: 38300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:33,733-Speed 9830.43 samples/sec   Loss 9.1244   LearningRate 0.0784   Epoch: 2   Global Step: 38310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:34,828-Speed 9351.01 samples/sec   Loss 9.0540   LearningRate 0.0784   Epoch: 2   Global Step: 38320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:35,888-Speed 9669.14 samples/sec   Loss 9.0714   LearningRate 0.0784   Epoch: 2   Global Step: 38330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:36,938-Speed 9754.82 samples/sec   Loss 8.9675   LearningRate 0.0783   Epoch: 2   Global Step: 38340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:38,038-Speed 9313.36 samples/sec   Loss 9.0125   LearningRate 0.0783   Epoch: 2   Global Step: 38350   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:39,090-Speed 9743.33 samples/sec   Loss 9.0745   LearningRate 0.0783   Epoch: 2   Global Step: 38360   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:40,168-Speed 9508.70 samples/sec   Loss 8.8973   LearningRate 0.0783   Epoch: 2   Global Step: 38370   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:41,288-Speed 9145.66 samples/sec   Loss 9.1078   LearningRate 0.0783   Epoch: 2   Global Step: 38380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:42,390-Speed 9301.19 samples/sec   Loss 9.1042   LearningRate 0.0783   Epoch: 2   Global Step: 38390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:43,449-Speed 9671.03 samples/sec   Loss 9.0682   LearningRate 0.0783   Epoch: 2   Global Step: 38400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:44,508-Speed 9674.85 samples/sec   Loss 8.9549   LearningRate 0.0783   Epoch: 2   Global Step: 38410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:45,575-Speed 9607.62 samples/sec   Loss 9.0231   LearningRate 0.0783   Epoch: 2   Global Step: 38420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:46,649-Speed 9538.45 samples/sec   Loss 8.9901   LearningRate 0.0783   Epoch: 2   Global Step: 38430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:47,750-Speed 9306.62 samples/sec   Loss 9.0904   LearningRate 0.0783   Epoch: 2   Global Step: 38440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:48,816-Speed 9610.91 samples/sec   Loss 9.1360   LearningRate 0.0783   Epoch: 2   Global Step: 38450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:49,912-Speed 9347.16 samples/sec   Loss 9.1725   LearningRate 0.0783   Epoch: 2   Global Step: 38460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:50,975-Speed 9642.25 samples/sec   Loss 8.9715   LearningRate 0.0783   Epoch: 2   Global Step: 38470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:05:52,053-Speed 9507.79 samples/sec   Loss 9.0778   LearningRate 0.0783   Epoch: 2   Global Step: 38480   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:53,119-Speed 9608.53 samples/sec   Loss 8.9594   LearningRate 0.0783   Epoch: 2   Global Step: 38490   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:54,185-Speed 9609.83 samples/sec   Loss 8.9563   LearningRate 0.0783   Epoch: 2   Global Step: 38500   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:55,210-Speed 9994.23 samples/sec   Loss 8.9442   LearningRate 0.0783   Epoch: 2   Global Step: 38510   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:56,277-Speed 9608.12 samples/sec   Loss 9.0028   LearningRate 0.0783   Epoch: 2   Global Step: 38520   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:57,365-Speed 9423.01 samples/sec   Loss 9.0192   LearningRate 0.0782   Epoch: 2   Global Step: 38530   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:58,464-Speed 9319.16 samples/sec   Loss 9.1128   LearningRate 0.0782   Epoch: 2   Global Step: 38540   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:05:59,575-Speed 9220.73 samples/sec   Loss 8.9655   LearningRate 0.0782   Epoch: 2   Global Step: 38550   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:06:00,646-Speed 9573.01 samples/sec   Loss 9.0797   LearningRate 0.0782   Epoch: 2   Global Step: 38560   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:06:01,703-Speed 9691.41 samples/sec   Loss 8.9637   LearningRate 0.0782   Epoch: 2   Global Step: 38570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:02,788-Speed 9445.67 samples/sec   Loss 8.9535   LearningRate 0.0782   Epoch: 2   Global Step: 38580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:03,877-Speed 9407.43 samples/sec   Loss 8.9365   LearningRate 0.0782   Epoch: 2   Global Step: 38590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:04,975-Speed 9337.86 samples/sec   Loss 9.0495   LearningRate 0.0782   Epoch: 2   Global Step: 38600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:06,069-Speed 9365.76 samples/sec   Loss 8.8749   LearningRate 0.0782   Epoch: 2   Global Step: 38610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:07,146-Speed 9510.06 samples/sec   Loss 9.0596   LearningRate 0.0782   Epoch: 2   Global Step: 38620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:08,189-Speed 9822.76 samples/sec   Loss 8.9461   LearningRate 0.0782   Epoch: 2   Global Step: 38630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:09,272-Speed 9467.67 samples/sec   Loss 9.0530   LearningRate 0.0782   Epoch: 2   Global Step: 38640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:10,308-Speed 9889.60 samples/sec   Loss 9.0207   LearningRate 0.0782   Epoch: 2   Global Step: 38650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:11,333-Speed 9998.50 samples/sec   Loss 9.0922   LearningRate 0.0782   Epoch: 2   Global Step: 38660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:12,391-Speed 9681.47 samples/sec   Loss 9.0205   LearningRate 0.0782   Epoch: 2   Global Step: 38670   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:06:13,426-Speed 9894.67 samples/sec   Loss 9.0699   LearningRate 0.0782   Epoch: 2   Global Step: 38680   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:06:14,453-Speed 9983.05 samples/sec   Loss 9.1481   LearningRate 0.0782   Epoch: 2   Global Step: 38690   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:06:15,529-Speed 9516.60 samples/sec   Loss 9.0408   LearningRate 0.0782   Epoch: 2   Global Step: 38700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:16,627-Speed 9337.15 samples/sec   Loss 9.1100   LearningRate 0.0782   Epoch: 2   Global Step: 38710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:17,738-Speed 9217.76 samples/sec   Loss 9.0947   LearningRate 0.0781   Epoch: 2   Global Step: 38720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:18,877-Speed 8997.11 samples/sec   Loss 9.0377   LearningRate 0.0781   Epoch: 2   Global Step: 38730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:19,959-Speed 9465.71 samples/sec   Loss 8.9832   LearningRate 0.0781   Epoch: 2   Global Step: 38740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:21,052-Speed 9373.38 samples/sec   Loss 9.0960   LearningRate 0.0781   Epoch: 2   Global Step: 38750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:22,149-Speed 9348.24 samples/sec   Loss 9.0277   LearningRate 0.0781   Epoch: 2   Global Step: 38760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:23,201-Speed 9742.97 samples/sec   Loss 9.0165   LearningRate 0.0781   Epoch: 2   Global Step: 38770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:24,266-Speed 9613.47 samples/sec   Loss 9.0477   LearningRate 0.0781   Epoch: 2   Global Step: 38780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:25,380-Speed 9199.94 samples/sec   Loss 8.9384   LearningRate 0.0781   Epoch: 2   Global Step: 38790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:26,477-Speed 9337.49 samples/sec   Loss 9.0152   LearningRate 0.0781   Epoch: 2   Global Step: 38800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:27,545-Speed 9597.08 samples/sec   Loss 9.0148   LearningRate 0.0781   Epoch: 2   Global Step: 38810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:28,610-Speed 9619.74 samples/sec   Loss 9.0217   LearningRate 0.0781   Epoch: 2   Global Step: 38820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:29,656-Speed 9797.38 samples/sec   Loss 9.0388   LearningRate 0.0781   Epoch: 2   Global Step: 38830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:30,736-Speed 9490.15 samples/sec   Loss 8.8573   LearningRate 0.0781   Epoch: 2   Global Step: 38840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:31,822-Speed 9431.17 samples/sec   Loss 9.0855   LearningRate 0.0781   Epoch: 2   Global Step: 38850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:32,938-Speed 9187.70 samples/sec   Loss 8.8926   LearningRate 0.0781   Epoch: 2   Global Step: 38860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:34,009-Speed 9563.31 samples/sec   Loss 8.9891   LearningRate 0.0781   Epoch: 2   Global Step: 38870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:35,086-Speed 9511.60 samples/sec   Loss 9.0444   LearningRate 0.0781   Epoch: 2   Global Step: 38880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:36,155-Speed 9586.25 samples/sec   Loss 9.0013   LearningRate 0.0781   Epoch: 2   Global Step: 38890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:37,221-Speed 9608.91 samples/sec   Loss 8.8537   LearningRate 0.0781   Epoch: 2   Global Step: 38900   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:06:38,294-Speed 9549.65 samples/sec   Loss 8.8822   LearningRate 0.0780   Epoch: 2   Global Step: 38910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:39,336-Speed 9836.16 samples/sec   Loss 8.9245   LearningRate 0.0780   Epoch: 2   Global Step: 38920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:40,402-Speed 9615.80 samples/sec   Loss 8.9691   LearningRate 0.0780   Epoch: 2   Global Step: 38930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:41,509-Speed 9257.77 samples/sec   Loss 8.9465   LearningRate 0.0780   Epoch: 2   Global Step: 38940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:42,605-Speed 9350.10 samples/sec   Loss 9.0324   LearningRate 0.0780   Epoch: 2   Global Step: 38950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:43,683-Speed 9506.30 samples/sec   Loss 8.9701   LearningRate 0.0780   Epoch: 2   Global Step: 38960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:44,714-Speed 9937.61 samples/sec   Loss 8.9488   LearningRate 0.0780   Epoch: 2   Global Step: 38970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:45,775-Speed 9658.04 samples/sec   Loss 9.0468   LearningRate 0.0780   Epoch: 2   Global Step: 38980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:46,851-Speed 9520.99 samples/sec   Loss 9.0414   LearningRate 0.0780   Epoch: 2   Global Step: 38990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:47,915-Speed 9624.48 samples/sec   Loss 9.0539   LearningRate 0.0780   Epoch: 2   Global Step: 39000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:48,968-Speed 9730.67 samples/sec   Loss 8.9250   LearningRate 0.0780   Epoch: 2   Global Step: 39010   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:06:50,034-Speed 9618.43 samples/sec   Loss 9.0693   LearningRate 0.0780   Epoch: 2   Global Step: 39020   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:06:51,129-Speed 9359.76 samples/sec   Loss 9.0676   LearningRate 0.0780   Epoch: 2   Global Step: 39030   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:06:52,206-Speed 9507.51 samples/sec   Loss 9.0796   LearningRate 0.0780   Epoch: 2   Global Step: 39040   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:06:53,342-Speed 9027.02 samples/sec   Loss 9.0446   LearningRate 0.0780   Epoch: 2   Global Step: 39050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:06:54,434-Speed 9385.06 samples/sec   Loss 9.0727   LearningRate 0.0780   Epoch: 2   Global Step: 39060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:06:55,507-Speed 9551.02 samples/sec   Loss 9.0596   LearningRate 0.0780   Epoch: 2   Global Step: 39070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:06:56,614-Speed 9257.31 samples/sec   Loss 9.0073   LearningRate 0.0780   Epoch: 2   Global Step: 39080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:06:57,723-Speed 9235.77 samples/sec   Loss 8.9504   LearningRate 0.0780   Epoch: 2   Global Step: 39090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:06:58,800-Speed 9513.75 samples/sec   Loss 9.0704   LearningRate 0.0779   Epoch: 2   Global Step: 39100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:06:59,863-Speed 9640.40 samples/sec   Loss 9.1015   LearningRate 0.0779   Epoch: 2   Global Step: 39110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:07:00,939-Speed 9529.09 samples/sec   Loss 8.9928   LearningRate 0.0779   Epoch: 2   Global Step: 39120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:07:02,002-Speed 9638.19 samples/sec   Loss 8.9522   LearningRate 0.0779   Epoch: 2   Global Step: 39130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:07:03,056-Speed 9719.37 samples/sec   Loss 8.9065   LearningRate 0.0779   Epoch: 2   Global Step: 39140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:07:04,125-Speed 9581.29 samples/sec   Loss 8.8552   LearningRate 0.0779   Epoch: 2   Global Step: 39150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:07:05,194-Speed 9585.18 samples/sec   Loss 9.0344   LearningRate 0.0779   Epoch: 2   Global Step: 39160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:06,299-Speed 9270.80 samples/sec   Loss 9.0266   LearningRate 0.0779   Epoch: 2   Global Step: 39170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:07,349-Speed 9759.12 samples/sec   Loss 8.9799   LearningRate 0.0779   Epoch: 2   Global Step: 39180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:08,432-Speed 9460.80 samples/sec   Loss 9.0874   LearningRate 0.0779   Epoch: 2   Global Step: 39190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:09,491-Speed 9674.12 samples/sec   Loss 8.9767   LearningRate 0.0779   Epoch: 2   Global Step: 39200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:10,610-Speed 9161.02 samples/sec   Loss 8.9860   LearningRate 0.0779   Epoch: 2   Global Step: 39210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:11,708-Speed 9332.56 samples/sec   Loss 9.0072   LearningRate 0.0779   Epoch: 2   Global Step: 39220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:12,756-Speed 9771.43 samples/sec   Loss 9.0098   LearningRate 0.0779   Epoch: 2   Global Step: 39230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:13,851-Speed 9359.77 samples/sec   Loss 9.0797   LearningRate 0.0779   Epoch: 2   Global Step: 39240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:14,949-Speed 9328.53 samples/sec   Loss 9.0399   LearningRate 0.0779   Epoch: 2   Global Step: 39250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:16,009-Speed 9664.59 samples/sec   Loss 9.0129   LearningRate 0.0779   Epoch: 2   Global Step: 39260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:17,085-Speed 9523.69 samples/sec   Loss 8.9659   LearningRate 0.0779   Epoch: 2   Global Step: 39270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:18,202-Speed 9178.16 samples/sec   Loss 9.0637   LearningRate 0.0779   Epoch: 2   Global Step: 39280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:19,265-Speed 9639.24 samples/sec   Loss 9.0412   LearningRate 0.0778   Epoch: 2   Global Step: 39290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:20,348-Speed 9459.78 samples/sec   Loss 9.0261   LearningRate 0.0778   Epoch: 2   Global Step: 39300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:21,408-Speed 9673.86 samples/sec   Loss 8.8913   LearningRate 0.0778   Epoch: 2   Global Step: 39310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:22,490-Speed 9467.26 samples/sec   Loss 8.9986   LearningRate 0.0778   Epoch: 2   Global Step: 39320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:23,565-Speed 9529.73 samples/sec   Loss 9.0947   LearningRate 0.0778   Epoch: 2   Global Step: 39330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:24,650-Speed 9444.26 samples/sec   Loss 9.0652   LearningRate 0.0778   Epoch: 2   Global Step: 39340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:25,686-Speed 9892.44 samples/sec   Loss 9.0534   LearningRate 0.0778   Epoch: 2   Global Step: 39350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:26,779-Speed 9375.48 samples/sec   Loss 8.9911   LearningRate 0.0778   Epoch: 2   Global Step: 39360   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:07:27,869-Speed 9399.17 samples/sec   Loss 8.9213   LearningRate 0.0778   Epoch: 2   Global Step: 39370   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:07:28,930-Speed 9656.82 samples/sec   Loss 8.9801   LearningRate 0.0778   Epoch: 2   Global Step: 39380   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:07:29,999-Speed 9586.91 samples/sec   Loss 8.9421   LearningRate 0.0778   Epoch: 2   Global Step: 39390   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:07:31,146-Speed 8934.54 samples/sec   Loss 9.0456   LearningRate 0.0778   Epoch: 2   Global Step: 39400   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:07:32,210-Speed 9633.91 samples/sec   Loss 8.9966   LearningRate 0.0778   Epoch: 2   Global Step: 39410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:33,297-Speed 9425.94 samples/sec   Loss 9.0230   LearningRate 0.0778   Epoch: 2   Global Step: 39420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:34,390-Speed 9368.24 samples/sec   Loss 9.0225   LearningRate 0.0778   Epoch: 2   Global Step: 39430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:35,476-Speed 9439.99 samples/sec   Loss 9.0912   LearningRate 0.0778   Epoch: 2   Global Step: 39440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:36,573-Speed 9345.29 samples/sec   Loss 9.0754   LearningRate 0.0778   Epoch: 2   Global Step: 39450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:37,634-Speed 9654.26 samples/sec   Loss 9.0054   LearningRate 0.0778   Epoch: 2   Global Step: 39460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:38,745-Speed 9222.74 samples/sec   Loss 9.0311   LearningRate 0.0778   Epoch: 2   Global Step: 39470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:39,820-Speed 9531.52 samples/sec   Loss 8.9716   LearningRate 0.0777   Epoch: 2   Global Step: 39480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:40,908-Speed 9412.25 samples/sec   Loss 8.9523   LearningRate 0.0777   Epoch: 2   Global Step: 39490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:43,330-Speed 9615.85 samples/sec   Loss 8.9285   LearningRate 0.0777   Epoch: 2   Global Step: 39500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:45,903-Speed 9342.29 samples/sec   Loss 8.9795   LearningRate 0.0777   Epoch: 2   Global Step: 39510   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:07:47,030-Speed 9093.75 samples/sec   Loss 8.9484   LearningRate 0.0777   Epoch: 2   Global Step: 39520   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:07:48,083-Speed 9733.87 samples/sec   Loss 8.9716   LearningRate 0.0777   Epoch: 2   Global Step: 39530   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:07:49,166-Speed 9456.64 samples/sec   Loss 8.9302   LearningRate 0.0777   Epoch: 2   Global Step: 39540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:50,293-Speed 9092.86 samples/sec   Loss 9.0153   LearningRate 0.0777   Epoch: 2   Global Step: 39550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:51,368-Speed 9533.40 samples/sec   Loss 8.9689   LearningRate 0.0777   Epoch: 2   Global Step: 39560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:52,471-Speed 9288.41 samples/sec   Loss 8.9724   LearningRate 0.0777   Epoch: 2   Global Step: 39570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:53,544-Speed 9552.90 samples/sec   Loss 8.8756   LearningRate 0.0777   Epoch: 2   Global Step: 39580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:54,647-Speed 9288.06 samples/sec   Loss 8.9805   LearningRate 0.0777   Epoch: 2   Global Step: 39590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:55,756-Speed 9236.97 samples/sec   Loss 8.9996   LearningRate 0.0777   Epoch: 2   Global Step: 39600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:56,853-Speed 9347.10 samples/sec   Loss 8.8793   LearningRate 0.0777   Epoch: 2   Global Step: 39610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:57,930-Speed 9505.03 samples/sec   Loss 9.1432   LearningRate 0.0777   Epoch: 2   Global Step: 39620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:07:59,006-Speed 9526.51 samples/sec   Loss 9.0518   LearningRate 0.0777   Epoch: 2   Global Step: 39630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:00,072-Speed 9614.38 samples/sec   Loss 8.9290   LearningRate 0.0777   Epoch: 2   Global Step: 39640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:01,121-Speed 9764.77 samples/sec   Loss 8.9225   LearningRate 0.0777   Epoch: 2   Global Step: 39650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:02,184-Speed 9641.17 samples/sec   Loss 8.9615   LearningRate 0.0777   Epoch: 2   Global Step: 39660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:03,252-Speed 9598.57 samples/sec   Loss 8.9117   LearningRate 0.0776   Epoch: 2   Global Step: 39670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:04,328-Speed 9521.50 samples/sec   Loss 8.8826   LearningRate 0.0776   Epoch: 2   Global Step: 39680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:05,390-Speed 9650.58 samples/sec   Loss 8.9962   LearningRate 0.0776   Epoch: 2   Global Step: 39690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:06,467-Speed 9509.54 samples/sec   Loss 8.8959   LearningRate 0.0776   Epoch: 2   Global Step: 39700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:07,552-Speed 9443.59 samples/sec   Loss 9.0422   LearningRate 0.0776   Epoch: 2   Global Step: 39710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:08,620-Speed 9595.07 samples/sec   Loss 8.9598   LearningRate 0.0776   Epoch: 2   Global Step: 39720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:09,719-Speed 9320.28 samples/sec   Loss 8.9922   LearningRate 0.0776   Epoch: 2   Global Step: 39730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:10,788-Speed 9586.75 samples/sec   Loss 8.9530   LearningRate 0.0776   Epoch: 2   Global Step: 39740   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:08:11,862-Speed 9532.77 samples/sec   Loss 8.9990   LearningRate 0.0776   Epoch: 2   Global Step: 39750   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:08:12,922-Speed 9670.27 samples/sec   Loss 8.9049   LearningRate 0.0776   Epoch: 2   Global Step: 39760   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:08:14,008-Speed 9433.45 samples/sec   Loss 8.9300   LearningRate 0.0776   Epoch: 2   Global Step: 39770   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:08:15,101-Speed 9376.35 samples/sec   Loss 8.9809   LearningRate 0.0776   Epoch: 2   Global Step: 39780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:16,150-Speed 9778.57 samples/sec   Loss 8.8477   LearningRate 0.0776   Epoch: 2   Global Step: 39790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:17,181-Speed 9933.53 samples/sec   Loss 8.9855   LearningRate 0.0776   Epoch: 2   Global Step: 39800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:18,271-Speed 9404.56 samples/sec   Loss 8.9277   LearningRate 0.0776   Epoch: 2   Global Step: 39810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:19,336-Speed 9623.82 samples/sec   Loss 9.0801   LearningRate 0.0776   Epoch: 2   Global Step: 39820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:20,435-Speed 9319.87 samples/sec   Loss 8.9686   LearningRate 0.0776   Epoch: 2   Global Step: 39830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:21,552-Speed 9176.28 samples/sec   Loss 9.0234   LearningRate 0.0776   Epoch: 2   Global Step: 39840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:22,657-Speed 9270.56 samples/sec   Loss 8.9916   LearningRate 0.0775   Epoch: 2   Global Step: 39850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:23,713-Speed 9695.65 samples/sec   Loss 8.9879   LearningRate 0.0775   Epoch: 2   Global Step: 39860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:24,737-Speed 10010.89 samples/sec   Loss 9.0470   LearningRate 0.0775   Epoch: 2   Global Step: 39870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:25,781-Speed 9813.76 samples/sec   Loss 8.9809   LearningRate 0.0775   Epoch: 2   Global Step: 39880   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:08:26,820-Speed 9862.13 samples/sec   Loss 8.9463   LearningRate 0.0775   Epoch: 2   Global Step: 39890   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:08:27,858-Speed 9872.13 samples/sec   Loss 9.0554   LearningRate 0.0775   Epoch: 2   Global Step: 39900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:28,905-Speed 9783.06 samples/sec   Loss 9.0617   LearningRate 0.0775   Epoch: 2   Global Step: 39910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:30,010-Speed 9276.91 samples/sec   Loss 9.0075   LearningRate 0.0775   Epoch: 2   Global Step: 39920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:31,080-Speed 9571.13 samples/sec   Loss 9.0221   LearningRate 0.0775   Epoch: 2   Global Step: 39930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:32,197-Speed 9171.85 samples/sec   Loss 8.8383   LearningRate 0.0775   Epoch: 2   Global Step: 39940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:33,281-Speed 9460.49 samples/sec   Loss 8.9434   LearningRate 0.0775   Epoch: 2   Global Step: 39950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:34,346-Speed 9621.54 samples/sec   Loss 8.8275   LearningRate 0.0775   Epoch: 2   Global Step: 39960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:35,445-Speed 9321.07 samples/sec   Loss 9.0798   LearningRate 0.0775   Epoch: 2   Global Step: 39970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:36,511-Speed 9612.67 samples/sec   Loss 9.0086   LearningRate 0.0775   Epoch: 2   Global Step: 39980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:37,589-Speed 9505.40 samples/sec   Loss 8.8911   LearningRate 0.0775   Epoch: 2   Global Step: 39990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:08:38,626-Speed 9880.14 samples/sec   Loss 9.0410   LearningRate 0.0775   Epoch: 2   Global Step: 40000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:09:00,570-[lfw][40000]XNorm: 12.913431
Training: 2022-04-11 13:09:00,570-[lfw][40000]Accuracy-Flip: 0.99367+-0.00379
Training: 2022-04-11 13:09:00,571-[lfw][40000]Accuracy-Highest: 0.99533
Training: 2022-04-11 13:09:25,883-[cfp_fp][40000]XNorm: 10.887237
Training: 2022-04-11 13:09:25,884-[cfp_fp][40000]Accuracy-Flip: 0.93914+-0.01023
Training: 2022-04-11 13:09:25,884-[cfp_fp][40000]Accuracy-Highest: 0.93986
Training: 2022-04-11 13:09:47,661-[agedb_30][40000]XNorm: 12.465581
Training: 2022-04-11 13:09:47,662-[agedb_30][40000]Accuracy-Flip: 0.95083+-0.01193
Training: 2022-04-11 13:09:47,663-[agedb_30][40000]Accuracy-Highest: 0.95333
Training: 2022-04-11 13:09:48,732-Speed 146.07 samples/sec   Loss 9.0025   LearningRate 0.0775   Epoch: 2   Global Step: 40010   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:09:49,768-Speed 9886.22 samples/sec   Loss 8.9505   LearningRate 0.0775   Epoch: 2   Global Step: 40020   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:09:50,813-Speed 9808.71 samples/sec   Loss 8.9868   LearningRate 0.0775   Epoch: 2   Global Step: 40030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:09:51,849-Speed 9883.28 samples/sec   Loss 9.0375   LearningRate 0.0774   Epoch: 2   Global Step: 40040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:09:52,886-Speed 9882.69 samples/sec   Loss 9.0238   LearningRate 0.0774   Epoch: 2   Global Step: 40050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:09:53,933-Speed 9784.10 samples/sec   Loss 8.9884   LearningRate 0.0774   Epoch: 2   Global Step: 40060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:09:54,983-Speed 9761.94 samples/sec   Loss 9.0793   LearningRate 0.0774   Epoch: 2   Global Step: 40070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:09:56,075-Speed 9381.21 samples/sec   Loss 9.0057   LearningRate 0.0774   Epoch: 2   Global Step: 40080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:09:57,156-Speed 9479.88 samples/sec   Loss 8.9855   LearningRate 0.0774   Epoch: 2   Global Step: 40090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:09:58,246-Speed 9402.66 samples/sec   Loss 9.0150   LearningRate 0.0774   Epoch: 2   Global Step: 40100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:09:59,323-Speed 9510.63 samples/sec   Loss 8.9809   LearningRate 0.0774   Epoch: 2   Global Step: 40110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:00,405-Speed 9466.21 samples/sec   Loss 8.9324   LearningRate 0.0774   Epoch: 2   Global Step: 40120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:01,467-Speed 9652.36 samples/sec   Loss 9.0550   LearningRate 0.0774   Epoch: 2   Global Step: 40130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:02,549-Speed 9469.46 samples/sec   Loss 8.9972   LearningRate 0.0774   Epoch: 2   Global Step: 40140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:03,613-Speed 9635.19 samples/sec   Loss 8.9594   LearningRate 0.0774   Epoch: 2   Global Step: 40150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:04,653-Speed 9846.71 samples/sec   Loss 9.0127   LearningRate 0.0774   Epoch: 2   Global Step: 40160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:05,704-Speed 9751.64 samples/sec   Loss 8.9725   LearningRate 0.0774   Epoch: 2   Global Step: 40170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:06,804-Speed 9311.29 samples/sec   Loss 8.9278   LearningRate 0.0774   Epoch: 2   Global Step: 40180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:07,830-Speed 9993.11 samples/sec   Loss 8.8745   LearningRate 0.0774   Epoch: 2   Global Step: 40190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:08,904-Speed 9536.46 samples/sec   Loss 8.9070   LearningRate 0.0774   Epoch: 2   Global Step: 40200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:09,998-Speed 9363.54 samples/sec   Loss 8.8589   LearningRate 0.0774   Epoch: 2   Global Step: 40210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:11,044-Speed 9802.13 samples/sec   Loss 9.0252   LearningRate 0.0774   Epoch: 2   Global Step: 40220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:12,110-Speed 9606.85 samples/sec   Loss 9.0289   LearningRate 0.0773   Epoch: 2   Global Step: 40230   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:10:13,164-Speed 9720.48 samples/sec   Loss 8.9332   LearningRate 0.0773   Epoch: 2   Global Step: 40240   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:10:14,252-Speed 9416.03 samples/sec   Loss 8.9437   LearningRate 0.0773   Epoch: 2   Global Step: 40250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:15,298-Speed 9804.53 samples/sec   Loss 9.0073   LearningRate 0.0773   Epoch: 2   Global Step: 40260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:16,338-Speed 9850.49 samples/sec   Loss 8.8719   LearningRate 0.0773   Epoch: 2   Global Step: 40270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:17,362-Speed 10009.90 samples/sec   Loss 9.0210   LearningRate 0.0773   Epoch: 2   Global Step: 40280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:18,422-Speed 9664.91 samples/sec   Loss 8.8892   LearningRate 0.0773   Epoch: 2   Global Step: 40290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:19,527-Speed 9267.53 samples/sec   Loss 8.8211   LearningRate 0.0773   Epoch: 2   Global Step: 40300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:20,586-Speed 9680.37 samples/sec   Loss 9.0142   LearningRate 0.0773   Epoch: 2   Global Step: 40310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:21,694-Speed 9243.39 samples/sec   Loss 8.9407   LearningRate 0.0773   Epoch: 2   Global Step: 40320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:22,740-Speed 9797.36 samples/sec   Loss 8.9595   LearningRate 0.0773   Epoch: 2   Global Step: 40330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:23,831-Speed 9387.65 samples/sec   Loss 8.9538   LearningRate 0.0773   Epoch: 2   Global Step: 40340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:24,899-Speed 9591.23 samples/sec   Loss 8.9444   LearningRate 0.0773   Epoch: 2   Global Step: 40350   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:10:25,920-Speed 10045.03 samples/sec   Loss 8.9645   LearningRate 0.0773   Epoch: 2   Global Step: 40360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:26,955-Speed 9902.92 samples/sec   Loss 8.9822   LearningRate 0.0773   Epoch: 2   Global Step: 40370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:28,040-Speed 9439.64 samples/sec   Loss 9.0171   LearningRate 0.0773   Epoch: 2   Global Step: 40380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:29,128-Speed 9421.94 samples/sec   Loss 9.0643   LearningRate 0.0773   Epoch: 2   Global Step: 40390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:30,222-Speed 9361.70 samples/sec   Loss 8.9876   LearningRate 0.0773   Epoch: 2   Global Step: 40400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:31,255-Speed 9917.77 samples/sec   Loss 9.1057   LearningRate 0.0773   Epoch: 2   Global Step: 40410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:32,318-Speed 9642.54 samples/sec   Loss 9.0205   LearningRate 0.0772   Epoch: 2   Global Step: 40420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:33,357-Speed 9862.73 samples/sec   Loss 9.0268   LearningRate 0.0772   Epoch: 2   Global Step: 40430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:34,416-Speed 9674.42 samples/sec   Loss 8.8824   LearningRate 0.0772   Epoch: 2   Global Step: 40440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:35,494-Speed 9503.19 samples/sec   Loss 8.8667   LearningRate 0.0772   Epoch: 2   Global Step: 40450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:36,580-Speed 9434.28 samples/sec   Loss 8.9058   LearningRate 0.0772   Epoch: 2   Global Step: 40460   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:10:37,655-Speed 9533.58 samples/sec   Loss 8.9498   LearningRate 0.0772   Epoch: 2   Global Step: 40470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:38,745-Speed 9396.96 samples/sec   Loss 9.1132   LearningRate 0.0772   Epoch: 2   Global Step: 40480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:39,824-Speed 9496.66 samples/sec   Loss 9.0148   LearningRate 0.0772   Epoch: 2   Global Step: 40490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:40,943-Speed 9159.93 samples/sec   Loss 8.9473   LearningRate 0.0772   Epoch: 2   Global Step: 40500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:42,061-Speed 9157.19 samples/sec   Loss 8.9100   LearningRate 0.0772   Epoch: 2   Global Step: 40510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:43,120-Speed 9678.46 samples/sec   Loss 8.9641   LearningRate 0.0772   Epoch: 2   Global Step: 40520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:44,235-Speed 9191.40 samples/sec   Loss 8.9327   LearningRate 0.0772   Epoch: 2   Global Step: 40530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:45,295-Speed 9664.63 samples/sec   Loss 9.0076   LearningRate 0.0772   Epoch: 2   Global Step: 40540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:46,357-Speed 9656.28 samples/sec   Loss 8.9618   LearningRate 0.0772   Epoch: 2   Global Step: 40550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:47,392-Speed 9892.95 samples/sec   Loss 8.9928   LearningRate 0.0772   Epoch: 2   Global Step: 40560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:48,473-Speed 9475.59 samples/sec   Loss 8.9530   LearningRate 0.0772   Epoch: 2   Global Step: 40570   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:10:49,539-Speed 9611.23 samples/sec   Loss 8.9295   LearningRate 0.0772   Epoch: 2   Global Step: 40580   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:10:50,591-Speed 9747.61 samples/sec   Loss 8.9215   LearningRate 0.0772   Epoch: 2   Global Step: 40590   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:10:51,683-Speed 9378.06 samples/sec   Loss 9.0014   LearningRate 0.0772   Epoch: 2   Global Step: 40600   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:10:52,725-Speed 9836.27 samples/sec   Loss 8.9208   LearningRate 0.0771   Epoch: 2   Global Step: 40610   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:10:53,778-Speed 9730.54 samples/sec   Loss 8.9924   LearningRate 0.0771   Epoch: 2   Global Step: 40620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:54,850-Speed 9551.51 samples/sec   Loss 9.0865   LearningRate 0.0771   Epoch: 2   Global Step: 40630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:55,917-Speed 9601.51 samples/sec   Loss 8.9878   LearningRate 0.0771   Epoch: 2   Global Step: 40640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:56,983-Speed 9617.95 samples/sec   Loss 8.9752   LearningRate 0.0771   Epoch: 2   Global Step: 40650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:58,032-Speed 9761.49 samples/sec   Loss 8.9023   LearningRate 0.0771   Epoch: 2   Global Step: 40660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:10:59,137-Speed 9276.00 samples/sec   Loss 9.0323   LearningRate 0.0771   Epoch: 2   Global Step: 40670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:00,240-Speed 9286.48 samples/sec   Loss 9.0183   LearningRate 0.0771   Epoch: 2   Global Step: 40680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:01,335-Speed 9355.60 samples/sec   Loss 8.8972   LearningRate 0.0771   Epoch: 2   Global Step: 40690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:02,451-Speed 9182.79 samples/sec   Loss 8.8908   LearningRate 0.0771   Epoch: 2   Global Step: 40700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:03,568-Speed 9179.53 samples/sec   Loss 8.9260   LearningRate 0.0771   Epoch: 2   Global Step: 40710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:04,615-Speed 9791.14 samples/sec   Loss 9.0169   LearningRate 0.0771   Epoch: 2   Global Step: 40720   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:11:05,715-Speed 9315.55 samples/sec   Loss 8.9767   LearningRate 0.0771   Epoch: 2   Global Step: 40730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:06,796-Speed 9475.13 samples/sec   Loss 9.0509   LearningRate 0.0771   Epoch: 2   Global Step: 40740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:07,866-Speed 9574.18 samples/sec   Loss 8.9618   LearningRate 0.0771   Epoch: 2   Global Step: 40750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:08,941-Speed 9531.82 samples/sec   Loss 8.9539   LearningRate 0.0771   Epoch: 2   Global Step: 40760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:09,996-Speed 9706.45 samples/sec   Loss 8.8462   LearningRate 0.0771   Epoch: 2   Global Step: 40770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:11,050-Speed 9721.93 samples/sec   Loss 8.9458   LearningRate 0.0771   Epoch: 2   Global Step: 40780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:12,101-Speed 9752.37 samples/sec   Loss 8.9268   LearningRate 0.0771   Epoch: 2   Global Step: 40790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:11:13,208-Speed 9254.57 samples/sec   Loss 8.8051   LearningRate 0.0770   Epoch: 2   Global Step: 40800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:11:14,301-Speed 9377.15 samples/sec   Loss 8.9323   LearningRate 0.0770   Epoch: 2   Global Step: 40810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:11:15,403-Speed 9291.40 samples/sec   Loss 8.9646   LearningRate 0.0770   Epoch: 2   Global Step: 40820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:11:16,528-Speed 9114.63 samples/sec   Loss 8.9300   LearningRate 0.0770   Epoch: 2   Global Step: 40830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:11:17,585-Speed 9690.74 samples/sec   Loss 8.9685   LearningRate 0.0770   Epoch: 2   Global Step: 40840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:11:18,670-Speed 9443.95 samples/sec   Loss 9.0019   LearningRate 0.0770   Epoch: 2   Global Step: 40850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:11:19,712-Speed 9834.56 samples/sec   Loss 8.9626   LearningRate 0.0770   Epoch: 2   Global Step: 40860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:11:20,794-Speed 9464.82 samples/sec   Loss 8.8959   LearningRate 0.0770   Epoch: 2   Global Step: 40870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:11:21,909-Speed 9190.32 samples/sec   Loss 8.9464   LearningRate 0.0770   Epoch: 2   Global Step: 40880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:11:22,986-Speed 9512.67 samples/sec   Loss 8.9822   LearningRate 0.0770   Epoch: 2   Global Step: 40890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:24,073-Speed 9433.57 samples/sec   Loss 8.9556   LearningRate 0.0770   Epoch: 2   Global Step: 40900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:25,155-Speed 9468.59 samples/sec   Loss 9.1168   LearningRate 0.0770   Epoch: 2   Global Step: 40910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:26,272-Speed 9172.64 samples/sec   Loss 9.0688   LearningRate 0.0770   Epoch: 2   Global Step: 40920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:27,314-Speed 9833.82 samples/sec   Loss 8.9701   LearningRate 0.0770   Epoch: 2   Global Step: 40930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:28,398-Speed 9450.01 samples/sec   Loss 8.9163   LearningRate 0.0770   Epoch: 2   Global Step: 40940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:29,482-Speed 9455.93 samples/sec   Loss 8.8712   LearningRate 0.0770   Epoch: 2   Global Step: 40950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:30,544-Speed 9644.98 samples/sec   Loss 8.9502   LearningRate 0.0770   Epoch: 2   Global Step: 40960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:31,606-Speed 9643.11 samples/sec   Loss 8.9774   LearningRate 0.0770   Epoch: 2   Global Step: 40970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:32,668-Speed 9654.96 samples/sec   Loss 8.8506   LearningRate 0.0770   Epoch: 2   Global Step: 40980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:33,750-Speed 9469.13 samples/sec   Loss 8.9521   LearningRate 0.0769   Epoch: 2   Global Step: 40990   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:11:34,825-Speed 9537.21 samples/sec   Loss 8.9311   LearningRate 0.0769   Epoch: 2   Global Step: 41000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:11:35,912-Speed 9420.30 samples/sec   Loss 8.8973   LearningRate 0.0769   Epoch: 2   Global Step: 41010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:36,973-Speed 9667.63 samples/sec   Loss 8.9156   LearningRate 0.0769   Epoch: 2   Global Step: 41020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:38,065-Speed 9376.56 samples/sec   Loss 9.0269   LearningRate 0.0769   Epoch: 2   Global Step: 41030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:39,189-Speed 9115.93 samples/sec   Loss 8.9751   LearningRate 0.0769   Epoch: 2   Global Step: 41040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:40,300-Speed 9223.35 samples/sec   Loss 8.8168   LearningRate 0.0769   Epoch: 2   Global Step: 41050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:41,389-Speed 9409.24 samples/sec   Loss 8.8202   LearningRate 0.0769   Epoch: 2   Global Step: 41060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:42,465-Speed 9531.41 samples/sec   Loss 9.0169   LearningRate 0.0769   Epoch: 2   Global Step: 41070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:43,537-Speed 9557.59 samples/sec   Loss 9.0053   LearningRate 0.0769   Epoch: 2   Global Step: 41080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:44,611-Speed 9543.21 samples/sec   Loss 8.9845   LearningRate 0.0769   Epoch: 2   Global Step: 41090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:45,667-Speed 9701.61 samples/sec   Loss 8.9134   LearningRate 0.0769   Epoch: 2   Global Step: 41100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:46,729-Speed 9658.64 samples/sec   Loss 8.9614   LearningRate 0.0769   Epoch: 2   Global Step: 41110   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:11:47,825-Speed 9347.45 samples/sec   Loss 8.9577   LearningRate 0.0769   Epoch: 2   Global Step: 41120   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:11:48,897-Speed 9562.25 samples/sec   Loss 8.9708   LearningRate 0.0769   Epoch: 2   Global Step: 41130   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:11:49,958-Speed 9653.19 samples/sec   Loss 8.8644   LearningRate 0.0769   Epoch: 2   Global Step: 41140   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:11:50,980-Speed 10028.63 samples/sec   Loss 8.9161   LearningRate 0.0769   Epoch: 2   Global Step: 41150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:52,030-Speed 9758.88 samples/sec   Loss 9.0150   LearningRate 0.0769   Epoch: 2   Global Step: 41160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:53,088-Speed 9681.57 samples/sec   Loss 8.9359   LearningRate 0.0769   Epoch: 2   Global Step: 41170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:54,183-Speed 9356.45 samples/sec   Loss 8.9327   LearningRate 0.0768   Epoch: 2   Global Step: 41180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:55,250-Speed 9606.12 samples/sec   Loss 9.0466   LearningRate 0.0768   Epoch: 2   Global Step: 41190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:56,363-Speed 9205.46 samples/sec   Loss 8.8979   LearningRate 0.0768   Epoch: 2   Global Step: 41200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:57,454-Speed 9390.00 samples/sec   Loss 9.0704   LearningRate 0.0768   Epoch: 2   Global Step: 41210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:58,506-Speed 9742.89 samples/sec   Loss 8.9708   LearningRate 0.0768   Epoch: 2   Global Step: 41220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:11:59,587-Speed 9481.94 samples/sec   Loss 8.9772   LearningRate 0.0768   Epoch: 2   Global Step: 41230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:00,671-Speed 9449.14 samples/sec   Loss 8.9876   LearningRate 0.0768   Epoch: 2   Global Step: 41240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:01,737-Speed 9613.89 samples/sec   Loss 8.8320   LearningRate 0.0768   Epoch: 2   Global Step: 41250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:02,848-Speed 9223.01 samples/sec   Loss 8.9125   LearningRate 0.0768   Epoch: 2   Global Step: 41260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:03,924-Speed 9525.78 samples/sec   Loss 8.9455   LearningRate 0.0768   Epoch: 2   Global Step: 41270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:04,999-Speed 9531.90 samples/sec   Loss 9.0216   LearningRate 0.0768   Epoch: 2   Global Step: 41280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:06,070-Speed 9565.74 samples/sec   Loss 8.9936   LearningRate 0.0768   Epoch: 2   Global Step: 41290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:07,139-Speed 9582.17 samples/sec   Loss 8.9806   LearningRate 0.0768   Epoch: 2   Global Step: 41300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:08,179-Speed 9846.05 samples/sec   Loss 8.8791   LearningRate 0.0768   Epoch: 2   Global Step: 41310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:09,262-Speed 9463.24 samples/sec   Loss 8.8584   LearningRate 0.0768   Epoch: 2   Global Step: 41320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:10,307-Speed 9810.50 samples/sec   Loss 8.9316   LearningRate 0.0768   Epoch: 2   Global Step: 41330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:11,370-Speed 9639.18 samples/sec   Loss 8.9153   LearningRate 0.0768   Epoch: 2   Global Step: 41340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:12,458-Speed 9411.65 samples/sec   Loss 9.0087   LearningRate 0.0768   Epoch: 2   Global Step: 41350   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:13,525-Speed 9601.60 samples/sec   Loss 8.9221   LearningRate 0.0768   Epoch: 2   Global Step: 41360   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:14,597-Speed 9560.19 samples/sec   Loss 8.8941   LearningRate 0.0768   Epoch: 2   Global Step: 41370   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:15,692-Speed 9365.76 samples/sec   Loss 8.9457   LearningRate 0.0767   Epoch: 2   Global Step: 41380   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:16,790-Speed 9330.77 samples/sec   Loss 8.9783   LearningRate 0.0767   Epoch: 2   Global Step: 41390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:17,862-Speed 9552.27 samples/sec   Loss 8.9267   LearningRate 0.0767   Epoch: 2   Global Step: 41400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:18,961-Speed 9327.24 samples/sec   Loss 8.9561   LearningRate 0.0767   Epoch: 2   Global Step: 41410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:20,062-Speed 9304.02 samples/sec   Loss 8.8721   LearningRate 0.0767   Epoch: 2   Global Step: 41420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:21,164-Speed 9298.51 samples/sec   Loss 8.8592   LearningRate 0.0767   Epoch: 2   Global Step: 41430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:22,212-Speed 9774.80 samples/sec   Loss 8.9494   LearningRate 0.0767   Epoch: 2   Global Step: 41440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:23,299-Speed 9428.40 samples/sec   Loss 8.9212   LearningRate 0.0767   Epoch: 2   Global Step: 41450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:24,396-Speed 9337.57 samples/sec   Loss 8.8786   LearningRate 0.0767   Epoch: 2   Global Step: 41460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:25,438-Speed 9830.47 samples/sec   Loss 8.9473   LearningRate 0.0767   Epoch: 2   Global Step: 41470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:26,531-Speed 9383.47 samples/sec   Loss 8.9208   LearningRate 0.0767   Epoch: 2   Global Step: 41480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:27,601-Speed 9570.57 samples/sec   Loss 8.9469   LearningRate 0.0767   Epoch: 2   Global Step: 41490   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:28,635-Speed 9907.80 samples/sec   Loss 8.9257   LearningRate 0.0767   Epoch: 2   Global Step: 41500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:29,758-Speed 9124.05 samples/sec   Loss 8.8882   LearningRate 0.0767   Epoch: 2   Global Step: 41510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:30,854-Speed 9349.06 samples/sec   Loss 9.0059   LearningRate 0.0767   Epoch: 2   Global Step: 41520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:31,924-Speed 9576.72 samples/sec   Loss 8.9747   LearningRate 0.0767   Epoch: 2   Global Step: 41530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:32,967-Speed 9827.69 samples/sec   Loss 8.9523   LearningRate 0.0767   Epoch: 2   Global Step: 41540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:34,050-Speed 9464.82 samples/sec   Loss 8.8299   LearningRate 0.0767   Epoch: 2   Global Step: 41550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:35,132-Speed 9463.40 samples/sec   Loss 8.9293   LearningRate 0.0767   Epoch: 2   Global Step: 41560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:36,231-Speed 9326.23 samples/sec   Loss 8.9316   LearningRate 0.0766   Epoch: 2   Global Step: 41570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:37,305-Speed 9541.96 samples/sec   Loss 8.8339   LearningRate 0.0766   Epoch: 2   Global Step: 41580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:38,372-Speed 9610.21 samples/sec   Loss 9.0010   LearningRate 0.0766   Epoch: 2   Global Step: 41590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:39,517-Speed 8943.18 samples/sec   Loss 8.9428   LearningRate 0.0766   Epoch: 2   Global Step: 41600   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:40,581-Speed 9631.15 samples/sec   Loss 8.9524   LearningRate 0.0766   Epoch: 2   Global Step: 41610   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:41,693-Speed 9216.80 samples/sec   Loss 9.0069   LearningRate 0.0766   Epoch: 2   Global Step: 41620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:42,762-Speed 9577.13 samples/sec   Loss 8.7866   LearningRate 0.0766   Epoch: 2   Global Step: 41630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:43,836-Speed 9544.02 samples/sec   Loss 8.8330   LearningRate 0.0766   Epoch: 2   Global Step: 41640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:44,907-Speed 9569.80 samples/sec   Loss 8.8421   LearningRate 0.0766   Epoch: 2   Global Step: 41650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:45,957-Speed 9761.85 samples/sec   Loss 8.7768   LearningRate 0.0766   Epoch: 2   Global Step: 41660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:47,036-Speed 9498.92 samples/sec   Loss 9.1117   LearningRate 0.0766   Epoch: 2   Global Step: 41670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:48,060-Speed 10005.60 samples/sec   Loss 8.9321   LearningRate 0.0766   Epoch: 2   Global Step: 41680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:49,099-Speed 9857.54 samples/sec   Loss 8.9485   LearningRate 0.0766   Epoch: 2   Global Step: 41690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:50,188-Speed 9411.99 samples/sec   Loss 8.9883   LearningRate 0.0766   Epoch: 2   Global Step: 41700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:51,239-Speed 9752.79 samples/sec   Loss 8.9674   LearningRate 0.0766   Epoch: 2   Global Step: 41710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:52,349-Speed 9226.85 samples/sec   Loss 8.9434   LearningRate 0.0766   Epoch: 2   Global Step: 41720   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:53,450-Speed 9304.17 samples/sec   Loss 8.8845   LearningRate 0.0766   Epoch: 2   Global Step: 41730   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:54,552-Speed 9301.02 samples/sec   Loss 8.9981   LearningRate 0.0766   Epoch: 2   Global Step: 41740   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:55,619-Speed 9602.48 samples/sec   Loss 8.8834   LearningRate 0.0766   Epoch: 2   Global Step: 41750   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:56,695-Speed 9530.62 samples/sec   Loss 8.9471   LearningRate 0.0765   Epoch: 2   Global Step: 41760   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:12:57,787-Speed 9380.52 samples/sec   Loss 8.9676   LearningRate 0.0765   Epoch: 2   Global Step: 41770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:12:58,837-Speed 9759.18 samples/sec   Loss 8.8730   LearningRate 0.0765   Epoch: 2   Global Step: 41780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:12:59,910-Speed 9548.67 samples/sec   Loss 8.9118   LearningRate 0.0765   Epoch: 2   Global Step: 41790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:13:00,981-Speed 9562.03 samples/sec   Loss 8.8607   LearningRate 0.0765   Epoch: 2   Global Step: 41800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:13:02,049-Speed 9600.91 samples/sec   Loss 8.7937   LearningRate 0.0765   Epoch: 2   Global Step: 41810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:13:03,132-Speed 9461.08 samples/sec   Loss 8.9442   LearningRate 0.0765   Epoch: 2   Global Step: 41820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:13:04,240-Speed 9244.11 samples/sec   Loss 8.7461   LearningRate 0.0765   Epoch: 2   Global Step: 41830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:13:05,316-Speed 9520.21 samples/sec   Loss 8.9309   LearningRate 0.0765   Epoch: 2   Global Step: 41840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:13:06,393-Speed 9516.31 samples/sec   Loss 8.9232   LearningRate 0.0765   Epoch: 2   Global Step: 41850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:13:07,503-Speed 9232.18 samples/sec   Loss 8.9837   LearningRate 0.0765   Epoch: 2   Global Step: 41860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:13:08,564-Speed 9662.67 samples/sec   Loss 9.0172   LearningRate 0.0765   Epoch: 2   Global Step: 41870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:13:09,677-Speed 9197.94 samples/sec   Loss 8.8613   LearningRate 0.0765   Epoch: 2   Global Step: 41880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:10,766-Speed 9410.55 samples/sec   Loss 9.0024   LearningRate 0.0765   Epoch: 2   Global Step: 41890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:11,839-Speed 9548.33 samples/sec   Loss 8.9887   LearningRate 0.0765   Epoch: 2   Global Step: 41900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:12,917-Speed 9508.50 samples/sec   Loss 8.8248   LearningRate 0.0765   Epoch: 2   Global Step: 41910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:14,013-Speed 9345.65 samples/sec   Loss 8.8878   LearningRate 0.0765   Epoch: 2   Global Step: 41920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:15,090-Speed 9520.84 samples/sec   Loss 9.0758   LearningRate 0.0765   Epoch: 2   Global Step: 41930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:16,165-Speed 9525.94 samples/sec   Loss 8.9615   LearningRate 0.0765   Epoch: 2   Global Step: 41940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:17,208-Speed 9823.88 samples/sec   Loss 8.9391   LearningRate 0.0764   Epoch: 2   Global Step: 41950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:18,240-Speed 9926.07 samples/sec   Loss 8.9104   LearningRate 0.0764   Epoch: 2   Global Step: 41960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:19,306-Speed 9615.23 samples/sec   Loss 8.7970   LearningRate 0.0764   Epoch: 2   Global Step: 41970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:20,405-Speed 9318.53 samples/sec   Loss 8.9417   LearningRate 0.0764   Epoch: 2   Global Step: 41980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:21,466-Speed 9666.82 samples/sec   Loss 9.0242   LearningRate 0.0764   Epoch: 2   Global Step: 41990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:22,513-Speed 9779.88 samples/sec   Loss 8.8990   LearningRate 0.0764   Epoch: 2   Global Step: 42000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:13:44,368-[lfw][42000]XNorm: 13.008962
Training: 2022-04-11 13:13:44,368-[lfw][42000]Accuracy-Flip: 0.99450+-0.00224
Training: 2022-04-11 13:13:44,369-[lfw][42000]Accuracy-Highest: 0.99533
Training: 2022-04-11 13:14:09,693-[cfp_fp][42000]XNorm: 10.938136
Training: 2022-04-11 13:14:09,694-[cfp_fp][42000]Accuracy-Flip: 0.93857+-0.01342
Training: 2022-04-11 13:14:09,694-[cfp_fp][42000]Accuracy-Highest: 0.93986
Training: 2022-04-11 13:14:31,568-[agedb_30][42000]XNorm: 12.512479
Training: 2022-04-11 13:14:31,569-[agedb_30][42000]Accuracy-Flip: 0.95333+-0.01101
Training: 2022-04-11 13:14:31,569-[agedb_30][42000]Accuracy-Highest: 0.95333
Training: 2022-04-11 13:14:32,613-Speed 146.08 samples/sec   Loss 8.9938   LearningRate 0.0764   Epoch: 2   Global Step: 42010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:33,667-Speed 9716.94 samples/sec   Loss 8.9582   LearningRate 0.0764   Epoch: 2   Global Step: 42020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:34,748-Speed 9479.19 samples/sec   Loss 8.8757   LearningRate 0.0764   Epoch: 2   Global Step: 42030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:35,807-Speed 9678.72 samples/sec   Loss 8.8417   LearningRate 0.0764   Epoch: 2   Global Step: 42040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:36,875-Speed 9597.25 samples/sec   Loss 8.9371   LearningRate 0.0764   Epoch: 2   Global Step: 42050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:37,962-Speed 9430.38 samples/sec   Loss 9.0127   LearningRate 0.0764   Epoch: 2   Global Step: 42060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:39,017-Speed 9711.11 samples/sec   Loss 8.9387   LearningRate 0.0764   Epoch: 2   Global Step: 42070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:40,079-Speed 9649.49 samples/sec   Loss 8.8931   LearningRate 0.0764   Epoch: 2   Global Step: 42080   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:14:41,158-Speed 9495.69 samples/sec   Loss 8.9574   LearningRate 0.0764   Epoch: 2   Global Step: 42090   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:14:42,222-Speed 9626.27 samples/sec   Loss 8.8835   LearningRate 0.0764   Epoch: 2   Global Step: 42100   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:14:43,342-Speed 9151.69 samples/sec   Loss 8.9928   LearningRate 0.0764   Epoch: 2   Global Step: 42110   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:14:44,394-Speed 9739.00 samples/sec   Loss 8.7610   LearningRate 0.0764   Epoch: 2   Global Step: 42120   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:14:45,459-Speed 9620.63 samples/sec   Loss 8.9355   LearningRate 0.0764   Epoch: 2   Global Step: 42130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:46,518-Speed 9676.24 samples/sec   Loss 8.9489   LearningRate 0.0763   Epoch: 2   Global Step: 42140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:47,631-Speed 9204.11 samples/sec   Loss 8.9975   LearningRate 0.0763   Epoch: 2   Global Step: 42150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:48,679-Speed 9777.93 samples/sec   Loss 8.8160   LearningRate 0.0763   Epoch: 2   Global Step: 42160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:49,750-Speed 9563.60 samples/sec   Loss 9.0342   LearningRate 0.0763   Epoch: 2   Global Step: 42170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:50,850-Speed 9315.78 samples/sec   Loss 8.8898   LearningRate 0.0763   Epoch: 2   Global Step: 42180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:51,944-Speed 9365.23 samples/sec   Loss 8.8765   LearningRate 0.0763   Epoch: 2   Global Step: 42190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:53,001-Speed 9691.69 samples/sec   Loss 8.9166   LearningRate 0.0763   Epoch: 2   Global Step: 42200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:54,062-Speed 9651.64 samples/sec   Loss 8.8462   LearningRate 0.0763   Epoch: 2   Global Step: 42210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:55,110-Speed 9778.08 samples/sec   Loss 8.9280   LearningRate 0.0763   Epoch: 2   Global Step: 42220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:56,138-Speed 9976.57 samples/sec   Loss 8.9001   LearningRate 0.0763   Epoch: 2   Global Step: 42230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:57,211-Speed 9541.78 samples/sec   Loss 8.9302   LearningRate 0.0763   Epoch: 2   Global Step: 42240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:58,275-Speed 9632.70 samples/sec   Loss 8.8059   LearningRate 0.0763   Epoch: 2   Global Step: 42250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:14:59,301-Speed 9985.12 samples/sec   Loss 8.9019   LearningRate 0.0763   Epoch: 2   Global Step: 42260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:00,341-Speed 9851.66 samples/sec   Loss 8.8879   LearningRate 0.0763   Epoch: 2   Global Step: 42270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:01,413-Speed 9557.06 samples/sec   Loss 8.8690   LearningRate 0.0763   Epoch: 2   Global Step: 42280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:02,514-Speed 9306.99 samples/sec   Loss 8.8674   LearningRate 0.0763   Epoch: 2   Global Step: 42290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:03,649-Speed 9031.05 samples/sec   Loss 9.0666   LearningRate 0.0763   Epoch: 2   Global Step: 42300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:04,771-Speed 9132.06 samples/sec   Loss 8.8948   LearningRate 0.0763   Epoch: 2   Global Step: 42310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:05,841-Speed 9572.03 samples/sec   Loss 8.9039   LearningRate 0.0763   Epoch: 2   Global Step: 42320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:06,942-Speed 9307.33 samples/sec   Loss 8.7921   LearningRate 0.0762   Epoch: 2   Global Step: 42330   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:15:08,032-Speed 9403.07 samples/sec   Loss 8.8596   LearningRate 0.0762   Epoch: 2   Global Step: 42340   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:15:09,084-Speed 9736.11 samples/sec   Loss 8.8294   LearningRate 0.0762   Epoch: 2   Global Step: 42350   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:15:10,123-Speed 9858.59 samples/sec   Loss 8.9545   LearningRate 0.0762   Epoch: 2   Global Step: 42360   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:15:11,194-Speed 9568.70 samples/sec   Loss 9.0362   LearningRate 0.0762   Epoch: 2   Global Step: 42370   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:15:12,270-Speed 9526.01 samples/sec   Loss 8.8393   LearningRate 0.0762   Epoch: 2   Global Step: 42380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:13,384-Speed 9200.39 samples/sec   Loss 9.0459   LearningRate 0.0762   Epoch: 2   Global Step: 42390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:14,419-Speed 9900.51 samples/sec   Loss 8.9304   LearningRate 0.0762   Epoch: 2   Global Step: 42400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:15,498-Speed 9494.01 samples/sec   Loss 8.9359   LearningRate 0.0762   Epoch: 2   Global Step: 42410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:16,573-Speed 9525.88 samples/sec   Loss 8.8043   LearningRate 0.0762   Epoch: 2   Global Step: 42420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:17,649-Speed 9528.43 samples/sec   Loss 8.9315   LearningRate 0.0762   Epoch: 2   Global Step: 42430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:18,694-Speed 9802.23 samples/sec   Loss 8.9175   LearningRate 0.0762   Epoch: 2   Global Step: 42440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:19,756-Speed 9655.49 samples/sec   Loss 8.8826   LearningRate 0.0762   Epoch: 2   Global Step: 42450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:20,858-Speed 9298.72 samples/sec   Loss 8.7998   LearningRate 0.0762   Epoch: 2   Global Step: 42460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:21,945-Speed 9430.28 samples/sec   Loss 9.0762   LearningRate 0.0762   Epoch: 2   Global Step: 42470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:23,032-Speed 9424.92 samples/sec   Loss 8.8091   LearningRate 0.0762   Epoch: 2   Global Step: 42480   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:15:24,123-Speed 9389.56 samples/sec   Loss 8.6940   LearningRate 0.0762   Epoch: 2   Global Step: 42490   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:15:25,190-Speed 9598.27 samples/sec   Loss 8.8500   LearningRate 0.0762   Epoch: 2   Global Step: 42500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:26,282-Speed 9381.63 samples/sec   Loss 8.7450   LearningRate 0.0762   Epoch: 2   Global Step: 42510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:27,360-Speed 9511.54 samples/sec   Loss 8.9190   LearningRate 0.0761   Epoch: 2   Global Step: 42520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:28,414-Speed 9711.60 samples/sec   Loss 8.8827   LearningRate 0.0761   Epoch: 2   Global Step: 42530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:29,485-Speed 9569.95 samples/sec   Loss 8.8716   LearningRate 0.0761   Epoch: 2   Global Step: 42540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:30,586-Speed 9302.77 samples/sec   Loss 8.8715   LearningRate 0.0761   Epoch: 2   Global Step: 42550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:31,696-Speed 9236.59 samples/sec   Loss 8.8765   LearningRate 0.0761   Epoch: 2   Global Step: 42560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:32,811-Speed 9192.23 samples/sec   Loss 8.8534   LearningRate 0.0761   Epoch: 2   Global Step: 42570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:33,893-Speed 9471.11 samples/sec   Loss 8.9205   LearningRate 0.0761   Epoch: 2   Global Step: 42580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:35,002-Speed 9235.92 samples/sec   Loss 8.9178   LearningRate 0.0761   Epoch: 2   Global Step: 42590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:36,114-Speed 9212.20 samples/sec   Loss 8.8243   LearningRate 0.0761   Epoch: 2   Global Step: 42600   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:15:37,179-Speed 9621.70 samples/sec   Loss 8.9296   LearningRate 0.0761   Epoch: 2   Global Step: 42610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:15:38,272-Speed 9374.54 samples/sec   Loss 8.9945   LearningRate 0.0761   Epoch: 2   Global Step: 42620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:15:39,349-Speed 9515.84 samples/sec   Loss 8.8764   LearningRate 0.0761   Epoch: 2   Global Step: 42630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:15:40,438-Speed 9407.81 samples/sec   Loss 8.9013   LearningRate 0.0761   Epoch: 2   Global Step: 42640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:15:41,505-Speed 9604.84 samples/sec   Loss 8.8740   LearningRate 0.0761   Epoch: 2   Global Step: 42650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:15:42,576-Speed 9572.52 samples/sec   Loss 8.8895   LearningRate 0.0761   Epoch: 2   Global Step: 42660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:15:43,667-Speed 9393.67 samples/sec   Loss 8.9087   LearningRate 0.0761   Epoch: 2   Global Step: 42670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:15:44,715-Speed 9772.42 samples/sec   Loss 8.8873   LearningRate 0.0761   Epoch: 2   Global Step: 42680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:15:45,777-Speed 9650.22 samples/sec   Loss 8.9013   LearningRate 0.0761   Epoch: 2   Global Step: 42690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:15:46,879-Speed 9293.70 samples/sec   Loss 8.8536   LearningRate 0.0761   Epoch: 2   Global Step: 42700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:15:47,947-Speed 9594.85 samples/sec   Loss 8.7454   LearningRate 0.0760   Epoch: 2   Global Step: 42710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:49,002-Speed 9712.18 samples/sec   Loss 8.9062   LearningRate 0.0760   Epoch: 2   Global Step: 42720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:50,079-Speed 9516.68 samples/sec   Loss 8.8284   LearningRate 0.0760   Epoch: 2   Global Step: 42730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:51,132-Speed 9730.30 samples/sec   Loss 8.8030   LearningRate 0.0760   Epoch: 2   Global Step: 42740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:52,175-Speed 9819.82 samples/sec   Loss 8.8576   LearningRate 0.0760   Epoch: 2   Global Step: 42750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:53,193-Speed 10064.89 samples/sec   Loss 8.7912   LearningRate 0.0760   Epoch: 2   Global Step: 42760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:54,279-Speed 9435.52 samples/sec   Loss 8.8851   LearningRate 0.0760   Epoch: 2   Global Step: 42770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:55,343-Speed 9628.23 samples/sec   Loss 8.9506   LearningRate 0.0760   Epoch: 2   Global Step: 42780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:56,430-Speed 9434.45 samples/sec   Loss 8.8945   LearningRate 0.0760   Epoch: 2   Global Step: 42790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:57,497-Speed 9596.68 samples/sec   Loss 8.7440   LearningRate 0.0760   Epoch: 2   Global Step: 42800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:15:58,550-Speed 9737.00 samples/sec   Loss 8.8921   LearningRate 0.0760   Epoch: 2   Global Step: 42810   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:15:59,603-Speed 9723.25 samples/sec   Loss 8.8594   LearningRate 0.0760   Epoch: 2   Global Step: 42820   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:16:00,638-Speed 9906.20 samples/sec   Loss 8.9261   LearningRate 0.0760   Epoch: 2   Global Step: 42830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:01,687-Speed 9763.42 samples/sec   Loss 8.7924   LearningRate 0.0760   Epoch: 2   Global Step: 42840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:02,756-Speed 9587.82 samples/sec   Loss 8.8627   LearningRate 0.0760   Epoch: 2   Global Step: 42850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:03,862-Speed 9265.32 samples/sec   Loss 8.8054   LearningRate 0.0760   Epoch: 2   Global Step: 42860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:04,963-Speed 9307.23 samples/sec   Loss 8.7702   LearningRate 0.0760   Epoch: 2   Global Step: 42870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:06,067-Speed 9281.83 samples/sec   Loss 8.9636   LearningRate 0.0760   Epoch: 2   Global Step: 42880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:07,129-Speed 9641.91 samples/sec   Loss 8.8151   LearningRate 0.0760   Epoch: 2   Global Step: 42890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:08,253-Speed 9121.85 samples/sec   Loss 8.8597   LearningRate 0.0759   Epoch: 2   Global Step: 42900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:09,296-Speed 9821.74 samples/sec   Loss 8.9409   LearningRate 0.0759   Epoch: 2   Global Step: 42910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:10,320-Speed 10007.75 samples/sec   Loss 8.8508   LearningRate 0.0759   Epoch: 2   Global Step: 42920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:11,349-Speed 9950.03 samples/sec   Loss 8.9381   LearningRate 0.0759   Epoch: 2   Global Step: 42930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:12,438-Speed 9415.37 samples/sec   Loss 8.8251   LearningRate 0.0759   Epoch: 2   Global Step: 42940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:13,512-Speed 9542.41 samples/sec   Loss 8.8865   LearningRate 0.0759   Epoch: 2   Global Step: 42950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:14,562-Speed 9757.97 samples/sec   Loss 8.7994   LearningRate 0.0759   Epoch: 2   Global Step: 42960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:15,671-Speed 9232.01 samples/sec   Loss 8.9434   LearningRate 0.0759   Epoch: 2   Global Step: 42970   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:16,718-Speed 9795.19 samples/sec   Loss 8.9626   LearningRate 0.0759   Epoch: 2   Global Step: 42980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:17,817-Speed 9324.33 samples/sec   Loss 8.8712   LearningRate 0.0759   Epoch: 2   Global Step: 42990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:18,872-Speed 9711.32 samples/sec   Loss 8.8713   LearningRate 0.0759   Epoch: 2   Global Step: 43000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:19,909-Speed 9882.41 samples/sec   Loss 8.8257   LearningRate 0.0759   Epoch: 2   Global Step: 43010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:20,941-Speed 9922.37 samples/sec   Loss 8.8937   LearningRate 0.0759   Epoch: 2   Global Step: 43020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:22,004-Speed 9645.10 samples/sec   Loss 8.9655   LearningRate 0.0759   Epoch: 2   Global Step: 43030   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:16:23,049-Speed 9800.40 samples/sec   Loss 8.8755   LearningRate 0.0759   Epoch: 2   Global Step: 43040   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:16:24,137-Speed 9417.06 samples/sec   Loss 8.8609   LearningRate 0.0759   Epoch: 2   Global Step: 43050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:25,262-Speed 9106.00 samples/sec   Loss 8.8463   LearningRate 0.0759   Epoch: 2   Global Step: 43060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:26,342-Speed 9484.82 samples/sec   Loss 8.8893   LearningRate 0.0759   Epoch: 2   Global Step: 43070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:27,404-Speed 9648.51 samples/sec   Loss 8.8228   LearningRate 0.0759   Epoch: 2   Global Step: 43080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:28,536-Speed 9050.28 samples/sec   Loss 8.8805   LearningRate 0.0758   Epoch: 2   Global Step: 43090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:29,666-Speed 9068.83 samples/sec   Loss 8.7809   LearningRate 0.0758   Epoch: 2   Global Step: 43100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:30,740-Speed 9543.67 samples/sec   Loss 8.9697   LearningRate 0.0758   Epoch: 2   Global Step: 43110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:31,840-Speed 9317.00 samples/sec   Loss 8.7979   LearningRate 0.0758   Epoch: 2   Global Step: 43120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:32,890-Speed 9758.49 samples/sec   Loss 8.7077   LearningRate 0.0758   Epoch: 2   Global Step: 43130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:33,933-Speed 9815.24 samples/sec   Loss 8.8087   LearningRate 0.0758   Epoch: 2   Global Step: 43140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:34,995-Speed 9652.56 samples/sec   Loss 8.9307   LearningRate 0.0758   Epoch: 2   Global Step: 43150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:36,060-Speed 9623.25 samples/sec   Loss 8.7851   LearningRate 0.0758   Epoch: 2   Global Step: 43160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:37,166-Speed 9268.37 samples/sec   Loss 8.8124   LearningRate 0.0758   Epoch: 2   Global Step: 43170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:38,204-Speed 9867.10 samples/sec   Loss 8.8673   LearningRate 0.0758   Epoch: 2   Global Step: 43180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:39,239-Speed 9898.87 samples/sec   Loss 8.8979   LearningRate 0.0758   Epoch: 2   Global Step: 43190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:40,293-Speed 9716.74 samples/sec   Loss 8.8711   LearningRate 0.0758   Epoch: 2   Global Step: 43200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:41,368-Speed 9536.42 samples/sec   Loss 8.9463   LearningRate 0.0758   Epoch: 2   Global Step: 43210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:42,458-Speed 9404.05 samples/sec   Loss 8.8725   LearningRate 0.0758   Epoch: 2   Global Step: 43220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:43,539-Speed 9479.78 samples/sec   Loss 9.0121   LearningRate 0.0758   Epoch: 2   Global Step: 43230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:44,626-Speed 9422.49 samples/sec   Loss 8.7107   LearningRate 0.0758   Epoch: 2   Global Step: 43240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:45,701-Speed 9534.66 samples/sec   Loss 8.8246   LearningRate 0.0758   Epoch: 2   Global Step: 43250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:46,741-Speed 9851.38 samples/sec   Loss 8.7626   LearningRate 0.0758   Epoch: 2   Global Step: 43260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:47,780-Speed 9857.30 samples/sec   Loss 8.7904   LearningRate 0.0758   Epoch: 2   Global Step: 43270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:48,876-Speed 9354.60 samples/sec   Loss 8.8441   LearningRate 0.0758   Epoch: 2   Global Step: 43280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:49,927-Speed 9747.42 samples/sec   Loss 8.8107   LearningRate 0.0757   Epoch: 2   Global Step: 43290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:50,998-Speed 9560.88 samples/sec   Loss 8.7519   LearningRate 0.0757   Epoch: 2   Global Step: 43300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:52,072-Speed 9544.07 samples/sec   Loss 8.9499   LearningRate 0.0757   Epoch: 2   Global Step: 43310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:53,171-Speed 9320.70 samples/sec   Loss 8.7900   LearningRate 0.0757   Epoch: 2   Global Step: 43320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:54,258-Speed 9427.28 samples/sec   Loss 8.9822   LearningRate 0.0757   Epoch: 2   Global Step: 43330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:55,320-Speed 9652.89 samples/sec   Loss 8.7893   LearningRate 0.0757   Epoch: 2   Global Step: 43340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:56,408-Speed 9421.48 samples/sec   Loss 8.9293   LearningRate 0.0757   Epoch: 2   Global Step: 43350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:57,531-Speed 9123.29 samples/sec   Loss 8.9651   LearningRate 0.0757   Epoch: 2   Global Step: 43360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:16:58,643-Speed 9213.41 samples/sec   Loss 8.7650   LearningRate 0.0757   Epoch: 2   Global Step: 43370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:16:59,730-Speed 9421.21 samples/sec   Loss 8.8084   LearningRate 0.0757   Epoch: 2   Global Step: 43380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:00,812-Speed 9470.59 samples/sec   Loss 8.8171   LearningRate 0.0757   Epoch: 2   Global Step: 43390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:01,902-Speed 9409.78 samples/sec   Loss 8.9647   LearningRate 0.0757   Epoch: 2   Global Step: 43400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:02,978-Speed 9516.98 samples/sec   Loss 8.8870   LearningRate 0.0757   Epoch: 2   Global Step: 43410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:04,015-Speed 9881.52 samples/sec   Loss 8.8677   LearningRate 0.0757   Epoch: 2   Global Step: 43420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:05,092-Speed 9511.94 samples/sec   Loss 8.8100   LearningRate 0.0757   Epoch: 2   Global Step: 43430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:06,229-Speed 9013.93 samples/sec   Loss 8.8202   LearningRate 0.0757   Epoch: 2   Global Step: 43440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:07,296-Speed 9604.21 samples/sec   Loss 8.8344   LearningRate 0.0757   Epoch: 2   Global Step: 43450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:08,364-Speed 9596.17 samples/sec   Loss 8.8320   LearningRate 0.0757   Epoch: 2   Global Step: 43460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:09,438-Speed 9539.63 samples/sec   Loss 8.7737   LearningRate 0.0757   Epoch: 2   Global Step: 43470   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:10,535-Speed 9334.49 samples/sec   Loss 8.9993   LearningRate 0.0756   Epoch: 2   Global Step: 43480   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:11,613-Speed 9503.78 samples/sec   Loss 8.8626   LearningRate 0.0756   Epoch: 2   Global Step: 43490   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:12,673-Speed 9677.68 samples/sec   Loss 8.8459   LearningRate 0.0756   Epoch: 2   Global Step: 43500   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:13,736-Speed 9640.67 samples/sec   Loss 8.8466   LearningRate 0.0756   Epoch: 2   Global Step: 43510   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:14,836-Speed 9310.64 samples/sec   Loss 8.8513   LearningRate 0.0756   Epoch: 2   Global Step: 43520   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:15,924-Speed 9420.06 samples/sec   Loss 8.9812   LearningRate 0.0756   Epoch: 2   Global Step: 43530   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:16,964-Speed 9844.86 samples/sec   Loss 8.8493   LearningRate 0.0756   Epoch: 2   Global Step: 43540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:18,058-Speed 9370.71 samples/sec   Loss 8.7925   LearningRate 0.0756   Epoch: 2   Global Step: 43550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:19,154-Speed 9353.12 samples/sec   Loss 8.7460   LearningRate 0.0756   Epoch: 2   Global Step: 43560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:20,231-Speed 9508.34 samples/sec   Loss 8.8563   LearningRate 0.0756   Epoch: 2   Global Step: 43570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:21,257-Speed 9991.28 samples/sec   Loss 8.8153   LearningRate 0.0756   Epoch: 2   Global Step: 43580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:22,352-Speed 9350.28 samples/sec   Loss 8.9829   LearningRate 0.0756   Epoch: 2   Global Step: 43590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:23,413-Speed 9660.74 samples/sec   Loss 8.8824   LearningRate 0.0756   Epoch: 2   Global Step: 43600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:24,507-Speed 9367.16 samples/sec   Loss 8.7703   LearningRate 0.0756   Epoch: 2   Global Step: 43610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:25,577-Speed 9574.23 samples/sec   Loss 8.8738   LearningRate 0.0756   Epoch: 2   Global Step: 43620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:17:26,652-Speed 9525.43 samples/sec   Loss 8.9146   LearningRate 0.0756   Epoch: 2   Global Step: 43630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:17:27,707-Speed 9710.11 samples/sec   Loss 8.9286   LearningRate 0.0756   Epoch: 2   Global Step: 43640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:17:28,792-Speed 9449.28 samples/sec   Loss 8.8034   LearningRate 0.0756   Epoch: 2   Global Step: 43650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:17:29,852-Speed 9667.42 samples/sec   Loss 8.7475   LearningRate 0.0756   Epoch: 2   Global Step: 43660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:17:30,932-Speed 9489.87 samples/sec   Loss 9.0600   LearningRate 0.0755   Epoch: 2   Global Step: 43670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:17:32,018-Speed 9437.53 samples/sec   Loss 8.9077   LearningRate 0.0755   Epoch: 2   Global Step: 43680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:17:33,056-Speed 9875.00 samples/sec   Loss 8.7997   LearningRate 0.0755   Epoch: 2   Global Step: 43690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:17:34,099-Speed 9821.15 samples/sec   Loss 8.8908   LearningRate 0.0755   Epoch: 2   Global Step: 43700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:17:35,140-Speed 9843.95 samples/sec   Loss 8.7314   LearningRate 0.0755   Epoch: 2   Global Step: 43710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:17:36,183-Speed 9819.89 samples/sec   Loss 8.7923   LearningRate 0.0755   Epoch: 2   Global Step: 43720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:37,256-Speed 9555.46 samples/sec   Loss 8.7669   LearningRate 0.0755   Epoch: 2   Global Step: 43730   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:38,314-Speed 9676.04 samples/sec   Loss 8.9514   LearningRate 0.0755   Epoch: 2   Global Step: 43740   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:39,406-Speed 9389.08 samples/sec   Loss 8.7852   LearningRate 0.0755   Epoch: 2   Global Step: 43750   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:40,501-Speed 9358.26 samples/sec   Loss 8.8011   LearningRate 0.0755   Epoch: 2   Global Step: 43760   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:41,611-Speed 9228.93 samples/sec   Loss 8.7557   LearningRate 0.0755   Epoch: 2   Global Step: 43770   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:42,682-Speed 9568.12 samples/sec   Loss 8.8144   LearningRate 0.0755   Epoch: 2   Global Step: 43780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:43,714-Speed 9924.99 samples/sec   Loss 8.8859   LearningRate 0.0755   Epoch: 2   Global Step: 43790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:44,777-Speed 9638.52 samples/sec   Loss 8.8308   LearningRate 0.0755   Epoch: 2   Global Step: 43800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:45,838-Speed 9662.31 samples/sec   Loss 8.9899   LearningRate 0.0755   Epoch: 2   Global Step: 43810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:46,897-Speed 9679.74 samples/sec   Loss 8.7963   LearningRate 0.0755   Epoch: 2   Global Step: 43820   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:47,950-Speed 9727.61 samples/sec   Loss 8.8820   LearningRate 0.0755   Epoch: 2   Global Step: 43830   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:49,025-Speed 9530.61 samples/sec   Loss 8.7989   LearningRate 0.0755   Epoch: 2   Global Step: 43840   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:50,097-Speed 9559.55 samples/sec   Loss 8.7891   LearningRate 0.0755   Epoch: 2   Global Step: 43850   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:51,206-Speed 9246.44 samples/sec   Loss 8.8128   LearningRate 0.0754   Epoch: 2   Global Step: 43860   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:17:52,280-Speed 9537.64 samples/sec   Loss 8.7883   LearningRate 0.0754   Epoch: 2   Global Step: 43870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:53,308-Speed 9967.36 samples/sec   Loss 8.7888   LearningRate 0.0754   Epoch: 2   Global Step: 43880   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:54,417-Speed 9239.69 samples/sec   Loss 8.9476   LearningRate 0.0754   Epoch: 2   Global Step: 43890   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:55,453-Speed 9881.74 samples/sec   Loss 8.8484   LearningRate 0.0754   Epoch: 2   Global Step: 43900   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:56,507-Speed 9727.64 samples/sec   Loss 8.7750   LearningRate 0.0754   Epoch: 2   Global Step: 43910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:57,540-Speed 9913.31 samples/sec   Loss 8.8479   LearningRate 0.0754   Epoch: 2   Global Step: 43920   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:58,569-Speed 9961.00 samples/sec   Loss 8.7917   LearningRate 0.0754   Epoch: 2   Global Step: 43930   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:17:59,621-Speed 9735.38 samples/sec   Loss 8.8305   LearningRate 0.0754   Epoch: 2   Global Step: 43940   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:18:00,707-Speed 9437.25 samples/sec   Loss 8.7624   LearningRate 0.0754   Epoch: 2   Global Step: 43950   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:18:01,787-Speed 9489.31 samples/sec   Loss 8.7394   LearningRate 0.0754   Epoch: 2   Global Step: 43960   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:18:02,874-Speed 9423.26 samples/sec   Loss 8.8629   LearningRate 0.0754   Epoch: 2   Global Step: 43970   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:18:03,962-Speed 9423.30 samples/sec   Loss 8.7772   LearningRate 0.0754   Epoch: 2   Global Step: 43980   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:18:05,090-Speed 9077.65 samples/sec   Loss 8.7716   LearningRate 0.0754   Epoch: 2   Global Step: 43990   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:18:06,170-Speed 9492.68 samples/sec   Loss 8.6962   LearningRate 0.0754   Epoch: 2   Global Step: 44000   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:18:27,953-[lfw][44000]XNorm: 13.420222
Training: 2022-04-11 13:18:27,953-[lfw][44000]Accuracy-Flip: 0.99467+-0.00233
Training: 2022-04-11 13:18:27,954-[lfw][44000]Accuracy-Highest: 0.99533
Training: 2022-04-11 13:18:53,142-[cfp_fp][44000]XNorm: 11.165973
Training: 2022-04-11 13:18:53,143-[cfp_fp][44000]Accuracy-Flip: 0.94700+-0.00946
Training: 2022-04-11 13:18:53,144-[cfp_fp][44000]Accuracy-Highest: 0.94700
Training: 2022-04-11 13:19:14,870-[agedb_30][44000]XNorm: 12.899060
Training: 2022-04-11 13:19:14,871-[agedb_30][44000]Accuracy-Flip: 0.95117+-0.01057
Training: 2022-04-11 13:19:14,872-[agedb_30][44000]Accuracy-Highest: 0.95333
Training: 2022-04-11 13:19:15,919-Speed 146.81 samples/sec   Loss 8.9209   LearningRate 0.0754   Epoch: 2   Global Step: 44010   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:19:16,970-Speed 9749.72 samples/sec   Loss 8.7271   LearningRate 0.0754   Epoch: 2   Global Step: 44020   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:19:18,090-Speed 9146.68 samples/sec   Loss 8.8448   LearningRate 0.0754   Epoch: 2   Global Step: 44030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:19,187-Speed 9334.14 samples/sec   Loss 8.8155   LearningRate 0.0754   Epoch: 2   Global Step: 44040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:20,259-Speed 9555.58 samples/sec   Loss 8.7834   LearningRate 0.0753   Epoch: 2   Global Step: 44050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:21,342-Speed 9460.41 samples/sec   Loss 8.7854   LearningRate 0.0753   Epoch: 2   Global Step: 44060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:22,417-Speed 9531.08 samples/sec   Loss 8.8808   LearningRate 0.0753   Epoch: 2   Global Step: 44070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:23,497-Speed 9488.12 samples/sec   Loss 8.8473   LearningRate 0.0753   Epoch: 2   Global Step: 44080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:24,605-Speed 9251.73 samples/sec   Loss 8.7933   LearningRate 0.0753   Epoch: 2   Global Step: 44090   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:25,636-Speed 9935.47 samples/sec   Loss 8.8326   LearningRate 0.0753   Epoch: 2   Global Step: 44100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:26,701-Speed 9621.19 samples/sec   Loss 8.9323   LearningRate 0.0753   Epoch: 2   Global Step: 44110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:27,751-Speed 9756.41 samples/sec   Loss 8.7485   LearningRate 0.0753   Epoch: 2   Global Step: 44120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:28,836-Speed 9448.75 samples/sec   Loss 8.8605   LearningRate 0.0753   Epoch: 2   Global Step: 44130   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:19:29,878-Speed 9833.47 samples/sec   Loss 8.9039   LearningRate 0.0753   Epoch: 2   Global Step: 44140   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:19:30,941-Speed 9636.88 samples/sec   Loss 8.7672   LearningRate 0.0753   Epoch: 2   Global Step: 44150   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:19:32,045-Speed 9277.82 samples/sec   Loss 8.7393   LearningRate 0.0753   Epoch: 2   Global Step: 44160   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:19:33,115-Speed 9578.76 samples/sec   Loss 8.8539   LearningRate 0.0753   Epoch: 2   Global Step: 44170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:34,183-Speed 9597.73 samples/sec   Loss 8.7441   LearningRate 0.0753   Epoch: 2   Global Step: 44180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:35,258-Speed 9532.01 samples/sec   Loss 8.7095   LearningRate 0.0753   Epoch: 2   Global Step: 44190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:36,309-Speed 9745.00 samples/sec   Loss 8.7469   LearningRate 0.0753   Epoch: 2   Global Step: 44200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:37,339-Speed 9948.98 samples/sec   Loss 8.7673   LearningRate 0.0753   Epoch: 2   Global Step: 44210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:38,404-Speed 9619.98 samples/sec   Loss 8.8417   LearningRate 0.0753   Epoch: 2   Global Step: 44220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:39,468-Speed 9629.76 samples/sec   Loss 8.7360   LearningRate 0.0753   Epoch: 2   Global Step: 44230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:40,530-Speed 9647.05 samples/sec   Loss 8.8016   LearningRate 0.0753   Epoch: 2   Global Step: 44240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:41,563-Speed 9923.33 samples/sec   Loss 8.7889   LearningRate 0.0752   Epoch: 2   Global Step: 44250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:42,605-Speed 9836.66 samples/sec   Loss 8.8353   LearningRate 0.0752   Epoch: 2   Global Step: 44260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:43,668-Speed 9639.52 samples/sec   Loss 8.9564   LearningRate 0.0752   Epoch: 2   Global Step: 44270   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:19:44,732-Speed 9626.68 samples/sec   Loss 8.7940   LearningRate 0.0752   Epoch: 2   Global Step: 44280   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:19:45,784-Speed 9741.16 samples/sec   Loss 8.9046   LearningRate 0.0752   Epoch: 2   Global Step: 44290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:46,884-Speed 9312.18 samples/sec   Loss 8.8050   LearningRate 0.0752   Epoch: 2   Global Step: 44300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:48,000-Speed 9182.60 samples/sec   Loss 8.7870   LearningRate 0.0752   Epoch: 2   Global Step: 44310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:49,040-Speed 9856.04 samples/sec   Loss 8.9060   LearningRate 0.0752   Epoch: 2   Global Step: 44320   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:50,136-Speed 9348.43 samples/sec   Loss 8.9373   LearningRate 0.0752   Epoch: 2   Global Step: 44330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:51,214-Speed 9504.01 samples/sec   Loss 8.8446   LearningRate 0.0752   Epoch: 2   Global Step: 44340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:52,316-Speed 9298.03 samples/sec   Loss 8.7975   LearningRate 0.0752   Epoch: 2   Global Step: 44350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:53,376-Speed 9661.81 samples/sec   Loss 8.8676   LearningRate 0.0752   Epoch: 2   Global Step: 44360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:54,453-Speed 9517.47 samples/sec   Loss 8.8233   LearningRate 0.0752   Epoch: 2   Global Step: 44370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:55,592-Speed 8994.95 samples/sec   Loss 8.8220   LearningRate 0.0752   Epoch: 2   Global Step: 44380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:56,673-Speed 9476.07 samples/sec   Loss 8.7806   LearningRate 0.0752   Epoch: 2   Global Step: 44390   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:19:57,737-Speed 9639.75 samples/sec   Loss 8.8816   LearningRate 0.0752   Epoch: 2   Global Step: 44400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:58,848-Speed 9219.63 samples/sec   Loss 8.9045   LearningRate 0.0752   Epoch: 2   Global Step: 44410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:19:59,952-Speed 9282.08 samples/sec   Loss 8.8531   LearningRate 0.0752   Epoch: 2   Global Step: 44420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:01,012-Speed 9667.49 samples/sec   Loss 8.9604   LearningRate 0.0752   Epoch: 2   Global Step: 44430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:02,050-Speed 9865.46 samples/sec   Loss 8.7230   LearningRate 0.0751   Epoch: 2   Global Step: 44440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:03,134-Speed 9453.44 samples/sec   Loss 8.8121   LearningRate 0.0751   Epoch: 2   Global Step: 44450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:04,202-Speed 9595.80 samples/sec   Loss 8.8299   LearningRate 0.0751   Epoch: 2   Global Step: 44460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:05,266-Speed 9631.67 samples/sec   Loss 8.7917   LearningRate 0.0751   Epoch: 2   Global Step: 44470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:06,380-Speed 9196.98 samples/sec   Loss 8.7676   LearningRate 0.0751   Epoch: 2   Global Step: 44480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:07,423-Speed 9822.86 samples/sec   Loss 8.6792   LearningRate 0.0751   Epoch: 2   Global Step: 44490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:08,506-Speed 9459.64 samples/sec   Loss 8.7228   LearningRate 0.0751   Epoch: 2   Global Step: 44500   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:20:09,608-Speed 9297.47 samples/sec   Loss 8.8836   LearningRate 0.0751   Epoch: 2   Global Step: 44510   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:20:10,696-Speed 9417.89 samples/sec   Loss 8.7486   LearningRate 0.0751   Epoch: 2   Global Step: 44520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:11,738-Speed 9840.03 samples/sec   Loss 8.8404   LearningRate 0.0751   Epoch: 2   Global Step: 44530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:12,827-Speed 9403.22 samples/sec   Loss 8.7630   LearningRate 0.0751   Epoch: 2   Global Step: 44540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:13,922-Speed 9358.90 samples/sec   Loss 8.6991   LearningRate 0.0751   Epoch: 2   Global Step: 44550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:15,000-Speed 9505.06 samples/sec   Loss 8.8255   LearningRate 0.0751   Epoch: 2   Global Step: 44560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:16,078-Speed 9502.04 samples/sec   Loss 8.6120   LearningRate 0.0751   Epoch: 2   Global Step: 44570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:17,180-Speed 9305.15 samples/sec   Loss 8.7503   LearningRate 0.0751   Epoch: 2   Global Step: 44580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:18,232-Speed 9744.20 samples/sec   Loss 8.8107   LearningRate 0.0751   Epoch: 2   Global Step: 44590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:19,299-Speed 9603.07 samples/sec   Loss 8.8865   LearningRate 0.0751   Epoch: 2   Global Step: 44600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:20,380-Speed 9475.96 samples/sec   Loss 8.7177   LearningRate 0.0751   Epoch: 2   Global Step: 44610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:21,438-Speed 9685.13 samples/sec   Loss 8.7901   LearningRate 0.0751   Epoch: 2   Global Step: 44620   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:20:22,471-Speed 9917.24 samples/sec   Loss 8.8259   LearningRate 0.0750   Epoch: 2   Global Step: 44630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:23,565-Speed 9368.19 samples/sec   Loss 8.8098   LearningRate 0.0750   Epoch: 2   Global Step: 44640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:24,665-Speed 9313.90 samples/sec   Loss 8.7540   LearningRate 0.0750   Epoch: 2   Global Step: 44650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:25,731-Speed 9613.74 samples/sec   Loss 8.8423   LearningRate 0.0750   Epoch: 2   Global Step: 44660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:26,786-Speed 9712.62 samples/sec   Loss 8.7207   LearningRate 0.0750   Epoch: 2   Global Step: 44670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:27,867-Speed 9472.33 samples/sec   Loss 8.7951   LearningRate 0.0750   Epoch: 2   Global Step: 44680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:28,988-Speed 9146.11 samples/sec   Loss 8.9003   LearningRate 0.0750   Epoch: 2   Global Step: 44690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:30,045-Speed 9685.44 samples/sec   Loss 8.8927   LearningRate 0.0750   Epoch: 2   Global Step: 44700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:31,137-Speed 9386.37 samples/sec   Loss 8.8176   LearningRate 0.0750   Epoch: 2   Global Step: 44710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:32,246-Speed 9234.17 samples/sec   Loss 8.8114   LearningRate 0.0750   Epoch: 2   Global Step: 44720   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:33,354-Speed 9248.09 samples/sec   Loss 8.7559   LearningRate 0.0750   Epoch: 2   Global Step: 44730   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:20:34,458-Speed 9287.45 samples/sec   Loss 8.7448   LearningRate 0.0750   Epoch: 2   Global Step: 44740   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:20:35,559-Speed 9305.89 samples/sec   Loss 8.7844   LearningRate 0.0750   Epoch: 2   Global Step: 44750   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:20:36,642-Speed 9464.69 samples/sec   Loss 8.6456   LearningRate 0.0750   Epoch: 2   Global Step: 44760   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:20:37,721-Speed 9493.23 samples/sec   Loss 8.7767   LearningRate 0.0750   Epoch: 2   Global Step: 44770   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:20:38,762-Speed 9842.32 samples/sec   Loss 8.7308   LearningRate 0.0750   Epoch: 2   Global Step: 44780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:39,785-Speed 10011.60 samples/sec   Loss 8.8490   LearningRate 0.0750   Epoch: 2   Global Step: 44790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:40,847-Speed 9654.20 samples/sec   Loss 8.6208   LearningRate 0.0750   Epoch: 2   Global Step: 44800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:41,949-Speed 9298.66 samples/sec   Loss 8.7948   LearningRate 0.0750   Epoch: 2   Global Step: 44810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:43,023-Speed 9531.73 samples/sec   Loss 8.8818   LearningRate 0.0749   Epoch: 2   Global Step: 44820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:44,133-Speed 9232.57 samples/sec   Loss 8.9192   LearningRate 0.0749   Epoch: 2   Global Step: 44830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:45,223-Speed 9403.19 samples/sec   Loss 8.7029   LearningRate 0.0749   Epoch: 2   Global Step: 44840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:46,295-Speed 9556.37 samples/sec   Loss 8.8851   LearningRate 0.0749   Epoch: 2   Global Step: 44850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:47,362-Speed 9602.39 samples/sec   Loss 8.7453   LearningRate 0.0749   Epoch: 2   Global Step: 44860   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:48,477-Speed 9192.46 samples/sec   Loss 8.9138   LearningRate 0.0749   Epoch: 2   Global Step: 44870   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:20:49,536-Speed 9668.74 samples/sec   Loss 8.7085   LearningRate 0.0749   Epoch: 2   Global Step: 44880   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:20:50,610-Speed 9543.82 samples/sec   Loss 8.7363   LearningRate 0.0749   Epoch: 2   Global Step: 44890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:20:51,657-Speed 9785.70 samples/sec   Loss 8.7288   LearningRate 0.0749   Epoch: 2   Global Step: 44900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:20:52,717-Speed 9667.20 samples/sec   Loss 8.6731   LearningRate 0.0749   Epoch: 2   Global Step: 44910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:20:53,755-Speed 9874.55 samples/sec   Loss 8.6950   LearningRate 0.0749   Epoch: 2   Global Step: 44920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:20:54,842-Speed 9425.82 samples/sec   Loss 8.8036   LearningRate 0.0749   Epoch: 2   Global Step: 44930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:20:55,916-Speed 9538.35 samples/sec   Loss 8.8400   LearningRate 0.0749   Epoch: 2   Global Step: 44940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:20:56,995-Speed 9496.32 samples/sec   Loss 8.8288   LearningRate 0.0749   Epoch: 2   Global Step: 44950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:20:58,067-Speed 9561.72 samples/sec   Loss 8.7994   LearningRate 0.0749   Epoch: 2   Global Step: 44960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:20:59,166-Speed 9327.21 samples/sec   Loss 8.8697   LearningRate 0.0749   Epoch: 2   Global Step: 44970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:21:00,217-Speed 9740.32 samples/sec   Loss 8.9072   LearningRate 0.0749   Epoch: 2   Global Step: 44980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:21:01,250-Speed 9917.74 samples/sec   Loss 8.7040   LearningRate 0.0749   Epoch: 2   Global Step: 44990   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:02,304-Speed 9724.69 samples/sec   Loss 8.8022   LearningRate 0.0749   Epoch: 2   Global Step: 45000   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:03,341-Speed 9878.92 samples/sec   Loss 8.7983   LearningRate 0.0749   Epoch: 2   Global Step: 45010   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:04,444-Speed 9295.58 samples/sec   Loss 8.6992   LearningRate 0.0748   Epoch: 2   Global Step: 45020   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:05,522-Speed 9503.89 samples/sec   Loss 8.7396   LearningRate 0.0748   Epoch: 2   Global Step: 45030   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:06,588-Speed 9610.05 samples/sec   Loss 8.7864   LearningRate 0.0748   Epoch: 2   Global Step: 45040   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:07,652-Speed 9623.22 samples/sec   Loss 8.7108   LearningRate 0.0748   Epoch: 2   Global Step: 45050   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:08,709-Speed 9699.51 samples/sec   Loss 8.7316   LearningRate 0.0748   Epoch: 2   Global Step: 45060   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:09,797-Speed 9414.00 samples/sec   Loss 8.7570   LearningRate 0.0748   Epoch: 2   Global Step: 45070   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:10,870-Speed 9553.86 samples/sec   Loss 8.7483   LearningRate 0.0748   Epoch: 2   Global Step: 45080   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:11,938-Speed 9592.30 samples/sec   Loss 8.6949   LearningRate 0.0748   Epoch: 2   Global Step: 45090   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:21:12,983-Speed 9809.54 samples/sec   Loss 8.7839   LearningRate 0.0748   Epoch: 2   Global Step: 45100   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:21:14,099-Speed 9179.26 samples/sec   Loss 8.7972   LearningRate 0.0748   Epoch: 2   Global Step: 45110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:15,193-Speed 9366.42 samples/sec   Loss 8.6970   LearningRate 0.0748   Epoch: 2   Global Step: 45120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:16,243-Speed 9752.55 samples/sec   Loss 8.7498   LearningRate 0.0748   Epoch: 2   Global Step: 45130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:17,320-Speed 9522.50 samples/sec   Loss 8.6524   LearningRate 0.0748   Epoch: 2   Global Step: 45140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:18,406-Speed 9434.92 samples/sec   Loss 8.7273   LearningRate 0.0748   Epoch: 2   Global Step: 45150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:19,488-Speed 9468.54 samples/sec   Loss 8.6519   LearningRate 0.0748   Epoch: 2   Global Step: 45160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:20,575-Speed 9425.63 samples/sec   Loss 8.8382   LearningRate 0.0748   Epoch: 2   Global Step: 45170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:21,635-Speed 9658.00 samples/sec   Loss 8.8214   LearningRate 0.0748   Epoch: 2   Global Step: 45180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:22,724-Speed 9410.29 samples/sec   Loss 8.8231   LearningRate 0.0748   Epoch: 2   Global Step: 45190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:23,769-Speed 9803.21 samples/sec   Loss 8.8637   LearningRate 0.0748   Epoch: 2   Global Step: 45200   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:24,847-Speed 9510.65 samples/sec   Loss 8.8781   LearningRate 0.0747   Epoch: 2   Global Step: 45210   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:25,895-Speed 9771.84 samples/sec   Loss 8.8383   LearningRate 0.0747   Epoch: 2   Global Step: 45220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:27,006-Speed 9217.52 samples/sec   Loss 8.6908   LearningRate 0.0747   Epoch: 2   Global Step: 45230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:28,113-Speed 9260.03 samples/sec   Loss 8.8400   LearningRate 0.0747   Epoch: 2   Global Step: 45240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:29,168-Speed 9720.72 samples/sec   Loss 8.7119   LearningRate 0.0747   Epoch: 2   Global Step: 45250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:30,235-Speed 9598.49 samples/sec   Loss 8.7656   LearningRate 0.0747   Epoch: 2   Global Step: 45260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:31,342-Speed 9262.26 samples/sec   Loss 8.7184   LearningRate 0.0747   Epoch: 2   Global Step: 45270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:32,375-Speed 9911.25 samples/sec   Loss 8.7845   LearningRate 0.0747   Epoch: 2   Global Step: 45280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:33,431-Speed 9706.28 samples/sec   Loss 8.7855   LearningRate 0.0747   Epoch: 2   Global Step: 45290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:34,525-Speed 9368.83 samples/sec   Loss 8.6352   LearningRate 0.0747   Epoch: 2   Global Step: 45300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:35,560-Speed 9901.41 samples/sec   Loss 8.8178   LearningRate 0.0747   Epoch: 2   Global Step: 45310   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:21:36,620-Speed 9658.34 samples/sec   Loss 8.6910   LearningRate 0.0747   Epoch: 2   Global Step: 45320   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:21:37,678-Speed 9690.68 samples/sec   Loss 8.7396   LearningRate 0.0747   Epoch: 2   Global Step: 45330   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:21:38,739-Speed 9655.60 samples/sec   Loss 8.6392   LearningRate 0.0747   Epoch: 2   Global Step: 45340   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:21:39,798-Speed 9675.71 samples/sec   Loss 8.7956   LearningRate 0.0747   Epoch: 2   Global Step: 45350   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:21:40,871-Speed 9547.33 samples/sec   Loss 8.7834   LearningRate 0.0747   Epoch: 2   Global Step: 45360   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:21:41,973-Speed 9295.59 samples/sec   Loss 8.8126   LearningRate 0.0747   Epoch: 2   Global Step: 45370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:43,016-Speed 9820.39 samples/sec   Loss 8.7409   LearningRate 0.0747   Epoch: 2   Global Step: 45380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:44,069-Speed 9735.33 samples/sec   Loss 8.8166   LearningRate 0.0747   Epoch: 2   Global Step: 45390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:45,120-Speed 9743.62 samples/sec   Loss 8.7521   LearningRate 0.0746   Epoch: 2   Global Step: 45400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:46,161-Speed 9843.68 samples/sec   Loss 8.8527   LearningRate 0.0746   Epoch: 2   Global Step: 45410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:47,183-Speed 10034.77 samples/sec   Loss 8.6200   LearningRate 0.0746   Epoch: 2   Global Step: 45420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:48,219-Speed 9891.64 samples/sec   Loss 8.7878   LearningRate 0.0746   Epoch: 2   Global Step: 45430   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:49,279-Speed 9664.69 samples/sec   Loss 8.7301   LearningRate 0.0746   Epoch: 2   Global Step: 45440   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:50,381-Speed 9294.42 samples/sec   Loss 8.7245   LearningRate 0.0746   Epoch: 2   Global Step: 45450   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:51,484-Speed 9288.74 samples/sec   Loss 8.8085   LearningRate 0.0746   Epoch: 2   Global Step: 45460   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:52,549-Speed 9622.64 samples/sec   Loss 8.7663   LearningRate 0.0746   Epoch: 2   Global Step: 45470   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:53,624-Speed 9527.49 samples/sec   Loss 8.7546   LearningRate 0.0746   Epoch: 2   Global Step: 45480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:54,724-Speed 9317.98 samples/sec   Loss 8.7000   LearningRate 0.0746   Epoch: 2   Global Step: 45490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:55,766-Speed 9835.97 samples/sec   Loss 8.6527   LearningRate 0.0746   Epoch: 2   Global Step: 45500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:56,847-Speed 9473.42 samples/sec   Loss 8.9544   LearningRate 0.0746   Epoch: 2   Global Step: 45510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:57,898-Speed 9749.95 samples/sec   Loss 8.7188   LearningRate 0.0746   Epoch: 2   Global Step: 45520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:21:58,951-Speed 9730.14 samples/sec   Loss 8.6906   LearningRate 0.0746   Epoch: 2   Global Step: 45530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:00,011-Speed 9670.28 samples/sec   Loss 8.8114   LearningRate 0.0746   Epoch: 2   Global Step: 45540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:01,076-Speed 9623.36 samples/sec   Loss 8.7845   LearningRate 0.0746   Epoch: 2   Global Step: 45550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:02,145-Speed 9582.74 samples/sec   Loss 8.8469   LearningRate 0.0746   Epoch: 2   Global Step: 45560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:03,168-Speed 10009.02 samples/sec   Loss 8.7908   LearningRate 0.0746   Epoch: 2   Global Step: 45570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:04,198-Speed 9951.26 samples/sec   Loss 8.8896   LearningRate 0.0746   Epoch: 2   Global Step: 45580   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:05,274-Speed 9523.44 samples/sec   Loss 8.7742   LearningRate 0.0746   Epoch: 2   Global Step: 45590   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:06,330-Speed 9707.82 samples/sec   Loss 8.6579   LearningRate 0.0745   Epoch: 2   Global Step: 45600   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:07,364-Speed 9910.47 samples/sec   Loss 8.6794   LearningRate 0.0745   Epoch: 2   Global Step: 45610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:08,483-Speed 9158.95 samples/sec   Loss 8.7770   LearningRate 0.0745   Epoch: 2   Global Step: 45620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:09,548-Speed 9611.87 samples/sec   Loss 8.7659   LearningRate 0.0745   Epoch: 2   Global Step: 45630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:10,622-Speed 9545.25 samples/sec   Loss 8.7466   LearningRate 0.0745   Epoch: 2   Global Step: 45640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:11,693-Speed 9566.86 samples/sec   Loss 8.6494   LearningRate 0.0745   Epoch: 2   Global Step: 45650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:12,741-Speed 9779.72 samples/sec   Loss 8.7305   LearningRate 0.0745   Epoch: 2   Global Step: 45660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:13,773-Speed 9923.39 samples/sec   Loss 8.7280   LearningRate 0.0745   Epoch: 2   Global Step: 45670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:14,839-Speed 9610.32 samples/sec   Loss 8.6897   LearningRate 0.0745   Epoch: 2   Global Step: 45680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:15,885-Speed 9797.15 samples/sec   Loss 8.8029   LearningRate 0.0745   Epoch: 2   Global Step: 45690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:22:16,925-Speed 9854.83 samples/sec   Loss 8.8452   LearningRate 0.0745   Epoch: 2   Global Step: 45700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:17,986-Speed 9654.66 samples/sec   Loss 8.6930   LearningRate 0.0745   Epoch: 2   Global Step: 45710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:19,013-Speed 9980.32 samples/sec   Loss 8.5679   LearningRate 0.0745   Epoch: 2   Global Step: 45720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:20,048-Speed 9903.65 samples/sec   Loss 8.5868   LearningRate 0.0745   Epoch: 2   Global Step: 45730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:21,081-Speed 9917.35 samples/sec   Loss 8.8030   LearningRate 0.0745   Epoch: 2   Global Step: 45740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:22,161-Speed 9488.83 samples/sec   Loss 8.7530   LearningRate 0.0745   Epoch: 2   Global Step: 45750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:23,183-Speed 10017.97 samples/sec   Loss 8.7187   LearningRate 0.0745   Epoch: 2   Global Step: 45760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:24,238-Speed 9711.65 samples/sec   Loss 8.7876   LearningRate 0.0745   Epoch: 2   Global Step: 45770   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:22:25,297-Speed 9678.92 samples/sec   Loss 8.7119   LearningRate 0.0745   Epoch: 2   Global Step: 45780   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:22:26,337-Speed 9852.22 samples/sec   Loss 8.6477   LearningRate 0.0744   Epoch: 2   Global Step: 45790   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:22:27,406-Speed 9584.38 samples/sec   Loss 8.6746   LearningRate 0.0744   Epoch: 2   Global Step: 45800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:28,475-Speed 9587.70 samples/sec   Loss 8.7528   LearningRate 0.0744   Epoch: 2   Global Step: 45810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:29,507-Speed 9930.66 samples/sec   Loss 8.7212   LearningRate 0.0744   Epoch: 2   Global Step: 45820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:30,569-Speed 9648.73 samples/sec   Loss 8.7610   LearningRate 0.0744   Epoch: 2   Global Step: 45830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:31,642-Speed 9552.89 samples/sec   Loss 8.7244   LearningRate 0.0744   Epoch: 2   Global Step: 45840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:32,717-Speed 9525.60 samples/sec   Loss 8.7731   LearningRate 0.0744   Epoch: 2   Global Step: 45850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:33,798-Speed 9479.02 samples/sec   Loss 8.6557   LearningRate 0.0744   Epoch: 2   Global Step: 45860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:34,879-Speed 9486.23 samples/sec   Loss 8.7029   LearningRate 0.0744   Epoch: 2   Global Step: 45870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:35,948-Speed 9580.87 samples/sec   Loss 8.7144   LearningRate 0.0744   Epoch: 2   Global Step: 45880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:37,020-Speed 9560.51 samples/sec   Loss 8.7679   LearningRate 0.0744   Epoch: 2   Global Step: 45890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:38,113-Speed 9374.64 samples/sec   Loss 8.7744   LearningRate 0.0744   Epoch: 2   Global Step: 45900   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:22:39,171-Speed 9681.67 samples/sec   Loss 8.7900   LearningRate 0.0744   Epoch: 2   Global Step: 45910   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:22:40,197-Speed 9993.52 samples/sec   Loss 8.8409   LearningRate 0.0744   Epoch: 2   Global Step: 45920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:41,261-Speed 9624.96 samples/sec   Loss 8.6911   LearningRate 0.0744   Epoch: 2   Global Step: 45930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:42,303-Speed 9832.19 samples/sec   Loss 8.8016   LearningRate 0.0744   Epoch: 2   Global Step: 45940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:43,409-Speed 9260.98 samples/sec   Loss 8.8750   LearningRate 0.0744   Epoch: 2   Global Step: 45950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:44,478-Speed 9591.73 samples/sec   Loss 8.7124   LearningRate 0.0744   Epoch: 2   Global Step: 45960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:45,526-Speed 9774.32 samples/sec   Loss 8.7002   LearningRate 0.0744   Epoch: 2   Global Step: 45970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:46,599-Speed 9549.02 samples/sec   Loss 8.6904   LearningRate 0.0743   Epoch: 2   Global Step: 45980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:47,649-Speed 9768.94 samples/sec   Loss 8.8401   LearningRate 0.0743   Epoch: 2   Global Step: 45990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:22:48,715-Speed 9607.99 samples/sec   Loss 8.8528   LearningRate 0.0743   Epoch: 2   Global Step: 46000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:23:10,955-[lfw][46000]XNorm: 13.305444
Training: 2022-04-11 13:23:10,956-[lfw][46000]Accuracy-Flip: 0.99583+-0.00171
Training: 2022-04-11 13:23:10,956-[lfw][46000]Accuracy-Highest: 0.99583
Training: 2022-04-11 13:23:36,540-[cfp_fp][46000]XNorm: 11.134653
Training: 2022-04-11 13:23:36,541-[cfp_fp][46000]Accuracy-Flip: 0.94143+-0.01166
Training: 2022-04-11 13:23:36,541-[cfp_fp][46000]Accuracy-Highest: 0.94700
Training: 2022-04-11 13:23:58,484-[agedb_30][46000]XNorm: 12.852607
Training: 2022-04-11 13:23:58,485-[agedb_30][46000]Accuracy-Flip: 0.95483+-0.01104
Training: 2022-04-11 13:23:58,485-[agedb_30][46000]Accuracy-Highest: 0.95483
Training: 2022-04-11 13:23:59,581-Speed 144.50 samples/sec   Loss 8.6357   LearningRate 0.0743   Epoch: 2   Global Step: 46010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:24:00,631-Speed 9752.74 samples/sec   Loss 8.8439   LearningRate 0.0743   Epoch: 2   Global Step: 46020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:24:01,716-Speed 9443.63 samples/sec   Loss 8.6035   LearningRate 0.0743   Epoch: 2   Global Step: 46030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:24:02,762-Speed 9799.53 samples/sec   Loss 8.6714   LearningRate 0.0743   Epoch: 2   Global Step: 46040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:24:03,794-Speed 9928.76 samples/sec   Loss 8.7012   LearningRate 0.0743   Epoch: 2   Global Step: 46050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:24:04,828-Speed 9910.06 samples/sec   Loss 8.7175   LearningRate 0.0743   Epoch: 2   Global Step: 46060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:24:05,910-Speed 9471.92 samples/sec   Loss 8.7578   LearningRate 0.0743   Epoch: 2   Global Step: 46070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:24:06,979-Speed 9581.86 samples/sec   Loss 8.6313   LearningRate 0.0743   Epoch: 2   Global Step: 46080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:24:08,029-Speed 9754.47 samples/sec   Loss 8.8264   LearningRate 0.0743   Epoch: 2   Global Step: 46090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-11 13:24:09,089-Speed 9671.34 samples/sec   Loss 8.5590   LearningRate 0.0743   Epoch: 2   Global Step: 46100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:10,143-Speed 9716.89 samples/sec   Loss 8.7519   LearningRate 0.0743   Epoch: 2   Global Step: 46110   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:11,206-Speed 9638.85 samples/sec   Loss 8.7330   LearningRate 0.0743   Epoch: 2   Global Step: 46120   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:12,317-Speed 9222.97 samples/sec   Loss 8.6139   LearningRate 0.0743   Epoch: 2   Global Step: 46130   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:13,437-Speed 9144.43 samples/sec   Loss 8.7355   LearningRate 0.0743   Epoch: 2   Global Step: 46140   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:14,530-Speed 9378.56 samples/sec   Loss 8.6501   LearningRate 0.0743   Epoch: 2   Global Step: 46150   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:15,618-Speed 9414.86 samples/sec   Loss 8.6987   LearningRate 0.0743   Epoch: 2   Global Step: 46160   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:16,681-Speed 9645.79 samples/sec   Loss 8.7234   LearningRate 0.0743   Epoch: 2   Global Step: 46170   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:17,752-Speed 9560.77 samples/sec   Loss 8.7590   LearningRate 0.0742   Epoch: 2   Global Step: 46180   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:18,846-Speed 9368.95 samples/sec   Loss 8.7466   LearningRate 0.0742   Epoch: 2   Global Step: 46190   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:19,929-Speed 9463.10 samples/sec   Loss 8.7795   LearningRate 0.0742   Epoch: 2   Global Step: 46200   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:24:21,017-Speed 9413.89 samples/sec   Loss 8.7236   LearningRate 0.0742   Epoch: 2   Global Step: 46210   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:24:22,083-Speed 9613.66 samples/sec   Loss 8.6940   LearningRate 0.0742   Epoch: 2   Global Step: 46220   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:23,185-Speed 9294.40 samples/sec   Loss 8.7953   LearningRate 0.0742   Epoch: 2   Global Step: 46230   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:24,256-Speed 9566.64 samples/sec   Loss 8.7034   LearningRate 0.0742   Epoch: 2   Global Step: 46240   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:25,300-Speed 9816.89 samples/sec   Loss 8.7546   LearningRate 0.0742   Epoch: 2   Global Step: 46250   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:26,411-Speed 9221.02 samples/sec   Loss 8.5990   LearningRate 0.0742   Epoch: 2   Global Step: 46260   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:27,523-Speed 9216.60 samples/sec   Loss 8.7254   LearningRate 0.0742   Epoch: 2   Global Step: 46270   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:28,582-Speed 9673.28 samples/sec   Loss 8.8094   LearningRate 0.0742   Epoch: 2   Global Step: 46280   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:29,626-Speed 9811.85 samples/sec   Loss 8.7973   LearningRate 0.0742   Epoch: 2   Global Step: 46290   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:30,686-Speed 9668.01 samples/sec   Loss 8.7049   LearningRate 0.0742   Epoch: 2   Global Step: 46300   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:31,745-Speed 9670.88 samples/sec   Loss 8.7291   LearningRate 0.0742   Epoch: 2   Global Step: 46310   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:32,831-Speed 9441.02 samples/sec   Loss 8.6252   LearningRate 0.0742   Epoch: 2   Global Step: 46320   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:24:33,887-Speed 9702.69 samples/sec   Loss 8.7548   LearningRate 0.0742   Epoch: 2   Global Step: 46330   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:34,968-Speed 9473.48 samples/sec   Loss 8.7271   LearningRate 0.0742   Epoch: 2   Global Step: 46340   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:36,073-Speed 9275.18 samples/sec   Loss 8.6294   LearningRate 0.0742   Epoch: 2   Global Step: 46350   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:37,138-Speed 9624.50 samples/sec   Loss 8.5904   LearningRate 0.0742   Epoch: 2   Global Step: 46360   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:38,180-Speed 9838.04 samples/sec   Loss 8.6019   LearningRate 0.0741   Epoch: 2   Global Step: 46370   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:39,224-Speed 9811.36 samples/sec   Loss 8.7150   LearningRate 0.0741   Epoch: 2   Global Step: 46380   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:40,303-Speed 9499.45 samples/sec   Loss 8.6764   LearningRate 0.0741   Epoch: 2   Global Step: 46390   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:41,384-Speed 9479.11 samples/sec   Loss 8.6426   LearningRate 0.0741   Epoch: 2   Global Step: 46400   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:42,503-Speed 9157.98 samples/sec   Loss 8.7718   LearningRate 0.0741   Epoch: 2   Global Step: 46410   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:43,553-Speed 9755.28 samples/sec   Loss 8.7260   LearningRate 0.0741   Epoch: 2   Global Step: 46420   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:44,628-Speed 9535.81 samples/sec   Loss 8.5450   LearningRate 0.0741   Epoch: 2   Global Step: 46430   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:24:45,689-Speed 9657.44 samples/sec   Loss 8.7284   LearningRate 0.0741   Epoch: 2   Global Step: 46440   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:24:46,784-Speed 9353.50 samples/sec   Loss 8.7257   LearningRate 0.0741   Epoch: 2   Global Step: 46450   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:24:47,879-Speed 9357.30 samples/sec   Loss 8.5912   LearningRate 0.0741   Epoch: 2   Global Step: 46460   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:24:48,971-Speed 9382.82 samples/sec   Loss 8.7290   LearningRate 0.0741   Epoch: 2   Global Step: 46470   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:24:50,048-Speed 9511.17 samples/sec   Loss 8.7418   LearningRate 0.0741   Epoch: 2   Global Step: 46480   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:51,108-Speed 9666.24 samples/sec   Loss 8.6601   LearningRate 0.0741   Epoch: 2   Global Step: 46490   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:52,161-Speed 9727.27 samples/sec   Loss 8.6692   LearningRate 0.0741   Epoch: 2   Global Step: 46500   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:53,203-Speed 9837.24 samples/sec   Loss 8.7101   LearningRate 0.0741   Epoch: 2   Global Step: 46510   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:54,282-Speed 9490.56 samples/sec   Loss 8.6920   LearningRate 0.0741   Epoch: 2   Global Step: 46520   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:55,353-Speed 9574.27 samples/sec   Loss 8.7327   LearningRate 0.0741   Epoch: 2   Global Step: 46530   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:56,455-Speed 9293.61 samples/sec   Loss 8.7420   LearningRate 0.0741   Epoch: 2   Global Step: 46540   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:57,518-Speed 9642.60 samples/sec   Loss 8.7686   LearningRate 0.0741   Epoch: 2   Global Step: 46550   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:58,592-Speed 9540.11 samples/sec   Loss 8.6922   LearningRate 0.0741   Epoch: 2   Global Step: 46560   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:24:59,668-Speed 9524.79 samples/sec   Loss 8.8519   LearningRate 0.0740   Epoch: 2   Global Step: 46570   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:00,730-Speed 9645.62 samples/sec   Loss 8.7020   LearningRate 0.0740   Epoch: 2   Global Step: 46580   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:25:01,772-Speed 9837.04 samples/sec   Loss 8.6507   LearningRate 0.0740   Epoch: 2   Global Step: 46590   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:25:02,845-Speed 9541.90 samples/sec   Loss 8.6257   LearningRate 0.0740   Epoch: 2   Global Step: 46600   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:25:03,908-Speed 9644.12 samples/sec   Loss 8.7092   LearningRate 0.0740   Epoch: 2   Global Step: 46610   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:25:04,986-Speed 9504.53 samples/sec   Loss 8.7492   LearningRate 0.0740   Epoch: 2   Global Step: 46620   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:06,104-Speed 9161.54 samples/sec   Loss 8.6625   LearningRate 0.0740   Epoch: 2   Global Step: 46630   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:07,205-Speed 9309.36 samples/sec   Loss 8.6244   LearningRate 0.0740   Epoch: 2   Global Step: 46640   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:08,244-Speed 9867.03 samples/sec   Loss 8.8757   LearningRate 0.0740   Epoch: 2   Global Step: 46650   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:09,286-Speed 9831.06 samples/sec   Loss 8.5867   LearningRate 0.0740   Epoch: 2   Global Step: 46660   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:10,340-Speed 9715.17 samples/sec   Loss 8.6006   LearningRate 0.0740   Epoch: 2   Global Step: 46670   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:11,421-Speed 9482.80 samples/sec   Loss 8.7421   LearningRate 0.0740   Epoch: 2   Global Step: 46680   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:12,464-Speed 9821.45 samples/sec   Loss 8.7749   LearningRate 0.0740   Epoch: 2   Global Step: 46690   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:13,559-Speed 9357.83 samples/sec   Loss 8.7510   LearningRate 0.0740   Epoch: 2   Global Step: 46700   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:14,645-Speed 9440.15 samples/sec   Loss 8.7732   LearningRate 0.0740   Epoch: 2   Global Step: 46710   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:15,706-Speed 9656.29 samples/sec   Loss 8.7648   LearningRate 0.0740   Epoch: 2   Global Step: 46720   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:25:16,769-Speed 9638.02 samples/sec   Loss 8.5846   LearningRate 0.0740   Epoch: 2   Global Step: 46730   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:25:17,900-Speed 9061.75 samples/sec   Loss 8.6816   LearningRate 0.0740   Epoch: 2   Global Step: 46740   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:25:18,995-Speed 9351.44 samples/sec   Loss 8.6397   LearningRate 0.0740   Epoch: 2   Global Step: 46750   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:25:20,090-Speed 9361.24 samples/sec   Loss 8.7717   LearningRate 0.0739   Epoch: 2   Global Step: 46760   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:25:21,175-Speed 9445.25 samples/sec   Loss 8.6981   LearningRate 0.0739   Epoch: 2   Global Step: 46770   Fp16 Grad Scale: 262144   Required: 12 hours
Training: 2022-04-11 13:25:22,243-Speed 9588.34 samples/sec   Loss 8.7360   LearningRate 0.0739   Epoch: 2   Global Step: 46780   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:23,330-Speed 9424.98 samples/sec   Loss 8.7308   LearningRate 0.0739   Epoch: 2   Global Step: 46790   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:24,378-Speed 9783.66 samples/sec   Loss 8.7553   LearningRate 0.0739   Epoch: 2   Global Step: 46800   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:25,447-Speed 9578.62 samples/sec   Loss 8.6155   LearningRate 0.0739   Epoch: 2   Global Step: 46810   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:26,472-Speed 9994.49 samples/sec   Loss 8.7316   LearningRate 0.0739   Epoch: 2   Global Step: 46820   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:27,515-Speed 9832.38 samples/sec   Loss 8.7957   LearningRate 0.0739   Epoch: 2   Global Step: 46830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:28,581-Speed 9607.85 samples/sec   Loss 8.6501   LearningRate 0.0739   Epoch: 2   Global Step: 46840   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:29,676-Speed 9358.31 samples/sec   Loss 8.7031   LearningRate 0.0739   Epoch: 2   Global Step: 46850   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-11 13:25:30,763-Speed 9424.28 samples/sec   Loss 8.5608   LearningRate 0.0739   Epoch: 2   Global Step: 46860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:31,866-Speed 9287.54 samples/sec   Loss 8.6613   LearningRate 0.0739   Epoch: 2   Global Step: 46870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:32,958-Speed 9385.59 samples/sec   Loss 8.6814   LearningRate 0.0739   Epoch: 2   Global Step: 46880   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:25:34,090-Speed 9045.96 samples/sec   Loss 8.5878   LearningRate 0.0739   Epoch: 2   Global Step: 46890   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:25:35,174-Speed 9457.19 samples/sec   Loss 8.7130   LearningRate 0.0739   Epoch: 2   Global Step: 46900   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:25:36,301-Speed 9090.59 samples/sec   Loss 8.7463   LearningRate 0.0739   Epoch: 2   Global Step: 46910   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:25:37,393-Speed 9390.01 samples/sec   Loss 8.6399   LearningRate 0.0739   Epoch: 2   Global Step: 46920   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:25:38,450-Speed 9701.96 samples/sec   Loss 8.5601   LearningRate 0.0739   Epoch: 2   Global Step: 46930   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:25:39,515-Speed 9622.70 samples/sec   Loss 8.6885   LearningRate 0.0739   Epoch: 2   Global Step: 46940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:40,593-Speed 9498.59 samples/sec   Loss 8.6525   LearningRate 0.0738   Epoch: 2   Global Step: 46950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:41,652-Speed 9677.36 samples/sec   Loss 8.6925   LearningRate 0.0738   Epoch: 2   Global Step: 46960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:42,729-Speed 9510.48 samples/sec   Loss 8.7086   LearningRate 0.0738   Epoch: 2   Global Step: 46970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:43,833-Speed 9288.14 samples/sec   Loss 8.6497   LearningRate 0.0738   Epoch: 2   Global Step: 46980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:44,912-Speed 9492.20 samples/sec   Loss 8.6237   LearningRate 0.0738   Epoch: 2   Global Step: 46990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:45,968-Speed 9704.06 samples/sec   Loss 8.6669   LearningRate 0.0738   Epoch: 2   Global Step: 47000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:47,042-Speed 9542.92 samples/sec   Loss 8.7187   LearningRate 0.0738   Epoch: 2   Global Step: 47010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:48,112-Speed 9569.79 samples/sec   Loss 8.8249   LearningRate 0.0738   Epoch: 2   Global Step: 47020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:49,160-Speed 9782.20 samples/sec   Loss 8.6180   LearningRate 0.0738   Epoch: 2   Global Step: 47030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:50,222-Speed 9643.24 samples/sec   Loss 8.7391   LearningRate 0.0738   Epoch: 2   Global Step: 47040   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:25:51,310-Speed 9415.76 samples/sec   Loss 8.6799   LearningRate 0.0738   Epoch: 2   Global Step: 47050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:52,409-Speed 9328.43 samples/sec   Loss 8.5987   LearningRate 0.0738   Epoch: 2   Global Step: 47060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:53,540-Speed 9057.40 samples/sec   Loss 8.6794   LearningRate 0.0738   Epoch: 2   Global Step: 47070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:54,628-Speed 9418.13 samples/sec   Loss 8.6305   LearningRate 0.0738   Epoch: 2   Global Step: 47080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:55,727-Speed 9323.16 samples/sec   Loss 8.7785   LearningRate 0.0738   Epoch: 2   Global Step: 47090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:56,804-Speed 9510.59 samples/sec   Loss 8.6437   LearningRate 0.0738   Epoch: 2   Global Step: 47100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:57,898-Speed 9370.64 samples/sec   Loss 8.7383   LearningRate 0.0738   Epoch: 2   Global Step: 47110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:25:58,959-Speed 9655.16 samples/sec   Loss 8.6786   LearningRate 0.0738   Epoch: 2   Global Step: 47120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:00,045-Speed 9440.12 samples/sec   Loss 8.5937   LearningRate 0.0738   Epoch: 2   Global Step: 47130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:01,136-Speed 9387.97 samples/sec   Loss 8.6676   LearningRate 0.0738   Epoch: 2   Global Step: 47140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:02,259-Speed 9121.74 samples/sec   Loss 8.6650   LearningRate 0.0737   Epoch: 2   Global Step: 47150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:03,325-Speed 9616.21 samples/sec   Loss 8.6786   LearningRate 0.0737   Epoch: 2   Global Step: 47160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:04,384-Speed 9677.04 samples/sec   Loss 8.6906   LearningRate 0.0737   Epoch: 2   Global Step: 47170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:05,459-Speed 9530.02 samples/sec   Loss 8.5705   LearningRate 0.0737   Epoch: 2   Global Step: 47180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:06,546-Speed 9423.98 samples/sec   Loss 8.7331   LearningRate 0.0737   Epoch: 2   Global Step: 47190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:07,623-Speed 9515.46 samples/sec   Loss 8.5832   LearningRate 0.0737   Epoch: 2   Global Step: 47200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:08,701-Speed 9501.41 samples/sec   Loss 8.7269   LearningRate 0.0737   Epoch: 2   Global Step: 47210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:09,855-Speed 8884.19 samples/sec   Loss 8.8033   LearningRate 0.0737   Epoch: 2   Global Step: 47220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:10,924-Speed 9582.13 samples/sec   Loss 8.7055   LearningRate 0.0737   Epoch: 2   Global Step: 47230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:12,008-Speed 9453.53 samples/sec   Loss 8.8053   LearningRate 0.0737   Epoch: 2   Global Step: 47240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:13,098-Speed 9391.98 samples/sec   Loss 8.7512   LearningRate 0.0737   Epoch: 2   Global Step: 47250   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:26:14,174-Speed 9524.71 samples/sec   Loss 8.7874   LearningRate 0.0737   Epoch: 2   Global Step: 47260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:15,209-Speed 9903.86 samples/sec   Loss 8.7318   LearningRate 0.0737   Epoch: 2   Global Step: 47270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:16,259-Speed 9764.14 samples/sec   Loss 8.6651   LearningRate 0.0737   Epoch: 2   Global Step: 47280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:17,314-Speed 9710.37 samples/sec   Loss 8.7815   LearningRate 0.0737   Epoch: 2   Global Step: 47290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:18,419-Speed 9266.33 samples/sec   Loss 8.5583   LearningRate 0.0737   Epoch: 2   Global Step: 47300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:19,488-Speed 9585.23 samples/sec   Loss 8.7416   LearningRate 0.0737   Epoch: 2   Global Step: 47310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:20,584-Speed 9348.49 samples/sec   Loss 8.7142   LearningRate 0.0737   Epoch: 2   Global Step: 47320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:21,699-Speed 9189.62 samples/sec   Loss 8.6408   LearningRate 0.0737   Epoch: 2   Global Step: 47330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:22,768-Speed 9585.13 samples/sec   Loss 8.7605   LearningRate 0.0736   Epoch: 2   Global Step: 47340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:23,811-Speed 9823.79 samples/sec   Loss 8.7864   LearningRate 0.0736   Epoch: 2   Global Step: 47350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:24,866-Speed 9710.90 samples/sec   Loss 8.7204   LearningRate 0.0736   Epoch: 2   Global Step: 47360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:25,918-Speed 9741.58 samples/sec   Loss 8.6229   LearningRate 0.0736   Epoch: 2   Global Step: 47370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:26,986-Speed 9591.81 samples/sec   Loss 8.7811   LearningRate 0.0736   Epoch: 2   Global Step: 47380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:28,063-Speed 9513.97 samples/sec   Loss 8.7027   LearningRate 0.0736   Epoch: 2   Global Step: 47390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:29,154-Speed 9393.47 samples/sec   Loss 8.7529   LearningRate 0.0736   Epoch: 2   Global Step: 47400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:30,274-Speed 9142.25 samples/sec   Loss 8.6324   LearningRate 0.0736   Epoch: 2   Global Step: 47410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:31,367-Speed 9379.66 samples/sec   Loss 8.6475   LearningRate 0.0736   Epoch: 2   Global Step: 47420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:32,433-Speed 9611.67 samples/sec   Loss 8.6540   LearningRate 0.0736   Epoch: 2   Global Step: 47430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:33,511-Speed 9509.10 samples/sec   Loss 8.5979   LearningRate 0.0736   Epoch: 2   Global Step: 47440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:34,551-Speed 9851.39 samples/sec   Loss 8.7277   LearningRate 0.0736   Epoch: 2   Global Step: 47450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:35,616-Speed 9618.48 samples/sec   Loss 8.7038   LearningRate 0.0736   Epoch: 2   Global Step: 47460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:36,683-Speed 9607.86 samples/sec   Loss 8.7254   LearningRate 0.0736   Epoch: 2   Global Step: 47470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:37,741-Speed 9683.29 samples/sec   Loss 8.6028   LearningRate 0.0736   Epoch: 2   Global Step: 47480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:38,801-Speed 9667.72 samples/sec   Loss 8.7128   LearningRate 0.0736   Epoch: 2   Global Step: 47490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:39,912-Speed 9224.70 samples/sec   Loss 8.5821   LearningRate 0.0736   Epoch: 2   Global Step: 47500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:26:40,992-Speed 9488.79 samples/sec   Loss 8.7421   LearningRate 0.0736   Epoch: 2   Global Step: 47510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:42,113-Speed 9134.38 samples/sec   Loss 8.6554   LearningRate 0.0736   Epoch: 2   Global Step: 47520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:43,158-Speed 9805.41 samples/sec   Loss 8.6531   LearningRate 0.0736   Epoch: 2   Global Step: 47530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:44,231-Speed 9549.21 samples/sec   Loss 8.7014   LearningRate 0.0735   Epoch: 2   Global Step: 47540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:45,277-Speed 9800.70 samples/sec   Loss 8.5862   LearningRate 0.0735   Epoch: 2   Global Step: 47550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:46,324-Speed 9790.43 samples/sec   Loss 8.6744   LearningRate 0.0735   Epoch: 2   Global Step: 47560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:47,402-Speed 9501.12 samples/sec   Loss 8.5998   LearningRate 0.0735   Epoch: 2   Global Step: 47570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:48,470-Speed 9589.39 samples/sec   Loss 8.7331   LearningRate 0.0735   Epoch: 2   Global Step: 47580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:49,561-Speed 9398.34 samples/sec   Loss 8.6499   LearningRate 0.0735   Epoch: 2   Global Step: 47590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:50,668-Speed 9254.46 samples/sec   Loss 8.7910   LearningRate 0.0735   Epoch: 2   Global Step: 47600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:51,726-Speed 9687.07 samples/sec   Loss 8.6900   LearningRate 0.0735   Epoch: 2   Global Step: 47610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:52,829-Speed 9294.91 samples/sec   Loss 8.6720   LearningRate 0.0735   Epoch: 2   Global Step: 47620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:53,907-Speed 9501.32 samples/sec   Loss 8.5804   LearningRate 0.0735   Epoch: 2   Global Step: 47630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:54,954-Speed 9788.95 samples/sec   Loss 8.7009   LearningRate 0.0735   Epoch: 2   Global Step: 47640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:56,004-Speed 9749.33 samples/sec   Loss 8.8158   LearningRate 0.0735   Epoch: 2   Global Step: 47650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:57,096-Speed 9386.12 samples/sec   Loss 8.6576   LearningRate 0.0735   Epoch: 2   Global Step: 47660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:58,152-Speed 9708.38 samples/sec   Loss 8.5466   LearningRate 0.0735   Epoch: 2   Global Step: 47670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:26:59,232-Speed 9485.18 samples/sec   Loss 8.6743   LearningRate 0.0735   Epoch: 2   Global Step: 47680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:00,294-Speed 9642.45 samples/sec   Loss 8.6471   LearningRate 0.0735   Epoch: 2   Global Step: 47690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:01,374-Speed 9486.08 samples/sec   Loss 8.6964   LearningRate 0.0735   Epoch: 2   Global Step: 47700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:02,507-Speed 9044.67 samples/sec   Loss 8.6115   LearningRate 0.0735   Epoch: 2   Global Step: 47710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:03,588-Speed 9482.81 samples/sec   Loss 8.6674   LearningRate 0.0735   Epoch: 2   Global Step: 47720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:04,632-Speed 9810.06 samples/sec   Loss 8.7321   LearningRate 0.0734   Epoch: 2   Global Step: 47730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:05,714-Speed 9465.71 samples/sec   Loss 8.6510   LearningRate 0.0734   Epoch: 2   Global Step: 47740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:06,803-Speed 9409.63 samples/sec   Loss 8.6085   LearningRate 0.0734   Epoch: 2   Global Step: 47750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:07,904-Speed 9305.61 samples/sec   Loss 8.7034   LearningRate 0.0734   Epoch: 2   Global Step: 47760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:08,996-Speed 9394.68 samples/sec   Loss 8.7727   LearningRate 0.0734   Epoch: 2   Global Step: 47770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:10,065-Speed 9587.92 samples/sec   Loss 8.8193   LearningRate 0.0734   Epoch: 2   Global Step: 47780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:11,180-Speed 9190.68 samples/sec   Loss 8.6929   LearningRate 0.0734   Epoch: 2   Global Step: 47790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:12,288-Speed 9245.27 samples/sec   Loss 8.8012   LearningRate 0.0734   Epoch: 2   Global Step: 47800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:13,373-Speed 9448.74 samples/sec   Loss 8.6952   LearningRate 0.0734   Epoch: 2   Global Step: 47810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:14,446-Speed 9541.46 samples/sec   Loss 8.6531   LearningRate 0.0734   Epoch: 2   Global Step: 47820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:15,542-Speed 9357.78 samples/sec   Loss 8.6714   LearningRate 0.0734   Epoch: 2   Global Step: 47830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:16,564-Speed 10023.56 samples/sec   Loss 8.7735   LearningRate 0.0734   Epoch: 2   Global Step: 47840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:17,664-Speed 9310.53 samples/sec   Loss 8.7448   LearningRate 0.0734   Epoch: 2   Global Step: 47850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:18,747-Speed 9460.39 samples/sec   Loss 8.6661   LearningRate 0.0734   Epoch: 2   Global Step: 47860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:19,825-Speed 9508.27 samples/sec   Loss 8.6612   LearningRate 0.0734   Epoch: 2   Global Step: 47870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:20,908-Speed 9460.08 samples/sec   Loss 8.8066   LearningRate 0.0734   Epoch: 2   Global Step: 47880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:21,958-Speed 9759.22 samples/sec   Loss 8.5782   LearningRate 0.0734   Epoch: 2   Global Step: 47890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:23,074-Speed 9178.10 samples/sec   Loss 8.6388   LearningRate 0.0734   Epoch: 2   Global Step: 47900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:27:24,139-Speed 9625.44 samples/sec   Loss 8.6433   LearningRate 0.0734   Epoch: 2   Global Step: 47910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:25,235-Speed 9345.23 samples/sec   Loss 8.6256   LearningRate 0.0734   Epoch: 2   Global Step: 47920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:26,305-Speed 9577.82 samples/sec   Loss 8.5434   LearningRate 0.0733   Epoch: 2   Global Step: 47930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:27,409-Speed 9279.94 samples/sec   Loss 8.7116   LearningRate 0.0733   Epoch: 2   Global Step: 47940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:28,489-Speed 9493.21 samples/sec   Loss 8.7112   LearningRate 0.0733   Epoch: 2   Global Step: 47950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:29,552-Speed 9645.44 samples/sec   Loss 8.5132   LearningRate 0.0733   Epoch: 2   Global Step: 47960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:30,644-Speed 9382.21 samples/sec   Loss 8.6563   LearningRate 0.0733   Epoch: 2   Global Step: 47970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:31,794-Speed 8909.86 samples/sec   Loss 8.5490   LearningRate 0.0733   Epoch: 2   Global Step: 47980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:32,884-Speed 9395.87 samples/sec   Loss 8.6384   LearningRate 0.0733   Epoch: 2   Global Step: 47990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:33,940-Speed 9703.08 samples/sec   Loss 8.5644   LearningRate 0.0733   Epoch: 2   Global Step: 48000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:27:55,715-[lfw][48000]XNorm: 12.743046
Training: 2022-04-11 13:27:55,715-[lfw][48000]Accuracy-Flip: 0.99583+-0.00291
Training: 2022-04-11 13:27:55,716-[lfw][48000]Accuracy-Highest: 0.99583
Training: 2022-04-11 13:28:20,929-[cfp_fp][48000]XNorm: 10.735440
Training: 2022-04-11 13:28:20,930-[cfp_fp][48000]Accuracy-Flip: 0.94686+-0.01185
Training: 2022-04-11 13:28:20,930-[cfp_fp][48000]Accuracy-Highest: 0.94700
Training: 2022-04-11 13:28:42,638-[agedb_30][48000]XNorm: 12.322298
Training: 2022-04-11 13:28:42,638-[agedb_30][48000]Accuracy-Flip: 0.95433+-0.00810
Training: 2022-04-11 13:28:42,639-[agedb_30][48000]Accuracy-Highest: 0.95483
Training: 2022-04-11 13:28:43,712-Speed 146.77 samples/sec   Loss 8.5289   LearningRate 0.0733   Epoch: 2   Global Step: 48010   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:28:44,781-Speed 9585.18 samples/sec   Loss 8.6732   LearningRate 0.0733   Epoch: 2   Global Step: 48020   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:28:45,849-Speed 9592.96 samples/sec   Loss 8.5902   LearningRate 0.0733   Epoch: 2   Global Step: 48030   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:28:46,891-Speed 9829.54 samples/sec   Loss 8.6468   LearningRate 0.0733   Epoch: 2   Global Step: 48040   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:28:47,950-Speed 9680.07 samples/sec   Loss 8.6227   LearningRate 0.0733   Epoch: 2   Global Step: 48050   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:28:49,060-Speed 9225.03 samples/sec   Loss 8.6764   LearningRate 0.0733   Epoch: 2   Global Step: 48060   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:28:50,121-Speed 9662.51 samples/sec   Loss 8.6311   LearningRate 0.0733   Epoch: 2   Global Step: 48070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:28:51,223-Speed 9300.14 samples/sec   Loss 8.6428   LearningRate 0.0733   Epoch: 2   Global Step: 48080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:28:52,289-Speed 9611.40 samples/sec   Loss 8.5692   LearningRate 0.0733   Epoch: 2   Global Step: 48090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:28:53,338-Speed 9768.16 samples/sec   Loss 8.6217   LearningRate 0.0733   Epoch: 2   Global Step: 48100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:28:54,406-Speed 9587.74 samples/sec   Loss 8.5579   LearningRate 0.0733   Epoch: 2   Global Step: 48110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:28:55,469-Speed 9645.63 samples/sec   Loss 8.6980   LearningRate 0.0732   Epoch: 2   Global Step: 48120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:28:56,542-Speed 9546.59 samples/sec   Loss 8.6658   LearningRate 0.0732   Epoch: 2   Global Step: 48130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:28:57,561-Speed 10050.75 samples/sec   Loss 8.5673   LearningRate 0.0732   Epoch: 2   Global Step: 48140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:28:58,631-Speed 9573.38 samples/sec   Loss 8.6536   LearningRate 0.0732   Epoch: 2   Global Step: 48150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:28:59,697-Speed 9615.14 samples/sec   Loss 8.7337   LearningRate 0.0732   Epoch: 2   Global Step: 48160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:00,803-Speed 9260.95 samples/sec   Loss 8.7263   LearningRate 0.0732   Epoch: 2   Global Step: 48170   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:29:01,876-Speed 9549.57 samples/sec   Loss 8.5941   LearningRate 0.0732   Epoch: 2   Global Step: 48180   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:29:02,965-Speed 9416.97 samples/sec   Loss 8.6580   LearningRate 0.0732   Epoch: 2   Global Step: 48190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:04,055-Speed 9396.71 samples/sec   Loss 8.6893   LearningRate 0.0732   Epoch: 2   Global Step: 48200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:05,170-Speed 9189.62 samples/sec   Loss 8.5824   LearningRate 0.0732   Epoch: 2   Global Step: 48210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:06,215-Speed 9806.91 samples/sec   Loss 8.6169   LearningRate 0.0732   Epoch: 2   Global Step: 48220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:07,300-Speed 9443.92 samples/sec   Loss 8.6369   LearningRate 0.0732   Epoch: 2   Global Step: 48230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:08,366-Speed 9608.58 samples/sec   Loss 8.6595   LearningRate 0.0732   Epoch: 2   Global Step: 48240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:09,455-Speed 9413.30 samples/sec   Loss 8.5946   LearningRate 0.0732   Epoch: 2   Global Step: 48250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:10,529-Speed 9537.72 samples/sec   Loss 8.7465   LearningRate 0.0732   Epoch: 2   Global Step: 48260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:11,613-Speed 9453.52 samples/sec   Loss 8.6218   LearningRate 0.0732   Epoch: 2   Global Step: 48270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:12,674-Speed 9656.88 samples/sec   Loss 8.6803   LearningRate 0.0732   Epoch: 2   Global Step: 48280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:13,737-Speed 9633.13 samples/sec   Loss 8.6808   LearningRate 0.0732   Epoch: 2   Global Step: 48290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:14,852-Speed 9197.41 samples/sec   Loss 8.5292   LearningRate 0.0732   Epoch: 2   Global Step: 48300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:15,952-Speed 9311.60 samples/sec   Loss 8.6503   LearningRate 0.0732   Epoch: 2   Global Step: 48310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:17,043-Speed 9396.96 samples/sec   Loss 8.6123   LearningRate 0.0731   Epoch: 2   Global Step: 48320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:18,185-Speed 8970.60 samples/sec   Loss 8.7287   LearningRate 0.0731   Epoch: 2   Global Step: 48330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:19,267-Speed 9473.60 samples/sec   Loss 8.5009   LearningRate 0.0731   Epoch: 2   Global Step: 48340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:20,373-Speed 9258.57 samples/sec   Loss 8.6217   LearningRate 0.0731   Epoch: 2   Global Step: 48350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:21,450-Speed 9519.92 samples/sec   Loss 8.7171   LearningRate 0.0731   Epoch: 2   Global Step: 48360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:22,517-Speed 9597.31 samples/sec   Loss 8.6066   LearningRate 0.0731   Epoch: 2   Global Step: 48370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:23,550-Speed 9921.85 samples/sec   Loss 8.4243   LearningRate 0.0731   Epoch: 2   Global Step: 48380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:24,626-Speed 9523.50 samples/sec   Loss 8.5695   LearningRate 0.0731   Epoch: 2   Global Step: 48390   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:29:25,679-Speed 9729.11 samples/sec   Loss 8.7001   LearningRate 0.0731   Epoch: 2   Global Step: 48400   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:29:26,734-Speed 9706.75 samples/sec   Loss 8.6078   LearningRate 0.0731   Epoch: 2   Global Step: 48410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:27,851-Speed 9177.45 samples/sec   Loss 8.6925   LearningRate 0.0731   Epoch: 2   Global Step: 48420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:28,931-Speed 9487.51 samples/sec   Loss 8.5497   LearningRate 0.0731   Epoch: 2   Global Step: 48430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:30,018-Speed 9423.30 samples/sec   Loss 8.5698   LearningRate 0.0731   Epoch: 2   Global Step: 48440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:31,091-Speed 9548.16 samples/sec   Loss 8.6191   LearningRate 0.0731   Epoch: 2   Global Step: 48450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:32,124-Speed 9919.28 samples/sec   Loss 8.7051   LearningRate 0.0731   Epoch: 2   Global Step: 48460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:33,207-Speed 9457.39 samples/sec   Loss 8.5098   LearningRate 0.0731   Epoch: 2   Global Step: 48470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:34,233-Speed 9989.85 samples/sec   Loss 8.6481   LearningRate 0.0731   Epoch: 2   Global Step: 48480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:35,266-Speed 9923.32 samples/sec   Loss 8.6776   LearningRate 0.0731   Epoch: 2   Global Step: 48490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:36,350-Speed 9449.14 samples/sec   Loss 8.7051   LearningRate 0.0731   Epoch: 2   Global Step: 48500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:37,405-Speed 9708.99 samples/sec   Loss 8.5729   LearningRate 0.0730   Epoch: 2   Global Step: 48510   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:29:38,494-Speed 9421.30 samples/sec   Loss 8.6538   LearningRate 0.0730   Epoch: 2   Global Step: 48520   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:29:39,551-Speed 9693.34 samples/sec   Loss 8.5190   LearningRate 0.0730   Epoch: 2   Global Step: 48530   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:29:40,621-Speed 9576.81 samples/sec   Loss 8.6359   LearningRate 0.0730   Epoch: 2   Global Step: 48540   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:29:41,660-Speed 9858.95 samples/sec   Loss 8.6428   LearningRate 0.0730   Epoch: 2   Global Step: 48550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:29:42,725-Speed 9619.81 samples/sec   Loss 8.6477   LearningRate 0.0730   Epoch: 2   Global Step: 48560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:29:43,802-Speed 9512.88 samples/sec   Loss 8.6414   LearningRate 0.0730   Epoch: 2   Global Step: 48570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:29:44,877-Speed 9537.09 samples/sec   Loss 8.6020   LearningRate 0.0730   Epoch: 2   Global Step: 48580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:29:45,941-Speed 9624.91 samples/sec   Loss 8.5658   LearningRate 0.0730   Epoch: 2   Global Step: 48590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:29:47,024-Speed 9465.08 samples/sec   Loss 8.6597   LearningRate 0.0730   Epoch: 2   Global Step: 48600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:29:48,084-Speed 9666.83 samples/sec   Loss 8.6671   LearningRate 0.0730   Epoch: 2   Global Step: 48610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:29:49,167-Speed 9461.77 samples/sec   Loss 8.7540   LearningRate 0.0730   Epoch: 2   Global Step: 48620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:29:50,281-Speed 9193.79 samples/sec   Loss 8.5777   LearningRate 0.0730   Epoch: 2   Global Step: 48630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:29:51,354-Speed 9549.36 samples/sec   Loss 8.6982   LearningRate 0.0730   Epoch: 2   Global Step: 48640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:29:52,455-Speed 9307.72 samples/sec   Loss 8.6999   LearningRate 0.0730   Epoch: 2   Global Step: 48650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:53,528-Speed 9554.09 samples/sec   Loss 8.6042   LearningRate 0.0730   Epoch: 2   Global Step: 48660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:54,585-Speed 9688.23 samples/sec   Loss 8.6605   LearningRate 0.0730   Epoch: 2   Global Step: 48670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:55,614-Speed 9959.80 samples/sec   Loss 8.6233   LearningRate 0.0730   Epoch: 2   Global Step: 48680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:56,694-Speed 9492.56 samples/sec   Loss 8.4690   LearningRate 0.0730   Epoch: 2   Global Step: 48690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:57,794-Speed 9321.27 samples/sec   Loss 8.6001   LearningRate 0.0730   Epoch: 2   Global Step: 48700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:58,903-Speed 9235.87 samples/sec   Loss 8.7219   LearningRate 0.0729   Epoch: 2   Global Step: 48710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:29:59,997-Speed 9363.74 samples/sec   Loss 8.6060   LearningRate 0.0729   Epoch: 2   Global Step: 48720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:01,089-Speed 9381.34 samples/sec   Loss 8.5705   LearningRate 0.0729   Epoch: 2   Global Step: 48730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:02,194-Speed 9276.79 samples/sec   Loss 8.5421   LearningRate 0.0729   Epoch: 2   Global Step: 48740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:03,254-Speed 9662.19 samples/sec   Loss 8.6753   LearningRate 0.0729   Epoch: 2   Global Step: 48750   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:30:04,321-Speed 9603.80 samples/sec   Loss 8.4803   LearningRate 0.0729   Epoch: 2   Global Step: 48760   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:30:05,410-Speed 9413.26 samples/sec   Loss 8.5345   LearningRate 0.0729   Epoch: 2   Global Step: 48770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:06,484-Speed 9537.59 samples/sec   Loss 8.6330   LearningRate 0.0729   Epoch: 2   Global Step: 48780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:07,560-Speed 9522.02 samples/sec   Loss 8.5256   LearningRate 0.0729   Epoch: 2   Global Step: 48790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:08,661-Speed 9307.06 samples/sec   Loss 8.5493   LearningRate 0.0729   Epoch: 2   Global Step: 48800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:09,731-Speed 9573.73 samples/sec   Loss 8.6787   LearningRate 0.0729   Epoch: 2   Global Step: 48810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:10,766-Speed 9903.47 samples/sec   Loss 8.6279   LearningRate 0.0729   Epoch: 2   Global Step: 48820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:11,820-Speed 9715.70 samples/sec   Loss 8.7059   LearningRate 0.0729   Epoch: 2   Global Step: 48830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:12,859-Speed 9867.75 samples/sec   Loss 8.5279   LearningRate 0.0729   Epoch: 2   Global Step: 48840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:13,957-Speed 9333.55 samples/sec   Loss 8.6545   LearningRate 0.0729   Epoch: 2   Global Step: 48850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:15,024-Speed 9598.10 samples/sec   Loss 8.5421   LearningRate 0.0729   Epoch: 2   Global Step: 48860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:16,085-Speed 9662.49 samples/sec   Loss 8.5912   LearningRate 0.0729   Epoch: 2   Global Step: 48870   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:30:17,151-Speed 9607.83 samples/sec   Loss 8.6583   LearningRate 0.0729   Epoch: 2   Global Step: 48880   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:30:18,257-Speed 9268.80 samples/sec   Loss 8.6153   LearningRate 0.0729   Epoch: 2   Global Step: 48890   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:30:19,313-Speed 9700.57 samples/sec   Loss 8.6202   LearningRate 0.0728   Epoch: 2   Global Step: 48900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:20,383-Speed 9580.99 samples/sec   Loss 8.6080   LearningRate 0.0728   Epoch: 2   Global Step: 48910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:21,466-Speed 9462.23 samples/sec   Loss 8.5397   LearningRate 0.0728   Epoch: 2   Global Step: 48920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:22,500-Speed 9911.00 samples/sec   Loss 8.5591   LearningRate 0.0728   Epoch: 2   Global Step: 48930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:23,556-Speed 9700.43 samples/sec   Loss 8.6573   LearningRate 0.0728   Epoch: 2   Global Step: 48940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:24,619-Speed 9632.48 samples/sec   Loss 8.6347   LearningRate 0.0728   Epoch: 2   Global Step: 48950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:25,671-Speed 9740.68 samples/sec   Loss 8.5618   LearningRate 0.0728   Epoch: 2   Global Step: 48960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:26,819-Speed 8928.02 samples/sec   Loss 8.5120   LearningRate 0.0728   Epoch: 2   Global Step: 48970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:27,931-Speed 9212.03 samples/sec   Loss 8.6643   LearningRate 0.0728   Epoch: 2   Global Step: 48980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:28,982-Speed 9751.53 samples/sec   Loss 8.6410   LearningRate 0.0728   Epoch: 2   Global Step: 48990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:30,066-Speed 9446.62 samples/sec   Loss 8.6574   LearningRate 0.0728   Epoch: 2   Global Step: 49000   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:30:31,155-Speed 9411.14 samples/sec   Loss 8.5935   LearningRate 0.0728   Epoch: 2   Global Step: 49010   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:30:32,235-Speed 9484.05 samples/sec   Loss 8.5682   LearningRate 0.0728   Epoch: 2   Global Step: 49020   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:30:33,306-Speed 9573.09 samples/sec   Loss 8.5277   LearningRate 0.0728   Epoch: 2   Global Step: 49030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:30:34,392-Speed 9436.78 samples/sec   Loss 8.5329   LearningRate 0.0728   Epoch: 2   Global Step: 49040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:30:35,491-Speed 9322.02 samples/sec   Loss 8.6214   LearningRate 0.0728   Epoch: 2   Global Step: 49050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:30:36,566-Speed 9535.61 samples/sec   Loss 8.6183   LearningRate 0.0728   Epoch: 2   Global Step: 49060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:30:37,678-Speed 9212.08 samples/sec   Loss 8.6525   LearningRate 0.0728   Epoch: 2   Global Step: 49070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:30:38,750-Speed 9558.05 samples/sec   Loss 8.5166   LearningRate 0.0728   Epoch: 2   Global Step: 49080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:30:39,815-Speed 9624.91 samples/sec   Loss 8.5288   LearningRate 0.0728   Epoch: 2   Global Step: 49090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:30:40,910-Speed 9352.59 samples/sec   Loss 8.6464   LearningRate 0.0727   Epoch: 2   Global Step: 49100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:30:41,990-Speed 9491.65 samples/sec   Loss 8.5637   LearningRate 0.0727   Epoch: 2   Global Step: 49110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:30:43,053-Speed 9638.25 samples/sec   Loss 8.6384   LearningRate 0.0727   Epoch: 2   Global Step: 49120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:30:44,114-Speed 9653.90 samples/sec   Loss 8.6716   LearningRate 0.0727   Epoch: 2   Global Step: 49130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:45,188-Speed 9541.67 samples/sec   Loss 8.6559   LearningRate 0.0727   Epoch: 2   Global Step: 49140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:46,276-Speed 9421.02 samples/sec   Loss 8.5534   LearningRate 0.0727   Epoch: 2   Global Step: 49150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:47,390-Speed 9199.70 samples/sec   Loss 8.6470   LearningRate 0.0727   Epoch: 2   Global Step: 49160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:48,448-Speed 9675.35 samples/sec   Loss 8.6038   LearningRate 0.0727   Epoch: 2   Global Step: 49170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:49,557-Speed 9245.58 samples/sec   Loss 8.6068   LearningRate 0.0727   Epoch: 2   Global Step: 49180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:50,629-Speed 9558.14 samples/sec   Loss 8.7646   LearningRate 0.0727   Epoch: 2   Global Step: 49190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:51,677-Speed 9783.23 samples/sec   Loss 8.5147   LearningRate 0.0727   Epoch: 2   Global Step: 49200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:52,713-Speed 9892.45 samples/sec   Loss 8.6210   LearningRate 0.0727   Epoch: 2   Global Step: 49210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:53,788-Speed 9528.90 samples/sec   Loss 8.4373   LearningRate 0.0727   Epoch: 2   Global Step: 49220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:54,805-Speed 10076.51 samples/sec   Loss 8.5735   LearningRate 0.0727   Epoch: 2   Global Step: 49230   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:30:55,826-Speed 10034.01 samples/sec   Loss 8.5748   LearningRate 0.0727   Epoch: 2   Global Step: 49240   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:30:56,876-Speed 9756.73 samples/sec   Loss 8.5109   LearningRate 0.0727   Epoch: 2   Global Step: 49250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:57,899-Speed 10013.89 samples/sec   Loss 8.5253   LearningRate 0.0727   Epoch: 2   Global Step: 49260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:30:58,959-Speed 9667.70 samples/sec   Loss 8.7142   LearningRate 0.0727   Epoch: 2   Global Step: 49270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:00,003-Speed 9809.23 samples/sec   Loss 8.6269   LearningRate 0.0727   Epoch: 2   Global Step: 49280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:01,094-Speed 9394.07 samples/sec   Loss 8.6152   LearningRate 0.0726   Epoch: 2   Global Step: 49290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:02,211-Speed 9173.87 samples/sec   Loss 8.5770   LearningRate 0.0726   Epoch: 2   Global Step: 49300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:03,341-Speed 9069.47 samples/sec   Loss 8.5504   LearningRate 0.0726   Epoch: 2   Global Step: 49310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:04,392-Speed 9746.35 samples/sec   Loss 8.6053   LearningRate 0.0726   Epoch: 2   Global Step: 49320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:05,466-Speed 9540.79 samples/sec   Loss 8.6308   LearningRate 0.0726   Epoch: 2   Global Step: 49330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:06,525-Speed 9684.15 samples/sec   Loss 8.5654   LearningRate 0.0726   Epoch: 2   Global Step: 49340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:07,619-Speed 9365.32 samples/sec   Loss 8.6275   LearningRate 0.0726   Epoch: 2   Global Step: 49350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:08,691-Speed 9557.00 samples/sec   Loss 8.6253   LearningRate 0.0726   Epoch: 2   Global Step: 49360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:09,733-Speed 9833.57 samples/sec   Loss 8.5626   LearningRate 0.0726   Epoch: 2   Global Step: 49370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:10,825-Speed 9379.73 samples/sec   Loss 8.6415   LearningRate 0.0726   Epoch: 2   Global Step: 49380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:11,928-Speed 9283.55 samples/sec   Loss 8.6018   LearningRate 0.0726   Epoch: 2   Global Step: 49390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:13,020-Speed 9391.79 samples/sec   Loss 8.6718   LearningRate 0.0726   Epoch: 2   Global Step: 49400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:14,079-Speed 9671.77 samples/sec   Loss 8.6470   LearningRate 0.0726   Epoch: 2   Global Step: 49410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:15,153-Speed 9541.01 samples/sec   Loss 8.6156   LearningRate 0.0726   Epoch: 2   Global Step: 49420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:16,243-Speed 9404.30 samples/sec   Loss 8.6124   LearningRate 0.0726   Epoch: 2   Global Step: 49430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:17,313-Speed 9571.70 samples/sec   Loss 8.6623   LearningRate 0.0726   Epoch: 2   Global Step: 49440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:18,391-Speed 9503.62 samples/sec   Loss 8.6222   LearningRate 0.0726   Epoch: 2   Global Step: 49450   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:31:19,436-Speed 9810.72 samples/sec   Loss 8.5883   LearningRate 0.0726   Epoch: 2   Global Step: 49460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:20,553-Speed 9171.16 samples/sec   Loss 8.4676   LearningRate 0.0726   Epoch: 2   Global Step: 49470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:21,644-Speed 9390.44 samples/sec   Loss 8.6011   LearningRate 0.0726   Epoch: 2   Global Step: 49480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:22,729-Speed 9445.39 samples/sec   Loss 8.5911   LearningRate 0.0725   Epoch: 2   Global Step: 49490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:23,816-Speed 9430.90 samples/sec   Loss 8.6282   LearningRate 0.0725   Epoch: 2   Global Step: 49500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:24,902-Speed 9434.56 samples/sec   Loss 8.4980   LearningRate 0.0725   Epoch: 2   Global Step: 49510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:25,972-Speed 9576.67 samples/sec   Loss 8.5675   LearningRate 0.0725   Epoch: 2   Global Step: 49520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:27,009-Speed 9878.75 samples/sec   Loss 8.7179   LearningRate 0.0725   Epoch: 2   Global Step: 49530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:28,065-Speed 9706.53 samples/sec   Loss 8.5737   LearningRate 0.0725   Epoch: 2   Global Step: 49540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:29,132-Speed 9603.87 samples/sec   Loss 8.6653   LearningRate 0.0725   Epoch: 2   Global Step: 49550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:30,181-Speed 9766.85 samples/sec   Loss 8.6526   LearningRate 0.0725   Epoch: 2   Global Step: 49560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:31,282-Speed 9301.50 samples/sec   Loss 8.6555   LearningRate 0.0725   Epoch: 2   Global Step: 49570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:32,386-Speed 9283.18 samples/sec   Loss 8.5450   LearningRate 0.0725   Epoch: 2   Global Step: 49580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:33,482-Speed 9354.17 samples/sec   Loss 8.6426   LearningRate 0.0725   Epoch: 2   Global Step: 49590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:34,587-Speed 9271.40 samples/sec   Loss 8.5823   LearningRate 0.0725   Epoch: 2   Global Step: 49600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:35,685-Speed 9329.08 samples/sec   Loss 8.6485   LearningRate 0.0725   Epoch: 2   Global Step: 49610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:36,782-Speed 9342.49 samples/sec   Loss 8.5508   LearningRate 0.0725   Epoch: 2   Global Step: 49620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:37,880-Speed 9330.17 samples/sec   Loss 8.5255   LearningRate 0.0725   Epoch: 2   Global Step: 49630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:38,988-Speed 9253.93 samples/sec   Loss 8.5603   LearningRate 0.0725   Epoch: 2   Global Step: 49640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:40,092-Speed 9281.12 samples/sec   Loss 8.6701   LearningRate 0.0725   Epoch: 2   Global Step: 49650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:41,159-Speed 9601.62 samples/sec   Loss 8.5656   LearningRate 0.0725   Epoch: 2   Global Step: 49660   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:31:42,208-Speed 9773.47 samples/sec   Loss 8.6017   LearningRate 0.0725   Epoch: 2   Global Step: 49670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:43,256-Speed 9779.31 samples/sec   Loss 8.5634   LearningRate 0.0725   Epoch: 2   Global Step: 49680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:44,319-Speed 9640.34 samples/sec   Loss 8.6704   LearningRate 0.0724   Epoch: 2   Global Step: 49690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:45,367-Speed 9777.11 samples/sec   Loss 8.6426   LearningRate 0.0724   Epoch: 2   Global Step: 49700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:46,442-Speed 9528.42 samples/sec   Loss 8.6496   LearningRate 0.0724   Epoch: 2   Global Step: 49710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:47,535-Speed 9374.41 samples/sec   Loss 8.6322   LearningRate 0.0724   Epoch: 2   Global Step: 49720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:48,572-Speed 9876.72 samples/sec   Loss 8.5751   LearningRate 0.0724   Epoch: 2   Global Step: 49730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:49,638-Speed 9614.73 samples/sec   Loss 8.6972   LearningRate 0.0724   Epoch: 2   Global Step: 49740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:50,678-Speed 9850.15 samples/sec   Loss 8.4684   LearningRate 0.0724   Epoch: 2   Global Step: 49750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:51,757-Speed 9495.83 samples/sec   Loss 8.6256   LearningRate 0.0724   Epoch: 2   Global Step: 49760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:52,823-Speed 9615.29 samples/sec   Loss 8.5607   LearningRate 0.0724   Epoch: 2   Global Step: 49770   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:31:53,907-Speed 9450.03 samples/sec   Loss 8.6310   LearningRate 0.0724   Epoch: 2   Global Step: 49780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:54,975-Speed 9598.65 samples/sec   Loss 8.6012   LearningRate 0.0724   Epoch: 2   Global Step: 49790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:56,090-Speed 9191.31 samples/sec   Loss 8.6054   LearningRate 0.0724   Epoch: 2   Global Step: 49800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:57,144-Speed 9716.49 samples/sec   Loss 8.6025   LearningRate 0.0724   Epoch: 2   Global Step: 49810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:58,223-Speed 9497.14 samples/sec   Loss 8.5761   LearningRate 0.0724   Epoch: 2   Global Step: 49820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:31:59,327-Speed 9282.01 samples/sec   Loss 8.6417   LearningRate 0.0724   Epoch: 2   Global Step: 49830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:00,462-Speed 9021.99 samples/sec   Loss 8.5760   LearningRate 0.0724   Epoch: 2   Global Step: 49840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:01,552-Speed 9405.27 samples/sec   Loss 8.4439   LearningRate 0.0724   Epoch: 2   Global Step: 49850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:02,597-Speed 9808.32 samples/sec   Loss 8.5452   LearningRate 0.0724   Epoch: 2   Global Step: 49860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:03,636-Speed 9871.01 samples/sec   Loss 8.4468   LearningRate 0.0724   Epoch: 2   Global Step: 49870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:04,724-Speed 9412.90 samples/sec   Loss 8.5748   LearningRate 0.0723   Epoch: 2   Global Step: 49880   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:32:05,812-Speed 9425.41 samples/sec   Loss 8.5480   LearningRate 0.0723   Epoch: 2   Global Step: 49890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:06,888-Speed 9518.90 samples/sec   Loss 8.5140   LearningRate 0.0723   Epoch: 2   Global Step: 49900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:07,963-Speed 9528.31 samples/sec   Loss 8.5696   LearningRate 0.0723   Epoch: 2   Global Step: 49910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:09,072-Speed 9241.86 samples/sec   Loss 8.5518   LearningRate 0.0723   Epoch: 2   Global Step: 49920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:10,137-Speed 9619.09 samples/sec   Loss 8.5864   LearningRate 0.0723   Epoch: 2   Global Step: 49930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:11,206-Speed 9588.67 samples/sec   Loss 8.5398   LearningRate 0.0723   Epoch: 2   Global Step: 49940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:12,236-Speed 9939.30 samples/sec   Loss 8.5693   LearningRate 0.0723   Epoch: 2   Global Step: 49950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:13,275-Speed 9868.40 samples/sec   Loss 8.5204   LearningRate 0.0723   Epoch: 2   Global Step: 49960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:14,343-Speed 9595.07 samples/sec   Loss 8.6014   LearningRate 0.0723   Epoch: 2   Global Step: 49970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:15,385-Speed 9835.45 samples/sec   Loss 8.5849   LearningRate 0.0723   Epoch: 2   Global Step: 49980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:32:16,434-Speed 9764.13 samples/sec   Loss 8.5771   LearningRate 0.0723   Epoch: 2   Global Step: 49990   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:32:17,521-Speed 9424.60 samples/sec   Loss 8.6085   LearningRate 0.0723   Epoch: 2   Global Step: 50000   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:32:39,655-[lfw][50000]XNorm: 13.013867
Training: 2022-04-11 13:32:39,655-[lfw][50000]Accuracy-Flip: 0.99417+-0.00367
Training: 2022-04-11 13:32:39,656-[lfw][50000]Accuracy-Highest: 0.99583
Training: 2022-04-11 13:33:05,454-[cfp_fp][50000]XNorm: 10.992907
Training: 2022-04-11 13:33:05,454-[cfp_fp][50000]Accuracy-Flip: 0.94686+-0.01207
Training: 2022-04-11 13:33:05,455-[cfp_fp][50000]Accuracy-Highest: 0.94700
Training: 2022-04-11 13:33:27,735-[agedb_30][50000]XNorm: 12.554788
Training: 2022-04-11 13:33:27,736-[agedb_30][50000]Accuracy-Flip: 0.95467+-0.00921
Training: 2022-04-11 13:33:27,736-[agedb_30][50000]Accuracy-Highest: 0.95483
Training: 2022-04-11 13:33:28,833-Speed 143.59 samples/sec   Loss 8.6626   LearningRate 0.0723   Epoch: 2   Global Step: 50010   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:33:29,890-Speed 9695.62 samples/sec   Loss 8.5304   LearningRate 0.0723   Epoch: 2   Global Step: 50020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:33:30,920-Speed 9946.00 samples/sec   Loss 8.7091   LearningRate 0.0723   Epoch: 2   Global Step: 50030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:33:31,988-Speed 9591.46 samples/sec   Loss 8.5207   LearningRate 0.0723   Epoch: 2   Global Step: 50040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:33:33,044-Speed 9700.96 samples/sec   Loss 8.5665   LearningRate 0.0723   Epoch: 2   Global Step: 50050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:33:34,132-Speed 9424.69 samples/sec   Loss 8.4573   LearningRate 0.0723   Epoch: 2   Global Step: 50060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:33:35,413-Speed 7994.61 samples/sec   Loss 8.5390   LearningRate 0.0723   Epoch: 2   Global Step: 50070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:03,019-Speed 370.96 samples/sec   Loss 7.9372   LearningRate 0.0722   Epoch: 3   Global Step: 50080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:04,697-Speed 6104.68 samples/sec   Loss 7.7026   LearningRate 0.0722   Epoch: 3   Global Step: 50090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:05,939-Speed 8248.86 samples/sec   Loss 7.7866   LearningRate 0.0722   Epoch: 3   Global Step: 50100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:07,068-Speed 9080.95 samples/sec   Loss 7.6512   LearningRate 0.0722   Epoch: 3   Global Step: 50110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:08,163-Speed 9352.93 samples/sec   Loss 7.7732   LearningRate 0.0722   Epoch: 3   Global Step: 50120   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:34:09,529-Speed 7502.26 samples/sec   Loss 7.7649   LearningRate 0.0722   Epoch: 3   Global Step: 50130   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:34:10,595-Speed 9621.29 samples/sec   Loss 7.6923   LearningRate 0.0722   Epoch: 3   Global Step: 50140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:11,692-Speed 9339.08 samples/sec   Loss 7.8423   LearningRate 0.0722   Epoch: 3   Global Step: 50150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:12,787-Speed 9355.73 samples/sec   Loss 7.7708   LearningRate 0.0722   Epoch: 3   Global Step: 50160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:13,839-Speed 9744.54 samples/sec   Loss 7.7831   LearningRate 0.0722   Epoch: 3   Global Step: 50170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:14,907-Speed 9594.41 samples/sec   Loss 7.7230   LearningRate 0.0722   Epoch: 3   Global Step: 50180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:16,121-Speed 8434.37 samples/sec   Loss 7.7178   LearningRate 0.0722   Epoch: 3   Global Step: 50190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:17,258-Speed 9012.44 samples/sec   Loss 7.7357   LearningRate 0.0722   Epoch: 3   Global Step: 50200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:18,338-Speed 9487.87 samples/sec   Loss 7.7081   LearningRate 0.0722   Epoch: 3   Global Step: 50210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:19,426-Speed 9426.37 samples/sec   Loss 7.7461   LearningRate 0.0722   Epoch: 3   Global Step: 50220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:20,530-Speed 9274.74 samples/sec   Loss 7.6717   LearningRate 0.0722   Epoch: 3   Global Step: 50230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:21,651-Speed 9141.92 samples/sec   Loss 7.8544   LearningRate 0.0722   Epoch: 3   Global Step: 50240   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:34:22,712-Speed 9659.35 samples/sec   Loss 7.8009   LearningRate 0.0722   Epoch: 3   Global Step: 50250   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:34:24,056-Speed 7623.55 samples/sec   Loss 7.8475   LearningRate 0.0722   Epoch: 3   Global Step: 50260   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:34:25,170-Speed 9191.62 samples/sec   Loss 7.6610   LearningRate 0.0721   Epoch: 3   Global Step: 50270   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:34:26,285-Speed 9196.63 samples/sec   Loss 7.7167   LearningRate 0.0721   Epoch: 3   Global Step: 50280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:27,365-Speed 9489.68 samples/sec   Loss 7.7954   LearningRate 0.0721   Epoch: 3   Global Step: 50290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:28,443-Speed 9504.35 samples/sec   Loss 7.7362   LearningRate 0.0721   Epoch: 3   Global Step: 50300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:29,538-Speed 9352.62 samples/sec   Loss 7.7372   LearningRate 0.0721   Epoch: 3   Global Step: 50310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:30,619-Speed 9477.49 samples/sec   Loss 7.7471   LearningRate 0.0721   Epoch: 3   Global Step: 50320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:31,664-Speed 9805.83 samples/sec   Loss 7.8190   LearningRate 0.0721   Epoch: 3   Global Step: 50330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:32,727-Speed 9638.27 samples/sec   Loss 7.8527   LearningRate 0.0721   Epoch: 3   Global Step: 50340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:33,803-Speed 9520.75 samples/sec   Loss 7.7960   LearningRate 0.0721   Epoch: 3   Global Step: 50350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:34,887-Speed 9449.16 samples/sec   Loss 7.8814   LearningRate 0.0721   Epoch: 3   Global Step: 50360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:35,962-Speed 9530.36 samples/sec   Loss 7.9161   LearningRate 0.0721   Epoch: 3   Global Step: 50370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:37,057-Speed 9363.73 samples/sec   Loss 7.7806   LearningRate 0.0721   Epoch: 3   Global Step: 50380   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:34:38,147-Speed 9397.88 samples/sec   Loss 7.8523   LearningRate 0.0721   Epoch: 3   Global Step: 50390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:39,346-Speed 8549.42 samples/sec   Loss 7.8130   LearningRate 0.0721   Epoch: 3   Global Step: 50400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:40,444-Speed 9324.70 samples/sec   Loss 7.9039   LearningRate 0.0721   Epoch: 3   Global Step: 50410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:41,547-Speed 9292.40 samples/sec   Loss 7.8687   LearningRate 0.0721   Epoch: 3   Global Step: 50420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:42,633-Speed 9440.86 samples/sec   Loss 7.7654   LearningRate 0.0721   Epoch: 3   Global Step: 50430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:43,679-Speed 9796.52 samples/sec   Loss 7.9068   LearningRate 0.0721   Epoch: 3   Global Step: 50440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:44,744-Speed 9612.37 samples/sec   Loss 7.7492   LearningRate 0.0721   Epoch: 3   Global Step: 50450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:45,830-Speed 9440.19 samples/sec   Loss 7.7499   LearningRate 0.0721   Epoch: 3   Global Step: 50460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:47,758-Speed 5312.98 samples/sec   Loss 7.8558   LearningRate 0.0720   Epoch: 3   Global Step: 50470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:49,033-Speed 8032.50 samples/sec   Loss 7.7369   LearningRate 0.0720   Epoch: 3   Global Step: 50480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:50,106-Speed 9555.50 samples/sec   Loss 7.7753   LearningRate 0.0720   Epoch: 3   Global Step: 50490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:51,184-Speed 9500.51 samples/sec   Loss 7.9901   LearningRate 0.0720   Epoch: 3   Global Step: 50500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:52,230-Speed 9794.65 samples/sec   Loss 7.7782   LearningRate 0.0720   Epoch: 3   Global Step: 50510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:53,328-Speed 9329.29 samples/sec   Loss 7.9215   LearningRate 0.0720   Epoch: 3   Global Step: 50520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:54,440-Speed 9219.50 samples/sec   Loss 7.9155   LearningRate 0.0720   Epoch: 3   Global Step: 50530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:55,541-Speed 9303.90 samples/sec   Loss 7.8676   LearningRate 0.0720   Epoch: 3   Global Step: 50540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:56,602-Speed 9662.27 samples/sec   Loss 7.8156   LearningRate 0.0720   Epoch: 3   Global Step: 50550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:57,695-Speed 9368.04 samples/sec   Loss 7.8181   LearningRate 0.0720   Epoch: 3   Global Step: 50560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:58,790-Speed 9355.36 samples/sec   Loss 7.9685   LearningRate 0.0720   Epoch: 3   Global Step: 50570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:34:59,925-Speed 9031.75 samples/sec   Loss 7.8473   LearningRate 0.0720   Epoch: 3   Global Step: 50580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:01,033-Speed 9251.74 samples/sec   Loss 7.9339   LearningRate 0.0720   Epoch: 3   Global Step: 50590   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:35:02,092-Speed 9675.25 samples/sec   Loss 7.9426   LearningRate 0.0720   Epoch: 3   Global Step: 50600   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:35:03,148-Speed 9702.23 samples/sec   Loss 7.8271   LearningRate 0.0720   Epoch: 3   Global Step: 50610   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:35:04,300-Speed 8900.39 samples/sec   Loss 7.8809   LearningRate 0.0720   Epoch: 3   Global Step: 50620   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:35:05,415-Speed 9191.25 samples/sec   Loss 7.9798   LearningRate 0.0720   Epoch: 3   Global Step: 50630   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:35:06,489-Speed 9537.20 samples/sec   Loss 7.8743   LearningRate 0.0720   Epoch: 3   Global Step: 50640   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:35:07,583-Speed 9372.03 samples/sec   Loss 7.9252   LearningRate 0.0720   Epoch: 3   Global Step: 50650   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:35:08,650-Speed 9598.26 samples/sec   Loss 7.8309   LearningRate 0.0720   Epoch: 3   Global Step: 50660   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:35:09,707-Speed 9692.07 samples/sec   Loss 7.7595   LearningRate 0.0719   Epoch: 3   Global Step: 50670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:10,743-Speed 9893.71 samples/sec   Loss 7.8112   LearningRate 0.0719   Epoch: 3   Global Step: 50680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:11,793-Speed 9756.12 samples/sec   Loss 7.7976   LearningRate 0.0719   Epoch: 3   Global Step: 50690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:12,913-Speed 9152.66 samples/sec   Loss 7.8680   LearningRate 0.0719   Epoch: 3   Global Step: 50700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:13,969-Speed 9699.36 samples/sec   Loss 7.8249   LearningRate 0.0719   Epoch: 3   Global Step: 50710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:15,074-Speed 9273.28 samples/sec   Loss 7.8844   LearningRate 0.0719   Epoch: 3   Global Step: 50720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:16,110-Speed 9882.90 samples/sec   Loss 8.0126   LearningRate 0.0719   Epoch: 3   Global Step: 50730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:35:17,207-Speed 9343.43 samples/sec   Loss 7.9024   LearningRate 0.0719   Epoch: 3   Global Step: 50740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:35:18,254-Speed 9792.11 samples/sec   Loss 7.8770   LearningRate 0.0719   Epoch: 3   Global Step: 50750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:35:19,342-Speed 9417.67 samples/sec   Loss 7.8336   LearningRate 0.0719   Epoch: 3   Global Step: 50760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:35:20,407-Speed 9615.27 samples/sec   Loss 7.9026   LearningRate 0.0719   Epoch: 3   Global Step: 50770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:35:21,461-Speed 9727.68 samples/sec   Loss 7.7711   LearningRate 0.0719   Epoch: 3   Global Step: 50780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:35:22,546-Speed 9437.71 samples/sec   Loss 7.9148   LearningRate 0.0719   Epoch: 3   Global Step: 50790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:35:23,602-Speed 9705.15 samples/sec   Loss 7.9416   LearningRate 0.0719   Epoch: 3   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:35:24,676-Speed 9539.33 samples/sec   Loss 8.0119   LearningRate 0.0719   Epoch: 3   Global Step: 50810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:35:25,757-Speed 9485.04 samples/sec   Loss 8.0232   LearningRate 0.0719   Epoch: 3   Global Step: 50820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:35:26,833-Speed 9521.25 samples/sec   Loss 7.9131   LearningRate 0.0719   Epoch: 3   Global Step: 50830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:27,935-Speed 9291.09 samples/sec   Loss 7.9950   LearningRate 0.0719   Epoch: 3   Global Step: 50840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:29,010-Speed 9534.44 samples/sec   Loss 7.9308   LearningRate 0.0719   Epoch: 3   Global Step: 50850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:30,054-Speed 9817.11 samples/sec   Loss 7.8796   LearningRate 0.0718   Epoch: 3   Global Step: 50860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:31,188-Speed 9033.52 samples/sec   Loss 7.9510   LearningRate 0.0718   Epoch: 3   Global Step: 50870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:32,252-Speed 9628.24 samples/sec   Loss 7.9018   LearningRate 0.0718   Epoch: 3   Global Step: 50880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:33,313-Speed 9656.46 samples/sec   Loss 7.9446   LearningRate 0.0718   Epoch: 3   Global Step: 50890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:34,360-Speed 9788.62 samples/sec   Loss 8.0498   LearningRate 0.0718   Epoch: 3   Global Step: 50900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:35,476-Speed 9181.02 samples/sec   Loss 7.9171   LearningRate 0.0718   Epoch: 3   Global Step: 50910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:36,592-Speed 9184.42 samples/sec   Loss 7.9478   LearningRate 0.0718   Epoch: 3   Global Step: 50920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:37,623-Speed 9931.02 samples/sec   Loss 7.8964   LearningRate 0.0718   Epoch: 3   Global Step: 50930   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:35:38,654-Speed 9941.03 samples/sec   Loss 7.9857   LearningRate 0.0718   Epoch: 3   Global Step: 50940   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:35:39,692-Speed 9876.26 samples/sec   Loss 7.9305   LearningRate 0.0718   Epoch: 3   Global Step: 50950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:40,737-Speed 9801.00 samples/sec   Loss 8.1610   LearningRate 0.0718   Epoch: 3   Global Step: 50960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:41,785-Speed 9774.00 samples/sec   Loss 8.0386   LearningRate 0.0718   Epoch: 3   Global Step: 50970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:42,890-Speed 9276.79 samples/sec   Loss 7.8977   LearningRate 0.0718   Epoch: 3   Global Step: 50980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:43,953-Speed 9637.56 samples/sec   Loss 7.9367   LearningRate 0.0718   Epoch: 3   Global Step: 50990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:45,030-Speed 9509.64 samples/sec   Loss 7.9879   LearningRate 0.0718   Epoch: 3   Global Step: 51000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:46,082-Speed 9738.78 samples/sec   Loss 8.0358   LearningRate 0.0718   Epoch: 3   Global Step: 51010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:47,145-Speed 9643.26 samples/sec   Loss 7.9259   LearningRate 0.0718   Epoch: 3   Global Step: 51020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:48,253-Speed 9242.01 samples/sec   Loss 8.0252   LearningRate 0.0718   Epoch: 3   Global Step: 51030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:49,297-Speed 9820.11 samples/sec   Loss 7.8942   LearningRate 0.0718   Epoch: 3   Global Step: 51040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:50,345-Speed 9779.15 samples/sec   Loss 7.9136   LearningRate 0.0718   Epoch: 3   Global Step: 51050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:51,389-Speed 9816.46 samples/sec   Loss 7.8881   LearningRate 0.0717   Epoch: 3   Global Step: 51060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:52,439-Speed 9754.45 samples/sec   Loss 7.8833   LearningRate 0.0717   Epoch: 3   Global Step: 51070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:53,479-Speed 9850.18 samples/sec   Loss 7.9122   LearningRate 0.0717   Epoch: 3   Global Step: 51080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:54,516-Speed 9881.45 samples/sec   Loss 8.0642   LearningRate 0.0717   Epoch: 3   Global Step: 51090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:55,593-Speed 9519.29 samples/sec   Loss 8.0843   LearningRate 0.0717   Epoch: 3   Global Step: 51100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:56,635-Speed 9836.48 samples/sec   Loss 8.0955   LearningRate 0.0717   Epoch: 3   Global Step: 51110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:57,690-Speed 9706.24 samples/sec   Loss 7.9927   LearningRate 0.0717   Epoch: 3   Global Step: 51120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:58,785-Speed 9357.62 samples/sec   Loss 7.9386   LearningRate 0.0717   Epoch: 3   Global Step: 51130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:35:59,856-Speed 9564.62 samples/sec   Loss 8.0671   LearningRate 0.0717   Epoch: 3   Global Step: 51140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:00,928-Speed 9556.58 samples/sec   Loss 7.9339   LearningRate 0.0717   Epoch: 3   Global Step: 51150   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:36:01,963-Speed 9906.70 samples/sec   Loss 7.9170   LearningRate 0.0717   Epoch: 3   Global Step: 51160   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:36:03,033-Speed 9575.84 samples/sec   Loss 8.1064   LearningRate 0.0717   Epoch: 3   Global Step: 51170   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:36:04,143-Speed 9227.64 samples/sec   Loss 8.0719   LearningRate 0.0717   Epoch: 3   Global Step: 51180   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:36:05,196-Speed 9726.54 samples/sec   Loss 7.9729   LearningRate 0.0717   Epoch: 3   Global Step: 51190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:06,271-Speed 9535.36 samples/sec   Loss 7.9333   LearningRate 0.0717   Epoch: 3   Global Step: 51200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:07,343-Speed 9552.23 samples/sec   Loss 8.0842   LearningRate 0.0717   Epoch: 3   Global Step: 51210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:08,415-Speed 9568.70 samples/sec   Loss 8.0664   LearningRate 0.0717   Epoch: 3   Global Step: 51220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:09,463-Speed 9772.40 samples/sec   Loss 8.0177   LearningRate 0.0717   Epoch: 3   Global Step: 51230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:10,510-Speed 9788.95 samples/sec   Loss 7.8352   LearningRate 0.0717   Epoch: 3   Global Step: 51240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:11,573-Speed 9641.20 samples/sec   Loss 8.0327   LearningRate 0.0717   Epoch: 3   Global Step: 51250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:12,620-Speed 9780.62 samples/sec   Loss 8.0293   LearningRate 0.0716   Epoch: 3   Global Step: 51260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:13,689-Speed 9581.39 samples/sec   Loss 8.0703   LearningRate 0.0716   Epoch: 3   Global Step: 51270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:14,748-Speed 9678.55 samples/sec   Loss 7.9514   LearningRate 0.0716   Epoch: 3   Global Step: 51280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:15,787-Speed 9859.80 samples/sec   Loss 8.0732   LearningRate 0.0716   Epoch: 3   Global Step: 51290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:16,830-Speed 9824.16 samples/sec   Loss 7.9161   LearningRate 0.0716   Epoch: 3   Global Step: 51300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:17,880-Speed 9755.38 samples/sec   Loss 8.0554   LearningRate 0.0716   Epoch: 3   Global Step: 51310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:18,996-Speed 9189.11 samples/sec   Loss 8.0119   LearningRate 0.0716   Epoch: 3   Global Step: 51320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:20,035-Speed 9861.79 samples/sec   Loss 7.8894   LearningRate 0.0716   Epoch: 3   Global Step: 51330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:21,132-Speed 9332.01 samples/sec   Loss 8.2109   LearningRate 0.0716   Epoch: 3   Global Step: 51340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:22,193-Speed 9663.26 samples/sec   Loss 7.9923   LearningRate 0.0716   Epoch: 3   Global Step: 51350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:23,290-Speed 9332.86 samples/sec   Loss 8.1016   LearningRate 0.0716   Epoch: 3   Global Step: 51360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:24,333-Speed 9835.13 samples/sec   Loss 8.1285   LearningRate 0.0716   Epoch: 3   Global Step: 51370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:25,369-Speed 9888.46 samples/sec   Loss 7.9687   LearningRate 0.0716   Epoch: 3   Global Step: 51380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:26,434-Speed 9618.46 samples/sec   Loss 8.0259   LearningRate 0.0716   Epoch: 3   Global Step: 51390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:27,513-Speed 9495.26 samples/sec   Loss 8.1173   LearningRate 0.0716   Epoch: 3   Global Step: 51400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:28,598-Speed 9443.00 samples/sec   Loss 8.0964   LearningRate 0.0716   Epoch: 3   Global Step: 51410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:29,671-Speed 9549.38 samples/sec   Loss 8.0113   LearningRate 0.0716   Epoch: 3   Global Step: 51420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:30,756-Speed 9455.27 samples/sec   Loss 8.0618   LearningRate 0.0716   Epoch: 3   Global Step: 51430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:31,886-Speed 9064.94 samples/sec   Loss 8.1028   LearningRate 0.0716   Epoch: 3   Global Step: 51440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:32,955-Speed 9588.46 samples/sec   Loss 8.1454   LearningRate 0.0716   Epoch: 3   Global Step: 51450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:34,038-Speed 9453.01 samples/sec   Loss 8.2106   LearningRate 0.0715   Epoch: 3   Global Step: 51460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:35,076-Speed 9869.16 samples/sec   Loss 8.1026   LearningRate 0.0715   Epoch: 3   Global Step: 51470   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:36:36,147-Speed 9569.13 samples/sec   Loss 8.0111   LearningRate 0.0715   Epoch: 3   Global Step: 51480   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:36:37,273-Speed 9096.49 samples/sec   Loss 8.1833   LearningRate 0.0715   Epoch: 3   Global Step: 51490   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:36:38,338-Speed 9620.77 samples/sec   Loss 8.1434   LearningRate 0.0715   Epoch: 3   Global Step: 51500   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:36:39,426-Speed 9420.22 samples/sec   Loss 8.0253   LearningRate 0.0715   Epoch: 3   Global Step: 51510   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:36:40,474-Speed 9779.05 samples/sec   Loss 8.0582   LearningRate 0.0715   Epoch: 3   Global Step: 51520   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:36:41,509-Speed 9895.17 samples/sec   Loss 8.1606   LearningRate 0.0715   Epoch: 3   Global Step: 51530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:42,627-Speed 9168.27 samples/sec   Loss 8.0235   LearningRate 0.0715   Epoch: 3   Global Step: 51540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:43,703-Speed 9521.25 samples/sec   Loss 8.0241   LearningRate 0.0715   Epoch: 3   Global Step: 51550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:44,769-Speed 9611.39 samples/sec   Loss 8.1393   LearningRate 0.0715   Epoch: 3   Global Step: 51560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:45,877-Speed 9250.26 samples/sec   Loss 8.1237   LearningRate 0.0715   Epoch: 3   Global Step: 51570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:46,966-Speed 9411.84 samples/sec   Loss 8.1078   LearningRate 0.0715   Epoch: 3   Global Step: 51580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:48,054-Speed 9414.97 samples/sec   Loss 8.1381   LearningRate 0.0715   Epoch: 3   Global Step: 51590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:49,138-Speed 9457.10 samples/sec   Loss 8.1278   LearningRate 0.0715   Epoch: 3   Global Step: 51600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:50,229-Speed 9388.78 samples/sec   Loss 8.0882   LearningRate 0.0715   Epoch: 3   Global Step: 51610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:51,287-Speed 9684.53 samples/sec   Loss 8.0886   LearningRate 0.0715   Epoch: 3   Global Step: 51620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:52,343-Speed 9702.53 samples/sec   Loss 8.0056   LearningRate 0.0715   Epoch: 3   Global Step: 51630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:53,415-Speed 9556.35 samples/sec   Loss 8.1355   LearningRate 0.0715   Epoch: 3   Global Step: 51640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:54,468-Speed 9735.96 samples/sec   Loss 8.0754   LearningRate 0.0714   Epoch: 3   Global Step: 51650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:55,538-Speed 9578.19 samples/sec   Loss 8.1559   LearningRate 0.0714   Epoch: 3   Global Step: 51660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:56,627-Speed 9401.27 samples/sec   Loss 7.9985   LearningRate 0.0714   Epoch: 3   Global Step: 51670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:57,717-Speed 9399.32 samples/sec   Loss 8.0194   LearningRate 0.0714   Epoch: 3   Global Step: 51680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:36:58,749-Speed 9933.51 samples/sec   Loss 8.0634   LearningRate 0.0714   Epoch: 3   Global Step: 51690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:36:59,863-Speed 9193.46 samples/sec   Loss 8.0883   LearningRate 0.0714   Epoch: 3   Global Step: 51700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:37:00,948-Speed 9448.48 samples/sec   Loss 8.1016   LearningRate 0.0714   Epoch: 3   Global Step: 51710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:37:02,015-Speed 9600.46 samples/sec   Loss 7.9962   LearningRate 0.0714   Epoch: 3   Global Step: 51720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:37:03,072-Speed 9695.24 samples/sec   Loss 8.0894   LearningRate 0.0714   Epoch: 3   Global Step: 51730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:37:04,147-Speed 9532.38 samples/sec   Loss 8.1524   LearningRate 0.0714   Epoch: 3   Global Step: 51740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:37:05,219-Speed 9554.67 samples/sec   Loss 8.0707   LearningRate 0.0714   Epoch: 3   Global Step: 51750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:37:06,305-Speed 9442.87 samples/sec   Loss 8.1119   LearningRate 0.0714   Epoch: 3   Global Step: 51760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:37:07,365-Speed 9660.70 samples/sec   Loss 8.0144   LearningRate 0.0714   Epoch: 3   Global Step: 51770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:37:08,422-Speed 9697.53 samples/sec   Loss 8.0218   LearningRate 0.0714   Epoch: 3   Global Step: 51780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:37:09,483-Speed 9652.57 samples/sec   Loss 8.0432   LearningRate 0.0714   Epoch: 3   Global Step: 51790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:10,575-Speed 9384.58 samples/sec   Loss 8.0733   LearningRate 0.0714   Epoch: 3   Global Step: 51800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:11,633-Speed 9680.03 samples/sec   Loss 8.2104   LearningRate 0.0714   Epoch: 3   Global Step: 51810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:12,693-Speed 9664.20 samples/sec   Loss 8.0859   LearningRate 0.0714   Epoch: 3   Global Step: 51820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:13,763-Speed 9581.19 samples/sec   Loss 8.0686   LearningRate 0.0714   Epoch: 3   Global Step: 51830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:14,821-Speed 9681.47 samples/sec   Loss 8.1982   LearningRate 0.0714   Epoch: 3   Global Step: 51840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:15,904-Speed 9462.26 samples/sec   Loss 8.0741   LearningRate 0.0713   Epoch: 3   Global Step: 51850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:16,960-Speed 9704.19 samples/sec   Loss 8.1649   LearningRate 0.0713   Epoch: 3   Global Step: 51860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:18,025-Speed 9622.99 samples/sec   Loss 8.0866   LearningRate 0.0713   Epoch: 3   Global Step: 51870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:19,116-Speed 9384.64 samples/sec   Loss 8.1190   LearningRate 0.0713   Epoch: 3   Global Step: 51880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:20,207-Speed 9393.90 samples/sec   Loss 8.2586   LearningRate 0.0713   Epoch: 3   Global Step: 51890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:21,300-Speed 9371.78 samples/sec   Loss 8.1380   LearningRate 0.0713   Epoch: 3   Global Step: 51900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:22,435-Speed 9029.15 samples/sec   Loss 8.1153   LearningRate 0.0713   Epoch: 3   Global Step: 51910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:23,518-Speed 9461.92 samples/sec   Loss 8.0372   LearningRate 0.0713   Epoch: 3   Global Step: 51920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:24,573-Speed 9718.68 samples/sec   Loss 8.1527   LearningRate 0.0713   Epoch: 3   Global Step: 51930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:25,689-Speed 9180.33 samples/sec   Loss 8.1530   LearningRate 0.0713   Epoch: 3   Global Step: 51940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:26,789-Speed 9309.54 samples/sec   Loss 8.1775   LearningRate 0.0713   Epoch: 3   Global Step: 51950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:27,938-Speed 8922.15 samples/sec   Loss 8.0289   LearningRate 0.0713   Epoch: 3   Global Step: 51960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:29,002-Speed 9622.69 samples/sec   Loss 8.0509   LearningRate 0.0713   Epoch: 3   Global Step: 51970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:30,066-Speed 9629.56 samples/sec   Loss 8.1327   LearningRate 0.0713   Epoch: 3   Global Step: 51980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:37:31,179-Speed 9214.44 samples/sec   Loss 8.0502   LearningRate 0.0713   Epoch: 3   Global Step: 51990   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:37:32,258-Speed 9493.36 samples/sec   Loss 8.1574   LearningRate 0.0713   Epoch: 3   Global Step: 52000   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:37:54,475-[lfw][52000]XNorm: 12.606832
Training: 2022-04-11 13:37:54,476-[lfw][52000]Accuracy-Flip: 0.99483+-0.00293
Training: 2022-04-11 13:37:54,476-[lfw][52000]Accuracy-Highest: 0.99583
Training: 2022-04-11 13:38:20,180-[cfp_fp][52000]XNorm: 10.611780
Training: 2022-04-11 13:38:20,180-[cfp_fp][52000]Accuracy-Flip: 0.94686+-0.01178
Training: 2022-04-11 13:38:20,181-[cfp_fp][52000]Accuracy-Highest: 0.94700
Training: 2022-04-11 13:38:42,644-[agedb_30][52000]XNorm: 12.101429
Training: 2022-04-11 13:38:42,645-[agedb_30][52000]Accuracy-Flip: 0.95383+-0.01049
Training: 2022-04-11 13:38:42,646-[agedb_30][52000]Accuracy-Highest: 0.95483
Training: 2022-04-11 13:38:43,696-Speed 143.34 samples/sec   Loss 8.2263   LearningRate 0.0713   Epoch: 3   Global Step: 52010   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:38:44,769-Speed 9549.66 samples/sec   Loss 8.1483   LearningRate 0.0713   Epoch: 3   Global Step: 52020   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:38:45,821-Speed 9751.13 samples/sec   Loss 8.0570   LearningRate 0.0713   Epoch: 3   Global Step: 52030   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:38:46,894-Speed 9552.37 samples/sec   Loss 8.1761   LearningRate 0.0713   Epoch: 3   Global Step: 52040   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:38:47,975-Speed 9473.40 samples/sec   Loss 8.1266   LearningRate 0.0712   Epoch: 3   Global Step: 52050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:38:49,082-Speed 9257.56 samples/sec   Loss 8.1055   LearningRate 0.0712   Epoch: 3   Global Step: 52060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:38:50,304-Speed 8387.34 samples/sec   Loss 8.1777   LearningRate 0.0712   Epoch: 3   Global Step: 52070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:38:51,344-Speed 9854.17 samples/sec   Loss 8.1970   LearningRate 0.0712   Epoch: 3   Global Step: 52080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:38:52,408-Speed 9625.66 samples/sec   Loss 8.0860   LearningRate 0.0712   Epoch: 3   Global Step: 52090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:38:53,465-Speed 9698.92 samples/sec   Loss 8.1673   LearningRate 0.0712   Epoch: 3   Global Step: 52100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:38:54,694-Speed 8336.63 samples/sec   Loss 8.0418   LearningRate 0.0712   Epoch: 3   Global Step: 52110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:38:55,752-Speed 9687.39 samples/sec   Loss 8.0985   LearningRate 0.0712   Epoch: 3   Global Step: 52120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:38:56,783-Speed 9935.88 samples/sec   Loss 8.1774   LearningRate 0.0712   Epoch: 3   Global Step: 52130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:38:57,831-Speed 9775.61 samples/sec   Loss 8.1756   LearningRate 0.0712   Epoch: 3   Global Step: 52140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:38:58,921-Speed 9402.02 samples/sec   Loss 8.0671   LearningRate 0.0712   Epoch: 3   Global Step: 52150   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:38:59,980-Speed 9675.35 samples/sec   Loss 8.0896   LearningRate 0.0712   Epoch: 3   Global Step: 52160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:01,057-Speed 9522.78 samples/sec   Loss 8.2286   LearningRate 0.0712   Epoch: 3   Global Step: 52170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:02,152-Speed 9352.89 samples/sec   Loss 8.0966   LearningRate 0.0712   Epoch: 3   Global Step: 52180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:03,231-Speed 9491.51 samples/sec   Loss 8.1187   LearningRate 0.0712   Epoch: 3   Global Step: 52190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:04,294-Speed 9637.61 samples/sec   Loss 8.1910   LearningRate 0.0712   Epoch: 3   Global Step: 52200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:05,344-Speed 9760.55 samples/sec   Loss 8.1010   LearningRate 0.0712   Epoch: 3   Global Step: 52210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:06,498-Speed 8880.61 samples/sec   Loss 8.1558   LearningRate 0.0712   Epoch: 3   Global Step: 52220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:07,570-Speed 9556.83 samples/sec   Loss 8.2178   LearningRate 0.0712   Epoch: 3   Global Step: 52230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:08,640-Speed 9578.69 samples/sec   Loss 8.0594   LearningRate 0.0712   Epoch: 3   Global Step: 52240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 13:39:09,688-Speed 9772.14 samples/sec   Loss 8.0657   LearningRate 0.0711   Epoch: 3   Global Step: 52250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 13:39:10,768-Speed 9493.29 samples/sec   Loss 8.0974   LearningRate 0.0711   Epoch: 3   Global Step: 52260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 13:39:11,834-Speed 9605.63 samples/sec   Loss 8.0948   LearningRate 0.0711   Epoch: 3   Global Step: 52270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 13:39:12,925-Speed 9397.61 samples/sec   Loss 8.1620   LearningRate 0.0711   Epoch: 3   Global Step: 52280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 13:39:14,043-Speed 9167.19 samples/sec   Loss 8.0501   LearningRate 0.0711   Epoch: 3   Global Step: 52290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 13:39:15,113-Speed 9574.13 samples/sec   Loss 8.1912   LearningRate 0.0711   Epoch: 3   Global Step: 52300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 13:39:16,206-Speed 9376.21 samples/sec   Loss 8.0495   LearningRate 0.0711   Epoch: 3   Global Step: 52310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 13:39:17,275-Speed 9578.39 samples/sec   Loss 8.0841   LearningRate 0.0711   Epoch: 3   Global Step: 52320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 13:39:18,331-Speed 9707.68 samples/sec   Loss 8.1680   LearningRate 0.0711   Epoch: 3   Global Step: 52330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-11 13:39:19,409-Speed 9503.11 samples/sec   Loss 8.1114   LearningRate 0.0711   Epoch: 3   Global Step: 52340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:20,449-Speed 9852.64 samples/sec   Loss 8.2681   LearningRate 0.0711   Epoch: 3   Global Step: 52350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:21,571-Speed 9132.47 samples/sec   Loss 8.0953   LearningRate 0.0711   Epoch: 3   Global Step: 52360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:22,646-Speed 9527.17 samples/sec   Loss 8.1058   LearningRate 0.0711   Epoch: 3   Global Step: 52370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:23,709-Speed 9642.85 samples/sec   Loss 8.1472   LearningRate 0.0711   Epoch: 3   Global Step: 52380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:24,743-Speed 9911.34 samples/sec   Loss 8.1894   LearningRate 0.0711   Epoch: 3   Global Step: 52390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:25,819-Speed 9521.49 samples/sec   Loss 8.2171   LearningRate 0.0711   Epoch: 3   Global Step: 52400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:26,870-Speed 9748.08 samples/sec   Loss 8.0587   LearningRate 0.0711   Epoch: 3   Global Step: 52410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:27,941-Speed 9568.95 samples/sec   Loss 8.1388   LearningRate 0.0711   Epoch: 3   Global Step: 52420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:29,038-Speed 9342.07 samples/sec   Loss 8.0219   LearningRate 0.0711   Epoch: 3   Global Step: 52430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:39:30,149-Speed 9221.57 samples/sec   Loss 8.1597   LearningRate 0.0710   Epoch: 3   Global Step: 52440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:31,229-Speed 9490.30 samples/sec   Loss 8.1978   LearningRate 0.0710   Epoch: 3   Global Step: 52450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:32,298-Speed 9587.17 samples/sec   Loss 8.0561   LearningRate 0.0710   Epoch: 3   Global Step: 52460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:33,363-Speed 9617.33 samples/sec   Loss 8.1551   LearningRate 0.0710   Epoch: 3   Global Step: 52470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:34,438-Speed 9529.00 samples/sec   Loss 8.1761   LearningRate 0.0710   Epoch: 3   Global Step: 52480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:35,506-Speed 9597.45 samples/sec   Loss 8.2204   LearningRate 0.0710   Epoch: 3   Global Step: 52490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:36,588-Speed 9468.57 samples/sec   Loss 8.1533   LearningRate 0.0710   Epoch: 3   Global Step: 52500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:37,647-Speed 9674.17 samples/sec   Loss 8.3020   LearningRate 0.0710   Epoch: 3   Global Step: 52510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:38,736-Speed 9409.21 samples/sec   Loss 8.1491   LearningRate 0.0710   Epoch: 3   Global Step: 52520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:39,835-Speed 9319.80 samples/sec   Loss 8.0958   LearningRate 0.0710   Epoch: 3   Global Step: 52530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:40,886-Speed 9744.96 samples/sec   Loss 8.0636   LearningRate 0.0710   Epoch: 3   Global Step: 52540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:41,959-Speed 9555.78 samples/sec   Loss 8.1772   LearningRate 0.0710   Epoch: 3   Global Step: 52550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:43,005-Speed 9790.24 samples/sec   Loss 8.1416   LearningRate 0.0710   Epoch: 3   Global Step: 52560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:44,064-Speed 9682.04 samples/sec   Loss 8.1745   LearningRate 0.0710   Epoch: 3   Global Step: 52570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:45,122-Speed 9682.83 samples/sec   Loss 8.1865   LearningRate 0.0710   Epoch: 3   Global Step: 52580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:46,195-Speed 9546.59 samples/sec   Loss 8.1172   LearningRate 0.0710   Epoch: 3   Global Step: 52590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:47,285-Speed 9406.27 samples/sec   Loss 8.1739   LearningRate 0.0710   Epoch: 3   Global Step: 52600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:48,351-Speed 9613.58 samples/sec   Loss 8.2287   LearningRate 0.0710   Epoch: 3   Global Step: 52610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:49,409-Speed 9683.95 samples/sec   Loss 8.1908   LearningRate 0.0710   Epoch: 3   Global Step: 52620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:50,451-Speed 9832.42 samples/sec   Loss 8.1590   LearningRate 0.0710   Epoch: 3   Global Step: 52630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:39:51,512-Speed 9656.87 samples/sec   Loss 8.1395   LearningRate 0.0709   Epoch: 3   Global Step: 52640   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:39:52,541-Speed 9953.05 samples/sec   Loss 8.2835   LearningRate 0.0709   Epoch: 3   Global Step: 52650   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:39:53,644-Speed 9297.54 samples/sec   Loss 8.2015   LearningRate 0.0709   Epoch: 3   Global Step: 52660   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:39:54,775-Speed 9064.60 samples/sec   Loss 8.1719   LearningRate 0.0709   Epoch: 3   Global Step: 52670   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:39:55,881-Speed 9258.28 samples/sec   Loss 8.2039   LearningRate 0.0709   Epoch: 3   Global Step: 52680   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:39:56,974-Speed 9377.23 samples/sec   Loss 8.1566   LearningRate 0.0709   Epoch: 3   Global Step: 52690   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:39:58,071-Speed 9338.76 samples/sec   Loss 8.2022   LearningRate 0.0709   Epoch: 3   Global Step: 52700   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:39:59,180-Speed 9240.25 samples/sec   Loss 8.1622   LearningRate 0.0709   Epoch: 3   Global Step: 52710   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:40:00,239-Speed 9669.00 samples/sec   Loss 8.2643   LearningRate 0.0709   Epoch: 3   Global Step: 52720   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:40:01,304-Speed 9628.37 samples/sec   Loss 8.1011   LearningRate 0.0709   Epoch: 3   Global Step: 52730   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:40:02,365-Speed 9651.49 samples/sec   Loss 8.2176   LearningRate 0.0709   Epoch: 3   Global Step: 52740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:03,452-Speed 9433.07 samples/sec   Loss 8.1553   LearningRate 0.0709   Epoch: 3   Global Step: 52750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:04,550-Speed 9324.61 samples/sec   Loss 8.1677   LearningRate 0.0709   Epoch: 3   Global Step: 52760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:05,628-Speed 9512.77 samples/sec   Loss 8.1932   LearningRate 0.0709   Epoch: 3   Global Step: 52770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:06,718-Speed 9400.43 samples/sec   Loss 8.1630   LearningRate 0.0709   Epoch: 3   Global Step: 52780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:07,839-Speed 9135.19 samples/sec   Loss 8.1441   LearningRate 0.0709   Epoch: 3   Global Step: 52790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:08,908-Speed 9589.77 samples/sec   Loss 8.1657   LearningRate 0.0709   Epoch: 3   Global Step: 52800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:09,959-Speed 9748.60 samples/sec   Loss 8.2225   LearningRate 0.0709   Epoch: 3   Global Step: 52810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:11,023-Speed 9626.41 samples/sec   Loss 8.1348   LearningRate 0.0709   Epoch: 3   Global Step: 52820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:12,104-Speed 9481.69 samples/sec   Loss 8.2576   LearningRate 0.0709   Epoch: 3   Global Step: 52830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:13,240-Speed 9017.03 samples/sec   Loss 8.3476   LearningRate 0.0708   Epoch: 3   Global Step: 52840   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:40:14,325-Speed 9448.69 samples/sec   Loss 8.1222   LearningRate 0.0708   Epoch: 3   Global Step: 52850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:40:15,407-Speed 9470.85 samples/sec   Loss 8.1431   LearningRate 0.0708   Epoch: 3   Global Step: 52860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:40:16,491-Speed 9449.38 samples/sec   Loss 8.2106   LearningRate 0.0708   Epoch: 3   Global Step: 52870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:40:17,597-Speed 9260.57 samples/sec   Loss 8.0731   LearningRate 0.0708   Epoch: 3   Global Step: 52880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:40:18,665-Speed 9592.23 samples/sec   Loss 8.2211   LearningRate 0.0708   Epoch: 3   Global Step: 52890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:40:19,715-Speed 9757.87 samples/sec   Loss 8.1522   LearningRate 0.0708   Epoch: 3   Global Step: 52900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:40:20,739-Speed 10006.59 samples/sec   Loss 8.2654   LearningRate 0.0708   Epoch: 3   Global Step: 52910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:40:21,819-Speed 9491.88 samples/sec   Loss 8.1650   LearningRate 0.0708   Epoch: 3   Global Step: 52920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:40:22,860-Speed 9838.29 samples/sec   Loss 8.1981   LearningRate 0.0708   Epoch: 3   Global Step: 52930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:40:23,982-Speed 9137.50 samples/sec   Loss 8.1910   LearningRate 0.0708   Epoch: 3   Global Step: 52940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:40:25,086-Speed 9280.86 samples/sec   Loss 8.1791   LearningRate 0.0708   Epoch: 3   Global Step: 52950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:26,176-Speed 9398.99 samples/sec   Loss 8.1356   LearningRate 0.0708   Epoch: 3   Global Step: 52960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:27,237-Speed 9659.26 samples/sec   Loss 8.0749   LearningRate 0.0708   Epoch: 3   Global Step: 52970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:28,298-Speed 9655.09 samples/sec   Loss 8.1959   LearningRate 0.0708   Epoch: 3   Global Step: 52980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:29,337-Speed 9865.70 samples/sec   Loss 8.1255   LearningRate 0.0708   Epoch: 3   Global Step: 52990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:30,377-Speed 9846.22 samples/sec   Loss 8.1814   LearningRate 0.0708   Epoch: 3   Global Step: 53000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:31,445-Speed 9603.05 samples/sec   Loss 8.1961   LearningRate 0.0708   Epoch: 3   Global Step: 53010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:32,522-Speed 9506.42 samples/sec   Loss 8.1726   LearningRate 0.0708   Epoch: 3   Global Step: 53020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:33,620-Speed 9335.75 samples/sec   Loss 8.2505   LearningRate 0.0708   Epoch: 3   Global Step: 53030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:34,704-Speed 9452.10 samples/sec   Loss 8.2283   LearningRate 0.0707   Epoch: 3   Global Step: 53040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:35,769-Speed 9615.29 samples/sec   Loss 8.0925   LearningRate 0.0707   Epoch: 3   Global Step: 53050   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:40:36,838-Speed 9583.87 samples/sec   Loss 8.0925   LearningRate 0.0707   Epoch: 3   Global Step: 53060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:37,944-Speed 9261.88 samples/sec   Loss 8.1183   LearningRate 0.0707   Epoch: 3   Global Step: 53070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:38,989-Speed 9810.72 samples/sec   Loss 8.1994   LearningRate 0.0707   Epoch: 3   Global Step: 53080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:40,057-Speed 9588.53 samples/sec   Loss 8.1935   LearningRate 0.0707   Epoch: 3   Global Step: 53090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:41,126-Speed 9591.08 samples/sec   Loss 8.2505   LearningRate 0.0707   Epoch: 3   Global Step: 53100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:42,195-Speed 9582.20 samples/sec   Loss 8.1925   LearningRate 0.0707   Epoch: 3   Global Step: 53110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:43,254-Speed 9678.25 samples/sec   Loss 8.1964   LearningRate 0.0707   Epoch: 3   Global Step: 53120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:44,348-Speed 9364.38 samples/sec   Loss 8.1828   LearningRate 0.0707   Epoch: 3   Global Step: 53130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:45,433-Speed 9444.61 samples/sec   Loss 8.2002   LearningRate 0.0707   Epoch: 3   Global Step: 53140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:46,498-Speed 9623.20 samples/sec   Loss 8.2899   LearningRate 0.0707   Epoch: 3   Global Step: 53150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:47,570-Speed 9556.63 samples/sec   Loss 8.1920   LearningRate 0.0707   Epoch: 3   Global Step: 53160   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:40:48,638-Speed 9594.07 samples/sec   Loss 8.2152   LearningRate 0.0707   Epoch: 3   Global Step: 53170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:49,717-Speed 9498.87 samples/sec   Loss 8.1642   LearningRate 0.0707   Epoch: 3   Global Step: 53180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:50,811-Speed 9363.32 samples/sec   Loss 8.2314   LearningRate 0.0707   Epoch: 3   Global Step: 53190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:51,856-Speed 9806.24 samples/sec   Loss 8.0783   LearningRate 0.0707   Epoch: 3   Global Step: 53200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:52,924-Speed 9589.95 samples/sec   Loss 8.3011   LearningRate 0.0707   Epoch: 3   Global Step: 53210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:54,059-Speed 9027.31 samples/sec   Loss 8.1012   LearningRate 0.0707   Epoch: 3   Global Step: 53220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:55,102-Speed 9823.30 samples/sec   Loss 8.1973   LearningRate 0.0707   Epoch: 3   Global Step: 53230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:56,174-Speed 9558.44 samples/sec   Loss 8.2248   LearningRate 0.0706   Epoch: 3   Global Step: 53240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:57,249-Speed 9530.33 samples/sec   Loss 8.1378   LearningRate 0.0706   Epoch: 3   Global Step: 53250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:58,340-Speed 9395.06 samples/sec   Loss 8.2104   LearningRate 0.0706   Epoch: 3   Global Step: 53260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:40:59,421-Speed 9473.80 samples/sec   Loss 8.1745   LearningRate 0.0706   Epoch: 3   Global Step: 53270   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:41:00,493-Speed 9559.18 samples/sec   Loss 8.2149   LearningRate 0.0706   Epoch: 3   Global Step: 53280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:01,575-Speed 9474.72 samples/sec   Loss 8.3245   LearningRate 0.0706   Epoch: 3   Global Step: 53290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:02,675-Speed 9314.47 samples/sec   Loss 8.2374   LearningRate 0.0706   Epoch: 3   Global Step: 53300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:03,767-Speed 9384.10 samples/sec   Loss 8.3554   LearningRate 0.0706   Epoch: 3   Global Step: 53310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:04,858-Speed 9392.29 samples/sec   Loss 8.2188   LearningRate 0.0706   Epoch: 3   Global Step: 53320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:05,928-Speed 9574.07 samples/sec   Loss 8.2509   LearningRate 0.0706   Epoch: 3   Global Step: 53330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:06,964-Speed 9892.50 samples/sec   Loss 8.2309   LearningRate 0.0706   Epoch: 3   Global Step: 53340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:08,048-Speed 9449.84 samples/sec   Loss 8.1764   LearningRate 0.0706   Epoch: 3   Global Step: 53350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:09,170-Speed 9127.70 samples/sec   Loss 8.3361   LearningRate 0.0706   Epoch: 3   Global Step: 53360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:10,244-Speed 9543.08 samples/sec   Loss 8.1817   LearningRate 0.0706   Epoch: 3   Global Step: 53370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:11,312-Speed 9594.68 samples/sec   Loss 8.3211   LearningRate 0.0706   Epoch: 3   Global Step: 53380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:12,381-Speed 9581.58 samples/sec   Loss 8.2553   LearningRate 0.0706   Epoch: 3   Global Step: 53390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:13,475-Speed 9365.82 samples/sec   Loss 8.1901   LearningRate 0.0706   Epoch: 3   Global Step: 53400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:14,569-Speed 9369.52 samples/sec   Loss 8.3465   LearningRate 0.0706   Epoch: 3   Global Step: 53410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:15,657-Speed 9415.63 samples/sec   Loss 8.1435   LearningRate 0.0706   Epoch: 3   Global Step: 53420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:16,728-Speed 9564.95 samples/sec   Loss 8.3571   LearningRate 0.0706   Epoch: 3   Global Step: 53430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:17,832-Speed 9280.51 samples/sec   Loss 8.2376   LearningRate 0.0705   Epoch: 3   Global Step: 53440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:18,900-Speed 9596.01 samples/sec   Loss 8.1365   LearningRate 0.0705   Epoch: 3   Global Step: 53450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:20,012-Speed 9217.58 samples/sec   Loss 8.2522   LearningRate 0.0705   Epoch: 3   Global Step: 53460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:21,124-Speed 9217.90 samples/sec   Loss 8.2699   LearningRate 0.0705   Epoch: 3   Global Step: 53470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:22,160-Speed 9885.74 samples/sec   Loss 8.1410   LearningRate 0.0705   Epoch: 3   Global Step: 53480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:23,206-Speed 9798.29 samples/sec   Loss 8.2233   LearningRate 0.0705   Epoch: 3   Global Step: 53490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:24,298-Speed 9388.67 samples/sec   Loss 8.2908   LearningRate 0.0705   Epoch: 3   Global Step: 53500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:25,398-Speed 9310.50 samples/sec   Loss 8.1779   LearningRate 0.0705   Epoch: 3   Global Step: 53510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:26,482-Speed 9455.45 samples/sec   Loss 8.2930   LearningRate 0.0705   Epoch: 3   Global Step: 53520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:27,543-Speed 9660.05 samples/sec   Loss 8.3384   LearningRate 0.0705   Epoch: 3   Global Step: 53530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:28,652-Speed 9238.57 samples/sec   Loss 8.1971   LearningRate 0.0705   Epoch: 3   Global Step: 53540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:29,762-Speed 9228.46 samples/sec   Loss 8.2311   LearningRate 0.0705   Epoch: 3   Global Step: 53550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:30,858-Speed 9345.54 samples/sec   Loss 8.1460   LearningRate 0.0705   Epoch: 3   Global Step: 53560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:31,924-Speed 9616.26 samples/sec   Loss 8.1873   LearningRate 0.0705   Epoch: 3   Global Step: 53570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:32,996-Speed 9552.13 samples/sec   Loss 8.1961   LearningRate 0.0705   Epoch: 3   Global Step: 53580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:34,142-Speed 8943.68 samples/sec   Loss 8.1966   LearningRate 0.0705   Epoch: 3   Global Step: 53590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:35,197-Speed 9706.64 samples/sec   Loss 8.1617   LearningRate 0.0705   Epoch: 3   Global Step: 53600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:36,285-Speed 9417.87 samples/sec   Loss 8.3019   LearningRate 0.0705   Epoch: 3   Global Step: 53610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:37,373-Speed 9418.59 samples/sec   Loss 8.1877   LearningRate 0.0705   Epoch: 3   Global Step: 53620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:41:38,464-Speed 9398.14 samples/sec   Loss 8.2598   LearningRate 0.0704   Epoch: 3   Global Step: 53630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:39,546-Speed 9470.50 samples/sec   Loss 8.2452   LearningRate 0.0704   Epoch: 3   Global Step: 53640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:40,670-Speed 9117.05 samples/sec   Loss 8.3087   LearningRate 0.0704   Epoch: 3   Global Step: 53650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:41,726-Speed 9701.22 samples/sec   Loss 8.1612   LearningRate 0.0704   Epoch: 3   Global Step: 53660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:42,818-Speed 9377.09 samples/sec   Loss 8.1409   LearningRate 0.0704   Epoch: 3   Global Step: 53670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:43,914-Speed 9349.33 samples/sec   Loss 8.3211   LearningRate 0.0704   Epoch: 3   Global Step: 53680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:45,006-Speed 9388.46 samples/sec   Loss 8.1447   LearningRate 0.0704   Epoch: 3   Global Step: 53690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:46,088-Speed 9472.53 samples/sec   Loss 8.2956   LearningRate 0.0704   Epoch: 3   Global Step: 53700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:47,163-Speed 9531.72 samples/sec   Loss 8.3247   LearningRate 0.0704   Epoch: 3   Global Step: 53710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:48,234-Speed 9563.72 samples/sec   Loss 8.2513   LearningRate 0.0704   Epoch: 3   Global Step: 53720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:49,349-Speed 9192.23 samples/sec   Loss 8.1626   LearningRate 0.0704   Epoch: 3   Global Step: 53730   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:41:50,405-Speed 9702.20 samples/sec   Loss 8.1666   LearningRate 0.0704   Epoch: 3   Global Step: 53740   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:41:51,484-Speed 9490.38 samples/sec   Loss 8.1644   LearningRate 0.0704   Epoch: 3   Global Step: 53750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:52,556-Speed 9558.55 samples/sec   Loss 8.2693   LearningRate 0.0704   Epoch: 3   Global Step: 53760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:53,659-Speed 9292.23 samples/sec   Loss 8.1331   LearningRate 0.0704   Epoch: 3   Global Step: 53770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:54,730-Speed 9574.45 samples/sec   Loss 8.1036   LearningRate 0.0704   Epoch: 3   Global Step: 53780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:55,815-Speed 9442.36 samples/sec   Loss 8.1174   LearningRate 0.0704   Epoch: 3   Global Step: 53790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:56,888-Speed 9544.57 samples/sec   Loss 8.2055   LearningRate 0.0704   Epoch: 3   Global Step: 53800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:57,980-Speed 9384.14 samples/sec   Loss 8.2306   LearningRate 0.0704   Epoch: 3   Global Step: 53810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:41:59,097-Speed 9175.95 samples/sec   Loss 8.2002   LearningRate 0.0704   Epoch: 3   Global Step: 53820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:00,162-Speed 9613.84 samples/sec   Loss 8.2248   LearningRate 0.0703   Epoch: 3   Global Step: 53830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:01,222-Speed 9668.85 samples/sec   Loss 8.1618   LearningRate 0.0703   Epoch: 3   Global Step: 53840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:02,304-Speed 9477.19 samples/sec   Loss 8.2953   LearningRate 0.0703   Epoch: 3   Global Step: 53850   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:42:03,361-Speed 9693.14 samples/sec   Loss 8.1653   LearningRate 0.0703   Epoch: 3   Global Step: 53860   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:42:04,389-Speed 9964.99 samples/sec   Loss 8.1264   LearningRate 0.0703   Epoch: 3   Global Step: 53870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:05,478-Speed 9406.90 samples/sec   Loss 8.1782   LearningRate 0.0703   Epoch: 3   Global Step: 53880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:06,518-Speed 9849.80 samples/sec   Loss 8.1850   LearningRate 0.0703   Epoch: 3   Global Step: 53890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:07,593-Speed 9528.16 samples/sec   Loss 8.1535   LearningRate 0.0703   Epoch: 3   Global Step: 53900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:08,657-Speed 9632.29 samples/sec   Loss 8.2027   LearningRate 0.0703   Epoch: 3   Global Step: 53910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:09,742-Speed 9444.21 samples/sec   Loss 8.2432   LearningRate 0.0703   Epoch: 3   Global Step: 53920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:10,810-Speed 9596.40 samples/sec   Loss 8.2155   LearningRate 0.0703   Epoch: 3   Global Step: 53930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:11,892-Speed 9465.62 samples/sec   Loss 8.1439   LearningRate 0.0703   Epoch: 3   Global Step: 53940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:12,952-Speed 9670.93 samples/sec   Loss 8.1168   LearningRate 0.0703   Epoch: 3   Global Step: 53950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:14,046-Speed 9371.62 samples/sec   Loss 8.1308   LearningRate 0.0703   Epoch: 3   Global Step: 53960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:15,113-Speed 9599.31 samples/sec   Loss 8.2446   LearningRate 0.0703   Epoch: 3   Global Step: 53970   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:42:16,154-Speed 9839.25 samples/sec   Loss 8.2080   LearningRate 0.0703   Epoch: 3   Global Step: 53980   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:42:17,217-Speed 9639.70 samples/sec   Loss 8.2194   LearningRate 0.0703   Epoch: 3   Global Step: 53990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:18,300-Speed 9465.92 samples/sec   Loss 8.1913   LearningRate 0.0703   Epoch: 3   Global Step: 54000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:42:40,487-[lfw][54000]XNorm: 12.649266
Training: 2022-04-11 13:42:40,488-[lfw][54000]Accuracy-Flip: 0.99467+-0.00256
Training: 2022-04-11 13:42:40,488-[lfw][54000]Accuracy-Highest: 0.99583
Training: 2022-04-11 13:43:06,183-[cfp_fp][54000]XNorm: 10.701333
Training: 2022-04-11 13:43:06,184-[cfp_fp][54000]Accuracy-Flip: 0.94943+-0.01430
Training: 2022-04-11 13:43:06,184-[cfp_fp][54000]Accuracy-Highest: 0.94943
Training: 2022-04-11 13:43:28,587-[agedb_30][54000]XNorm: 12.216305
Training: 2022-04-11 13:43:28,588-[agedb_30][54000]Accuracy-Flip: 0.95400+-0.01188
Training: 2022-04-11 13:43:28,588-[agedb_30][54000]Accuracy-Highest: 0.95483
Training: 2022-04-11 13:43:29,703-Speed 143.41 samples/sec   Loss 8.1593   LearningRate 0.0703   Epoch: 3   Global Step: 54010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:30,794-Speed 9387.95 samples/sec   Loss 8.2810   LearningRate 0.0703   Epoch: 3   Global Step: 54020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:31,857-Speed 9637.82 samples/sec   Loss 8.2453   LearningRate 0.0702   Epoch: 3   Global Step: 54030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:32,978-Speed 9142.86 samples/sec   Loss 8.2523   LearningRate 0.0702   Epoch: 3   Global Step: 54040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:34,053-Speed 9531.68 samples/sec   Loss 8.2273   LearningRate 0.0702   Epoch: 3   Global Step: 54050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:35,150-Speed 9335.04 samples/sec   Loss 8.2847   LearningRate 0.0702   Epoch: 3   Global Step: 54060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:36,195-Speed 9812.66 samples/sec   Loss 8.0513   LearningRate 0.0702   Epoch: 3   Global Step: 54070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:37,242-Speed 9780.69 samples/sec   Loss 8.2570   LearningRate 0.0702   Epoch: 3   Global Step: 54080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:38,297-Speed 9711.16 samples/sec   Loss 8.2688   LearningRate 0.0702   Epoch: 3   Global Step: 54090   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:43:39,346-Speed 9771.54 samples/sec   Loss 8.2621   LearningRate 0.0702   Epoch: 3   Global Step: 54100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:40,482-Speed 9017.46 samples/sec   Loss 8.2609   LearningRate 0.0702   Epoch: 3   Global Step: 54110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:41,534-Speed 9738.28 samples/sec   Loss 8.1630   LearningRate 0.0702   Epoch: 3   Global Step: 54120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:43:42,628-Speed 9372.10 samples/sec   Loss 8.3370   LearningRate 0.0702   Epoch: 3   Global Step: 54130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:43:43,712-Speed 9447.98 samples/sec   Loss 8.3074   LearningRate 0.0702   Epoch: 3   Global Step: 54140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:43:44,792-Speed 9492.72 samples/sec   Loss 8.2015   LearningRate 0.0702   Epoch: 3   Global Step: 54150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:43:45,861-Speed 9580.03 samples/sec   Loss 8.2472   LearningRate 0.0702   Epoch: 3   Global Step: 54160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:43:46,945-Speed 9458.34 samples/sec   Loss 8.2168   LearningRate 0.0702   Epoch: 3   Global Step: 54170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:43:48,033-Speed 9410.29 samples/sec   Loss 8.2702   LearningRate 0.0702   Epoch: 3   Global Step: 54180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:43:49,174-Speed 8985.49 samples/sec   Loss 8.2058   LearningRate 0.0702   Epoch: 3   Global Step: 54190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:43:50,235-Speed 9652.16 samples/sec   Loss 8.1362   LearningRate 0.0702   Epoch: 3   Global Step: 54200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:43:51,302-Speed 9608.50 samples/sec   Loss 8.2032   LearningRate 0.0702   Epoch: 3   Global Step: 54210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:43:52,370-Speed 9591.90 samples/sec   Loss 8.2934   LearningRate 0.0702   Epoch: 3   Global Step: 54220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:53,415-Speed 9806.13 samples/sec   Loss 8.3101   LearningRate 0.0701   Epoch: 3   Global Step: 54230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:54,514-Speed 9324.95 samples/sec   Loss 8.1376   LearningRate 0.0701   Epoch: 3   Global Step: 54240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:55,606-Speed 9379.42 samples/sec   Loss 8.1973   LearningRate 0.0701   Epoch: 3   Global Step: 54250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:56,657-Speed 9744.76 samples/sec   Loss 8.2702   LearningRate 0.0701   Epoch: 3   Global Step: 54260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:57,756-Speed 9326.19 samples/sec   Loss 8.2137   LearningRate 0.0701   Epoch: 3   Global Step: 54270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:58,828-Speed 9558.54 samples/sec   Loss 8.2548   LearningRate 0.0701   Epoch: 3   Global Step: 54280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:43:59,898-Speed 9575.38 samples/sec   Loss 8.1448   LearningRate 0.0701   Epoch: 3   Global Step: 54290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:00,974-Speed 9515.74 samples/sec   Loss 8.2836   LearningRate 0.0701   Epoch: 3   Global Step: 54300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:02,033-Speed 9682.10 samples/sec   Loss 8.1851   LearningRate 0.0701   Epoch: 3   Global Step: 54310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:03,087-Speed 9718.35 samples/sec   Loss 8.2715   LearningRate 0.0701   Epoch: 3   Global Step: 54320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:04,122-Speed 9906.45 samples/sec   Loss 8.2447   LearningRate 0.0701   Epoch: 3   Global Step: 54330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:05,192-Speed 9568.09 samples/sec   Loss 8.2169   LearningRate 0.0701   Epoch: 3   Global Step: 54340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:06,331-Speed 9002.42 samples/sec   Loss 8.2179   LearningRate 0.0701   Epoch: 3   Global Step: 54350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:07,428-Speed 9341.84 samples/sec   Loss 8.1644   LearningRate 0.0701   Epoch: 3   Global Step: 54360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:08,530-Speed 9294.81 samples/sec   Loss 8.2863   LearningRate 0.0701   Epoch: 3   Global Step: 54370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:09,601-Speed 9571.20 samples/sec   Loss 8.2568   LearningRate 0.0701   Epoch: 3   Global Step: 54380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:10,643-Speed 9825.84 samples/sec   Loss 8.2781   LearningRate 0.0701   Epoch: 3   Global Step: 54390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:11,714-Speed 9568.40 samples/sec   Loss 8.2323   LearningRate 0.0701   Epoch: 3   Global Step: 54400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:12,748-Speed 9916.89 samples/sec   Loss 8.2184   LearningRate 0.0701   Epoch: 3   Global Step: 54410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:13,809-Speed 9649.75 samples/sec   Loss 8.3204   LearningRate 0.0701   Epoch: 3   Global Step: 54420   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:14,897-Speed 9422.35 samples/sec   Loss 8.1797   LearningRate 0.0700   Epoch: 3   Global Step: 54430   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:16,000-Speed 9289.05 samples/sec   Loss 8.1874   LearningRate 0.0700   Epoch: 3   Global Step: 54440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:17,109-Speed 9236.13 samples/sec   Loss 8.2270   LearningRate 0.0700   Epoch: 3   Global Step: 54450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:18,207-Speed 9335.52 samples/sec   Loss 8.1297   LearningRate 0.0700   Epoch: 3   Global Step: 54460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:19,283-Speed 9524.98 samples/sec   Loss 8.2907   LearningRate 0.0700   Epoch: 3   Global Step: 54470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:20,360-Speed 9507.89 samples/sec   Loss 8.3008   LearningRate 0.0700   Epoch: 3   Global Step: 54480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:21,428-Speed 9600.47 samples/sec   Loss 8.2268   LearningRate 0.0700   Epoch: 3   Global Step: 54490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:22,518-Speed 9399.99 samples/sec   Loss 8.1561   LearningRate 0.0700   Epoch: 3   Global Step: 54500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:23,590-Speed 9551.98 samples/sec   Loss 8.2581   LearningRate 0.0700   Epoch: 3   Global Step: 54510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:24,657-Speed 9604.95 samples/sec   Loss 8.1601   LearningRate 0.0700   Epoch: 3   Global Step: 54520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:25,733-Speed 9524.40 samples/sec   Loss 8.2678   LearningRate 0.0700   Epoch: 3   Global Step: 54530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:26,826-Speed 9378.48 samples/sec   Loss 8.1538   LearningRate 0.0700   Epoch: 3   Global Step: 54540   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:27,916-Speed 9398.49 samples/sec   Loss 8.1910   LearningRate 0.0700   Epoch: 3   Global Step: 54550   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:28,951-Speed 9897.45 samples/sec   Loss 8.3511   LearningRate 0.0700   Epoch: 3   Global Step: 54560   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:29,992-Speed 9847.32 samples/sec   Loss 8.2830   LearningRate 0.0700   Epoch: 3   Global Step: 54570   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:31,103-Speed 9217.18 samples/sec   Loss 8.2647   LearningRate 0.0700   Epoch: 3   Global Step: 54580   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:32,192-Speed 9412.53 samples/sec   Loss 8.1839   LearningRate 0.0700   Epoch: 3   Global Step: 54590   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:33,295-Speed 9280.68 samples/sec   Loss 8.3049   LearningRate 0.0700   Epoch: 3   Global Step: 54600   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:34,364-Speed 9595.15 samples/sec   Loss 8.3536   LearningRate 0.0700   Epoch: 3   Global Step: 54610   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:35,441-Speed 9512.77 samples/sec   Loss 8.2795   LearningRate 0.0700   Epoch: 3   Global Step: 54620   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:36,475-Speed 9904.55 samples/sec   Loss 8.2459   LearningRate 0.0699   Epoch: 3   Global Step: 54630   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:37,544-Speed 9584.20 samples/sec   Loss 8.2192   LearningRate 0.0699   Epoch: 3   Global Step: 54640   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:38,593-Speed 9766.27 samples/sec   Loss 8.1944   LearningRate 0.0699   Epoch: 3   Global Step: 54650   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:39,666-Speed 9552.83 samples/sec   Loss 8.1121   LearningRate 0.0699   Epoch: 3   Global Step: 54660   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:44:40,727-Speed 9658.23 samples/sec   Loss 8.2042   LearningRate 0.0699   Epoch: 3   Global Step: 54670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:41,820-Speed 9367.40 samples/sec   Loss 8.2266   LearningRate 0.0699   Epoch: 3   Global Step: 54680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:42,884-Speed 9638.43 samples/sec   Loss 8.2756   LearningRate 0.0699   Epoch: 3   Global Step: 54690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:43,942-Speed 9683.33 samples/sec   Loss 8.2216   LearningRate 0.0699   Epoch: 3   Global Step: 54700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:45,014-Speed 9626.25 samples/sec   Loss 8.1866   LearningRate 0.0699   Epoch: 3   Global Step: 54710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:44:46,060-Speed 9798.46 samples/sec   Loss 8.1668   LearningRate 0.0699   Epoch: 3   Global Step: 54720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:44:47,137-Speed 9505.76 samples/sec   Loss 8.2870   LearningRate 0.0699   Epoch: 3   Global Step: 54730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:44:48,185-Speed 9778.73 samples/sec   Loss 8.3115   LearningRate 0.0699   Epoch: 3   Global Step: 54740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:44:49,271-Speed 9434.10 samples/sec   Loss 8.2150   LearningRate 0.0699   Epoch: 3   Global Step: 54750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:44:50,316-Speed 9800.66 samples/sec   Loss 8.2640   LearningRate 0.0699   Epoch: 3   Global Step: 54760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:44:51,359-Speed 9833.75 samples/sec   Loss 8.2032   LearningRate 0.0699   Epoch: 3   Global Step: 54770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:44:52,436-Speed 9513.94 samples/sec   Loss 8.2904   LearningRate 0.0699   Epoch: 3   Global Step: 54780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:44:53,516-Speed 9485.65 samples/sec   Loss 8.1529   LearningRate 0.0699   Epoch: 3   Global Step: 54790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:44:54,578-Speed 9645.21 samples/sec   Loss 8.3394   LearningRate 0.0699   Epoch: 3   Global Step: 54800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:44:55,617-Speed 9865.49 samples/sec   Loss 8.1529   LearningRate 0.0699   Epoch: 3   Global Step: 54810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:56,694-Speed 9513.52 samples/sec   Loss 8.2656   LearningRate 0.0699   Epoch: 3   Global Step: 54820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:57,821-Speed 9088.74 samples/sec   Loss 8.1323   LearningRate 0.0698   Epoch: 3   Global Step: 54830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:58,889-Speed 9590.68 samples/sec   Loss 8.1642   LearningRate 0.0698   Epoch: 3   Global Step: 54840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:44:59,964-Speed 9533.51 samples/sec   Loss 8.2490   LearningRate 0.0698   Epoch: 3   Global Step: 54850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:01,048-Speed 9456.11 samples/sec   Loss 8.1615   LearningRate 0.0698   Epoch: 3   Global Step: 54860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:02,121-Speed 9544.98 samples/sec   Loss 8.2762   LearningRate 0.0698   Epoch: 3   Global Step: 54870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:03,175-Speed 9720.93 samples/sec   Loss 8.2611   LearningRate 0.0698   Epoch: 3   Global Step: 54880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:04,221-Speed 9807.28 samples/sec   Loss 8.1443   LearningRate 0.0698   Epoch: 3   Global Step: 54890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:05,285-Speed 9632.64 samples/sec   Loss 8.1568   LearningRate 0.0698   Epoch: 3   Global Step: 54900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:06,366-Speed 9484.70 samples/sec   Loss 8.1623   LearningRate 0.0698   Epoch: 3   Global Step: 54910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:07,450-Speed 9454.70 samples/sec   Loss 8.1600   LearningRate 0.0698   Epoch: 3   Global Step: 54920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:08,511-Speed 9654.22 samples/sec   Loss 8.2425   LearningRate 0.0698   Epoch: 3   Global Step: 54930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:09,607-Speed 9341.63 samples/sec   Loss 8.1786   LearningRate 0.0698   Epoch: 3   Global Step: 54940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:45:10,701-Speed 9371.15 samples/sec   Loss 8.2613   LearningRate 0.0698   Epoch: 3   Global Step: 54950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:45:11,762-Speed 9650.13 samples/sec   Loss 8.2149   LearningRate 0.0698   Epoch: 3   Global Step: 54960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:45:12,829-Speed 9607.14 samples/sec   Loss 8.2566   LearningRate 0.0698   Epoch: 3   Global Step: 54970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:45:13,905-Speed 9518.30 samples/sec   Loss 8.3345   LearningRate 0.0698   Epoch: 3   Global Step: 54980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:45:14,979-Speed 9550.27 samples/sec   Loss 8.3482   LearningRate 0.0698   Epoch: 3   Global Step: 54990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:45:16,091-Speed 9213.84 samples/sec   Loss 8.1742   LearningRate 0.0698   Epoch: 3   Global Step: 55000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:45:17,202-Speed 9217.58 samples/sec   Loss 8.2265   LearningRate 0.0698   Epoch: 3   Global Step: 55010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:45:18,281-Speed 9493.01 samples/sec   Loss 8.1072   LearningRate 0.0698   Epoch: 3   Global Step: 55020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:45:19,378-Speed 9344.03 samples/sec   Loss 8.2418   LearningRate 0.0697   Epoch: 3   Global Step: 55030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:45:20,465-Speed 9429.81 samples/sec   Loss 8.1556   LearningRate 0.0697   Epoch: 3   Global Step: 55040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:21,630-Speed 8794.10 samples/sec   Loss 8.1550   LearningRate 0.0697   Epoch: 3   Global Step: 55050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:22,745-Speed 9192.64 samples/sec   Loss 8.2624   LearningRate 0.0697   Epoch: 3   Global Step: 55060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:23,836-Speed 9395.10 samples/sec   Loss 8.1905   LearningRate 0.0697   Epoch: 3   Global Step: 55070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:24,919-Speed 9462.98 samples/sec   Loss 8.2953   LearningRate 0.0697   Epoch: 3   Global Step: 55080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:25,943-Speed 9999.66 samples/sec   Loss 8.2893   LearningRate 0.0697   Epoch: 3   Global Step: 55090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:26,996-Speed 9726.45 samples/sec   Loss 8.4008   LearningRate 0.0697   Epoch: 3   Global Step: 55100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:28,057-Speed 9662.30 samples/sec   Loss 8.3523   LearningRate 0.0697   Epoch: 3   Global Step: 55110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:29,148-Speed 9391.80 samples/sec   Loss 8.2298   LearningRate 0.0697   Epoch: 3   Global Step: 55120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:30,203-Speed 9705.30 samples/sec   Loss 8.3522   LearningRate 0.0697   Epoch: 3   Global Step: 55130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:31,312-Speed 9240.98 samples/sec   Loss 8.2204   LearningRate 0.0697   Epoch: 3   Global Step: 55140   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:45:32,413-Speed 9308.12 samples/sec   Loss 8.1523   LearningRate 0.0697   Epoch: 3   Global Step: 55150   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:45:33,507-Speed 9364.87 samples/sec   Loss 8.2908   LearningRate 0.0697   Epoch: 3   Global Step: 55160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:34,575-Speed 9597.42 samples/sec   Loss 8.2348   LearningRate 0.0697   Epoch: 3   Global Step: 55170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:35,663-Speed 9417.90 samples/sec   Loss 8.1691   LearningRate 0.0697   Epoch: 3   Global Step: 55180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:36,737-Speed 9537.69 samples/sec   Loss 8.2878   LearningRate 0.0697   Epoch: 3   Global Step: 55190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:37,787-Speed 9762.01 samples/sec   Loss 8.2933   LearningRate 0.0697   Epoch: 3   Global Step: 55200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:38,835-Speed 9773.06 samples/sec   Loss 8.3158   LearningRate 0.0697   Epoch: 3   Global Step: 55210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:39,905-Speed 9569.92 samples/sec   Loss 8.2417   LearningRate 0.0697   Epoch: 3   Global Step: 55220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:41,000-Speed 9360.41 samples/sec   Loss 8.2263   LearningRate 0.0696   Epoch: 3   Global Step: 55230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:42,059-Speed 9680.92 samples/sec   Loss 8.3120   LearningRate 0.0696   Epoch: 3   Global Step: 55240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:43,142-Speed 9461.22 samples/sec   Loss 8.2692   LearningRate 0.0696   Epoch: 3   Global Step: 55250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:44,201-Speed 9674.72 samples/sec   Loss 8.2786   LearningRate 0.0696   Epoch: 3   Global Step: 55260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:45,240-Speed 9860.05 samples/sec   Loss 8.2122   LearningRate 0.0696   Epoch: 3   Global Step: 55270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:46,323-Speed 9466.11 samples/sec   Loss 8.1666   LearningRate 0.0696   Epoch: 3   Global Step: 55280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:47,448-Speed 9106.36 samples/sec   Loss 8.2822   LearningRate 0.0696   Epoch: 3   Global Step: 55290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:48,521-Speed 9544.17 samples/sec   Loss 8.3066   LearningRate 0.0696   Epoch: 3   Global Step: 55300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:49,631-Speed 9233.21 samples/sec   Loss 8.3007   LearningRate 0.0696   Epoch: 3   Global Step: 55310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:50,731-Speed 9314.14 samples/sec   Loss 8.3097   LearningRate 0.0696   Epoch: 3   Global Step: 55320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:51,805-Speed 9547.61 samples/sec   Loss 8.3224   LearningRate 0.0696   Epoch: 3   Global Step: 55330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:52,872-Speed 9599.57 samples/sec   Loss 8.3417   LearningRate 0.0696   Epoch: 3   Global Step: 55340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:53,925-Speed 9727.39 samples/sec   Loss 8.3079   LearningRate 0.0696   Epoch: 3   Global Step: 55350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:55,010-Speed 9444.78 samples/sec   Loss 8.2579   LearningRate 0.0696   Epoch: 3   Global Step: 55360   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:45:56,053-Speed 9821.43 samples/sec   Loss 8.2357   LearningRate 0.0696   Epoch: 3   Global Step: 55370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:57,118-Speed 9627.12 samples/sec   Loss 8.2656   LearningRate 0.0696   Epoch: 3   Global Step: 55380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:58,182-Speed 9629.05 samples/sec   Loss 8.2648   LearningRate 0.0696   Epoch: 3   Global Step: 55390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:45:59,257-Speed 9526.78 samples/sec   Loss 8.1061   LearningRate 0.0696   Epoch: 3   Global Step: 55400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:00,342-Speed 9442.13 samples/sec   Loss 8.2697   LearningRate 0.0696   Epoch: 3   Global Step: 55410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:01,474-Speed 9056.53 samples/sec   Loss 8.1595   LearningRate 0.0696   Epoch: 3   Global Step: 55420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:02,518-Speed 9817.09 samples/sec   Loss 8.2704   LearningRate 0.0695   Epoch: 3   Global Step: 55430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:03,597-Speed 9492.83 samples/sec   Loss 8.3163   LearningRate 0.0695   Epoch: 3   Global Step: 55440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:04,715-Speed 9170.11 samples/sec   Loss 8.2302   LearningRate 0.0695   Epoch: 3   Global Step: 55450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:05,819-Speed 9277.13 samples/sec   Loss 8.3101   LearningRate 0.0695   Epoch: 3   Global Step: 55460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:06,890-Speed 9562.97 samples/sec   Loss 8.2888   LearningRate 0.0695   Epoch: 3   Global Step: 55470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:07,982-Speed 9380.42 samples/sec   Loss 8.1523   LearningRate 0.0695   Epoch: 3   Global Step: 55480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:09,056-Speed 9543.20 samples/sec   Loss 8.2959   LearningRate 0.0695   Epoch: 3   Global Step: 55490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:10,144-Speed 9418.11 samples/sec   Loss 8.2013   LearningRate 0.0695   Epoch: 3   Global Step: 55500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:11,208-Speed 9629.81 samples/sec   Loss 8.2706   LearningRate 0.0695   Epoch: 3   Global Step: 55510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:12,263-Speed 9707.35 samples/sec   Loss 8.1558   LearningRate 0.0695   Epoch: 3   Global Step: 55520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:13,338-Speed 9538.48 samples/sec   Loss 8.1635   LearningRate 0.0695   Epoch: 3   Global Step: 55530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:14,414-Speed 9513.59 samples/sec   Loss 8.2313   LearningRate 0.0695   Epoch: 3   Global Step: 55540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:15,473-Speed 9679.24 samples/sec   Loss 8.3965   LearningRate 0.0695   Epoch: 3   Global Step: 55550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:16,562-Speed 9415.33 samples/sec   Loss 8.2489   LearningRate 0.0695   Epoch: 3   Global Step: 55560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:17,669-Speed 9254.24 samples/sec   Loss 8.3517   LearningRate 0.0695   Epoch: 3   Global Step: 55570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:18,724-Speed 9712.19 samples/sec   Loss 8.1804   LearningRate 0.0695   Epoch: 3   Global Step: 55580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:19,765-Speed 9845.46 samples/sec   Loss 8.3627   LearningRate 0.0695   Epoch: 3   Global Step: 55590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:20,877-Speed 9216.77 samples/sec   Loss 8.3075   LearningRate 0.0695   Epoch: 3   Global Step: 55600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:21,986-Speed 9233.95 samples/sec   Loss 8.2339   LearningRate 0.0695   Epoch: 3   Global Step: 55610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:23,108-Speed 9132.91 samples/sec   Loss 8.1566   LearningRate 0.0695   Epoch: 3   Global Step: 55620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:24,199-Speed 9393.47 samples/sec   Loss 8.2822   LearningRate 0.0694   Epoch: 3   Global Step: 55630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:25,289-Speed 9396.66 samples/sec   Loss 8.3542   LearningRate 0.0694   Epoch: 3   Global Step: 55640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:26,429-Speed 8987.23 samples/sec   Loss 8.2432   LearningRate 0.0694   Epoch: 3   Global Step: 55650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:27,508-Speed 9497.51 samples/sec   Loss 8.1570   LearningRate 0.0694   Epoch: 3   Global Step: 55660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:28,551-Speed 9821.38 samples/sec   Loss 8.2208   LearningRate 0.0694   Epoch: 3   Global Step: 55670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:29,590-Speed 9861.09 samples/sec   Loss 8.1997   LearningRate 0.0694   Epoch: 3   Global Step: 55680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:30,655-Speed 9622.32 samples/sec   Loss 8.2155   LearningRate 0.0694   Epoch: 3   Global Step: 55690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:31,726-Speed 9562.43 samples/sec   Loss 8.1751   LearningRate 0.0694   Epoch: 3   Global Step: 55700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:32,772-Speed 9802.21 samples/sec   Loss 8.2925   LearningRate 0.0694   Epoch: 3   Global Step: 55710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:33,853-Speed 9473.37 samples/sec   Loss 8.2803   LearningRate 0.0694   Epoch: 3   Global Step: 55720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:34,942-Speed 9416.25 samples/sec   Loss 8.2522   LearningRate 0.0694   Epoch: 3   Global Step: 55730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:46:36,014-Speed 9553.32 samples/sec   Loss 8.2135   LearningRate 0.0694   Epoch: 3   Global Step: 55740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:37,106-Speed 9385.66 samples/sec   Loss 8.2079   LearningRate 0.0694   Epoch: 3   Global Step: 55750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:38,187-Speed 9476.36 samples/sec   Loss 8.2347   LearningRate 0.0694   Epoch: 3   Global Step: 55760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:39,284-Speed 9337.84 samples/sec   Loss 8.3358   LearningRate 0.0694   Epoch: 3   Global Step: 55770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:40,367-Speed 9464.17 samples/sec   Loss 8.2169   LearningRate 0.0694   Epoch: 3   Global Step: 55780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:41,467-Speed 9317.97 samples/sec   Loss 8.2930   LearningRate 0.0694   Epoch: 3   Global Step: 55790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:42,527-Speed 9664.45 samples/sec   Loss 8.2882   LearningRate 0.0694   Epoch: 3   Global Step: 55800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:43,627-Speed 9308.91 samples/sec   Loss 8.2009   LearningRate 0.0694   Epoch: 3   Global Step: 55810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:44,735-Speed 9251.98 samples/sec   Loss 8.1982   LearningRate 0.0694   Epoch: 3   Global Step: 55820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:45,794-Speed 9671.49 samples/sec   Loss 8.2371   LearningRate 0.0693   Epoch: 3   Global Step: 55830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:46,890-Speed 9351.82 samples/sec   Loss 8.2420   LearningRate 0.0693   Epoch: 3   Global Step: 55840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:47,976-Speed 9433.82 samples/sec   Loss 8.1583   LearningRate 0.0693   Epoch: 3   Global Step: 55850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:49,094-Speed 9161.05 samples/sec   Loss 8.2111   LearningRate 0.0693   Epoch: 3   Global Step: 55860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:50,158-Speed 9635.64 samples/sec   Loss 8.3429   LearningRate 0.0693   Epoch: 3   Global Step: 55870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:51,242-Speed 9446.49 samples/sec   Loss 8.2537   LearningRate 0.0693   Epoch: 3   Global Step: 55880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:52,353-Speed 9225.18 samples/sec   Loss 8.1808   LearningRate 0.0693   Epoch: 3   Global Step: 55890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:53,414-Speed 9655.32 samples/sec   Loss 8.1541   LearningRate 0.0693   Epoch: 3   Global Step: 55900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:54,463-Speed 9769.34 samples/sec   Loss 8.3049   LearningRate 0.0693   Epoch: 3   Global Step: 55910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:55,499-Speed 9895.61 samples/sec   Loss 8.1911   LearningRate 0.0693   Epoch: 3   Global Step: 55920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:56,581-Speed 9471.03 samples/sec   Loss 8.2246   LearningRate 0.0693   Epoch: 3   Global Step: 55930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:57,673-Speed 9382.79 samples/sec   Loss 8.2258   LearningRate 0.0693   Epoch: 3   Global Step: 55940   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:46:58,763-Speed 9396.17 samples/sec   Loss 8.2825   LearningRate 0.0693   Epoch: 3   Global Step: 55950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:46:59,844-Speed 9479.17 samples/sec   Loss 8.2862   LearningRate 0.0693   Epoch: 3   Global Step: 55960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:47:00,903-Speed 9671.68 samples/sec   Loss 8.2384   LearningRate 0.0693   Epoch: 3   Global Step: 55970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:47:01,952-Speed 9771.01 samples/sec   Loss 8.3176   LearningRate 0.0693   Epoch: 3   Global Step: 55980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:47:03,078-Speed 9101.04 samples/sec   Loss 8.2950   LearningRate 0.0693   Epoch: 3   Global Step: 55990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:47:04,162-Speed 9454.38 samples/sec   Loss 8.2364   LearningRate 0.0693   Epoch: 3   Global Step: 56000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:47:26,217-[lfw][56000]XNorm: 12.584653
Training: 2022-04-11 13:47:26,218-[lfw][56000]Accuracy-Flip: 0.99583+-0.00250
Training: 2022-04-11 13:47:26,218-[lfw][56000]Accuracy-Highest: 0.99583
Training: 2022-04-11 13:47:51,659-[cfp_fp][56000]XNorm: 10.578777
Training: 2022-04-11 13:47:51,660-[cfp_fp][56000]Accuracy-Flip: 0.95157+-0.00902
Training: 2022-04-11 13:47:51,660-[cfp_fp][56000]Accuracy-Highest: 0.95157
Training: 2022-04-11 13:48:13,659-[agedb_30][56000]XNorm: 12.140642
Training: 2022-04-11 13:48:13,659-[agedb_30][56000]Accuracy-Flip: 0.95767+-0.01146
Training: 2022-04-11 13:48:13,659-[agedb_30][56000]Accuracy-Highest: 0.95767
Training: 2022-04-11 13:48:14,722-Speed 145.13 samples/sec   Loss 8.2175   LearningRate 0.0693   Epoch: 3   Global Step: 56010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:15,856-Speed 9036.18 samples/sec   Loss 8.3754   LearningRate 0.0693   Epoch: 3   Global Step: 56020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:16,967-Speed 9222.54 samples/sec   Loss 8.2073   LearningRate 0.0692   Epoch: 3   Global Step: 56030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:18,027-Speed 9671.62 samples/sec   Loss 8.2540   LearningRate 0.0692   Epoch: 3   Global Step: 56040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:19,084-Speed 9693.01 samples/sec   Loss 8.2931   LearningRate 0.0692   Epoch: 3   Global Step: 56050   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:48:20,154-Speed 9570.17 samples/sec   Loss 8.1814   LearningRate 0.0692   Epoch: 3   Global Step: 56060   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:48:21,227-Speed 9548.65 samples/sec   Loss 8.2391   LearningRate 0.0692   Epoch: 3   Global Step: 56070   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:48:22,288-Speed 9666.62 samples/sec   Loss 8.1901   LearningRate 0.0692   Epoch: 3   Global Step: 56080   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:48:23,344-Speed 9703.54 samples/sec   Loss 8.3303   LearningRate 0.0692   Epoch: 3   Global Step: 56090   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:48:24,415-Speed 9562.66 samples/sec   Loss 8.3603   LearningRate 0.0692   Epoch: 3   Global Step: 56100   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:48:25,516-Speed 9309.70 samples/sec   Loss 8.3050   LearningRate 0.0692   Epoch: 3   Global Step: 56110   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:48:26,552-Speed 9890.35 samples/sec   Loss 8.2943   LearningRate 0.0692   Epoch: 3   Global Step: 56120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:27,631-Speed 9489.56 samples/sec   Loss 8.2912   LearningRate 0.0692   Epoch: 3   Global Step: 56130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:28,708-Speed 9516.76 samples/sec   Loss 8.2772   LearningRate 0.0692   Epoch: 3   Global Step: 56140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:29,780-Speed 9557.89 samples/sec   Loss 8.1796   LearningRate 0.0692   Epoch: 3   Global Step: 56150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:30,863-Speed 9455.02 samples/sec   Loss 8.2259   LearningRate 0.0692   Epoch: 3   Global Step: 56160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:31,917-Speed 9726.69 samples/sec   Loss 8.0919   LearningRate 0.0692   Epoch: 3   Global Step: 56170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:32,982-Speed 9615.60 samples/sec   Loss 8.1671   LearningRate 0.0692   Epoch: 3   Global Step: 56180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:34,047-Speed 9621.53 samples/sec   Loss 8.2387   LearningRate 0.0692   Epoch: 3   Global Step: 56190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:35,076-Speed 9963.15 samples/sec   Loss 8.2407   LearningRate 0.0692   Epoch: 3   Global Step: 56200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:36,128-Speed 9737.52 samples/sec   Loss 8.3123   LearningRate 0.0692   Epoch: 3   Global Step: 56210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:37,199-Speed 9568.44 samples/sec   Loss 8.3494   LearningRate 0.0692   Epoch: 3   Global Step: 56220   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:48:38,289-Speed 9402.57 samples/sec   Loss 8.2229   LearningRate 0.0691   Epoch: 3   Global Step: 56230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:39,349-Speed 9662.47 samples/sec   Loss 8.3374   LearningRate 0.0691   Epoch: 3   Global Step: 56240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:40,437-Speed 9415.76 samples/sec   Loss 8.1772   LearningRate 0.0691   Epoch: 3   Global Step: 56250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:41,526-Speed 9405.69 samples/sec   Loss 8.2230   LearningRate 0.0691   Epoch: 3   Global Step: 56260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:42,620-Speed 9370.35 samples/sec   Loss 8.1382   LearningRate 0.0691   Epoch: 3   Global Step: 56270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:43,707-Speed 9422.24 samples/sec   Loss 8.2392   LearningRate 0.0691   Epoch: 3   Global Step: 56280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:48:44,773-Speed 9613.34 samples/sec   Loss 8.2373   LearningRate 0.0691   Epoch: 3   Global Step: 56290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:48:45,856-Speed 9467.54 samples/sec   Loss 8.2678   LearningRate 0.0691   Epoch: 3   Global Step: 56300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:48:46,904-Speed 9779.22 samples/sec   Loss 8.2167   LearningRate 0.0691   Epoch: 3   Global Step: 56310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:48:47,966-Speed 9644.00 samples/sec   Loss 8.3252   LearningRate 0.0691   Epoch: 3   Global Step: 56320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:48:49,064-Speed 9331.13 samples/sec   Loss 8.2782   LearningRate 0.0691   Epoch: 3   Global Step: 56330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:48:50,131-Speed 9599.78 samples/sec   Loss 8.2287   LearningRate 0.0691   Epoch: 3   Global Step: 56340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:48:51,194-Speed 9638.74 samples/sec   Loss 8.2681   LearningRate 0.0691   Epoch: 3   Global Step: 56350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:48:52,298-Speed 9282.50 samples/sec   Loss 8.1541   LearningRate 0.0691   Epoch: 3   Global Step: 56360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:48:53,389-Speed 9395.06 samples/sec   Loss 8.2121   LearningRate 0.0691   Epoch: 3   Global Step: 56370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:48:54,441-Speed 9739.34 samples/sec   Loss 8.2134   LearningRate 0.0691   Epoch: 3   Global Step: 56380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:55,477-Speed 9887.49 samples/sec   Loss 8.3420   LearningRate 0.0691   Epoch: 3   Global Step: 56390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:56,517-Speed 9850.58 samples/sec   Loss 8.1431   LearningRate 0.0691   Epoch: 3   Global Step: 56400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:57,657-Speed 8992.75 samples/sec   Loss 8.3403   LearningRate 0.0691   Epoch: 3   Global Step: 56410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:58,730-Speed 9545.76 samples/sec   Loss 8.2620   LearningRate 0.0691   Epoch: 3   Global Step: 56420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:48:59,837-Speed 9256.22 samples/sec   Loss 8.3387   LearningRate 0.0690   Epoch: 3   Global Step: 56430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:00,895-Speed 9682.54 samples/sec   Loss 8.1751   LearningRate 0.0690   Epoch: 3   Global Step: 56440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:01,959-Speed 9631.07 samples/sec   Loss 8.2234   LearningRate 0.0690   Epoch: 3   Global Step: 56450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:03,004-Speed 9806.20 samples/sec   Loss 8.3616   LearningRate 0.0690   Epoch: 3   Global Step: 56460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:04,087-Speed 9458.42 samples/sec   Loss 8.2580   LearningRate 0.0690   Epoch: 3   Global Step: 56470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:05,130-Speed 9827.60 samples/sec   Loss 8.1728   LearningRate 0.0690   Epoch: 3   Global Step: 56480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:06,183-Speed 9730.00 samples/sec   Loss 8.3297   LearningRate 0.0690   Epoch: 3   Global Step: 56490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:07,267-Speed 9456.45 samples/sec   Loss 8.1411   LearningRate 0.0690   Epoch: 3   Global Step: 56500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:08,398-Speed 9053.72 samples/sec   Loss 8.1026   LearningRate 0.0690   Epoch: 3   Global Step: 56510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:09,488-Speed 9399.91 samples/sec   Loss 8.2663   LearningRate 0.0690   Epoch: 3   Global Step: 56520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:10,570-Speed 9470.15 samples/sec   Loss 8.2314   LearningRate 0.0690   Epoch: 3   Global Step: 56530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:11,688-Speed 9162.75 samples/sec   Loss 8.2626   LearningRate 0.0690   Epoch: 3   Global Step: 56540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:12,776-Speed 9416.88 samples/sec   Loss 8.3499   LearningRate 0.0690   Epoch: 3   Global Step: 56550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:13,883-Speed 9256.44 samples/sec   Loss 8.2857   LearningRate 0.0690   Epoch: 3   Global Step: 56560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:14,957-Speed 9540.78 samples/sec   Loss 8.2273   LearningRate 0.0690   Epoch: 3   Global Step: 56570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:16,067-Speed 9238.23 samples/sec   Loss 8.2749   LearningRate 0.0690   Epoch: 3   Global Step: 56580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:17,151-Speed 9451.74 samples/sec   Loss 8.2050   LearningRate 0.0690   Epoch: 3   Global Step: 56590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:18,215-Speed 9632.76 samples/sec   Loss 8.2375   LearningRate 0.0690   Epoch: 3   Global Step: 56600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:19,322-Speed 9254.80 samples/sec   Loss 8.1025   LearningRate 0.0690   Epoch: 3   Global Step: 56610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:20,431-Speed 9240.96 samples/sec   Loss 8.1848   LearningRate 0.0690   Epoch: 3   Global Step: 56620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:21,527-Speed 9345.29 samples/sec   Loss 8.2327   LearningRate 0.0689   Epoch: 3   Global Step: 56630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:22,637-Speed 9239.36 samples/sec   Loss 8.2814   LearningRate 0.0689   Epoch: 3   Global Step: 56640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:23,696-Speed 9671.46 samples/sec   Loss 8.4072   LearningRate 0.0689   Epoch: 3   Global Step: 56650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:24,733-Speed 9888.82 samples/sec   Loss 8.2530   LearningRate 0.0689   Epoch: 3   Global Step: 56660   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:49:25,775-Speed 9827.25 samples/sec   Loss 8.1619   LearningRate 0.0689   Epoch: 3   Global Step: 56670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:26,843-Speed 9598.48 samples/sec   Loss 8.1726   LearningRate 0.0689   Epoch: 3   Global Step: 56680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:27,895-Speed 9737.98 samples/sec   Loss 8.3184   LearningRate 0.0689   Epoch: 3   Global Step: 56690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:28,975-Speed 9480.03 samples/sec   Loss 8.2833   LearningRate 0.0689   Epoch: 3   Global Step: 56700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:30,070-Speed 9362.63 samples/sec   Loss 8.2683   LearningRate 0.0689   Epoch: 3   Global Step: 56710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:31,176-Speed 9258.92 samples/sec   Loss 8.3372   LearningRate 0.0689   Epoch: 3   Global Step: 56720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:32,277-Speed 9310.49 samples/sec   Loss 8.1456   LearningRate 0.0689   Epoch: 3   Global Step: 56730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:33,400-Speed 9119.14 samples/sec   Loss 8.1361   LearningRate 0.0689   Epoch: 3   Global Step: 56740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:34,541-Speed 8981.71 samples/sec   Loss 8.1742   LearningRate 0.0689   Epoch: 3   Global Step: 56750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:35,614-Speed 9552.64 samples/sec   Loss 8.3604   LearningRate 0.0689   Epoch: 3   Global Step: 56760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:36,661-Speed 9789.07 samples/sec   Loss 8.2959   LearningRate 0.0689   Epoch: 3   Global Step: 56770   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:49:37,752-Speed 9391.50 samples/sec   Loss 8.2062   LearningRate 0.0689   Epoch: 3   Global Step: 56780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:38,812-Speed 9661.11 samples/sec   Loss 8.2344   LearningRate 0.0689   Epoch: 3   Global Step: 56790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:39,887-Speed 9535.13 samples/sec   Loss 8.3294   LearningRate 0.0689   Epoch: 3   Global Step: 56800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:40,977-Speed 9403.72 samples/sec   Loss 8.2467   LearningRate 0.0689   Epoch: 3   Global Step: 56810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:42,038-Speed 9650.44 samples/sec   Loss 8.1932   LearningRate 0.0689   Epoch: 3   Global Step: 56820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:43,144-Speed 9263.35 samples/sec   Loss 8.3676   LearningRate 0.0688   Epoch: 3   Global Step: 56830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:44,244-Speed 9315.34 samples/sec   Loss 8.1714   LearningRate 0.0688   Epoch: 3   Global Step: 56840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:45,319-Speed 9539.85 samples/sec   Loss 8.2324   LearningRate 0.0688   Epoch: 3   Global Step: 56850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:46,406-Speed 9425.47 samples/sec   Loss 8.3718   LearningRate 0.0688   Epoch: 3   Global Step: 56860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:47,501-Speed 9360.63 samples/sec   Loss 8.3005   LearningRate 0.0688   Epoch: 3   Global Step: 56870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:48,551-Speed 9755.79 samples/sec   Loss 8.1982   LearningRate 0.0688   Epoch: 3   Global Step: 56880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:49,623-Speed 9558.68 samples/sec   Loss 8.1985   LearningRate 0.0688   Epoch: 3   Global Step: 56890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:50,689-Speed 9606.28 samples/sec   Loss 8.3011   LearningRate 0.0688   Epoch: 3   Global Step: 56900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:51,754-Speed 9628.01 samples/sec   Loss 8.1984   LearningRate 0.0688   Epoch: 3   Global Step: 56910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:52,787-Speed 9916.72 samples/sec   Loss 8.3061   LearningRate 0.0688   Epoch: 3   Global Step: 56920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:53,897-Speed 9227.65 samples/sec   Loss 8.2753   LearningRate 0.0688   Epoch: 3   Global Step: 56930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:55,007-Speed 9238.65 samples/sec   Loss 8.1945   LearningRate 0.0688   Epoch: 3   Global Step: 56940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:56,033-Speed 9984.03 samples/sec   Loss 8.2925   LearningRate 0.0688   Epoch: 3   Global Step: 56950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:57,166-Speed 9037.73 samples/sec   Loss 8.3263   LearningRate 0.0688   Epoch: 3   Global Step: 56960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:49:58,252-Speed 9438.41 samples/sec   Loss 8.2449   LearningRate 0.0688   Epoch: 3   Global Step: 56970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:49:59,348-Speed 9351.96 samples/sec   Loss 8.1806   LearningRate 0.0688   Epoch: 3   Global Step: 56980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:50:00,431-Speed 9462.00 samples/sec   Loss 8.2993   LearningRate 0.0688   Epoch: 3   Global Step: 56990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:50:01,477-Speed 9789.50 samples/sec   Loss 8.2252   LearningRate 0.0688   Epoch: 3   Global Step: 57000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:50:02,547-Speed 9573.82 samples/sec   Loss 8.2488   LearningRate 0.0688   Epoch: 3   Global Step: 57010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:50:03,676-Speed 9078.53 samples/sec   Loss 8.2208   LearningRate 0.0688   Epoch: 3   Global Step: 57020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:50:04,736-Speed 9671.25 samples/sec   Loss 8.2910   LearningRate 0.0688   Epoch: 3   Global Step: 57030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:50:05,785-Speed 9771.96 samples/sec   Loss 8.2857   LearningRate 0.0687   Epoch: 3   Global Step: 57040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:50:06,874-Speed 9400.18 samples/sec   Loss 8.2635   LearningRate 0.0687   Epoch: 3   Global Step: 57050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:50:07,953-Speed 9501.53 samples/sec   Loss 8.1876   LearningRate 0.0687   Epoch: 3   Global Step: 57060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:50:09,015-Speed 9647.41 samples/sec   Loss 8.3294   LearningRate 0.0687   Epoch: 3   Global Step: 57070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:10,108-Speed 9374.41 samples/sec   Loss 8.2449   LearningRate 0.0687   Epoch: 3   Global Step: 57080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:11,256-Speed 8924.46 samples/sec   Loss 8.2571   LearningRate 0.0687   Epoch: 3   Global Step: 57090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:12,353-Speed 9340.21 samples/sec   Loss 8.2616   LearningRate 0.0687   Epoch: 3   Global Step: 57100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:13,457-Speed 9274.47 samples/sec   Loss 8.2397   LearningRate 0.0687   Epoch: 3   Global Step: 57110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:14,559-Speed 9295.78 samples/sec   Loss 8.3236   LearningRate 0.0687   Epoch: 3   Global Step: 57120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:15,625-Speed 9618.46 samples/sec   Loss 8.2456   LearningRate 0.0687   Epoch: 3   Global Step: 57130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:16,670-Speed 9799.45 samples/sec   Loss 8.2730   LearningRate 0.0687   Epoch: 3   Global Step: 57140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:17,799-Speed 9082.65 samples/sec   Loss 8.2903   LearningRate 0.0687   Epoch: 3   Global Step: 57150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:18,908-Speed 9240.52 samples/sec   Loss 8.2137   LearningRate 0.0687   Epoch: 3   Global Step: 57160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:20,001-Speed 9369.06 samples/sec   Loss 8.2945   LearningRate 0.0687   Epoch: 3   Global Step: 57170   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:50:21,128-Speed 9094.87 samples/sec   Loss 8.2309   LearningRate 0.0687   Epoch: 3   Global Step: 57180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:22,264-Speed 9023.27 samples/sec   Loss 8.3334   LearningRate 0.0687   Epoch: 3   Global Step: 57190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:23,328-Speed 9629.40 samples/sec   Loss 8.1894   LearningRate 0.0687   Epoch: 3   Global Step: 57200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:24,385-Speed 9688.58 samples/sec   Loss 8.2600   LearningRate 0.0687   Epoch: 3   Global Step: 57210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:25,457-Speed 9562.93 samples/sec   Loss 8.2992   LearningRate 0.0687   Epoch: 3   Global Step: 57220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:26,539-Speed 9469.30 samples/sec   Loss 8.2048   LearningRate 0.0687   Epoch: 3   Global Step: 57230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:27,603-Speed 9623.42 samples/sec   Loss 8.1825   LearningRate 0.0686   Epoch: 3   Global Step: 57240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:28,673-Speed 9576.60 samples/sec   Loss 8.1107   LearningRate 0.0686   Epoch: 3   Global Step: 57250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:29,748-Speed 9532.42 samples/sec   Loss 8.1410   LearningRate 0.0686   Epoch: 3   Global Step: 57260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:30,825-Speed 9515.91 samples/sec   Loss 8.1710   LearningRate 0.0686   Epoch: 3   Global Step: 57270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:31,894-Speed 9582.97 samples/sec   Loss 8.2946   LearningRate 0.0686   Epoch: 3   Global Step: 57280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:32,975-Speed 9475.14 samples/sec   Loss 8.2709   LearningRate 0.0686   Epoch: 3   Global Step: 57290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:34,043-Speed 9600.47 samples/sec   Loss 8.1833   LearningRate 0.0686   Epoch: 3   Global Step: 57300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:35,082-Speed 9860.05 samples/sec   Loss 8.2734   LearningRate 0.0686   Epoch: 3   Global Step: 57310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:36,186-Speed 9283.01 samples/sec   Loss 8.3258   LearningRate 0.0686   Epoch: 3   Global Step: 57320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:37,240-Speed 9722.67 samples/sec   Loss 8.2243   LearningRate 0.0686   Epoch: 3   Global Step: 57330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:38,320-Speed 9486.42 samples/sec   Loss 8.3106   LearningRate 0.0686   Epoch: 3   Global Step: 57340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:39,377-Speed 9697.09 samples/sec   Loss 8.3143   LearningRate 0.0686   Epoch: 3   Global Step: 57350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:40,469-Speed 9376.11 samples/sec   Loss 8.1914   LearningRate 0.0686   Epoch: 3   Global Step: 57360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:41,566-Speed 9346.62 samples/sec   Loss 8.2268   LearningRate 0.0686   Epoch: 3   Global Step: 57370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:42,613-Speed 9783.75 samples/sec   Loss 8.1525   LearningRate 0.0686   Epoch: 3   Global Step: 57380   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:50:43,692-Speed 9494.64 samples/sec   Loss 8.1967   LearningRate 0.0686   Epoch: 3   Global Step: 57390   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:50:44,732-Speed 9849.15 samples/sec   Loss 8.1981   LearningRate 0.0686   Epoch: 3   Global Step: 57400   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:50:45,777-Speed 9811.61 samples/sec   Loss 8.2706   LearningRate 0.0686   Epoch: 3   Global Step: 57410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:46,820-Speed 9826.60 samples/sec   Loss 8.1864   LearningRate 0.0686   Epoch: 3   Global Step: 57420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:47,901-Speed 9472.84 samples/sec   Loss 8.1682   LearningRate 0.0686   Epoch: 3   Global Step: 57430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:48,983-Speed 9470.02 samples/sec   Loss 8.3930   LearningRate 0.0685   Epoch: 3   Global Step: 57440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:50,068-Speed 9441.60 samples/sec   Loss 8.0900   LearningRate 0.0685   Epoch: 3   Global Step: 57450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:51,168-Speed 9315.68 samples/sec   Loss 8.3599   LearningRate 0.0685   Epoch: 3   Global Step: 57460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:52,246-Speed 9506.64 samples/sec   Loss 8.1893   LearningRate 0.0685   Epoch: 3   Global Step: 57470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:53,308-Speed 9651.66 samples/sec   Loss 8.2281   LearningRate 0.0685   Epoch: 3   Global Step: 57480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:54,407-Speed 9321.41 samples/sec   Loss 8.2125   LearningRate 0.0685   Epoch: 3   Global Step: 57490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:55,461-Speed 9727.89 samples/sec   Loss 8.1828   LearningRate 0.0685   Epoch: 3   Global Step: 57500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:50:56,557-Speed 9348.90 samples/sec   Loss 8.1656   LearningRate 0.0685   Epoch: 3   Global Step: 57510   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:50:57,634-Speed 9508.44 samples/sec   Loss 8.2597   LearningRate 0.0685   Epoch: 3   Global Step: 57520   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:50:58,712-Speed 9504.47 samples/sec   Loss 8.1175   LearningRate 0.0685   Epoch: 3   Global Step: 57530   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:50:59,751-Speed 9860.36 samples/sec   Loss 8.2035   LearningRate 0.0685   Epoch: 3   Global Step: 57540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:00,834-Speed 9459.57 samples/sec   Loss 8.1739   LearningRate 0.0685   Epoch: 3   Global Step: 57550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:01,912-Speed 9505.14 samples/sec   Loss 8.2602   LearningRate 0.0685   Epoch: 3   Global Step: 57560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:03,004-Speed 9382.03 samples/sec   Loss 8.2983   LearningRate 0.0685   Epoch: 3   Global Step: 57570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:04,092-Speed 9419.29 samples/sec   Loss 8.2384   LearningRate 0.0685   Epoch: 3   Global Step: 57580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:05,159-Speed 9606.39 samples/sec   Loss 8.2910   LearningRate 0.0685   Epoch: 3   Global Step: 57590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:06,242-Speed 9461.82 samples/sec   Loss 8.3047   LearningRate 0.0685   Epoch: 3   Global Step: 57600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:07,330-Speed 9410.03 samples/sec   Loss 8.1520   LearningRate 0.0685   Epoch: 3   Global Step: 57610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:08,421-Speed 9393.94 samples/sec   Loss 8.1481   LearningRate 0.0685   Epoch: 3   Global Step: 57620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:09,509-Speed 9418.23 samples/sec   Loss 8.2881   LearningRate 0.0685   Epoch: 3   Global Step: 57630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:10,580-Speed 9571.30 samples/sec   Loss 8.2558   LearningRate 0.0684   Epoch: 3   Global Step: 57640   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:51:11,645-Speed 9618.10 samples/sec   Loss 8.2118   LearningRate 0.0684   Epoch: 3   Global Step: 57650   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:51:12,692-Speed 9783.79 samples/sec   Loss 8.1958   LearningRate 0.0684   Epoch: 3   Global Step: 57660   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:51:13,755-Speed 9637.02 samples/sec   Loss 8.2653   LearningRate 0.0684   Epoch: 3   Global Step: 57670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:14,818-Speed 9643.61 samples/sec   Loss 8.1787   LearningRate 0.0684   Epoch: 3   Global Step: 57680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:15,901-Speed 9466.75 samples/sec   Loss 8.2074   LearningRate 0.0684   Epoch: 3   Global Step: 57690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:16,955-Speed 9716.06 samples/sec   Loss 8.2620   LearningRate 0.0684   Epoch: 3   Global Step: 57700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:18,009-Speed 9723.25 samples/sec   Loss 8.2156   LearningRate 0.0684   Epoch: 3   Global Step: 57710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:19,067-Speed 9686.64 samples/sec   Loss 8.1959   LearningRate 0.0684   Epoch: 3   Global Step: 57720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:20,120-Speed 9723.27 samples/sec   Loss 8.4180   LearningRate 0.0684   Epoch: 3   Global Step: 57730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:21,153-Speed 9921.38 samples/sec   Loss 8.3363   LearningRate 0.0684   Epoch: 3   Global Step: 57740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:22,197-Speed 9820.91 samples/sec   Loss 8.2204   LearningRate 0.0684   Epoch: 3   Global Step: 57750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:23,259-Speed 9646.21 samples/sec   Loss 8.2027   LearningRate 0.0684   Epoch: 3   Global Step: 57760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:24,364-Speed 9273.07 samples/sec   Loss 8.2323   LearningRate 0.0684   Epoch: 3   Global Step: 57770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:25,412-Speed 9779.46 samples/sec   Loss 8.2760   LearningRate 0.0684   Epoch: 3   Global Step: 57780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:26,444-Speed 9928.83 samples/sec   Loss 8.2825   LearningRate 0.0684   Epoch: 3   Global Step: 57790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:27,528-Speed 9452.03 samples/sec   Loss 8.1519   LearningRate 0.0684   Epoch: 3   Global Step: 57800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:28,577-Speed 9760.30 samples/sec   Loss 8.1819   LearningRate 0.0684   Epoch: 3   Global Step: 57810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:29,651-Speed 9545.94 samples/sec   Loss 8.1758   LearningRate 0.0684   Epoch: 3   Global Step: 57820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:30,744-Speed 9367.96 samples/sec   Loss 8.2411   LearningRate 0.0684   Epoch: 3   Global Step: 57830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:31,802-Speed 9687.51 samples/sec   Loss 8.3019   LearningRate 0.0683   Epoch: 3   Global Step: 57840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:32,874-Speed 9558.14 samples/sec   Loss 8.2496   LearningRate 0.0683   Epoch: 3   Global Step: 57850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:33,927-Speed 9727.26 samples/sec   Loss 8.1924   LearningRate 0.0683   Epoch: 3   Global Step: 57860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:34,994-Speed 9609.84 samples/sec   Loss 8.2486   LearningRate 0.0683   Epoch: 3   Global Step: 57870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:36,082-Speed 9420.99 samples/sec   Loss 8.1880   LearningRate 0.0683   Epoch: 3   Global Step: 57880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:37,153-Speed 9560.84 samples/sec   Loss 8.2408   LearningRate 0.0683   Epoch: 3   Global Step: 57890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:38,240-Speed 9433.11 samples/sec   Loss 8.2585   LearningRate 0.0683   Epoch: 3   Global Step: 57900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:39,315-Speed 9527.43 samples/sec   Loss 8.2087   LearningRate 0.0683   Epoch: 3   Global Step: 57910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:51:40,405-Speed 9397.29 samples/sec   Loss 8.1691   LearningRate 0.0683   Epoch: 3   Global Step: 57920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:41,480-Speed 9533.96 samples/sec   Loss 8.2250   LearningRate 0.0683   Epoch: 3   Global Step: 57930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:42,526-Speed 9791.80 samples/sec   Loss 8.2069   LearningRate 0.0683   Epoch: 3   Global Step: 57940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:43,639-Speed 9203.16 samples/sec   Loss 8.1621   LearningRate 0.0683   Epoch: 3   Global Step: 57950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:44,695-Speed 9707.89 samples/sec   Loss 8.1693   LearningRate 0.0683   Epoch: 3   Global Step: 57960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:45,756-Speed 9653.64 samples/sec   Loss 8.1566   LearningRate 0.0683   Epoch: 3   Global Step: 57970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:46,830-Speed 9544.88 samples/sec   Loss 8.2533   LearningRate 0.0683   Epoch: 3   Global Step: 57980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:47,932-Speed 9293.37 samples/sec   Loss 8.1696   LearningRate 0.0683   Epoch: 3   Global Step: 57990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:51:49,004-Speed 9556.06 samples/sec   Loss 8.1338   LearningRate 0.0683   Epoch: 3   Global Step: 58000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:52:11,094-[lfw][58000]XNorm: 12.571057
Training: 2022-04-11 13:52:11,095-[lfw][58000]Accuracy-Flip: 0.99417+-0.00344
Training: 2022-04-11 13:52:11,095-[lfw][58000]Accuracy-Highest: 0.99583
Training: 2022-04-11 13:52:36,691-[cfp_fp][58000]XNorm: 10.607200
Training: 2022-04-11 13:52:36,692-[cfp_fp][58000]Accuracy-Flip: 0.94529+-0.00963
Training: 2022-04-11 13:52:36,692-[cfp_fp][58000]Accuracy-Highest: 0.95157
Training: 2022-04-11 13:52:58,682-[agedb_30][58000]XNorm: 12.151300
Training: 2022-04-11 13:52:58,682-[agedb_30][58000]Accuracy-Flip: 0.95550+-0.00860
Training: 2022-04-11 13:52:58,683-[agedb_30][58000]Accuracy-Highest: 0.95767
Training: 2022-04-11 13:52:59,750-Speed 144.75 samples/sec   Loss 8.2142   LearningRate 0.0683   Epoch: 3   Global Step: 58010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:00,812-Speed 9650.21 samples/sec   Loss 8.2860   LearningRate 0.0683   Epoch: 3   Global Step: 58020   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:53:01,930-Speed 9158.60 samples/sec   Loss 8.1844   LearningRate 0.0683   Epoch: 3   Global Step: 58030   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:53:03,004-Speed 9538.73 samples/sec   Loss 8.2519   LearningRate 0.0682   Epoch: 3   Global Step: 58040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:04,072-Speed 9596.62 samples/sec   Loss 8.1966   LearningRate 0.0682   Epoch: 3   Global Step: 58050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:05,141-Speed 9584.67 samples/sec   Loss 8.2792   LearningRate 0.0682   Epoch: 3   Global Step: 58060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:06,193-Speed 9740.97 samples/sec   Loss 8.3177   LearningRate 0.0682   Epoch: 3   Global Step: 58070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:07,257-Speed 9629.85 samples/sec   Loss 8.3574   LearningRate 0.0682   Epoch: 3   Global Step: 58080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:08,292-Speed 9900.13 samples/sec   Loss 8.2578   LearningRate 0.0682   Epoch: 3   Global Step: 58090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:09,377-Speed 9440.31 samples/sec   Loss 8.3774   LearningRate 0.0682   Epoch: 3   Global Step: 58100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:10,450-Speed 9548.23 samples/sec   Loss 8.3156   LearningRate 0.0682   Epoch: 3   Global Step: 58110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:11,533-Speed 9466.35 samples/sec   Loss 8.2913   LearningRate 0.0682   Epoch: 3   Global Step: 58120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:12,636-Speed 9288.11 samples/sec   Loss 8.2234   LearningRate 0.0682   Epoch: 3   Global Step: 58130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:13,752-Speed 9179.40 samples/sec   Loss 8.2083   LearningRate 0.0682   Epoch: 3   Global Step: 58140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:14,843-Speed 9391.65 samples/sec   Loss 8.1485   LearningRate 0.0682   Epoch: 3   Global Step: 58150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:15,943-Speed 9318.59 samples/sec   Loss 8.3310   LearningRate 0.0682   Epoch: 3   Global Step: 58160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:17,044-Speed 9303.84 samples/sec   Loss 8.1786   LearningRate 0.0682   Epoch: 3   Global Step: 58170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:18,132-Speed 9418.03 samples/sec   Loss 8.1466   LearningRate 0.0682   Epoch: 3   Global Step: 58180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:19,176-Speed 9806.20 samples/sec   Loss 8.2116   LearningRate 0.0682   Epoch: 3   Global Step: 58190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:20,256-Speed 9492.74 samples/sec   Loss 8.3131   LearningRate 0.0682   Epoch: 3   Global Step: 58200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:21,356-Speed 9316.08 samples/sec   Loss 8.1196   LearningRate 0.0682   Epoch: 3   Global Step: 58210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:22,449-Speed 9373.26 samples/sec   Loss 8.2109   LearningRate 0.0682   Epoch: 3   Global Step: 58220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:23,534-Speed 9442.76 samples/sec   Loss 8.2694   LearningRate 0.0682   Epoch: 3   Global Step: 58230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:24,563-Speed 9955.39 samples/sec   Loss 8.2508   LearningRate 0.0682   Epoch: 3   Global Step: 58240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:25,624-Speed 9658.99 samples/sec   Loss 8.1522   LearningRate 0.0681   Epoch: 3   Global Step: 58250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:26,700-Speed 9525.60 samples/sec   Loss 8.1460   LearningRate 0.0681   Epoch: 3   Global Step: 58260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:27,761-Speed 9669.90 samples/sec   Loss 8.1902   LearningRate 0.0681   Epoch: 3   Global Step: 58270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:28,821-Speed 9668.09 samples/sec   Loss 8.1854   LearningRate 0.0681   Epoch: 3   Global Step: 58280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:53:29,862-Speed 9837.42 samples/sec   Loss 8.1874   LearningRate 0.0681   Epoch: 3   Global Step: 58290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:30,971-Speed 9240.35 samples/sec   Loss 8.2396   LearningRate 0.0681   Epoch: 3   Global Step: 58300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:32,061-Speed 9397.72 samples/sec   Loss 8.3257   LearningRate 0.0681   Epoch: 3   Global Step: 58310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:33,109-Speed 9780.51 samples/sec   Loss 8.2743   LearningRate 0.0681   Epoch: 3   Global Step: 58320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:34,148-Speed 9856.46 samples/sec   Loss 8.1804   LearningRate 0.0681   Epoch: 3   Global Step: 58330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:35,194-Speed 9798.08 samples/sec   Loss 8.3136   LearningRate 0.0681   Epoch: 3   Global Step: 58340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:36,245-Speed 9747.80 samples/sec   Loss 8.2070   LearningRate 0.0681   Epoch: 3   Global Step: 58350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:37,315-Speed 9576.12 samples/sec   Loss 8.1999   LearningRate 0.0681   Epoch: 3   Global Step: 58360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:38,407-Speed 9385.25 samples/sec   Loss 8.1329   LearningRate 0.0681   Epoch: 3   Global Step: 58370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:39,448-Speed 9841.47 samples/sec   Loss 8.2773   LearningRate 0.0681   Epoch: 3   Global Step: 58380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:40,481-Speed 9917.78 samples/sec   Loss 8.2536   LearningRate 0.0681   Epoch: 3   Global Step: 58390   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:53:41,595-Speed 9196.87 samples/sec   Loss 8.1078   LearningRate 0.0681   Epoch: 3   Global Step: 58400   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:53:42,665-Speed 9576.62 samples/sec   Loss 8.2312   LearningRate 0.0681   Epoch: 3   Global Step: 58410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:43,732-Speed 9609.93 samples/sec   Loss 8.2873   LearningRate 0.0681   Epoch: 3   Global Step: 58420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:44,820-Speed 9416.80 samples/sec   Loss 8.2808   LearningRate 0.0681   Epoch: 3   Global Step: 58430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:45,870-Speed 9755.04 samples/sec   Loss 8.2142   LearningRate 0.0681   Epoch: 3   Global Step: 58440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:46,988-Speed 9164.56 samples/sec   Loss 8.2592   LearningRate 0.0680   Epoch: 3   Global Step: 58450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:48,059-Speed 9567.50 samples/sec   Loss 8.1102   LearningRate 0.0680   Epoch: 3   Global Step: 58460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:49,143-Speed 9455.22 samples/sec   Loss 8.1939   LearningRate 0.0680   Epoch: 3   Global Step: 58470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:50,251-Speed 9244.73 samples/sec   Loss 8.2053   LearningRate 0.0680   Epoch: 3   Global Step: 58480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:51,318-Speed 9611.95 samples/sec   Loss 8.1196   LearningRate 0.0680   Epoch: 3   Global Step: 58490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:52,365-Speed 9780.29 samples/sec   Loss 8.2036   LearningRate 0.0680   Epoch: 3   Global Step: 58500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:53,410-Speed 9811.74 samples/sec   Loss 8.2012   LearningRate 0.0680   Epoch: 3   Global Step: 58510   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:53:54,484-Speed 9537.72 samples/sec   Loss 8.1310   LearningRate 0.0680   Epoch: 3   Global Step: 58520   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:53:55,543-Speed 9670.47 samples/sec   Loss 8.2756   LearningRate 0.0680   Epoch: 3   Global Step: 58530   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:53:56,606-Speed 9643.75 samples/sec   Loss 8.2380   LearningRate 0.0680   Epoch: 3   Global Step: 58540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:57,703-Speed 9346.03 samples/sec   Loss 8.2105   LearningRate 0.0680   Epoch: 3   Global Step: 58550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:58,803-Speed 9315.10 samples/sec   Loss 8.2918   LearningRate 0.0680   Epoch: 3   Global Step: 58560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:53:59,881-Speed 9502.46 samples/sec   Loss 8.2034   LearningRate 0.0680   Epoch: 3   Global Step: 58570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:00,937-Speed 9703.32 samples/sec   Loss 8.2036   LearningRate 0.0680   Epoch: 3   Global Step: 58580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:02,024-Speed 9417.47 samples/sec   Loss 8.2412   LearningRate 0.0680   Epoch: 3   Global Step: 58590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:03,047-Speed 10019.63 samples/sec   Loss 8.1909   LearningRate 0.0680   Epoch: 3   Global Step: 58600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:04,097-Speed 9761.10 samples/sec   Loss 8.2324   LearningRate 0.0680   Epoch: 3   Global Step: 58610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:05,217-Speed 9155.15 samples/sec   Loss 8.1736   LearningRate 0.0680   Epoch: 3   Global Step: 58620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:06,290-Speed 9547.82 samples/sec   Loss 8.0931   LearningRate 0.0680   Epoch: 3   Global Step: 58630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:07,333-Speed 9822.17 samples/sec   Loss 8.1754   LearningRate 0.0680   Epoch: 3   Global Step: 58640   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:54:08,385-Speed 9742.47 samples/sec   Loss 8.3889   LearningRate 0.0679   Epoch: 3   Global Step: 58650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:09,418-Speed 9914.85 samples/sec   Loss 8.2307   LearningRate 0.0679   Epoch: 3   Global Step: 58660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:10,489-Speed 9572.90 samples/sec   Loss 8.1945   LearningRate 0.0679   Epoch: 3   Global Step: 58670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:11,565-Speed 9526.42 samples/sec   Loss 8.3482   LearningRate 0.0679   Epoch: 3   Global Step: 58680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:12,638-Speed 9547.06 samples/sec   Loss 8.2956   LearningRate 0.0679   Epoch: 3   Global Step: 58690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:13,686-Speed 9773.24 samples/sec   Loss 8.2432   LearningRate 0.0679   Epoch: 3   Global Step: 58700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:14,791-Speed 9270.71 samples/sec   Loss 8.2989   LearningRate 0.0679   Epoch: 3   Global Step: 58710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:15,862-Speed 9571.02 samples/sec   Loss 8.2022   LearningRate 0.0679   Epoch: 3   Global Step: 58720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:16,905-Speed 9819.22 samples/sec   Loss 8.1376   LearningRate 0.0679   Epoch: 3   Global Step: 58730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:18,069-Speed 8806.91 samples/sec   Loss 8.1356   LearningRate 0.0679   Epoch: 3   Global Step: 58740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:19,137-Speed 9589.90 samples/sec   Loss 8.2520   LearningRate 0.0679   Epoch: 3   Global Step: 58750   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:54:20,167-Speed 9946.17 samples/sec   Loss 8.1845   LearningRate 0.0679   Epoch: 3   Global Step: 58760   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:54:21,242-Speed 9536.69 samples/sec   Loss 8.3549   LearningRate 0.0679   Epoch: 3   Global Step: 58770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:22,330-Speed 9419.43 samples/sec   Loss 8.2401   LearningRate 0.0679   Epoch: 3   Global Step: 58780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:23,428-Speed 9332.05 samples/sec   Loss 8.1098   LearningRate 0.0679   Epoch: 3   Global Step: 58790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:24,528-Speed 9316.41 samples/sec   Loss 8.2585   LearningRate 0.0679   Epoch: 3   Global Step: 58800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:25,637-Speed 9237.09 samples/sec   Loss 8.2660   LearningRate 0.0679   Epoch: 3   Global Step: 58810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:26,675-Speed 9872.97 samples/sec   Loss 8.2171   LearningRate 0.0679   Epoch: 3   Global Step: 58820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:27,773-Speed 9330.70 samples/sec   Loss 8.1113   LearningRate 0.0679   Epoch: 3   Global Step: 58830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:28,846-Speed 9546.30 samples/sec   Loss 8.0477   LearningRate 0.0679   Epoch: 3   Global Step: 58840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:29,903-Speed 9700.49 samples/sec   Loss 8.1923   LearningRate 0.0678   Epoch: 3   Global Step: 58850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:30,953-Speed 9751.67 samples/sec   Loss 8.2329   LearningRate 0.0678   Epoch: 3   Global Step: 58860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:32,030-Speed 9514.01 samples/sec   Loss 8.1654   LearningRate 0.0678   Epoch: 3   Global Step: 58870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:33,117-Speed 9428.89 samples/sec   Loss 8.1751   LearningRate 0.0678   Epoch: 3   Global Step: 58880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:34,180-Speed 9640.47 samples/sec   Loss 8.2501   LearningRate 0.0678   Epoch: 3   Global Step: 58890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:35,243-Speed 9633.30 samples/sec   Loss 8.0985   LearningRate 0.0678   Epoch: 3   Global Step: 58900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:36,290-Speed 9791.79 samples/sec   Loss 8.1078   LearningRate 0.0678   Epoch: 3   Global Step: 58910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:37,356-Speed 9604.70 samples/sec   Loss 8.3030   LearningRate 0.0678   Epoch: 3   Global Step: 58920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:38,476-Speed 9151.38 samples/sec   Loss 8.2042   LearningRate 0.0678   Epoch: 3   Global Step: 58930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:39,594-Speed 9160.57 samples/sec   Loss 8.2453   LearningRate 0.0678   Epoch: 3   Global Step: 58940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:40,654-Speed 9674.83 samples/sec   Loss 8.2320   LearningRate 0.0678   Epoch: 3   Global Step: 58950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:41,716-Speed 9644.71 samples/sec   Loss 8.1395   LearningRate 0.0678   Epoch: 3   Global Step: 58960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:42,757-Speed 9841.65 samples/sec   Loss 8.1925   LearningRate 0.0678   Epoch: 3   Global Step: 58970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:43,806-Speed 9768.06 samples/sec   Loss 8.2020   LearningRate 0.0678   Epoch: 3   Global Step: 58980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:44,856-Speed 9759.97 samples/sec   Loss 8.2834   LearningRate 0.0678   Epoch: 3   Global Step: 58990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:45,930-Speed 9544.54 samples/sec   Loss 8.2290   LearningRate 0.0678   Epoch: 3   Global Step: 59000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:47,055-Speed 9109.17 samples/sec   Loss 8.2753   LearningRate 0.0678   Epoch: 3   Global Step: 59010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:48,134-Speed 9498.31 samples/sec   Loss 8.2023   LearningRate 0.0678   Epoch: 3   Global Step: 59020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:49,224-Speed 9396.53 samples/sec   Loss 8.1716   LearningRate 0.0678   Epoch: 3   Global Step: 59030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:50,311-Speed 9427.64 samples/sec   Loss 8.3371   LearningRate 0.0678   Epoch: 3   Global Step: 59040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:51,384-Speed 9545.73 samples/sec   Loss 8.1204   LearningRate 0.0678   Epoch: 3   Global Step: 59050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:52,472-Speed 9417.81 samples/sec   Loss 8.2890   LearningRate 0.0677   Epoch: 3   Global Step: 59060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:54:53,550-Speed 9504.73 samples/sec   Loss 8.2348   LearningRate 0.0677   Epoch: 3   Global Step: 59070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:54,616-Speed 9609.24 samples/sec   Loss 8.2282   LearningRate 0.0677   Epoch: 3   Global Step: 59080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:55,688-Speed 9557.45 samples/sec   Loss 8.3381   LearningRate 0.0677   Epoch: 3   Global Step: 59090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:56,810-Speed 9134.36 samples/sec   Loss 8.0720   LearningRate 0.0677   Epoch: 3   Global Step: 59100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:57,914-Speed 9281.71 samples/sec   Loss 8.1578   LearningRate 0.0677   Epoch: 3   Global Step: 59110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:54:59,013-Speed 9320.50 samples/sec   Loss 8.2260   LearningRate 0.0677   Epoch: 3   Global Step: 59120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:00,084-Speed 9569.43 samples/sec   Loss 8.3199   LearningRate 0.0677   Epoch: 3   Global Step: 59130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:01,195-Speed 9224.56 samples/sec   Loss 8.0697   LearningRate 0.0677   Epoch: 3   Global Step: 59140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:02,277-Speed 9468.78 samples/sec   Loss 8.1813   LearningRate 0.0677   Epoch: 3   Global Step: 59150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:03,384-Speed 9259.93 samples/sec   Loss 8.2070   LearningRate 0.0677   Epoch: 3   Global Step: 59160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:04,415-Speed 9933.83 samples/sec   Loss 8.1996   LearningRate 0.0677   Epoch: 3   Global Step: 59170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:05,457-Speed 9835.92 samples/sec   Loss 8.2304   LearningRate 0.0677   Epoch: 3   Global Step: 59180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:06,511-Speed 9722.56 samples/sec   Loss 8.2512   LearningRate 0.0677   Epoch: 3   Global Step: 59190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:07,558-Speed 9785.60 samples/sec   Loss 8.2318   LearningRate 0.0677   Epoch: 3   Global Step: 59200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:55:08,644-Speed 9434.07 samples/sec   Loss 8.2829   LearningRate 0.0677   Epoch: 3   Global Step: 59210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:55:09,693-Speed 9765.39 samples/sec   Loss 8.2146   LearningRate 0.0677   Epoch: 3   Global Step: 59220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:55:10,782-Speed 9408.64 samples/sec   Loss 8.2648   LearningRate 0.0677   Epoch: 3   Global Step: 59230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:55:11,882-Speed 9312.07 samples/sec   Loss 8.2018   LearningRate 0.0677   Epoch: 3   Global Step: 59240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:55:12,970-Speed 9418.52 samples/sec   Loss 8.2239   LearningRate 0.0677   Epoch: 3   Global Step: 59250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:55:14,061-Speed 9390.37 samples/sec   Loss 8.2146   LearningRate 0.0676   Epoch: 3   Global Step: 59260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:55:15,136-Speed 9531.31 samples/sec   Loss 8.3030   LearningRate 0.0676   Epoch: 3   Global Step: 59270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:55:16,183-Speed 9789.53 samples/sec   Loss 8.3157   LearningRate 0.0676   Epoch: 3   Global Step: 59280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:55:17,243-Speed 9666.89 samples/sec   Loss 8.2096   LearningRate 0.0676   Epoch: 3   Global Step: 59290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:55:18,288-Speed 9804.28 samples/sec   Loss 8.1299   LearningRate 0.0676   Epoch: 3   Global Step: 59300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:19,350-Speed 9655.42 samples/sec   Loss 8.2640   LearningRate 0.0676   Epoch: 3   Global Step: 59310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:20,446-Speed 9341.66 samples/sec   Loss 8.1626   LearningRate 0.0676   Epoch: 3   Global Step: 59320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:21,496-Speed 9763.28 samples/sec   Loss 8.2417   LearningRate 0.0676   Epoch: 3   Global Step: 59330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:22,577-Speed 9474.98 samples/sec   Loss 8.1994   LearningRate 0.0676   Epoch: 3   Global Step: 59340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:23,655-Speed 9505.28 samples/sec   Loss 8.1792   LearningRate 0.0676   Epoch: 3   Global Step: 59350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:24,719-Speed 9629.20 samples/sec   Loss 8.1219   LearningRate 0.0676   Epoch: 3   Global Step: 59360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:25,775-Speed 9712.87 samples/sec   Loss 8.1714   LearningRate 0.0676   Epoch: 3   Global Step: 59370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:26,850-Speed 9522.41 samples/sec   Loss 8.2284   LearningRate 0.0676   Epoch: 3   Global Step: 59380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:27,917-Speed 9608.13 samples/sec   Loss 8.2127   LearningRate 0.0676   Epoch: 3   Global Step: 59390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:28,949-Speed 9923.51 samples/sec   Loss 8.2608   LearningRate 0.0676   Epoch: 3   Global Step: 59400   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:55:29,986-Speed 9878.09 samples/sec   Loss 8.2129   LearningRate 0.0676   Epoch: 3   Global Step: 59410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:31,098-Speed 9214.47 samples/sec   Loss 8.1594   LearningRate 0.0676   Epoch: 3   Global Step: 59420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:32,178-Speed 9489.67 samples/sec   Loss 8.2327   LearningRate 0.0676   Epoch: 3   Global Step: 59430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:33,269-Speed 9393.74 samples/sec   Loss 8.3130   LearningRate 0.0676   Epoch: 3   Global Step: 59440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:34,397-Speed 9089.54 samples/sec   Loss 8.2503   LearningRate 0.0676   Epoch: 3   Global Step: 59450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:35,519-Speed 9129.08 samples/sec   Loss 8.0988   LearningRate 0.0675   Epoch: 3   Global Step: 59460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:36,630-Speed 9223.18 samples/sec   Loss 8.2605   LearningRate 0.0675   Epoch: 3   Global Step: 59470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:37,704-Speed 9540.60 samples/sec   Loss 8.0923   LearningRate 0.0675   Epoch: 3   Global Step: 59480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:38,832-Speed 9079.72 samples/sec   Loss 8.1701   LearningRate 0.0675   Epoch: 3   Global Step: 59490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:39,910-Speed 9510.43 samples/sec   Loss 8.3050   LearningRate 0.0675   Epoch: 3   Global Step: 59500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:40,996-Speed 9435.39 samples/sec   Loss 8.4186   LearningRate 0.0675   Epoch: 3   Global Step: 59510   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:55:42,054-Speed 9682.25 samples/sec   Loss 8.1598   LearningRate 0.0675   Epoch: 3   Global Step: 59520   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:55:43,127-Speed 9543.46 samples/sec   Loss 8.2410   LearningRate 0.0675   Epoch: 3   Global Step: 59530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:44,150-Speed 10024.30 samples/sec   Loss 8.1485   LearningRate 0.0675   Epoch: 3   Global Step: 59540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:45,232-Speed 9466.40 samples/sec   Loss 8.1954   LearningRate 0.0675   Epoch: 3   Global Step: 59550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:46,323-Speed 9391.79 samples/sec   Loss 8.1898   LearningRate 0.0675   Epoch: 3   Global Step: 59560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:47,381-Speed 9684.76 samples/sec   Loss 8.2012   LearningRate 0.0675   Epoch: 3   Global Step: 59570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:48,465-Speed 9450.72 samples/sec   Loss 8.2984   LearningRate 0.0675   Epoch: 3   Global Step: 59580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:49,556-Speed 9392.61 samples/sec   Loss 8.1415   LearningRate 0.0675   Epoch: 3   Global Step: 59590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:50,679-Speed 9122.60 samples/sec   Loss 8.2184   LearningRate 0.0675   Epoch: 3   Global Step: 59600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:51,758-Speed 9501.08 samples/sec   Loss 8.2612   LearningRate 0.0675   Epoch: 3   Global Step: 59610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:52,829-Speed 9563.46 samples/sec   Loss 8.0902   LearningRate 0.0675   Epoch: 3   Global Step: 59620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:53,947-Speed 9165.15 samples/sec   Loss 8.2069   LearningRate 0.0675   Epoch: 3   Global Step: 59630   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:55:55,046-Speed 9319.85 samples/sec   Loss 8.1649   LearningRate 0.0675   Epoch: 3   Global Step: 59640   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:55:56,139-Speed 9380.22 samples/sec   Loss 8.2938   LearningRate 0.0675   Epoch: 3   Global Step: 59650   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:55:57,259-Speed 9144.41 samples/sec   Loss 8.1878   LearningRate 0.0675   Epoch: 3   Global Step: 59660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:58,318-Speed 9680.99 samples/sec   Loss 8.0856   LearningRate 0.0674   Epoch: 3   Global Step: 59670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:55:59,402-Speed 9454.37 samples/sec   Loss 8.2644   LearningRate 0.0674   Epoch: 3   Global Step: 59680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:00,463-Speed 9653.46 samples/sec   Loss 8.3345   LearningRate 0.0674   Epoch: 3   Global Step: 59690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:01,510-Speed 9789.77 samples/sec   Loss 8.1195   LearningRate 0.0674   Epoch: 3   Global Step: 59700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:02,597-Speed 9429.33 samples/sec   Loss 8.1911   LearningRate 0.0674   Epoch: 3   Global Step: 59710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:03,623-Speed 9982.97 samples/sec   Loss 8.1607   LearningRate 0.0674   Epoch: 3   Global Step: 59720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:04,740-Speed 9177.76 samples/sec   Loss 8.0736   LearningRate 0.0674   Epoch: 3   Global Step: 59730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:05,843-Speed 9287.69 samples/sec   Loss 8.0594   LearningRate 0.0674   Epoch: 3   Global Step: 59740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:06,972-Speed 9069.91 samples/sec   Loss 8.2433   LearningRate 0.0674   Epoch: 3   Global Step: 59750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:08,048-Speed 9524.94 samples/sec   Loss 8.2186   LearningRate 0.0674   Epoch: 3   Global Step: 59760   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:56:09,161-Speed 9206.12 samples/sec   Loss 8.1919   LearningRate 0.0674   Epoch: 3   Global Step: 59770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:10,251-Speed 9397.33 samples/sec   Loss 8.1611   LearningRate 0.0674   Epoch: 3   Global Step: 59780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:11,357-Speed 9264.46 samples/sec   Loss 8.3718   LearningRate 0.0674   Epoch: 3   Global Step: 59790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:12,460-Speed 9292.52 samples/sec   Loss 8.2131   LearningRate 0.0674   Epoch: 3   Global Step: 59800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:56:13,565-Speed 9270.31 samples/sec   Loss 8.2430   LearningRate 0.0674   Epoch: 3   Global Step: 59810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:56:14,606-Speed 9841.78 samples/sec   Loss 8.2228   LearningRate 0.0674   Epoch: 3   Global Step: 59820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:56:15,683-Speed 9519.49 samples/sec   Loss 8.2522   LearningRate 0.0674   Epoch: 3   Global Step: 59830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:56:16,794-Speed 9219.73 samples/sec   Loss 8.1686   LearningRate 0.0674   Epoch: 3   Global Step: 59840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:56:17,871-Speed 9518.16 samples/sec   Loss 8.2845   LearningRate 0.0674   Epoch: 3   Global Step: 59850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:56:18,955-Speed 9451.99 samples/sec   Loss 8.2407   LearningRate 0.0674   Epoch: 3   Global Step: 59860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:56:20,023-Speed 9590.92 samples/sec   Loss 8.1989   LearningRate 0.0673   Epoch: 3   Global Step: 59870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:56:21,076-Speed 9731.42 samples/sec   Loss 8.1332   LearningRate 0.0673   Epoch: 3   Global Step: 59880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:56:22,172-Speed 9344.99 samples/sec   Loss 8.1788   LearningRate 0.0673   Epoch: 3   Global Step: 59890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 13:56:23,282-Speed 9231.34 samples/sec   Loss 8.1157   LearningRate 0.0673   Epoch: 3   Global Step: 59900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:24,338-Speed 9698.41 samples/sec   Loss 8.2149   LearningRate 0.0673   Epoch: 3   Global Step: 59910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:25,397-Speed 9680.47 samples/sec   Loss 8.1941   LearningRate 0.0673   Epoch: 3   Global Step: 59920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:26,473-Speed 9526.66 samples/sec   Loss 8.2784   LearningRate 0.0673   Epoch: 3   Global Step: 59930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:27,606-Speed 9042.57 samples/sec   Loss 8.2506   LearningRate 0.0673   Epoch: 3   Global Step: 59940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:28,682-Speed 9522.19 samples/sec   Loss 8.2880   LearningRate 0.0673   Epoch: 3   Global Step: 59950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:29,818-Speed 9016.16 samples/sec   Loss 8.2768   LearningRate 0.0673   Epoch: 3   Global Step: 59960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:30,869-Speed 9745.46 samples/sec   Loss 8.3197   LearningRate 0.0673   Epoch: 3   Global Step: 59970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:31,922-Speed 9735.02 samples/sec   Loss 8.2464   LearningRate 0.0673   Epoch: 3   Global Step: 59980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:33,051-Speed 9074.79 samples/sec   Loss 8.2042   LearningRate 0.0673   Epoch: 3   Global Step: 59990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:56:34,121-Speed 9580.05 samples/sec   Loss 8.0757   LearningRate 0.0673   Epoch: 3   Global Step: 60000   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:56:55,829-[lfw][60000]XNorm: 12.664305
Training: 2022-04-11 13:56:55,830-[lfw][60000]Accuracy-Flip: 0.99500+-0.00269
Training: 2022-04-11 13:56:55,830-[lfw][60000]Accuracy-Highest: 0.99583
Training: 2022-04-11 13:57:20,917-[cfp_fp][60000]XNorm: 10.744781
Training: 2022-04-11 13:57:20,918-[cfp_fp][60000]Accuracy-Flip: 0.94186+-0.01319
Training: 2022-04-11 13:57:20,918-[cfp_fp][60000]Accuracy-Highest: 0.95157
Training: 2022-04-11 13:57:42,536-[agedb_30][60000]XNorm: 12.319594
Training: 2022-04-11 13:57:42,537-[agedb_30][60000]Accuracy-Flip: 0.95683+-0.01026
Training: 2022-04-11 13:57:42,537-[agedb_30][60000]Accuracy-Highest: 0.95767
Training: 2022-04-11 13:57:43,602-Speed 147.38 samples/sec   Loss 8.1429   LearningRate 0.0673   Epoch: 3   Global Step: 60010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:44,695-Speed 9373.62 samples/sec   Loss 8.0796   LearningRate 0.0673   Epoch: 3   Global Step: 60020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:45,781-Speed 9436.70 samples/sec   Loss 8.1655   LearningRate 0.0673   Epoch: 3   Global Step: 60030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:46,863-Speed 9466.61 samples/sec   Loss 8.2024   LearningRate 0.0673   Epoch: 3   Global Step: 60040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:47,964-Speed 9309.11 samples/sec   Loss 8.1506   LearningRate 0.0673   Epoch: 3   Global Step: 60050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:49,034-Speed 9575.62 samples/sec   Loss 8.2054   LearningRate 0.0673   Epoch: 3   Global Step: 60060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:50,093-Speed 9681.49 samples/sec   Loss 8.3502   LearningRate 0.0672   Epoch: 3   Global Step: 60070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:51,158-Speed 9616.51 samples/sec   Loss 7.9952   LearningRate 0.0672   Epoch: 3   Global Step: 60080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:52,235-Speed 9512.61 samples/sec   Loss 8.2210   LearningRate 0.0672   Epoch: 3   Global Step: 60090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:53,322-Speed 9426.74 samples/sec   Loss 8.0356   LearningRate 0.0672   Epoch: 3   Global Step: 60100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:54,358-Speed 9887.01 samples/sec   Loss 8.2132   LearningRate 0.0672   Epoch: 3   Global Step: 60110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:55,385-Speed 9979.40 samples/sec   Loss 8.0572   LearningRate 0.0672   Epoch: 3   Global Step: 60120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:56,444-Speed 9683.28 samples/sec   Loss 8.0905   LearningRate 0.0672   Epoch: 3   Global Step: 60130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:57,513-Speed 9577.74 samples/sec   Loss 8.1944   LearningRate 0.0672   Epoch: 3   Global Step: 60140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:58,620-Speed 9259.48 samples/sec   Loss 8.0852   LearningRate 0.0672   Epoch: 3   Global Step: 60150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:57:59,707-Speed 9423.64 samples/sec   Loss 8.2073   LearningRate 0.0672   Epoch: 3   Global Step: 60160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:00,818-Speed 9223.20 samples/sec   Loss 8.2171   LearningRate 0.0672   Epoch: 3   Global Step: 60170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:01,900-Speed 9472.68 samples/sec   Loss 8.2912   LearningRate 0.0672   Epoch: 3   Global Step: 60180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:02,985-Speed 9448.43 samples/sec   Loss 8.1360   LearningRate 0.0672   Epoch: 3   Global Step: 60190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:04,026-Speed 9840.52 samples/sec   Loss 8.1143   LearningRate 0.0672   Epoch: 3   Global Step: 60200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:05,131-Speed 9273.59 samples/sec   Loss 8.1945   LearningRate 0.0672   Epoch: 3   Global Step: 60210   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:58:06,181-Speed 9763.27 samples/sec   Loss 8.1031   LearningRate 0.0672   Epoch: 3   Global Step: 60220   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:58:07,257-Speed 9520.20 samples/sec   Loss 8.1236   LearningRate 0.0672   Epoch: 3   Global Step: 60230   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:58:08,319-Speed 9649.64 samples/sec   Loss 8.2286   LearningRate 0.0672   Epoch: 3   Global Step: 60240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:09,426-Speed 9249.67 samples/sec   Loss 8.1865   LearningRate 0.0672   Epoch: 3   Global Step: 60250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:10,498-Speed 9564.82 samples/sec   Loss 8.2261   LearningRate 0.0672   Epoch: 3   Global Step: 60260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:11,567-Speed 9581.86 samples/sec   Loss 8.1787   LearningRate 0.0672   Epoch: 3   Global Step: 60270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:12,651-Speed 9452.14 samples/sec   Loss 8.1810   LearningRate 0.0671   Epoch: 3   Global Step: 60280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:13,722-Speed 9560.71 samples/sec   Loss 8.1051   LearningRate 0.0671   Epoch: 3   Global Step: 60290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:14,778-Speed 9703.97 samples/sec   Loss 8.0494   LearningRate 0.0671   Epoch: 3   Global Step: 60300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:15,855-Speed 9514.33 samples/sec   Loss 8.0704   LearningRate 0.0671   Epoch: 3   Global Step: 60310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:16,940-Speed 9446.94 samples/sec   Loss 8.0352   LearningRate 0.0671   Epoch: 3   Global Step: 60320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:18,049-Speed 9237.70 samples/sec   Loss 8.1444   LearningRate 0.0671   Epoch: 3   Global Step: 60330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:19,102-Speed 9732.58 samples/sec   Loss 8.1090   LearningRate 0.0671   Epoch: 3   Global Step: 60340   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:58:20,152-Speed 9761.77 samples/sec   Loss 8.3264   LearningRate 0.0671   Epoch: 3   Global Step: 60350   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:58:21,243-Speed 9385.91 samples/sec   Loss 8.0773   LearningRate 0.0671   Epoch: 3   Global Step: 60360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:22,386-Speed 8970.33 samples/sec   Loss 8.1654   LearningRate 0.0671   Epoch: 3   Global Step: 60370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:23,474-Speed 9411.94 samples/sec   Loss 8.0406   LearningRate 0.0671   Epoch: 3   Global Step: 60380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:24,548-Speed 9546.22 samples/sec   Loss 8.1600   LearningRate 0.0671   Epoch: 3   Global Step: 60390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:25,599-Speed 9744.27 samples/sec   Loss 8.1617   LearningRate 0.0671   Epoch: 3   Global Step: 60400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:26,672-Speed 9556.51 samples/sec   Loss 8.1701   LearningRate 0.0671   Epoch: 3   Global Step: 60410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:27,752-Speed 9482.38 samples/sec   Loss 8.1743   LearningRate 0.0671   Epoch: 3   Global Step: 60420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:28,847-Speed 9361.46 samples/sec   Loss 8.2349   LearningRate 0.0671   Epoch: 3   Global Step: 60430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:29,913-Speed 9611.09 samples/sec   Loss 8.1078   LearningRate 0.0671   Epoch: 3   Global Step: 60440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:30,959-Speed 9796.95 samples/sec   Loss 8.0715   LearningRate 0.0671   Epoch: 3   Global Step: 60450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:32,032-Speed 9541.18 samples/sec   Loss 8.1914   LearningRate 0.0671   Epoch: 3   Global Step: 60460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:33,159-Speed 9096.49 samples/sec   Loss 8.2338   LearningRate 0.0671   Epoch: 3   Global Step: 60470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:34,275-Speed 9188.26 samples/sec   Loss 8.0229   LearningRate 0.0670   Epoch: 3   Global Step: 60480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:35,365-Speed 9398.05 samples/sec   Loss 8.2337   LearningRate 0.0670   Epoch: 3   Global Step: 60490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:36,443-Speed 9505.19 samples/sec   Loss 8.1606   LearningRate 0.0670   Epoch: 3   Global Step: 60500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:37,554-Speed 9219.85 samples/sec   Loss 8.1952   LearningRate 0.0670   Epoch: 3   Global Step: 60510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:38,660-Speed 9265.29 samples/sec   Loss 8.1749   LearningRate 0.0670   Epoch: 3   Global Step: 60520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:39,764-Speed 9281.21 samples/sec   Loss 8.1341   LearningRate 0.0670   Epoch: 3   Global Step: 60530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:40,833-Speed 9579.66 samples/sec   Loss 8.1027   LearningRate 0.0670   Epoch: 3   Global Step: 60540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:41,906-Speed 9547.52 samples/sec   Loss 8.1940   LearningRate 0.0670   Epoch: 3   Global Step: 60550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:42,989-Speed 9463.70 samples/sec   Loss 8.2067   LearningRate 0.0670   Epoch: 3   Global Step: 60560   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:58:44,058-Speed 9578.37 samples/sec   Loss 8.0477   LearningRate 0.0670   Epoch: 3   Global Step: 60570   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:58:45,105-Speed 9796.70 samples/sec   Loss 8.2121   LearningRate 0.0670   Epoch: 3   Global Step: 60580   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:58:46,219-Speed 9197.76 samples/sec   Loss 8.2032   LearningRate 0.0670   Epoch: 3   Global Step: 60590   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:58:47,270-Speed 9743.56 samples/sec   Loss 8.2739   LearningRate 0.0670   Epoch: 3   Global Step: 60600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:48,327-Speed 9692.75 samples/sec   Loss 8.1615   LearningRate 0.0670   Epoch: 3   Global Step: 60610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:49,425-Speed 9336.23 samples/sec   Loss 8.0659   LearningRate 0.0670   Epoch: 3   Global Step: 60620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:50,515-Speed 9401.56 samples/sec   Loss 8.1599   LearningRate 0.0670   Epoch: 3   Global Step: 60630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:51,597-Speed 9465.50 samples/sec   Loss 8.2498   LearningRate 0.0670   Epoch: 3   Global Step: 60640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:52,662-Speed 9626.75 samples/sec   Loss 8.0753   LearningRate 0.0670   Epoch: 3   Global Step: 60650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:53,731-Speed 9582.85 samples/sec   Loss 8.1831   LearningRate 0.0670   Epoch: 3   Global Step: 60660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:54,833-Speed 9298.57 samples/sec   Loss 8.1414   LearningRate 0.0670   Epoch: 3   Global Step: 60670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:55,899-Speed 9606.15 samples/sec   Loss 8.1479   LearningRate 0.0669   Epoch: 3   Global Step: 60680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:56,949-Speed 9766.46 samples/sec   Loss 8.0906   LearningRate 0.0669   Epoch: 3   Global Step: 60690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:58:58,040-Speed 9394.26 samples/sec   Loss 8.1200   LearningRate 0.0669   Epoch: 3   Global Step: 60700   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:58:59,123-Speed 9456.40 samples/sec   Loss 8.3083   LearningRate 0.0669   Epoch: 3   Global Step: 60710   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:00,183-Speed 9667.48 samples/sec   Loss 8.1575   LearningRate 0.0669   Epoch: 3   Global Step: 60720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:01,274-Speed 9398.43 samples/sec   Loss 8.0523   LearningRate 0.0669   Epoch: 3   Global Step: 60730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:02,388-Speed 9199.27 samples/sec   Loss 8.1453   LearningRate 0.0669   Epoch: 3   Global Step: 60740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:03,468-Speed 9480.79 samples/sec   Loss 8.0855   LearningRate 0.0669   Epoch: 3   Global Step: 60750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:04,575-Speed 9263.21 samples/sec   Loss 8.1884   LearningRate 0.0669   Epoch: 3   Global Step: 60760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:05,660-Speed 9443.64 samples/sec   Loss 8.1720   LearningRate 0.0669   Epoch: 3   Global Step: 60770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:06,729-Speed 9582.97 samples/sec   Loss 7.9977   LearningRate 0.0669   Epoch: 3   Global Step: 60780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:07,789-Speed 9664.13 samples/sec   Loss 8.1233   LearningRate 0.0669   Epoch: 3   Global Step: 60790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:08,839-Speed 9760.02 samples/sec   Loss 8.2105   LearningRate 0.0669   Epoch: 3   Global Step: 60800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:09,911-Speed 9561.32 samples/sec   Loss 8.1970   LearningRate 0.0669   Epoch: 3   Global Step: 60810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:10,957-Speed 9793.94 samples/sec   Loss 8.0403   LearningRate 0.0669   Epoch: 3   Global Step: 60820   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:12,031-Speed 9538.28 samples/sec   Loss 8.0926   LearningRate 0.0669   Epoch: 3   Global Step: 60830   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:13,110-Speed 9500.43 samples/sec   Loss 8.2031   LearningRate 0.0669   Epoch: 3   Global Step: 60840   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:14,160-Speed 9755.32 samples/sec   Loss 8.2000   LearningRate 0.0669   Epoch: 3   Global Step: 60850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:15,197-Speed 9878.63 samples/sec   Loss 8.1339   LearningRate 0.0669   Epoch: 3   Global Step: 60860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:16,266-Speed 9583.47 samples/sec   Loss 8.1877   LearningRate 0.0669   Epoch: 3   Global Step: 60870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:17,327-Speed 9660.71 samples/sec   Loss 8.2553   LearningRate 0.0669   Epoch: 3   Global Step: 60880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:18,420-Speed 9368.35 samples/sec   Loss 8.1248   LearningRate 0.0668   Epoch: 3   Global Step: 60890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:19,525-Speed 9272.86 samples/sec   Loss 8.0579   LearningRate 0.0668   Epoch: 3   Global Step: 60900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:20,611-Speed 9438.74 samples/sec   Loss 8.1268   LearningRate 0.0668   Epoch: 3   Global Step: 60910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:21,679-Speed 9595.31 samples/sec   Loss 8.1240   LearningRate 0.0668   Epoch: 3   Global Step: 60920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:22,724-Speed 9805.27 samples/sec   Loss 8.1440   LearningRate 0.0668   Epoch: 3   Global Step: 60930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:23,801-Speed 9507.94 samples/sec   Loss 8.0551   LearningRate 0.0668   Epoch: 3   Global Step: 60940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:24,876-Speed 9537.79 samples/sec   Loss 8.2239   LearningRate 0.0668   Epoch: 3   Global Step: 60950   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:25,955-Speed 9490.41 samples/sec   Loss 8.2043   LearningRate 0.0668   Epoch: 3   Global Step: 60960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:27,044-Speed 9410.70 samples/sec   Loss 8.1374   LearningRate 0.0668   Epoch: 3   Global Step: 60970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:28,108-Speed 9631.59 samples/sec   Loss 8.3644   LearningRate 0.0668   Epoch: 3   Global Step: 60980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:29,179-Speed 9569.85 samples/sec   Loss 8.0790   LearningRate 0.0668   Epoch: 3   Global Step: 60990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:30,259-Speed 9481.24 samples/sec   Loss 8.0779   LearningRate 0.0668   Epoch: 3   Global Step: 61000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:31,329-Speed 9580.55 samples/sec   Loss 8.1224   LearningRate 0.0668   Epoch: 3   Global Step: 61010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:32,412-Speed 9458.14 samples/sec   Loss 8.1707   LearningRate 0.0668   Epoch: 3   Global Step: 61020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:33,483-Speed 9570.14 samples/sec   Loss 8.1162   LearningRate 0.0668   Epoch: 3   Global Step: 61030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:34,577-Speed 9360.04 samples/sec   Loss 8.0674   LearningRate 0.0668   Epoch: 3   Global Step: 61040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:35,634-Speed 9692.94 samples/sec   Loss 8.2991   LearningRate 0.0668   Epoch: 3   Global Step: 61050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:36,684-Speed 9764.43 samples/sec   Loss 8.1536   LearningRate 0.0668   Epoch: 3   Global Step: 61060   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:37,757-Speed 9550.28 samples/sec   Loss 8.1055   LearningRate 0.0668   Epoch: 3   Global Step: 61070   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:38,824-Speed 9605.32 samples/sec   Loss 8.2215   LearningRate 0.0668   Epoch: 3   Global Step: 61080   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:39,918-Speed 9358.54 samples/sec   Loss 8.1713   LearningRate 0.0667   Epoch: 3   Global Step: 61090   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:40,983-Speed 9620.02 samples/sec   Loss 8.2883   LearningRate 0.0667   Epoch: 3   Global Step: 61100   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:42,030-Speed 9794.31 samples/sec   Loss 8.1534   LearningRate 0.0667   Epoch: 3   Global Step: 61110   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:43,110-Speed 9482.02 samples/sec   Loss 8.0619   LearningRate 0.0667   Epoch: 3   Global Step: 61120   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:44,180-Speed 9580.10 samples/sec   Loss 8.1360   LearningRate 0.0667   Epoch: 3   Global Step: 61130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:45,257-Speed 9513.09 samples/sec   Loss 8.1360   LearningRate 0.0667   Epoch: 3   Global Step: 61140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:46,356-Speed 9322.65 samples/sec   Loss 8.1594   LearningRate 0.0667   Epoch: 3   Global Step: 61150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:47,427-Speed 9571.88 samples/sec   Loss 8.1403   LearningRate 0.0667   Epoch: 3   Global Step: 61160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:48,490-Speed 9640.42 samples/sec   Loss 8.1740   LearningRate 0.0667   Epoch: 3   Global Step: 61170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:49,602-Speed 9207.34 samples/sec   Loss 8.0768   LearningRate 0.0667   Epoch: 3   Global Step: 61180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:50,654-Speed 9746.80 samples/sec   Loss 8.2546   LearningRate 0.0667   Epoch: 3   Global Step: 61190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:51,753-Speed 9318.58 samples/sec   Loss 8.0764   LearningRate 0.0667   Epoch: 3   Global Step: 61200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:52,830-Speed 9521.69 samples/sec   Loss 8.0808   LearningRate 0.0667   Epoch: 3   Global Step: 61210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:53,929-Speed 9322.99 samples/sec   Loss 8.0491   LearningRate 0.0667   Epoch: 3   Global Step: 61220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:55,038-Speed 9239.28 samples/sec   Loss 8.1429   LearningRate 0.0667   Epoch: 3   Global Step: 61230   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 13:59:56,093-Speed 9707.64 samples/sec   Loss 8.2563   LearningRate 0.0667   Epoch: 3   Global Step: 61240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:57,245-Speed 8896.86 samples/sec   Loss 8.0928   LearningRate 0.0667   Epoch: 3   Global Step: 61250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:58,321-Speed 9517.79 samples/sec   Loss 8.0921   LearningRate 0.0667   Epoch: 3   Global Step: 61260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 13:59:59,375-Speed 9723.54 samples/sec   Loss 8.1132   LearningRate 0.0667   Epoch: 3   Global Step: 61270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:00,477-Speed 9299.64 samples/sec   Loss 8.0698   LearningRate 0.0667   Epoch: 3   Global Step: 61280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:01,554-Speed 9512.22 samples/sec   Loss 8.0154   LearningRate 0.0667   Epoch: 3   Global Step: 61290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:02,623-Speed 9585.01 samples/sec   Loss 8.0801   LearningRate 0.0666   Epoch: 3   Global Step: 61300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:03,739-Speed 9185.54 samples/sec   Loss 8.1681   LearningRate 0.0666   Epoch: 3   Global Step: 61310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:04,822-Speed 9461.23 samples/sec   Loss 8.3041   LearningRate 0.0666   Epoch: 3   Global Step: 61320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:05,891-Speed 9585.83 samples/sec   Loss 8.2590   LearningRate 0.0666   Epoch: 3   Global Step: 61330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:07,005-Speed 9196.84 samples/sec   Loss 8.1364   LearningRate 0.0666   Epoch: 3   Global Step: 61340   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:00:08,093-Speed 9423.35 samples/sec   Loss 8.0723   LearningRate 0.0666   Epoch: 3   Global Step: 61350   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:00:09,155-Speed 9648.11 samples/sec   Loss 8.1451   LearningRate 0.0666   Epoch: 3   Global Step: 61360   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:00:10,221-Speed 9614.85 samples/sec   Loss 8.2209   LearningRate 0.0666   Epoch: 3   Global Step: 61370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:11,309-Speed 9417.49 samples/sec   Loss 8.0918   LearningRate 0.0666   Epoch: 3   Global Step: 61380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:12,411-Speed 9296.79 samples/sec   Loss 8.0769   LearningRate 0.0666   Epoch: 3   Global Step: 61390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:13,496-Speed 9441.99 samples/sec   Loss 8.1623   LearningRate 0.0666   Epoch: 3   Global Step: 61400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:14,563-Speed 9601.14 samples/sec   Loss 8.0910   LearningRate 0.0666   Epoch: 3   Global Step: 61410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:15,632-Speed 9583.66 samples/sec   Loss 8.2304   LearningRate 0.0666   Epoch: 3   Global Step: 61420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:16,705-Speed 9546.28 samples/sec   Loss 8.1340   LearningRate 0.0666   Epoch: 3   Global Step: 61430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:17,759-Speed 9722.96 samples/sec   Loss 8.2323   LearningRate 0.0666   Epoch: 3   Global Step: 61440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:18,845-Speed 9432.40 samples/sec   Loss 8.0576   LearningRate 0.0666   Epoch: 3   Global Step: 61450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:19,895-Speed 9761.07 samples/sec   Loss 8.0011   LearningRate 0.0666   Epoch: 3   Global Step: 61460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:20,971-Speed 9524.62 samples/sec   Loss 8.1380   LearningRate 0.0666   Epoch: 3   Global Step: 61470   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:00:22,053-Speed 9476.26 samples/sec   Loss 8.0904   LearningRate 0.0666   Epoch: 3   Global Step: 61480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:23,104-Speed 9741.85 samples/sec   Loss 8.1162   LearningRate 0.0666   Epoch: 3   Global Step: 61490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:00:24,158-Speed 9726.57 samples/sec   Loss 8.0278   LearningRate 0.0665   Epoch: 3   Global Step: 61500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:00:25,235-Speed 9511.04 samples/sec   Loss 8.1546   LearningRate 0.0665   Epoch: 3   Global Step: 61510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:00:26,330-Speed 9358.54 samples/sec   Loss 8.1378   LearningRate 0.0665   Epoch: 3   Global Step: 61520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:00:27,369-Speed 9862.01 samples/sec   Loss 8.0883   LearningRate 0.0665   Epoch: 3   Global Step: 61530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:00:28,426-Speed 9692.19 samples/sec   Loss 8.1439   LearningRate 0.0665   Epoch: 3   Global Step: 61540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:00:29,514-Speed 9417.95 samples/sec   Loss 8.1972   LearningRate 0.0665   Epoch: 3   Global Step: 61550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:00:30,591-Speed 9506.41 samples/sec   Loss 8.1121   LearningRate 0.0665   Epoch: 3   Global Step: 61560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:00:31,666-Speed 9533.74 samples/sec   Loss 8.0406   LearningRate 0.0665   Epoch: 3   Global Step: 61570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:00:32,755-Speed 9415.63 samples/sec   Loss 8.2305   LearningRate 0.0665   Epoch: 3   Global Step: 61580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:00:33,817-Speed 9643.73 samples/sec   Loss 8.0419   LearningRate 0.0665   Epoch: 3   Global Step: 61590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:34,915-Speed 9331.11 samples/sec   Loss 8.1358   LearningRate 0.0665   Epoch: 3   Global Step: 61600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:36,001-Speed 9436.17 samples/sec   Loss 8.0533   LearningRate 0.0665   Epoch: 3   Global Step: 61610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:37,087-Speed 9432.80 samples/sec   Loss 8.1674   LearningRate 0.0665   Epoch: 3   Global Step: 61620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:38,166-Speed 9505.81 samples/sec   Loss 8.1983   LearningRate 0.0665   Epoch: 3   Global Step: 61630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:39,228-Speed 9647.58 samples/sec   Loss 8.1667   LearningRate 0.0665   Epoch: 3   Global Step: 61640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:40,308-Speed 9485.01 samples/sec   Loss 8.2306   LearningRate 0.0665   Epoch: 3   Global Step: 61650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:41,394-Speed 9434.96 samples/sec   Loss 8.1200   LearningRate 0.0665   Epoch: 3   Global Step: 61660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:42,526-Speed 9053.08 samples/sec   Loss 8.2916   LearningRate 0.0665   Epoch: 3   Global Step: 61670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:43,621-Speed 9356.05 samples/sec   Loss 8.2578   LearningRate 0.0665   Epoch: 3   Global Step: 61680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:44,686-Speed 9619.30 samples/sec   Loss 8.1695   LearningRate 0.0665   Epoch: 3   Global Step: 61690   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:00:45,791-Speed 9272.15 samples/sec   Loss 8.0865   LearningRate 0.0665   Epoch: 3   Global Step: 61700   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:00:46,866-Speed 9535.51 samples/sec   Loss 8.1397   LearningRate 0.0664   Epoch: 3   Global Step: 61710   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:00:47,927-Speed 9649.64 samples/sec   Loss 8.1878   LearningRate 0.0664   Epoch: 3   Global Step: 61720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:48,987-Speed 9673.98 samples/sec   Loss 8.1552   LearningRate 0.0664   Epoch: 3   Global Step: 61730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:50,108-Speed 9138.15 samples/sec   Loss 8.0100   LearningRate 0.0664   Epoch: 3   Global Step: 61740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:51,185-Speed 9515.12 samples/sec   Loss 8.2545   LearningRate 0.0664   Epoch: 3   Global Step: 61750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:52,252-Speed 9604.06 samples/sec   Loss 8.1881   LearningRate 0.0664   Epoch: 3   Global Step: 61760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:53,370-Speed 9157.60 samples/sec   Loss 8.2490   LearningRate 0.0664   Epoch: 3   Global Step: 61770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:54,437-Speed 9610.18 samples/sec   Loss 8.2192   LearningRate 0.0664   Epoch: 3   Global Step: 61780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:55,484-Speed 9785.32 samples/sec   Loss 8.1950   LearningRate 0.0664   Epoch: 3   Global Step: 61790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:56,523-Speed 9862.82 samples/sec   Loss 8.0447   LearningRate 0.0664   Epoch: 3   Global Step: 61800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:57,608-Speed 9442.45 samples/sec   Loss 8.1328   LearningRate 0.0664   Epoch: 3   Global Step: 61810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:00:58,728-Speed 9143.67 samples/sec   Loss 8.1037   LearningRate 0.0664   Epoch: 3   Global Step: 61820   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:00:59,830-Speed 9296.68 samples/sec   Loss 8.2718   LearningRate 0.0664   Epoch: 3   Global Step: 61830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:01:00,892-Speed 9645.85 samples/sec   Loss 8.0425   LearningRate 0.0664   Epoch: 3   Global Step: 61840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:01,967-Speed 9538.11 samples/sec   Loss 8.0752   LearningRate 0.0664   Epoch: 3   Global Step: 61850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:03,075-Speed 9252.51 samples/sec   Loss 8.1950   LearningRate 0.0664   Epoch: 3   Global Step: 61860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:04,151-Speed 9518.81 samples/sec   Loss 8.1432   LearningRate 0.0664   Epoch: 3   Global Step: 61870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:05,226-Speed 9526.73 samples/sec   Loss 8.1614   LearningRate 0.0664   Epoch: 3   Global Step: 61880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:06,280-Speed 9727.05 samples/sec   Loss 8.1873   LearningRate 0.0664   Epoch: 3   Global Step: 61890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:07,316-Speed 9882.97 samples/sec   Loss 8.1182   LearningRate 0.0664   Epoch: 3   Global Step: 61900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:08,412-Speed 9353.68 samples/sec   Loss 8.1712   LearningRate 0.0663   Epoch: 3   Global Step: 61910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:09,506-Speed 9363.62 samples/sec   Loss 8.0398   LearningRate 0.0663   Epoch: 3   Global Step: 61920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:10,591-Speed 9444.12 samples/sec   Loss 8.1154   LearningRate 0.0663   Epoch: 3   Global Step: 61930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:11,688-Speed 9337.76 samples/sec   Loss 8.1352   LearningRate 0.0663   Epoch: 3   Global Step: 61940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:01:12,751-Speed 9638.45 samples/sec   Loss 8.0922   LearningRate 0.0663   Epoch: 3   Global Step: 61950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:13,840-Speed 9415.64 samples/sec   Loss 7.9529   LearningRate 0.0663   Epoch: 3   Global Step: 61960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:14,914-Speed 9542.32 samples/sec   Loss 8.0005   LearningRate 0.0663   Epoch: 3   Global Step: 61970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:16,033-Speed 9148.58 samples/sec   Loss 8.1050   LearningRate 0.0663   Epoch: 3   Global Step: 61980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:17,101-Speed 9602.08 samples/sec   Loss 8.0619   LearningRate 0.0663   Epoch: 3   Global Step: 61990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:18,180-Speed 9486.86 samples/sec   Loss 8.2262   LearningRate 0.0663   Epoch: 3   Global Step: 62000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:01:39,848-[lfw][62000]XNorm: 12.535790
Training: 2022-04-11 14:01:39,849-[lfw][62000]Accuracy-Flip: 0.99467+-0.00267
Training: 2022-04-11 14:01:39,849-[lfw][62000]Accuracy-Highest: 0.99583
Training: 2022-04-11 14:02:04,907-[cfp_fp][62000]XNorm: 10.490901
Training: 2022-04-11 14:02:04,908-[cfp_fp][62000]Accuracy-Flip: 0.94614+-0.01275
Training: 2022-04-11 14:02:04,908-[cfp_fp][62000]Accuracy-Highest: 0.95157
Training: 2022-04-11 14:02:26,534-[agedb_30][62000]XNorm: 12.107767
Training: 2022-04-11 14:02:26,535-[agedb_30][62000]Accuracy-Flip: 0.95617+-0.01600
Training: 2022-04-11 14:02:26,535-[agedb_30][62000]Accuracy-Highest: 0.95767
Training: 2022-04-11 14:02:27,621-Speed 147.47 samples/sec   Loss 8.0962   LearningRate 0.0663   Epoch: 3   Global Step: 62010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:28,663-Speed 9829.99 samples/sec   Loss 8.0633   LearningRate 0.0663   Epoch: 3   Global Step: 62020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:29,739-Speed 9521.79 samples/sec   Loss 7.9930   LearningRate 0.0663   Epoch: 3   Global Step: 62030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:30,803-Speed 9633.26 samples/sec   Loss 8.1779   LearningRate 0.0663   Epoch: 3   Global Step: 62040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:31,940-Speed 9017.89 samples/sec   Loss 8.0082   LearningRate 0.0663   Epoch: 3   Global Step: 62050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:33,032-Speed 9376.42 samples/sec   Loss 8.1522   LearningRate 0.0663   Epoch: 3   Global Step: 62060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:34,139-Speed 9253.94 samples/sec   Loss 8.1681   LearningRate 0.0663   Epoch: 3   Global Step: 62070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:35,270-Speed 9061.51 samples/sec   Loss 8.0441   LearningRate 0.0663   Epoch: 3   Global Step: 62080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:36,334-Speed 9632.14 samples/sec   Loss 8.0274   LearningRate 0.0663   Epoch: 3   Global Step: 62090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:37,410-Speed 9522.15 samples/sec   Loss 8.2168   LearningRate 0.0663   Epoch: 3   Global Step: 62100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:38,482-Speed 9558.33 samples/sec   Loss 8.0907   LearningRate 0.0663   Epoch: 3   Global Step: 62110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:39,585-Speed 9292.04 samples/sec   Loss 8.1532   LearningRate 0.0662   Epoch: 3   Global Step: 62120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:40,655-Speed 9573.74 samples/sec   Loss 8.1692   LearningRate 0.0662   Epoch: 3   Global Step: 62130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:41,773-Speed 9166.54 samples/sec   Loss 8.0625   LearningRate 0.0662   Epoch: 3   Global Step: 62140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:42,848-Speed 9531.21 samples/sec   Loss 8.2612   LearningRate 0.0662   Epoch: 3   Global Step: 62150   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:02:43,931-Speed 9464.27 samples/sec   Loss 8.1555   LearningRate 0.0662   Epoch: 3   Global Step: 62160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:44,976-Speed 9801.08 samples/sec   Loss 8.0017   LearningRate 0.0662   Epoch: 3   Global Step: 62170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:46,052-Speed 9520.16 samples/sec   Loss 8.1947   LearningRate 0.0662   Epoch: 3   Global Step: 62180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:47,134-Speed 9472.50 samples/sec   Loss 8.0616   LearningRate 0.0662   Epoch: 3   Global Step: 62190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:48,226-Speed 9377.00 samples/sec   Loss 8.1016   LearningRate 0.0662   Epoch: 3   Global Step: 62200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:49,298-Speed 9560.93 samples/sec   Loss 8.1361   LearningRate 0.0662   Epoch: 3   Global Step: 62210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:50,348-Speed 9757.54 samples/sec   Loss 8.1979   LearningRate 0.0662   Epoch: 3   Global Step: 62220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:51,435-Speed 9432.14 samples/sec   Loss 8.0354   LearningRate 0.0662   Epoch: 3   Global Step: 62230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:52,544-Speed 9233.62 samples/sec   Loss 8.1337   LearningRate 0.0662   Epoch: 3   Global Step: 62240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:53,623-Speed 9501.48 samples/sec   Loss 8.2102   LearningRate 0.0662   Epoch: 3   Global Step: 62250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:54,725-Speed 9294.05 samples/sec   Loss 8.2173   LearningRate 0.0662   Epoch: 3   Global Step: 62260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:02:55,811-Speed 9431.56 samples/sec   Loss 8.2374   LearningRate 0.0662   Epoch: 3   Global Step: 62270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:56,915-Speed 9286.47 samples/sec   Loss 8.1744   LearningRate 0.0662   Epoch: 3   Global Step: 62280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:57,958-Speed 9823.58 samples/sec   Loss 8.0251   LearningRate 0.0662   Epoch: 3   Global Step: 62290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:02:59,033-Speed 9530.81 samples/sec   Loss 8.1215   LearningRate 0.0662   Epoch: 3   Global Step: 62300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:00,102-Speed 9579.67 samples/sec   Loss 8.2160   LearningRate 0.0662   Epoch: 3   Global Step: 62310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:01,156-Speed 9718.91 samples/sec   Loss 8.0433   LearningRate 0.0661   Epoch: 3   Global Step: 62320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:02,265-Speed 9244.18 samples/sec   Loss 8.0693   LearningRate 0.0661   Epoch: 3   Global Step: 62330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:03,366-Speed 9304.49 samples/sec   Loss 8.2141   LearningRate 0.0661   Epoch: 3   Global Step: 62340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:04,460-Speed 9363.61 samples/sec   Loss 8.1204   LearningRate 0.0661   Epoch: 3   Global Step: 62350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:05,581-Speed 9145.64 samples/sec   Loss 8.1539   LearningRate 0.0661   Epoch: 3   Global Step: 62360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:06,668-Speed 9424.58 samples/sec   Loss 8.1399   LearningRate 0.0661   Epoch: 3   Global Step: 62370   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:07,772-Speed 9283.06 samples/sec   Loss 8.0674   LearningRate 0.0661   Epoch: 3   Global Step: 62380   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:08,832-Speed 9674.36 samples/sec   Loss 8.1754   LearningRate 0.0661   Epoch: 3   Global Step: 62390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:03:09,877-Speed 9805.46 samples/sec   Loss 8.1413   LearningRate 0.0661   Epoch: 3   Global Step: 62400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:03:10,934-Speed 9686.84 samples/sec   Loss 8.1704   LearningRate 0.0661   Epoch: 3   Global Step: 62410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:03:12,021-Speed 9427.64 samples/sec   Loss 8.2325   LearningRate 0.0661   Epoch: 3   Global Step: 62420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:03:13,119-Speed 9333.72 samples/sec   Loss 8.0003   LearningRate 0.0661   Epoch: 3   Global Step: 62430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:03:14,201-Speed 9467.95 samples/sec   Loss 8.0435   LearningRate 0.0661   Epoch: 3   Global Step: 62440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:03:15,282-Speed 9472.11 samples/sec   Loss 8.0719   LearningRate 0.0661   Epoch: 3   Global Step: 62450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:03:16,374-Speed 9386.27 samples/sec   Loss 8.2025   LearningRate 0.0661   Epoch: 3   Global Step: 62460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:03:17,476-Speed 9294.47 samples/sec   Loss 8.0890   LearningRate 0.0661   Epoch: 3   Global Step: 62470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:03:18,560-Speed 9454.18 samples/sec   Loss 8.1051   LearningRate 0.0661   Epoch: 3   Global Step: 62480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:03:19,644-Speed 9450.80 samples/sec   Loss 8.2192   LearningRate 0.0661   Epoch: 3   Global Step: 62490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:20,756-Speed 9214.81 samples/sec   Loss 8.1217   LearningRate 0.0661   Epoch: 3   Global Step: 62500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:21,852-Speed 9346.55 samples/sec   Loss 8.1370   LearningRate 0.0661   Epoch: 3   Global Step: 62510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:22,908-Speed 9705.02 samples/sec   Loss 8.1773   LearningRate 0.0661   Epoch: 3   Global Step: 62520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:24,000-Speed 9382.38 samples/sec   Loss 8.0510   LearningRate 0.0660   Epoch: 3   Global Step: 62530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:25,066-Speed 9615.61 samples/sec   Loss 8.1701   LearningRate 0.0660   Epoch: 3   Global Step: 62540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:26,136-Speed 9575.56 samples/sec   Loss 8.1021   LearningRate 0.0660   Epoch: 3   Global Step: 62550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:27,202-Speed 9610.23 samples/sec   Loss 8.1574   LearningRate 0.0660   Epoch: 3   Global Step: 62560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:28,275-Speed 9553.60 samples/sec   Loss 8.0768   LearningRate 0.0660   Epoch: 3   Global Step: 62570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:29,356-Speed 9476.43 samples/sec   Loss 7.9883   LearningRate 0.0660   Epoch: 3   Global Step: 62580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:30,418-Speed 9647.15 samples/sec   Loss 8.0346   LearningRate 0.0660   Epoch: 3   Global Step: 62590   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:31,521-Speed 9290.11 samples/sec   Loss 8.0134   LearningRate 0.0660   Epoch: 3   Global Step: 62600   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:32,586-Speed 9624.10 samples/sec   Loss 8.1275   LearningRate 0.0660   Epoch: 3   Global Step: 62610   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:33,679-Speed 9373.93 samples/sec   Loss 8.1455   LearningRate 0.0660   Epoch: 3   Global Step: 62620   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:34,723-Speed 9812.99 samples/sec   Loss 8.0717   LearningRate 0.0660   Epoch: 3   Global Step: 62630   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:35,810-Speed 9421.21 samples/sec   Loss 8.0536   LearningRate 0.0660   Epoch: 3   Global Step: 62640   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:36,892-Speed 9472.48 samples/sec   Loss 8.1023   LearningRate 0.0660   Epoch: 3   Global Step: 62650   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:37,946-Speed 9719.43 samples/sec   Loss 8.1461   LearningRate 0.0660   Epoch: 3   Global Step: 62660   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:39,016-Speed 9582.64 samples/sec   Loss 8.1025   LearningRate 0.0660   Epoch: 3   Global Step: 62670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:40,129-Speed 9200.53 samples/sec   Loss 8.1334   LearningRate 0.0660   Epoch: 3   Global Step: 62680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:41,196-Speed 9605.31 samples/sec   Loss 8.0829   LearningRate 0.0660   Epoch: 3   Global Step: 62690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:42,291-Speed 9359.41 samples/sec   Loss 8.2012   LearningRate 0.0660   Epoch: 3   Global Step: 62700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:43,414-Speed 9119.06 samples/sec   Loss 8.0792   LearningRate 0.0660   Epoch: 3   Global Step: 62710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:44,538-Speed 9113.78 samples/sec   Loss 8.0695   LearningRate 0.0660   Epoch: 3   Global Step: 62720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:45,593-Speed 9712.87 samples/sec   Loss 8.2932   LearningRate 0.0659   Epoch: 3   Global Step: 62730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:46,642-Speed 9773.52 samples/sec   Loss 8.1341   LearningRate 0.0659   Epoch: 3   Global Step: 62740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:47,732-Speed 9404.37 samples/sec   Loss 8.0218   LearningRate 0.0659   Epoch: 3   Global Step: 62750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:48,836-Speed 9275.45 samples/sec   Loss 8.0970   LearningRate 0.0659   Epoch: 3   Global Step: 62760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:49,983-Speed 8933.83 samples/sec   Loss 8.1289   LearningRate 0.0659   Epoch: 3   Global Step: 62770   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:03:51,021-Speed 9870.42 samples/sec   Loss 8.1156   LearningRate 0.0659   Epoch: 3   Global Step: 62780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:52,134-Speed 9206.63 samples/sec   Loss 8.1907   LearningRate 0.0659   Epoch: 3   Global Step: 62790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:53,215-Speed 9476.17 samples/sec   Loss 8.1228   LearningRate 0.0659   Epoch: 3   Global Step: 62800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:54,280-Speed 9627.28 samples/sec   Loss 8.1857   LearningRate 0.0659   Epoch: 3   Global Step: 62810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:55,323-Speed 9818.71 samples/sec   Loss 8.1692   LearningRate 0.0659   Epoch: 3   Global Step: 62820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:56,406-Speed 9461.05 samples/sec   Loss 8.1778   LearningRate 0.0659   Epoch: 3   Global Step: 62830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:57,521-Speed 9185.38 samples/sec   Loss 8.0686   LearningRate 0.0659   Epoch: 3   Global Step: 62840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:58,617-Speed 9354.51 samples/sec   Loss 8.1596   LearningRate 0.0659   Epoch: 3   Global Step: 62850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:03:59,680-Speed 9640.87 samples/sec   Loss 8.0860   LearningRate 0.0659   Epoch: 3   Global Step: 62860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:00,768-Speed 9412.49 samples/sec   Loss 8.1127   LearningRate 0.0659   Epoch: 3   Global Step: 62870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:01,851-Speed 9462.53 samples/sec   Loss 8.1398   LearningRate 0.0659   Epoch: 3   Global Step: 62880   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:04:02,935-Speed 9455.10 samples/sec   Loss 8.1203   LearningRate 0.0659   Epoch: 3   Global Step: 62890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:03,985-Speed 9760.67 samples/sec   Loss 8.1116   LearningRate 0.0659   Epoch: 3   Global Step: 62900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:05,081-Speed 9345.75 samples/sec   Loss 8.1645   LearningRate 0.0659   Epoch: 3   Global Step: 62910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:06,178-Speed 9343.38 samples/sec   Loss 8.0598   LearningRate 0.0659   Epoch: 3   Global Step: 62920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:07,265-Speed 9429.66 samples/sec   Loss 7.8965   LearningRate 0.0659   Epoch: 3   Global Step: 62930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:08,334-Speed 9589.84 samples/sec   Loss 8.0228   LearningRate 0.0658   Epoch: 3   Global Step: 62940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:09,431-Speed 9336.19 samples/sec   Loss 8.0011   LearningRate 0.0658   Epoch: 3   Global Step: 62950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:10,496-Speed 9618.97 samples/sec   Loss 8.0410   LearningRate 0.0658   Epoch: 3   Global Step: 62960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:11,569-Speed 9551.09 samples/sec   Loss 8.1970   LearningRate 0.0658   Epoch: 3   Global Step: 62970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:12,640-Speed 9562.72 samples/sec   Loss 8.1439   LearningRate 0.0658   Epoch: 3   Global Step: 62980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:13,714-Speed 9544.02 samples/sec   Loss 8.0004   LearningRate 0.0658   Epoch: 3   Global Step: 62990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:14,778-Speed 9630.78 samples/sec   Loss 8.0898   LearningRate 0.0658   Epoch: 3   Global Step: 63000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:15,856-Speed 9505.09 samples/sec   Loss 8.0806   LearningRate 0.0658   Epoch: 3   Global Step: 63010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:16,925-Speed 9581.00 samples/sec   Loss 7.9764   LearningRate 0.0658   Epoch: 3   Global Step: 63020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:18,039-Speed 9196.11 samples/sec   Loss 8.0529   LearningRate 0.0658   Epoch: 3   Global Step: 63030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:19,139-Speed 9315.14 samples/sec   Loss 8.0894   LearningRate 0.0658   Epoch: 3   Global Step: 63040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:20,270-Speed 9061.56 samples/sec   Loss 8.0737   LearningRate 0.0658   Epoch: 3   Global Step: 63050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:21,366-Speed 9345.24 samples/sec   Loss 8.0601   LearningRate 0.0658   Epoch: 3   Global Step: 63060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:22,473-Speed 9256.54 samples/sec   Loss 8.0484   LearningRate 0.0658   Epoch: 3   Global Step: 63070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:23,602-Speed 9079.18 samples/sec   Loss 8.0660   LearningRate 0.0658   Epoch: 3   Global Step: 63080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:24,684-Speed 9466.43 samples/sec   Loss 8.1680   LearningRate 0.0658   Epoch: 3   Global Step: 63090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:25,763-Speed 9498.87 samples/sec   Loss 8.1682   LearningRate 0.0658   Epoch: 3   Global Step: 63100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:26,865-Speed 9305.37 samples/sec   Loss 8.0447   LearningRate 0.0658   Epoch: 3   Global Step: 63110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:27,979-Speed 9197.01 samples/sec   Loss 8.1414   LearningRate 0.0658   Epoch: 3   Global Step: 63120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:29,088-Speed 9235.29 samples/sec   Loss 8.1192   LearningRate 0.0658   Epoch: 3   Global Step: 63130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:30,217-Speed 9071.56 samples/sec   Loss 8.0559   LearningRate 0.0657   Epoch: 3   Global Step: 63140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:31,259-Speed 9834.21 samples/sec   Loss 8.1445   LearningRate 0.0657   Epoch: 3   Global Step: 63150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:32,316-Speed 9693.84 samples/sec   Loss 8.1010   LearningRate 0.0657   Epoch: 3   Global Step: 63160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:33,410-Speed 9373.77 samples/sec   Loss 8.1407   LearningRate 0.0657   Epoch: 3   Global Step: 63170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:34,471-Speed 9648.31 samples/sec   Loss 8.0964   LearningRate 0.0657   Epoch: 3   Global Step: 63180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:35,527-Speed 9709.39 samples/sec   Loss 8.0271   LearningRate 0.0657   Epoch: 3   Global Step: 63190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:36,621-Speed 9359.41 samples/sec   Loss 8.0819   LearningRate 0.0657   Epoch: 3   Global Step: 63200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:37,712-Speed 9392.80 samples/sec   Loss 7.9839   LearningRate 0.0657   Epoch: 3   Global Step: 63210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:38,808-Speed 9354.81 samples/sec   Loss 8.0921   LearningRate 0.0657   Epoch: 3   Global Step: 63220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:39,844-Speed 9897.38 samples/sec   Loss 8.1054   LearningRate 0.0657   Epoch: 3   Global Step: 63230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:40,884-Speed 9851.90 samples/sec   Loss 7.9855   LearningRate 0.0657   Epoch: 3   Global Step: 63240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:41,966-Speed 9468.39 samples/sec   Loss 8.0502   LearningRate 0.0657   Epoch: 3   Global Step: 63250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:43,071-Speed 9267.48 samples/sec   Loss 8.1213   LearningRate 0.0657   Epoch: 3   Global Step: 63260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:44,109-Speed 9873.57 samples/sec   Loss 7.9884   LearningRate 0.0657   Epoch: 3   Global Step: 63270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:45,197-Speed 9412.76 samples/sec   Loss 8.0888   LearningRate 0.0657   Epoch: 3   Global Step: 63280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:46,236-Speed 9861.88 samples/sec   Loss 8.0803   LearningRate 0.0657   Epoch: 3   Global Step: 63290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:47,280-Speed 9817.52 samples/sec   Loss 8.0684   LearningRate 0.0657   Epoch: 3   Global Step: 63300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:48,328-Speed 9782.27 samples/sec   Loss 8.1316   LearningRate 0.0657   Epoch: 3   Global Step: 63310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:49,385-Speed 9686.96 samples/sec   Loss 8.1295   LearningRate 0.0657   Epoch: 3   Global Step: 63320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:50,509-Speed 9118.40 samples/sec   Loss 8.1158   LearningRate 0.0657   Epoch: 3   Global Step: 63330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:51,570-Speed 9662.86 samples/sec   Loss 8.0966   LearningRate 0.0657   Epoch: 3   Global Step: 63340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:52,636-Speed 9606.98 samples/sec   Loss 8.1720   LearningRate 0.0656   Epoch: 3   Global Step: 63350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:53,676-Speed 9856.98 samples/sec   Loss 8.0711   LearningRate 0.0656   Epoch: 3   Global Step: 63360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:54,736-Speed 9658.57 samples/sec   Loss 7.9997   LearningRate 0.0656   Epoch: 3   Global Step: 63370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:55,767-Speed 9940.03 samples/sec   Loss 8.1925   LearningRate 0.0656   Epoch: 3   Global Step: 63380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:56,789-Speed 10025.60 samples/sec   Loss 8.0205   LearningRate 0.0656   Epoch: 3   Global Step: 63390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:04:57,926-Speed 9008.32 samples/sec   Loss 8.1010   LearningRate 0.0656   Epoch: 3   Global Step: 63400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:04:58,997-Speed 9572.97 samples/sec   Loss 8.0714   LearningRate 0.0656   Epoch: 3   Global Step: 63410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:00,086-Speed 9408.25 samples/sec   Loss 8.1929   LearningRate 0.0656   Epoch: 3   Global Step: 63420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:01,141-Speed 9708.23 samples/sec   Loss 8.1075   LearningRate 0.0656   Epoch: 3   Global Step: 63430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:02,205-Speed 9636.95 samples/sec   Loss 8.1317   LearningRate 0.0656   Epoch: 3   Global Step: 63440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:03,294-Speed 9406.05 samples/sec   Loss 8.0468   LearningRate 0.0656   Epoch: 3   Global Step: 63450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:04,405-Speed 9221.96 samples/sec   Loss 7.9587   LearningRate 0.0656   Epoch: 3   Global Step: 63460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:05,497-Speed 9385.28 samples/sec   Loss 8.0678   LearningRate 0.0656   Epoch: 3   Global Step: 63470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:06,544-Speed 9788.51 samples/sec   Loss 8.0582   LearningRate 0.0656   Epoch: 3   Global Step: 63480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:07,574-Speed 9942.34 samples/sec   Loss 8.0661   LearningRate 0.0656   Epoch: 3   Global Step: 63490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:08,629-Speed 9718.95 samples/sec   Loss 8.0227   LearningRate 0.0656   Epoch: 3   Global Step: 63500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:09,692-Speed 9640.64 samples/sec   Loss 8.1288   LearningRate 0.0656   Epoch: 3   Global Step: 63510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:10,785-Speed 9375.02 samples/sec   Loss 8.0826   LearningRate 0.0656   Epoch: 3   Global Step: 63520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:11,851-Speed 9610.76 samples/sec   Loss 8.1239   LearningRate 0.0656   Epoch: 3   Global Step: 63530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:12,937-Speed 9432.69 samples/sec   Loss 8.1068   LearningRate 0.0656   Epoch: 3   Global Step: 63540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:13,992-Speed 9710.74 samples/sec   Loss 8.0180   LearningRate 0.0655   Epoch: 3   Global Step: 63550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:15,043-Speed 9746.10 samples/sec   Loss 8.1522   LearningRate 0.0655   Epoch: 3   Global Step: 63560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:16,103-Speed 9666.31 samples/sec   Loss 8.2022   LearningRate 0.0655   Epoch: 3   Global Step: 63570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:17,201-Speed 9329.66 samples/sec   Loss 8.1329   LearningRate 0.0655   Epoch: 3   Global Step: 63580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:18,316-Speed 9190.49 samples/sec   Loss 8.1668   LearningRate 0.0655   Epoch: 3   Global Step: 63590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:19,399-Speed 9463.66 samples/sec   Loss 8.0617   LearningRate 0.0655   Epoch: 3   Global Step: 63600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:20,539-Speed 8981.47 samples/sec   Loss 8.2073   LearningRate 0.0655   Epoch: 3   Global Step: 63610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:21,627-Speed 9427.48 samples/sec   Loss 8.1505   LearningRate 0.0655   Epoch: 3   Global Step: 63620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:22,663-Speed 9884.47 samples/sec   Loss 8.1505   LearningRate 0.0655   Epoch: 3   Global Step: 63630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:23,768-Speed 9276.60 samples/sec   Loss 8.1001   LearningRate 0.0655   Epoch: 3   Global Step: 63640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:24,874-Speed 9258.82 samples/sec   Loss 8.0466   LearningRate 0.0655   Epoch: 3   Global Step: 63650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:25,998-Speed 9116.99 samples/sec   Loss 8.0762   LearningRate 0.0655   Epoch: 3   Global Step: 63660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:27,040-Speed 9831.16 samples/sec   Loss 8.0304   LearningRate 0.0655   Epoch: 3   Global Step: 63670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:28,096-Speed 9707.02 samples/sec   Loss 8.1927   LearningRate 0.0655   Epoch: 3   Global Step: 63680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:29,174-Speed 9506.65 samples/sec   Loss 7.9902   LearningRate 0.0655   Epoch: 3   Global Step: 63690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:30,242-Speed 9593.51 samples/sec   Loss 8.0577   LearningRate 0.0655   Epoch: 3   Global Step: 63700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:31,322-Speed 9483.55 samples/sec   Loss 8.0394   LearningRate 0.0655   Epoch: 3   Global Step: 63710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:32,395-Speed 9558.34 samples/sec   Loss 8.1531   LearningRate 0.0655   Epoch: 3   Global Step: 63720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:33,462-Speed 9599.46 samples/sec   Loss 8.0079   LearningRate 0.0655   Epoch: 3   Global Step: 63730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:34,542-Speed 9487.50 samples/sec   Loss 8.0117   LearningRate 0.0655   Epoch: 3   Global Step: 63740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:35,643-Speed 9304.58 samples/sec   Loss 8.0727   LearningRate 0.0655   Epoch: 3   Global Step: 63750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:36,743-Speed 9309.60 samples/sec   Loss 8.1463   LearningRate 0.0654   Epoch: 3   Global Step: 63760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:37,821-Speed 9503.14 samples/sec   Loss 8.0227   LearningRate 0.0654   Epoch: 3   Global Step: 63770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:38,930-Speed 9245.11 samples/sec   Loss 8.1757   LearningRate 0.0654   Epoch: 3   Global Step: 63780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:40,009-Speed 9496.19 samples/sec   Loss 8.1271   LearningRate 0.0654   Epoch: 3   Global Step: 63790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:41,082-Speed 9547.47 samples/sec   Loss 8.0107   LearningRate 0.0654   Epoch: 3   Global Step: 63800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:05:42,207-Speed 9106.20 samples/sec   Loss 8.1254   LearningRate 0.0654   Epoch: 3   Global Step: 63810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:43,314-Speed 9260.74 samples/sec   Loss 8.0651   LearningRate 0.0654   Epoch: 3   Global Step: 63820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:44,425-Speed 9215.25 samples/sec   Loss 8.0203   LearningRate 0.0654   Epoch: 3   Global Step: 63830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:45,499-Speed 9548.07 samples/sec   Loss 8.0333   LearningRate 0.0654   Epoch: 3   Global Step: 63840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:46,564-Speed 9618.18 samples/sec   Loss 8.1330   LearningRate 0.0654   Epoch: 3   Global Step: 63850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:47,648-Speed 9449.94 samples/sec   Loss 8.0104   LearningRate 0.0654   Epoch: 3   Global Step: 63860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:48,742-Speed 9374.55 samples/sec   Loss 7.9858   LearningRate 0.0654   Epoch: 3   Global Step: 63870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:49,811-Speed 9585.29 samples/sec   Loss 8.1236   LearningRate 0.0654   Epoch: 3   Global Step: 63880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:50,917-Speed 9261.56 samples/sec   Loss 8.0638   LearningRate 0.0654   Epoch: 3   Global Step: 63890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:51,997-Speed 9491.10 samples/sec   Loss 8.0112   LearningRate 0.0654   Epoch: 3   Global Step: 63900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:53,063-Speed 9604.31 samples/sec   Loss 8.0760   LearningRate 0.0654   Epoch: 3   Global Step: 63910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:54,178-Speed 9192.74 samples/sec   Loss 7.9444   LearningRate 0.0654   Epoch: 3   Global Step: 63920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:55,256-Speed 9500.14 samples/sec   Loss 8.1093   LearningRate 0.0654   Epoch: 3   Global Step: 63930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:56,319-Speed 9640.94 samples/sec   Loss 7.9435   LearningRate 0.0654   Epoch: 3   Global Step: 63940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:57,400-Speed 9474.21 samples/sec   Loss 8.0957   LearningRate 0.0654   Epoch: 3   Global Step: 63950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:58,490-Speed 9399.40 samples/sec   Loss 8.0984   LearningRate 0.0654   Epoch: 3   Global Step: 63960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:05:59,557-Speed 9609.32 samples/sec   Loss 8.0063   LearningRate 0.0653   Epoch: 3   Global Step: 63970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:06:00,612-Speed 9712.89 samples/sec   Loss 8.0726   LearningRate 0.0653   Epoch: 3   Global Step: 63980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:06:01,670-Speed 9684.04 samples/sec   Loss 8.0298   LearningRate 0.0653   Epoch: 3   Global Step: 63990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:06:02,735-Speed 9617.46 samples/sec   Loss 7.9546   LearningRate 0.0653   Epoch: 3   Global Step: 64000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:06:24,651-[lfw][64000]XNorm: 12.652423
Training: 2022-04-11 14:06:24,652-[lfw][64000]Accuracy-Flip: 0.99567+-0.00200
Training: 2022-04-11 14:06:24,652-[lfw][64000]Accuracy-Highest: 0.99583
Training: 2022-04-11 14:06:50,007-[cfp_fp][64000]XNorm: 10.509308
Training: 2022-04-11 14:06:50,008-[cfp_fp][64000]Accuracy-Flip: 0.94657+-0.01031
Training: 2022-04-11 14:06:50,008-[cfp_fp][64000]Accuracy-Highest: 0.95157
Training: 2022-04-11 14:07:11,890-[agedb_30][64000]XNorm: 12.163206
Training: 2022-04-11 14:07:11,891-[agedb_30][64000]Accuracy-Flip: 0.95833+-0.01121
Training: 2022-04-11 14:07:11,891-[agedb_30][64000]Accuracy-Highest: 0.95833
Training: 2022-04-11 14:07:12,945-Speed 145.85 samples/sec   Loss 8.1283   LearningRate 0.0653   Epoch: 3   Global Step: 64010   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:07:14,001-Speed 9701.68 samples/sec   Loss 8.0709   LearningRate 0.0653   Epoch: 3   Global Step: 64020   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:07:15,087-Speed 9435.80 samples/sec   Loss 8.0700   LearningRate 0.0653   Epoch: 3   Global Step: 64030   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:07:16,151-Speed 9634.82 samples/sec   Loss 8.0137   LearningRate 0.0653   Epoch: 3   Global Step: 64040   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:07:17,229-Speed 9503.94 samples/sec   Loss 8.0842   LearningRate 0.0653   Epoch: 3   Global Step: 64050   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:07:18,309-Speed 9488.26 samples/sec   Loss 7.9888   LearningRate 0.0653   Epoch: 3   Global Step: 64060   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:07:19,362-Speed 9733.24 samples/sec   Loss 8.0175   LearningRate 0.0653   Epoch: 3   Global Step: 64070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:20,413-Speed 9746.16 samples/sec   Loss 8.0968   LearningRate 0.0653   Epoch: 3   Global Step: 64080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:21,472-Speed 9678.33 samples/sec   Loss 7.9254   LearningRate 0.0653   Epoch: 3   Global Step: 64090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:22,558-Speed 9435.85 samples/sec   Loss 8.0795   LearningRate 0.0653   Epoch: 3   Global Step: 64100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:23,658-Speed 9315.37 samples/sec   Loss 8.0666   LearningRate 0.0653   Epoch: 3   Global Step: 64110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:24,758-Speed 9311.48 samples/sec   Loss 7.9737   LearningRate 0.0653   Epoch: 3   Global Step: 64120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:25,819-Speed 9652.08 samples/sec   Loss 8.0760   LearningRate 0.0653   Epoch: 3   Global Step: 64130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:26,927-Speed 9244.47 samples/sec   Loss 8.0761   LearningRate 0.0653   Epoch: 3   Global Step: 64140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:28,027-Speed 9320.52 samples/sec   Loss 8.1723   LearningRate 0.0653   Epoch: 3   Global Step: 64150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:29,150-Speed 9123.80 samples/sec   Loss 8.0231   LearningRate 0.0653   Epoch: 3   Global Step: 64160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:30,246-Speed 9341.62 samples/sec   Loss 8.0031   LearningRate 0.0652   Epoch: 3   Global Step: 64170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:31,355-Speed 9244.40 samples/sec   Loss 7.9814   LearningRate 0.0652   Epoch: 3   Global Step: 64180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:32,437-Speed 9465.14 samples/sec   Loss 8.0609   LearningRate 0.0652   Epoch: 3   Global Step: 64190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:33,552-Speed 9194.62 samples/sec   Loss 7.9305   LearningRate 0.0652   Epoch: 3   Global Step: 64200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:34,694-Speed 8970.68 samples/sec   Loss 7.9421   LearningRate 0.0652   Epoch: 3   Global Step: 64210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:07:35,791-Speed 9334.39 samples/sec   Loss 8.1242   LearningRate 0.0652   Epoch: 3   Global Step: 64220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:07:36,853-Speed 9650.34 samples/sec   Loss 7.8784   LearningRate 0.0652   Epoch: 3   Global Step: 64230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:07:37,896-Speed 9831.23 samples/sec   Loss 8.0598   LearningRate 0.0652   Epoch: 3   Global Step: 64240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:07:38,995-Speed 9327.77 samples/sec   Loss 7.9615   LearningRate 0.0652   Epoch: 3   Global Step: 64250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:07:40,069-Speed 9538.23 samples/sec   Loss 8.1566   LearningRate 0.0652   Epoch: 3   Global Step: 64260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:07:41,157-Speed 9419.78 samples/sec   Loss 8.0671   LearningRate 0.0652   Epoch: 3   Global Step: 64270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:07:42,224-Speed 9603.29 samples/sec   Loss 8.0551   LearningRate 0.0652   Epoch: 3   Global Step: 64280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:07:43,286-Speed 9642.16 samples/sec   Loss 8.1008   LearningRate 0.0652   Epoch: 3   Global Step: 64290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:07:44,358-Speed 9562.02 samples/sec   Loss 8.0470   LearningRate 0.0652   Epoch: 3   Global Step: 64300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:07:45,428-Speed 9574.53 samples/sec   Loss 8.0667   LearningRate 0.0652   Epoch: 3   Global Step: 64310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:46,483-Speed 9711.07 samples/sec   Loss 7.8928   LearningRate 0.0652   Epoch: 3   Global Step: 64320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:47,574-Speed 9388.13 samples/sec   Loss 8.0775   LearningRate 0.0652   Epoch: 3   Global Step: 64330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:48,638-Speed 9633.24 samples/sec   Loss 8.0966   LearningRate 0.0652   Epoch: 3   Global Step: 64340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:49,709-Speed 9564.79 samples/sec   Loss 7.9076   LearningRate 0.0652   Epoch: 3   Global Step: 64350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:50,776-Speed 9599.96 samples/sec   Loss 7.9930   LearningRate 0.0652   Epoch: 3   Global Step: 64360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:51,863-Speed 9433.07 samples/sec   Loss 7.9832   LearningRate 0.0652   Epoch: 3   Global Step: 64370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:52,920-Speed 9693.72 samples/sec   Loss 8.0771   LearningRate 0.0651   Epoch: 3   Global Step: 64380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:54,012-Speed 9376.04 samples/sec   Loss 8.0375   LearningRate 0.0651   Epoch: 3   Global Step: 64390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:55,114-Speed 9296.34 samples/sec   Loss 8.2211   LearningRate 0.0651   Epoch: 3   Global Step: 64400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:56,164-Speed 9764.24 samples/sec   Loss 8.0199   LearningRate 0.0651   Epoch: 3   Global Step: 64410   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:07:57,265-Speed 9308.06 samples/sec   Loss 8.1384   LearningRate 0.0651   Epoch: 3   Global Step: 64420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:58,349-Speed 9450.72 samples/sec   Loss 8.0184   LearningRate 0.0651   Epoch: 3   Global Step: 64430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:07:59,444-Speed 9358.11 samples/sec   Loss 8.0406   LearningRate 0.0651   Epoch: 3   Global Step: 64440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:00,507-Speed 9636.07 samples/sec   Loss 8.0393   LearningRate 0.0651   Epoch: 3   Global Step: 64450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:01,574-Speed 9600.38 samples/sec   Loss 8.0763   LearningRate 0.0651   Epoch: 3   Global Step: 64460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:02,668-Speed 9373.60 samples/sec   Loss 8.1346   LearningRate 0.0651   Epoch: 3   Global Step: 64470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:03,761-Speed 9373.90 samples/sec   Loss 8.1400   LearningRate 0.0651   Epoch: 3   Global Step: 64480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:04,847-Speed 9433.28 samples/sec   Loss 8.0697   LearningRate 0.0651   Epoch: 3   Global Step: 64490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:05,914-Speed 9605.19 samples/sec   Loss 8.0080   LearningRate 0.0651   Epoch: 3   Global Step: 64500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:06,967-Speed 9730.65 samples/sec   Loss 8.0913   LearningRate 0.0651   Epoch: 3   Global Step: 64510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:08,031-Speed 9633.26 samples/sec   Loss 8.0722   LearningRate 0.0651   Epoch: 3   Global Step: 64520   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:09,075-Speed 9811.46 samples/sec   Loss 8.0900   LearningRate 0.0651   Epoch: 3   Global Step: 64530   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:10,131-Speed 9696.20 samples/sec   Loss 8.0887   LearningRate 0.0651   Epoch: 3   Global Step: 64540   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:11,185-Speed 9728.43 samples/sec   Loss 7.9207   LearningRate 0.0651   Epoch: 3   Global Step: 64550   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:12,283-Speed 9325.90 samples/sec   Loss 8.0005   LearningRate 0.0651   Epoch: 3   Global Step: 64560   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:13,392-Speed 9241.53 samples/sec   Loss 8.0316   LearningRate 0.0651   Epoch: 3   Global Step: 64570   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:14,482-Speed 9400.37 samples/sec   Loss 8.1117   LearningRate 0.0651   Epoch: 3   Global Step: 64580   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:15,572-Speed 9402.74 samples/sec   Loss 8.0027   LearningRate 0.0650   Epoch: 3   Global Step: 64590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:16,683-Speed 9222.06 samples/sec   Loss 8.1698   LearningRate 0.0650   Epoch: 3   Global Step: 64600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:17,745-Speed 9651.64 samples/sec   Loss 8.0964   LearningRate 0.0650   Epoch: 3   Global Step: 64610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:18,830-Speed 9439.01 samples/sec   Loss 8.0425   LearningRate 0.0650   Epoch: 3   Global Step: 64620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:19,883-Speed 9728.81 samples/sec   Loss 8.0114   LearningRate 0.0650   Epoch: 3   Global Step: 64630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:20,935-Speed 9737.58 samples/sec   Loss 8.1107   LearningRate 0.0650   Epoch: 3   Global Step: 64640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:22,037-Speed 9302.57 samples/sec   Loss 7.9504   LearningRate 0.0650   Epoch: 3   Global Step: 64650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:23,118-Speed 9472.39 samples/sec   Loss 8.0223   LearningRate 0.0650   Epoch: 3   Global Step: 64660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:24,215-Speed 9341.76 samples/sec   Loss 8.0320   LearningRate 0.0650   Epoch: 3   Global Step: 64670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:25,309-Speed 9366.30 samples/sec   Loss 8.0781   LearningRate 0.0650   Epoch: 3   Global Step: 64680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:26,390-Speed 9484.10 samples/sec   Loss 8.1532   LearningRate 0.0650   Epoch: 3   Global Step: 64690   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:27,444-Speed 9718.98 samples/sec   Loss 8.0808   LearningRate 0.0650   Epoch: 3   Global Step: 64700   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:28,480-Speed 9900.03 samples/sec   Loss 8.0937   LearningRate 0.0650   Epoch: 3   Global Step: 64710   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:29,563-Speed 9459.24 samples/sec   Loss 7.9867   LearningRate 0.0650   Epoch: 3   Global Step: 64720   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:30,633-Speed 9581.74 samples/sec   Loss 7.9476   LearningRate 0.0650   Epoch: 3   Global Step: 64730   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:31,689-Speed 9701.51 samples/sec   Loss 8.1415   LearningRate 0.0650   Epoch: 3   Global Step: 64740   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:32,792-Speed 9282.96 samples/sec   Loss 8.1479   LearningRate 0.0650   Epoch: 3   Global Step: 64750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:33,863-Speed 9570.70 samples/sec   Loss 8.0216   LearningRate 0.0650   Epoch: 3   Global Step: 64760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:34,939-Speed 9528.11 samples/sec   Loss 8.0579   LearningRate 0.0650   Epoch: 3   Global Step: 64770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:36,000-Speed 9650.69 samples/sec   Loss 8.0303   LearningRate 0.0650   Epoch: 3   Global Step: 64780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:37,074-Speed 9546.65 samples/sec   Loss 8.1199   LearningRate 0.0649   Epoch: 3   Global Step: 64790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:38,150-Speed 9526.43 samples/sec   Loss 8.0903   LearningRate 0.0649   Epoch: 3   Global Step: 64800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:39,238-Speed 9408.19 samples/sec   Loss 8.0006   LearningRate 0.0649   Epoch: 3   Global Step: 64810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:40,324-Speed 9435.37 samples/sec   Loss 8.0429   LearningRate 0.0649   Epoch: 3   Global Step: 64820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:41,439-Speed 9194.17 samples/sec   Loss 8.0814   LearningRate 0.0649   Epoch: 3   Global Step: 64830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:42,540-Speed 9302.86 samples/sec   Loss 8.0427   LearningRate 0.0649   Epoch: 3   Global Step: 64840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:43,594-Speed 9720.83 samples/sec   Loss 8.1470   LearningRate 0.0649   Epoch: 3   Global Step: 64850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:44,651-Speed 9693.05 samples/sec   Loss 8.1010   LearningRate 0.0649   Epoch: 3   Global Step: 64860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:45,693-Speed 9832.75 samples/sec   Loss 8.0117   LearningRate 0.0649   Epoch: 3   Global Step: 64870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:46,750-Speed 9696.88 samples/sec   Loss 8.0178   LearningRate 0.0649   Epoch: 3   Global Step: 64880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:47,824-Speed 9542.43 samples/sec   Loss 8.0564   LearningRate 0.0649   Epoch: 3   Global Step: 64890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:48,891-Speed 9596.57 samples/sec   Loss 7.9556   LearningRate 0.0649   Epoch: 3   Global Step: 64900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:49,943-Speed 9740.15 samples/sec   Loss 8.0510   LearningRate 0.0649   Epoch: 3   Global Step: 64910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:51,028-Speed 9446.59 samples/sec   Loss 8.0956   LearningRate 0.0649   Epoch: 3   Global Step: 64920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:52,137-Speed 9234.02 samples/sec   Loss 7.9155   LearningRate 0.0649   Epoch: 3   Global Step: 64930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:53,247-Speed 9240.72 samples/sec   Loss 8.0429   LearningRate 0.0649   Epoch: 3   Global Step: 64940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:08:54,306-Speed 9670.38 samples/sec   Loss 7.9893   LearningRate 0.0649   Epoch: 3   Global Step: 64950   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:55,384-Speed 9506.99 samples/sec   Loss 8.1080   LearningRate 0.0649   Epoch: 3   Global Step: 64960   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:56,449-Speed 9621.87 samples/sec   Loss 8.0911   LearningRate 0.0649   Epoch: 3   Global Step: 64970   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:08:57,527-Speed 9502.42 samples/sec   Loss 8.0179   LearningRate 0.0649   Epoch: 3   Global Step: 64980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:08:58,610-Speed 9465.49 samples/sec   Loss 7.9756   LearningRate 0.0649   Epoch: 3   Global Step: 64990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:08:59,652-Speed 9836.82 samples/sec   Loss 8.0422   LearningRate 0.0648   Epoch: 3   Global Step: 65000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:00,725-Speed 9553.77 samples/sec   Loss 7.9897   LearningRate 0.0648   Epoch: 3   Global Step: 65010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:01,797-Speed 9561.71 samples/sec   Loss 8.0830   LearningRate 0.0648   Epoch: 3   Global Step: 65020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:02,898-Speed 9303.56 samples/sec   Loss 8.0507   LearningRate 0.0648   Epoch: 3   Global Step: 65030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:03,981-Speed 9458.66 samples/sec   Loss 8.0073   LearningRate 0.0648   Epoch: 3   Global Step: 65040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:05,071-Speed 9399.74 samples/sec   Loss 8.1053   LearningRate 0.0648   Epoch: 3   Global Step: 65050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:06,205-Speed 9035.36 samples/sec   Loss 7.9310   LearningRate 0.0648   Epoch: 3   Global Step: 65060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:07,319-Speed 9197.31 samples/sec   Loss 7.9385   LearningRate 0.0648   Epoch: 3   Global Step: 65070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:08,381-Speed 9653.11 samples/sec   Loss 7.8984   LearningRate 0.0648   Epoch: 3   Global Step: 65080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:09,460-Speed 9492.20 samples/sec   Loss 8.0434   LearningRate 0.0648   Epoch: 3   Global Step: 65090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:10,556-Speed 9346.63 samples/sec   Loss 7.9382   LearningRate 0.0648   Epoch: 3   Global Step: 65100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:11,677-Speed 9140.91 samples/sec   Loss 7.9864   LearningRate 0.0648   Epoch: 3   Global Step: 65110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:12,784-Speed 9254.87 samples/sec   Loss 7.9311   LearningRate 0.0648   Epoch: 3   Global Step: 65120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:13,852-Speed 9602.00 samples/sec   Loss 8.0205   LearningRate 0.0648   Epoch: 3   Global Step: 65130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:14,892-Speed 9851.29 samples/sec   Loss 7.9877   LearningRate 0.0648   Epoch: 3   Global Step: 65140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:16,015-Speed 9118.64 samples/sec   Loss 8.1050   LearningRate 0.0648   Epoch: 3   Global Step: 65150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:17,105-Speed 9402.86 samples/sec   Loss 7.9897   LearningRate 0.0648   Epoch: 3   Global Step: 65160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:18,155-Speed 9759.28 samples/sec   Loss 8.0740   LearningRate 0.0648   Epoch: 3   Global Step: 65170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:19,258-Speed 9289.49 samples/sec   Loss 7.9487   LearningRate 0.0648   Epoch: 3   Global Step: 65180   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:09:20,327-Speed 9581.58 samples/sec   Loss 7.9779   LearningRate 0.0648   Epoch: 3   Global Step: 65190   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:09:21,412-Speed 9447.90 samples/sec   Loss 8.0666   LearningRate 0.0648   Epoch: 3   Global Step: 65200   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:09:22,527-Speed 9187.18 samples/sec   Loss 8.1139   LearningRate 0.0647   Epoch: 3   Global Step: 65210   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:09:23,601-Speed 9533.05 samples/sec   Loss 7.9022   LearningRate 0.0647   Epoch: 3   Global Step: 65220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:24,644-Speed 9829.48 samples/sec   Loss 7.9829   LearningRate 0.0647   Epoch: 3   Global Step: 65230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:25,758-Speed 9192.27 samples/sec   Loss 8.1018   LearningRate 0.0647   Epoch: 3   Global Step: 65240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:26,840-Speed 9478.16 samples/sec   Loss 8.0775   LearningRate 0.0647   Epoch: 3   Global Step: 65250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:27,914-Speed 9534.98 samples/sec   Loss 8.0466   LearningRate 0.0647   Epoch: 3   Global Step: 65260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:28,954-Speed 9851.22 samples/sec   Loss 8.0221   LearningRate 0.0647   Epoch: 3   Global Step: 65270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:30,012-Speed 9687.09 samples/sec   Loss 7.9893   LearningRate 0.0647   Epoch: 3   Global Step: 65280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:31,049-Speed 9884.51 samples/sec   Loss 8.0942   LearningRate 0.0647   Epoch: 3   Global Step: 65290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:32,114-Speed 9614.82 samples/sec   Loss 8.0797   LearningRate 0.0647   Epoch: 3   Global Step: 65300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:33,231-Speed 9176.64 samples/sec   Loss 8.0160   LearningRate 0.0647   Epoch: 3   Global Step: 65310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:34,297-Speed 9617.60 samples/sec   Loss 8.0231   LearningRate 0.0647   Epoch: 3   Global Step: 65320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:35,371-Speed 9537.04 samples/sec   Loss 7.9912   LearningRate 0.0647   Epoch: 3   Global Step: 65330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:36,414-Speed 9820.12 samples/sec   Loss 8.1156   LearningRate 0.0647   Epoch: 3   Global Step: 65340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:37,453-Speed 9867.31 samples/sec   Loss 8.1842   LearningRate 0.0647   Epoch: 3   Global Step: 65350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:38,543-Speed 9405.22 samples/sec   Loss 7.8719   LearningRate 0.0647   Epoch: 3   Global Step: 65360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:39,638-Speed 9349.82 samples/sec   Loss 7.9718   LearningRate 0.0647   Epoch: 3   Global Step: 65370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:40,736-Speed 9331.38 samples/sec   Loss 7.9527   LearningRate 0.0647   Epoch: 3   Global Step: 65380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:41,827-Speed 9396.74 samples/sec   Loss 8.1064   LearningRate 0.0647   Epoch: 3   Global Step: 65390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:42,916-Speed 9405.48 samples/sec   Loss 7.9215   LearningRate 0.0647   Epoch: 3   Global Step: 65400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:44,003-Speed 9431.29 samples/sec   Loss 7.9619   LearningRate 0.0647   Epoch: 3   Global Step: 65410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:45,055-Speed 9739.87 samples/sec   Loss 7.9937   LearningRate 0.0646   Epoch: 3   Global Step: 65420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:46,149-Speed 9362.02 samples/sec   Loss 8.0421   LearningRate 0.0646   Epoch: 3   Global Step: 65430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:47,243-Speed 9366.90 samples/sec   Loss 8.0044   LearningRate 0.0646   Epoch: 3   Global Step: 65440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:09:48,322-Speed 9490.83 samples/sec   Loss 7.9631   LearningRate 0.0646   Epoch: 3   Global Step: 65450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:49,443-Speed 9141.65 samples/sec   Loss 8.0425   LearningRate 0.0646   Epoch: 3   Global Step: 65460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:50,565-Speed 9131.83 samples/sec   Loss 8.1015   LearningRate 0.0646   Epoch: 3   Global Step: 65470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:51,615-Speed 9759.67 samples/sec   Loss 8.1062   LearningRate 0.0646   Epoch: 3   Global Step: 65480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:52,683-Speed 9592.92 samples/sec   Loss 8.0420   LearningRate 0.0646   Epoch: 3   Global Step: 65490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:53,772-Speed 9409.26 samples/sec   Loss 7.8611   LearningRate 0.0646   Epoch: 3   Global Step: 65500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:54,867-Speed 9364.57 samples/sec   Loss 8.0058   LearningRate 0.0646   Epoch: 3   Global Step: 65510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:55,912-Speed 9796.43 samples/sec   Loss 7.9199   LearningRate 0.0646   Epoch: 3   Global Step: 65520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:56,982-Speed 9582.82 samples/sec   Loss 8.0225   LearningRate 0.0646   Epoch: 3   Global Step: 65530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:58,049-Speed 9601.51 samples/sec   Loss 7.9812   LearningRate 0.0646   Epoch: 3   Global Step: 65540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:09:59,098-Speed 9770.65 samples/sec   Loss 8.0134   LearningRate 0.0646   Epoch: 3   Global Step: 65550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:00,190-Speed 9377.83 samples/sec   Loss 7.9252   LearningRate 0.0646   Epoch: 3   Global Step: 65560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:01,259-Speed 9588.04 samples/sec   Loss 7.9848   LearningRate 0.0646   Epoch: 3   Global Step: 65570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:02,364-Speed 9275.06 samples/sec   Loss 8.0447   LearningRate 0.0646   Epoch: 3   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:03,428-Speed 9622.75 samples/sec   Loss 7.9897   LearningRate 0.0646   Epoch: 3   Global Step: 65590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:04,472-Speed 9812.79 samples/sec   Loss 8.0237   LearningRate 0.0646   Epoch: 3   Global Step: 65600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:05,553-Speed 9482.08 samples/sec   Loss 8.0374   LearningRate 0.0646   Epoch: 3   Global Step: 65610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:06,655-Speed 9296.64 samples/sec   Loss 8.0417   LearningRate 0.0645   Epoch: 3   Global Step: 65620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:07,703-Speed 9779.86 samples/sec   Loss 8.0344   LearningRate 0.0645   Epoch: 3   Global Step: 65630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:08,751-Speed 9778.16 samples/sec   Loss 7.9577   LearningRate 0.0645   Epoch: 3   Global Step: 65640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:09,856-Speed 9273.79 samples/sec   Loss 8.0463   LearningRate 0.0645   Epoch: 3   Global Step: 65650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:10,928-Speed 9567.11 samples/sec   Loss 8.0432   LearningRate 0.0645   Epoch: 3   Global Step: 65660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:11,998-Speed 9573.60 samples/sec   Loss 8.0637   LearningRate 0.0645   Epoch: 3   Global Step: 65670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:13,066-Speed 9589.17 samples/sec   Loss 7.9667   LearningRate 0.0645   Epoch: 3   Global Step: 65680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:14,129-Speed 9648.00 samples/sec   Loss 8.1192   LearningRate 0.0645   Epoch: 3   Global Step: 65690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:15,241-Speed 9217.76 samples/sec   Loss 8.0885   LearningRate 0.0645   Epoch: 3   Global Step: 65700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:16,297-Speed 9701.51 samples/sec   Loss 8.0942   LearningRate 0.0645   Epoch: 3   Global Step: 65710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:17,343-Speed 9791.93 samples/sec   Loss 8.0800   LearningRate 0.0645   Epoch: 3   Global Step: 65720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:18,417-Speed 9539.86 samples/sec   Loss 8.0560   LearningRate 0.0645   Epoch: 3   Global Step: 65730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:19,528-Speed 9218.76 samples/sec   Loss 8.1357   LearningRate 0.0645   Epoch: 3   Global Step: 65740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:20,573-Speed 9808.78 samples/sec   Loss 7.9669   LearningRate 0.0645   Epoch: 3   Global Step: 65750   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:10:21,660-Speed 9422.97 samples/sec   Loss 7.9796   LearningRate 0.0645   Epoch: 3   Global Step: 65760   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:10:22,744-Speed 9449.96 samples/sec   Loss 7.9333   LearningRate 0.0645   Epoch: 3   Global Step: 65770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:23,870-Speed 9104.26 samples/sec   Loss 8.0033   LearningRate 0.0645   Epoch: 3   Global Step: 65780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:24,950-Speed 9480.14 samples/sec   Loss 7.9725   LearningRate 0.0645   Epoch: 3   Global Step: 65790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:25,979-Speed 9956.96 samples/sec   Loss 8.0022   LearningRate 0.0645   Epoch: 3   Global Step: 65800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:27,038-Speed 9682.47 samples/sec   Loss 7.9188   LearningRate 0.0645   Epoch: 3   Global Step: 65810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:28,136-Speed 9326.94 samples/sec   Loss 8.0085   LearningRate 0.0645   Epoch: 3   Global Step: 65820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:29,226-Speed 9404.14 samples/sec   Loss 7.9952   LearningRate 0.0644   Epoch: 3   Global Step: 65830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:30,319-Speed 9375.85 samples/sec   Loss 8.1596   LearningRate 0.0644   Epoch: 3   Global Step: 65840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:31,394-Speed 9532.70 samples/sec   Loss 8.1137   LearningRate 0.0644   Epoch: 3   Global Step: 65850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:32,479-Speed 9444.87 samples/sec   Loss 8.0761   LearningRate 0.0644   Epoch: 3   Global Step: 65860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:33,550-Speed 9568.61 samples/sec   Loss 7.9996   LearningRate 0.0644   Epoch: 3   Global Step: 65870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:34,658-Speed 9245.20 samples/sec   Loss 8.0205   LearningRate 0.0644   Epoch: 3   Global Step: 65880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:35,763-Speed 9270.01 samples/sec   Loss 8.0969   LearningRate 0.0644   Epoch: 3   Global Step: 65890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:10:36,840-Speed 9512.48 samples/sec   Loss 7.9807   LearningRate 0.0644   Epoch: 3   Global Step: 65900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:37,914-Speed 9545.11 samples/sec   Loss 7.9965   LearningRate 0.0644   Epoch: 3   Global Step: 65910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:38,978-Speed 9622.91 samples/sec   Loss 7.9219   LearningRate 0.0644   Epoch: 3   Global Step: 65920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:40,040-Speed 9654.84 samples/sec   Loss 7.9535   LearningRate 0.0644   Epoch: 3   Global Step: 65930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:41,101-Speed 9652.48 samples/sec   Loss 7.9754   LearningRate 0.0644   Epoch: 3   Global Step: 65940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:42,175-Speed 9540.18 samples/sec   Loss 8.0615   LearningRate 0.0644   Epoch: 3   Global Step: 65950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:43,266-Speed 9394.28 samples/sec   Loss 7.8430   LearningRate 0.0644   Epoch: 3   Global Step: 65960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:44,354-Speed 9419.56 samples/sec   Loss 8.0604   LearningRate 0.0644   Epoch: 3   Global Step: 65970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:45,421-Speed 9603.78 samples/sec   Loss 8.0138   LearningRate 0.0644   Epoch: 3   Global Step: 65980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:46,546-Speed 9101.94 samples/sec   Loss 8.0194   LearningRate 0.0644   Epoch: 3   Global Step: 65990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:10:47,642-Speed 9357.12 samples/sec   Loss 7.9265   LearningRate 0.0644   Epoch: 3   Global Step: 66000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:11:09,384-[lfw][66000]XNorm: 12.267007
Training: 2022-04-11 14:11:09,385-[lfw][66000]Accuracy-Flip: 0.99517+-0.00302
Training: 2022-04-11 14:11:09,385-[lfw][66000]Accuracy-Highest: 0.99583
Training: 2022-04-11 14:11:34,523-[cfp_fp][66000]XNorm: 10.425723
Training: 2022-04-11 14:11:34,523-[cfp_fp][66000]Accuracy-Flip: 0.95171+-0.01142
Training: 2022-04-11 14:11:34,524-[cfp_fp][66000]Accuracy-Highest: 0.95171
Training: 2022-04-11 14:11:56,238-[agedb_30][66000]XNorm: 11.943447
Training: 2022-04-11 14:11:56,239-[agedb_30][66000]Accuracy-Flip: 0.96033+-0.01130
Training: 2022-04-11 14:11:56,239-[agedb_30][66000]Accuracy-Highest: 0.96033
Training: 2022-04-11 14:11:57,329-Speed 146.94 samples/sec   Loss 8.0687   LearningRate 0.0644   Epoch: 3   Global Step: 66010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:11:58,406-Speed 9510.36 samples/sec   Loss 8.0460   LearningRate 0.0644   Epoch: 3   Global Step: 66020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:11:59,522-Speed 9184.75 samples/sec   Loss 8.0151   LearningRate 0.0644   Epoch: 3   Global Step: 66030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:00,624-Speed 9292.57 samples/sec   Loss 7.9699   LearningRate 0.0643   Epoch: 3   Global Step: 66040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:01,717-Speed 9377.69 samples/sec   Loss 7.9313   LearningRate 0.0643   Epoch: 3   Global Step: 66050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:02,779-Speed 9647.95 samples/sec   Loss 8.0798   LearningRate 0.0643   Epoch: 3   Global Step: 66060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:03,818-Speed 9865.92 samples/sec   Loss 8.0049   LearningRate 0.0643   Epoch: 3   Global Step: 66070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:04,901-Speed 9461.22 samples/sec   Loss 8.1133   LearningRate 0.0643   Epoch: 3   Global Step: 66080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:05,970-Speed 9581.56 samples/sec   Loss 7.9738   LearningRate 0.0643   Epoch: 3   Global Step: 66090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:07,070-Speed 9314.56 samples/sec   Loss 8.1078   LearningRate 0.0643   Epoch: 3   Global Step: 66100   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:12:08,120-Speed 9766.04 samples/sec   Loss 7.9608   LearningRate 0.0643   Epoch: 3   Global Step: 66110   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:12:09,221-Speed 9299.68 samples/sec   Loss 7.9501   LearningRate 0.0643   Epoch: 3   Global Step: 66120   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:12:10,286-Speed 9626.65 samples/sec   Loss 7.9491   LearningRate 0.0643   Epoch: 3   Global Step: 66130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:11,335-Speed 9767.52 samples/sec   Loss 8.0248   LearningRate 0.0643   Epoch: 3   Global Step: 66140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:12,424-Speed 9406.48 samples/sec   Loss 8.0469   LearningRate 0.0643   Epoch: 3   Global Step: 66150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:13,524-Speed 9312.57 samples/sec   Loss 7.8567   LearningRate 0.0643   Epoch: 3   Global Step: 66160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:14,571-Speed 9787.90 samples/sec   Loss 7.9600   LearningRate 0.0643   Epoch: 3   Global Step: 66170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:15,656-Speed 9448.47 samples/sec   Loss 8.0852   LearningRate 0.0643   Epoch: 3   Global Step: 66180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:16,739-Speed 9457.41 samples/sec   Loss 7.9802   LearningRate 0.0643   Epoch: 3   Global Step: 66190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:17,796-Speed 9691.78 samples/sec   Loss 7.8552   LearningRate 0.0643   Epoch: 3   Global Step: 66200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:18,857-Speed 9664.42 samples/sec   Loss 7.9142   LearningRate 0.0643   Epoch: 3   Global Step: 66210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:19,942-Speed 9437.62 samples/sec   Loss 8.0451   LearningRate 0.0643   Epoch: 3   Global Step: 66220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:20,995-Speed 9735.25 samples/sec   Loss 7.9973   LearningRate 0.0643   Epoch: 3   Global Step: 66230   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:12:22,103-Speed 9246.55 samples/sec   Loss 8.0846   LearningRate 0.0643   Epoch: 3   Global Step: 66240   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:12:23,191-Speed 9412.03 samples/sec   Loss 7.9105   LearningRate 0.0642   Epoch: 3   Global Step: 66250   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:12:24,258-Speed 9610.54 samples/sec   Loss 7.8883   LearningRate 0.0642   Epoch: 3   Global Step: 66260   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:12:25,349-Speed 9384.62 samples/sec   Loss 7.8899   LearningRate 0.0642   Epoch: 3   Global Step: 66270   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:12:26,387-Speed 9875.32 samples/sec   Loss 8.0198   LearningRate 0.0642   Epoch: 3   Global Step: 66280   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:12:27,464-Speed 9515.41 samples/sec   Loss 8.0110   LearningRate 0.0642   Epoch: 3   Global Step: 66290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:28,554-Speed 9398.50 samples/sec   Loss 7.9022   LearningRate 0.0642   Epoch: 3   Global Step: 66300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:29,701-Speed 8935.45 samples/sec   Loss 8.0031   LearningRate 0.0642   Epoch: 3   Global Step: 66310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:30,802-Speed 9303.36 samples/sec   Loss 7.9696   LearningRate 0.0642   Epoch: 3   Global Step: 66320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:31,951-Speed 8914.56 samples/sec   Loss 8.0097   LearningRate 0.0642   Epoch: 3   Global Step: 66330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:33,029-Speed 9505.55 samples/sec   Loss 7.9180   LearningRate 0.0642   Epoch: 3   Global Step: 66340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:34,123-Speed 9369.78 samples/sec   Loss 7.9581   LearningRate 0.0642   Epoch: 3   Global Step: 66350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:35,175-Speed 9740.50 samples/sec   Loss 8.0000   LearningRate 0.0642   Epoch: 3   Global Step: 66360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:36,270-Speed 9355.14 samples/sec   Loss 7.9176   LearningRate 0.0642   Epoch: 3   Global Step: 66370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:37,372-Speed 9298.01 samples/sec   Loss 8.0473   LearningRate 0.0642   Epoch: 3   Global Step: 66380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:38,437-Speed 9622.50 samples/sec   Loss 8.0029   LearningRate 0.0642   Epoch: 3   Global Step: 66390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:39,488-Speed 9750.47 samples/sec   Loss 7.8504   LearningRate 0.0642   Epoch: 3   Global Step: 66400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:40,584-Speed 9345.98 samples/sec   Loss 8.0468   LearningRate 0.0642   Epoch: 3   Global Step: 66410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:41,694-Speed 9235.33 samples/sec   Loss 7.9567   LearningRate 0.0642   Epoch: 3   Global Step: 66420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:42,771-Speed 9506.37 samples/sec   Loss 8.0576   LearningRate 0.0642   Epoch: 3   Global Step: 66430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:43,884-Speed 9211.56 samples/sec   Loss 7.9668   LearningRate 0.0642   Epoch: 3   Global Step: 66440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:44,989-Speed 9273.10 samples/sec   Loss 7.9933   LearningRate 0.0642   Epoch: 3   Global Step: 66450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:46,089-Speed 9312.29 samples/sec   Loss 8.0357   LearningRate 0.0641   Epoch: 3   Global Step: 66460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:47,170-Speed 9474.76 samples/sec   Loss 7.9581   LearningRate 0.0641   Epoch: 3   Global Step: 66470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:48,203-Speed 9923.65 samples/sec   Loss 8.0215   LearningRate 0.0641   Epoch: 3   Global Step: 66480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:49,262-Speed 9676.35 samples/sec   Loss 7.9720   LearningRate 0.0641   Epoch: 3   Global Step: 66490   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:12:50,314-Speed 9733.24 samples/sec   Loss 7.9653   LearningRate 0.0641   Epoch: 3   Global Step: 66500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:51,365-Speed 9750.17 samples/sec   Loss 7.9464   LearningRate 0.0641   Epoch: 3   Global Step: 66510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:52,431-Speed 9615.49 samples/sec   Loss 8.0044   LearningRate 0.0641   Epoch: 3   Global Step: 66520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:53,502-Speed 9569.03 samples/sec   Loss 7.9836   LearningRate 0.0641   Epoch: 3   Global Step: 66530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:12:54,577-Speed 9532.06 samples/sec   Loss 7.9938   LearningRate 0.0641   Epoch: 3   Global Step: 66540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:12:55,662-Speed 9438.34 samples/sec   Loss 7.9005   LearningRate 0.0641   Epoch: 3   Global Step: 66550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:12:56,721-Speed 9676.88 samples/sec   Loss 7.9611   LearningRate 0.0641   Epoch: 3   Global Step: 66560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:12:57,824-Speed 9296.24 samples/sec   Loss 7.9384   LearningRate 0.0641   Epoch: 3   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:12:58,891-Speed 9602.14 samples/sec   Loss 7.9913   LearningRate 0.0641   Epoch: 3   Global Step: 66580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:12:59,926-Speed 9894.57 samples/sec   Loss 7.9702   LearningRate 0.0641   Epoch: 3   Global Step: 66590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:13:01,003-Speed 9513.01 samples/sec   Loss 8.1107   LearningRate 0.0641   Epoch: 3   Global Step: 66600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:13:02,094-Speed 9393.28 samples/sec   Loss 7.9806   LearningRate 0.0641   Epoch: 3   Global Step: 66610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:13:03,155-Speed 9657.22 samples/sec   Loss 8.0496   LearningRate 0.0641   Epoch: 3   Global Step: 66620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:13:04,239-Speed 9451.47 samples/sec   Loss 7.9840   LearningRate 0.0641   Epoch: 3   Global Step: 66630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:13:05,363-Speed 9109.99 samples/sec   Loss 8.1048   LearningRate 0.0641   Epoch: 3   Global Step: 66640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:06,431-Speed 9599.31 samples/sec   Loss 7.9175   LearningRate 0.0641   Epoch: 3   Global Step: 66650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:07,543-Speed 9215.41 samples/sec   Loss 7.9790   LearningRate 0.0640   Epoch: 3   Global Step: 66660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:08,644-Speed 9305.37 samples/sec   Loss 7.7761   LearningRate 0.0640   Epoch: 3   Global Step: 66670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:09,758-Speed 9196.82 samples/sec   Loss 7.9774   LearningRate 0.0640   Epoch: 3   Global Step: 66680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:10,858-Speed 9314.47 samples/sec   Loss 7.9859   LearningRate 0.0640   Epoch: 3   Global Step: 66690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:11,921-Speed 9639.94 samples/sec   Loss 8.0377   LearningRate 0.0640   Epoch: 3   Global Step: 66700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:13,031-Speed 9235.51 samples/sec   Loss 8.0559   LearningRate 0.0640   Epoch: 3   Global Step: 66710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:14,098-Speed 9598.49 samples/sec   Loss 8.0074   LearningRate 0.0640   Epoch: 3   Global Step: 66720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:15,185-Speed 9428.92 samples/sec   Loss 8.0217   LearningRate 0.0640   Epoch: 3   Global Step: 66730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:16,251-Speed 9609.80 samples/sec   Loss 8.0229   LearningRate 0.0640   Epoch: 3   Global Step: 66740   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:13:17,295-Speed 9811.11 samples/sec   Loss 7.8852   LearningRate 0.0640   Epoch: 3   Global Step: 66750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:18,562-Speed 8085.91 samples/sec   Loss 8.1504   LearningRate 0.0640   Epoch: 3   Global Step: 66760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:53,232-Speed 295.37 samples/sec   Loss 7.4094   LearningRate 0.0640   Epoch: 4   Global Step: 66770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:54,852-Speed 6328.04 samples/sec   Loss 7.0900   LearningRate 0.0640   Epoch: 4   Global Step: 66780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:56,269-Speed 7231.13 samples/sec   Loss 7.2590   LearningRate 0.0640   Epoch: 4   Global Step: 66790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:57,391-Speed 9135.87 samples/sec   Loss 7.1894   LearningRate 0.0640   Epoch: 4   Global Step: 66800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:13:58,829-Speed 7121.11 samples/sec   Loss 7.2561   LearningRate 0.0640   Epoch: 4   Global Step: 66810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:00,278-Speed 7073.57 samples/sec   Loss 7.1453   LearningRate 0.0640   Epoch: 4   Global Step: 66820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:01,599-Speed 7758.31 samples/sec   Loss 7.1766   LearningRate 0.0640   Epoch: 4   Global Step: 66830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:02,713-Speed 9194.97 samples/sec   Loss 7.2115   LearningRate 0.0640   Epoch: 4   Global Step: 66840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:03,799-Speed 9438.58 samples/sec   Loss 7.1589   LearningRate 0.0640   Epoch: 4   Global Step: 66850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:05,362-Speed 6553.61 samples/sec   Loss 7.2286   LearningRate 0.0640   Epoch: 4   Global Step: 66860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:06,735-Speed 7461.28 samples/sec   Loss 7.2590   LearningRate 0.0639   Epoch: 4   Global Step: 66870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:07,833-Speed 9327.17 samples/sec   Loss 7.2228   LearningRate 0.0639   Epoch: 4   Global Step: 66880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:08,939-Speed 9269.98 samples/sec   Loss 7.2232   LearningRate 0.0639   Epoch: 4   Global Step: 66890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:10,017-Speed 9497.22 samples/sec   Loss 7.1971   LearningRate 0.0639   Epoch: 4   Global Step: 66900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:11,101-Speed 9452.43 samples/sec   Loss 7.2997   LearningRate 0.0639   Epoch: 4   Global Step: 66910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:12,212-Speed 9225.10 samples/sec   Loss 7.1636   LearningRate 0.0639   Epoch: 4   Global Step: 66920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:13,303-Speed 9394.22 samples/sec   Loss 7.0216   LearningRate 0.0639   Epoch: 4   Global Step: 66930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:14,391-Speed 9414.75 samples/sec   Loss 7.2075   LearningRate 0.0639   Epoch: 4   Global Step: 66940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:15,523-Speed 9057.62 samples/sec   Loss 7.2688   LearningRate 0.0639   Epoch: 4   Global Step: 66950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:16,593-Speed 9578.22 samples/sec   Loss 7.1929   LearningRate 0.0639   Epoch: 4   Global Step: 66960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:17,654-Speed 9652.21 samples/sec   Loss 7.1791   LearningRate 0.0639   Epoch: 4   Global Step: 66970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:18,735-Speed 9480.91 samples/sec   Loss 7.2726   LearningRate 0.0639   Epoch: 4   Global Step: 66980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:19,853-Speed 9160.87 samples/sec   Loss 7.1900   LearningRate 0.0639   Epoch: 4   Global Step: 66990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:20,945-Speed 9384.00 samples/sec   Loss 7.2463   LearningRate 0.0639   Epoch: 4   Global Step: 67000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:22,023-Speed 9508.18 samples/sec   Loss 7.1563   LearningRate 0.0639   Epoch: 4   Global Step: 67010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:23,302-Speed 8006.81 samples/sec   Loss 7.2220   LearningRate 0.0639   Epoch: 4   Global Step: 67020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:24,504-Speed 8523.52 samples/sec   Loss 7.2982   LearningRate 0.0639   Epoch: 4   Global Step: 67030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:25,590-Speed 9433.81 samples/sec   Loss 7.1182   LearningRate 0.0639   Epoch: 4   Global Step: 67040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:26,719-Speed 9075.32 samples/sec   Loss 7.2586   LearningRate 0.0639   Epoch: 4   Global Step: 67050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:27,812-Speed 9382.68 samples/sec   Loss 7.1760   LearningRate 0.0639   Epoch: 4   Global Step: 67060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:28,901-Speed 9411.62 samples/sec   Loss 7.1952   LearningRate 0.0639   Epoch: 4   Global Step: 67070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:29,968-Speed 9601.28 samples/sec   Loss 7.2969   LearningRate 0.0638   Epoch: 4   Global Step: 67080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:31,057-Speed 9404.77 samples/sec   Loss 7.3231   LearningRate 0.0638   Epoch: 4   Global Step: 67090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:32,173-Speed 9184.47 samples/sec   Loss 7.2055   LearningRate 0.0638   Epoch: 4   Global Step: 67100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:33,228-Speed 9708.40 samples/sec   Loss 7.3040   LearningRate 0.0638   Epoch: 4   Global Step: 67110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:34,322-Speed 9370.14 samples/sec   Loss 7.2275   LearningRate 0.0638   Epoch: 4   Global Step: 67120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:35,436-Speed 9198.08 samples/sec   Loss 7.1912   LearningRate 0.0638   Epoch: 4   Global Step: 67130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:36,525-Speed 9410.05 samples/sec   Loss 7.2021   LearningRate 0.0638   Epoch: 4   Global Step: 67140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:37,602-Speed 9517.88 samples/sec   Loss 7.2906   LearningRate 0.0638   Epoch: 4   Global Step: 67150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:38,732-Speed 9062.69 samples/sec   Loss 7.3565   LearningRate 0.0638   Epoch: 4   Global Step: 67160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:39,801-Speed 9585.10 samples/sec   Loss 7.3511   LearningRate 0.0638   Epoch: 4   Global Step: 67170   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:14:40,868-Speed 9608.86 samples/sec   Loss 7.2775   LearningRate 0.0638   Epoch: 4   Global Step: 67180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:41,961-Speed 9369.07 samples/sec   Loss 7.2377   LearningRate 0.0638   Epoch: 4   Global Step: 67190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:43,104-Speed 8966.75 samples/sec   Loss 7.1536   LearningRate 0.0638   Epoch: 4   Global Step: 67200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:44,187-Speed 9460.90 samples/sec   Loss 7.2879   LearningRate 0.0638   Epoch: 4   Global Step: 67210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:45,270-Speed 9456.52 samples/sec   Loss 7.2592   LearningRate 0.0638   Epoch: 4   Global Step: 67220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:46,342-Speed 9561.65 samples/sec   Loss 7.4487   LearningRate 0.0638   Epoch: 4   Global Step: 67230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:47,429-Speed 9430.11 samples/sec   Loss 7.3110   LearningRate 0.0638   Epoch: 4   Global Step: 67240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:48,500-Speed 9569.88 samples/sec   Loss 7.3158   LearningRate 0.0638   Epoch: 4   Global Step: 67250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:49,581-Speed 9479.81 samples/sec   Loss 7.2538   LearningRate 0.0638   Epoch: 4   Global Step: 67260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:50,657-Speed 9519.14 samples/sec   Loss 7.2195   LearningRate 0.0638   Epoch: 4   Global Step: 67270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:51,751-Speed 9368.17 samples/sec   Loss 7.3283   LearningRate 0.0638   Epoch: 4   Global Step: 67280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:52,850-Speed 9321.19 samples/sec   Loss 7.2468   LearningRate 0.0637   Epoch: 4   Global Step: 67290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:53,963-Speed 9203.19 samples/sec   Loss 7.1727   LearningRate 0.0637   Epoch: 4   Global Step: 67300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:14:55,054-Speed 9395.49 samples/sec   Loss 7.3411   LearningRate 0.0637   Epoch: 4   Global Step: 67310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:56,511-Speed 7034.17 samples/sec   Loss 7.3170   LearningRate 0.0637   Epoch: 4   Global Step: 67320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:58,368-Speed 5514.76 samples/sec   Loss 7.2364   LearningRate 0.0637   Epoch: 4   Global Step: 67330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:14:59,693-Speed 7735.77 samples/sec   Loss 7.3485   LearningRate 0.0637   Epoch: 4   Global Step: 67340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:00,777-Speed 9450.85 samples/sec   Loss 7.2279   LearningRate 0.0637   Epoch: 4   Global Step: 67350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:01,881-Speed 9283.60 samples/sec   Loss 7.4170   LearningRate 0.0637   Epoch: 4   Global Step: 67360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:03,000-Speed 9150.17 samples/sec   Loss 7.2235   LearningRate 0.0637   Epoch: 4   Global Step: 67370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:04,150-Speed 8914.70 samples/sec   Loss 7.2001   LearningRate 0.0637   Epoch: 4   Global Step: 67380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:05,277-Speed 9089.91 samples/sec   Loss 7.3621   LearningRate 0.0637   Epoch: 4   Global Step: 67390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:06,356-Speed 9504.02 samples/sec   Loss 7.2358   LearningRate 0.0637   Epoch: 4   Global Step: 67400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:07,428-Speed 9554.60 samples/sec   Loss 7.3059   LearningRate 0.0637   Epoch: 4   Global Step: 67410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:08,491-Speed 9657.99 samples/sec   Loss 7.3097   LearningRate 0.0637   Epoch: 4   Global Step: 67420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:09,584-Speed 9367.92 samples/sec   Loss 7.3080   LearningRate 0.0637   Epoch: 4   Global Step: 67430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:10,657-Speed 9548.91 samples/sec   Loss 7.3150   LearningRate 0.0637   Epoch: 4   Global Step: 67440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:11,739-Speed 9474.18 samples/sec   Loss 7.4047   LearningRate 0.0637   Epoch: 4   Global Step: 67450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:12,861-Speed 9129.83 samples/sec   Loss 7.3297   LearningRate 0.0637   Epoch: 4   Global Step: 67460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:13,972-Speed 9222.96 samples/sec   Loss 7.2538   LearningRate 0.0637   Epoch: 4   Global Step: 67470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:15,057-Speed 9445.44 samples/sec   Loss 7.3564   LearningRate 0.0637   Epoch: 4   Global Step: 67480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:16,113-Speed 9703.85 samples/sec   Loss 7.3896   LearningRate 0.0637   Epoch: 4   Global Step: 67490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:17,194-Speed 9480.15 samples/sec   Loss 7.3555   LearningRate 0.0636   Epoch: 4   Global Step: 67500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:18,274-Speed 9481.97 samples/sec   Loss 7.4023   LearningRate 0.0636   Epoch: 4   Global Step: 67510   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:15:19,338-Speed 9630.59 samples/sec   Loss 7.3544   LearningRate 0.0636   Epoch: 4   Global Step: 67520   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:15:20,399-Speed 9656.04 samples/sec   Loss 7.4549   LearningRate 0.0636   Epoch: 4   Global Step: 67530   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:15:21,487-Speed 9418.97 samples/sec   Loss 7.2568   LearningRate 0.0636   Epoch: 4   Global Step: 67540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:22,608-Speed 9144.62 samples/sec   Loss 7.2584   LearningRate 0.0636   Epoch: 4   Global Step: 67550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:23,665-Speed 9687.99 samples/sec   Loss 7.3908   LearningRate 0.0636   Epoch: 4   Global Step: 67560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:24,701-Speed 9896.76 samples/sec   Loss 7.3087   LearningRate 0.0636   Epoch: 4   Global Step: 67570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:25,774-Speed 9544.33 samples/sec   Loss 7.3013   LearningRate 0.0636   Epoch: 4   Global Step: 67580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:26,868-Speed 9367.94 samples/sec   Loss 7.3206   LearningRate 0.0636   Epoch: 4   Global Step: 67590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:27,940-Speed 9556.80 samples/sec   Loss 7.3031   LearningRate 0.0636   Epoch: 4   Global Step: 67600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:28,975-Speed 9902.30 samples/sec   Loss 7.3324   LearningRate 0.0636   Epoch: 4   Global Step: 67610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:30,063-Speed 9424.55 samples/sec   Loss 7.3607   LearningRate 0.0636   Epoch: 4   Global Step: 67620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:31,135-Speed 9554.06 samples/sec   Loss 7.3618   LearningRate 0.0636   Epoch: 4   Global Step: 67630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:32,219-Speed 9450.14 samples/sec   Loss 7.4759   LearningRate 0.0636   Epoch: 4   Global Step: 67640   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:15:33,269-Speed 9767.43 samples/sec   Loss 7.3442   LearningRate 0.0636   Epoch: 4   Global Step: 67650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:34,360-Speed 9395.95 samples/sec   Loss 7.3519   LearningRate 0.0636   Epoch: 4   Global Step: 67660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:35,443-Speed 9461.64 samples/sec   Loss 7.2099   LearningRate 0.0636   Epoch: 4   Global Step: 67670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:36,520-Speed 9515.61 samples/sec   Loss 7.2891   LearningRate 0.0636   Epoch: 4   Global Step: 67680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:37,598-Speed 9503.83 samples/sec   Loss 7.4198   LearningRate 0.0636   Epoch: 4   Global Step: 67690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:38,683-Speed 9441.15 samples/sec   Loss 7.3173   LearningRate 0.0636   Epoch: 4   Global Step: 67700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:39,732-Speed 9762.12 samples/sec   Loss 7.2977   LearningRate 0.0635   Epoch: 4   Global Step: 67710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:40,793-Speed 9655.01 samples/sec   Loss 7.3151   LearningRate 0.0635   Epoch: 4   Global Step: 67720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:41,911-Speed 9167.46 samples/sec   Loss 7.3396   LearningRate 0.0635   Epoch: 4   Global Step: 67730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:43,016-Speed 9274.71 samples/sec   Loss 7.3969   LearningRate 0.0635   Epoch: 4   Global Step: 67740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:44,089-Speed 9546.36 samples/sec   Loss 7.4908   LearningRate 0.0635   Epoch: 4   Global Step: 67750   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:15:45,159-Speed 9574.97 samples/sec   Loss 7.4151   LearningRate 0.0635   Epoch: 4   Global Step: 67760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:46,231-Speed 9556.12 samples/sec   Loss 7.3862   LearningRate 0.0635   Epoch: 4   Global Step: 67770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:47,330-Speed 9327.83 samples/sec   Loss 7.2714   LearningRate 0.0635   Epoch: 4   Global Step: 67780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:48,427-Speed 9342.06 samples/sec   Loss 7.4462   LearningRate 0.0635   Epoch: 4   Global Step: 67790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:49,525-Speed 9327.68 samples/sec   Loss 7.4281   LearningRate 0.0635   Epoch: 4   Global Step: 67800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:50,605-Speed 9489.51 samples/sec   Loss 7.3758   LearningRate 0.0635   Epoch: 4   Global Step: 67810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:51,739-Speed 9036.96 samples/sec   Loss 7.3580   LearningRate 0.0635   Epoch: 4   Global Step: 67820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:52,808-Speed 9579.30 samples/sec   Loss 7.4268   LearningRate 0.0635   Epoch: 4   Global Step: 67830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:53,894-Speed 9433.78 samples/sec   Loss 7.3728   LearningRate 0.0635   Epoch: 4   Global Step: 67840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:54,970-Speed 9528.74 samples/sec   Loss 7.4795   LearningRate 0.0635   Epoch: 4   Global Step: 67850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:56,036-Speed 9610.75 samples/sec   Loss 7.5150   LearningRate 0.0635   Epoch: 4   Global Step: 67860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:57,214-Speed 8692.01 samples/sec   Loss 7.4593   LearningRate 0.0635   Epoch: 4   Global Step: 67870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:58,287-Speed 9557.54 samples/sec   Loss 7.3248   LearningRate 0.0635   Epoch: 4   Global Step: 67880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:15:59,373-Speed 9433.75 samples/sec   Loss 7.4443   LearningRate 0.0635   Epoch: 4   Global Step: 67890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:16:00,453-Speed 9482.22 samples/sec   Loss 7.3617   LearningRate 0.0635   Epoch: 4   Global Step: 67900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:16:01,511-Speed 9688.54 samples/sec   Loss 7.4159   LearningRate 0.0635   Epoch: 4   Global Step: 67910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:16:02,590-Speed 9492.86 samples/sec   Loss 7.3818   LearningRate 0.0634   Epoch: 4   Global Step: 67920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:16:03,667-Speed 9511.11 samples/sec   Loss 7.3483   LearningRate 0.0634   Epoch: 4   Global Step: 67930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:16:04,747-Speed 9496.14 samples/sec   Loss 7.3625   LearningRate 0.0634   Epoch: 4   Global Step: 67940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:16:05,848-Speed 9304.17 samples/sec   Loss 7.3534   LearningRate 0.0634   Epoch: 4   Global Step: 67950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:16:06,937-Speed 9409.17 samples/sec   Loss 7.4258   LearningRate 0.0634   Epoch: 4   Global Step: 67960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:16:08,003-Speed 9615.80 samples/sec   Loss 7.4294   LearningRate 0.0634   Epoch: 4   Global Step: 67970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:16:09,106-Speed 9286.53 samples/sec   Loss 7.3742   LearningRate 0.0634   Epoch: 4   Global Step: 67980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:16:10,170-Speed 9626.25 samples/sec   Loss 7.4652   LearningRate 0.0634   Epoch: 4   Global Step: 67990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:16:11,237-Speed 9606.36 samples/sec   Loss 7.4492   LearningRate 0.0634   Epoch: 4   Global Step: 68000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:16:33,360-[lfw][68000]XNorm: 12.272347
Training: 2022-04-11 14:16:33,361-[lfw][68000]Accuracy-Flip: 0.99533+-0.00267
Training: 2022-04-11 14:16:33,361-[lfw][68000]Accuracy-Highest: 0.99583
Training: 2022-04-11 14:16:58,876-[cfp_fp][68000]XNorm: 10.460671
Training: 2022-04-11 14:16:58,877-[cfp_fp][68000]Accuracy-Flip: 0.95014+-0.01296
Training: 2022-04-11 14:16:58,877-[cfp_fp][68000]Accuracy-Highest: 0.95171
Training: 2022-04-11 14:17:20,897-[agedb_30][68000]XNorm: 11.886236
Training: 2022-04-11 14:17:20,898-[agedb_30][68000]Accuracy-Flip: 0.95700+-0.00710
Training: 2022-04-11 14:17:20,898-[agedb_30][68000]Accuracy-Highest: 0.96033
Training: 2022-04-11 14:17:21,965-Speed 144.78 samples/sec   Loss 7.4270   LearningRate 0.0634   Epoch: 4   Global Step: 68010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:17:23,063-Speed 9327.98 samples/sec   Loss 7.4832   LearningRate 0.0634   Epoch: 4   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:17:24,165-Speed 9296.68 samples/sec   Loss 7.5107   LearningRate 0.0634   Epoch: 4   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:17:25,259-Speed 9366.83 samples/sec   Loss 7.4646   LearningRate 0.0634   Epoch: 4   Global Step: 68040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:17:26,395-Speed 9026.60 samples/sec   Loss 7.5099   LearningRate 0.0634   Epoch: 4   Global Step: 68050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:17:27,454-Speed 9672.35 samples/sec   Loss 7.5626   LearningRate 0.0634   Epoch: 4   Global Step: 68060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:17:28,550-Speed 9353.29 samples/sec   Loss 7.4349   LearningRate 0.0634   Epoch: 4   Global Step: 68070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:17:29,671-Speed 9140.16 samples/sec   Loss 7.4486   LearningRate 0.0634   Epoch: 4   Global Step: 68080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:30,773-Speed 9295.43 samples/sec   Loss 7.4644   LearningRate 0.0634   Epoch: 4   Global Step: 68090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:31,901-Speed 9087.28 samples/sec   Loss 7.5522   LearningRate 0.0634   Epoch: 4   Global Step: 68100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:32,982-Speed 9471.84 samples/sec   Loss 7.5020   LearningRate 0.0634   Epoch: 4   Global Step: 68110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:34,041-Speed 9678.88 samples/sec   Loss 7.4806   LearningRate 0.0634   Epoch: 4   Global Step: 68120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:35,143-Speed 9304.13 samples/sec   Loss 7.4374   LearningRate 0.0633   Epoch: 4   Global Step: 68130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:36,225-Speed 9464.99 samples/sec   Loss 7.5626   LearningRate 0.0633   Epoch: 4   Global Step: 68140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:37,279-Speed 9722.99 samples/sec   Loss 7.4898   LearningRate 0.0633   Epoch: 4   Global Step: 68150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:38,418-Speed 8995.48 samples/sec   Loss 7.3994   LearningRate 0.0633   Epoch: 4   Global Step: 68160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:39,542-Speed 9115.97 samples/sec   Loss 7.4380   LearningRate 0.0633   Epoch: 4   Global Step: 68170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:40,642-Speed 9313.10 samples/sec   Loss 7.5164   LearningRate 0.0633   Epoch: 4   Global Step: 68180   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:17:41,766-Speed 9118.28 samples/sec   Loss 7.4364   LearningRate 0.0633   Epoch: 4   Global Step: 68190   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:17:42,844-Speed 9499.53 samples/sec   Loss 7.4790   LearningRate 0.0633   Epoch: 4   Global Step: 68200   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:17:43,944-Speed 9315.47 samples/sec   Loss 7.4580   LearningRate 0.0633   Epoch: 4   Global Step: 68210   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:17:45,056-Speed 9214.68 samples/sec   Loss 7.5037   LearningRate 0.0633   Epoch: 4   Global Step: 68220   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:17:46,136-Speed 9487.27 samples/sec   Loss 7.4229   LearningRate 0.0633   Epoch: 4   Global Step: 68230   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:17:47,217-Speed 9480.68 samples/sec   Loss 7.5954   LearningRate 0.0633   Epoch: 4   Global Step: 68240   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:17:48,294-Speed 9513.98 samples/sec   Loss 7.3525   LearningRate 0.0633   Epoch: 4   Global Step: 68250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:49,365-Speed 9559.87 samples/sec   Loss 7.4272   LearningRate 0.0633   Epoch: 4   Global Step: 68260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:50,403-Speed 9874.76 samples/sec   Loss 7.4124   LearningRate 0.0633   Epoch: 4   Global Step: 68270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:51,514-Speed 9226.01 samples/sec   Loss 7.5711   LearningRate 0.0633   Epoch: 4   Global Step: 68280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:52,641-Speed 9084.48 samples/sec   Loss 7.4383   LearningRate 0.0633   Epoch: 4   Global Step: 68290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:53,745-Speed 9288.61 samples/sec   Loss 7.5615   LearningRate 0.0633   Epoch: 4   Global Step: 68300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:54,838-Speed 9375.61 samples/sec   Loss 7.3592   LearningRate 0.0633   Epoch: 4   Global Step: 68310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:55,951-Speed 9199.91 samples/sec   Loss 7.5768   LearningRate 0.0633   Epoch: 4   Global Step: 68320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:57,055-Speed 9282.38 samples/sec   Loss 7.4562   LearningRate 0.0633   Epoch: 4   Global Step: 68330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:58,127-Speed 9565.81 samples/sec   Loss 7.5039   LearningRate 0.0632   Epoch: 4   Global Step: 68340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:17:59,202-Speed 9526.50 samples/sec   Loss 7.4781   LearningRate 0.0632   Epoch: 4   Global Step: 68350   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:18:00,282-Speed 9486.09 samples/sec   Loss 7.4768   LearningRate 0.0632   Epoch: 4   Global Step: 68360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:01,378-Speed 9350.69 samples/sec   Loss 7.5977   LearningRate 0.0632   Epoch: 4   Global Step: 68370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:02,450-Speed 9558.17 samples/sec   Loss 7.4301   LearningRate 0.0632   Epoch: 4   Global Step: 68380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:03,558-Speed 9248.45 samples/sec   Loss 7.5740   LearningRate 0.0632   Epoch: 4   Global Step: 68390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:04,654-Speed 9348.25 samples/sec   Loss 7.5275   LearningRate 0.0632   Epoch: 4   Global Step: 68400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:05,775-Speed 9135.31 samples/sec   Loss 7.5092   LearningRate 0.0632   Epoch: 4   Global Step: 68410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:06,907-Speed 9054.48 samples/sec   Loss 7.4854   LearningRate 0.0632   Epoch: 4   Global Step: 68420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:08,013-Speed 9260.76 samples/sec   Loss 7.5379   LearningRate 0.0632   Epoch: 4   Global Step: 68430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:09,093-Speed 9494.68 samples/sec   Loss 7.4739   LearningRate 0.0632   Epoch: 4   Global Step: 68440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:10,176-Speed 9453.50 samples/sec   Loss 7.4928   LearningRate 0.0632   Epoch: 4   Global Step: 68450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:11,281-Speed 9276.99 samples/sec   Loss 7.4593   LearningRate 0.0632   Epoch: 4   Global Step: 68460   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:18:12,346-Speed 9621.96 samples/sec   Loss 7.5984   LearningRate 0.0632   Epoch: 4   Global Step: 68470   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:18:13,433-Speed 9422.01 samples/sec   Loss 7.5428   LearningRate 0.0632   Epoch: 4   Global Step: 68480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:14,487-Speed 9728.78 samples/sec   Loss 7.5258   LearningRate 0.0632   Epoch: 4   Global Step: 68490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:15,571-Speed 9450.87 samples/sec   Loss 7.5249   LearningRate 0.0632   Epoch: 4   Global Step: 68500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:16,644-Speed 9549.05 samples/sec   Loss 7.5697   LearningRate 0.0632   Epoch: 4   Global Step: 68510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:17,742-Speed 9329.66 samples/sec   Loss 7.4553   LearningRate 0.0632   Epoch: 4   Global Step: 68520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:18,823-Speed 9480.09 samples/sec   Loss 7.5288   LearningRate 0.0632   Epoch: 4   Global Step: 68530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:19,928-Speed 9266.49 samples/sec   Loss 7.4767   LearningRate 0.0632   Epoch: 4   Global Step: 68540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:20,971-Speed 9830.42 samples/sec   Loss 7.4960   LearningRate 0.0631   Epoch: 4   Global Step: 68550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:22,045-Speed 9537.92 samples/sec   Loss 7.4474   LearningRate 0.0631   Epoch: 4   Global Step: 68560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:23,130-Speed 9440.01 samples/sec   Loss 7.3279   LearningRate 0.0631   Epoch: 4   Global Step: 68570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:24,217-Speed 9426.07 samples/sec   Loss 7.4676   LearningRate 0.0631   Epoch: 4   Global Step: 68580   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:18:25,290-Speed 9548.68 samples/sec   Loss 7.5069   LearningRate 0.0631   Epoch: 4   Global Step: 68590   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:18:26,359-Speed 9587.07 samples/sec   Loss 7.6188   LearningRate 0.0631   Epoch: 4   Global Step: 68600   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:18:27,487-Speed 9082.00 samples/sec   Loss 7.5660   LearningRate 0.0631   Epoch: 4   Global Step: 68610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:28,627-Speed 8991.15 samples/sec   Loss 7.4796   LearningRate 0.0631   Epoch: 4   Global Step: 68620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:29,735-Speed 9248.02 samples/sec   Loss 7.5151   LearningRate 0.0631   Epoch: 4   Global Step: 68630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:30,804-Speed 9586.75 samples/sec   Loss 7.5127   LearningRate 0.0631   Epoch: 4   Global Step: 68640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:31,871-Speed 9601.36 samples/sec   Loss 7.5097   LearningRate 0.0631   Epoch: 4   Global Step: 68650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:32,931-Speed 9669.19 samples/sec   Loss 7.5846   LearningRate 0.0631   Epoch: 4   Global Step: 68660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:34,013-Speed 9468.30 samples/sec   Loss 7.4948   LearningRate 0.0631   Epoch: 4   Global Step: 68670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:35,094-Speed 9480.88 samples/sec   Loss 7.5854   LearningRate 0.0631   Epoch: 4   Global Step: 68680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:36,174-Speed 9482.38 samples/sec   Loss 7.6166   LearningRate 0.0631   Epoch: 4   Global Step: 68690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:37,263-Speed 9412.94 samples/sec   Loss 7.5778   LearningRate 0.0631   Epoch: 4   Global Step: 68700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:38,382-Speed 9157.21 samples/sec   Loss 7.4836   LearningRate 0.0631   Epoch: 4   Global Step: 68710   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:18:39,451-Speed 9583.58 samples/sec   Loss 7.4658   LearningRate 0.0631   Epoch: 4   Global Step: 68720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:40,499-Speed 9775.97 samples/sec   Loss 7.4602   LearningRate 0.0631   Epoch: 4   Global Step: 68730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:41,616-Speed 9169.53 samples/sec   Loss 7.6168   LearningRate 0.0631   Epoch: 4   Global Step: 68740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:42,716-Speed 9316.03 samples/sec   Loss 7.5211   LearningRate 0.0631   Epoch: 4   Global Step: 68750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:43,850-Speed 9032.83 samples/sec   Loss 7.6591   LearningRate 0.0630   Epoch: 4   Global Step: 68760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:44,953-Speed 9286.77 samples/sec   Loss 7.6212   LearningRate 0.0630   Epoch: 4   Global Step: 68770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:46,021-Speed 9601.84 samples/sec   Loss 7.4394   LearningRate 0.0630   Epoch: 4   Global Step: 68780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:47,076-Speed 9708.86 samples/sec   Loss 7.5392   LearningRate 0.0630   Epoch: 4   Global Step: 68790   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:48,187-Speed 9222.07 samples/sec   Loss 7.4433   LearningRate 0.0630   Epoch: 4   Global Step: 68800   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:49,267-Speed 9490.02 samples/sec   Loss 7.4628   LearningRate 0.0630   Epoch: 4   Global Step: 68810   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:50,336-Speed 9591.46 samples/sec   Loss 7.5026   LearningRate 0.0630   Epoch: 4   Global Step: 68820   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:51,442-Speed 9258.45 samples/sec   Loss 7.6104   LearningRate 0.0630   Epoch: 4   Global Step: 68830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:52,561-Speed 9155.21 samples/sec   Loss 7.5644   LearningRate 0.0630   Epoch: 4   Global Step: 68840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:53,675-Speed 9199.21 samples/sec   Loss 7.5680   LearningRate 0.0630   Epoch: 4   Global Step: 68850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:54,774-Speed 9323.16 samples/sec   Loss 7.4379   LearningRate 0.0630   Epoch: 4   Global Step: 68860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:55,854-Speed 9482.84 samples/sec   Loss 7.5547   LearningRate 0.0630   Epoch: 4   Global Step: 68870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:56,941-Speed 9431.39 samples/sec   Loss 7.5673   LearningRate 0.0630   Epoch: 4   Global Step: 68880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:58,054-Speed 9203.26 samples/sec   Loss 7.5145   LearningRate 0.0630   Epoch: 4   Global Step: 68890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:18:59,165-Speed 9226.42 samples/sec   Loss 7.4619   LearningRate 0.0630   Epoch: 4   Global Step: 68900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:00,234-Speed 9579.68 samples/sec   Loss 7.6014   LearningRate 0.0630   Epoch: 4   Global Step: 68910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:01,326-Speed 9388.75 samples/sec   Loss 7.6029   LearningRate 0.0630   Epoch: 4   Global Step: 68920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:02,425-Speed 9315.11 samples/sec   Loss 7.5063   LearningRate 0.0630   Epoch: 4   Global Step: 68930   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:03,515-Speed 9403.12 samples/sec   Loss 7.4975   LearningRate 0.0630   Epoch: 4   Global Step: 68940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:04,619-Speed 9282.63 samples/sec   Loss 7.6351   LearningRate 0.0630   Epoch: 4   Global Step: 68950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:05,699-Speed 9486.38 samples/sec   Loss 7.5386   LearningRate 0.0630   Epoch: 4   Global Step: 68960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:06,854-Speed 8873.05 samples/sec   Loss 7.6090   LearningRate 0.0629   Epoch: 4   Global Step: 68970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:07,961-Speed 9256.14 samples/sec   Loss 7.6085   LearningRate 0.0629   Epoch: 4   Global Step: 68980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:09,087-Speed 9100.69 samples/sec   Loss 7.6638   LearningRate 0.0629   Epoch: 4   Global Step: 68990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:10,192-Speed 9275.54 samples/sec   Loss 7.6065   LearningRate 0.0629   Epoch: 4   Global Step: 69000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:11,252-Speed 9658.65 samples/sec   Loss 7.6496   LearningRate 0.0629   Epoch: 4   Global Step: 69010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:12,312-Speed 9669.05 samples/sec   Loss 7.5110   LearningRate 0.0629   Epoch: 4   Global Step: 69020   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:19:13,359-Speed 9786.86 samples/sec   Loss 7.5995   LearningRate 0.0629   Epoch: 4   Global Step: 69030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:14,456-Speed 9341.84 samples/sec   Loss 7.5597   LearningRate 0.0629   Epoch: 4   Global Step: 69040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:15,528-Speed 9555.91 samples/sec   Loss 7.5414   LearningRate 0.0629   Epoch: 4   Global Step: 69050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:16,594-Speed 9611.99 samples/sec   Loss 7.6662   LearningRate 0.0629   Epoch: 4   Global Step: 69060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:17,673-Speed 9491.27 samples/sec   Loss 7.6274   LearningRate 0.0629   Epoch: 4   Global Step: 69070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:18,817-Speed 8959.97 samples/sec   Loss 7.4905   LearningRate 0.0629   Epoch: 4   Global Step: 69080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:19,868-Speed 9751.27 samples/sec   Loss 7.5885   LearningRate 0.0629   Epoch: 4   Global Step: 69090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:20,926-Speed 9684.83 samples/sec   Loss 7.5626   LearningRate 0.0629   Epoch: 4   Global Step: 69100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:22,006-Speed 9490.33 samples/sec   Loss 7.5536   LearningRate 0.0629   Epoch: 4   Global Step: 69110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:23,083-Speed 9510.67 samples/sec   Loss 7.5777   LearningRate 0.0629   Epoch: 4   Global Step: 69120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:24,173-Speed 9395.48 samples/sec   Loss 7.4710   LearningRate 0.0629   Epoch: 4   Global Step: 69130   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:19:25,229-Speed 9705.51 samples/sec   Loss 7.5705   LearningRate 0.0629   Epoch: 4   Global Step: 69140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:26,306-Speed 9518.16 samples/sec   Loss 7.6924   LearningRate 0.0629   Epoch: 4   Global Step: 69150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:27,376-Speed 9573.10 samples/sec   Loss 7.5629   LearningRate 0.0629   Epoch: 4   Global Step: 69160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:28,425-Speed 9767.55 samples/sec   Loss 7.6028   LearningRate 0.0629   Epoch: 4   Global Step: 69170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:29,499-Speed 9542.06 samples/sec   Loss 7.5713   LearningRate 0.0628   Epoch: 4   Global Step: 69180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:30,582-Speed 9458.77 samples/sec   Loss 7.6258   LearningRate 0.0628   Epoch: 4   Global Step: 69190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:31,681-Speed 9327.77 samples/sec   Loss 7.5294   LearningRate 0.0628   Epoch: 4   Global Step: 69200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:32,761-Speed 9485.40 samples/sec   Loss 7.5397   LearningRate 0.0628   Epoch: 4   Global Step: 69210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:33,832-Speed 9561.61 samples/sec   Loss 7.5343   LearningRate 0.0628   Epoch: 4   Global Step: 69220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:34,867-Speed 9911.26 samples/sec   Loss 7.5995   LearningRate 0.0628   Epoch: 4   Global Step: 69230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:35,935-Speed 9588.68 samples/sec   Loss 7.6603   LearningRate 0.0628   Epoch: 4   Global Step: 69240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:37,037-Speed 9295.30 samples/sec   Loss 7.5417   LearningRate 0.0628   Epoch: 4   Global Step: 69250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:38,128-Speed 9392.66 samples/sec   Loss 7.6068   LearningRate 0.0628   Epoch: 4   Global Step: 69260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:39,219-Speed 9388.75 samples/sec   Loss 7.6875   LearningRate 0.0628   Epoch: 4   Global Step: 69270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:40,305-Speed 9434.73 samples/sec   Loss 7.6174   LearningRate 0.0628   Epoch: 4   Global Step: 69280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:41,404-Speed 9327.79 samples/sec   Loss 7.6269   LearningRate 0.0628   Epoch: 4   Global Step: 69290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:42,509-Speed 9270.25 samples/sec   Loss 7.5751   LearningRate 0.0628   Epoch: 4   Global Step: 69300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:43,627-Speed 9165.64 samples/sec   Loss 7.6579   LearningRate 0.0628   Epoch: 4   Global Step: 69310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:44,713-Speed 9426.99 samples/sec   Loss 7.7556   LearningRate 0.0628   Epoch: 4   Global Step: 69320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:45,789-Speed 9526.01 samples/sec   Loss 7.6199   LearningRate 0.0628   Epoch: 4   Global Step: 69330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:46,885-Speed 9351.83 samples/sec   Loss 7.5693   LearningRate 0.0628   Epoch: 4   Global Step: 69340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:47,996-Speed 9228.24 samples/sec   Loss 7.5823   LearningRate 0.0628   Epoch: 4   Global Step: 69350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:19:49,071-Speed 9530.42 samples/sec   Loss 7.7412   LearningRate 0.0628   Epoch: 4   Global Step: 69360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:50,175-Speed 9273.51 samples/sec   Loss 7.5735   LearningRate 0.0628   Epoch: 4   Global Step: 69370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:51,288-Speed 9207.61 samples/sec   Loss 7.6462   LearningRate 0.0628   Epoch: 4   Global Step: 69380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:52,413-Speed 9107.57 samples/sec   Loss 7.5508   LearningRate 0.0627   Epoch: 4   Global Step: 69390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:53,511-Speed 9329.06 samples/sec   Loss 7.5786   LearningRate 0.0627   Epoch: 4   Global Step: 69400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:54,563-Speed 9745.58 samples/sec   Loss 7.5313   LearningRate 0.0627   Epoch: 4   Global Step: 69410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:55,640-Speed 9507.16 samples/sec   Loss 7.5835   LearningRate 0.0627   Epoch: 4   Global Step: 69420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:56,675-Speed 9902.29 samples/sec   Loss 7.6263   LearningRate 0.0627   Epoch: 4   Global Step: 69430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:57,715-Speed 9852.25 samples/sec   Loss 7.6617   LearningRate 0.0627   Epoch: 4   Global Step: 69440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:58,785-Speed 9578.63 samples/sec   Loss 7.6312   LearningRate 0.0627   Epoch: 4   Global Step: 69450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:19:59,834-Speed 9768.83 samples/sec   Loss 7.6030   LearningRate 0.0627   Epoch: 4   Global Step: 69460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:00,935-Speed 9306.55 samples/sec   Loss 7.5645   LearningRate 0.0627   Epoch: 4   Global Step: 69470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:02,029-Speed 9364.78 samples/sec   Loss 7.6210   LearningRate 0.0627   Epoch: 4   Global Step: 69480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:03,112-Speed 9455.34 samples/sec   Loss 7.5404   LearningRate 0.0627   Epoch: 4   Global Step: 69490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:04,178-Speed 9619.94 samples/sec   Loss 7.6116   LearningRate 0.0627   Epoch: 4   Global Step: 69500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:05,304-Speed 9104.07 samples/sec   Loss 7.5665   LearningRate 0.0627   Epoch: 4   Global Step: 69510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:06,413-Speed 9234.37 samples/sec   Loss 7.6820   LearningRate 0.0627   Epoch: 4   Global Step: 69520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:07,503-Speed 9399.48 samples/sec   Loss 7.6664   LearningRate 0.0627   Epoch: 4   Global Step: 69530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:08,615-Speed 9216.52 samples/sec   Loss 7.5358   LearningRate 0.0627   Epoch: 4   Global Step: 69540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:09,676-Speed 9652.03 samples/sec   Loss 7.6243   LearningRate 0.0627   Epoch: 4   Global Step: 69550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:10,740-Speed 9639.97 samples/sec   Loss 7.6886   LearningRate 0.0627   Epoch: 4   Global Step: 69560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:11,817-Speed 9513.54 samples/sec   Loss 7.5907   LearningRate 0.0627   Epoch: 4   Global Step: 69570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:12,931-Speed 9192.66 samples/sec   Loss 7.6224   LearningRate 0.0627   Epoch: 4   Global Step: 69580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:13,991-Speed 9663.39 samples/sec   Loss 7.5346   LearningRate 0.0627   Epoch: 4   Global Step: 69590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:15,051-Speed 9671.69 samples/sec   Loss 7.6025   LearningRate 0.0626   Epoch: 4   Global Step: 69600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:16,159-Speed 9242.28 samples/sec   Loss 7.7378   LearningRate 0.0626   Epoch: 4   Global Step: 69610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:17,246-Speed 9422.82 samples/sec   Loss 7.6973   LearningRate 0.0626   Epoch: 4   Global Step: 69620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:18,285-Speed 9862.82 samples/sec   Loss 7.5819   LearningRate 0.0626   Epoch: 4   Global Step: 69630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:19,377-Speed 9388.79 samples/sec   Loss 7.5340   LearningRate 0.0626   Epoch: 4   Global Step: 69640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:20,499-Speed 9123.56 samples/sec   Loss 7.6254   LearningRate 0.0626   Epoch: 4   Global Step: 69650   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:21,629-Speed 9068.63 samples/sec   Loss 7.6272   LearningRate 0.0626   Epoch: 4   Global Step: 69660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:22,687-Speed 9684.27 samples/sec   Loss 7.5885   LearningRate 0.0626   Epoch: 4   Global Step: 69670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:23,774-Speed 9435.00 samples/sec   Loss 7.6477   LearningRate 0.0626   Epoch: 4   Global Step: 69680   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:20:24,835-Speed 9661.88 samples/sec   Loss 7.4916   LearningRate 0.0626   Epoch: 4   Global Step: 69690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:25,919-Speed 9448.67 samples/sec   Loss 7.6743   LearningRate 0.0626   Epoch: 4   Global Step: 69700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:27,002-Speed 9454.32 samples/sec   Loss 7.6678   LearningRate 0.0626   Epoch: 4   Global Step: 69710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:28,123-Speed 9146.07 samples/sec   Loss 7.6243   LearningRate 0.0626   Epoch: 4   Global Step: 69720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:29,228-Speed 9269.08 samples/sec   Loss 7.6914   LearningRate 0.0626   Epoch: 4   Global Step: 69730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:30,330-Speed 9299.71 samples/sec   Loss 7.6608   LearningRate 0.0626   Epoch: 4   Global Step: 69740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:31,372-Speed 9825.55 samples/sec   Loss 7.6537   LearningRate 0.0626   Epoch: 4   Global Step: 69750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:32,480-Speed 9249.51 samples/sec   Loss 7.6006   LearningRate 0.0626   Epoch: 4   Global Step: 69760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:33,558-Speed 9511.09 samples/sec   Loss 7.7208   LearningRate 0.0626   Epoch: 4   Global Step: 69770   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:34,616-Speed 9684.35 samples/sec   Loss 7.7663   LearningRate 0.0626   Epoch: 4   Global Step: 69780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:35,688-Speed 9554.46 samples/sec   Loss 7.7717   LearningRate 0.0626   Epoch: 4   Global Step: 69790   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:20:36,758-Speed 9580.34 samples/sec   Loss 7.6585   LearningRate 0.0626   Epoch: 4   Global Step: 69800   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:20:37,834-Speed 9514.34 samples/sec   Loss 7.6605   LearningRate 0.0625   Epoch: 4   Global Step: 69810   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:20:38,869-Speed 9903.12 samples/sec   Loss 7.6171   LearningRate 0.0625   Epoch: 4   Global Step: 69820   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:20:39,932-Speed 9644.71 samples/sec   Loss 7.6100   LearningRate 0.0625   Epoch: 4   Global Step: 69830   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:20:40,978-Speed 9794.85 samples/sec   Loss 7.6373   LearningRate 0.0625   Epoch: 4   Global Step: 69840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:42,040-Speed 9645.12 samples/sec   Loss 7.6550   LearningRate 0.0625   Epoch: 4   Global Step: 69850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:43,119-Speed 9507.25 samples/sec   Loss 7.5974   LearningRate 0.0625   Epoch: 4   Global Step: 69860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:44,184-Speed 9624.06 samples/sec   Loss 7.6766   LearningRate 0.0625   Epoch: 4   Global Step: 69870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:45,274-Speed 9393.64 samples/sec   Loss 7.6902   LearningRate 0.0625   Epoch: 4   Global Step: 69880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:46,313-Speed 9868.24 samples/sec   Loss 7.6025   LearningRate 0.0625   Epoch: 4   Global Step: 69890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:47,414-Speed 9297.64 samples/sec   Loss 7.6302   LearningRate 0.0625   Epoch: 4   Global Step: 69900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:48,493-Speed 9496.79 samples/sec   Loss 7.6005   LearningRate 0.0625   Epoch: 4   Global Step: 69910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:49,584-Speed 9397.00 samples/sec   Loss 7.6347   LearningRate 0.0625   Epoch: 4   Global Step: 69920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:50,691-Speed 9251.18 samples/sec   Loss 7.6276   LearningRate 0.0625   Epoch: 4   Global Step: 69930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:20:51,763-Speed 9558.34 samples/sec   Loss 7.5177   LearningRate 0.0625   Epoch: 4   Global Step: 69940   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:52,853-Speed 9402.98 samples/sec   Loss 7.6405   LearningRate 0.0625   Epoch: 4   Global Step: 69950   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:53,947-Speed 9367.26 samples/sec   Loss 7.6385   LearningRate 0.0625   Epoch: 4   Global Step: 69960   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:55,000-Speed 9724.75 samples/sec   Loss 7.5679   LearningRate 0.0625   Epoch: 4   Global Step: 69970   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:56,114-Speed 9198.19 samples/sec   Loss 7.5227   LearningRate 0.0625   Epoch: 4   Global Step: 69980   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:57,175-Speed 9657.25 samples/sec   Loss 7.6414   LearningRate 0.0625   Epoch: 4   Global Step: 69990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:20:58,278-Speed 9292.13 samples/sec   Loss 7.6339   LearningRate 0.0625   Epoch: 4   Global Step: 70000   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:21:20,236-[lfw][70000]XNorm: 12.313847
Training: 2022-04-11 14:21:20,237-[lfw][70000]Accuracy-Flip: 0.99550+-0.00236
Training: 2022-04-11 14:21:20,237-[lfw][70000]Accuracy-Highest: 0.99583
Training: 2022-04-11 14:21:45,411-[cfp_fp][70000]XNorm: 10.456402
Training: 2022-04-11 14:21:45,412-[cfp_fp][70000]Accuracy-Flip: 0.95143+-0.00996
Training: 2022-04-11 14:21:45,412-[cfp_fp][70000]Accuracy-Highest: 0.95171
Training: 2022-04-11 14:22:07,101-[agedb_30][70000]XNorm: 11.907823
Training: 2022-04-11 14:22:07,102-[agedb_30][70000]Accuracy-Flip: 0.95767+-0.00904
Training: 2022-04-11 14:22:07,102-[agedb_30][70000]Accuracy-Highest: 0.96033
Training: 2022-04-11 14:22:08,216-Speed 146.42 samples/sec   Loss 7.6580   LearningRate 0.0625   Epoch: 4   Global Step: 70010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:09,327-Speed 9225.05 samples/sec   Loss 7.6722   LearningRate 0.0624   Epoch: 4   Global Step: 70020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:10,378-Speed 9747.67 samples/sec   Loss 7.6835   LearningRate 0.0624   Epoch: 4   Global Step: 70030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:11,462-Speed 9450.62 samples/sec   Loss 7.6394   LearningRate 0.0624   Epoch: 4   Global Step: 70040   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:22:12,541-Speed 9499.71 samples/sec   Loss 7.6854   LearningRate 0.0624   Epoch: 4   Global Step: 70050   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:22:13,639-Speed 9325.69 samples/sec   Loss 7.5419   LearningRate 0.0624   Epoch: 4   Global Step: 70060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:14,721-Speed 9477.25 samples/sec   Loss 7.7276   LearningRate 0.0624   Epoch: 4   Global Step: 70070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:15,768-Speed 9781.42 samples/sec   Loss 7.7028   LearningRate 0.0624   Epoch: 4   Global Step: 70080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:16,816-Speed 9779.31 samples/sec   Loss 7.6910   LearningRate 0.0624   Epoch: 4   Global Step: 70090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:17,919-Speed 9290.79 samples/sec   Loss 7.7285   LearningRate 0.0624   Epoch: 4   Global Step: 70100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:19,007-Speed 9412.94 samples/sec   Loss 7.5877   LearningRate 0.0624   Epoch: 4   Global Step: 70110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:20,130-Speed 9131.59 samples/sec   Loss 7.5968   LearningRate 0.0624   Epoch: 4   Global Step: 70120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:21,231-Speed 9303.45 samples/sec   Loss 7.7328   LearningRate 0.0624   Epoch: 4   Global Step: 70130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:22,284-Speed 9726.18 samples/sec   Loss 7.6851   LearningRate 0.0624   Epoch: 4   Global Step: 70140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:23,363-Speed 9495.58 samples/sec   Loss 7.6286   LearningRate 0.0624   Epoch: 4   Global Step: 70150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:24,476-Speed 9207.06 samples/sec   Loss 7.6216   LearningRate 0.0624   Epoch: 4   Global Step: 70160   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:22:25,620-Speed 8960.55 samples/sec   Loss 7.8567   LearningRate 0.0624   Epoch: 4   Global Step: 70170   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:22:26,694-Speed 9540.66 samples/sec   Loss 7.5610   LearningRate 0.0624   Epoch: 4   Global Step: 70180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:27,784-Speed 9426.87 samples/sec   Loss 7.5531   LearningRate 0.0624   Epoch: 4   Global Step: 70190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:28,902-Speed 9166.91 samples/sec   Loss 7.6241   LearningRate 0.0624   Epoch: 4   Global Step: 70200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:30,012-Speed 9225.57 samples/sec   Loss 7.7356   LearningRate 0.0624   Epoch: 4   Global Step: 70210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:31,120-Speed 9245.42 samples/sec   Loss 7.7479   LearningRate 0.0624   Epoch: 4   Global Step: 70220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:32,227-Speed 9256.54 samples/sec   Loss 7.6894   LearningRate 0.0623   Epoch: 4   Global Step: 70230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:33,330-Speed 9290.87 samples/sec   Loss 7.6973   LearningRate 0.0623   Epoch: 4   Global Step: 70240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:34,429-Speed 9319.84 samples/sec   Loss 7.6879   LearningRate 0.0623   Epoch: 4   Global Step: 70250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:35,567-Speed 9009.69 samples/sec   Loss 7.5463   LearningRate 0.0623   Epoch: 4   Global Step: 70260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:36,639-Speed 9550.01 samples/sec   Loss 7.5227   LearningRate 0.0623   Epoch: 4   Global Step: 70270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:37,727-Speed 9422.74 samples/sec   Loss 7.7614   LearningRate 0.0623   Epoch: 4   Global Step: 70280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:38,803-Speed 9516.31 samples/sec   Loss 7.6475   LearningRate 0.0623   Epoch: 4   Global Step: 70290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:39,878-Speed 9531.74 samples/sec   Loss 7.6472   LearningRate 0.0623   Epoch: 4   Global Step: 70300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:40,979-Speed 9315.24 samples/sec   Loss 7.5824   LearningRate 0.0623   Epoch: 4   Global Step: 70310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:42,080-Speed 9300.34 samples/sec   Loss 7.7547   LearningRate 0.0623   Epoch: 4   Global Step: 70320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:43,214-Speed 9040.00 samples/sec   Loss 7.6512   LearningRate 0.0623   Epoch: 4   Global Step: 70330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:44,258-Speed 9809.16 samples/sec   Loss 7.7607   LearningRate 0.0623   Epoch: 4   Global Step: 70340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:45,330-Speed 9553.51 samples/sec   Loss 7.7148   LearningRate 0.0623   Epoch: 4   Global Step: 70350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:46,396-Speed 9621.31 samples/sec   Loss 7.6979   LearningRate 0.0623   Epoch: 4   Global Step: 70360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:47,491-Speed 9354.71 samples/sec   Loss 7.7005   LearningRate 0.0623   Epoch: 4   Global Step: 70370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:48,567-Speed 9521.96 samples/sec   Loss 7.5061   LearningRate 0.0623   Epoch: 4   Global Step: 70380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:49,651-Speed 9454.18 samples/sec   Loss 7.6248   LearningRate 0.0623   Epoch: 4   Global Step: 70390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:50,764-Speed 9200.68 samples/sec   Loss 7.7108   LearningRate 0.0623   Epoch: 4   Global Step: 70400   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:22:51,848-Speed 9455.47 samples/sec   Loss 7.5916   LearningRate 0.0623   Epoch: 4   Global Step: 70410   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:22:52,984-Speed 9017.45 samples/sec   Loss 7.6787   LearningRate 0.0623   Epoch: 4   Global Step: 70420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:22:54,067-Speed 9466.49 samples/sec   Loss 7.5438   LearningRate 0.0623   Epoch: 4   Global Step: 70430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:55,175-Speed 9244.80 samples/sec   Loss 7.6729   LearningRate 0.0623   Epoch: 4   Global Step: 70440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:56,260-Speed 9442.97 samples/sec   Loss 7.6586   LearningRate 0.0622   Epoch: 4   Global Step: 70450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:57,390-Speed 9069.37 samples/sec   Loss 7.6598   LearningRate 0.0622   Epoch: 4   Global Step: 70460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:58,441-Speed 9747.22 samples/sec   Loss 7.6151   LearningRate 0.0622   Epoch: 4   Global Step: 70470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:22:59,548-Speed 9249.69 samples/sec   Loss 7.6811   LearningRate 0.0622   Epoch: 4   Global Step: 70480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:23:00,655-Speed 9261.55 samples/sec   Loss 7.5633   LearningRate 0.0622   Epoch: 4   Global Step: 70490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:23:01,772-Speed 9171.52 samples/sec   Loss 7.6966   LearningRate 0.0622   Epoch: 4   Global Step: 70500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:23:02,852-Speed 9483.50 samples/sec   Loss 7.5721   LearningRate 0.0622   Epoch: 4   Global Step: 70510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:23:03,927-Speed 9535.83 samples/sec   Loss 7.7301   LearningRate 0.0622   Epoch: 4   Global Step: 70520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:23:05,031-Speed 9284.79 samples/sec   Loss 7.5533   LearningRate 0.0622   Epoch: 4   Global Step: 70530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:06,130-Speed 9323.52 samples/sec   Loss 7.6423   LearningRate 0.0622   Epoch: 4   Global Step: 70540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:07,189-Speed 9669.69 samples/sec   Loss 7.6150   LearningRate 0.0622   Epoch: 4   Global Step: 70550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:08,312-Speed 9122.10 samples/sec   Loss 7.6652   LearningRate 0.0622   Epoch: 4   Global Step: 70560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:09,415-Speed 9290.68 samples/sec   Loss 7.6579   LearningRate 0.0622   Epoch: 4   Global Step: 70570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:10,501-Speed 9435.72 samples/sec   Loss 7.5890   LearningRate 0.0622   Epoch: 4   Global Step: 70580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:11,555-Speed 9724.53 samples/sec   Loss 7.6908   LearningRate 0.0622   Epoch: 4   Global Step: 70590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:12,631-Speed 9522.01 samples/sec   Loss 7.6703   LearningRate 0.0622   Epoch: 4   Global Step: 70600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:13,694-Speed 9640.63 samples/sec   Loss 7.7092   LearningRate 0.0622   Epoch: 4   Global Step: 70610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:14,777-Speed 9454.94 samples/sec   Loss 7.5271   LearningRate 0.0622   Epoch: 4   Global Step: 70620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:15,875-Speed 9337.11 samples/sec   Loss 7.6695   LearningRate 0.0622   Epoch: 4   Global Step: 70630   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:16,948-Speed 9544.63 samples/sec   Loss 7.6374   LearningRate 0.0622   Epoch: 4   Global Step: 70640   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:18,027-Speed 9505.42 samples/sec   Loss 7.7834   LearningRate 0.0622   Epoch: 4   Global Step: 70650   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:19,130-Speed 9289.27 samples/sec   Loss 7.7223   LearningRate 0.0621   Epoch: 4   Global Step: 70660   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:20,177-Speed 9784.28 samples/sec   Loss 7.7290   LearningRate 0.0621   Epoch: 4   Global Step: 70670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:21,268-Speed 9387.78 samples/sec   Loss 7.5355   LearningRate 0.0621   Epoch: 4   Global Step: 70680   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:22,331-Speed 9641.68 samples/sec   Loss 7.7627   LearningRate 0.0621   Epoch: 4   Global Step: 70690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:23,384-Speed 9735.55 samples/sec   Loss 7.6525   LearningRate 0.0621   Epoch: 4   Global Step: 70700   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:24,454-Speed 9577.44 samples/sec   Loss 7.6717   LearningRate 0.0621   Epoch: 4   Global Step: 70710   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:25,546-Speed 9381.22 samples/sec   Loss 7.5951   LearningRate 0.0621   Epoch: 4   Global Step: 70720   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:26,602-Speed 9694.96 samples/sec   Loss 7.7286   LearningRate 0.0621   Epoch: 4   Global Step: 70730   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:27,668-Speed 9616.03 samples/sec   Loss 7.7204   LearningRate 0.0621   Epoch: 4   Global Step: 70740   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:28,736-Speed 9595.65 samples/sec   Loss 7.7186   LearningRate 0.0621   Epoch: 4   Global Step: 70750   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:29,812-Speed 9520.11 samples/sec   Loss 7.7183   LearningRate 0.0621   Epoch: 4   Global Step: 70760   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:30,866-Speed 9716.77 samples/sec   Loss 7.7084   LearningRate 0.0621   Epoch: 4   Global Step: 70770   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:31,976-Speed 9232.05 samples/sec   Loss 7.6413   LearningRate 0.0621   Epoch: 4   Global Step: 70780   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:33,062-Speed 9429.82 samples/sec   Loss 7.7946   LearningRate 0.0621   Epoch: 4   Global Step: 70790   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:34,166-Speed 9291.34 samples/sec   Loss 7.6521   LearningRate 0.0621   Epoch: 4   Global Step: 70800   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:35,274-Speed 9244.98 samples/sec   Loss 7.5669   LearningRate 0.0621   Epoch: 4   Global Step: 70810   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:36,364-Speed 9403.02 samples/sec   Loss 7.6633   LearningRate 0.0621   Epoch: 4   Global Step: 70820   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:37,503-Speed 8995.99 samples/sec   Loss 7.7215   LearningRate 0.0621   Epoch: 4   Global Step: 70830   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:38,566-Speed 9639.29 samples/sec   Loss 7.6435   LearningRate 0.0621   Epoch: 4   Global Step: 70840   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:39,662-Speed 9344.18 samples/sec   Loss 7.7792   LearningRate 0.0621   Epoch: 4   Global Step: 70850   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:40,724-Speed 9651.33 samples/sec   Loss 7.7359   LearningRate 0.0621   Epoch: 4   Global Step: 70860   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:41,793-Speed 9588.89 samples/sec   Loss 7.7621   LearningRate 0.0620   Epoch: 4   Global Step: 70870   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:42,891-Speed 9329.18 samples/sec   Loss 7.7150   LearningRate 0.0620   Epoch: 4   Global Step: 70880   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:43,975-Speed 9456.67 samples/sec   Loss 7.6387   LearningRate 0.0620   Epoch: 4   Global Step: 70890   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:45,043-Speed 9589.23 samples/sec   Loss 7.6145   LearningRate 0.0620   Epoch: 4   Global Step: 70900   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:46,103-Speed 9666.93 samples/sec   Loss 7.6691   LearningRate 0.0620   Epoch: 4   Global Step: 70910   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:47,172-Speed 9586.43 samples/sec   Loss 7.6776   LearningRate 0.0620   Epoch: 4   Global Step: 70920   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:48,255-Speed 9458.84 samples/sec   Loss 7.7032   LearningRate 0.0620   Epoch: 4   Global Step: 70930   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:49,324-Speed 9579.31 samples/sec   Loss 7.6760   LearningRate 0.0620   Epoch: 4   Global Step: 70940   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:50,412-Speed 9423.28 samples/sec   Loss 7.7294   LearningRate 0.0620   Epoch: 4   Global Step: 70950   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:51,491-Speed 9490.19 samples/sec   Loss 7.6667   LearningRate 0.0620   Epoch: 4   Global Step: 70960   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:52,588-Speed 9348.24 samples/sec   Loss 7.7154   LearningRate 0.0620   Epoch: 4   Global Step: 70970   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:53,688-Speed 9314.30 samples/sec   Loss 7.6681   LearningRate 0.0620   Epoch: 4   Global Step: 70980   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:23:54,762-Speed 9543.20 samples/sec   Loss 7.6511   LearningRate 0.0620   Epoch: 4   Global Step: 70990   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:23:55,819-Speed 9690.51 samples/sec   Loss 7.6863   LearningRate 0.0620   Epoch: 4   Global Step: 71000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:23:56,867-Speed 9782.68 samples/sec   Loss 7.6567   LearningRate 0.0620   Epoch: 4   Global Step: 71010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:23:58,004-Speed 9009.70 samples/sec   Loss 7.6948   LearningRate 0.0620   Epoch: 4   Global Step: 71020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:23:59,087-Speed 9460.76 samples/sec   Loss 7.7254   LearningRate 0.0620   Epoch: 4   Global Step: 71030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:24:00,145-Speed 9677.98 samples/sec   Loss 7.6366   LearningRate 0.0620   Epoch: 4   Global Step: 71040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:24:01,237-Speed 9384.10 samples/sec   Loss 7.5290   LearningRate 0.0620   Epoch: 4   Global Step: 71050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:24:02,343-Speed 9268.29 samples/sec   Loss 7.6968   LearningRate 0.0620   Epoch: 4   Global Step: 71060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:24:03,408-Speed 9614.98 samples/sec   Loss 7.6482   LearningRate 0.0620   Epoch: 4   Global Step: 71070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:24:04,492-Speed 9456.76 samples/sec   Loss 7.6522   LearningRate 0.0619   Epoch: 4   Global Step: 71080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:24:05,619-Speed 9089.55 samples/sec   Loss 7.6873   LearningRate 0.0619   Epoch: 4   Global Step: 71090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:24:06,696-Speed 9514.62 samples/sec   Loss 7.6300   LearningRate 0.0619   Epoch: 4   Global Step: 71100   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:07,792-Speed 9349.96 samples/sec   Loss 7.6604   LearningRate 0.0619   Epoch: 4   Global Step: 71110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:08,900-Speed 9239.64 samples/sec   Loss 7.6503   LearningRate 0.0619   Epoch: 4   Global Step: 71120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:09,999-Speed 9323.70 samples/sec   Loss 7.6550   LearningRate 0.0619   Epoch: 4   Global Step: 71130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:11,052-Speed 9735.56 samples/sec   Loss 7.6706   LearningRate 0.0619   Epoch: 4   Global Step: 71140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:12,139-Speed 9433.23 samples/sec   Loss 7.6529   LearningRate 0.0619   Epoch: 4   Global Step: 71150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:13,233-Speed 9361.83 samples/sec   Loss 7.6979   LearningRate 0.0619   Epoch: 4   Global Step: 71160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:14,317-Speed 9453.03 samples/sec   Loss 7.7015   LearningRate 0.0619   Epoch: 4   Global Step: 71170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:15,390-Speed 9550.48 samples/sec   Loss 7.5794   LearningRate 0.0619   Epoch: 4   Global Step: 71180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:16,501-Speed 9217.63 samples/sec   Loss 7.6182   LearningRate 0.0619   Epoch: 4   Global Step: 71190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:17,566-Speed 9626.43 samples/sec   Loss 7.7532   LearningRate 0.0619   Epoch: 4   Global Step: 71200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:18,618-Speed 9732.10 samples/sec   Loss 7.7310   LearningRate 0.0619   Epoch: 4   Global Step: 71210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:19,692-Speed 9546.69 samples/sec   Loss 7.6483   LearningRate 0.0619   Epoch: 4   Global Step: 71220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:20,800-Speed 9238.61 samples/sec   Loss 7.6513   LearningRate 0.0619   Epoch: 4   Global Step: 71230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:21,889-Speed 9415.88 samples/sec   Loss 7.6608   LearningRate 0.0619   Epoch: 4   Global Step: 71240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:23,008-Speed 9149.68 samples/sec   Loss 7.6373   LearningRate 0.0619   Epoch: 4   Global Step: 71250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:24,134-Speed 9110.00 samples/sec   Loss 7.5746   LearningRate 0.0619   Epoch: 4   Global Step: 71260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:25,215-Speed 9472.36 samples/sec   Loss 7.6750   LearningRate 0.0619   Epoch: 4   Global Step: 71270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:26,298-Speed 9462.68 samples/sec   Loss 7.6608   LearningRate 0.0619   Epoch: 4   Global Step: 71280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:27,415-Speed 9175.50 samples/sec   Loss 7.6530   LearningRate 0.0618   Epoch: 4   Global Step: 71290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:28,543-Speed 9090.68 samples/sec   Loss 7.7215   LearningRate 0.0618   Epoch: 4   Global Step: 71300   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:24:29,655-Speed 9218.95 samples/sec   Loss 7.6042   LearningRate 0.0618   Epoch: 4   Global Step: 71310   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:30,759-Speed 9279.54 samples/sec   Loss 7.5946   LearningRate 0.0618   Epoch: 4   Global Step: 71320   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:31,853-Speed 9369.01 samples/sec   Loss 7.6070   LearningRate 0.0618   Epoch: 4   Global Step: 71330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:32,936-Speed 9453.91 samples/sec   Loss 7.6135   LearningRate 0.0618   Epoch: 4   Global Step: 71340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:34,085-Speed 8921.76 samples/sec   Loss 7.7572   LearningRate 0.0618   Epoch: 4   Global Step: 71350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:35,198-Speed 9205.51 samples/sec   Loss 7.6638   LearningRate 0.0618   Epoch: 4   Global Step: 71360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:36,297-Speed 9326.80 samples/sec   Loss 7.6283   LearningRate 0.0618   Epoch: 4   Global Step: 71370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:37,361-Speed 9626.05 samples/sec   Loss 7.7290   LearningRate 0.0618   Epoch: 4   Global Step: 71380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:38,406-Speed 9800.19 samples/sec   Loss 7.6450   LearningRate 0.0618   Epoch: 4   Global Step: 71390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:39,483-Speed 9516.40 samples/sec   Loss 7.7314   LearningRate 0.0618   Epoch: 4   Global Step: 71400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:40,606-Speed 9125.46 samples/sec   Loss 7.5976   LearningRate 0.0618   Epoch: 4   Global Step: 71410   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:24:41,690-Speed 9449.80 samples/sec   Loss 7.6678   LearningRate 0.0618   Epoch: 4   Global Step: 71420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:42,786-Speed 9363.07 samples/sec   Loss 7.7023   LearningRate 0.0618   Epoch: 4   Global Step: 71430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:43,860-Speed 9539.15 samples/sec   Loss 7.7109   LearningRate 0.0618   Epoch: 4   Global Step: 71440   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:44,889-Speed 9950.90 samples/sec   Loss 7.7611   LearningRate 0.0618   Epoch: 4   Global Step: 71450   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:45,923-Speed 9912.42 samples/sec   Loss 7.6449   LearningRate 0.0618   Epoch: 4   Global Step: 71460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:46,982-Speed 9671.54 samples/sec   Loss 7.5559   LearningRate 0.0618   Epoch: 4   Global Step: 71470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:48,035-Speed 9731.63 samples/sec   Loss 7.6977   LearningRate 0.0618   Epoch: 4   Global Step: 71480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:49,072-Speed 9886.86 samples/sec   Loss 7.7235   LearningRate 0.0618   Epoch: 4   Global Step: 71490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:50,139-Speed 9598.19 samples/sec   Loss 7.7156   LearningRate 0.0618   Epoch: 4   Global Step: 71500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:51,227-Speed 9419.44 samples/sec   Loss 7.6688   LearningRate 0.0617   Epoch: 4   Global Step: 71510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:52,283-Speed 9698.22 samples/sec   Loss 7.6616   LearningRate 0.0617   Epoch: 4   Global Step: 71520   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:24:53,383-Speed 9321.94 samples/sec   Loss 7.6553   LearningRate 0.0617   Epoch: 4   Global Step: 71530   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:24:54,449-Speed 9611.82 samples/sec   Loss 7.7888   LearningRate 0.0617   Epoch: 4   Global Step: 71540   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:24:55,548-Speed 9317.02 samples/sec   Loss 7.6183   LearningRate 0.0617   Epoch: 4   Global Step: 71550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:56,629-Speed 9485.01 samples/sec   Loss 7.7191   LearningRate 0.0617   Epoch: 4   Global Step: 71560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:57,724-Speed 9350.25 samples/sec   Loss 7.6636   LearningRate 0.0617   Epoch: 4   Global Step: 71570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:58,777-Speed 9735.97 samples/sec   Loss 7.6802   LearningRate 0.0617   Epoch: 4   Global Step: 71580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:24:59,858-Speed 9477.84 samples/sec   Loss 7.6208   LearningRate 0.0617   Epoch: 4   Global Step: 71590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:25:00,902-Speed 9814.99 samples/sec   Loss 7.6694   LearningRate 0.0617   Epoch: 4   Global Step: 71600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:25:01,988-Speed 9430.26 samples/sec   Loss 7.7439   LearningRate 0.0617   Epoch: 4   Global Step: 71610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:25:03,063-Speed 9529.17 samples/sec   Loss 7.7816   LearningRate 0.0617   Epoch: 4   Global Step: 71620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:25:04,136-Speed 9556.50 samples/sec   Loss 7.6320   LearningRate 0.0617   Epoch: 4   Global Step: 71630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:25:05,196-Speed 9663.04 samples/sec   Loss 7.7095   LearningRate 0.0617   Epoch: 4   Global Step: 71640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:25:06,259-Speed 9638.55 samples/sec   Loss 7.7807   LearningRate 0.0617   Epoch: 4   Global Step: 71650   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:25:07,395-Speed 9022.31 samples/sec   Loss 7.7595   LearningRate 0.0617   Epoch: 4   Global Step: 71660   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:25:08,489-Speed 9373.13 samples/sec   Loss 7.7536   LearningRate 0.0617   Epoch: 4   Global Step: 71670   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:25:09,555-Speed 9613.53 samples/sec   Loss 7.7601   LearningRate 0.0617   Epoch: 4   Global Step: 71680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:25:10,640-Speed 9439.77 samples/sec   Loss 7.7891   LearningRate 0.0617   Epoch: 4   Global Step: 71690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:25:11,735-Speed 9357.32 samples/sec   Loss 7.6829   LearningRate 0.0617   Epoch: 4   Global Step: 71700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:25:12,833-Speed 9329.73 samples/sec   Loss 7.6498   LearningRate 0.0617   Epoch: 4   Global Step: 71710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:25:13,953-Speed 9151.89 samples/sec   Loss 7.7418   LearningRate 0.0616   Epoch: 4   Global Step: 71720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-11 14:25:15,040-Speed 9427.67 samples/sec   Loss 7.6047   LearningRate 0.0616   Epoch: 4   Global Step: 71730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:16,122-Speed 9465.18 samples/sec   Loss 7.6466   LearningRate 0.0616   Epoch: 4   Global Step: 71740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:17,204-Speed 9470.55 samples/sec   Loss 7.7062   LearningRate 0.0616   Epoch: 4   Global Step: 71750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:18,297-Speed 9375.37 samples/sec   Loss 7.7431   LearningRate 0.0616   Epoch: 4   Global Step: 71760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:19,358-Speed 9660.09 samples/sec   Loss 7.6771   LearningRate 0.0616   Epoch: 4   Global Step: 71770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:20,431-Speed 9543.38 samples/sec   Loss 7.6095   LearningRate 0.0616   Epoch: 4   Global Step: 71780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:21,530-Speed 9321.44 samples/sec   Loss 7.7624   LearningRate 0.0616   Epoch: 4   Global Step: 71790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:22,623-Speed 9377.42 samples/sec   Loss 7.6001   LearningRate 0.0616   Epoch: 4   Global Step: 71800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:23,716-Speed 9373.82 samples/sec   Loss 7.7355   LearningRate 0.0616   Epoch: 4   Global Step: 71810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:24,802-Speed 9429.46 samples/sec   Loss 7.6223   LearningRate 0.0616   Epoch: 4   Global Step: 71820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:25,938-Speed 9022.52 samples/sec   Loss 7.7032   LearningRate 0.0616   Epoch: 4   Global Step: 71830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:27,009-Speed 9568.86 samples/sec   Loss 7.7544   LearningRate 0.0616   Epoch: 4   Global Step: 71840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:28,042-Speed 9921.56 samples/sec   Loss 7.6108   LearningRate 0.0616   Epoch: 4   Global Step: 71850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:29,100-Speed 9681.70 samples/sec   Loss 7.7560   LearningRate 0.0616   Epoch: 4   Global Step: 71860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:30,145-Speed 9808.45 samples/sec   Loss 7.7065   LearningRate 0.0616   Epoch: 4   Global Step: 71870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:31,266-Speed 9147.20 samples/sec   Loss 7.6847   LearningRate 0.0616   Epoch: 4   Global Step: 71880   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:25:32,360-Speed 9360.15 samples/sec   Loss 7.7373   LearningRate 0.0616   Epoch: 4   Global Step: 71890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:25:33,409-Speed 9768.88 samples/sec   Loss 7.6493   LearningRate 0.0616   Epoch: 4   Global Step: 71900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:34,486-Speed 9516.33 samples/sec   Loss 7.8857   LearningRate 0.0616   Epoch: 4   Global Step: 71910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:35,584-Speed 9328.39 samples/sec   Loss 7.7622   LearningRate 0.0616   Epoch: 4   Global Step: 71920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:36,648-Speed 9632.04 samples/sec   Loss 7.7293   LearningRate 0.0615   Epoch: 4   Global Step: 71930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:37,722-Speed 9543.57 samples/sec   Loss 7.6645   LearningRate 0.0615   Epoch: 4   Global Step: 71940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:38,770-Speed 9777.08 samples/sec   Loss 7.7445   LearningRate 0.0615   Epoch: 4   Global Step: 71950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:39,812-Speed 9831.37 samples/sec   Loss 7.7587   LearningRate 0.0615   Epoch: 4   Global Step: 71960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:40,881-Speed 9581.78 samples/sec   Loss 7.7119   LearningRate 0.0615   Epoch: 4   Global Step: 71970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:41,961-Speed 9489.02 samples/sec   Loss 7.6899   LearningRate 0.0615   Epoch: 4   Global Step: 71980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:43,037-Speed 9522.09 samples/sec   Loss 7.7383   LearningRate 0.0615   Epoch: 4   Global Step: 71990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:25:44,132-Speed 9360.48 samples/sec   Loss 7.7833   LearningRate 0.0615   Epoch: 4   Global Step: 72000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:26:06,353-[lfw][72000]XNorm: 12.254861
Training: 2022-04-11 14:26:06,353-[lfw][72000]Accuracy-Flip: 0.99550+-0.00279
Training: 2022-04-11 14:26:06,354-[lfw][72000]Accuracy-Highest: 0.99583
Training: 2022-04-11 14:26:31,903-[cfp_fp][72000]XNorm: 10.391574
Training: 2022-04-11 14:26:31,904-[cfp_fp][72000]Accuracy-Flip: 0.94943+-0.01196
Training: 2022-04-11 14:26:31,904-[cfp_fp][72000]Accuracy-Highest: 0.95171
Training: 2022-04-11 14:26:53,978-[agedb_30][72000]XNorm: 11.810187
Training: 2022-04-11 14:26:53,978-[agedb_30][72000]Accuracy-Flip: 0.95900+-0.00867
Training: 2022-04-11 14:26:53,979-[agedb_30][72000]Accuracy-Highest: 0.96033
Training: 2022-04-11 14:26:55,068-Speed 144.36 samples/sec   Loss 7.6925   LearningRate 0.0615   Epoch: 4   Global Step: 72010   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:26:56,125-Speed 9687.95 samples/sec   Loss 7.7176   LearningRate 0.0615   Epoch: 4   Global Step: 72020   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:26:57,179-Speed 9720.98 samples/sec   Loss 7.6447   LearningRate 0.0615   Epoch: 4   Global Step: 72030   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:26:58,264-Speed 9442.91 samples/sec   Loss 7.7842   LearningRate 0.0615   Epoch: 4   Global Step: 72040   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:26:59,354-Speed 9403.19 samples/sec   Loss 7.7173   LearningRate 0.0615   Epoch: 4   Global Step: 72050   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:00,455-Speed 9304.56 samples/sec   Loss 7.8604   LearningRate 0.0615   Epoch: 4   Global Step: 72060   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:01,534-Speed 9505.46 samples/sec   Loss 7.6283   LearningRate 0.0615   Epoch: 4   Global Step: 72070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:02,575-Speed 9845.13 samples/sec   Loss 7.6656   LearningRate 0.0615   Epoch: 4   Global Step: 72080   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:03,653-Speed 9501.66 samples/sec   Loss 7.7763   LearningRate 0.0615   Epoch: 4   Global Step: 72090   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:04,724-Speed 9570.97 samples/sec   Loss 7.7590   LearningRate 0.0615   Epoch: 4   Global Step: 72100   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:27:05,790-Speed 9610.04 samples/sec   Loss 7.7062   LearningRate 0.0615   Epoch: 4   Global Step: 72110   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:06,858-Speed 9590.82 samples/sec   Loss 7.7312   LearningRate 0.0615   Epoch: 4   Global Step: 72120   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:07,919-Speed 9659.47 samples/sec   Loss 7.7742   LearningRate 0.0615   Epoch: 4   Global Step: 72130   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:08,988-Speed 9578.06 samples/sec   Loss 7.7106   LearningRate 0.0614   Epoch: 4   Global Step: 72140   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:10,056-Speed 9596.50 samples/sec   Loss 7.7237   LearningRate 0.0614   Epoch: 4   Global Step: 72150   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:11,155-Speed 9322.80 samples/sec   Loss 7.7375   LearningRate 0.0614   Epoch: 4   Global Step: 72160   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:12,268-Speed 9209.48 samples/sec   Loss 7.6695   LearningRate 0.0614   Epoch: 4   Global Step: 72170   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:13,339-Speed 9568.02 samples/sec   Loss 7.8288   LearningRate 0.0614   Epoch: 4   Global Step: 72180   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:14,383-Speed 9811.12 samples/sec   Loss 7.7334   LearningRate 0.0614   Epoch: 4   Global Step: 72190   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:15,439-Speed 9702.23 samples/sec   Loss 7.7380   LearningRate 0.0614   Epoch: 4   Global Step: 72200   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:16,535-Speed 9350.75 samples/sec   Loss 7.6980   LearningRate 0.0614   Epoch: 4   Global Step: 72210   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:17,627-Speed 9383.36 samples/sec   Loss 7.6565   LearningRate 0.0614   Epoch: 4   Global Step: 72220   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:18,697-Speed 9576.29 samples/sec   Loss 7.6845   LearningRate 0.0614   Epoch: 4   Global Step: 72230   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:19,796-Speed 9323.63 samples/sec   Loss 7.7047   LearningRate 0.0614   Epoch: 4   Global Step: 72240   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:20,890-Speed 9371.79 samples/sec   Loss 7.7286   LearningRate 0.0614   Epoch: 4   Global Step: 72250   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:22,020-Speed 9063.21 samples/sec   Loss 7.6936   LearningRate 0.0614   Epoch: 4   Global Step: 72260   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:23,124-Speed 9283.80 samples/sec   Loss 7.7752   LearningRate 0.0614   Epoch: 4   Global Step: 72270   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:24,183-Speed 9673.55 samples/sec   Loss 7.7234   LearningRate 0.0614   Epoch: 4   Global Step: 72280   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:25,258-Speed 9524.50 samples/sec   Loss 7.7919   LearningRate 0.0614   Epoch: 4   Global Step: 72290   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:26,353-Speed 9361.48 samples/sec   Loss 7.5490   LearningRate 0.0614   Epoch: 4   Global Step: 72300   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:27,454-Speed 9305.25 samples/sec   Loss 7.5429   LearningRate 0.0614   Epoch: 4   Global Step: 72310   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:27:28,529-Speed 9533.45 samples/sec   Loss 7.6734   LearningRate 0.0614   Epoch: 4   Global Step: 72320   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:27:29,624-Speed 9362.00 samples/sec   Loss 7.6521   LearningRate 0.0614   Epoch: 4   Global Step: 72330   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:27:30,739-Speed 9189.24 samples/sec   Loss 7.7338   LearningRate 0.0614   Epoch: 4   Global Step: 72340   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:31,812-Speed 9544.94 samples/sec   Loss 7.7163   LearningRate 0.0614   Epoch: 4   Global Step: 72350   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:32,869-Speed 9693.97 samples/sec   Loss 7.5407   LearningRate 0.0613   Epoch: 4   Global Step: 72360   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:33,922-Speed 9735.87 samples/sec   Loss 7.8638   LearningRate 0.0613   Epoch: 4   Global Step: 72370   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:35,041-Speed 9150.41 samples/sec   Loss 7.6920   LearningRate 0.0613   Epoch: 4   Global Step: 72380   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:36,126-Speed 9444.22 samples/sec   Loss 7.7430   LearningRate 0.0613   Epoch: 4   Global Step: 72390   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:37,234-Speed 9249.63 samples/sec   Loss 7.7236   LearningRate 0.0613   Epoch: 4   Global Step: 72400   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:38,315-Speed 9481.14 samples/sec   Loss 7.5874   LearningRate 0.0613   Epoch: 4   Global Step: 72410   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:39,425-Speed 9229.67 samples/sec   Loss 7.6475   LearningRate 0.0613   Epoch: 4   Global Step: 72420   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:40,566-Speed 8975.04 samples/sec   Loss 7.8037   LearningRate 0.0613   Epoch: 4   Global Step: 72430   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:41,641-Speed 9528.74 samples/sec   Loss 7.6763   LearningRate 0.0613   Epoch: 4   Global Step: 72440   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:27:42,706-Speed 9626.85 samples/sec   Loss 7.7216   LearningRate 0.0613   Epoch: 4   Global Step: 72450   Fp16 Grad Scale: 262144   Required: 11 hours
Training: 2022-04-11 14:27:43,776-Speed 9569.47 samples/sec   Loss 7.6346   LearningRate 0.0613   Epoch: 4   Global Step: 72460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:44,887-Speed 9228.20 samples/sec   Loss 7.7020   LearningRate 0.0613   Epoch: 4   Global Step: 72470   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:45,963-Speed 9522.40 samples/sec   Loss 7.7713   LearningRate 0.0613   Epoch: 4   Global Step: 72480   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:47,048-Speed 9446.74 samples/sec   Loss 7.7897   LearningRate 0.0613   Epoch: 4   Global Step: 72490   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:48,167-Speed 9149.76 samples/sec   Loss 7.7014   LearningRate 0.0613   Epoch: 4   Global Step: 72500   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:49,252-Speed 9450.96 samples/sec   Loss 7.7439   LearningRate 0.0613   Epoch: 4   Global Step: 72510   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:50,377-Speed 9108.00 samples/sec   Loss 7.6534   LearningRate 0.0613   Epoch: 4   Global Step: 72520   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:51,444-Speed 9609.21 samples/sec   Loss 7.6724   LearningRate 0.0613   Epoch: 4   Global Step: 72530   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:52,535-Speed 9386.53 samples/sec   Loss 7.7020   LearningRate 0.0613   Epoch: 4   Global Step: 72540   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:53,573-Speed 9878.26 samples/sec   Loss 7.5872   LearningRate 0.0613   Epoch: 4   Global Step: 72550   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:54,635-Speed 9640.28 samples/sec   Loss 7.7825   LearningRate 0.0613   Epoch: 4   Global Step: 72560   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:55,727-Speed 9390.58 samples/sec   Loss 7.7078   LearningRate 0.0612   Epoch: 4   Global Step: 72570   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:56,788-Speed 9657.13 samples/sec   Loss 7.6699   LearningRate 0.0612   Epoch: 4   Global Step: 72580   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:57,874-Speed 9433.60 samples/sec   Loss 7.6908   LearningRate 0.0612   Epoch: 4   Global Step: 72590   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:27:59,014-Speed 8984.19 samples/sec   Loss 7.7327   LearningRate 0.0612   Epoch: 4   Global Step: 72600   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:28:00,090-Speed 9523.20 samples/sec   Loss 7.6681   LearningRate 0.0612   Epoch: 4   Global Step: 72610   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:28:01,180-Speed 9397.36 samples/sec   Loss 7.7447   LearningRate 0.0612   Epoch: 4   Global Step: 72620   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:28:02,274-Speed 9364.85 samples/sec   Loss 7.7184   LearningRate 0.0612   Epoch: 4   Global Step: 72630   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:28:03,388-Speed 9199.73 samples/sec   Loss 7.7549   LearningRate 0.0612   Epoch: 4   Global Step: 72640   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-04-11 14:28:04,488-Speed 9313.13 samples/sec   Loss 7.6965   LearningRate 0.0612   Epoch: 4   Global Step: 72650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:05,622-Speed 9040.80 samples/sec   Loss 7.7118   LearningRate 0.0612   Epoch: 4   Global Step: 72660   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:28:06,692-Speed 9572.51 samples/sec   Loss 7.7246   LearningRate 0.0612   Epoch: 4   Global Step: 72670   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:28:07,806-Speed 9203.41 samples/sec   Loss 7.6653   LearningRate 0.0612   Epoch: 4   Global Step: 72680   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:28:08,867-Speed 9653.74 samples/sec   Loss 7.7023   LearningRate 0.0612   Epoch: 4   Global Step: 72690   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:28:09,946-Speed 9502.38 samples/sec   Loss 7.7217   LearningRate 0.0612   Epoch: 4   Global Step: 72700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:11,039-Speed 9371.04 samples/sec   Loss 7.7522   LearningRate 0.0612   Epoch: 4   Global Step: 72710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:12,120-Speed 9476.53 samples/sec   Loss 7.7470   LearningRate 0.0612   Epoch: 4   Global Step: 72720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:13,202-Speed 9469.70 samples/sec   Loss 7.6959   LearningRate 0.0612   Epoch: 4   Global Step: 72730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:14,310-Speed 9249.06 samples/sec   Loss 7.8107   LearningRate 0.0612   Epoch: 4   Global Step: 72740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:15,383-Speed 9549.44 samples/sec   Loss 7.7913   LearningRate 0.0612   Epoch: 4   Global Step: 72750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:16,486-Speed 9285.42 samples/sec   Loss 7.7274   LearningRate 0.0612   Epoch: 4   Global Step: 72760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:17,576-Speed 9405.48 samples/sec   Loss 7.6315   LearningRate 0.0612   Epoch: 4   Global Step: 72770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:18,648-Speed 9549.16 samples/sec   Loss 7.6407   LearningRate 0.0611   Epoch: 4   Global Step: 72780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:19,732-Speed 9463.87 samples/sec   Loss 7.6696   LearningRate 0.0611   Epoch: 4   Global Step: 72790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:20,788-Speed 9694.34 samples/sec   Loss 7.5945   LearningRate 0.0611   Epoch: 4   Global Step: 72800   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:28:21,877-Speed 9409.85 samples/sec   Loss 7.7845   LearningRate 0.0611   Epoch: 4   Global Step: 72810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:22,997-Speed 9146.35 samples/sec   Loss 7.7850   LearningRate 0.0611   Epoch: 4   Global Step: 72820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:24,141-Speed 8958.24 samples/sec   Loss 7.6653   LearningRate 0.0611   Epoch: 4   Global Step: 72830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:25,241-Speed 9315.29 samples/sec   Loss 7.5994   LearningRate 0.0611   Epoch: 4   Global Step: 72840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:26,338-Speed 9338.51 samples/sec   Loss 7.7591   LearningRate 0.0611   Epoch: 4   Global Step: 72850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:27,421-Speed 9463.23 samples/sec   Loss 7.7713   LearningRate 0.0611   Epoch: 4   Global Step: 72860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:28,451-Speed 9956.18 samples/sec   Loss 7.7262   LearningRate 0.0611   Epoch: 4   Global Step: 72870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:29,480-Speed 9953.33 samples/sec   Loss 7.6434   LearningRate 0.0611   Epoch: 4   Global Step: 72880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:30,560-Speed 9489.80 samples/sec   Loss 7.7082   LearningRate 0.0611   Epoch: 4   Global Step: 72890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:31,623-Speed 9637.61 samples/sec   Loss 7.8148   LearningRate 0.0611   Epoch: 4   Global Step: 72900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:32,709-Speed 9433.78 samples/sec   Loss 7.7894   LearningRate 0.0611   Epoch: 4   Global Step: 72910   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:28:33,817-Speed 9244.47 samples/sec   Loss 7.7318   LearningRate 0.0611   Epoch: 4   Global Step: 72920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:34,954-Speed 9012.89 samples/sec   Loss 7.7790   LearningRate 0.0611   Epoch: 4   Global Step: 72930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:36,005-Speed 9745.23 samples/sec   Loss 7.6325   LearningRate 0.0611   Epoch: 4   Global Step: 72940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:37,103-Speed 9331.47 samples/sec   Loss 7.6064   LearningRate 0.0611   Epoch: 4   Global Step: 72950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:38,238-Speed 9028.67 samples/sec   Loss 7.7517   LearningRate 0.0611   Epoch: 4   Global Step: 72960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:39,289-Speed 9751.20 samples/sec   Loss 7.6306   LearningRate 0.0611   Epoch: 4   Global Step: 72970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:40,410-Speed 9144.45 samples/sec   Loss 7.5700   LearningRate 0.0611   Epoch: 4   Global Step: 72980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:41,538-Speed 9075.55 samples/sec   Loss 7.9058   LearningRate 0.0611   Epoch: 4   Global Step: 72990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:42,620-Speed 9473.67 samples/sec   Loss 7.7986   LearningRate 0.0610   Epoch: 4   Global Step: 73000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:43,700-Speed 9486.68 samples/sec   Loss 7.7305   LearningRate 0.0610   Epoch: 4   Global Step: 73010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:44,770-Speed 9574.31 samples/sec   Loss 7.6833   LearningRate 0.0610   Epoch: 4   Global Step: 73020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:45,862-Speed 9391.32 samples/sec   Loss 7.5371   LearningRate 0.0610   Epoch: 4   Global Step: 73030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:28:46,943-Speed 9473.38 samples/sec   Loss 7.5939   LearningRate 0.0610   Epoch: 4   Global Step: 73040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:28:47,992-Speed 9772.81 samples/sec   Loss 7.7248   LearningRate 0.0610   Epoch: 4   Global Step: 73050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:28:49,068-Speed 9526.85 samples/sec   Loss 7.7862   LearningRate 0.0610   Epoch: 4   Global Step: 73060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:28:50,153-Speed 9441.98 samples/sec   Loss 7.7330   LearningRate 0.0610   Epoch: 4   Global Step: 73070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:28:51,194-Speed 9837.88 samples/sec   Loss 7.5012   LearningRate 0.0610   Epoch: 4   Global Step: 73080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:28:52,253-Speed 9673.75 samples/sec   Loss 7.6964   LearningRate 0.0610   Epoch: 4   Global Step: 73090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:28:53,352-Speed 9327.29 samples/sec   Loss 7.6438   LearningRate 0.0610   Epoch: 4   Global Step: 73100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:28:54,397-Speed 9804.18 samples/sec   Loss 7.6869   LearningRate 0.0610   Epoch: 4   Global Step: 73110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:28:55,527-Speed 9071.89 samples/sec   Loss 7.8086   LearningRate 0.0610   Epoch: 4   Global Step: 73120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:28:56,620-Speed 9373.01 samples/sec   Loss 7.7271   LearningRate 0.0610   Epoch: 4   Global Step: 73130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:57,688-Speed 9593.50 samples/sec   Loss 7.6008   LearningRate 0.0610   Epoch: 4   Global Step: 73140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:58,760-Speed 9557.45 samples/sec   Loss 7.6304   LearningRate 0.0610   Epoch: 4   Global Step: 73150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:28:59,864-Speed 9278.91 samples/sec   Loss 7.6955   LearningRate 0.0610   Epoch: 4   Global Step: 73160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:00,966-Speed 9299.68 samples/sec   Loss 7.7380   LearningRate 0.0610   Epoch: 4   Global Step: 73170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:02,074-Speed 9245.28 samples/sec   Loss 7.8284   LearningRate 0.0610   Epoch: 4   Global Step: 73180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:03,190-Speed 9175.17 samples/sec   Loss 7.7570   LearningRate 0.0610   Epoch: 4   Global Step: 73190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:04,276-Speed 9438.27 samples/sec   Loss 7.7772   LearningRate 0.0610   Epoch: 4   Global Step: 73200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:05,387-Speed 9223.49 samples/sec   Loss 7.6842   LearningRate 0.0609   Epoch: 4   Global Step: 73210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:06,463-Speed 9525.68 samples/sec   Loss 7.6894   LearningRate 0.0609   Epoch: 4   Global Step: 73220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:07,598-Speed 9030.23 samples/sec   Loss 7.7351   LearningRate 0.0609   Epoch: 4   Global Step: 73230   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:29:08,676-Speed 9501.26 samples/sec   Loss 7.7049   LearningRate 0.0609   Epoch: 4   Global Step: 73240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:09,781-Speed 9269.87 samples/sec   Loss 7.8200   LearningRate 0.0609   Epoch: 4   Global Step: 73250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:10,874-Speed 9374.82 samples/sec   Loss 7.6273   LearningRate 0.0609   Epoch: 4   Global Step: 73260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:11,956-Speed 9469.71 samples/sec   Loss 7.7312   LearningRate 0.0609   Epoch: 4   Global Step: 73270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:13,120-Speed 8800.06 samples/sec   Loss 7.5612   LearningRate 0.0609   Epoch: 4   Global Step: 73280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:14,228-Speed 9246.47 samples/sec   Loss 7.5990   LearningRate 0.0609   Epoch: 4   Global Step: 73290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:15,332-Speed 9287.17 samples/sec   Loss 7.8705   LearningRate 0.0609   Epoch: 4   Global Step: 73300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:16,426-Speed 9360.11 samples/sec   Loss 7.6854   LearningRate 0.0609   Epoch: 4   Global Step: 73310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:17,580-Speed 8877.18 samples/sec   Loss 7.8262   LearningRate 0.0609   Epoch: 4   Global Step: 73320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:18,675-Speed 9361.88 samples/sec   Loss 7.7480   LearningRate 0.0609   Epoch: 4   Global Step: 73330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:19,756-Speed 9480.82 samples/sec   Loss 7.8457   LearningRate 0.0609   Epoch: 4   Global Step: 73340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:20,877-Speed 9138.64 samples/sec   Loss 7.6272   LearningRate 0.0609   Epoch: 4   Global Step: 73350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:21,967-Speed 9397.32 samples/sec   Loss 7.8175   LearningRate 0.0609   Epoch: 4   Global Step: 73360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:23,079-Speed 9218.44 samples/sec   Loss 7.7287   LearningRate 0.0609   Epoch: 4   Global Step: 73370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:24,164-Speed 9439.96 samples/sec   Loss 7.6978   LearningRate 0.0609   Epoch: 4   Global Step: 73380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:25,235-Speed 9568.86 samples/sec   Loss 7.6075   LearningRate 0.0609   Epoch: 4   Global Step: 73390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:26,319-Speed 9451.29 samples/sec   Loss 7.6173   LearningRate 0.0609   Epoch: 4   Global Step: 73400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:27,375-Speed 9705.05 samples/sec   Loss 7.6824   LearningRate 0.0609   Epoch: 4   Global Step: 73410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:28,464-Speed 9405.40 samples/sec   Loss 7.7179   LearningRate 0.0608   Epoch: 4   Global Step: 73420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:29,546-Speed 9467.81 samples/sec   Loss 7.6892   LearningRate 0.0608   Epoch: 4   Global Step: 73430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:30,616-Speed 9578.56 samples/sec   Loss 7.6587   LearningRate 0.0608   Epoch: 4   Global Step: 73440   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:29:31,694-Speed 9503.63 samples/sec   Loss 7.6725   LearningRate 0.0608   Epoch: 4   Global Step: 73450   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:29:32,786-Speed 9382.82 samples/sec   Loss 7.7089   LearningRate 0.0608   Epoch: 4   Global Step: 73460   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:29:33,890-Speed 9275.96 samples/sec   Loss 7.7386   LearningRate 0.0608   Epoch: 4   Global Step: 73470   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:29:34,990-Speed 9321.20 samples/sec   Loss 7.7688   LearningRate 0.0608   Epoch: 4   Global Step: 73480   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:29:36,072-Speed 9464.32 samples/sec   Loss 7.6988   LearningRate 0.0608   Epoch: 4   Global Step: 73490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:37,174-Speed 9302.10 samples/sec   Loss 7.7810   LearningRate 0.0608   Epoch: 4   Global Step: 73500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:38,266-Speed 9376.24 samples/sec   Loss 7.7324   LearningRate 0.0608   Epoch: 4   Global Step: 73510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:39,375-Speed 9241.99 samples/sec   Loss 7.6737   LearningRate 0.0608   Epoch: 4   Global Step: 73520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:40,506-Speed 9062.37 samples/sec   Loss 7.7418   LearningRate 0.0608   Epoch: 4   Global Step: 73530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:41,604-Speed 9332.86 samples/sec   Loss 7.6988   LearningRate 0.0608   Epoch: 4   Global Step: 73540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:42,720-Speed 9178.73 samples/sec   Loss 7.6309   LearningRate 0.0608   Epoch: 4   Global Step: 73550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:43,784-Speed 9631.46 samples/sec   Loss 7.7244   LearningRate 0.0608   Epoch: 4   Global Step: 73560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:44,872-Speed 9418.07 samples/sec   Loss 7.7401   LearningRate 0.0608   Epoch: 4   Global Step: 73570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:45,930-Speed 9680.37 samples/sec   Loss 7.7200   LearningRate 0.0608   Epoch: 4   Global Step: 73580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:47,009-Speed 9497.03 samples/sec   Loss 7.7319   LearningRate 0.0608   Epoch: 4   Global Step: 73590   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:29:48,075-Speed 9609.83 samples/sec   Loss 7.6727   LearningRate 0.0608   Epoch: 4   Global Step: 73600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:49,151-Speed 9531.01 samples/sec   Loss 7.7424   LearningRate 0.0608   Epoch: 4   Global Step: 73610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:50,204-Speed 9724.37 samples/sec   Loss 7.7768   LearningRate 0.0608   Epoch: 4   Global Step: 73620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:51,264-Speed 9664.30 samples/sec   Loss 7.7218   LearningRate 0.0608   Epoch: 4   Global Step: 73630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:52,336-Speed 9564.65 samples/sec   Loss 7.8009   LearningRate 0.0607   Epoch: 4   Global Step: 73640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:53,436-Speed 9309.77 samples/sec   Loss 7.6936   LearningRate 0.0607   Epoch: 4   Global Step: 73650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:54,566-Speed 9070.80 samples/sec   Loss 7.8071   LearningRate 0.0607   Epoch: 4   Global Step: 73660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:55,686-Speed 9151.35 samples/sec   Loss 7.7436   LearningRate 0.0607   Epoch: 4   Global Step: 73670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:56,769-Speed 9455.58 samples/sec   Loss 7.7061   LearningRate 0.0607   Epoch: 4   Global Step: 73680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:57,808-Speed 9864.53 samples/sec   Loss 7.6785   LearningRate 0.0607   Epoch: 4   Global Step: 73690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:29:58,855-Speed 9788.83 samples/sec   Loss 7.5955   LearningRate 0.0607   Epoch: 4   Global Step: 73700   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:29:59,950-Speed 9356.29 samples/sec   Loss 7.7941   LearningRate 0.0607   Epoch: 4   Global Step: 73710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:01,027-Speed 9513.88 samples/sec   Loss 7.7394   LearningRate 0.0607   Epoch: 4   Global Step: 73720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:02,140-Speed 9206.96 samples/sec   Loss 7.6673   LearningRate 0.0607   Epoch: 4   Global Step: 73730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:03,259-Speed 9154.93 samples/sec   Loss 7.7148   LearningRate 0.0607   Epoch: 4   Global Step: 73740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:04,313-Speed 9717.44 samples/sec   Loss 7.7153   LearningRate 0.0607   Epoch: 4   Global Step: 73750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:05,376-Speed 9637.79 samples/sec   Loss 7.7951   LearningRate 0.0607   Epoch: 4   Global Step: 73760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:06,429-Speed 9733.65 samples/sec   Loss 7.6885   LearningRate 0.0607   Epoch: 4   Global Step: 73770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:07,467-Speed 9868.42 samples/sec   Loss 7.7644   LearningRate 0.0607   Epoch: 4   Global Step: 73780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:08,578-Speed 9225.09 samples/sec   Loss 7.6507   LearningRate 0.0607   Epoch: 4   Global Step: 73790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:09,683-Speed 9278.61 samples/sec   Loss 7.7640   LearningRate 0.0607   Epoch: 4   Global Step: 73800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:10,792-Speed 9235.84 samples/sec   Loss 7.7103   LearningRate 0.0607   Epoch: 4   Global Step: 73810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:11,887-Speed 9357.52 samples/sec   Loss 7.6746   LearningRate 0.0607   Epoch: 4   Global Step: 73820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:13,016-Speed 9073.63 samples/sec   Loss 7.7038   LearningRate 0.0607   Epoch: 4   Global Step: 73830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:14,167-Speed 8896.22 samples/sec   Loss 7.7152   LearningRate 0.0607   Epoch: 4   Global Step: 73840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:15,258-Speed 9395.57 samples/sec   Loss 7.5860   LearningRate 0.0606   Epoch: 4   Global Step: 73850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:16,362-Speed 9281.16 samples/sec   Loss 7.6463   LearningRate 0.0606   Epoch: 4   Global Step: 73860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:17,453-Speed 9393.21 samples/sec   Loss 7.7060   LearningRate 0.0606   Epoch: 4   Global Step: 73870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:18,515-Speed 9641.70 samples/sec   Loss 7.6702   LearningRate 0.0606   Epoch: 4   Global Step: 73880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:19,593-Speed 9512.98 samples/sec   Loss 7.7246   LearningRate 0.0606   Epoch: 4   Global Step: 73890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:20,663-Speed 9577.20 samples/sec   Loss 7.7398   LearningRate 0.0606   Epoch: 4   Global Step: 73900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:21,758-Speed 9357.88 samples/sec   Loss 7.6845   LearningRate 0.0606   Epoch: 4   Global Step: 73910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:22,860-Speed 9298.80 samples/sec   Loss 7.6799   LearningRate 0.0606   Epoch: 4   Global Step: 73920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:23,950-Speed 9396.70 samples/sec   Loss 7.7193   LearningRate 0.0606   Epoch: 4   Global Step: 73930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:30:25,011-Speed 9659.36 samples/sec   Loss 7.7164   LearningRate 0.0606   Epoch: 4   Global Step: 73940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:26,091-Speed 9487.60 samples/sec   Loss 7.6826   LearningRate 0.0606   Epoch: 4   Global Step: 73950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:27,159-Speed 9593.85 samples/sec   Loss 7.7756   LearningRate 0.0606   Epoch: 4   Global Step: 73960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:28,267-Speed 9246.22 samples/sec   Loss 7.5107   LearningRate 0.0606   Epoch: 4   Global Step: 73970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:29,343-Speed 9521.97 samples/sec   Loss 7.6888   LearningRate 0.0606   Epoch: 4   Global Step: 73980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:30,417-Speed 9536.33 samples/sec   Loss 7.6060   LearningRate 0.0606   Epoch: 4   Global Step: 73990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:31,471-Speed 9724.09 samples/sec   Loss 7.7406   LearningRate 0.0606   Epoch: 4   Global Step: 74000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:30:53,242-[lfw][74000]XNorm: 12.354982
Training: 2022-04-11 14:30:53,243-[lfw][74000]Accuracy-Flip: 0.99550+-0.00248
Training: 2022-04-11 14:30:53,244-[lfw][74000]Accuracy-Highest: 0.99583
Training: 2022-04-11 14:31:18,400-[cfp_fp][74000]XNorm: 10.379946
Training: 2022-04-11 14:31:18,401-[cfp_fp][74000]Accuracy-Flip: 0.95143+-0.01268
Training: 2022-04-11 14:31:18,401-[cfp_fp][74000]Accuracy-Highest: 0.95171
Training: 2022-04-11 14:31:40,135-[agedb_30][74000]XNorm: 11.873969
Training: 2022-04-11 14:31:40,136-[agedb_30][74000]Accuracy-Flip: 0.95967+-0.01122
Training: 2022-04-11 14:31:40,137-[agedb_30][74000]Accuracy-Highest: 0.96033
Training: 2022-04-11 14:31:41,212-Speed 146.83 samples/sec   Loss 7.6394   LearningRate 0.0606   Epoch: 4   Global Step: 74010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:31:42,317-Speed 9277.22 samples/sec   Loss 7.6369   LearningRate 0.0606   Epoch: 4   Global Step: 74020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:31:43,401-Speed 9457.38 samples/sec   Loss 7.7098   LearningRate 0.0606   Epoch: 4   Global Step: 74030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:31:44,509-Speed 9242.35 samples/sec   Loss 7.6464   LearningRate 0.0606   Epoch: 4   Global Step: 74040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:45,577-Speed 9591.97 samples/sec   Loss 7.8545   LearningRate 0.0606   Epoch: 4   Global Step: 74050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:46,719-Speed 8971.72 samples/sec   Loss 7.6583   LearningRate 0.0606   Epoch: 4   Global Step: 74060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:47,817-Speed 9330.10 samples/sec   Loss 7.6999   LearningRate 0.0605   Epoch: 4   Global Step: 74070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:48,912-Speed 9354.79 samples/sec   Loss 7.7060   LearningRate 0.0605   Epoch: 4   Global Step: 74080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:50,029-Speed 9176.46 samples/sec   Loss 7.7220   LearningRate 0.0605   Epoch: 4   Global Step: 74090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:51,081-Speed 9742.44 samples/sec   Loss 7.7152   LearningRate 0.0605   Epoch: 4   Global Step: 74100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:52,154-Speed 9549.45 samples/sec   Loss 7.6948   LearningRate 0.0605   Epoch: 4   Global Step: 74110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:53,230-Speed 9524.92 samples/sec   Loss 7.7888   LearningRate 0.0605   Epoch: 4   Global Step: 74120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:54,303-Speed 9545.98 samples/sec   Loss 7.7703   LearningRate 0.0605   Epoch: 4   Global Step: 74130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:55,405-Speed 9296.44 samples/sec   Loss 7.7252   LearningRate 0.0605   Epoch: 4   Global Step: 74140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:56,512-Speed 9255.89 samples/sec   Loss 7.7835   LearningRate 0.0605   Epoch: 4   Global Step: 74150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:31:57,559-Speed 9787.96 samples/sec   Loss 7.7161   LearningRate 0.0605   Epoch: 4   Global Step: 74160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:31:58,660-Speed 9304.76 samples/sec   Loss 7.6076   LearningRate 0.0605   Epoch: 4   Global Step: 74170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:31:59,738-Speed 9505.90 samples/sec   Loss 7.6798   LearningRate 0.0605   Epoch: 4   Global Step: 74180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:00,785-Speed 9785.57 samples/sec   Loss 7.7552   LearningRate 0.0605   Epoch: 4   Global Step: 74190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:01,897-Speed 9215.50 samples/sec   Loss 7.8175   LearningRate 0.0605   Epoch: 4   Global Step: 74200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:02,999-Speed 9300.93 samples/sec   Loss 7.8126   LearningRate 0.0605   Epoch: 4   Global Step: 74210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:04,101-Speed 9294.09 samples/sec   Loss 7.7293   LearningRate 0.0605   Epoch: 4   Global Step: 74220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:05,212-Speed 9221.15 samples/sec   Loss 7.7448   LearningRate 0.0605   Epoch: 4   Global Step: 74230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:06,271-Speed 9681.60 samples/sec   Loss 7.9101   LearningRate 0.0605   Epoch: 4   Global Step: 74240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:07,347-Speed 9521.81 samples/sec   Loss 7.6965   LearningRate 0.0605   Epoch: 4   Global Step: 74250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:08,456-Speed 9234.98 samples/sec   Loss 7.7927   LearningRate 0.0605   Epoch: 4   Global Step: 74260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:09,577-Speed 9147.57 samples/sec   Loss 7.6317   LearningRate 0.0605   Epoch: 4   Global Step: 74270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:10,674-Speed 9338.40 samples/sec   Loss 7.7108   LearningRate 0.0604   Epoch: 4   Global Step: 74280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:11,779-Speed 9271.49 samples/sec   Loss 7.6179   LearningRate 0.0604   Epoch: 4   Global Step: 74290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:12,866-Speed 9432.21 samples/sec   Loss 7.6280   LearningRate 0.0604   Epoch: 4   Global Step: 74300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:14,016-Speed 8904.06 samples/sec   Loss 7.7495   LearningRate 0.0604   Epoch: 4   Global Step: 74310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:15,077-Speed 9661.41 samples/sec   Loss 7.6992   LearningRate 0.0604   Epoch: 4   Global Step: 74320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:16,213-Speed 9020.48 samples/sec   Loss 7.7805   LearningRate 0.0604   Epoch: 4   Global Step: 74330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:17,315-Speed 9294.42 samples/sec   Loss 7.6491   LearningRate 0.0604   Epoch: 4   Global Step: 74340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:18,397-Speed 9472.09 samples/sec   Loss 7.7296   LearningRate 0.0604   Epoch: 4   Global Step: 74350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:19,498-Speed 9312.29 samples/sec   Loss 7.7827   LearningRate 0.0604   Epoch: 4   Global Step: 74360   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:32:20,559-Speed 9654.02 samples/sec   Loss 7.7711   LearningRate 0.0604   Epoch: 4   Global Step: 74370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:21,629-Speed 9573.64 samples/sec   Loss 7.7994   LearningRate 0.0604   Epoch: 4   Global Step: 74380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:22,724-Speed 9355.87 samples/sec   Loss 7.6494   LearningRate 0.0604   Epoch: 4   Global Step: 74390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:23,850-Speed 9097.46 samples/sec   Loss 7.7080   LearningRate 0.0604   Epoch: 4   Global Step: 74400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:24,931-Speed 9485.83 samples/sec   Loss 7.8726   LearningRate 0.0604   Epoch: 4   Global Step: 74410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:26,035-Speed 9276.77 samples/sec   Loss 7.6046   LearningRate 0.0604   Epoch: 4   Global Step: 74420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:27,142-Speed 9257.27 samples/sec   Loss 7.6652   LearningRate 0.0604   Epoch: 4   Global Step: 74430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:28,243-Speed 9307.14 samples/sec   Loss 7.5779   LearningRate 0.0604   Epoch: 4   Global Step: 74440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:29,285-Speed 9830.40 samples/sec   Loss 7.6955   LearningRate 0.0604   Epoch: 4   Global Step: 74450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:30,377-Speed 9384.34 samples/sec   Loss 7.7307   LearningRate 0.0604   Epoch: 4   Global Step: 74460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:31,484-Speed 9261.74 samples/sec   Loss 7.7081   LearningRate 0.0604   Epoch: 4   Global Step: 74470   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:32:32,572-Speed 9415.95 samples/sec   Loss 7.7337   LearningRate 0.0604   Epoch: 4   Global Step: 74480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:33,703-Speed 9053.41 samples/sec   Loss 7.7903   LearningRate 0.0604   Epoch: 4   Global Step: 74490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:34,806-Speed 9293.73 samples/sec   Loss 7.7990   LearningRate 0.0603   Epoch: 4   Global Step: 74500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:35,898-Speed 9381.38 samples/sec   Loss 7.6577   LearningRate 0.0603   Epoch: 4   Global Step: 74510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:36,973-Speed 9526.83 samples/sec   Loss 7.8045   LearningRate 0.0603   Epoch: 4   Global Step: 74520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:38,080-Speed 9257.96 samples/sec   Loss 7.6777   LearningRate 0.0603   Epoch: 4   Global Step: 74530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:39,165-Speed 9450.11 samples/sec   Loss 7.7354   LearningRate 0.0603   Epoch: 4   Global Step: 74540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:40,234-Speed 9583.62 samples/sec   Loss 7.6254   LearningRate 0.0603   Epoch: 4   Global Step: 74550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:41,304-Speed 9575.12 samples/sec   Loss 7.5729   LearningRate 0.0603   Epoch: 4   Global Step: 74560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:42,431-Speed 9092.90 samples/sec   Loss 7.8025   LearningRate 0.0603   Epoch: 4   Global Step: 74570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:43,511-Speed 9482.46 samples/sec   Loss 7.6950   LearningRate 0.0603   Epoch: 4   Global Step: 74580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:44,595-Speed 9449.51 samples/sec   Loss 7.7670   LearningRate 0.0603   Epoch: 4   Global Step: 74590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:45,704-Speed 9242.05 samples/sec   Loss 7.6441   LearningRate 0.0603   Epoch: 4   Global Step: 74600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:46,764-Speed 9662.96 samples/sec   Loss 7.6063   LearningRate 0.0603   Epoch: 4   Global Step: 74610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:32:47,873-Speed 9242.93 samples/sec   Loss 7.7308   LearningRate 0.0603   Epoch: 4   Global Step: 74620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:48,960-Speed 9424.96 samples/sec   Loss 7.7148   LearningRate 0.0603   Epoch: 4   Global Step: 74630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:50,005-Speed 9805.20 samples/sec   Loss 7.6639   LearningRate 0.0603   Epoch: 4   Global Step: 74640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:51,064-Speed 9678.38 samples/sec   Loss 7.7876   LearningRate 0.0603   Epoch: 4   Global Step: 74650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:52,106-Speed 9833.89 samples/sec   Loss 7.7177   LearningRate 0.0603   Epoch: 4   Global Step: 74660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:53,204-Speed 9333.66 samples/sec   Loss 7.7357   LearningRate 0.0603   Epoch: 4   Global Step: 74670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:54,256-Speed 9736.65 samples/sec   Loss 7.6441   LearningRate 0.0603   Epoch: 4   Global Step: 74680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:55,339-Speed 9465.25 samples/sec   Loss 7.7879   LearningRate 0.0603   Epoch: 4   Global Step: 74690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:56,419-Speed 9483.91 samples/sec   Loss 7.6983   LearningRate 0.0603   Epoch: 4   Global Step: 74700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:57,525-Speed 9267.83 samples/sec   Loss 7.6924   LearningRate 0.0602   Epoch: 4   Global Step: 74710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:32:58,615-Speed 9396.47 samples/sec   Loss 7.6009   LearningRate 0.0602   Epoch: 4   Global Step: 74720   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:32:59,673-Speed 9680.45 samples/sec   Loss 7.5920   LearningRate 0.0602   Epoch: 4   Global Step: 74730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:00,764-Speed 9396.72 samples/sec   Loss 7.5836   LearningRate 0.0602   Epoch: 4   Global Step: 74740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:01,868-Speed 9282.71 samples/sec   Loss 7.6934   LearningRate 0.0602   Epoch: 4   Global Step: 74750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:02,996-Speed 9083.31 samples/sec   Loss 7.7337   LearningRate 0.0602   Epoch: 4   Global Step: 74760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:04,095-Speed 9318.31 samples/sec   Loss 7.6716   LearningRate 0.0602   Epoch: 4   Global Step: 74770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:05,176-Speed 9477.75 samples/sec   Loss 7.6808   LearningRate 0.0602   Epoch: 4   Global Step: 74780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:06,252-Speed 9523.56 samples/sec   Loss 7.7057   LearningRate 0.0602   Epoch: 4   Global Step: 74790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:07,337-Speed 9444.79 samples/sec   Loss 7.6631   LearningRate 0.0602   Epoch: 4   Global Step: 74800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:08,409-Speed 9556.15 samples/sec   Loss 7.6698   LearningRate 0.0602   Epoch: 4   Global Step: 74810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:09,503-Speed 9370.43 samples/sec   Loss 7.8071   LearningRate 0.0602   Epoch: 4   Global Step: 74820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:10,588-Speed 9441.38 samples/sec   Loss 7.7066   LearningRate 0.0602   Epoch: 4   Global Step: 74830   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:33:11,682-Speed 9370.69 samples/sec   Loss 7.7264   LearningRate 0.0602   Epoch: 4   Global Step: 74840   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:33:12,758-Speed 9521.89 samples/sec   Loss 7.7400   LearningRate 0.0602   Epoch: 4   Global Step: 74850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:13,825-Speed 9599.49 samples/sec   Loss 7.6842   LearningRate 0.0602   Epoch: 4   Global Step: 74860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:14,918-Speed 9379.42 samples/sec   Loss 7.7622   LearningRate 0.0602   Epoch: 4   Global Step: 74870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:16,009-Speed 9386.05 samples/sec   Loss 7.8018   LearningRate 0.0602   Epoch: 4   Global Step: 74880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:17,077-Speed 9599.25 samples/sec   Loss 7.6562   LearningRate 0.0602   Epoch: 4   Global Step: 74890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:18,177-Speed 9307.87 samples/sec   Loss 7.7817   LearningRate 0.0602   Epoch: 4   Global Step: 74900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:19,273-Speed 9351.27 samples/sec   Loss 7.7279   LearningRate 0.0602   Epoch: 4   Global Step: 74910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:20,318-Speed 9799.81 samples/sec   Loss 7.8157   LearningRate 0.0602   Epoch: 4   Global Step: 74920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:21,406-Speed 9416.42 samples/sec   Loss 7.7031   LearningRate 0.0601   Epoch: 4   Global Step: 74930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:22,518-Speed 9216.90 samples/sec   Loss 7.6754   LearningRate 0.0601   Epoch: 4   Global Step: 74940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:23,665-Speed 8934.96 samples/sec   Loss 7.6564   LearningRate 0.0601   Epoch: 4   Global Step: 74950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:24,763-Speed 9337.24 samples/sec   Loss 7.6455   LearningRate 0.0601   Epoch: 4   Global Step: 74960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:25,873-Speed 9232.39 samples/sec   Loss 7.7112   LearningRate 0.0601   Epoch: 4   Global Step: 74970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:26,959-Speed 9434.82 samples/sec   Loss 7.6109   LearningRate 0.0601   Epoch: 4   Global Step: 74980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:28,025-Speed 9612.47 samples/sec   Loss 7.7165   LearningRate 0.0601   Epoch: 4   Global Step: 74990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:29,099-Speed 9539.30 samples/sec   Loss 7.7508   LearningRate 0.0601   Epoch: 4   Global Step: 75000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:30,196-Speed 9339.21 samples/sec   Loss 7.8384   LearningRate 0.0601   Epoch: 4   Global Step: 75010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:31,307-Speed 9230.61 samples/sec   Loss 7.7124   LearningRate 0.0601   Epoch: 4   Global Step: 75020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:32,401-Speed 9363.68 samples/sec   Loss 7.6991   LearningRate 0.0601   Epoch: 4   Global Step: 75030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:33,475-Speed 9536.59 samples/sec   Loss 7.7375   LearningRate 0.0601   Epoch: 4   Global Step: 75040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:34,586-Speed 9225.50 samples/sec   Loss 7.6357   LearningRate 0.0601   Epoch: 4   Global Step: 75050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:35,634-Speed 9770.78 samples/sec   Loss 7.6418   LearningRate 0.0601   Epoch: 4   Global Step: 75060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:36,736-Speed 9304.32 samples/sec   Loss 7.6642   LearningRate 0.0601   Epoch: 4   Global Step: 75070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:37,841-Speed 9269.58 samples/sec   Loss 7.6619   LearningRate 0.0601   Epoch: 4   Global Step: 75080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:38,932-Speed 9391.18 samples/sec   Loss 7.7716   LearningRate 0.0601   Epoch: 4   Global Step: 75090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:40,014-Speed 9471.27 samples/sec   Loss 7.7068   LearningRate 0.0601   Epoch: 4   Global Step: 75100   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:33:41,143-Speed 9070.41 samples/sec   Loss 7.7541   LearningRate 0.0601   Epoch: 4   Global Step: 75110   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:33:42,268-Speed 9103.64 samples/sec   Loss 7.7590   LearningRate 0.0601   Epoch: 4   Global Step: 75120   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:33:43,411-Speed 8972.05 samples/sec   Loss 7.7383   LearningRate 0.0601   Epoch: 4   Global Step: 75130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:44,503-Speed 9382.96 samples/sec   Loss 7.7469   LearningRate 0.0600   Epoch: 4   Global Step: 75140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:45,597-Speed 9362.27 samples/sec   Loss 7.7063   LearningRate 0.0600   Epoch: 4   Global Step: 75150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:46,647-Speed 9761.83 samples/sec   Loss 7.7616   LearningRate 0.0600   Epoch: 4   Global Step: 75160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:47,753-Speed 9264.76 samples/sec   Loss 7.7242   LearningRate 0.0600   Epoch: 4   Global Step: 75170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:48,855-Speed 9302.59 samples/sec   Loss 7.6955   LearningRate 0.0600   Epoch: 4   Global Step: 75180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:49,923-Speed 9584.97 samples/sec   Loss 7.6636   LearningRate 0.0600   Epoch: 4   Global Step: 75190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:51,013-Speed 9405.62 samples/sec   Loss 7.7325   LearningRate 0.0600   Epoch: 4   Global Step: 75200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:33:52,091-Speed 9503.94 samples/sec   Loss 7.6778   LearningRate 0.0600   Epoch: 4   Global Step: 75210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:53,158-Speed 9601.80 samples/sec   Loss 7.7559   LearningRate 0.0600   Epoch: 4   Global Step: 75220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:54,219-Speed 9658.91 samples/sec   Loss 7.6711   LearningRate 0.0600   Epoch: 4   Global Step: 75230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:55,286-Speed 9602.39 samples/sec   Loss 7.7598   LearningRate 0.0600   Epoch: 4   Global Step: 75240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:56,376-Speed 9401.42 samples/sec   Loss 7.6958   LearningRate 0.0600   Epoch: 4   Global Step: 75250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:57,465-Speed 9410.14 samples/sec   Loss 7.6229   LearningRate 0.0600   Epoch: 4   Global Step: 75260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:58,512-Speed 9780.76 samples/sec   Loss 7.6900   LearningRate 0.0600   Epoch: 4   Global Step: 75270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:33:59,604-Speed 9386.20 samples/sec   Loss 7.6266   LearningRate 0.0600   Epoch: 4   Global Step: 75280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:00,669-Speed 9622.04 samples/sec   Loss 7.8112   LearningRate 0.0600   Epoch: 4   Global Step: 75290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:01,742-Speed 9547.47 samples/sec   Loss 7.7357   LearningRate 0.0600   Epoch: 4   Global Step: 75300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:02,844-Speed 9301.30 samples/sec   Loss 7.7116   LearningRate 0.0600   Epoch: 4   Global Step: 75310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:03,927-Speed 9458.26 samples/sec   Loss 7.8207   LearningRate 0.0600   Epoch: 4   Global Step: 75320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:05,040-Speed 9221.71 samples/sec   Loss 7.6456   LearningRate 0.0600   Epoch: 4   Global Step: 75330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:06,108-Speed 9586.81 samples/sec   Loss 7.7050   LearningRate 0.0600   Epoch: 4   Global Step: 75340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:07,218-Speed 9234.65 samples/sec   Loss 7.6994   LearningRate 0.0600   Epoch: 4   Global Step: 75350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:08,314-Speed 9348.91 samples/sec   Loss 7.7548   LearningRate 0.0599   Epoch: 4   Global Step: 75360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:09,360-Speed 9792.78 samples/sec   Loss 7.6627   LearningRate 0.0599   Epoch: 4   Global Step: 75370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:10,461-Speed 9304.85 samples/sec   Loss 7.6593   LearningRate 0.0599   Epoch: 4   Global Step: 75380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:11,530-Speed 9585.77 samples/sec   Loss 7.7702   LearningRate 0.0599   Epoch: 4   Global Step: 75390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:12,639-Speed 9238.26 samples/sec   Loss 7.6703   LearningRate 0.0599   Epoch: 4   Global Step: 75400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:13,721-Speed 9468.72 samples/sec   Loss 7.7506   LearningRate 0.0599   Epoch: 4   Global Step: 75410   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:34:14,813-Speed 9391.62 samples/sec   Loss 7.6666   LearningRate 0.0599   Epoch: 4   Global Step: 75420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:15,903-Speed 9394.85 samples/sec   Loss 7.7893   LearningRate 0.0599   Epoch: 4   Global Step: 75430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:17,024-Speed 9143.26 samples/sec   Loss 7.7913   LearningRate 0.0599   Epoch: 4   Global Step: 75440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:18,162-Speed 9001.09 samples/sec   Loss 7.7643   LearningRate 0.0599   Epoch: 4   Global Step: 75450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:19,259-Speed 9342.46 samples/sec   Loss 7.7643   LearningRate 0.0599   Epoch: 4   Global Step: 75460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:20,323-Speed 9625.83 samples/sec   Loss 7.7918   LearningRate 0.0599   Epoch: 4   Global Step: 75470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:21,424-Speed 9302.46 samples/sec   Loss 7.6304   LearningRate 0.0599   Epoch: 4   Global Step: 75480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:22,515-Speed 9396.69 samples/sec   Loss 7.6871   LearningRate 0.0599   Epoch: 4   Global Step: 75490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:23,605-Speed 9400.46 samples/sec   Loss 7.7607   LearningRate 0.0599   Epoch: 4   Global Step: 75500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:24,748-Speed 8965.48 samples/sec   Loss 7.7001   LearningRate 0.0599   Epoch: 4   Global Step: 75510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:25,776-Speed 9968.48 samples/sec   Loss 7.7561   LearningRate 0.0599   Epoch: 4   Global Step: 75520   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:34:26,897-Speed 9137.58 samples/sec   Loss 7.7267   LearningRate 0.0599   Epoch: 4   Global Step: 75530   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:34:27,958-Speed 9660.65 samples/sec   Loss 7.6183   LearningRate 0.0599   Epoch: 4   Global Step: 75540   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:34:29,009-Speed 9749.48 samples/sec   Loss 7.6574   LearningRate 0.0599   Epoch: 4   Global Step: 75550   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:34:30,067-Speed 9677.86 samples/sec   Loss 7.8451   LearningRate 0.0599   Epoch: 4   Global Step: 75560   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:34:31,152-Speed 9449.91 samples/sec   Loss 7.6569   LearningRate 0.0598   Epoch: 4   Global Step: 75570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:32,227-Speed 9526.54 samples/sec   Loss 7.6122   LearningRate 0.0598   Epoch: 4   Global Step: 75580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:33,297-Speed 9578.12 samples/sec   Loss 7.7094   LearningRate 0.0598   Epoch: 4   Global Step: 75590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:34,372-Speed 9528.76 samples/sec   Loss 7.8231   LearningRate 0.0598   Epoch: 4   Global Step: 75600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:35,489-Speed 9173.95 samples/sec   Loss 7.6476   LearningRate 0.0598   Epoch: 4   Global Step: 75610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:36,596-Speed 9251.43 samples/sec   Loss 7.6871   LearningRate 0.0598   Epoch: 4   Global Step: 75620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:37,647-Speed 9756.28 samples/sec   Loss 7.6200   LearningRate 0.0598   Epoch: 4   Global Step: 75630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:38,723-Speed 9521.22 samples/sec   Loss 7.7264   LearningRate 0.0598   Epoch: 4   Global Step: 75640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:39,830-Speed 9251.51 samples/sec   Loss 7.6525   LearningRate 0.0598   Epoch: 4   Global Step: 75650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:40,949-Speed 9161.96 samples/sec   Loss 7.6892   LearningRate 0.0598   Epoch: 4   Global Step: 75660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:42,028-Speed 9492.46 samples/sec   Loss 7.6652   LearningRate 0.0598   Epoch: 4   Global Step: 75670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:43,105-Speed 9512.12 samples/sec   Loss 7.7016   LearningRate 0.0598   Epoch: 4   Global Step: 75680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:44,156-Speed 9756.91 samples/sec   Loss 7.6619   LearningRate 0.0598   Epoch: 4   Global Step: 75690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:45,195-Speed 9854.76 samples/sec   Loss 7.7228   LearningRate 0.0598   Epoch: 4   Global Step: 75700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:46,275-Speed 9487.37 samples/sec   Loss 7.7200   LearningRate 0.0598   Epoch: 4   Global Step: 75710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:34:47,336-Speed 9661.68 samples/sec   Loss 7.6773   LearningRate 0.0598   Epoch: 4   Global Step: 75720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:48,382-Speed 9790.61 samples/sec   Loss 7.7386   LearningRate 0.0598   Epoch: 4   Global Step: 75730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:49,447-Speed 9621.40 samples/sec   Loss 7.6192   LearningRate 0.0598   Epoch: 4   Global Step: 75740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:50,504-Speed 9695.43 samples/sec   Loss 7.5187   LearningRate 0.0598   Epoch: 4   Global Step: 75750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:51,548-Speed 9816.08 samples/sec   Loss 7.7086   LearningRate 0.0598   Epoch: 4   Global Step: 75760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:52,631-Speed 9457.76 samples/sec   Loss 7.6582   LearningRate 0.0598   Epoch: 4   Global Step: 75770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:53,752-Speed 9142.00 samples/sec   Loss 7.7763   LearningRate 0.0598   Epoch: 4   Global Step: 75780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:54,838-Speed 9435.42 samples/sec   Loss 7.7005   LearningRate 0.0597   Epoch: 4   Global Step: 75790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:55,923-Speed 9446.29 samples/sec   Loss 7.7155   LearningRate 0.0597   Epoch: 4   Global Step: 75800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:57,006-Speed 9459.02 samples/sec   Loss 7.6868   LearningRate 0.0597   Epoch: 4   Global Step: 75810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:34:58,051-Speed 9800.52 samples/sec   Loss 7.5923   LearningRate 0.0597   Epoch: 4   Global Step: 75820   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:34:59,114-Speed 9640.74 samples/sec   Loss 7.6622   LearningRate 0.0597   Epoch: 4   Global Step: 75830   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:35:00,203-Speed 9415.15 samples/sec   Loss 7.6166   LearningRate 0.0597   Epoch: 4   Global Step: 75840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:01,259-Speed 9703.84 samples/sec   Loss 7.8831   LearningRate 0.0597   Epoch: 4   Global Step: 75850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:02,364-Speed 9274.36 samples/sec   Loss 7.6937   LearningRate 0.0597   Epoch: 4   Global Step: 75860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:03,433-Speed 9584.30 samples/sec   Loss 7.6105   LearningRate 0.0597   Epoch: 4   Global Step: 75870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:04,526-Speed 9368.02 samples/sec   Loss 7.6401   LearningRate 0.0597   Epoch: 4   Global Step: 75880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:05,599-Speed 9555.99 samples/sec   Loss 7.8226   LearningRate 0.0597   Epoch: 4   Global Step: 75890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:06,673-Speed 9537.05 samples/sec   Loss 7.6737   LearningRate 0.0597   Epoch: 4   Global Step: 75900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:07,762-Speed 9404.08 samples/sec   Loss 7.7453   LearningRate 0.0597   Epoch: 4   Global Step: 75910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:08,841-Speed 9500.98 samples/sec   Loss 7.7002   LearningRate 0.0597   Epoch: 4   Global Step: 75920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:09,925-Speed 9447.79 samples/sec   Loss 7.5009   LearningRate 0.0597   Epoch: 4   Global Step: 75930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:10,972-Speed 9784.24 samples/sec   Loss 7.6305   LearningRate 0.0597   Epoch: 4   Global Step: 75940   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:35:12,033-Speed 9660.75 samples/sec   Loss 7.6159   LearningRate 0.0597   Epoch: 4   Global Step: 75950   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:35:13,180-Speed 8929.49 samples/sec   Loss 7.7100   LearningRate 0.0597   Epoch: 4   Global Step: 75960   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:35:14,254-Speed 9550.78 samples/sec   Loss 7.7015   LearningRate 0.0597   Epoch: 4   Global Step: 75970   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:35:15,350-Speed 9345.11 samples/sec   Loss 7.5848   LearningRate 0.0597   Epoch: 4   Global Step: 75980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:16,477-Speed 9095.70 samples/sec   Loss 7.7701   LearningRate 0.0597   Epoch: 4   Global Step: 75990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:17,600-Speed 9119.54 samples/sec   Loss 7.6163   LearningRate 0.0596   Epoch: 4   Global Step: 76000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:35:39,740-[lfw][76000]XNorm: 11.971042
Training: 2022-04-11 14:35:39,741-[lfw][76000]Accuracy-Flip: 0.99517+-0.00302
Training: 2022-04-11 14:35:39,741-[lfw][76000]Accuracy-Highest: 0.99583
Training: 2022-04-11 14:36:05,359-[cfp_fp][76000]XNorm: 10.078779
Training: 2022-04-11 14:36:05,360-[cfp_fp][76000]Accuracy-Flip: 0.95400+-0.00905
Training: 2022-04-11 14:36:05,360-[cfp_fp][76000]Accuracy-Highest: 0.95400
Training: 2022-04-11 14:36:27,480-[agedb_30][76000]XNorm: 11.492150
Training: 2022-04-11 14:36:27,481-[agedb_30][76000]Accuracy-Flip: 0.96067+-0.00967
Training: 2022-04-11 14:36:27,481-[agedb_30][76000]Accuracy-Highest: 0.96067
Training: 2022-04-11 14:36:28,600-Speed 144.23 samples/sec   Loss 7.6557   LearningRate 0.0596   Epoch: 4   Global Step: 76010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:29,645-Speed 9804.31 samples/sec   Loss 7.7408   LearningRate 0.0596   Epoch: 4   Global Step: 76020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:30,680-Speed 9904.17 samples/sec   Loss 7.7558   LearningRate 0.0596   Epoch: 4   Global Step: 76030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:31,753-Speed 9545.65 samples/sec   Loss 7.6399   LearningRate 0.0596   Epoch: 4   Global Step: 76040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:32,878-Speed 9119.42 samples/sec   Loss 7.7199   LearningRate 0.0596   Epoch: 4   Global Step: 76050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:33,989-Speed 9214.27 samples/sec   Loss 7.6388   LearningRate 0.0596   Epoch: 4   Global Step: 76060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:35,078-Speed 9414.11 samples/sec   Loss 7.7027   LearningRate 0.0596   Epoch: 4   Global Step: 76070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:36,147-Speed 9587.18 samples/sec   Loss 7.6163   LearningRate 0.0596   Epoch: 4   Global Step: 76080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:37,186-Speed 9860.22 samples/sec   Loss 7.7661   LearningRate 0.0596   Epoch: 4   Global Step: 76090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:38,266-Speed 9486.63 samples/sec   Loss 7.7015   LearningRate 0.0596   Epoch: 4   Global Step: 76100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:39,338-Speed 9558.88 samples/sec   Loss 7.7157   LearningRate 0.0596   Epoch: 4   Global Step: 76110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:40,439-Speed 9301.96 samples/sec   Loss 7.6732   LearningRate 0.0596   Epoch: 4   Global Step: 76120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:41,548-Speed 9245.23 samples/sec   Loss 7.7260   LearningRate 0.0596   Epoch: 4   Global Step: 76130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:42,605-Speed 9688.20 samples/sec   Loss 7.7398   LearningRate 0.0596   Epoch: 4   Global Step: 76140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:43,711-Speed 9267.33 samples/sec   Loss 7.7804   LearningRate 0.0596   Epoch: 4   Global Step: 76150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:44,811-Speed 9314.57 samples/sec   Loss 7.6468   LearningRate 0.0596   Epoch: 4   Global Step: 76160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:45,861-Speed 9760.54 samples/sec   Loss 7.7378   LearningRate 0.0596   Epoch: 4   Global Step: 76170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:46,933-Speed 9564.74 samples/sec   Loss 7.7101   LearningRate 0.0596   Epoch: 4   Global Step: 76180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:47,975-Speed 9836.25 samples/sec   Loss 7.7804   LearningRate 0.0596   Epoch: 4   Global Step: 76190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:49,055-Speed 9479.49 samples/sec   Loss 7.6474   LearningRate 0.0596   Epoch: 4   Global Step: 76200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:50,150-Speed 9358.72 samples/sec   Loss 7.6997   LearningRate 0.0596   Epoch: 4   Global Step: 76210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:51,247-Speed 9337.76 samples/sec   Loss 7.7355   LearningRate 0.0595   Epoch: 4   Global Step: 76220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:36:52,342-Speed 9359.29 samples/sec   Loss 7.6910   LearningRate 0.0595   Epoch: 4   Global Step: 76230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:53,430-Speed 9413.31 samples/sec   Loss 7.7167   LearningRate 0.0595   Epoch: 4   Global Step: 76240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:54,537-Speed 9259.89 samples/sec   Loss 7.6636   LearningRate 0.0595   Epoch: 4   Global Step: 76250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:55,638-Speed 9301.12 samples/sec   Loss 7.6810   LearningRate 0.0595   Epoch: 4   Global Step: 76260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:56,720-Speed 9469.64 samples/sec   Loss 7.7711   LearningRate 0.0595   Epoch: 4   Global Step: 76270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:57,801-Speed 9478.05 samples/sec   Loss 7.7270   LearningRate 0.0595   Epoch: 4   Global Step: 76280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:58,890-Speed 9407.47 samples/sec   Loss 7.5351   LearningRate 0.0595   Epoch: 4   Global Step: 76290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:36:59,950-Speed 9678.13 samples/sec   Loss 7.5800   LearningRate 0.0595   Epoch: 4   Global Step: 76300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:01,042-Speed 9384.90 samples/sec   Loss 7.5478   LearningRate 0.0595   Epoch: 4   Global Step: 76310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:02,133-Speed 9387.83 samples/sec   Loss 7.5933   LearningRate 0.0595   Epoch: 4   Global Step: 76320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:03,205-Speed 9553.77 samples/sec   Loss 7.6270   LearningRate 0.0595   Epoch: 4   Global Step: 76330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:04,282-Speed 9521.84 samples/sec   Loss 7.6851   LearningRate 0.0595   Epoch: 4   Global Step: 76340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:05,366-Speed 9448.79 samples/sec   Loss 7.6292   LearningRate 0.0595   Epoch: 4   Global Step: 76350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:06,431-Speed 9625.55 samples/sec   Loss 7.5878   LearningRate 0.0595   Epoch: 4   Global Step: 76360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:07,519-Speed 9412.39 samples/sec   Loss 7.6097   LearningRate 0.0595   Epoch: 4   Global Step: 76370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:08,636-Speed 9178.57 samples/sec   Loss 7.6887   LearningRate 0.0595   Epoch: 4   Global Step: 76380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:09,752-Speed 9180.98 samples/sec   Loss 7.6130   LearningRate 0.0595   Epoch: 4   Global Step: 76390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:10,819-Speed 9597.91 samples/sec   Loss 7.7013   LearningRate 0.0595   Epoch: 4   Global Step: 76400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:11,905-Speed 9435.63 samples/sec   Loss 7.6056   LearningRate 0.0595   Epoch: 4   Global Step: 76410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:12,980-Speed 9529.51 samples/sec   Loss 7.5751   LearningRate 0.0595   Epoch: 4   Global Step: 76420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:14,081-Speed 9309.57 samples/sec   Loss 7.7110   LearningRate 0.0595   Epoch: 4   Global Step: 76430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:15,180-Speed 9324.38 samples/sec   Loss 7.6858   LearningRate 0.0594   Epoch: 4   Global Step: 76440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:16,242-Speed 9651.81 samples/sec   Loss 7.6351   LearningRate 0.0594   Epoch: 4   Global Step: 76450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:17,316-Speed 9539.42 samples/sec   Loss 7.7915   LearningRate 0.0594   Epoch: 4   Global Step: 76460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:18,438-Speed 9124.30 samples/sec   Loss 7.7554   LearningRate 0.0594   Epoch: 4   Global Step: 76470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:19,544-Speed 9271.03 samples/sec   Loss 7.8388   LearningRate 0.0594   Epoch: 4   Global Step: 76480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:20,650-Speed 9260.84 samples/sec   Loss 7.7320   LearningRate 0.0594   Epoch: 4   Global Step: 76490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:21,736-Speed 9431.98 samples/sec   Loss 7.6656   LearningRate 0.0594   Epoch: 4   Global Step: 76500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:22,807-Speed 9569.64 samples/sec   Loss 7.7122   LearningRate 0.0594   Epoch: 4   Global Step: 76510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:23,856-Speed 9770.60 samples/sec   Loss 7.7083   LearningRate 0.0594   Epoch: 4   Global Step: 76520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:24,910-Speed 9726.24 samples/sec   Loss 7.6734   LearningRate 0.0594   Epoch: 4   Global Step: 76530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:25,981-Speed 9558.10 samples/sec   Loss 7.6590   LearningRate 0.0594   Epoch: 4   Global Step: 76540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:27,056-Speed 9533.03 samples/sec   Loss 7.7099   LearningRate 0.0594   Epoch: 4   Global Step: 76550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:28,114-Speed 9687.73 samples/sec   Loss 7.7094   LearningRate 0.0594   Epoch: 4   Global Step: 76560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:29,237-Speed 9127.45 samples/sec   Loss 7.7314   LearningRate 0.0594   Epoch: 4   Global Step: 76570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:30,298-Speed 9651.95 samples/sec   Loss 7.6480   LearningRate 0.0594   Epoch: 4   Global Step: 76580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:31,394-Speed 9350.76 samples/sec   Loss 7.7098   LearningRate 0.0594   Epoch: 4   Global Step: 76590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:32,469-Speed 9531.77 samples/sec   Loss 7.6397   LearningRate 0.0594   Epoch: 4   Global Step: 76600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:33,579-Speed 9230.38 samples/sec   Loss 7.7184   LearningRate 0.0594   Epoch: 4   Global Step: 76610   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:37:34,645-Speed 9607.09 samples/sec   Loss 7.7007   LearningRate 0.0594   Epoch: 4   Global Step: 76620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:35,695-Speed 9758.24 samples/sec   Loss 7.6067   LearningRate 0.0594   Epoch: 4   Global Step: 76630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:36,772-Speed 9515.82 samples/sec   Loss 7.6932   LearningRate 0.0594   Epoch: 4   Global Step: 76640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:37,855-Speed 9458.20 samples/sec   Loss 7.5425   LearningRate 0.0593   Epoch: 4   Global Step: 76650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:38,903-Speed 9776.41 samples/sec   Loss 7.5341   LearningRate 0.0593   Epoch: 4   Global Step: 76660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:40,016-Speed 9215.90 samples/sec   Loss 7.6828   LearningRate 0.0593   Epoch: 4   Global Step: 76670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:41,080-Speed 9632.59 samples/sec   Loss 7.6486   LearningRate 0.0593   Epoch: 4   Global Step: 76680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:42,154-Speed 9536.50 samples/sec   Loss 7.6957   LearningRate 0.0593   Epoch: 4   Global Step: 76690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:43,210-Speed 9701.26 samples/sec   Loss 7.6026   LearningRate 0.0593   Epoch: 4   Global Step: 76700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:44,291-Speed 9483.04 samples/sec   Loss 7.7901   LearningRate 0.0593   Epoch: 4   Global Step: 76710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:45,345-Speed 9725.16 samples/sec   Loss 7.6477   LearningRate 0.0593   Epoch: 4   Global Step: 76720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:46,430-Speed 9439.67 samples/sec   Loss 7.6521   LearningRate 0.0593   Epoch: 4   Global Step: 76730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:47,527-Speed 9342.30 samples/sec   Loss 7.6177   LearningRate 0.0593   Epoch: 4   Global Step: 76740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:48,603-Speed 9527.55 samples/sec   Loss 7.6042   LearningRate 0.0593   Epoch: 4   Global Step: 76750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:49,643-Speed 9846.61 samples/sec   Loss 7.6013   LearningRate 0.0593   Epoch: 4   Global Step: 76760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:50,707-Speed 9629.88 samples/sec   Loss 7.6625   LearningRate 0.0593   Epoch: 4   Global Step: 76770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:51,801-Speed 9370.12 samples/sec   Loss 7.6128   LearningRate 0.0593   Epoch: 4   Global Step: 76780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:52,889-Speed 9414.22 samples/sec   Loss 7.8256   LearningRate 0.0593   Epoch: 4   Global Step: 76790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:37:53,979-Speed 9398.51 samples/sec   Loss 7.6665   LearningRate 0.0593   Epoch: 4   Global Step: 76800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:55,030-Speed 9753.03 samples/sec   Loss 7.7214   LearningRate 0.0593   Epoch: 4   Global Step: 76810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:56,113-Speed 9459.06 samples/sec   Loss 7.7808   LearningRate 0.0593   Epoch: 4   Global Step: 76820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:57,171-Speed 9681.21 samples/sec   Loss 7.4894   LearningRate 0.0593   Epoch: 4   Global Step: 76830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:58,279-Speed 9243.53 samples/sec   Loss 7.7540   LearningRate 0.0593   Epoch: 4   Global Step: 76840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:37:59,377-Speed 9338.53 samples/sec   Loss 7.6286   LearningRate 0.0593   Epoch: 4   Global Step: 76850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:00,444-Speed 9600.75 samples/sec   Loss 7.7741   LearningRate 0.0593   Epoch: 4   Global Step: 76860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:01,522-Speed 9509.11 samples/sec   Loss 7.5438   LearningRate 0.0592   Epoch: 4   Global Step: 76870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:02,622-Speed 9320.34 samples/sec   Loss 7.7029   LearningRate 0.0592   Epoch: 4   Global Step: 76880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:03,731-Speed 9237.24 samples/sec   Loss 7.7500   LearningRate 0.0592   Epoch: 4   Global Step: 76890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:04,814-Speed 9461.86 samples/sec   Loss 7.7432   LearningRate 0.0592   Epoch: 4   Global Step: 76900   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:38:05,867-Speed 9725.01 samples/sec   Loss 7.5830   LearningRate 0.0592   Epoch: 4   Global Step: 76910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:06,932-Speed 9617.45 samples/sec   Loss 7.6381   LearningRate 0.0592   Epoch: 4   Global Step: 76920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:08,055-Speed 9126.50 samples/sec   Loss 7.6597   LearningRate 0.0592   Epoch: 4   Global Step: 76930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:09,109-Speed 9723.14 samples/sec   Loss 7.7784   LearningRate 0.0592   Epoch: 4   Global Step: 76940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:10,180-Speed 9564.54 samples/sec   Loss 7.6244   LearningRate 0.0592   Epoch: 4   Global Step: 76950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:11,281-Speed 9312.45 samples/sec   Loss 7.7284   LearningRate 0.0592   Epoch: 4   Global Step: 76960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:12,342-Speed 9657.13 samples/sec   Loss 7.6547   LearningRate 0.0592   Epoch: 4   Global Step: 76970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:13,417-Speed 9527.19 samples/sec   Loss 7.8048   LearningRate 0.0592   Epoch: 4   Global Step: 76980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:14,547-Speed 9066.12 samples/sec   Loss 7.7704   LearningRate 0.0592   Epoch: 4   Global Step: 76990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:15,647-Speed 9313.40 samples/sec   Loss 7.7346   LearningRate 0.0592   Epoch: 4   Global Step: 77000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:16,758-Speed 9223.33 samples/sec   Loss 7.6435   LearningRate 0.0592   Epoch: 4   Global Step: 77010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:17,836-Speed 9501.26 samples/sec   Loss 7.7361   LearningRate 0.0592   Epoch: 4   Global Step: 77020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:18,878-Speed 9836.00 samples/sec   Loss 7.7636   LearningRate 0.0592   Epoch: 4   Global Step: 77030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:19,934-Speed 9703.23 samples/sec   Loss 7.6643   LearningRate 0.0592   Epoch: 4   Global Step: 77040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:21,010-Speed 9523.53 samples/sec   Loss 7.7595   LearningRate 0.0592   Epoch: 4   Global Step: 77050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:22,110-Speed 9318.21 samples/sec   Loss 7.6439   LearningRate 0.0592   Epoch: 4   Global Step: 77060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:23,191-Speed 9472.49 samples/sec   Loss 7.6489   LearningRate 0.0592   Epoch: 4   Global Step: 77070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:24,251-Speed 9673.77 samples/sec   Loss 7.6493   LearningRate 0.0592   Epoch: 4   Global Step: 77080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:25,324-Speed 9547.30 samples/sec   Loss 7.7300   LearningRate 0.0591   Epoch: 4   Global Step: 77090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:26,411-Speed 9422.97 samples/sec   Loss 7.7611   LearningRate 0.0591   Epoch: 4   Global Step: 77100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:27,486-Speed 9532.85 samples/sec   Loss 7.5951   LearningRate 0.0591   Epoch: 4   Global Step: 77110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:28,581-Speed 9356.36 samples/sec   Loss 7.6268   LearningRate 0.0591   Epoch: 4   Global Step: 77120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:29,628-Speed 9788.89 samples/sec   Loss 7.6084   LearningRate 0.0591   Epoch: 4   Global Step: 77130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:30,666-Speed 9873.03 samples/sec   Loss 7.7152   LearningRate 0.0591   Epoch: 4   Global Step: 77140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:31,729-Speed 9637.49 samples/sec   Loss 7.7142   LearningRate 0.0591   Epoch: 4   Global Step: 77150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:32,843-Speed 9196.43 samples/sec   Loss 7.5392   LearningRate 0.0591   Epoch: 4   Global Step: 77160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:33,916-Speed 9546.88 samples/sec   Loss 7.7210   LearningRate 0.0591   Epoch: 4   Global Step: 77170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:35,008-Speed 9377.92 samples/sec   Loss 7.6985   LearningRate 0.0591   Epoch: 4   Global Step: 77180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:36,081-Speed 9548.20 samples/sec   Loss 7.7245   LearningRate 0.0591   Epoch: 4   Global Step: 77190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:37,157-Speed 9524.08 samples/sec   Loss 7.6569   LearningRate 0.0591   Epoch: 4   Global Step: 77200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:38,232-Speed 9533.99 samples/sec   Loss 7.6588   LearningRate 0.0591   Epoch: 4   Global Step: 77210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:39,314-Speed 9475.94 samples/sec   Loss 7.7178   LearningRate 0.0591   Epoch: 4   Global Step: 77220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:40,396-Speed 9470.37 samples/sec   Loss 7.7909   LearningRate 0.0591   Epoch: 4   Global Step: 77230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:41,473-Speed 9514.12 samples/sec   Loss 7.6293   LearningRate 0.0591   Epoch: 4   Global Step: 77240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:42,533-Speed 9665.62 samples/sec   Loss 7.6637   LearningRate 0.0591   Epoch: 4   Global Step: 77250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:43,624-Speed 9390.62 samples/sec   Loss 7.7568   LearningRate 0.0591   Epoch: 4   Global Step: 77260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:44,736-Speed 9214.10 samples/sec   Loss 7.5771   LearningRate 0.0591   Epoch: 4   Global Step: 77270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:45,763-Speed 9977.55 samples/sec   Loss 7.6581   LearningRate 0.0591   Epoch: 4   Global Step: 77280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:46,792-Speed 9954.58 samples/sec   Loss 7.6606   LearningRate 0.0591   Epoch: 4   Global Step: 77290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:47,866-Speed 9541.31 samples/sec   Loss 7.7072   LearningRate 0.0590   Epoch: 4   Global Step: 77300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:48,931-Speed 9620.38 samples/sec   Loss 7.6902   LearningRate 0.0590   Epoch: 4   Global Step: 77310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:50,034-Speed 9290.99 samples/sec   Loss 7.8366   LearningRate 0.0590   Epoch: 4   Global Step: 77320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:51,136-Speed 9298.55 samples/sec   Loss 7.7433   LearningRate 0.0590   Epoch: 4   Global Step: 77330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:52,225-Speed 9409.48 samples/sec   Loss 7.6495   LearningRate 0.0590   Epoch: 4   Global Step: 77340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:53,307-Speed 9464.72 samples/sec   Loss 7.7126   LearningRate 0.0590   Epoch: 4   Global Step: 77350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:54,384-Speed 9516.47 samples/sec   Loss 7.6293   LearningRate 0.0590   Epoch: 4   Global Step: 77360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:55,461-Speed 9507.91 samples/sec   Loss 7.5943   LearningRate 0.0590   Epoch: 4   Global Step: 77370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:38:56,568-Speed 9259.25 samples/sec   Loss 7.5226   LearningRate 0.0590   Epoch: 4   Global Step: 77380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:57,678-Speed 9231.56 samples/sec   Loss 7.6378   LearningRate 0.0590   Epoch: 4   Global Step: 77390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:58,784-Speed 9263.33 samples/sec   Loss 7.6686   LearningRate 0.0590   Epoch: 4   Global Step: 77400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:38:59,881-Speed 9342.52 samples/sec   Loss 7.6074   LearningRate 0.0590   Epoch: 4   Global Step: 77410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:39:00,984-Speed 9295.20 samples/sec   Loss 7.6271   LearningRate 0.0590   Epoch: 4   Global Step: 77420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:39:02,087-Speed 9287.02 samples/sec   Loss 7.7080   LearningRate 0.0590   Epoch: 4   Global Step: 77430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:39:03,192-Speed 9270.82 samples/sec   Loss 7.6263   LearningRate 0.0590   Epoch: 4   Global Step: 77440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:39:04,273-Speed 9477.80 samples/sec   Loss 7.7790   LearningRate 0.0590   Epoch: 4   Global Step: 77450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:39:05,355-Speed 9467.34 samples/sec   Loss 7.7667   LearningRate 0.0590   Epoch: 4   Global Step: 77460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:39:06,507-Speed 8891.07 samples/sec   Loss 7.7643   LearningRate 0.0590   Epoch: 4   Global Step: 77470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:39:07,584-Speed 9517.72 samples/sec   Loss 7.7034   LearningRate 0.0590   Epoch: 4   Global Step: 77480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:08,648-Speed 9626.90 samples/sec   Loss 7.6387   LearningRate 0.0590   Epoch: 4   Global Step: 77490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:09,722-Speed 9541.43 samples/sec   Loss 7.6479   LearningRate 0.0590   Epoch: 4   Global Step: 77500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:10,796-Speed 9545.40 samples/sec   Loss 7.6616   LearningRate 0.0590   Epoch: 4   Global Step: 77510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:11,860-Speed 9626.57 samples/sec   Loss 7.7930   LearningRate 0.0589   Epoch: 4   Global Step: 77520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:12,930-Speed 9573.54 samples/sec   Loss 7.7283   LearningRate 0.0589   Epoch: 4   Global Step: 77530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:14,040-Speed 9235.45 samples/sec   Loss 7.8006   LearningRate 0.0589   Epoch: 4   Global Step: 77540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:15,102-Speed 9649.96 samples/sec   Loss 7.8724   LearningRate 0.0589   Epoch: 4   Global Step: 77550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:16,163-Speed 9657.56 samples/sec   Loss 7.6538   LearningRate 0.0589   Epoch: 4   Global Step: 77560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:17,263-Speed 9318.98 samples/sec   Loss 7.5981   LearningRate 0.0589   Epoch: 4   Global Step: 77570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:18,349-Speed 9432.16 samples/sec   Loss 7.6112   LearningRate 0.0589   Epoch: 4   Global Step: 77580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:19,462-Speed 9211.92 samples/sec   Loss 7.6949   LearningRate 0.0589   Epoch: 4   Global Step: 77590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:20,549-Speed 9423.85 samples/sec   Loss 7.6799   LearningRate 0.0589   Epoch: 4   Global Step: 77600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:21,596-Speed 9791.17 samples/sec   Loss 7.6541   LearningRate 0.0589   Epoch: 4   Global Step: 77610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:22,706-Speed 9228.51 samples/sec   Loss 7.7709   LearningRate 0.0589   Epoch: 4   Global Step: 77620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:23,769-Speed 9638.33 samples/sec   Loss 7.6605   LearningRate 0.0589   Epoch: 4   Global Step: 77630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:24,845-Speed 9527.49 samples/sec   Loss 7.6347   LearningRate 0.0589   Epoch: 4   Global Step: 77640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:25,907-Speed 9643.40 samples/sec   Loss 7.6709   LearningRate 0.0589   Epoch: 4   Global Step: 77650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:27,001-Speed 9364.28 samples/sec   Loss 7.6341   LearningRate 0.0589   Epoch: 4   Global Step: 77660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:28,068-Speed 9603.94 samples/sec   Loss 7.6463   LearningRate 0.0589   Epoch: 4   Global Step: 77670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:29,134-Speed 9607.98 samples/sec   Loss 7.7042   LearningRate 0.0589   Epoch: 4   Global Step: 77680   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:39:30,236-Speed 9299.11 samples/sec   Loss 7.5839   LearningRate 0.0589   Epoch: 4   Global Step: 77690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:31,306-Speed 9581.39 samples/sec   Loss 7.6189   LearningRate 0.0589   Epoch: 4   Global Step: 77700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:32,376-Speed 9577.52 samples/sec   Loss 7.5762   LearningRate 0.0589   Epoch: 4   Global Step: 77710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:33,443-Speed 9602.46 samples/sec   Loss 7.5885   LearningRate 0.0589   Epoch: 4   Global Step: 77720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:34,501-Speed 9675.83 samples/sec   Loss 7.6247   LearningRate 0.0589   Epoch: 4   Global Step: 77730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:35,619-Speed 9165.99 samples/sec   Loss 7.5747   LearningRate 0.0588   Epoch: 4   Global Step: 77740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:36,666-Speed 9786.97 samples/sec   Loss 7.6904   LearningRate 0.0588   Epoch: 4   Global Step: 77750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:37,767-Speed 9310.18 samples/sec   Loss 7.6819   LearningRate 0.0588   Epoch: 4   Global Step: 77760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:38,852-Speed 9441.00 samples/sec   Loss 7.6622   LearningRate 0.0588   Epoch: 4   Global Step: 77770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:39,903-Speed 9755.33 samples/sec   Loss 7.6716   LearningRate 0.0588   Epoch: 4   Global Step: 77780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:40,960-Speed 9692.53 samples/sec   Loss 7.6897   LearningRate 0.0588   Epoch: 4   Global Step: 77790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:42,041-Speed 9478.17 samples/sec   Loss 7.6902   LearningRate 0.0588   Epoch: 4   Global Step: 77800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:43,145-Speed 9281.17 samples/sec   Loss 7.6361   LearningRate 0.0588   Epoch: 4   Global Step: 77810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:44,229-Speed 9444.33 samples/sec   Loss 7.6009   LearningRate 0.0588   Epoch: 4   Global Step: 77820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:45,294-Speed 9625.37 samples/sec   Loss 7.5196   LearningRate 0.0588   Epoch: 4   Global Step: 77830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:46,369-Speed 9532.98 samples/sec   Loss 7.7084   LearningRate 0.0588   Epoch: 4   Global Step: 77840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:47,434-Speed 9619.81 samples/sec   Loss 7.6564   LearningRate 0.0588   Epoch: 4   Global Step: 77850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:48,505-Speed 9571.82 samples/sec   Loss 7.6103   LearningRate 0.0588   Epoch: 4   Global Step: 77860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:49,572-Speed 9599.69 samples/sec   Loss 7.5398   LearningRate 0.0588   Epoch: 4   Global Step: 77870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:50,661-Speed 9405.73 samples/sec   Loss 7.5741   LearningRate 0.0588   Epoch: 4   Global Step: 77880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:51,800-Speed 9001.23 samples/sec   Loss 7.5899   LearningRate 0.0588   Epoch: 4   Global Step: 77890   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:39:52,859-Speed 9668.66 samples/sec   Loss 7.5755   LearningRate 0.0588   Epoch: 4   Global Step: 77900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:53,938-Speed 9493.92 samples/sec   Loss 7.6374   LearningRate 0.0588   Epoch: 4   Global Step: 77910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:54,988-Speed 9764.81 samples/sec   Loss 7.6502   LearningRate 0.0588   Epoch: 4   Global Step: 77920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:56,068-Speed 9487.61 samples/sec   Loss 7.6796   LearningRate 0.0588   Epoch: 4   Global Step: 77930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:57,153-Speed 9447.66 samples/sec   Loss 7.6480   LearningRate 0.0588   Epoch: 4   Global Step: 77940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:58,223-Speed 9573.40 samples/sec   Loss 7.6498   LearningRate 0.0588   Epoch: 4   Global Step: 77950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:39:59,297-Speed 9550.60 samples/sec   Loss 7.6523   LearningRate 0.0587   Epoch: 4   Global Step: 77960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:40:00,363-Speed 9619.43 samples/sec   Loss 7.7279   LearningRate 0.0587   Epoch: 4   Global Step: 77970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:40:01,428-Speed 9613.23 samples/sec   Loss 7.6778   LearningRate 0.0587   Epoch: 4   Global Step: 77980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:40:02,522-Speed 9365.51 samples/sec   Loss 7.6297   LearningRate 0.0587   Epoch: 4   Global Step: 77990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:40:03,606-Speed 9456.06 samples/sec   Loss 7.6724   LearningRate 0.0587   Epoch: 4   Global Step: 78000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:40:25,575-[lfw][78000]XNorm: 12.139055
Training: 2022-04-11 14:40:25,576-[lfw][78000]Accuracy-Flip: 0.99633+-0.00267
Training: 2022-04-11 14:40:25,576-[lfw][78000]Accuracy-Highest: 0.99633
Training: 2022-04-11 14:40:50,945-[cfp_fp][78000]XNorm: 10.332268
Training: 2022-04-11 14:40:50,945-[cfp_fp][78000]Accuracy-Flip: 0.95014+-0.01267
Training: 2022-04-11 14:40:50,946-[cfp_fp][78000]Accuracy-Highest: 0.95400
Training: 2022-04-11 14:41:12,855-[agedb_30][78000]XNorm: 11.762301
Training: 2022-04-11 14:41:12,856-[agedb_30][78000]Accuracy-Flip: 0.95900+-0.00873
Training: 2022-04-11 14:41:12,856-[agedb_30][78000]Accuracy-Highest: 0.96067
Training: 2022-04-11 14:41:13,956-Speed 145.56 samples/sec   Loss 7.6344   LearningRate 0.0587   Epoch: 4   Global Step: 78010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:15,025-Speed 9588.87 samples/sec   Loss 7.6983   LearningRate 0.0587   Epoch: 4   Global Step: 78020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:16,098-Speed 9545.17 samples/sec   Loss 7.6208   LearningRate 0.0587   Epoch: 4   Global Step: 78030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:17,150-Speed 9744.73 samples/sec   Loss 7.6669   LearningRate 0.0587   Epoch: 4   Global Step: 78040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:18,239-Speed 9408.34 samples/sec   Loss 7.6989   LearningRate 0.0587   Epoch: 4   Global Step: 78050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:19,316-Speed 9512.71 samples/sec   Loss 7.6719   LearningRate 0.0587   Epoch: 4   Global Step: 78060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:20,415-Speed 9328.00 samples/sec   Loss 7.5922   LearningRate 0.0587   Epoch: 4   Global Step: 78070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:21,501-Speed 9431.96 samples/sec   Loss 7.6070   LearningRate 0.0587   Epoch: 4   Global Step: 78080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:22,584-Speed 9457.75 samples/sec   Loss 7.5512   LearningRate 0.0587   Epoch: 4   Global Step: 78090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:23,642-Speed 9686.37 samples/sec   Loss 7.5096   LearningRate 0.0587   Epoch: 4   Global Step: 78100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:24,724-Speed 9468.99 samples/sec   Loss 7.9518   LearningRate 0.0587   Epoch: 4   Global Step: 78110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:25,799-Speed 9530.21 samples/sec   Loss 7.7583   LearningRate 0.0587   Epoch: 4   Global Step: 78120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:26,879-Speed 9488.90 samples/sec   Loss 7.8605   LearningRate 0.0587   Epoch: 4   Global Step: 78130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:27,945-Speed 9612.60 samples/sec   Loss 7.6759   LearningRate 0.0587   Epoch: 4   Global Step: 78140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:29,020-Speed 9528.87 samples/sec   Loss 7.7340   LearningRate 0.0587   Epoch: 4   Global Step: 78150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:30,084-Speed 9632.84 samples/sec   Loss 7.7140   LearningRate 0.0587   Epoch: 4   Global Step: 78160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:31,156-Speed 9558.64 samples/sec   Loss 7.6470   LearningRate 0.0586   Epoch: 4   Global Step: 78170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:32,235-Speed 9489.26 samples/sec   Loss 7.7518   LearningRate 0.0586   Epoch: 4   Global Step: 78180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:33,314-Speed 9509.40 samples/sec   Loss 7.6634   LearningRate 0.0586   Epoch: 4   Global Step: 78190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:34,370-Speed 9700.43 samples/sec   Loss 7.7305   LearningRate 0.0586   Epoch: 4   Global Step: 78200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:35,456-Speed 9437.90 samples/sec   Loss 7.6415   LearningRate 0.0586   Epoch: 4   Global Step: 78210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:36,506-Speed 9750.47 samples/sec   Loss 7.6485   LearningRate 0.0586   Epoch: 4   Global Step: 78220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:37,599-Speed 9376.38 samples/sec   Loss 7.6155   LearningRate 0.0586   Epoch: 4   Global Step: 78230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:38,660-Speed 9657.31 samples/sec   Loss 7.6183   LearningRate 0.0586   Epoch: 4   Global Step: 78240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:39,739-Speed 9491.07 samples/sec   Loss 7.6430   LearningRate 0.0586   Epoch: 4   Global Step: 78250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:40,855-Speed 9180.89 samples/sec   Loss 7.7151   LearningRate 0.0586   Epoch: 4   Global Step: 78260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:41,916-Speed 9655.25 samples/sec   Loss 7.6103   LearningRate 0.0586   Epoch: 4   Global Step: 78270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:42,964-Speed 9786.15 samples/sec   Loss 7.6157   LearningRate 0.0586   Epoch: 4   Global Step: 78280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:44,038-Speed 9538.34 samples/sec   Loss 7.6257   LearningRate 0.0586   Epoch: 4   Global Step: 78290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:45,152-Speed 9194.13 samples/sec   Loss 7.7694   LearningRate 0.0586   Epoch: 4   Global Step: 78300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:41:46,230-Speed 9510.07 samples/sec   Loss 7.6204   LearningRate 0.0586   Epoch: 4   Global Step: 78310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:47,321-Speed 9392.56 samples/sec   Loss 7.6486   LearningRate 0.0586   Epoch: 4   Global Step: 78320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:48,428-Speed 9254.02 samples/sec   Loss 7.6059   LearningRate 0.0586   Epoch: 4   Global Step: 78330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:49,527-Speed 9324.82 samples/sec   Loss 7.5561   LearningRate 0.0586   Epoch: 4   Global Step: 78340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:50,617-Speed 9394.56 samples/sec   Loss 7.6917   LearningRate 0.0586   Epoch: 4   Global Step: 78350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:51,656-Speed 9864.86 samples/sec   Loss 7.6668   LearningRate 0.0586   Epoch: 4   Global Step: 78360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:52,744-Speed 9414.59 samples/sec   Loss 7.6037   LearningRate 0.0586   Epoch: 4   Global Step: 78370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:53,838-Speed 9373.20 samples/sec   Loss 7.7943   LearningRate 0.0586   Epoch: 4   Global Step: 78380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:54,922-Speed 9445.24 samples/sec   Loss 7.6972   LearningRate 0.0585   Epoch: 4   Global Step: 78390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:56,085-Speed 8814.80 samples/sec   Loss 7.6051   LearningRate 0.0585   Epoch: 4   Global Step: 78400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:57,159-Speed 9535.13 samples/sec   Loss 7.4433   LearningRate 0.0585   Epoch: 4   Global Step: 78410   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:41:58,273-Speed 9196.97 samples/sec   Loss 7.6419   LearningRate 0.0585   Epoch: 4   Global Step: 78420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:41:59,342-Speed 9589.70 samples/sec   Loss 7.6463   LearningRate 0.0585   Epoch: 4   Global Step: 78430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:00,417-Speed 9524.10 samples/sec   Loss 7.6767   LearningRate 0.0585   Epoch: 4   Global Step: 78440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:01,510-Speed 9378.57 samples/sec   Loss 7.6365   LearningRate 0.0585   Epoch: 4   Global Step: 78450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:02,614-Speed 9278.19 samples/sec   Loss 7.6809   LearningRate 0.0585   Epoch: 4   Global Step: 78460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:03,699-Speed 9446.94 samples/sec   Loss 7.5331   LearningRate 0.0585   Epoch: 4   Global Step: 78470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:04,773-Speed 9544.33 samples/sec   Loss 7.5321   LearningRate 0.0585   Epoch: 4   Global Step: 78480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:05,869-Speed 9344.14 samples/sec   Loss 7.6945   LearningRate 0.0585   Epoch: 4   Global Step: 78490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:06,951-Speed 9476.16 samples/sec   Loss 7.7459   LearningRate 0.0585   Epoch: 4   Global Step: 78500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:08,044-Speed 9367.85 samples/sec   Loss 7.7671   LearningRate 0.0585   Epoch: 4   Global Step: 78510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:09,145-Speed 9305.70 samples/sec   Loss 7.6302   LearningRate 0.0585   Epoch: 4   Global Step: 78520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:10,245-Speed 9319.74 samples/sec   Loss 7.7015   LearningRate 0.0585   Epoch: 4   Global Step: 78530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:11,369-Speed 9112.59 samples/sec   Loss 7.5634   LearningRate 0.0585   Epoch: 4   Global Step: 78540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:12,480-Speed 9219.71 samples/sec   Loss 7.6354   LearningRate 0.0585   Epoch: 4   Global Step: 78550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:13,573-Speed 9379.47 samples/sec   Loss 7.5868   LearningRate 0.0585   Epoch: 4   Global Step: 78560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:14,647-Speed 9533.92 samples/sec   Loss 7.6194   LearningRate 0.0585   Epoch: 4   Global Step: 78570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:15,704-Speed 9693.11 samples/sec   Loss 7.7923   LearningRate 0.0585   Epoch: 4   Global Step: 78580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:16,778-Speed 9543.43 samples/sec   Loss 7.7865   LearningRate 0.0585   Epoch: 4   Global Step: 78590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:17,849-Speed 9569.94 samples/sec   Loss 7.6087   LearningRate 0.0585   Epoch: 4   Global Step: 78600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:18,933-Speed 9455.40 samples/sec   Loss 7.5640   LearningRate 0.0584   Epoch: 4   Global Step: 78610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:19,985-Speed 9737.94 samples/sec   Loss 7.5512   LearningRate 0.0584   Epoch: 4   Global Step: 78620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:21,023-Speed 9867.34 samples/sec   Loss 7.6808   LearningRate 0.0584   Epoch: 4   Global Step: 78630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:22,112-Speed 9414.03 samples/sec   Loss 7.6282   LearningRate 0.0584   Epoch: 4   Global Step: 78640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:23,254-Speed 8974.37 samples/sec   Loss 7.8285   LearningRate 0.0584   Epoch: 4   Global Step: 78650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:24,356-Speed 9295.14 samples/sec   Loss 7.7123   LearningRate 0.0584   Epoch: 4   Global Step: 78660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:25,481-Speed 9109.25 samples/sec   Loss 7.5911   LearningRate 0.0584   Epoch: 4   Global Step: 78670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:26,576-Speed 9356.41 samples/sec   Loss 7.5898   LearningRate 0.0584   Epoch: 4   Global Step: 78680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:27,642-Speed 9615.27 samples/sec   Loss 7.6383   LearningRate 0.0584   Epoch: 4   Global Step: 78690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:28,721-Speed 9497.57 samples/sec   Loss 7.6419   LearningRate 0.0584   Epoch: 4   Global Step: 78700   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:42:29,752-Speed 9933.50 samples/sec   Loss 7.6020   LearningRate 0.0584   Epoch: 4   Global Step: 78710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:30,825-Speed 9552.13 samples/sec   Loss 7.6641   LearningRate 0.0584   Epoch: 4   Global Step: 78720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:31,906-Speed 9472.58 samples/sec   Loss 7.7830   LearningRate 0.0584   Epoch: 4   Global Step: 78730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:32,975-Speed 9592.01 samples/sec   Loss 7.6197   LearningRate 0.0584   Epoch: 4   Global Step: 78740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:34,050-Speed 9538.89 samples/sec   Loss 7.5458   LearningRate 0.0584   Epoch: 4   Global Step: 78750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:35,122-Speed 9562.06 samples/sec   Loss 7.5982   LearningRate 0.0584   Epoch: 4   Global Step: 78760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:36,182-Speed 9665.95 samples/sec   Loss 7.7203   LearningRate 0.0584   Epoch: 4   Global Step: 78770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:37,277-Speed 9357.60 samples/sec   Loss 7.7123   LearningRate 0.0584   Epoch: 4   Global Step: 78780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:38,368-Speed 9398.10 samples/sec   Loss 7.6778   LearningRate 0.0584   Epoch: 4   Global Step: 78790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:39,457-Speed 9402.42 samples/sec   Loss 7.5970   LearningRate 0.0584   Epoch: 4   Global Step: 78800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:40,569-Speed 9214.59 samples/sec   Loss 7.6721   LearningRate 0.0584   Epoch: 4   Global Step: 78810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:41,642-Speed 9546.66 samples/sec   Loss 7.7090   LearningRate 0.0584   Epoch: 4   Global Step: 78820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:42,785-Speed 8968.11 samples/sec   Loss 7.5321   LearningRate 0.0583   Epoch: 4   Global Step: 78830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:43,932-Speed 8938.18 samples/sec   Loss 7.5087   LearningRate 0.0583   Epoch: 4   Global Step: 78840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:45,016-Speed 9443.80 samples/sec   Loss 7.7136   LearningRate 0.0583   Epoch: 4   Global Step: 78850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:46,105-Speed 9413.04 samples/sec   Loss 7.5735   LearningRate 0.0583   Epoch: 4   Global Step: 78860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:47,192-Speed 9424.44 samples/sec   Loss 7.7427   LearningRate 0.0583   Epoch: 4   Global Step: 78870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:48,320-Speed 9083.05 samples/sec   Loss 7.7805   LearningRate 0.0583   Epoch: 4   Global Step: 78880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:49,418-Speed 9332.29 samples/sec   Loss 7.6143   LearningRate 0.0583   Epoch: 4   Global Step: 78890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:50,519-Speed 9306.74 samples/sec   Loss 7.6248   LearningRate 0.0583   Epoch: 4   Global Step: 78900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:42:51,608-Speed 9409.48 samples/sec   Loss 7.7363   LearningRate 0.0583   Epoch: 4   Global Step: 78910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:52,670-Speed 9646.89 samples/sec   Loss 7.5733   LearningRate 0.0583   Epoch: 4   Global Step: 78920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:53,738-Speed 9598.93 samples/sec   Loss 7.7328   LearningRate 0.0583   Epoch: 4   Global Step: 78930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:54,825-Speed 9426.24 samples/sec   Loss 7.5830   LearningRate 0.0583   Epoch: 4   Global Step: 78940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:55,892-Speed 9598.51 samples/sec   Loss 7.6245   LearningRate 0.0583   Epoch: 4   Global Step: 78950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:56,985-Speed 9372.99 samples/sec   Loss 7.6217   LearningRate 0.0583   Epoch: 4   Global Step: 78960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:58,038-Speed 9736.32 samples/sec   Loss 7.6176   LearningRate 0.0583   Epoch: 4   Global Step: 78970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:42:59,183-Speed 8948.59 samples/sec   Loss 7.7644   LearningRate 0.0583   Epoch: 4   Global Step: 78980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:00,250-Speed 9600.04 samples/sec   Loss 7.6796   LearningRate 0.0583   Epoch: 4   Global Step: 78990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:01,402-Speed 8894.24 samples/sec   Loss 7.6895   LearningRate 0.0583   Epoch: 4   Global Step: 79000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:02,443-Speed 9840.62 samples/sec   Loss 7.6015   LearningRate 0.0583   Epoch: 4   Global Step: 79010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:03,519-Speed 9527.06 samples/sec   Loss 7.6710   LearningRate 0.0583   Epoch: 4   Global Step: 79020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:04,584-Speed 9615.03 samples/sec   Loss 7.6409   LearningRate 0.0583   Epoch: 4   Global Step: 79030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:05,683-Speed 9324.21 samples/sec   Loss 7.6685   LearningRate 0.0583   Epoch: 4   Global Step: 79040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:06,789-Speed 9267.66 samples/sec   Loss 7.6506   LearningRate 0.0582   Epoch: 4   Global Step: 79050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:07,922-Speed 9043.42 samples/sec   Loss 7.6567   LearningRate 0.0582   Epoch: 4   Global Step: 79060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:09,030-Speed 9246.58 samples/sec   Loss 7.6822   LearningRate 0.0582   Epoch: 4   Global Step: 79070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:10,104-Speed 9543.21 samples/sec   Loss 7.7039   LearningRate 0.0582   Epoch: 4   Global Step: 79080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:11,164-Speed 9661.00 samples/sec   Loss 7.5988   LearningRate 0.0582   Epoch: 4   Global Step: 79090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:12,256-Speed 9402.19 samples/sec   Loss 7.6416   LearningRate 0.0582   Epoch: 4   Global Step: 79100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:13,339-Speed 9460.50 samples/sec   Loss 7.6069   LearningRate 0.0582   Epoch: 4   Global Step: 79110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:14,436-Speed 9339.38 samples/sec   Loss 7.6536   LearningRate 0.0582   Epoch: 4   Global Step: 79120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:15,505-Speed 9577.33 samples/sec   Loss 7.6769   LearningRate 0.0582   Epoch: 4   Global Step: 79130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:16,593-Speed 9424.41 samples/sec   Loss 7.6410   LearningRate 0.0582   Epoch: 4   Global Step: 79140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:17,689-Speed 9349.02 samples/sec   Loss 7.5814   LearningRate 0.0582   Epoch: 4   Global Step: 79150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:18,759-Speed 9574.26 samples/sec   Loss 7.5709   LearningRate 0.0582   Epoch: 4   Global Step: 79160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:19,880-Speed 9140.76 samples/sec   Loss 7.6847   LearningRate 0.0582   Epoch: 4   Global Step: 79170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:20,997-Speed 9168.11 samples/sec   Loss 7.5840   LearningRate 0.0582   Epoch: 4   Global Step: 79180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:43:22,130-Speed 9042.92 samples/sec   Loss 7.5816   LearningRate 0.0582   Epoch: 4   Global Step: 79190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:23,279-Speed 8916.48 samples/sec   Loss 7.6273   LearningRate 0.0582   Epoch: 4   Global Step: 79200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:24,320-Speed 9854.97 samples/sec   Loss 7.5100   LearningRate 0.0582   Epoch: 4   Global Step: 79210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:25,398-Speed 9498.46 samples/sec   Loss 7.7014   LearningRate 0.0582   Epoch: 4   Global Step: 79220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:26,452-Speed 9719.57 samples/sec   Loss 7.5915   LearningRate 0.0582   Epoch: 4   Global Step: 79230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:27,539-Speed 9425.27 samples/sec   Loss 7.5320   LearningRate 0.0582   Epoch: 4   Global Step: 79240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:28,679-Speed 8991.43 samples/sec   Loss 7.6848   LearningRate 0.0582   Epoch: 4   Global Step: 79250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:29,777-Speed 9331.71 samples/sec   Loss 7.6142   LearningRate 0.0582   Epoch: 4   Global Step: 79260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:30,848-Speed 9563.16 samples/sec   Loss 7.5720   LearningRate 0.0581   Epoch: 4   Global Step: 79270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:31,947-Speed 9322.63 samples/sec   Loss 7.6906   LearningRate 0.0581   Epoch: 4   Global Step: 79280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:33,083-Speed 9018.61 samples/sec   Loss 7.6502   LearningRate 0.0581   Epoch: 4   Global Step: 79290   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:43:34,160-Speed 9516.85 samples/sec   Loss 7.6267   LearningRate 0.0581   Epoch: 4   Global Step: 79300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:35,240-Speed 9480.89 samples/sec   Loss 7.6877   LearningRate 0.0581   Epoch: 4   Global Step: 79310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:36,368-Speed 9084.06 samples/sec   Loss 7.6617   LearningRate 0.0581   Epoch: 4   Global Step: 79320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:37,465-Speed 9345.50 samples/sec   Loss 7.5734   LearningRate 0.0581   Epoch: 4   Global Step: 79330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:38,562-Speed 9343.55 samples/sec   Loss 7.6454   LearningRate 0.0581   Epoch: 4   Global Step: 79340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:39,651-Speed 9401.79 samples/sec   Loss 7.6354   LearningRate 0.0581   Epoch: 4   Global Step: 79350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:40,784-Speed 9044.83 samples/sec   Loss 7.5582   LearningRate 0.0581   Epoch: 4   Global Step: 79360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:41,898-Speed 9200.00 samples/sec   Loss 7.6135   LearningRate 0.0581   Epoch: 4   Global Step: 79370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:42,971-Speed 9548.00 samples/sec   Loss 7.7361   LearningRate 0.0581   Epoch: 4   Global Step: 79380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:44,007-Speed 9887.87 samples/sec   Loss 7.6591   LearningRate 0.0581   Epoch: 4   Global Step: 79390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:45,091-Speed 9454.74 samples/sec   Loss 7.6755   LearningRate 0.0581   Epoch: 4   Global Step: 79400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:46,132-Speed 9844.13 samples/sec   Loss 7.6758   LearningRate 0.0581   Epoch: 4   Global Step: 79410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:47,235-Speed 9290.34 samples/sec   Loss 7.7314   LearningRate 0.0581   Epoch: 4   Global Step: 79420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:48,308-Speed 9545.67 samples/sec   Loss 7.7109   LearningRate 0.0581   Epoch: 4   Global Step: 79430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:49,387-Speed 9501.17 samples/sec   Loss 7.5758   LearningRate 0.0581   Epoch: 4   Global Step: 79440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:50,456-Speed 9585.02 samples/sec   Loss 7.6092   LearningRate 0.0581   Epoch: 4   Global Step: 79450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:51,566-Speed 9227.17 samples/sec   Loss 7.6520   LearningRate 0.0581   Epoch: 4   Global Step: 79460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:52,649-Speed 9461.90 samples/sec   Loss 7.5804   LearningRate 0.0581   Epoch: 4   Global Step: 79470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:53,750-Speed 9311.20 samples/sec   Loss 7.6462   LearningRate 0.0581   Epoch: 4   Global Step: 79480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:54,826-Speed 9520.96 samples/sec   Loss 7.6264   LearningRate 0.0580   Epoch: 4   Global Step: 79490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:43:55,874-Speed 9775.78 samples/sec   Loss 7.6676   LearningRate 0.0580   Epoch: 4   Global Step: 79500   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:43:56,976-Speed 9297.48 samples/sec   Loss 7.6555   LearningRate 0.0580   Epoch: 4   Global Step: 79510   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:43:58,030-Speed 9715.73 samples/sec   Loss 7.6472   LearningRate 0.0580   Epoch: 4   Global Step: 79520   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:43:59,071-Speed 9842.48 samples/sec   Loss 7.6761   LearningRate 0.0580   Epoch: 4   Global Step: 79530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:00,183-Speed 9215.54 samples/sec   Loss 7.6316   LearningRate 0.0580   Epoch: 4   Global Step: 79540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:01,226-Speed 9825.17 samples/sec   Loss 7.5183   LearningRate 0.0580   Epoch: 4   Global Step: 79550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:02,300-Speed 9538.50 samples/sec   Loss 7.6593   LearningRate 0.0580   Epoch: 4   Global Step: 79560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:03,358-Speed 9681.66 samples/sec   Loss 7.6537   LearningRate 0.0580   Epoch: 4   Global Step: 79570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:04,465-Speed 9255.63 samples/sec   Loss 7.4911   LearningRate 0.0580   Epoch: 4   Global Step: 79580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:05,528-Speed 9647.10 samples/sec   Loss 7.5809   LearningRate 0.0580   Epoch: 4   Global Step: 79590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:06,639-Speed 9220.29 samples/sec   Loss 7.5796   LearningRate 0.0580   Epoch: 4   Global Step: 79600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:07,765-Speed 9100.09 samples/sec   Loss 7.6817   LearningRate 0.0580   Epoch: 4   Global Step: 79610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:08,838-Speed 9548.29 samples/sec   Loss 7.6712   LearningRate 0.0580   Epoch: 4   Global Step: 79620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:09,874-Speed 9892.85 samples/sec   Loss 7.7449   LearningRate 0.0580   Epoch: 4   Global Step: 79630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:10,919-Speed 9799.17 samples/sec   Loss 7.5920   LearningRate 0.0580   Epoch: 4   Global Step: 79640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:12,020-Speed 9302.76 samples/sec   Loss 7.7382   LearningRate 0.0580   Epoch: 4   Global Step: 79650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:13,108-Speed 9420.27 samples/sec   Loss 7.5549   LearningRate 0.0580   Epoch: 4   Global Step: 79660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:14,190-Speed 9466.24 samples/sec   Loss 7.5684   LearningRate 0.0580   Epoch: 4   Global Step: 79670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:15,286-Speed 9347.86 samples/sec   Loss 7.6755   LearningRate 0.0580   Epoch: 4   Global Step: 79680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:16,351-Speed 9621.14 samples/sec   Loss 7.6882   LearningRate 0.0580   Epoch: 4   Global Step: 79690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:17,432-Speed 9486.93 samples/sec   Loss 7.6076   LearningRate 0.0579   Epoch: 4   Global Step: 79700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:18,546-Speed 9192.29 samples/sec   Loss 7.6891   LearningRate 0.0579   Epoch: 4   Global Step: 79710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:19,639-Speed 9378.28 samples/sec   Loss 7.6508   LearningRate 0.0579   Epoch: 4   Global Step: 79720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:20,740-Speed 9302.98 samples/sec   Loss 7.6833   LearningRate 0.0579   Epoch: 4   Global Step: 79730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:21,816-Speed 9519.02 samples/sec   Loss 7.6770   LearningRate 0.0579   Epoch: 4   Global Step: 79740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:22,896-Speed 9493.68 samples/sec   Loss 7.6838   LearningRate 0.0579   Epoch: 4   Global Step: 79750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:23,962-Speed 9621.10 samples/sec   Loss 7.5915   LearningRate 0.0579   Epoch: 4   Global Step: 79760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:25,054-Speed 9375.86 samples/sec   Loss 7.6719   LearningRate 0.0579   Epoch: 4   Global Step: 79770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:26,121-Speed 9607.18 samples/sec   Loss 7.6355   LearningRate 0.0579   Epoch: 4   Global Step: 79780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:27,191-Speed 9577.28 samples/sec   Loss 7.6503   LearningRate 0.0579   Epoch: 4   Global Step: 79790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:28,287-Speed 9347.01 samples/sec   Loss 7.5247   LearningRate 0.0579   Epoch: 4   Global Step: 79800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:29,338-Speed 9742.72 samples/sec   Loss 7.4152   LearningRate 0.0579   Epoch: 4   Global Step: 79810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:30,414-Speed 9525.27 samples/sec   Loss 7.5846   LearningRate 0.0579   Epoch: 4   Global Step: 79820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:31,471-Speed 9693.16 samples/sec   Loss 7.6397   LearningRate 0.0579   Epoch: 4   Global Step: 79830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:32,606-Speed 9027.21 samples/sec   Loss 7.6242   LearningRate 0.0579   Epoch: 4   Global Step: 79840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:33,661-Speed 9712.33 samples/sec   Loss 7.6439   LearningRate 0.0579   Epoch: 4   Global Step: 79850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:34,773-Speed 9211.60 samples/sec   Loss 7.6878   LearningRate 0.0579   Epoch: 4   Global Step: 79860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:35,865-Speed 9386.66 samples/sec   Loss 7.6978   LearningRate 0.0579   Epoch: 4   Global Step: 79870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:36,976-Speed 9224.70 samples/sec   Loss 7.5249   LearningRate 0.0579   Epoch: 4   Global Step: 79880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:38,091-Speed 9192.60 samples/sec   Loss 7.7018   LearningRate 0.0579   Epoch: 4   Global Step: 79890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:39,170-Speed 9491.98 samples/sec   Loss 7.6060   LearningRate 0.0579   Epoch: 4   Global Step: 79900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:40,270-Speed 9316.88 samples/sec   Loss 7.6314   LearningRate 0.0579   Epoch: 4   Global Step: 79910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:41,338-Speed 9592.42 samples/sec   Loss 7.6474   LearningRate 0.0578   Epoch: 4   Global Step: 79920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:44:42,431-Speed 9373.79 samples/sec   Loss 7.6218   LearningRate 0.0578   Epoch: 4   Global Step: 79930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:43,528-Speed 9338.63 samples/sec   Loss 7.5501   LearningRate 0.0578   Epoch: 4   Global Step: 79940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:44,641-Speed 9210.30 samples/sec   Loss 7.6886   LearningRate 0.0578   Epoch: 4   Global Step: 79950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:45,768-Speed 9095.44 samples/sec   Loss 7.6035   LearningRate 0.0578   Epoch: 4   Global Step: 79960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:46,869-Speed 9303.90 samples/sec   Loss 7.5249   LearningRate 0.0578   Epoch: 4   Global Step: 79970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:47,949-Speed 9492.00 samples/sec   Loss 7.6747   LearningRate 0.0578   Epoch: 4   Global Step: 79980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:49,023-Speed 9539.43 samples/sec   Loss 7.6832   LearningRate 0.0578   Epoch: 4   Global Step: 79990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:44:50,085-Speed 9652.42 samples/sec   Loss 7.7134   LearningRate 0.0578   Epoch: 4   Global Step: 80000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:45:12,056-[lfw][80000]XNorm: 12.140908
Training: 2022-04-11 14:45:12,057-[lfw][80000]Accuracy-Flip: 0.99617+-0.00183
Training: 2022-04-11 14:45:12,057-[lfw][80000]Accuracy-Highest: 0.99633
Training: 2022-04-11 14:45:37,494-[cfp_fp][80000]XNorm: 10.246566
Training: 2022-04-11 14:45:37,495-[cfp_fp][80000]Accuracy-Flip: 0.95157+-0.01238
Training: 2022-04-11 14:45:37,496-[cfp_fp][80000]Accuracy-Highest: 0.95400
Training: 2022-04-11 14:45:59,450-[agedb_30][80000]XNorm: 11.691534
Training: 2022-04-11 14:45:59,450-[agedb_30][80000]Accuracy-Flip: 0.96083+-0.00938
Training: 2022-04-11 14:45:59,450-[agedb_30][80000]Accuracy-Highest: 0.96083
Training: 2022-04-11 14:46:00,495-Speed 145.43 samples/sec   Loss 7.5942   LearningRate 0.0578   Epoch: 4   Global Step: 80010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:01,570-Speed 9529.84 samples/sec   Loss 7.7299   LearningRate 0.0578   Epoch: 4   Global Step: 80020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:02,638-Speed 9594.28 samples/sec   Loss 7.6507   LearningRate 0.0578   Epoch: 4   Global Step: 80030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:03,712-Speed 9537.68 samples/sec   Loss 7.5572   LearningRate 0.0578   Epoch: 4   Global Step: 80040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:04,760-Speed 9781.99 samples/sec   Loss 7.6443   LearningRate 0.0578   Epoch: 4   Global Step: 80050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:05,832-Speed 9555.99 samples/sec   Loss 7.5612   LearningRate 0.0578   Epoch: 4   Global Step: 80060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:06,893-Speed 9655.78 samples/sec   Loss 7.6196   LearningRate 0.0578   Epoch: 4   Global Step: 80070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:07,966-Speed 9554.50 samples/sec   Loss 7.6145   LearningRate 0.0578   Epoch: 4   Global Step: 80080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:09,029-Speed 9638.98 samples/sec   Loss 7.5917   LearningRate 0.0578   Epoch: 4   Global Step: 80090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:10,124-Speed 9357.75 samples/sec   Loss 7.7587   LearningRate 0.0578   Epoch: 4   Global Step: 80100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:11,190-Speed 9609.94 samples/sec   Loss 7.5311   LearningRate 0.0578   Epoch: 4   Global Step: 80110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:12,273-Speed 9460.28 samples/sec   Loss 7.5843   LearningRate 0.0578   Epoch: 4   Global Step: 80120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:13,335-Speed 9650.33 samples/sec   Loss 7.6020   LearningRate 0.0578   Epoch: 4   Global Step: 80130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:14,406-Speed 9563.90 samples/sec   Loss 7.4643   LearningRate 0.0577   Epoch: 4   Global Step: 80140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:15,486-Speed 9490.58 samples/sec   Loss 7.6195   LearningRate 0.0577   Epoch: 4   Global Step: 80150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:16,569-Speed 9460.94 samples/sec   Loss 7.6370   LearningRate 0.0577   Epoch: 4   Global Step: 80160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:17,616-Speed 9779.01 samples/sec   Loss 7.4860   LearningRate 0.0577   Epoch: 4   Global Step: 80170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:18,729-Speed 9214.98 samples/sec   Loss 7.5098   LearningRate 0.0577   Epoch: 4   Global Step: 80180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:19,793-Speed 9625.47 samples/sec   Loss 7.6363   LearningRate 0.0577   Epoch: 4   Global Step: 80190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:20,849-Speed 9698.04 samples/sec   Loss 7.5132   LearningRate 0.0577   Epoch: 4   Global Step: 80200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:21,913-Speed 9637.70 samples/sec   Loss 7.6520   LearningRate 0.0577   Epoch: 4   Global Step: 80210   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:46:22,976-Speed 9635.04 samples/sec   Loss 7.5011   LearningRate 0.0577   Epoch: 4   Global Step: 80220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:24,051-Speed 9530.60 samples/sec   Loss 7.6032   LearningRate 0.0577   Epoch: 4   Global Step: 80230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:25,105-Speed 9717.12 samples/sec   Loss 7.5876   LearningRate 0.0577   Epoch: 4   Global Step: 80240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:26,225-Speed 9152.13 samples/sec   Loss 7.6088   LearningRate 0.0577   Epoch: 4   Global Step: 80250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:27,344-Speed 9154.30 samples/sec   Loss 7.6109   LearningRate 0.0577   Epoch: 4   Global Step: 80260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:28,459-Speed 9188.34 samples/sec   Loss 7.4627   LearningRate 0.0577   Epoch: 4   Global Step: 80270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:29,541-Speed 9471.34 samples/sec   Loss 7.6277   LearningRate 0.0577   Epoch: 4   Global Step: 80280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:30,598-Speed 9693.26 samples/sec   Loss 7.6521   LearningRate 0.0577   Epoch: 4   Global Step: 80290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:31,711-Speed 9209.01 samples/sec   Loss 7.6173   LearningRate 0.0577   Epoch: 4   Global Step: 80300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:32,815-Speed 9279.39 samples/sec   Loss 7.7147   LearningRate 0.0577   Epoch: 4   Global Step: 80310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:33,838-Speed 10023.87 samples/sec   Loss 7.5528   LearningRate 0.0577   Epoch: 4   Global Step: 80320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:34,919-Speed 9474.96 samples/sec   Loss 7.4884   LearningRate 0.0577   Epoch: 4   Global Step: 80330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:35,991-Speed 9560.25 samples/sec   Loss 7.6017   LearningRate 0.0577   Epoch: 4   Global Step: 80340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:37,054-Speed 9635.80 samples/sec   Loss 7.5063   LearningRate 0.0577   Epoch: 4   Global Step: 80350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:38,144-Speed 9400.08 samples/sec   Loss 7.6289   LearningRate 0.0576   Epoch: 4   Global Step: 80360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:39,211-Speed 9597.04 samples/sec   Loss 7.6931   LearningRate 0.0576   Epoch: 4   Global Step: 80370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:40,254-Speed 9823.03 samples/sec   Loss 7.5676   LearningRate 0.0576   Epoch: 4   Global Step: 80380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:41,351-Speed 9345.89 samples/sec   Loss 7.4775   LearningRate 0.0576   Epoch: 4   Global Step: 80390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:42,470-Speed 9156.69 samples/sec   Loss 7.5979   LearningRate 0.0576   Epoch: 4   Global Step: 80400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:43,589-Speed 9155.22 samples/sec   Loss 7.7466   LearningRate 0.0576   Epoch: 4   Global Step: 80410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:44,665-Speed 9523.15 samples/sec   Loss 7.6292   LearningRate 0.0576   Epoch: 4   Global Step: 80420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:45,738-Speed 9545.37 samples/sec   Loss 7.5393   LearningRate 0.0576   Epoch: 4   Global Step: 80430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:46,857-Speed 9158.42 samples/sec   Loss 7.6011   LearningRate 0.0576   Epoch: 4   Global Step: 80440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:47,918-Speed 9655.28 samples/sec   Loss 7.6049   LearningRate 0.0576   Epoch: 4   Global Step: 80450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:49,070-Speed 8895.57 samples/sec   Loss 7.6685   LearningRate 0.0576   Epoch: 4   Global Step: 80460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:50,173-Speed 9290.95 samples/sec   Loss 7.6443   LearningRate 0.0576   Epoch: 4   Global Step: 80470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:51,251-Speed 9507.96 samples/sec   Loss 7.5933   LearningRate 0.0576   Epoch: 4   Global Step: 80480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:52,337-Speed 9434.81 samples/sec   Loss 7.6591   LearningRate 0.0576   Epoch: 4   Global Step: 80490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:53,434-Speed 9338.90 samples/sec   Loss 7.6044   LearningRate 0.0576   Epoch: 4   Global Step: 80500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:54,530-Speed 9346.34 samples/sec   Loss 7.5204   LearningRate 0.0576   Epoch: 4   Global Step: 80510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:46:55,639-Speed 9243.30 samples/sec   Loss 7.5135   LearningRate 0.0576   Epoch: 4   Global Step: 80520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:56,740-Speed 9302.07 samples/sec   Loss 7.5288   LearningRate 0.0576   Epoch: 4   Global Step: 80530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:57,819-Speed 9494.90 samples/sec   Loss 7.6336   LearningRate 0.0576   Epoch: 4   Global Step: 80540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:58,916-Speed 9343.78 samples/sec   Loss 7.6237   LearningRate 0.0576   Epoch: 4   Global Step: 80550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:46:59,997-Speed 9477.82 samples/sec   Loss 7.5137   LearningRate 0.0576   Epoch: 4   Global Step: 80560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:01,083-Speed 9436.33 samples/sec   Loss 7.6105   LearningRate 0.0576   Epoch: 4   Global Step: 80570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:02,168-Speed 9437.44 samples/sec   Loss 7.7041   LearningRate 0.0575   Epoch: 4   Global Step: 80580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:03,256-Speed 9420.58 samples/sec   Loss 7.5461   LearningRate 0.0575   Epoch: 4   Global Step: 80590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:04,287-Speed 9934.15 samples/sec   Loss 7.6228   LearningRate 0.0575   Epoch: 4   Global Step: 80600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:05,350-Speed 9644.31 samples/sec   Loss 7.6515   LearningRate 0.0575   Epoch: 4   Global Step: 80610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:06,428-Speed 9500.78 samples/sec   Loss 7.5587   LearningRate 0.0575   Epoch: 4   Global Step: 80620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:07,530-Speed 9293.78 samples/sec   Loss 7.6525   LearningRate 0.0575   Epoch: 4   Global Step: 80630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:08,628-Speed 9334.03 samples/sec   Loss 7.4652   LearningRate 0.0575   Epoch: 4   Global Step: 80640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:09,654-Speed 9983.09 samples/sec   Loss 7.3554   LearningRate 0.0575   Epoch: 4   Global Step: 80650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:10,743-Speed 9412.84 samples/sec   Loss 7.5148   LearningRate 0.0575   Epoch: 4   Global Step: 80660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:11,825-Speed 9473.11 samples/sec   Loss 7.4732   LearningRate 0.0575   Epoch: 4   Global Step: 80670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:12,861-Speed 9888.51 samples/sec   Loss 7.6090   LearningRate 0.0575   Epoch: 4   Global Step: 80680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:13,931-Speed 9580.08 samples/sec   Loss 7.5580   LearningRate 0.0575   Epoch: 4   Global Step: 80690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:15,013-Speed 9466.49 samples/sec   Loss 7.5368   LearningRate 0.0575   Epoch: 4   Global Step: 80700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:16,098-Speed 9444.07 samples/sec   Loss 7.6092   LearningRate 0.0575   Epoch: 4   Global Step: 80710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:17,208-Speed 9230.91 samples/sec   Loss 7.5738   LearningRate 0.0575   Epoch: 4   Global Step: 80720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:18,298-Speed 9401.46 samples/sec   Loss 7.6380   LearningRate 0.0575   Epoch: 4   Global Step: 80730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:19,367-Speed 9579.06 samples/sec   Loss 7.8047   LearningRate 0.0575   Epoch: 4   Global Step: 80740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:20,469-Speed 9298.51 samples/sec   Loss 7.6525   LearningRate 0.0575   Epoch: 4   Global Step: 80750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:21,554-Speed 9443.85 samples/sec   Loss 7.6468   LearningRate 0.0575   Epoch: 4   Global Step: 80760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:22,638-Speed 9452.94 samples/sec   Loss 7.5903   LearningRate 0.0575   Epoch: 4   Global Step: 80770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:23,746-Speed 9242.37 samples/sec   Loss 7.6305   LearningRate 0.0575   Epoch: 4   Global Step: 80780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:24,816-Speed 9578.94 samples/sec   Loss 7.4774   LearningRate 0.0575   Epoch: 4   Global Step: 80790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:25,871-Speed 9713.82 samples/sec   Loss 7.5449   LearningRate 0.0574   Epoch: 4   Global Step: 80800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:26,957-Speed 9434.46 samples/sec   Loss 7.5410   LearningRate 0.0574   Epoch: 4   Global Step: 80810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:28,015-Speed 9680.41 samples/sec   Loss 7.5850   LearningRate 0.0574   Epoch: 4   Global Step: 80820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:29,116-Speed 9310.98 samples/sec   Loss 7.6086   LearningRate 0.0574   Epoch: 4   Global Step: 80830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:30,212-Speed 9345.87 samples/sec   Loss 7.5443   LearningRate 0.0574   Epoch: 4   Global Step: 80840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:31,292-Speed 9495.04 samples/sec   Loss 7.5814   LearningRate 0.0574   Epoch: 4   Global Step: 80850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:32,355-Speed 9633.74 samples/sec   Loss 7.5575   LearningRate 0.0574   Epoch: 4   Global Step: 80860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:33,424-Speed 9582.88 samples/sec   Loss 7.5879   LearningRate 0.0574   Epoch: 4   Global Step: 80870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:34,499-Speed 9532.86 samples/sec   Loss 7.6831   LearningRate 0.0574   Epoch: 4   Global Step: 80880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:35,544-Speed 9804.61 samples/sec   Loss 7.6507   LearningRate 0.0574   Epoch: 4   Global Step: 80890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:36,617-Speed 9549.16 samples/sec   Loss 7.5562   LearningRate 0.0574   Epoch: 4   Global Step: 80900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:37,667-Speed 9764.76 samples/sec   Loss 7.5857   LearningRate 0.0574   Epoch: 4   Global Step: 80910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:38,761-Speed 9359.20 samples/sec   Loss 7.6059   LearningRate 0.0574   Epoch: 4   Global Step: 80920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:39,868-Speed 9261.12 samples/sec   Loss 7.6655   LearningRate 0.0574   Epoch: 4   Global Step: 80930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:40,937-Speed 9582.84 samples/sec   Loss 7.5098   LearningRate 0.0574   Epoch: 4   Global Step: 80940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:41,996-Speed 9676.92 samples/sec   Loss 7.5171   LearningRate 0.0574   Epoch: 4   Global Step: 80950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:43,096-Speed 9312.63 samples/sec   Loss 7.6478   LearningRate 0.0574   Epoch: 4   Global Step: 80960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:44,202-Speed 9261.63 samples/sec   Loss 7.5671   LearningRate 0.0574   Epoch: 4   Global Step: 80970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:45,289-Speed 9432.20 samples/sec   Loss 7.5659   LearningRate 0.0574   Epoch: 4   Global Step: 80980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:46,349-Speed 9663.44 samples/sec   Loss 7.5846   LearningRate 0.0574   Epoch: 4   Global Step: 80990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:47,417-Speed 9591.50 samples/sec   Loss 7.5476   LearningRate 0.0574   Epoch: 4   Global Step: 81000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:47:48,526-Speed 9247.91 samples/sec   Loss 7.6058   LearningRate 0.0574   Epoch: 4   Global Step: 81010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:49,617-Speed 9386.96 samples/sec   Loss 7.5739   LearningRate 0.0573   Epoch: 4   Global Step: 81020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:50,667-Speed 9757.77 samples/sec   Loss 7.6020   LearningRate 0.0573   Epoch: 4   Global Step: 81030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:51,766-Speed 9323.37 samples/sec   Loss 7.4763   LearningRate 0.0573   Epoch: 4   Global Step: 81040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:52,852-Speed 9438.94 samples/sec   Loss 7.5994   LearningRate 0.0573   Epoch: 4   Global Step: 81050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:53,919-Speed 9601.62 samples/sec   Loss 7.6838   LearningRate 0.0573   Epoch: 4   Global Step: 81060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:54,980-Speed 9652.92 samples/sec   Loss 7.4805   LearningRate 0.0573   Epoch: 4   Global Step: 81070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:56,073-Speed 9371.86 samples/sec   Loss 7.6918   LearningRate 0.0573   Epoch: 4   Global Step: 81080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:57,155-Speed 9474.12 samples/sec   Loss 7.6284   LearningRate 0.0573   Epoch: 4   Global Step: 81090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:58,235-Speed 9484.99 samples/sec   Loss 7.4075   LearningRate 0.0573   Epoch: 4   Global Step: 81100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:47:59,361-Speed 9100.88 samples/sec   Loss 7.6548   LearningRate 0.0573   Epoch: 4   Global Step: 81110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:00,445-Speed 9448.01 samples/sec   Loss 7.4049   LearningRate 0.0573   Epoch: 4   Global Step: 81120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:01,528-Speed 9467.29 samples/sec   Loss 7.6265   LearningRate 0.0573   Epoch: 4   Global Step: 81130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:02,615-Speed 9422.25 samples/sec   Loss 7.5795   LearningRate 0.0573   Epoch: 4   Global Step: 81140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:03,737-Speed 9128.46 samples/sec   Loss 7.4682   LearningRate 0.0573   Epoch: 4   Global Step: 81150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:04,797-Speed 9669.25 samples/sec   Loss 7.5919   LearningRate 0.0573   Epoch: 4   Global Step: 81160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:05,873-Speed 9526.18 samples/sec   Loss 7.6015   LearningRate 0.0573   Epoch: 4   Global Step: 81170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:06,945-Speed 9557.41 samples/sec   Loss 7.5697   LearningRate 0.0573   Epoch: 4   Global Step: 81180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:08,087-Speed 8965.85 samples/sec   Loss 7.6876   LearningRate 0.0573   Epoch: 4   Global Step: 81190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:09,139-Speed 9745.39 samples/sec   Loss 7.6244   LearningRate 0.0573   Epoch: 4   Global Step: 81200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:10,196-Speed 9696.09 samples/sec   Loss 7.5883   LearningRate 0.0573   Epoch: 4   Global Step: 81210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:11,299-Speed 9282.65 samples/sec   Loss 7.5061   LearningRate 0.0573   Epoch: 4   Global Step: 81220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:12,390-Speed 9398.57 samples/sec   Loss 7.5420   LearningRate 0.0573   Epoch: 4   Global Step: 81230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:13,479-Speed 9412.09 samples/sec   Loss 7.6061   LearningRate 0.0572   Epoch: 4   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:14,574-Speed 9352.28 samples/sec   Loss 7.5260   LearningRate 0.0572   Epoch: 4   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:15,647-Speed 9545.74 samples/sec   Loss 7.6761   LearningRate 0.0572   Epoch: 4   Global Step: 81260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:16,694-Speed 9787.87 samples/sec   Loss 7.4123   LearningRate 0.0572   Epoch: 4   Global Step: 81270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:17,800-Speed 9262.20 samples/sec   Loss 7.6168   LearningRate 0.0572   Epoch: 4   Global Step: 81280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:18,884-Speed 9458.08 samples/sec   Loss 7.5734   LearningRate 0.0572   Epoch: 4   Global Step: 81290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:20,007-Speed 9124.53 samples/sec   Loss 7.6251   LearningRate 0.0572   Epoch: 4   Global Step: 81300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:21,092-Speed 9444.28 samples/sec   Loss 7.6272   LearningRate 0.0572   Epoch: 4   Global Step: 81310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:22,182-Speed 9392.41 samples/sec   Loss 7.5711   LearningRate 0.0572   Epoch: 4   Global Step: 81320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:23,269-Speed 9425.08 samples/sec   Loss 7.5356   LearningRate 0.0572   Epoch: 4   Global Step: 81330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:24,394-Speed 9112.02 samples/sec   Loss 7.5252   LearningRate 0.0572   Epoch: 4   Global Step: 81340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:25,470-Speed 9521.62 samples/sec   Loss 7.6183   LearningRate 0.0572   Epoch: 4   Global Step: 81350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:26,534-Speed 9630.57 samples/sec   Loss 7.5484   LearningRate 0.0572   Epoch: 4   Global Step: 81360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:27,599-Speed 9617.55 samples/sec   Loss 7.6389   LearningRate 0.0572   Epoch: 4   Global Step: 81370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:28,706-Speed 9257.67 samples/sec   Loss 7.5060   LearningRate 0.0572   Epoch: 4   Global Step: 81380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:29,765-Speed 9674.36 samples/sec   Loss 7.5391   LearningRate 0.0572   Epoch: 4   Global Step: 81390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:30,830-Speed 9629.56 samples/sec   Loss 7.5465   LearningRate 0.0572   Epoch: 4   Global Step: 81400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:31,901-Speed 9567.05 samples/sec   Loss 7.5156   LearningRate 0.0572   Epoch: 4   Global Step: 81410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:32,949-Speed 9777.37 samples/sec   Loss 7.5521   LearningRate 0.0572   Epoch: 4   Global Step: 81420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:34,097-Speed 8922.76 samples/sec   Loss 7.6537   LearningRate 0.0572   Epoch: 4   Global Step: 81430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:35,159-Speed 9650.95 samples/sec   Loss 7.5646   LearningRate 0.0572   Epoch: 4   Global Step: 81440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:36,269-Speed 9231.33 samples/sec   Loss 7.5114   LearningRate 0.0572   Epoch: 4   Global Step: 81450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:37,310-Speed 9836.06 samples/sec   Loss 7.6591   LearningRate 0.0572   Epoch: 4   Global Step: 81460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:38,369-Speed 9678.57 samples/sec   Loss 7.5401   LearningRate 0.0571   Epoch: 4   Global Step: 81470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:39,480-Speed 9217.86 samples/sec   Loss 7.5610   LearningRate 0.0571   Epoch: 4   Global Step: 81480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:40,548-Speed 9600.57 samples/sec   Loss 7.5362   LearningRate 0.0571   Epoch: 4   Global Step: 81490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:41,652-Speed 9281.67 samples/sec   Loss 7.5454   LearningRate 0.0571   Epoch: 4   Global Step: 81500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:42,733-Speed 9472.65 samples/sec   Loss 7.7549   LearningRate 0.0571   Epoch: 4   Global Step: 81510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:43,803-Speed 9574.62 samples/sec   Loss 7.5661   LearningRate 0.0571   Epoch: 4   Global Step: 81520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:44,916-Speed 9211.85 samples/sec   Loss 7.5043   LearningRate 0.0571   Epoch: 4   Global Step: 81530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:45,981-Speed 9613.37 samples/sec   Loss 7.5183   LearningRate 0.0571   Epoch: 4   Global Step: 81540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:47,091-Speed 9234.10 samples/sec   Loss 7.5882   LearningRate 0.0571   Epoch: 4   Global Step: 81550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:48,171-Speed 9490.87 samples/sec   Loss 7.5072   LearningRate 0.0571   Epoch: 4   Global Step: 81560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:48:49,258-Speed 9431.06 samples/sec   Loss 7.6258   LearningRate 0.0571   Epoch: 4   Global Step: 81570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:50,337-Speed 9487.86 samples/sec   Loss 7.5018   LearningRate 0.0571   Epoch: 4   Global Step: 81580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:51,427-Speed 9405.83 samples/sec   Loss 7.5396   LearningRate 0.0571   Epoch: 4   Global Step: 81590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:52,501-Speed 9541.15 samples/sec   Loss 7.6507   LearningRate 0.0571   Epoch: 4   Global Step: 81600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:53,570-Speed 9581.82 samples/sec   Loss 7.6453   LearningRate 0.0571   Epoch: 4   Global Step: 81610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:54,605-Speed 9895.39 samples/sec   Loss 7.5566   LearningRate 0.0571   Epoch: 4   Global Step: 81620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:55,675-Speed 9583.73 samples/sec   Loss 7.6676   LearningRate 0.0571   Epoch: 4   Global Step: 81630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:56,771-Speed 9346.87 samples/sec   Loss 7.4777   LearningRate 0.0571   Epoch: 4   Global Step: 81640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:57,827-Speed 9695.64 samples/sec   Loss 7.4784   LearningRate 0.0571   Epoch: 4   Global Step: 81650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:58,919-Speed 9383.49 samples/sec   Loss 7.6585   LearningRate 0.0571   Epoch: 4   Global Step: 81660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:48:59,983-Speed 9632.09 samples/sec   Loss 7.6009   LearningRate 0.0571   Epoch: 4   Global Step: 81670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:01,055-Speed 9561.36 samples/sec   Loss 7.6830   LearningRate 0.0571   Epoch: 4   Global Step: 81680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:02,130-Speed 9560.08 samples/sec   Loss 7.7026   LearningRate 0.0570   Epoch: 4   Global Step: 81690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:03,201-Speed 9566.24 samples/sec   Loss 7.6072   LearningRate 0.0570   Epoch: 4   Global Step: 81700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:04,268-Speed 9605.80 samples/sec   Loss 7.5413   LearningRate 0.0570   Epoch: 4   Global Step: 81710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:05,370-Speed 9297.27 samples/sec   Loss 7.5726   LearningRate 0.0570   Epoch: 4   Global Step: 81720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:06,466-Speed 9350.29 samples/sec   Loss 7.5226   LearningRate 0.0570   Epoch: 4   Global Step: 81730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:07,521-Speed 9716.74 samples/sec   Loss 7.4880   LearningRate 0.0570   Epoch: 4   Global Step: 81740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:08,619-Speed 9326.51 samples/sec   Loss 7.6370   LearningRate 0.0570   Epoch: 4   Global Step: 81750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:09,671-Speed 9738.39 samples/sec   Loss 7.5245   LearningRate 0.0570   Epoch: 4   Global Step: 81760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:10,748-Speed 9514.09 samples/sec   Loss 7.6072   LearningRate 0.0570   Epoch: 4   Global Step: 81770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:11,785-Speed 9885.18 samples/sec   Loss 7.5212   LearningRate 0.0570   Epoch: 4   Global Step: 81780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:12,849-Speed 9628.89 samples/sec   Loss 7.6712   LearningRate 0.0570   Epoch: 4   Global Step: 81790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:13,939-Speed 9404.87 samples/sec   Loss 7.4932   LearningRate 0.0570   Epoch: 4   Global Step: 81800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:14,996-Speed 9688.81 samples/sec   Loss 7.6187   LearningRate 0.0570   Epoch: 4   Global Step: 81810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:16,057-Speed 9660.23 samples/sec   Loss 7.4759   LearningRate 0.0570   Epoch: 4   Global Step: 81820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:17,120-Speed 9637.07 samples/sec   Loss 7.5993   LearningRate 0.0570   Epoch: 4   Global Step: 81830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:18,201-Speed 9479.54 samples/sec   Loss 7.5697   LearningRate 0.0570   Epoch: 4   Global Step: 81840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:19,306-Speed 9270.28 samples/sec   Loss 7.5501   LearningRate 0.0570   Epoch: 4   Global Step: 81850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:20,410-Speed 9284.38 samples/sec   Loss 7.5312   LearningRate 0.0570   Epoch: 4   Global Step: 81860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:21,494-Speed 9449.84 samples/sec   Loss 7.5353   LearningRate 0.0570   Epoch: 4   Global Step: 81870   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:49:22,547-Speed 9730.23 samples/sec   Loss 7.5328   LearningRate 0.0570   Epoch: 4   Global Step: 81880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:49:23,593-Speed 9790.55 samples/sec   Loss 7.4946   LearningRate 0.0570   Epoch: 4   Global Step: 81890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:49:24,672-Speed 9496.26 samples/sec   Loss 7.5457   LearningRate 0.0570   Epoch: 4   Global Step: 81900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:49:25,751-Speed 9502.83 samples/sec   Loss 7.6725   LearningRate 0.0569   Epoch: 4   Global Step: 81910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:49:26,831-Speed 9483.62 samples/sec   Loss 7.6741   LearningRate 0.0569   Epoch: 4   Global Step: 81920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:49:27,936-Speed 9268.49 samples/sec   Loss 7.6886   LearningRate 0.0569   Epoch: 4   Global Step: 81930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:49:29,019-Speed 9462.98 samples/sec   Loss 7.5496   LearningRate 0.0569   Epoch: 4   Global Step: 81940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:49:30,096-Speed 9516.49 samples/sec   Loss 7.5310   LearningRate 0.0569   Epoch: 4   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:49:31,184-Speed 9415.61 samples/sec   Loss 7.4911   LearningRate 0.0569   Epoch: 4   Global Step: 81960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:49:32,301-Speed 9176.68 samples/sec   Loss 7.6126   LearningRate 0.0569   Epoch: 4   Global Step: 81970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:49:33,417-Speed 9180.98 samples/sec   Loss 7.6094   LearningRate 0.0569   Epoch: 4   Global Step: 81980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:34,501-Speed 9451.58 samples/sec   Loss 7.6126   LearningRate 0.0569   Epoch: 4   Global Step: 81990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:35,603-Speed 9293.45 samples/sec   Loss 7.5589   LearningRate 0.0569   Epoch: 4   Global Step: 82000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:49:57,728-[lfw][82000]XNorm: 12.077396
Training: 2022-04-11 14:49:57,729-[lfw][82000]Accuracy-Flip: 0.99667+-0.00258
Training: 2022-04-11 14:49:57,729-[lfw][82000]Accuracy-Highest: 0.99667
Training: 2022-04-11 14:50:23,223-[cfp_fp][82000]XNorm: 10.286977
Training: 2022-04-11 14:50:23,224-[cfp_fp][82000]Accuracy-Flip: 0.94900+-0.01020
Training: 2022-04-11 14:50:23,224-[cfp_fp][82000]Accuracy-Highest: 0.95400
Training: 2022-04-11 14:50:45,254-[agedb_30][82000]XNorm: 11.711109
Training: 2022-04-11 14:50:45,254-[agedb_30][82000]Accuracy-Flip: 0.96300+-0.00894
Training: 2022-04-11 14:50:45,255-[agedb_30][82000]Accuracy-Highest: 0.96300
Training: 2022-04-11 14:50:46,336-Speed 144.77 samples/sec   Loss 7.4093   LearningRate 0.0569   Epoch: 4   Global Step: 82010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:47,420-Speed 9449.72 samples/sec   Loss 7.5476   LearningRate 0.0569   Epoch: 4   Global Step: 82020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:48,487-Speed 9601.62 samples/sec   Loss 7.7032   LearningRate 0.0569   Epoch: 4   Global Step: 82030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:49,570-Speed 9457.09 samples/sec   Loss 7.4803   LearningRate 0.0569   Epoch: 4   Global Step: 82040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:50,669-Speed 9323.85 samples/sec   Loss 7.5728   LearningRate 0.0569   Epoch: 4   Global Step: 82050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:51,709-Speed 9854.83 samples/sec   Loss 7.5819   LearningRate 0.0569   Epoch: 4   Global Step: 82060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:52,750-Speed 9852.03 samples/sec   Loss 7.5700   LearningRate 0.0569   Epoch: 4   Global Step: 82070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:53,837-Speed 9423.69 samples/sec   Loss 7.5986   LearningRate 0.0569   Epoch: 4   Global Step: 82080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:54,904-Speed 9603.79 samples/sec   Loss 7.6253   LearningRate 0.0569   Epoch: 4   Global Step: 82090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:55,987-Speed 9459.52 samples/sec   Loss 7.5834   LearningRate 0.0569   Epoch: 4   Global Step: 82100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:57,091-Speed 9277.86 samples/sec   Loss 7.5464   LearningRate 0.0569   Epoch: 4   Global Step: 82110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:58,174-Speed 9467.84 samples/sec   Loss 7.5347   LearningRate 0.0569   Epoch: 4   Global Step: 82120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:50:59,253-Speed 9488.32 samples/sec   Loss 7.5891   LearningRate 0.0568   Epoch: 4   Global Step: 82130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:00,330-Speed 9518.92 samples/sec   Loss 7.6193   LearningRate 0.0568   Epoch: 4   Global Step: 82140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:01,420-Speed 9396.02 samples/sec   Loss 7.5073   LearningRate 0.0568   Epoch: 4   Global Step: 82150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:02,519-Speed 9319.24 samples/sec   Loss 7.5819   LearningRate 0.0568   Epoch: 4   Global Step: 82160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:03,652-Speed 9045.51 samples/sec   Loss 7.6474   LearningRate 0.0568   Epoch: 4   Global Step: 82170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:04,784-Speed 9051.19 samples/sec   Loss 7.5410   LearningRate 0.0568   Epoch: 4   Global Step: 82180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:05,870-Speed 9437.42 samples/sec   Loss 7.5102   LearningRate 0.0568   Epoch: 4   Global Step: 82190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:06,968-Speed 9333.30 samples/sec   Loss 7.5027   LearningRate 0.0568   Epoch: 4   Global Step: 82200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:08,126-Speed 8845.64 samples/sec   Loss 7.4791   LearningRate 0.0568   Epoch: 4   Global Step: 82210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:09,249-Speed 9126.31 samples/sec   Loss 7.5927   LearningRate 0.0568   Epoch: 4   Global Step: 82220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:10,315-Speed 9611.86 samples/sec   Loss 7.4737   LearningRate 0.0568   Epoch: 4   Global Step: 82230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:11,390-Speed 9528.46 samples/sec   Loss 7.5694   LearningRate 0.0568   Epoch: 4   Global Step: 82240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:12,463-Speed 9553.47 samples/sec   Loss 7.5331   LearningRate 0.0568   Epoch: 4   Global Step: 82250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:13,511-Speed 9780.44 samples/sec   Loss 7.6035   LearningRate 0.0568   Epoch: 4   Global Step: 82260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:14,570-Speed 9672.92 samples/sec   Loss 7.5532   LearningRate 0.0568   Epoch: 4   Global Step: 82270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:15,671-Speed 9310.01 samples/sec   Loss 7.5646   LearningRate 0.0568   Epoch: 4   Global Step: 82280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:16,713-Speed 9834.65 samples/sec   Loss 7.6164   LearningRate 0.0568   Epoch: 4   Global Step: 82290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:17,768-Speed 9703.90 samples/sec   Loss 7.5100   LearningRate 0.0568   Epoch: 4   Global Step: 82300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:18,848-Speed 9492.01 samples/sec   Loss 7.5191   LearningRate 0.0568   Epoch: 4   Global Step: 82310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:19,932-Speed 9454.80 samples/sec   Loss 7.5521   LearningRate 0.0568   Epoch: 4   Global Step: 82320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:21,008-Speed 9518.92 samples/sec   Loss 7.5168   LearningRate 0.0568   Epoch: 4   Global Step: 82330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:22,067-Speed 9678.54 samples/sec   Loss 7.5783   LearningRate 0.0568   Epoch: 4   Global Step: 82340   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:51:23,126-Speed 9672.42 samples/sec   Loss 7.4629   LearningRate 0.0567   Epoch: 4   Global Step: 82350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:24,230-Speed 9284.19 samples/sec   Loss 7.4879   LearningRate 0.0567   Epoch: 4   Global Step: 82360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:25,330-Speed 9307.61 samples/sec   Loss 7.5882   LearningRate 0.0567   Epoch: 4   Global Step: 82370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:26,377-Speed 9796.58 samples/sec   Loss 7.4362   LearningRate 0.0567   Epoch: 4   Global Step: 82380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:27,417-Speed 9850.28 samples/sec   Loss 7.5698   LearningRate 0.0567   Epoch: 4   Global Step: 82390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:28,492-Speed 9530.39 samples/sec   Loss 7.4487   LearningRate 0.0567   Epoch: 4   Global Step: 82400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:29,584-Speed 9383.55 samples/sec   Loss 7.4898   LearningRate 0.0567   Epoch: 4   Global Step: 82410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:30,666-Speed 9469.13 samples/sec   Loss 7.6204   LearningRate 0.0567   Epoch: 4   Global Step: 82420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:31,764-Speed 9328.82 samples/sec   Loss 7.5401   LearningRate 0.0567   Epoch: 4   Global Step: 82430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:32,877-Speed 9207.73 samples/sec   Loss 7.6057   LearningRate 0.0567   Epoch: 4   Global Step: 82440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:33,979-Speed 9296.71 samples/sec   Loss 7.6119   LearningRate 0.0567   Epoch: 4   Global Step: 82450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:35,067-Speed 9417.63 samples/sec   Loss 7.6371   LearningRate 0.0567   Epoch: 4   Global Step: 82460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:36,116-Speed 9768.18 samples/sec   Loss 7.6256   LearningRate 0.0567   Epoch: 4   Global Step: 82470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:37,170-Speed 9716.47 samples/sec   Loss 7.5157   LearningRate 0.0567   Epoch: 4   Global Step: 82480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:38,265-Speed 9363.67 samples/sec   Loss 7.6556   LearningRate 0.0567   Epoch: 4   Global Step: 82490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:39,336-Speed 9562.87 samples/sec   Loss 7.6090   LearningRate 0.0567   Epoch: 4   Global Step: 82500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:40,428-Speed 9380.76 samples/sec   Loss 7.4727   LearningRate 0.0567   Epoch: 4   Global Step: 82510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:41,495-Speed 9604.62 samples/sec   Loss 7.6024   LearningRate 0.0567   Epoch: 4   Global Step: 82520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:42,551-Speed 9703.48 samples/sec   Loss 7.5087   LearningRate 0.0567   Epoch: 4   Global Step: 82530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:43,628-Speed 9514.17 samples/sec   Loss 7.5796   LearningRate 0.0567   Epoch: 4   Global Step: 82540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:44,712-Speed 9458.59 samples/sec   Loss 7.5703   LearningRate 0.0567   Epoch: 4   Global Step: 82550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:45,806-Speed 9361.70 samples/sec   Loss 7.5193   LearningRate 0.0567   Epoch: 4   Global Step: 82560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:46,922-Speed 9184.13 samples/sec   Loss 7.5358   LearningRate 0.0566   Epoch: 4   Global Step: 82570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:48,007-Speed 9436.77 samples/sec   Loss 7.6243   LearningRate 0.0566   Epoch: 4   Global Step: 82580   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:51:49,091-Speed 9450.57 samples/sec   Loss 7.5273   LearningRate 0.0566   Epoch: 4   Global Step: 82590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:50,173-Speed 9480.32 samples/sec   Loss 7.5452   LearningRate 0.0566   Epoch: 4   Global Step: 82600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:51,218-Speed 9799.04 samples/sec   Loss 7.5146   LearningRate 0.0566   Epoch: 4   Global Step: 82610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:52,347-Speed 9081.89 samples/sec   Loss 7.4187   LearningRate 0.0566   Epoch: 4   Global Step: 82620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:53,440-Speed 9369.12 samples/sec   Loss 7.5585   LearningRate 0.0566   Epoch: 4   Global Step: 82630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:54,570-Speed 9068.59 samples/sec   Loss 7.6391   LearningRate 0.0566   Epoch: 4   Global Step: 82640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:51:55,690-Speed 9142.09 samples/sec   Loss 7.4948   LearningRate 0.0566   Epoch: 4   Global Step: 82650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:56,798-Speed 9256.60 samples/sec   Loss 7.4607   LearningRate 0.0566   Epoch: 4   Global Step: 82660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:57,879-Speed 9473.76 samples/sec   Loss 7.4880   LearningRate 0.0566   Epoch: 4   Global Step: 82670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:51:58,979-Speed 9317.53 samples/sec   Loss 7.4126   LearningRate 0.0566   Epoch: 4   Global Step: 82680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:00,032-Speed 9731.81 samples/sec   Loss 7.5317   LearningRate 0.0566   Epoch: 4   Global Step: 82690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:01,101-Speed 9583.97 samples/sec   Loss 7.4501   LearningRate 0.0566   Epoch: 4   Global Step: 82700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:02,209-Speed 9245.15 samples/sec   Loss 7.5660   LearningRate 0.0566   Epoch: 4   Global Step: 82710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:03,313-Speed 9280.75 samples/sec   Loss 7.5651   LearningRate 0.0566   Epoch: 4   Global Step: 82720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:04,384-Speed 9562.60 samples/sec   Loss 7.4749   LearningRate 0.0566   Epoch: 4   Global Step: 82730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:05,460-Speed 9524.85 samples/sec   Loss 7.6461   LearningRate 0.0566   Epoch: 4   Global Step: 82740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:06,586-Speed 9099.20 samples/sec   Loss 7.5888   LearningRate 0.0566   Epoch: 4   Global Step: 82750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:07,674-Speed 9417.74 samples/sec   Loss 7.5408   LearningRate 0.0566   Epoch: 4   Global Step: 82760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:08,787-Speed 9206.60 samples/sec   Loss 7.5620   LearningRate 0.0566   Epoch: 4   Global Step: 82770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:09,859-Speed 9562.30 samples/sec   Loss 7.5151   LearningRate 0.0566   Epoch: 4   Global Step: 82780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:10,940-Speed 9478.11 samples/sec   Loss 7.5276   LearningRate 0.0565   Epoch: 4   Global Step: 82790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:12,038-Speed 9330.78 samples/sec   Loss 7.6225   LearningRate 0.0565   Epoch: 4   Global Step: 82800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:13,145-Speed 9250.31 samples/sec   Loss 7.6131   LearningRate 0.0565   Epoch: 4   Global Step: 82810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:14,225-Speed 9490.25 samples/sec   Loss 7.5550   LearningRate 0.0565   Epoch: 4   Global Step: 82820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:15,296-Speed 9563.48 samples/sec   Loss 7.4493   LearningRate 0.0565   Epoch: 4   Global Step: 82830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:16,369-Speed 9553.73 samples/sec   Loss 7.5594   LearningRate 0.0565   Epoch: 4   Global Step: 82840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:17,436-Speed 9604.11 samples/sec   Loss 7.6655   LearningRate 0.0565   Epoch: 4   Global Step: 82850   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:52:18,489-Speed 9725.86 samples/sec   Loss 7.5780   LearningRate 0.0565   Epoch: 4   Global Step: 82860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:19,597-Speed 9251.11 samples/sec   Loss 7.4709   LearningRate 0.0565   Epoch: 4   Global Step: 82870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:20,688-Speed 9389.89 samples/sec   Loss 7.5304   LearningRate 0.0565   Epoch: 4   Global Step: 82880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:21,770-Speed 9465.22 samples/sec   Loss 7.5354   LearningRate 0.0565   Epoch: 4   Global Step: 82890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:22,803-Speed 9924.70 samples/sec   Loss 7.5128   LearningRate 0.0565   Epoch: 4   Global Step: 82900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:23,889-Speed 9429.18 samples/sec   Loss 7.6535   LearningRate 0.0565   Epoch: 4   Global Step: 82910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:24,967-Speed 9508.99 samples/sec   Loss 7.4667   LearningRate 0.0565   Epoch: 4   Global Step: 82920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:26,032-Speed 9619.29 samples/sec   Loss 7.4617   LearningRate 0.0565   Epoch: 4   Global Step: 82930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:27,090-Speed 9692.48 samples/sec   Loss 7.4870   LearningRate 0.0565   Epoch: 4   Global Step: 82940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:28,176-Speed 9435.08 samples/sec   Loss 7.4498   LearningRate 0.0565   Epoch: 4   Global Step: 82950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:29,222-Speed 9791.17 samples/sec   Loss 7.7069   LearningRate 0.0565   Epoch: 4   Global Step: 82960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:30,282-Speed 9673.20 samples/sec   Loss 7.5458   LearningRate 0.0565   Epoch: 4   Global Step: 82970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:31,372-Speed 9395.88 samples/sec   Loss 7.5892   LearningRate 0.0565   Epoch: 4   Global Step: 82980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:32,425-Speed 9725.96 samples/sec   Loss 7.5649   LearningRate 0.0565   Epoch: 4   Global Step: 82990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:33,481-Speed 9702.90 samples/sec   Loss 7.5561   LearningRate 0.0565   Epoch: 4   Global Step: 83000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:34,573-Speed 9386.64 samples/sec   Loss 7.5919   LearningRate 0.0565   Epoch: 4   Global Step: 83010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:35,649-Speed 9518.14 samples/sec   Loss 7.4050   LearningRate 0.0564   Epoch: 4   Global Step: 83020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:36,716-Speed 9613.08 samples/sec   Loss 7.5655   LearningRate 0.0564   Epoch: 4   Global Step: 83030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:37,814-Speed 9329.80 samples/sec   Loss 7.4060   LearningRate 0.0564   Epoch: 4   Global Step: 83040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:38,893-Speed 9498.79 samples/sec   Loss 7.6047   LearningRate 0.0564   Epoch: 4   Global Step: 83050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:39,944-Speed 9754.86 samples/sec   Loss 7.5476   LearningRate 0.0564   Epoch: 4   Global Step: 83060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:41,018-Speed 9533.61 samples/sec   Loss 7.5897   LearningRate 0.0564   Epoch: 4   Global Step: 83070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:42,119-Speed 9306.96 samples/sec   Loss 7.5587   LearningRate 0.0564   Epoch: 4   Global Step: 83080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:43,209-Speed 9397.72 samples/sec   Loss 7.5468   LearningRate 0.0564   Epoch: 4   Global Step: 83090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:44,294-Speed 9444.91 samples/sec   Loss 7.7776   LearningRate 0.0564   Epoch: 4   Global Step: 83100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:45,436-Speed 8972.02 samples/sec   Loss 7.5282   LearningRate 0.0564   Epoch: 4   Global Step: 83110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:46,473-Speed 9881.54 samples/sec   Loss 7.4661   LearningRate 0.0564   Epoch: 4   Global Step: 83120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:47,537-Speed 9638.57 samples/sec   Loss 7.4986   LearningRate 0.0564   Epoch: 4   Global Step: 83130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:48,621-Speed 9450.68 samples/sec   Loss 7.4991   LearningRate 0.0564   Epoch: 4   Global Step: 83140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:52:49,671-Speed 9760.35 samples/sec   Loss 7.4936   LearningRate 0.0564   Epoch: 4   Global Step: 83150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:50,756-Speed 9452.05 samples/sec   Loss 7.6161   LearningRate 0.0564   Epoch: 4   Global Step: 83160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:51,817-Speed 9660.88 samples/sec   Loss 7.4261   LearningRate 0.0564   Epoch: 4   Global Step: 83170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:52,896-Speed 9493.96 samples/sec   Loss 7.5128   LearningRate 0.0564   Epoch: 4   Global Step: 83180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:53,938-Speed 9839.86 samples/sec   Loss 7.6314   LearningRate 0.0564   Epoch: 4   Global Step: 83190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:54,993-Speed 9705.19 samples/sec   Loss 7.4995   LearningRate 0.0564   Epoch: 4   Global Step: 83200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:56,126-Speed 9043.94 samples/sec   Loss 7.5221   LearningRate 0.0564   Epoch: 4   Global Step: 83210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:57,203-Speed 9517.51 samples/sec   Loss 7.5373   LearningRate 0.0564   Epoch: 4   Global Step: 83220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:58,311-Speed 9242.82 samples/sec   Loss 7.4733   LearningRate 0.0564   Epoch: 4   Global Step: 83230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:52:59,439-Speed 9081.88 samples/sec   Loss 7.4655   LearningRate 0.0563   Epoch: 4   Global Step: 83240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:53:00,515-Speed 9528.55 samples/sec   Loss 7.5744   LearningRate 0.0563   Epoch: 4   Global Step: 83250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:01,603-Speed 9420.22 samples/sec   Loss 7.5831   LearningRate 0.0563   Epoch: 4   Global Step: 83260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:02,679-Speed 9518.92 samples/sec   Loss 7.5118   LearningRate 0.0563   Epoch: 4   Global Step: 83270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:03,809-Speed 9063.73 samples/sec   Loss 7.5507   LearningRate 0.0563   Epoch: 4   Global Step: 83280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:04,871-Speed 9652.72 samples/sec   Loss 7.4683   LearningRate 0.0563   Epoch: 4   Global Step: 83290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:05,936-Speed 9616.72 samples/sec   Loss 7.4488   LearningRate 0.0563   Epoch: 4   Global Step: 83300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:07,003-Speed 9603.78 samples/sec   Loss 7.4151   LearningRate 0.0563   Epoch: 4   Global Step: 83310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:08,100-Speed 9349.63 samples/sec   Loss 7.5929   LearningRate 0.0563   Epoch: 4   Global Step: 83320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:09,121-Speed 10032.27 samples/sec   Loss 7.5396   LearningRate 0.0563   Epoch: 4   Global Step: 83330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:10,187-Speed 9611.28 samples/sec   Loss 7.6132   LearningRate 0.0563   Epoch: 4   Global Step: 83340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:11,261-Speed 9540.74 samples/sec   Loss 7.4154   LearningRate 0.0563   Epoch: 4   Global Step: 83350   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:53:12,350-Speed 9401.20 samples/sec   Loss 7.5620   LearningRate 0.0563   Epoch: 4   Global Step: 83360   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:53:13,476-Speed 9099.11 samples/sec   Loss 7.5304   LearningRate 0.0563   Epoch: 4   Global Step: 83370   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:53:14,520-Speed 9820.28 samples/sec   Loss 7.5265   LearningRate 0.0563   Epoch: 4   Global Step: 83380   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:53:15,573-Speed 9728.42 samples/sec   Loss 7.4940   LearningRate 0.0563   Epoch: 4   Global Step: 83390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:16,667-Speed 9359.70 samples/sec   Loss 7.5519   LearningRate 0.0563   Epoch: 4   Global Step: 83400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:17,743-Speed 9523.50 samples/sec   Loss 7.5237   LearningRate 0.0563   Epoch: 4   Global Step: 83410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:18,819-Speed 9527.97 samples/sec   Loss 7.5095   LearningRate 0.0563   Epoch: 4   Global Step: 83420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:53:19,890-Speed 9570.43 samples/sec   Loss 7.4850   LearningRate 0.0563   Epoch: 4   Global Step: 83430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:53:20,984-Speed 9370.21 samples/sec   Loss 7.4945   LearningRate 0.0563   Epoch: 4   Global Step: 83440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:53:22,353-Speed 7486.43 samples/sec   Loss 7.4955   LearningRate 0.0563   Epoch: 4   Global Step: 83450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:53:57,220-Speed 293.70 samples/sec   Loss 7.2486   LearningRate 0.0562   Epoch: 5   Global Step: 83460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:53:58,674-Speed 7045.06 samples/sec   Loss 6.8231   LearningRate 0.0562   Epoch: 5   Global Step: 83470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:00,001-Speed 7723.91 samples/sec   Loss 6.7100   LearningRate 0.0562   Epoch: 5   Global Step: 83480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:01,617-Speed 6340.63 samples/sec   Loss 6.7391   LearningRate 0.0562   Epoch: 5   Global Step: 83490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:02,767-Speed 8906.15 samples/sec   Loss 6.7404   LearningRate 0.0562   Epoch: 5   Global Step: 83500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:03,883-Speed 9178.09 samples/sec   Loss 6.6010   LearningRate 0.0562   Epoch: 5   Global Step: 83510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:05,172-Speed 7952.04 samples/sec   Loss 6.7226   LearningRate 0.0562   Epoch: 5   Global Step: 83520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:06,266-Speed 9369.56 samples/sec   Loss 6.6605   LearningRate 0.0562   Epoch: 5   Global Step: 83530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:07,344-Speed 9500.53 samples/sec   Loss 6.7870   LearningRate 0.0562   Epoch: 5   Global Step: 83540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:08,619-Speed 8032.17 samples/sec   Loss 6.7855   LearningRate 0.0562   Epoch: 5   Global Step: 83550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:09,758-Speed 8998.84 samples/sec   Loss 6.7179   LearningRate 0.0562   Epoch: 5   Global Step: 83560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:10,824-Speed 9613.46 samples/sec   Loss 6.7818   LearningRate 0.0562   Epoch: 5   Global Step: 83570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:11,942-Speed 9169.03 samples/sec   Loss 6.7457   LearningRate 0.0562   Epoch: 5   Global Step: 83580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:13,027-Speed 9446.39 samples/sec   Loss 6.6008   LearningRate 0.0562   Epoch: 5   Global Step: 83590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:14,109-Speed 9468.91 samples/sec   Loss 6.6882   LearningRate 0.0562   Epoch: 5   Global Step: 83600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:15,183-Speed 9535.74 samples/sec   Loss 6.7908   LearningRate 0.0562   Epoch: 5   Global Step: 83610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:16,281-Speed 9333.43 samples/sec   Loss 6.7496   LearningRate 0.0562   Epoch: 5   Global Step: 83620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:17,389-Speed 9293.52 samples/sec   Loss 6.8095   LearningRate 0.0562   Epoch: 5   Global Step: 83630   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:54:18,449-Speed 9670.57 samples/sec   Loss 6.7980   LearningRate 0.0562   Epoch: 5   Global Step: 83640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:19,519-Speed 9576.56 samples/sec   Loss 6.6908   LearningRate 0.0562   Epoch: 5   Global Step: 83650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:20,602-Speed 9463.48 samples/sec   Loss 6.8049   LearningRate 0.0562   Epoch: 5   Global Step: 83660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:21,715-Speed 9197.59 samples/sec   Loss 6.6674   LearningRate 0.0562   Epoch: 5   Global Step: 83670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:22,799-Speed 9457.43 samples/sec   Loss 6.7317   LearningRate 0.0561   Epoch: 5   Global Step: 83680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:23,841-Speed 9833.33 samples/sec   Loss 6.7760   LearningRate 0.0561   Epoch: 5   Global Step: 83690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:24,956-Speed 9184.69 samples/sec   Loss 6.7527   LearningRate 0.0561   Epoch: 5   Global Step: 83700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:26,120-Speed 8805.64 samples/sec   Loss 6.6767   LearningRate 0.0561   Epoch: 5   Global Step: 83710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:27,174-Speed 9721.44 samples/sec   Loss 6.6996   LearningRate 0.0561   Epoch: 5   Global Step: 83720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:28,264-Speed 9404.57 samples/sec   Loss 6.5686   LearningRate 0.0561   Epoch: 5   Global Step: 83730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:29,365-Speed 9304.96 samples/sec   Loss 6.8174   LearningRate 0.0561   Epoch: 5   Global Step: 83740   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:54:30,430-Speed 9620.97 samples/sec   Loss 6.7475   LearningRate 0.0561   Epoch: 5   Global Step: 83750   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:54:31,500-Speed 9570.47 samples/sec   Loss 6.7842   LearningRate 0.0561   Epoch: 5   Global Step: 83760   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:54:32,753-Speed 8175.99 samples/sec   Loss 6.7525   LearningRate 0.0561   Epoch: 5   Global Step: 83770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:33,783-Speed 9950.18 samples/sec   Loss 6.8157   LearningRate 0.0561   Epoch: 5   Global Step: 83780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:34,868-Speed 9448.90 samples/sec   Loss 6.7893   LearningRate 0.0561   Epoch: 5   Global Step: 83790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:35,934-Speed 9609.03 samples/sec   Loss 6.8073   LearningRate 0.0561   Epoch: 5   Global Step: 83800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:37,034-Speed 9310.19 samples/sec   Loss 6.8060   LearningRate 0.0561   Epoch: 5   Global Step: 83810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:38,090-Speed 9705.01 samples/sec   Loss 6.7827   LearningRate 0.0561   Epoch: 5   Global Step: 83820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:39,132-Speed 9829.96 samples/sec   Loss 6.8660   LearningRate 0.0561   Epoch: 5   Global Step: 83830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:40,224-Speed 9380.79 samples/sec   Loss 6.7377   LearningRate 0.0561   Epoch: 5   Global Step: 83840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:41,293-Speed 9589.96 samples/sec   Loss 6.8233   LearningRate 0.0561   Epoch: 5   Global Step: 83850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:42,351-Speed 9682.09 samples/sec   Loss 6.7774   LearningRate 0.0561   Epoch: 5   Global Step: 83860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:43,438-Speed 9432.71 samples/sec   Loss 6.8449   LearningRate 0.0561   Epoch: 5   Global Step: 83870   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:54:44,533-Speed 9355.27 samples/sec   Loss 6.8723   LearningRate 0.0561   Epoch: 5   Global Step: 83880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:45,634-Speed 9307.77 samples/sec   Loss 6.8611   LearningRate 0.0561   Epoch: 5   Global Step: 83890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:46,702-Speed 9589.72 samples/sec   Loss 6.8346   LearningRate 0.0561   Epoch: 5   Global Step: 83900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:47,815-Speed 9209.28 samples/sec   Loss 6.8523   LearningRate 0.0560   Epoch: 5   Global Step: 83910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:48,882-Speed 9605.25 samples/sec   Loss 6.9535   LearningRate 0.0560   Epoch: 5   Global Step: 83920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:49,931-Speed 9767.74 samples/sec   Loss 6.7939   LearningRate 0.0560   Epoch: 5   Global Step: 83930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:51,001-Speed 9575.02 samples/sec   Loss 6.7228   LearningRate 0.0560   Epoch: 5   Global Step: 83940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:52,163-Speed 8821.42 samples/sec   Loss 6.7967   LearningRate 0.0560   Epoch: 5   Global Step: 83950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:53,260-Speed 9334.99 samples/sec   Loss 6.8158   LearningRate 0.0560   Epoch: 5   Global Step: 83960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:54,347-Speed 9431.89 samples/sec   Loss 6.7863   LearningRate 0.0560   Epoch: 5   Global Step: 83970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:55,459-Speed 9216.40 samples/sec   Loss 6.9072   LearningRate 0.0560   Epoch: 5   Global Step: 83980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:54:56,606-Speed 8931.18 samples/sec   Loss 6.8151   LearningRate 0.0560   Epoch: 5   Global Step: 83990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:54:57,647-Speed 9847.78 samples/sec   Loss 6.8362   LearningRate 0.0560   Epoch: 5   Global Step: 84000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:55:19,591-[lfw][84000]XNorm: 11.742262
Training: 2022-04-11 14:55:19,592-[lfw][84000]Accuracy-Flip: 0.99617+-0.00248
Training: 2022-04-11 14:55:19,592-[lfw][84000]Accuracy-Highest: 0.99667
Training: 2022-04-11 14:55:44,947-[cfp_fp][84000]XNorm: 10.006655
Training: 2022-04-11 14:55:44,948-[cfp_fp][84000]Accuracy-Flip: 0.95143+-0.01204
Training: 2022-04-11 14:55:44,948-[cfp_fp][84000]Accuracy-Highest: 0.95400
Training: 2022-04-11 14:56:06,761-[agedb_30][84000]XNorm: 11.427986
Training: 2022-04-11 14:56:06,762-[agedb_30][84000]Accuracy-Flip: 0.96200+-0.01087
Training: 2022-04-11 14:56:06,763-[agedb_30][84000]Accuracy-Highest: 0.96300
Training: 2022-04-11 14:56:07,821-Speed 145.92 samples/sec   Loss 6.9539   LearningRate 0.0560   Epoch: 5   Global Step: 84010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:08,850-Speed 9964.28 samples/sec   Loss 6.8922   LearningRate 0.0560   Epoch: 5   Global Step: 84020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:09,884-Speed 9903.54 samples/sec   Loss 6.8138   LearningRate 0.0560   Epoch: 5   Global Step: 84030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:10,993-Speed 9235.87 samples/sec   Loss 6.9348   LearningRate 0.0560   Epoch: 5   Global Step: 84040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:12,064-Speed 9566.09 samples/sec   Loss 6.8688   LearningRate 0.0560   Epoch: 5   Global Step: 84050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:13,137-Speed 9552.29 samples/sec   Loss 6.9391   LearningRate 0.0560   Epoch: 5   Global Step: 84060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:14,193-Speed 9710.46 samples/sec   Loss 6.9801   LearningRate 0.0560   Epoch: 5   Global Step: 84070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:15,319-Speed 9101.38 samples/sec   Loss 6.8414   LearningRate 0.0560   Epoch: 5   Global Step: 84080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:16,394-Speed 9529.12 samples/sec   Loss 6.9123   LearningRate 0.0560   Epoch: 5   Global Step: 84090   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:56:17,460-Speed 9607.10 samples/sec   Loss 6.9088   LearningRate 0.0560   Epoch: 5   Global Step: 84100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:18,559-Speed 9320.91 samples/sec   Loss 6.8185   LearningRate 0.0560   Epoch: 5   Global Step: 84110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:19,637-Speed 9511.79 samples/sec   Loss 6.8253   LearningRate 0.0560   Epoch: 5   Global Step: 84120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:20,740-Speed 9288.69 samples/sec   Loss 6.9782   LearningRate 0.0559   Epoch: 5   Global Step: 84130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:21,828-Speed 9418.78 samples/sec   Loss 6.9596   LearningRate 0.0559   Epoch: 5   Global Step: 84140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:22,924-Speed 9346.93 samples/sec   Loss 6.9251   LearningRate 0.0559   Epoch: 5   Global Step: 84150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:24,035-Speed 9223.49 samples/sec   Loss 6.9228   LearningRate 0.0559   Epoch: 5   Global Step: 84160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:25,123-Speed 9411.19 samples/sec   Loss 6.9550   LearningRate 0.0559   Epoch: 5   Global Step: 84170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:26,377-Speed 8170.75 samples/sec   Loss 6.7801   LearningRate 0.0559   Epoch: 5   Global Step: 84180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:27,947-Speed 6524.61 samples/sec   Loss 6.9460   LearningRate 0.0559   Epoch: 5   Global Step: 84190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:29,352-Speed 7294.29 samples/sec   Loss 6.9408   LearningRate 0.0559   Epoch: 5   Global Step: 84200   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:56:30,613-Speed 8127.79 samples/sec   Loss 6.9029   LearningRate 0.0559   Epoch: 5   Global Step: 84210   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:56:31,668-Speed 9706.46 samples/sec   Loss 6.8428   LearningRate 0.0559   Epoch: 5   Global Step: 84220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:32,744-Speed 9523.30 samples/sec   Loss 6.9316   LearningRate 0.0559   Epoch: 5   Global Step: 84230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:33,791-Speed 9783.93 samples/sec   Loss 6.8470   LearningRate 0.0559   Epoch: 5   Global Step: 84240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:34,867-Speed 9519.99 samples/sec   Loss 6.8919   LearningRate 0.0559   Epoch: 5   Global Step: 84250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:35,968-Speed 9318.43 samples/sec   Loss 6.8533   LearningRate 0.0559   Epoch: 5   Global Step: 84260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:37,023-Speed 9711.22 samples/sec   Loss 6.9514   LearningRate 0.0559   Epoch: 5   Global Step: 84270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:38,091-Speed 9586.63 samples/sec   Loss 6.9639   LearningRate 0.0559   Epoch: 5   Global Step: 84280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:39,159-Speed 9596.07 samples/sec   Loss 6.8953   LearningRate 0.0559   Epoch: 5   Global Step: 84290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:40,229-Speed 9579.60 samples/sec   Loss 6.9853   LearningRate 0.0559   Epoch: 5   Global Step: 84300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:41,332-Speed 9290.50 samples/sec   Loss 6.9329   LearningRate 0.0559   Epoch: 5   Global Step: 84310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:42,437-Speed 9271.22 samples/sec   Loss 6.9065   LearningRate 0.0559   Epoch: 5   Global Step: 84320   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:56:43,538-Speed 9309.89 samples/sec   Loss 6.9276   LearningRate 0.0559   Epoch: 5   Global Step: 84330   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:56:44,593-Speed 9708.11 samples/sec   Loss 6.8816   LearningRate 0.0559   Epoch: 5   Global Step: 84340   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:56:45,668-Speed 9534.69 samples/sec   Loss 6.9829   LearningRate 0.0558   Epoch: 5   Global Step: 84350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:46,741-Speed 9548.68 samples/sec   Loss 6.8771   LearningRate 0.0558   Epoch: 5   Global Step: 84360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:47,784-Speed 9822.45 samples/sec   Loss 6.8994   LearningRate 0.0558   Epoch: 5   Global Step: 84370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:48,907-Speed 9125.26 samples/sec   Loss 6.9433   LearningRate 0.0558   Epoch: 5   Global Step: 84380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:50,008-Speed 9307.96 samples/sec   Loss 6.9989   LearningRate 0.0558   Epoch: 5   Global Step: 84390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:51,066-Speed 9687.64 samples/sec   Loss 6.9395   LearningRate 0.0558   Epoch: 5   Global Step: 84400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:52,128-Speed 9640.68 samples/sec   Loss 6.8301   LearningRate 0.0558   Epoch: 5   Global Step: 84410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:53,236-Speed 9252.04 samples/sec   Loss 6.9007   LearningRate 0.0558   Epoch: 5   Global Step: 84420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:54,309-Speed 9541.20 samples/sec   Loss 6.9283   LearningRate 0.0558   Epoch: 5   Global Step: 84430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:55,385-Speed 9525.27 samples/sec   Loss 7.0353   LearningRate 0.0558   Epoch: 5   Global Step: 84440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:56,486-Speed 9307.44 samples/sec   Loss 7.0176   LearningRate 0.0558   Epoch: 5   Global Step: 84450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:57,603-Speed 9171.13 samples/sec   Loss 6.9477   LearningRate 0.0558   Epoch: 5   Global Step: 84460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:58,684-Speed 9482.65 samples/sec   Loss 6.9905   LearningRate 0.0558   Epoch: 5   Global Step: 84470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:56:59,810-Speed 9097.03 samples/sec   Loss 6.8850   LearningRate 0.0558   Epoch: 5   Global Step: 84480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:00,877-Speed 9606.52 samples/sec   Loss 6.9128   LearningRate 0.0558   Epoch: 5   Global Step: 84490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:01,978-Speed 9299.28 samples/sec   Loss 6.9180   LearningRate 0.0558   Epoch: 5   Global Step: 84500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:03,070-Speed 9385.45 samples/sec   Loss 7.0533   LearningRate 0.0558   Epoch: 5   Global Step: 84510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:04,159-Speed 9410.98 samples/sec   Loss 6.8828   LearningRate 0.0558   Epoch: 5   Global Step: 84520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:05,246-Speed 9430.60 samples/sec   Loss 6.8858   LearningRate 0.0558   Epoch: 5   Global Step: 84530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:06,323-Speed 9517.21 samples/sec   Loss 6.9004   LearningRate 0.0558   Epoch: 5   Global Step: 84540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:07,394-Speed 9564.62 samples/sec   Loss 7.0932   LearningRate 0.0558   Epoch: 5   Global Step: 84550   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:57:08,489-Speed 9354.44 samples/sec   Loss 6.9426   LearningRate 0.0558   Epoch: 5   Global Step: 84560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:09,583-Speed 9369.19 samples/sec   Loss 7.0763   LearningRate 0.0558   Epoch: 5   Global Step: 84570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:10,647-Speed 9631.87 samples/sec   Loss 6.9280   LearningRate 0.0557   Epoch: 5   Global Step: 84580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:11,710-Speed 9636.49 samples/sec   Loss 6.9494   LearningRate 0.0557   Epoch: 5   Global Step: 84590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:12,783-Speed 9552.75 samples/sec   Loss 7.0403   LearningRate 0.0557   Epoch: 5   Global Step: 84600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:13,875-Speed 9380.45 samples/sec   Loss 6.9549   LearningRate 0.0557   Epoch: 5   Global Step: 84610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:14,961-Speed 9431.57 samples/sec   Loss 7.0241   LearningRate 0.0557   Epoch: 5   Global Step: 84620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:16,010-Speed 9771.40 samples/sec   Loss 7.0100   LearningRate 0.0557   Epoch: 5   Global Step: 84630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:17,160-Speed 8909.06 samples/sec   Loss 7.0148   LearningRate 0.0557   Epoch: 5   Global Step: 84640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:18,264-Speed 9284.20 samples/sec   Loss 7.0571   LearningRate 0.0557   Epoch: 5   Global Step: 84650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:19,362-Speed 9331.96 samples/sec   Loss 6.8158   LearningRate 0.0557   Epoch: 5   Global Step: 84660   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:57:20,429-Speed 9601.40 samples/sec   Loss 6.9227   LearningRate 0.0557   Epoch: 5   Global Step: 84670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:21,516-Speed 9423.51 samples/sec   Loss 6.8576   LearningRate 0.0557   Epoch: 5   Global Step: 84680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:22,619-Speed 9289.98 samples/sec   Loss 6.9253   LearningRate 0.0557   Epoch: 5   Global Step: 84690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:23,740-Speed 9141.56 samples/sec   Loss 6.8830   LearningRate 0.0557   Epoch: 5   Global Step: 84700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:24,873-Speed 9044.96 samples/sec   Loss 6.9778   LearningRate 0.0557   Epoch: 5   Global Step: 84710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:25,994-Speed 9142.19 samples/sec   Loss 7.0217   LearningRate 0.0557   Epoch: 5   Global Step: 84720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:27,079-Speed 9447.00 samples/sec   Loss 7.0343   LearningRate 0.0557   Epoch: 5   Global Step: 84730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:28,150-Speed 9565.27 samples/sec   Loss 7.1599   LearningRate 0.0557   Epoch: 5   Global Step: 84740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:29,206-Speed 9699.95 samples/sec   Loss 6.9744   LearningRate 0.0557   Epoch: 5   Global Step: 84750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:30,336-Speed 9062.44 samples/sec   Loss 7.0230   LearningRate 0.0557   Epoch: 5   Global Step: 84760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:31,450-Speed 9204.33 samples/sec   Loss 6.9997   LearningRate 0.0557   Epoch: 5   Global Step: 84770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:32,548-Speed 9331.02 samples/sec   Loss 7.0243   LearningRate 0.0557   Epoch: 5   Global Step: 84780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:57:33,673-Speed 9108.19 samples/sec   Loss 7.0351   LearningRate 0.0557   Epoch: 5   Global Step: 84790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:57:34,750-Speed 9504.97 samples/sec   Loss 6.9954   LearningRate 0.0556   Epoch: 5   Global Step: 84800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:57:35,802-Speed 9754.92 samples/sec   Loss 6.9257   LearningRate 0.0556   Epoch: 5   Global Step: 84810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:57:36,883-Speed 9481.21 samples/sec   Loss 7.0162   LearningRate 0.0556   Epoch: 5   Global Step: 84820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:57:38,013-Speed 9066.91 samples/sec   Loss 7.0506   LearningRate 0.0556   Epoch: 5   Global Step: 84830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:57:39,118-Speed 9276.20 samples/sec   Loss 6.9747   LearningRate 0.0556   Epoch: 5   Global Step: 84840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:57:40,221-Speed 9288.93 samples/sec   Loss 7.0184   LearningRate 0.0556   Epoch: 5   Global Step: 84850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:57:41,310-Speed 9406.12 samples/sec   Loss 7.0044   LearningRate 0.0556   Epoch: 5   Global Step: 84860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:57:42,386-Speed 9523.27 samples/sec   Loss 7.0102   LearningRate 0.0556   Epoch: 5   Global Step: 84870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:57:43,462-Speed 9528.78 samples/sec   Loss 7.0536   LearningRate 0.0556   Epoch: 5   Global Step: 84880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:44,504-Speed 9827.15 samples/sec   Loss 7.0094   LearningRate 0.0556   Epoch: 5   Global Step: 84890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:45,549-Speed 9809.40 samples/sec   Loss 7.1697   LearningRate 0.0556   Epoch: 5   Global Step: 84900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:46,624-Speed 9528.79 samples/sec   Loss 6.8575   LearningRate 0.0556   Epoch: 5   Global Step: 84910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:47,692-Speed 9593.54 samples/sec   Loss 7.0776   LearningRate 0.0556   Epoch: 5   Global Step: 84920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:48,821-Speed 9070.33 samples/sec   Loss 7.1389   LearningRate 0.0556   Epoch: 5   Global Step: 84930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:49,918-Speed 9355.78 samples/sec   Loss 7.0180   LearningRate 0.0556   Epoch: 5   Global Step: 84940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:51,028-Speed 9224.32 samples/sec   Loss 7.0519   LearningRate 0.0556   Epoch: 5   Global Step: 84950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:52,113-Speed 9442.30 samples/sec   Loss 7.0534   LearningRate 0.0556   Epoch: 5   Global Step: 84960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:53,191-Speed 9507.92 samples/sec   Loss 7.0386   LearningRate 0.0556   Epoch: 5   Global Step: 84970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:54,326-Speed 9025.27 samples/sec   Loss 7.1736   LearningRate 0.0556   Epoch: 5   Global Step: 84980   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:57:55,359-Speed 9922.58 samples/sec   Loss 6.9960   LearningRate 0.0556   Epoch: 5   Global Step: 84990   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:57:56,453-Speed 9369.97 samples/sec   Loss 7.1713   LearningRate 0.0556   Epoch: 5   Global Step: 85000   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 14:57:57,514-Speed 9658.61 samples/sec   Loss 7.0319   LearningRate 0.0556   Epoch: 5   Global Step: 85010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:58,572-Speed 9681.44 samples/sec   Loss 7.0498   LearningRate 0.0555   Epoch: 5   Global Step: 85020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:57:59,683-Speed 9224.41 samples/sec   Loss 7.0293   LearningRate 0.0555   Epoch: 5   Global Step: 85030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:00,734-Speed 9748.87 samples/sec   Loss 7.0423   LearningRate 0.0555   Epoch: 5   Global Step: 85040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:01,825-Speed 9385.68 samples/sec   Loss 7.0444   LearningRate 0.0555   Epoch: 5   Global Step: 85050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:02,904-Speed 9499.28 samples/sec   Loss 7.0168   LearningRate 0.0555   Epoch: 5   Global Step: 85060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:03,979-Speed 9533.41 samples/sec   Loss 7.0882   LearningRate 0.0555   Epoch: 5   Global Step: 85070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:05,037-Speed 9680.83 samples/sec   Loss 7.1732   LearningRate 0.0555   Epoch: 5   Global Step: 85080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:06,116-Speed 9496.69 samples/sec   Loss 6.9958   LearningRate 0.0555   Epoch: 5   Global Step: 85090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:07,178-Speed 9654.76 samples/sec   Loss 7.0943   LearningRate 0.0555   Epoch: 5   Global Step: 85100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:08,237-Speed 9672.16 samples/sec   Loss 7.0098   LearningRate 0.0555   Epoch: 5   Global Step: 85110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:09,324-Speed 9420.87 samples/sec   Loss 7.1648   LearningRate 0.0555   Epoch: 5   Global Step: 85120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:10,434-Speed 9236.92 samples/sec   Loss 7.2732   LearningRate 0.0555   Epoch: 5   Global Step: 85130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:11,547-Speed 9206.65 samples/sec   Loss 7.0715   LearningRate 0.0555   Epoch: 5   Global Step: 85140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:12,650-Speed 9288.07 samples/sec   Loss 7.0903   LearningRate 0.0555   Epoch: 5   Global Step: 85150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:13,731-Speed 9472.42 samples/sec   Loss 7.1077   LearningRate 0.0555   Epoch: 5   Global Step: 85160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:14,810-Speed 9499.49 samples/sec   Loss 7.0765   LearningRate 0.0555   Epoch: 5   Global Step: 85170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:15,857-Speed 9792.18 samples/sec   Loss 7.0574   LearningRate 0.0555   Epoch: 5   Global Step: 85180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:16,940-Speed 9454.88 samples/sec   Loss 7.0344   LearningRate 0.0555   Epoch: 5   Global Step: 85190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:18,026-Speed 9437.57 samples/sec   Loss 7.0028   LearningRate 0.0555   Epoch: 5   Global Step: 85200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:19,097-Speed 9564.39 samples/sec   Loss 7.1642   LearningRate 0.0555   Epoch: 5   Global Step: 85210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:20,135-Speed 9873.65 samples/sec   Loss 7.1741   LearningRate 0.0555   Epoch: 5   Global Step: 85220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:21,214-Speed 9493.24 samples/sec   Loss 7.0276   LearningRate 0.0555   Epoch: 5   Global Step: 85230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:22,300-Speed 9440.06 samples/sec   Loss 6.9685   LearningRate 0.0555   Epoch: 5   Global Step: 85240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:23,381-Speed 9472.32 samples/sec   Loss 6.9668   LearningRate 0.0554   Epoch: 5   Global Step: 85250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:24,485-Speed 9290.45 samples/sec   Loss 7.0524   LearningRate 0.0554   Epoch: 5   Global Step: 85260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:25,579-Speed 9363.07 samples/sec   Loss 7.0576   LearningRate 0.0554   Epoch: 5   Global Step: 85270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:26,676-Speed 9337.31 samples/sec   Loss 7.1281   LearningRate 0.0554   Epoch: 5   Global Step: 85280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:27,746-Speed 9576.49 samples/sec   Loss 7.1172   LearningRate 0.0554   Epoch: 5   Global Step: 85290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:28,839-Speed 9373.35 samples/sec   Loss 7.1057   LearningRate 0.0554   Epoch: 5   Global Step: 85300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:29,900-Speed 9655.84 samples/sec   Loss 7.1542   LearningRate 0.0554   Epoch: 5   Global Step: 85310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:30,981-Speed 9481.84 samples/sec   Loss 7.0799   LearningRate 0.0554   Epoch: 5   Global Step: 85320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:32,056-Speed 9526.87 samples/sec   Loss 7.0664   LearningRate 0.0554   Epoch: 5   Global Step: 85330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:33,139-Speed 9461.03 samples/sec   Loss 7.1373   LearningRate 0.0554   Epoch: 5   Global Step: 85340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:34,209-Speed 9582.09 samples/sec   Loss 7.0366   LearningRate 0.0554   Epoch: 5   Global Step: 85350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:35,320-Speed 9220.96 samples/sec   Loss 7.1132   LearningRate 0.0554   Epoch: 5   Global Step: 85360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:36,416-Speed 9354.68 samples/sec   Loss 7.0816   LearningRate 0.0554   Epoch: 5   Global Step: 85370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:37,516-Speed 9319.26 samples/sec   Loss 6.9802   LearningRate 0.0554   Epoch: 5   Global Step: 85380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:38,584-Speed 9598.79 samples/sec   Loss 7.0470   LearningRate 0.0554   Epoch: 5   Global Step: 85390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:39,647-Speed 9648.34 samples/sec   Loss 6.9989   LearningRate 0.0554   Epoch: 5   Global Step: 85400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:40,733-Speed 9433.49 samples/sec   Loss 7.1065   LearningRate 0.0554   Epoch: 5   Global Step: 85410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:41,814-Speed 9477.29 samples/sec   Loss 7.1120   LearningRate 0.0554   Epoch: 5   Global Step: 85420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:42,869-Speed 9712.63 samples/sec   Loss 7.1367   LearningRate 0.0554   Epoch: 5   Global Step: 85430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:43,943-Speed 9545.87 samples/sec   Loss 7.0459   LearningRate 0.0554   Epoch: 5   Global Step: 85440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:44,999-Speed 9698.82 samples/sec   Loss 7.0016   LearningRate 0.0554   Epoch: 5   Global Step: 85450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:46,056-Speed 9694.01 samples/sec   Loss 7.0731   LearningRate 0.0554   Epoch: 5   Global Step: 85460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:47,147-Speed 9391.71 samples/sec   Loss 7.0916   LearningRate 0.0553   Epoch: 5   Global Step: 85470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:48,255-Speed 9244.16 samples/sec   Loss 7.0856   LearningRate 0.0553   Epoch: 5   Global Step: 85480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:49,343-Speed 9425.20 samples/sec   Loss 7.1228   LearningRate 0.0553   Epoch: 5   Global Step: 85490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:50,426-Speed 9453.72 samples/sec   Loss 6.9618   LearningRate 0.0553   Epoch: 5   Global Step: 85500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:51,576-Speed 8909.82 samples/sec   Loss 7.0946   LearningRate 0.0553   Epoch: 5   Global Step: 85510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:58:52,647-Speed 9570.63 samples/sec   Loss 7.0576   LearningRate 0.0553   Epoch: 5   Global Step: 85520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:53,747-Speed 9316.58 samples/sec   Loss 7.1075   LearningRate 0.0553   Epoch: 5   Global Step: 85530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:54,825-Speed 9507.39 samples/sec   Loss 7.1118   LearningRate 0.0553   Epoch: 5   Global Step: 85540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:55,913-Speed 9417.89 samples/sec   Loss 7.0034   LearningRate 0.0553   Epoch: 5   Global Step: 85550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:56,984-Speed 9562.30 samples/sec   Loss 7.1626   LearningRate 0.0553   Epoch: 5   Global Step: 85560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:58,041-Speed 9690.62 samples/sec   Loss 7.0951   LearningRate 0.0553   Epoch: 5   Global Step: 85570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:58:59,123-Speed 9471.14 samples/sec   Loss 7.1211   LearningRate 0.0553   Epoch: 5   Global Step: 85580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:00,237-Speed 9197.73 samples/sec   Loss 7.0751   LearningRate 0.0553   Epoch: 5   Global Step: 85590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:01,307-Speed 9577.14 samples/sec   Loss 7.0837   LearningRate 0.0553   Epoch: 5   Global Step: 85600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:02,354-Speed 9781.78 samples/sec   Loss 7.1311   LearningRate 0.0553   Epoch: 5   Global Step: 85610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:03,381-Speed 9975.49 samples/sec   Loss 7.0625   LearningRate 0.0553   Epoch: 5   Global Step: 85620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:04,423-Speed 9834.81 samples/sec   Loss 7.1354   LearningRate 0.0553   Epoch: 5   Global Step: 85630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:05,481-Speed 9690.89 samples/sec   Loss 6.9657   LearningRate 0.0553   Epoch: 5   Global Step: 85640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:06,576-Speed 9355.46 samples/sec   Loss 7.0749   LearningRate 0.0553   Epoch: 5   Global Step: 85650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:07,669-Speed 9371.50 samples/sec   Loss 7.1291   LearningRate 0.0553   Epoch: 5   Global Step: 85660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:08,730-Speed 9652.83 samples/sec   Loss 7.1105   LearningRate 0.0553   Epoch: 5   Global Step: 85670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:09,843-Speed 9209.67 samples/sec   Loss 7.1385   LearningRate 0.0553   Epoch: 5   Global Step: 85680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:10,912-Speed 9583.09 samples/sec   Loss 7.1299   LearningRate 0.0553   Epoch: 5   Global Step: 85690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:11,973-Speed 9666.57 samples/sec   Loss 7.1460   LearningRate 0.0552   Epoch: 5   Global Step: 85700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:13,056-Speed 9459.21 samples/sec   Loss 7.1354   LearningRate 0.0552   Epoch: 5   Global Step: 85710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:14,146-Speed 9403.34 samples/sec   Loss 7.2060   LearningRate 0.0552   Epoch: 5   Global Step: 85720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:15,246-Speed 9313.93 samples/sec   Loss 7.1176   LearningRate 0.0552   Epoch: 5   Global Step: 85730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:16,341-Speed 9355.88 samples/sec   Loss 7.1762   LearningRate 0.0552   Epoch: 5   Global Step: 85740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:17,433-Speed 9378.94 samples/sec   Loss 7.1155   LearningRate 0.0552   Epoch: 5   Global Step: 85750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:18,529-Speed 9345.29 samples/sec   Loss 7.1582   LearningRate 0.0552   Epoch: 5   Global Step: 85760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:19,623-Speed 9367.57 samples/sec   Loss 7.1368   LearningRate 0.0552   Epoch: 5   Global Step: 85770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:20,722-Speed 9323.09 samples/sec   Loss 6.9690   LearningRate 0.0552   Epoch: 5   Global Step: 85780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:21,811-Speed 9407.27 samples/sec   Loss 7.1821   LearningRate 0.0552   Epoch: 5   Global Step: 85790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:22,952-Speed 8978.04 samples/sec   Loss 7.0579   LearningRate 0.0552   Epoch: 5   Global Step: 85800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:24,060-Speed 9248.98 samples/sec   Loss 7.1302   LearningRate 0.0552   Epoch: 5   Global Step: 85810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:25,135-Speed 9529.10 samples/sec   Loss 7.0996   LearningRate 0.0552   Epoch: 5   Global Step: 85820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:26,227-Speed 9381.30 samples/sec   Loss 7.1788   LearningRate 0.0552   Epoch: 5   Global Step: 85830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:27,288-Speed 9659.46 samples/sec   Loss 7.1125   LearningRate 0.0552   Epoch: 5   Global Step: 85840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:28,345-Speed 9690.63 samples/sec   Loss 7.2350   LearningRate 0.0552   Epoch: 5   Global Step: 85850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:29,374-Speed 9962.92 samples/sec   Loss 7.0837   LearningRate 0.0552   Epoch: 5   Global Step: 85860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:30,435-Speed 9656.29 samples/sec   Loss 7.0674   LearningRate 0.0552   Epoch: 5   Global Step: 85870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:31,528-Speed 9376.75 samples/sec   Loss 7.0920   LearningRate 0.0552   Epoch: 5   Global Step: 85880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:32,634-Speed 9269.03 samples/sec   Loss 7.0130   LearningRate 0.0552   Epoch: 5   Global Step: 85890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:33,752-Speed 9156.44 samples/sec   Loss 7.0528   LearningRate 0.0552   Epoch: 5   Global Step: 85900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:34,808-Speed 9702.00 samples/sec   Loss 7.0599   LearningRate 0.0552   Epoch: 5   Global Step: 85910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:35,885-Speed 9523.11 samples/sec   Loss 7.1332   LearningRate 0.0551   Epoch: 5   Global Step: 85920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:36,977-Speed 9380.64 samples/sec   Loss 7.1590   LearningRate 0.0551   Epoch: 5   Global Step: 85930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:38,058-Speed 9480.48 samples/sec   Loss 7.0356   LearningRate 0.0551   Epoch: 5   Global Step: 85940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:39,108-Speed 9751.69 samples/sec   Loss 7.0396   LearningRate 0.0551   Epoch: 5   Global Step: 85950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 14:59:40,151-Speed 9821.58 samples/sec   Loss 7.2042   LearningRate 0.0551   Epoch: 5   Global Step: 85960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:41,271-Speed 9149.52 samples/sec   Loss 7.1476   LearningRate 0.0551   Epoch: 5   Global Step: 85970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:42,367-Speed 9356.61 samples/sec   Loss 7.0326   LearningRate 0.0551   Epoch: 5   Global Step: 85980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:43,468-Speed 9302.02 samples/sec   Loss 7.0634   LearningRate 0.0551   Epoch: 5   Global Step: 85990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 14:59:44,549-Speed 9474.01 samples/sec   Loss 7.1865   LearningRate 0.0551   Epoch: 5   Global Step: 86000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:00:06,689-[lfw][86000]XNorm: 11.891397
Training: 2022-04-11 15:00:06,690-[lfw][86000]Accuracy-Flip: 0.99533+-0.00245
Training: 2022-04-11 15:00:06,690-[lfw][86000]Accuracy-Highest: 0.99667
Training: 2022-04-11 15:00:32,500-[cfp_fp][86000]XNorm: 10.047313
Training: 2022-04-11 15:00:32,501-[cfp_fp][86000]Accuracy-Flip: 0.95586+-0.00844
Training: 2022-04-11 15:00:32,501-[cfp_fp][86000]Accuracy-Highest: 0.95586
Training: 2022-04-11 15:00:54,823-[agedb_30][86000]XNorm: 11.431738
Training: 2022-04-11 15:00:54,824-[agedb_30][86000]Accuracy-Flip: 0.95933+-0.01148
Training: 2022-04-11 15:00:54,824-[agedb_30][86000]Accuracy-Highest: 0.96300
Training: 2022-04-11 15:00:55,944-Speed 143.43 samples/sec   Loss 7.1047   LearningRate 0.0551   Epoch: 5   Global Step: 86010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:00:57,038-Speed 9366.05 samples/sec   Loss 7.1425   LearningRate 0.0551   Epoch: 5   Global Step: 86020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:00:58,135-Speed 9339.29 samples/sec   Loss 7.1574   LearningRate 0.0551   Epoch: 5   Global Step: 86030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:00:59,194-Speed 9671.59 samples/sec   Loss 7.2798   LearningRate 0.0551   Epoch: 5   Global Step: 86040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:00,251-Speed 9691.01 samples/sec   Loss 7.1124   LearningRate 0.0551   Epoch: 5   Global Step: 86050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:01,304-Speed 9729.28 samples/sec   Loss 7.0614   LearningRate 0.0551   Epoch: 5   Global Step: 86060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:02,362-Speed 9689.08 samples/sec   Loss 7.1979   LearningRate 0.0551   Epoch: 5   Global Step: 86070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:03,414-Speed 9739.31 samples/sec   Loss 7.1514   LearningRate 0.0551   Epoch: 5   Global Step: 86080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:04,459-Speed 9803.13 samples/sec   Loss 7.0827   LearningRate 0.0551   Epoch: 5   Global Step: 86090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:05,509-Speed 9763.54 samples/sec   Loss 7.0652   LearningRate 0.0551   Epoch: 5   Global Step: 86100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:06,575-Speed 9611.40 samples/sec   Loss 7.2372   LearningRate 0.0551   Epoch: 5   Global Step: 86110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:07,641-Speed 9611.60 samples/sec   Loss 7.1074   LearningRate 0.0551   Epoch: 5   Global Step: 86120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:08,697-Speed 9699.12 samples/sec   Loss 7.1648   LearningRate 0.0551   Epoch: 5   Global Step: 86130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:09,733-Speed 9887.92 samples/sec   Loss 7.0889   LearningRate 0.0550   Epoch: 5   Global Step: 86140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:10,843-Speed 9226.60 samples/sec   Loss 7.0605   LearningRate 0.0550   Epoch: 5   Global Step: 86150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:11,896-Speed 9735.15 samples/sec   Loss 7.0957   LearningRate 0.0550   Epoch: 5   Global Step: 86160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:13,004-Speed 9249.95 samples/sec   Loss 7.1264   LearningRate 0.0550   Epoch: 5   Global Step: 86170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:14,122-Speed 9164.27 samples/sec   Loss 7.0999   LearningRate 0.0550   Epoch: 5   Global Step: 86180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:15,211-Speed 9407.75 samples/sec   Loss 7.2206   LearningRate 0.0550   Epoch: 5   Global Step: 86190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:16,265-Speed 9721.08 samples/sec   Loss 7.0943   LearningRate 0.0550   Epoch: 5   Global Step: 86200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:17,366-Speed 9313.29 samples/sec   Loss 7.1412   LearningRate 0.0550   Epoch: 5   Global Step: 86210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:18,435-Speed 9595.40 samples/sec   Loss 7.0612   LearningRate 0.0550   Epoch: 5   Global Step: 86220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:19,531-Speed 9341.25 samples/sec   Loss 7.1311   LearningRate 0.0550   Epoch: 5   Global Step: 86230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:20,608-Speed 9519.99 samples/sec   Loss 7.2390   LearningRate 0.0550   Epoch: 5   Global Step: 86240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:21,738-Speed 9064.15 samples/sec   Loss 7.1310   LearningRate 0.0550   Epoch: 5   Global Step: 86250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:22,819-Speed 9477.96 samples/sec   Loss 7.1708   LearningRate 0.0550   Epoch: 5   Global Step: 86260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:23,886-Speed 9603.82 samples/sec   Loss 7.1111   LearningRate 0.0550   Epoch: 5   Global Step: 86270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:24,962-Speed 9522.27 samples/sec   Loss 7.0485   LearningRate 0.0550   Epoch: 5   Global Step: 86280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:26,103-Speed 8980.12 samples/sec   Loss 7.1060   LearningRate 0.0550   Epoch: 5   Global Step: 86290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:27,209-Speed 9264.82 samples/sec   Loss 7.1748   LearningRate 0.0550   Epoch: 5   Global Step: 86300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:28,323-Speed 9192.87 samples/sec   Loss 7.2298   LearningRate 0.0550   Epoch: 5   Global Step: 86310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:29,395-Speed 9560.08 samples/sec   Loss 7.2221   LearningRate 0.0550   Epoch: 5   Global Step: 86320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:30,491-Speed 9348.60 samples/sec   Loss 7.2603   LearningRate 0.0550   Epoch: 5   Global Step: 86330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:31,582-Speed 9384.41 samples/sec   Loss 7.1664   LearningRate 0.0550   Epoch: 5   Global Step: 86340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:32,630-Speed 9781.56 samples/sec   Loss 7.1631   LearningRate 0.0550   Epoch: 5   Global Step: 86350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:33,713-Speed 9455.50 samples/sec   Loss 7.2920   LearningRate 0.0550   Epoch: 5   Global Step: 86360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:34,807-Speed 9365.58 samples/sec   Loss 7.2270   LearningRate 0.0549   Epoch: 5   Global Step: 86370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:35,871-Speed 9635.94 samples/sec   Loss 7.0627   LearningRate 0.0549   Epoch: 5   Global Step: 86380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:36,986-Speed 9188.36 samples/sec   Loss 7.0772   LearningRate 0.0549   Epoch: 5   Global Step: 86390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:38,077-Speed 9391.56 samples/sec   Loss 7.0783   LearningRate 0.0549   Epoch: 5   Global Step: 86400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:39,110-Speed 9919.79 samples/sec   Loss 7.1842   LearningRate 0.0549   Epoch: 5   Global Step: 86410   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:01:40,173-Speed 9642.26 samples/sec   Loss 7.3327   LearningRate 0.0549   Epoch: 5   Global Step: 86420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:41,262-Speed 9405.27 samples/sec   Loss 7.0735   LearningRate 0.0549   Epoch: 5   Global Step: 86430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:42,342-Speed 9492.16 samples/sec   Loss 7.1108   LearningRate 0.0549   Epoch: 5   Global Step: 86440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:43,418-Speed 9521.26 samples/sec   Loss 7.0197   LearningRate 0.0549   Epoch: 5   Global Step: 86450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:44,533-Speed 9188.37 samples/sec   Loss 7.1398   LearningRate 0.0549   Epoch: 5   Global Step: 86460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:45,619-Speed 9439.92 samples/sec   Loss 7.2478   LearningRate 0.0549   Epoch: 5   Global Step: 86470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:46,729-Speed 9229.52 samples/sec   Loss 7.1894   LearningRate 0.0549   Epoch: 5   Global Step: 86480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:47,812-Speed 9455.36 samples/sec   Loss 7.2533   LearningRate 0.0549   Epoch: 5   Global Step: 86490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:48,872-Speed 9669.32 samples/sec   Loss 7.1934   LearningRate 0.0549   Epoch: 5   Global Step: 86500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:49,976-Speed 9278.58 samples/sec   Loss 7.0685   LearningRate 0.0549   Epoch: 5   Global Step: 86510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:01:51,039-Speed 9639.89 samples/sec   Loss 7.2252   LearningRate 0.0549   Epoch: 5   Global Step: 86520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:52,087-Speed 9779.59 samples/sec   Loss 7.0688   LearningRate 0.0549   Epoch: 5   Global Step: 86530   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:53,154-Speed 9601.21 samples/sec   Loss 7.1797   LearningRate 0.0549   Epoch: 5   Global Step: 86540   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:54,249-Speed 9353.89 samples/sec   Loss 7.0998   LearningRate 0.0549   Epoch: 5   Global Step: 86550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:55,323-Speed 9549.12 samples/sec   Loss 7.1535   LearningRate 0.0549   Epoch: 5   Global Step: 86560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:56,382-Speed 9670.27 samples/sec   Loss 7.1630   LearningRate 0.0549   Epoch: 5   Global Step: 86570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:57,474-Speed 9387.78 samples/sec   Loss 7.0776   LearningRate 0.0549   Epoch: 5   Global Step: 86580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:58,582-Speed 9246.75 samples/sec   Loss 7.2690   LearningRate 0.0549   Epoch: 5   Global Step: 86590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:01:59,639-Speed 9687.82 samples/sec   Loss 7.1403   LearningRate 0.0548   Epoch: 5   Global Step: 86600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:00,688-Speed 9766.86 samples/sec   Loss 7.2290   LearningRate 0.0548   Epoch: 5   Global Step: 86610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:01,763-Speed 9534.15 samples/sec   Loss 7.3017   LearningRate 0.0548   Epoch: 5   Global Step: 86620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:02,808-Speed 9803.08 samples/sec   Loss 7.1020   LearningRate 0.0548   Epoch: 5   Global Step: 86630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:03,877-Speed 9584.27 samples/sec   Loss 7.2481   LearningRate 0.0548   Epoch: 5   Global Step: 86640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:04,973-Speed 9354.40 samples/sec   Loss 7.0823   LearningRate 0.0548   Epoch: 5   Global Step: 86650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:06,057-Speed 9448.61 samples/sec   Loss 7.2124   LearningRate 0.0548   Epoch: 5   Global Step: 86660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:07,125-Speed 9598.82 samples/sec   Loss 7.2065   LearningRate 0.0548   Epoch: 5   Global Step: 86670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:08,216-Speed 9384.44 samples/sec   Loss 7.1448   LearningRate 0.0548   Epoch: 5   Global Step: 86680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:09,318-Speed 9302.66 samples/sec   Loss 7.1733   LearningRate 0.0548   Epoch: 5   Global Step: 86690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:10,376-Speed 9678.11 samples/sec   Loss 7.2106   LearningRate 0.0548   Epoch: 5   Global Step: 86700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:11,424-Speed 9776.70 samples/sec   Loss 7.1591   LearningRate 0.0548   Epoch: 5   Global Step: 86710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:12,497-Speed 9558.52 samples/sec   Loss 7.1927   LearningRate 0.0548   Epoch: 5   Global Step: 86720   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:02:13,587-Speed 9398.27 samples/sec   Loss 7.1331   LearningRate 0.0548   Epoch: 5   Global Step: 86730   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:02:14,697-Speed 9235.52 samples/sec   Loss 7.0612   LearningRate 0.0548   Epoch: 5   Global Step: 86740   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:02:15,816-Speed 9153.81 samples/sec   Loss 7.0964   LearningRate 0.0548   Epoch: 5   Global Step: 86750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:16,937-Speed 9135.75 samples/sec   Loss 7.0657   LearningRate 0.0548   Epoch: 5   Global Step: 86760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:18,023-Speed 9436.43 samples/sec   Loss 7.1105   LearningRate 0.0548   Epoch: 5   Global Step: 86770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:19,126-Speed 9293.96 samples/sec   Loss 7.3183   LearningRate 0.0548   Epoch: 5   Global Step: 86780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:20,227-Speed 9301.25 samples/sec   Loss 7.2513   LearningRate 0.0548   Epoch: 5   Global Step: 86790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:21,313-Speed 9434.29 samples/sec   Loss 7.2318   LearningRate 0.0548   Epoch: 5   Global Step: 86800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:22,363-Speed 9762.82 samples/sec   Loss 7.2943   LearningRate 0.0548   Epoch: 5   Global Step: 86810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:23,434-Speed 9560.13 samples/sec   Loss 7.1621   LearningRate 0.0547   Epoch: 5   Global Step: 86820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:24,456-Speed 10029.80 samples/sec   Loss 7.1429   LearningRate 0.0547   Epoch: 5   Global Step: 86830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:25,536-Speed 9486.85 samples/sec   Loss 7.2028   LearningRate 0.0547   Epoch: 5   Global Step: 86840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:26,628-Speed 9380.99 samples/sec   Loss 7.3205   LearningRate 0.0547   Epoch: 5   Global Step: 86850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:27,687-Speed 9679.40 samples/sec   Loss 7.0954   LearningRate 0.0547   Epoch: 5   Global Step: 86860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:28,761-Speed 9533.10 samples/sec   Loss 7.2379   LearningRate 0.0547   Epoch: 5   Global Step: 86870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:29,805-Speed 9814.23 samples/sec   Loss 7.1204   LearningRate 0.0547   Epoch: 5   Global Step: 86880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:30,898-Speed 9380.12 samples/sec   Loss 7.2733   LearningRate 0.0547   Epoch: 5   Global Step: 86890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:31,947-Speed 9759.65 samples/sec   Loss 7.1019   LearningRate 0.0547   Epoch: 5   Global Step: 86900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:33,025-Speed 9506.61 samples/sec   Loss 7.1541   LearningRate 0.0547   Epoch: 5   Global Step: 86910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:34,118-Speed 9382.86 samples/sec   Loss 7.2904   LearningRate 0.0547   Epoch: 5   Global Step: 86920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:35,177-Speed 9667.41 samples/sec   Loss 7.2722   LearningRate 0.0547   Epoch: 5   Global Step: 86930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:36,212-Speed 9906.90 samples/sec   Loss 7.2257   LearningRate 0.0547   Epoch: 5   Global Step: 86940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:37,305-Speed 9374.51 samples/sec   Loss 7.0806   LearningRate 0.0547   Epoch: 5   Global Step: 86950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:38,354-Speed 9763.00 samples/sec   Loss 7.2803   LearningRate 0.0547   Epoch: 5   Global Step: 86960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:39,424-Speed 9583.03 samples/sec   Loss 7.2246   LearningRate 0.0547   Epoch: 5   Global Step: 86970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:40,465-Speed 9834.75 samples/sec   Loss 7.1731   LearningRate 0.0547   Epoch: 5   Global Step: 86980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:41,585-Speed 9154.63 samples/sec   Loss 7.3176   LearningRate 0.0547   Epoch: 5   Global Step: 86990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:02:42,719-Speed 9038.87 samples/sec   Loss 7.2891   LearningRate 0.0547   Epoch: 5   Global Step: 87000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:43,799-Speed 9484.37 samples/sec   Loss 7.2346   LearningRate 0.0547   Epoch: 5   Global Step: 87010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:44,865-Speed 9614.67 samples/sec   Loss 7.1389   LearningRate 0.0547   Epoch: 5   Global Step: 87020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:45,930-Speed 9617.31 samples/sec   Loss 7.2756   LearningRate 0.0547   Epoch: 5   Global Step: 87030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:47,055-Speed 9112.12 samples/sec   Loss 7.2346   LearningRate 0.0547   Epoch: 5   Global Step: 87040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:48,161-Speed 9259.53 samples/sec   Loss 7.2132   LearningRate 0.0546   Epoch: 5   Global Step: 87050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:49,260-Speed 9321.89 samples/sec   Loss 7.1887   LearningRate 0.0546   Epoch: 5   Global Step: 87060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:50,354-Speed 9366.25 samples/sec   Loss 7.2960   LearningRate 0.0546   Epoch: 5   Global Step: 87070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:51,461-Speed 9256.51 samples/sec   Loss 7.1516   LearningRate 0.0546   Epoch: 5   Global Step: 87080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:52,519-Speed 9685.26 samples/sec   Loss 7.2574   LearningRate 0.0546   Epoch: 5   Global Step: 87090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:02:53,629-Speed 9229.78 samples/sec   Loss 7.2128   LearningRate 0.0546   Epoch: 5   Global Step: 87100   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:02:54,703-Speed 9542.18 samples/sec   Loss 7.1254   LearningRate 0.0546   Epoch: 5   Global Step: 87110   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:02:55,732-Speed 9961.15 samples/sec   Loss 7.3072   LearningRate 0.0546   Epoch: 5   Global Step: 87120   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:02:56,789-Speed 9694.26 samples/sec   Loss 7.1832   LearningRate 0.0546   Epoch: 5   Global Step: 87130   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:02:57,839-Speed 9751.18 samples/sec   Loss 7.1449   LearningRate 0.0546   Epoch: 5   Global Step: 87140   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:02:58,964-Speed 9113.81 samples/sec   Loss 7.1773   LearningRate 0.0546   Epoch: 5   Global Step: 87150   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:03:00,019-Speed 9704.62 samples/sec   Loss 7.1491   LearningRate 0.0546   Epoch: 5   Global Step: 87160   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:03:01,038-Speed 10061.93 samples/sec   Loss 7.2938   LearningRate 0.0546   Epoch: 5   Global Step: 87170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:02,088-Speed 9754.39 samples/sec   Loss 7.3057   LearningRate 0.0546   Epoch: 5   Global Step: 87180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:03,129-Speed 9840.65 samples/sec   Loss 7.2098   LearningRate 0.0546   Epoch: 5   Global Step: 87190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:04,217-Speed 9422.65 samples/sec   Loss 7.2344   LearningRate 0.0546   Epoch: 5   Global Step: 87200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:05,268-Speed 9745.41 samples/sec   Loss 7.2436   LearningRate 0.0546   Epoch: 5   Global Step: 87210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:06,371-Speed 9288.21 samples/sec   Loss 7.1597   LearningRate 0.0546   Epoch: 5   Global Step: 87220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:07,457-Speed 9435.34 samples/sec   Loss 7.0765   LearningRate 0.0546   Epoch: 5   Global Step: 87230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:08,516-Speed 9678.52 samples/sec   Loss 7.1177   LearningRate 0.0546   Epoch: 5   Global Step: 87240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:09,605-Speed 9407.65 samples/sec   Loss 7.2296   LearningRate 0.0546   Epoch: 5   Global Step: 87250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:10,679-Speed 9540.62 samples/sec   Loss 7.1531   LearningRate 0.0546   Epoch: 5   Global Step: 87260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:11,790-Speed 9226.09 samples/sec   Loss 7.2920   LearningRate 0.0545   Epoch: 5   Global Step: 87270   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:03:12,872-Speed 9470.03 samples/sec   Loss 7.3143   LearningRate 0.0545   Epoch: 5   Global Step: 87280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:13,921-Speed 9766.67 samples/sec   Loss 7.2698   LearningRate 0.0545   Epoch: 5   Global Step: 87290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:15,017-Speed 9353.49 samples/sec   Loss 7.2418   LearningRate 0.0545   Epoch: 5   Global Step: 87300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:16,103-Speed 9430.81 samples/sec   Loss 7.1057   LearningRate 0.0545   Epoch: 5   Global Step: 87310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:17,154-Speed 9749.40 samples/sec   Loss 7.2333   LearningRate 0.0545   Epoch: 5   Global Step: 87320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:18,219-Speed 9624.55 samples/sec   Loss 7.0442   LearningRate 0.0545   Epoch: 5   Global Step: 87330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:19,289-Speed 9571.74 samples/sec   Loss 7.1768   LearningRate 0.0545   Epoch: 5   Global Step: 87340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:20,345-Speed 9702.57 samples/sec   Loss 7.2989   LearningRate 0.0545   Epoch: 5   Global Step: 87350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:21,416-Speed 9564.70 samples/sec   Loss 7.2026   LearningRate 0.0545   Epoch: 5   Global Step: 87360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:22,519-Speed 9295.96 samples/sec   Loss 7.2052   LearningRate 0.0545   Epoch: 5   Global Step: 87370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:23,578-Speed 9667.58 samples/sec   Loss 7.0907   LearningRate 0.0545   Epoch: 5   Global Step: 87380   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:03:24,643-Speed 9620.47 samples/sec   Loss 7.1819   LearningRate 0.0545   Epoch: 5   Global Step: 87390   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:03:25,760-Speed 9176.12 samples/sec   Loss 7.2404   LearningRate 0.0545   Epoch: 5   Global Step: 87400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:26,857-Speed 9338.68 samples/sec   Loss 7.3331   LearningRate 0.0545   Epoch: 5   Global Step: 87410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:27,954-Speed 9346.82 samples/sec   Loss 7.2319   LearningRate 0.0545   Epoch: 5   Global Step: 87420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:29,023-Speed 9579.80 samples/sec   Loss 7.3196   LearningRate 0.0545   Epoch: 5   Global Step: 87430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:30,113-Speed 9400.65 samples/sec   Loss 7.4292   LearningRate 0.0545   Epoch: 5   Global Step: 87440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:31,185-Speed 9561.89 samples/sec   Loss 7.3053   LearningRate 0.0545   Epoch: 5   Global Step: 87450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:32,258-Speed 9545.37 samples/sec   Loss 7.2099   LearningRate 0.0545   Epoch: 5   Global Step: 87460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:33,355-Speed 9338.03 samples/sec   Loss 7.1778   LearningRate 0.0545   Epoch: 5   Global Step: 87470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:34,453-Speed 9332.13 samples/sec   Loss 7.2071   LearningRate 0.0545   Epoch: 5   Global Step: 87480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:35,544-Speed 9394.19 samples/sec   Loss 7.1843   LearningRate 0.0545   Epoch: 5   Global Step: 87490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:36,649-Speed 9278.93 samples/sec   Loss 7.1985   LearningRate 0.0544   Epoch: 5   Global Step: 87500   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:03:37,710-Speed 9649.37 samples/sec   Loss 7.2056   LearningRate 0.0544   Epoch: 5   Global Step: 87510   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:03:38,776-Speed 9612.86 samples/sec   Loss 7.2755   LearningRate 0.0544   Epoch: 5   Global Step: 87520   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:03:39,824-Speed 9778.67 samples/sec   Loss 7.1536   LearningRate 0.0544   Epoch: 5   Global Step: 87530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:40,887-Speed 9636.59 samples/sec   Loss 7.1812   LearningRate 0.0544   Epoch: 5   Global Step: 87540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:41,996-Speed 9235.33 samples/sec   Loss 7.1547   LearningRate 0.0544   Epoch: 5   Global Step: 87550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:03:43,098-Speed 9306.23 samples/sec   Loss 7.1908   LearningRate 0.0544   Epoch: 5   Global Step: 87560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:03:44,172-Speed 9540.41 samples/sec   Loss 7.1955   LearningRate 0.0544   Epoch: 5   Global Step: 87570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:03:45,231-Speed 9677.51 samples/sec   Loss 7.1803   LearningRate 0.0544   Epoch: 5   Global Step: 87580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:03:46,290-Speed 9675.03 samples/sec   Loss 7.2064   LearningRate 0.0544   Epoch: 5   Global Step: 87590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:03:47,374-Speed 9454.87 samples/sec   Loss 7.1008   LearningRate 0.0544   Epoch: 5   Global Step: 87600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:03:48,453-Speed 9499.00 samples/sec   Loss 7.2947   LearningRate 0.0544   Epoch: 5   Global Step: 87610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:03:49,528-Speed 9530.73 samples/sec   Loss 7.2389   LearningRate 0.0544   Epoch: 5   Global Step: 87620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:03:50,595-Speed 9602.90 samples/sec   Loss 7.1471   LearningRate 0.0544   Epoch: 5   Global Step: 87630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:03:51,628-Speed 9913.09 samples/sec   Loss 7.2556   LearningRate 0.0544   Epoch: 5   Global Step: 87640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:03:52,679-Speed 9751.88 samples/sec   Loss 7.1777   LearningRate 0.0544   Epoch: 5   Global Step: 87650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:53,735-Speed 9696.84 samples/sec   Loss 7.2204   LearningRate 0.0544   Epoch: 5   Global Step: 87660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:54,826-Speed 9392.50 samples/sec   Loss 7.1464   LearningRate 0.0544   Epoch: 5   Global Step: 87670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:55,920-Speed 9369.26 samples/sec   Loss 7.2133   LearningRate 0.0544   Epoch: 5   Global Step: 87680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:56,990-Speed 9579.97 samples/sec   Loss 7.1333   LearningRate 0.0544   Epoch: 5   Global Step: 87690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:58,026-Speed 9884.04 samples/sec   Loss 7.2496   LearningRate 0.0544   Epoch: 5   Global Step: 87700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:03:59,126-Speed 9320.89 samples/sec   Loss 7.1596   LearningRate 0.0544   Epoch: 5   Global Step: 87710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:00,215-Speed 9407.57 samples/sec   Loss 7.2093   LearningRate 0.0543   Epoch: 5   Global Step: 87720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:01,263-Speed 9773.53 samples/sec   Loss 7.2359   LearningRate 0.0543   Epoch: 5   Global Step: 87730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:02,332-Speed 9589.55 samples/sec   Loss 7.2585   LearningRate 0.0543   Epoch: 5   Global Step: 87740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:04:03,432-Speed 9315.08 samples/sec   Loss 7.3079   LearningRate 0.0543   Epoch: 5   Global Step: 87750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:04:04,504-Speed 9553.26 samples/sec   Loss 7.0741   LearningRate 0.0543   Epoch: 5   Global Step: 87760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:04:05,563-Speed 9676.51 samples/sec   Loss 7.3083   LearningRate 0.0543   Epoch: 5   Global Step: 87770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:04:06,664-Speed 9307.83 samples/sec   Loss 7.3145   LearningRate 0.0543   Epoch: 5   Global Step: 87780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:04:07,777-Speed 9203.56 samples/sec   Loss 7.3658   LearningRate 0.0543   Epoch: 5   Global Step: 87790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:04:08,861-Speed 9458.12 samples/sec   Loss 7.3244   LearningRate 0.0543   Epoch: 5   Global Step: 87800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:04:09,955-Speed 9359.33 samples/sec   Loss 7.2663   LearningRate 0.0543   Epoch: 5   Global Step: 87810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:04:11,031-Speed 9528.98 samples/sec   Loss 7.2554   LearningRate 0.0543   Epoch: 5   Global Step: 87820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:04:12,134-Speed 9286.60 samples/sec   Loss 7.1497   LearningRate 0.0543   Epoch: 5   Global Step: 87830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:04:13,210-Speed 9528.87 samples/sec   Loss 7.2624   LearningRate 0.0543   Epoch: 5   Global Step: 87840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:14,332-Speed 9130.37 samples/sec   Loss 7.2130   LearningRate 0.0543   Epoch: 5   Global Step: 87850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:15,366-Speed 9906.31 samples/sec   Loss 7.2926   LearningRate 0.0543   Epoch: 5   Global Step: 87860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:16,422-Speed 9706.29 samples/sec   Loss 7.2266   LearningRate 0.0543   Epoch: 5   Global Step: 87870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:17,471-Speed 9766.57 samples/sec   Loss 7.2382   LearningRate 0.0543   Epoch: 5   Global Step: 87880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:18,518-Speed 9783.05 samples/sec   Loss 7.4150   LearningRate 0.0543   Epoch: 5   Global Step: 87890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:19,619-Speed 9306.67 samples/sec   Loss 7.2523   LearningRate 0.0543   Epoch: 5   Global Step: 87900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:20,712-Speed 9376.13 samples/sec   Loss 7.2075   LearningRate 0.0543   Epoch: 5   Global Step: 87910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:21,779-Speed 9598.23 samples/sec   Loss 7.2294   LearningRate 0.0543   Epoch: 5   Global Step: 87920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:22,876-Speed 9343.54 samples/sec   Loss 7.2053   LearningRate 0.0543   Epoch: 5   Global Step: 87930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:23,978-Speed 9293.31 samples/sec   Loss 7.2491   LearningRate 0.0543   Epoch: 5   Global Step: 87940   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:04:25,053-Speed 9530.18 samples/sec   Loss 7.2457   LearningRate 0.0542   Epoch: 5   Global Step: 87950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:26,157-Speed 9291.01 samples/sec   Loss 7.1738   LearningRate 0.0542   Epoch: 5   Global Step: 87960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:27,214-Speed 9692.31 samples/sec   Loss 7.2873   LearningRate 0.0542   Epoch: 5   Global Step: 87970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:28,272-Speed 9677.21 samples/sec   Loss 7.3177   LearningRate 0.0542   Epoch: 5   Global Step: 87980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:29,312-Speed 9857.06 samples/sec   Loss 7.2060   LearningRate 0.0542   Epoch: 5   Global Step: 87990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:30,426-Speed 9194.74 samples/sec   Loss 7.2623   LearningRate 0.0542   Epoch: 5   Global Step: 88000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:04:52,440-[lfw][88000]XNorm: 12.012908
Training: 2022-04-11 15:04:52,441-[lfw][88000]Accuracy-Flip: 0.99617+-0.00224
Training: 2022-04-11 15:04:52,441-[lfw][88000]Accuracy-Highest: 0.99667
Training: 2022-04-11 15:05:17,772-[cfp_fp][88000]XNorm: 10.177834
Training: 2022-04-11 15:05:17,773-[cfp_fp][88000]Accuracy-Flip: 0.95729+-0.00920
Training: 2022-04-11 15:05:17,774-[cfp_fp][88000]Accuracy-Highest: 0.95729
Training: 2022-04-11 15:05:39,596-[agedb_30][88000]XNorm: 11.606030
Training: 2022-04-11 15:05:39,597-[agedb_30][88000]Accuracy-Flip: 0.96317+-0.00911
Training: 2022-04-11 15:05:39,597-[agedb_30][88000]Accuracy-Highest: 0.96317
Training: 2022-04-11 15:05:40,686-Speed 145.75 samples/sec   Loss 7.2709   LearningRate 0.0542   Epoch: 5   Global Step: 88010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:05:41,762-Speed 9516.47 samples/sec   Loss 7.3265   LearningRate 0.0542   Epoch: 5   Global Step: 88020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:05:42,896-Speed 9033.31 samples/sec   Loss 7.2012   LearningRate 0.0542   Epoch: 5   Global Step: 88030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:05:43,984-Speed 9424.56 samples/sec   Loss 7.2741   LearningRate 0.0542   Epoch: 5   Global Step: 88040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:05:45,080-Speed 9344.56 samples/sec   Loss 7.3115   LearningRate 0.0542   Epoch: 5   Global Step: 88050   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:46,134-Speed 9722.17 samples/sec   Loss 7.3323   LearningRate 0.0542   Epoch: 5   Global Step: 88060   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:47,191-Speed 9697.37 samples/sec   Loss 7.2909   LearningRate 0.0542   Epoch: 5   Global Step: 88070   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:48,305-Speed 9193.44 samples/sec   Loss 7.3133   LearningRate 0.0542   Epoch: 5   Global Step: 88080   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:49,429-Speed 9114.84 samples/sec   Loss 7.3610   LearningRate 0.0542   Epoch: 5   Global Step: 88090   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:50,542-Speed 9204.00 samples/sec   Loss 7.2957   LearningRate 0.0542   Epoch: 5   Global Step: 88100   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:51,580-Speed 9875.04 samples/sec   Loss 7.3536   LearningRate 0.0542   Epoch: 5   Global Step: 88110   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:52,646-Speed 9613.77 samples/sec   Loss 7.3349   LearningRate 0.0542   Epoch: 5   Global Step: 88120   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:53,710-Speed 9624.68 samples/sec   Loss 7.2476   LearningRate 0.0542   Epoch: 5   Global Step: 88130   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:54,836-Speed 9102.82 samples/sec   Loss 7.3290   LearningRate 0.0542   Epoch: 5   Global Step: 88140   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:55,943-Speed 9258.07 samples/sec   Loss 7.1724   LearningRate 0.0542   Epoch: 5   Global Step: 88150   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:57,033-Speed 9402.07 samples/sec   Loss 7.2229   LearningRate 0.0542   Epoch: 5   Global Step: 88160   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:05:58,069-Speed 9891.88 samples/sec   Loss 7.2902   LearningRate 0.0542   Epoch: 5   Global Step: 88170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:05:59,120-Speed 9743.83 samples/sec   Loss 7.3238   LearningRate 0.0541   Epoch: 5   Global Step: 88180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:00,192-Speed 9561.09 samples/sec   Loss 7.3496   LearningRate 0.0541   Epoch: 5   Global Step: 88190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:01,236-Speed 9819.96 samples/sec   Loss 7.2609   LearningRate 0.0541   Epoch: 5   Global Step: 88200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:02,313-Speed 9506.26 samples/sec   Loss 7.2888   LearningRate 0.0541   Epoch: 5   Global Step: 88210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:03,411-Speed 9330.88 samples/sec   Loss 7.1464   LearningRate 0.0541   Epoch: 5   Global Step: 88220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:04,524-Speed 9208.42 samples/sec   Loss 7.2707   LearningRate 0.0541   Epoch: 5   Global Step: 88230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:05,609-Speed 9442.25 samples/sec   Loss 7.2336   LearningRate 0.0541   Epoch: 5   Global Step: 88240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:06,706-Speed 9339.83 samples/sec   Loss 7.1428   LearningRate 0.0541   Epoch: 5   Global Step: 88250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:07,787-Speed 9479.30 samples/sec   Loss 7.1598   LearningRate 0.0541   Epoch: 5   Global Step: 88260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:08,889-Speed 9299.04 samples/sec   Loss 7.3662   LearningRate 0.0541   Epoch: 5   Global Step: 88270   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:06:09,939-Speed 9754.14 samples/sec   Loss 7.3134   LearningRate 0.0541   Epoch: 5   Global Step: 88280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:11,012-Speed 9549.17 samples/sec   Loss 7.1696   LearningRate 0.0541   Epoch: 5   Global Step: 88290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:12,080-Speed 9598.69 samples/sec   Loss 7.2360   LearningRate 0.0541   Epoch: 5   Global Step: 88300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:13,152-Speed 9557.07 samples/sec   Loss 7.2905   LearningRate 0.0541   Epoch: 5   Global Step: 88310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:14,207-Speed 9715.54 samples/sec   Loss 7.2175   LearningRate 0.0541   Epoch: 5   Global Step: 88320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:15,264-Speed 9690.34 samples/sec   Loss 7.2551   LearningRate 0.0541   Epoch: 5   Global Step: 88330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:16,293-Speed 9956.92 samples/sec   Loss 7.2131   LearningRate 0.0541   Epoch: 5   Global Step: 88340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:17,369-Speed 9523.92 samples/sec   Loss 7.2235   LearningRate 0.0541   Epoch: 5   Global Step: 88350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:18,463-Speed 9365.66 samples/sec   Loss 7.2223   LearningRate 0.0541   Epoch: 5   Global Step: 88360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:19,527-Speed 9622.83 samples/sec   Loss 7.1805   LearningRate 0.0541   Epoch: 5   Global Step: 88370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:20,619-Speed 9390.47 samples/sec   Loss 7.1861   LearningRate 0.0541   Epoch: 5   Global Step: 88380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:21,725-Speed 9264.84 samples/sec   Loss 7.1885   LearningRate 0.0541   Epoch: 5   Global Step: 88390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:22,812-Speed 9421.17 samples/sec   Loss 7.2173   LearningRate 0.0540   Epoch: 5   Global Step: 88400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:23,877-Speed 9619.99 samples/sec   Loss 7.3066   LearningRate 0.0540   Epoch: 5   Global Step: 88410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:24,943-Speed 9615.37 samples/sec   Loss 7.1975   LearningRate 0.0540   Epoch: 5   Global Step: 88420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:26,045-Speed 9297.78 samples/sec   Loss 7.1900   LearningRate 0.0540   Epoch: 5   Global Step: 88430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:27,090-Speed 9796.81 samples/sec   Loss 7.3482   LearningRate 0.0540   Epoch: 5   Global Step: 88440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:28,184-Speed 9369.76 samples/sec   Loss 7.2421   LearningRate 0.0540   Epoch: 5   Global Step: 88450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:29,248-Speed 9627.44 samples/sec   Loss 7.2508   LearningRate 0.0540   Epoch: 5   Global Step: 88460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:30,335-Speed 9429.62 samples/sec   Loss 7.2004   LearningRate 0.0540   Epoch: 5   Global Step: 88470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:31,359-Speed 10003.77 samples/sec   Loss 7.2872   LearningRate 0.0540   Epoch: 5   Global Step: 88480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:32,404-Speed 9820.96 samples/sec   Loss 7.1653   LearningRate 0.0540   Epoch: 5   Global Step: 88490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:33,509-Speed 9269.24 samples/sec   Loss 7.3488   LearningRate 0.0540   Epoch: 5   Global Step: 88500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:34,599-Speed 9399.01 samples/sec   Loss 7.2457   LearningRate 0.0540   Epoch: 5   Global Step: 88510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:35,703-Speed 9282.42 samples/sec   Loss 7.2804   LearningRate 0.0540   Epoch: 5   Global Step: 88520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:36,795-Speed 9379.72 samples/sec   Loss 7.2590   LearningRate 0.0540   Epoch: 5   Global Step: 88530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:37,881-Speed 9439.65 samples/sec   Loss 7.1849   LearningRate 0.0540   Epoch: 5   Global Step: 88540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:38,978-Speed 9338.56 samples/sec   Loss 7.3498   LearningRate 0.0540   Epoch: 5   Global Step: 88550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:40,065-Speed 9426.45 samples/sec   Loss 7.2601   LearningRate 0.0540   Epoch: 5   Global Step: 88560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:41,147-Speed 9468.37 samples/sec   Loss 7.3392   LearningRate 0.0540   Epoch: 5   Global Step: 88570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:42,248-Speed 9302.32 samples/sec   Loss 7.1643   LearningRate 0.0540   Epoch: 5   Global Step: 88580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:43,359-Speed 9221.87 samples/sec   Loss 7.2092   LearningRate 0.0540   Epoch: 5   Global Step: 88590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:44,442-Speed 9469.57 samples/sec   Loss 7.3214   LearningRate 0.0540   Epoch: 5   Global Step: 88600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:45,508-Speed 9609.97 samples/sec   Loss 7.3166   LearningRate 0.0540   Epoch: 5   Global Step: 88610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:46,610-Speed 9291.47 samples/sec   Loss 7.2913   LearningRate 0.0540   Epoch: 5   Global Step: 88620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:47,682-Speed 9563.77 samples/sec   Loss 7.3536   LearningRate 0.0539   Epoch: 5   Global Step: 88630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:48,757-Speed 9525.27 samples/sec   Loss 7.3252   LearningRate 0.0539   Epoch: 5   Global Step: 88640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:49,859-Speed 9301.61 samples/sec   Loss 7.3115   LearningRate 0.0539   Epoch: 5   Global Step: 88650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:50,924-Speed 9628.70 samples/sec   Loss 7.3505   LearningRate 0.0539   Epoch: 5   Global Step: 88660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:52,013-Speed 9403.81 samples/sec   Loss 7.3133   LearningRate 0.0539   Epoch: 5   Global Step: 88670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:53,094-Speed 9482.37 samples/sec   Loss 7.3146   LearningRate 0.0539   Epoch: 5   Global Step: 88680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:54,159-Speed 9622.24 samples/sec   Loss 7.2966   LearningRate 0.0539   Epoch: 5   Global Step: 88690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:55,287-Speed 9078.87 samples/sec   Loss 7.3577   LearningRate 0.0539   Epoch: 5   Global Step: 88700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:56,365-Speed 9506.77 samples/sec   Loss 7.2956   LearningRate 0.0539   Epoch: 5   Global Step: 88710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:57,402-Speed 9879.86 samples/sec   Loss 7.2182   LearningRate 0.0539   Epoch: 5   Global Step: 88720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:06:58,474-Speed 9552.28 samples/sec   Loss 7.2972   LearningRate 0.0539   Epoch: 5   Global Step: 88730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:06:59,546-Speed 9562.03 samples/sec   Loss 7.2352   LearningRate 0.0539   Epoch: 5   Global Step: 88740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:00,618-Speed 9554.14 samples/sec   Loss 7.1791   LearningRate 0.0539   Epoch: 5   Global Step: 88750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:01,710-Speed 9383.44 samples/sec   Loss 7.2916   LearningRate 0.0539   Epoch: 5   Global Step: 88760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:02,769-Speed 9677.72 samples/sec   Loss 7.2885   LearningRate 0.0539   Epoch: 5   Global Step: 88770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:03,875-Speed 9267.03 samples/sec   Loss 7.2420   LearningRate 0.0539   Epoch: 5   Global Step: 88780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:04,942-Speed 9601.53 samples/sec   Loss 7.2849   LearningRate 0.0539   Epoch: 5   Global Step: 88790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:06,031-Speed 9411.10 samples/sec   Loss 7.3788   LearningRate 0.0539   Epoch: 5   Global Step: 88800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:07,095-Speed 9625.33 samples/sec   Loss 7.1662   LearningRate 0.0539   Epoch: 5   Global Step: 88810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:08,166-Speed 9569.33 samples/sec   Loss 7.1852   LearningRate 0.0539   Epoch: 5   Global Step: 88820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:09,230-Speed 9636.21 samples/sec   Loss 7.2251   LearningRate 0.0539   Epoch: 5   Global Step: 88830   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:10,325-Speed 9355.09 samples/sec   Loss 7.2746   LearningRate 0.0539   Epoch: 5   Global Step: 88840   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:11,434-Speed 9237.11 samples/sec   Loss 7.2630   LearningRate 0.0539   Epoch: 5   Global Step: 88850   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:12,561-Speed 9087.21 samples/sec   Loss 7.2636   LearningRate 0.0538   Epoch: 5   Global Step: 88860   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:13,669-Speed 9248.42 samples/sec   Loss 7.3084   LearningRate 0.0538   Epoch: 5   Global Step: 88870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:14,720-Speed 9757.46 samples/sec   Loss 7.1736   LearningRate 0.0538   Epoch: 5   Global Step: 88880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:15,765-Speed 9801.82 samples/sec   Loss 7.2267   LearningRate 0.0538   Epoch: 5   Global Step: 88890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:16,853-Speed 9418.10 samples/sec   Loss 7.2077   LearningRate 0.0538   Epoch: 5   Global Step: 88900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:17,937-Speed 9451.45 samples/sec   Loss 7.2907   LearningRate 0.0538   Epoch: 5   Global Step: 88910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:19,018-Speed 9480.98 samples/sec   Loss 7.2861   LearningRate 0.0538   Epoch: 5   Global Step: 88920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:20,071-Speed 9723.48 samples/sec   Loss 7.3645   LearningRate 0.0538   Epoch: 5   Global Step: 88930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:21,186-Speed 9194.78 samples/sec   Loss 7.2476   LearningRate 0.0538   Epoch: 5   Global Step: 88940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:22,297-Speed 9219.68 samples/sec   Loss 7.4111   LearningRate 0.0538   Epoch: 5   Global Step: 88950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:23,375-Speed 9501.89 samples/sec   Loss 7.3138   LearningRate 0.0538   Epoch: 5   Global Step: 88960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:24,528-Speed 8891.87 samples/sec   Loss 7.2730   LearningRate 0.0538   Epoch: 5   Global Step: 88970   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:25,593-Speed 9616.51 samples/sec   Loss 7.2165   LearningRate 0.0538   Epoch: 5   Global Step: 88980   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:26,702-Speed 9244.46 samples/sec   Loss 7.2984   LearningRate 0.0538   Epoch: 5   Global Step: 88990   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:27,769-Speed 9604.21 samples/sec   Loss 7.2252   LearningRate 0.0538   Epoch: 5   Global Step: 89000   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:28,853-Speed 9445.17 samples/sec   Loss 7.3222   LearningRate 0.0538   Epoch: 5   Global Step: 89010   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:29,878-Speed 9995.61 samples/sec   Loss 7.2213   LearningRate 0.0538   Epoch: 5   Global Step: 89020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:30,961-Speed 9468.69 samples/sec   Loss 7.3860   LearningRate 0.0538   Epoch: 5   Global Step: 89030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:32,018-Speed 9690.86 samples/sec   Loss 7.0981   LearningRate 0.0538   Epoch: 5   Global Step: 89040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:33,071-Speed 9732.54 samples/sec   Loss 7.3127   LearningRate 0.0538   Epoch: 5   Global Step: 89050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:34,160-Speed 9413.80 samples/sec   Loss 7.3130   LearningRate 0.0538   Epoch: 5   Global Step: 89060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:35,243-Speed 9453.04 samples/sec   Loss 7.2717   LearningRate 0.0538   Epoch: 5   Global Step: 89070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:36,337-Speed 9366.05 samples/sec   Loss 7.3004   LearningRate 0.0538   Epoch: 5   Global Step: 89080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:37,402-Speed 9622.14 samples/sec   Loss 7.2294   LearningRate 0.0537   Epoch: 5   Global Step: 89090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:38,462-Speed 9667.26 samples/sec   Loss 7.2639   LearningRate 0.0537   Epoch: 5   Global Step: 89100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:39,511-Speed 9771.45 samples/sec   Loss 7.3230   LearningRate 0.0537   Epoch: 5   Global Step: 89110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:40,587-Speed 9518.65 samples/sec   Loss 7.2323   LearningRate 0.0537   Epoch: 5   Global Step: 89120   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:41,635-Speed 9770.90 samples/sec   Loss 7.3194   LearningRate 0.0537   Epoch: 5   Global Step: 89130   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:07:42,728-Speed 9383.63 samples/sec   Loss 7.2325   LearningRate 0.0537   Epoch: 5   Global Step: 89140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:43,794-Speed 9613.45 samples/sec   Loss 7.2575   LearningRate 0.0537   Epoch: 5   Global Step: 89150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:44,856-Speed 9646.02 samples/sec   Loss 7.2808   LearningRate 0.0537   Epoch: 5   Global Step: 89160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:45,959-Speed 9289.88 samples/sec   Loss 7.2186   LearningRate 0.0537   Epoch: 5   Global Step: 89170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:47,026-Speed 9606.69 samples/sec   Loss 7.1181   LearningRate 0.0537   Epoch: 5   Global Step: 89180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:48,066-Speed 9854.31 samples/sec   Loss 7.2453   LearningRate 0.0537   Epoch: 5   Global Step: 89190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:49,170-Speed 9282.18 samples/sec   Loss 7.3231   LearningRate 0.0537   Epoch: 5   Global Step: 89200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:50,224-Speed 9713.51 samples/sec   Loss 7.2986   LearningRate 0.0537   Epoch: 5   Global Step: 89210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:51,350-Speed 9109.91 samples/sec   Loss 7.1935   LearningRate 0.0537   Epoch: 5   Global Step: 89220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:52,390-Speed 9846.34 samples/sec   Loss 7.2490   LearningRate 0.0537   Epoch: 5   Global Step: 89230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:53,450-Speed 9661.97 samples/sec   Loss 7.2426   LearningRate 0.0537   Epoch: 5   Global Step: 89240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:54,549-Speed 9324.27 samples/sec   Loss 7.3008   LearningRate 0.0537   Epoch: 5   Global Step: 89250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:55,629-Speed 9487.65 samples/sec   Loss 7.2556   LearningRate 0.0537   Epoch: 5   Global Step: 89260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:56,685-Speed 9701.40 samples/sec   Loss 7.1662   LearningRate 0.0537   Epoch: 5   Global Step: 89270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:57,803-Speed 9165.41 samples/sec   Loss 7.2939   LearningRate 0.0537   Epoch: 5   Global Step: 89280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:58,853-Speed 9758.31 samples/sec   Loss 7.2763   LearningRate 0.0537   Epoch: 5   Global Step: 89290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:07:59,935-Speed 9478.03 samples/sec   Loss 7.3149   LearningRate 0.0537   Epoch: 5   Global Step: 89300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:01,021-Speed 9434.07 samples/sec   Loss 7.4099   LearningRate 0.0536   Epoch: 5   Global Step: 89310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:02,101-Speed 9485.92 samples/sec   Loss 7.4413   LearningRate 0.0536   Epoch: 5   Global Step: 89320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:03,200-Speed 9322.91 samples/sec   Loss 7.2388   LearningRate 0.0536   Epoch: 5   Global Step: 89330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:04,264-Speed 9631.38 samples/sec   Loss 7.1874   LearningRate 0.0536   Epoch: 5   Global Step: 89340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:05,329-Speed 9619.27 samples/sec   Loss 7.2322   LearningRate 0.0536   Epoch: 5   Global Step: 89350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:06,374-Speed 9811.87 samples/sec   Loss 7.2797   LearningRate 0.0536   Epoch: 5   Global Step: 89360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:07,472-Speed 9324.83 samples/sec   Loss 7.2639   LearningRate 0.0536   Epoch: 5   Global Step: 89370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:08,604-Speed 9056.85 samples/sec   Loss 7.3196   LearningRate 0.0536   Epoch: 5   Global Step: 89380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:09,673-Speed 9582.38 samples/sec   Loss 7.2919   LearningRate 0.0536   Epoch: 5   Global Step: 89390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:10,757-Speed 9452.77 samples/sec   Loss 7.3398   LearningRate 0.0536   Epoch: 5   Global Step: 89400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:11,849-Speed 9383.10 samples/sec   Loss 7.2938   LearningRate 0.0536   Epoch: 5   Global Step: 89410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:12,893-Speed 9807.40 samples/sec   Loss 7.3769   LearningRate 0.0536   Epoch: 5   Global Step: 89420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:13,978-Speed 9445.97 samples/sec   Loss 7.4404   LearningRate 0.0536   Epoch: 5   Global Step: 89430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:15,027-Speed 9767.00 samples/sec   Loss 7.2070   LearningRate 0.0536   Epoch: 5   Global Step: 89440   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:08:16,084-Speed 9694.14 samples/sec   Loss 7.2638   LearningRate 0.0536   Epoch: 5   Global Step: 89450   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:08:17,124-Speed 9856.18 samples/sec   Loss 7.2539   LearningRate 0.0536   Epoch: 5   Global Step: 89460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:18,205-Speed 9479.68 samples/sec   Loss 7.2972   LearningRate 0.0536   Epoch: 5   Global Step: 89470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:19,270-Speed 9624.89 samples/sec   Loss 7.2805   LearningRate 0.0536   Epoch: 5   Global Step: 89480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:20,361-Speed 9393.95 samples/sec   Loss 7.2298   LearningRate 0.0536   Epoch: 5   Global Step: 89490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:21,469-Speed 9251.99 samples/sec   Loss 7.3183   LearningRate 0.0536   Epoch: 5   Global Step: 89500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:22,562-Speed 9371.94 samples/sec   Loss 7.3187   LearningRate 0.0536   Epoch: 5   Global Step: 89510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:23,648-Speed 9434.82 samples/sec   Loss 7.2868   LearningRate 0.0536   Epoch: 5   Global Step: 89520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:24,724-Speed 9522.26 samples/sec   Loss 7.2625   LearningRate 0.0536   Epoch: 5   Global Step: 89530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:25,821-Speed 9335.44 samples/sec   Loss 7.2291   LearningRate 0.0535   Epoch: 5   Global Step: 89540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:26,880-Speed 9670.85 samples/sec   Loss 7.2066   LearningRate 0.0535   Epoch: 5   Global Step: 89550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:27,950-Speed 9576.02 samples/sec   Loss 7.3333   LearningRate 0.0535   Epoch: 5   Global Step: 89560   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:08:29,021-Speed 9571.31 samples/sec   Loss 7.3928   LearningRate 0.0535   Epoch: 5   Global Step: 89570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:08:30,103-Speed 9468.06 samples/sec   Loss 7.3194   LearningRate 0.0535   Epoch: 5   Global Step: 89580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:08:31,196-Speed 9373.85 samples/sec   Loss 7.2056   LearningRate 0.0535   Epoch: 5   Global Step: 89590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:08:32,357-Speed 8824.97 samples/sec   Loss 7.2641   LearningRate 0.0535   Epoch: 5   Global Step: 89600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:08:33,441-Speed 9455.39 samples/sec   Loss 7.3140   LearningRate 0.0535   Epoch: 5   Global Step: 89610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:08:34,541-Speed 9318.63 samples/sec   Loss 7.2795   LearningRate 0.0535   Epoch: 5   Global Step: 89620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:08:35,646-Speed 9264.78 samples/sec   Loss 7.2566   LearningRate 0.0535   Epoch: 5   Global Step: 89630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:08:36,749-Speed 9295.04 samples/sec   Loss 7.3047   LearningRate 0.0535   Epoch: 5   Global Step: 89640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:08:37,839-Speed 9401.82 samples/sec   Loss 7.2448   LearningRate 0.0535   Epoch: 5   Global Step: 89650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:08:38,910-Speed 9567.06 samples/sec   Loss 7.1749   LearningRate 0.0535   Epoch: 5   Global Step: 89660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:08:39,973-Speed 9633.54 samples/sec   Loss 7.2190   LearningRate 0.0535   Epoch: 5   Global Step: 89670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:41,026-Speed 9730.95 samples/sec   Loss 7.2320   LearningRate 0.0535   Epoch: 5   Global Step: 89680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:42,082-Speed 9704.14 samples/sec   Loss 7.2680   LearningRate 0.0535   Epoch: 5   Global Step: 89690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:43,103-Speed 10028.83 samples/sec   Loss 7.1828   LearningRate 0.0535   Epoch: 5   Global Step: 89700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:44,207-Speed 9287.64 samples/sec   Loss 7.2526   LearningRate 0.0535   Epoch: 5   Global Step: 89710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:45,318-Speed 9220.45 samples/sec   Loss 7.2910   LearningRate 0.0535   Epoch: 5   Global Step: 89720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:46,413-Speed 9355.87 samples/sec   Loss 7.2496   LearningRate 0.0535   Epoch: 5   Global Step: 89730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:47,441-Speed 9964.27 samples/sec   Loss 7.2257   LearningRate 0.0535   Epoch: 5   Global Step: 89740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:48,521-Speed 9488.11 samples/sec   Loss 7.3270   LearningRate 0.0535   Epoch: 5   Global Step: 89750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:49,604-Speed 9459.59 samples/sec   Loss 7.1199   LearningRate 0.0535   Epoch: 5   Global Step: 89760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:50,666-Speed 9653.68 samples/sec   Loss 7.1530   LearningRate 0.0534   Epoch: 5   Global Step: 89770   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:08:51,760-Speed 9368.35 samples/sec   Loss 7.2049   LearningRate 0.0534   Epoch: 5   Global Step: 89780   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:08:52,801-Speed 9837.41 samples/sec   Loss 7.3182   LearningRate 0.0534   Epoch: 5   Global Step: 89790   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:08:53,878-Speed 9513.74 samples/sec   Loss 7.2601   LearningRate 0.0534   Epoch: 5   Global Step: 89800   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:08:54,975-Speed 9338.35 samples/sec   Loss 7.2675   LearningRate 0.0534   Epoch: 5   Global Step: 89810   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:08:56,042-Speed 9604.08 samples/sec   Loss 7.3601   LearningRate 0.0534   Epoch: 5   Global Step: 89820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:57,092-Speed 9766.11 samples/sec   Loss 7.2794   LearningRate 0.0534   Epoch: 5   Global Step: 89830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:58,188-Speed 9342.97 samples/sec   Loss 7.2747   LearningRate 0.0534   Epoch: 5   Global Step: 89840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:08:59,240-Speed 9735.86 samples/sec   Loss 7.2578   LearningRate 0.0534   Epoch: 5   Global Step: 89850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:09:00,310-Speed 9579.17 samples/sec   Loss 7.3317   LearningRate 0.0534   Epoch: 5   Global Step: 89860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:09:01,377-Speed 9604.94 samples/sec   Loss 7.3207   LearningRate 0.0534   Epoch: 5   Global Step: 89870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:09:02,450-Speed 9546.41 samples/sec   Loss 7.3232   LearningRate 0.0534   Epoch: 5   Global Step: 89880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:09:03,510-Speed 9673.20 samples/sec   Loss 7.3537   LearningRate 0.0534   Epoch: 5   Global Step: 89890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:09:04,564-Speed 9714.65 samples/sec   Loss 7.2833   LearningRate 0.0534   Epoch: 5   Global Step: 89900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:09:05,623-Speed 9682.35 samples/sec   Loss 7.2152   LearningRate 0.0534   Epoch: 5   Global Step: 89910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:09:06,710-Speed 9419.11 samples/sec   Loss 7.2596   LearningRate 0.0534   Epoch: 5   Global Step: 89920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:09:07,777-Speed 9603.42 samples/sec   Loss 7.3491   LearningRate 0.0534   Epoch: 5   Global Step: 89930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:09:08,871-Speed 9367.45 samples/sec   Loss 7.3233   LearningRate 0.0534   Epoch: 5   Global Step: 89940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:09:09,957-Speed 9432.50 samples/sec   Loss 7.2545   LearningRate 0.0534   Epoch: 5   Global Step: 89950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:09:11,022-Speed 9621.47 samples/sec   Loss 7.2058   LearningRate 0.0534   Epoch: 5   Global Step: 89960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:09:12,132-Speed 9228.38 samples/sec   Loss 7.1736   LearningRate 0.0534   Epoch: 5   Global Step: 89970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:09:13,222-Speed 9401.11 samples/sec   Loss 7.1605   LearningRate 0.0534   Epoch: 5   Global Step: 89980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:09:14,315-Speed 9380.71 samples/sec   Loss 7.1934   LearningRate 0.0534   Epoch: 5   Global Step: 89990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:09:15,341-Speed 9988.35 samples/sec   Loss 7.0752   LearningRate 0.0533   Epoch: 5   Global Step: 90000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:09:37,542-[lfw][90000]XNorm: 11.824306
Training: 2022-04-11 15:09:37,543-[lfw][90000]Accuracy-Flip: 0.99683+-0.00293
Training: 2022-04-11 15:09:37,543-[lfw][90000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:10:02,905-[cfp_fp][90000]XNorm: 9.925898
Training: 2022-04-11 15:10:02,906-[cfp_fp][90000]Accuracy-Flip: 0.95614+-0.01136
Training: 2022-04-11 15:10:02,906-[cfp_fp][90000]Accuracy-Highest: 0.95729
Training: 2022-04-11 15:10:24,923-[agedb_30][90000]XNorm: 11.266069
Training: 2022-04-11 15:10:24,924-[agedb_30][90000]Accuracy-Flip: 0.96033+-0.01137
Training: 2022-04-11 15:10:24,924-[agedb_30][90000]Accuracy-Highest: 0.96317
Training: 2022-04-11 15:10:25,965-Speed 144.99 samples/sec   Loss 7.2633   LearningRate 0.0533   Epoch: 5   Global Step: 90010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:27,027-Speed 9648.04 samples/sec   Loss 7.2771   LearningRate 0.0533   Epoch: 5   Global Step: 90020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:28,103-Speed 9514.15 samples/sec   Loss 7.3387   LearningRate 0.0533   Epoch: 5   Global Step: 90030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:29,217-Speed 9198.54 samples/sec   Loss 7.2848   LearningRate 0.0533   Epoch: 5   Global Step: 90040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:30,288-Speed 9568.60 samples/sec   Loss 7.2997   LearningRate 0.0533   Epoch: 5   Global Step: 90050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:31,326-Speed 9873.35 samples/sec   Loss 7.3003   LearningRate 0.0533   Epoch: 5   Global Step: 90060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:32,412-Speed 9440.58 samples/sec   Loss 7.4046   LearningRate 0.0533   Epoch: 5   Global Step: 90070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:33,506-Speed 9366.49 samples/sec   Loss 7.1505   LearningRate 0.0533   Epoch: 5   Global Step: 90080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:34,584-Speed 9502.27 samples/sec   Loss 7.2262   LearningRate 0.0533   Epoch: 5   Global Step: 90090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:35,668-Speed 9446.56 samples/sec   Loss 7.3198   LearningRate 0.0533   Epoch: 5   Global Step: 90100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:36,770-Speed 9298.58 samples/sec   Loss 7.1683   LearningRate 0.0533   Epoch: 5   Global Step: 90110   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:10:37,910-Speed 8991.45 samples/sec   Loss 7.1661   LearningRate 0.0533   Epoch: 5   Global Step: 90120   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:10:38,965-Speed 9710.41 samples/sec   Loss 7.1398   LearningRate 0.0533   Epoch: 5   Global Step: 90130   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:10:40,022-Speed 9689.25 samples/sec   Loss 7.1928   LearningRate 0.0533   Epoch: 5   Global Step: 90140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:41,068-Speed 9797.89 samples/sec   Loss 7.2951   LearningRate 0.0533   Epoch: 5   Global Step: 90150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:42,124-Speed 9704.84 samples/sec   Loss 7.2890   LearningRate 0.0533   Epoch: 5   Global Step: 90160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:43,146-Speed 10023.72 samples/sec   Loss 7.2098   LearningRate 0.0533   Epoch: 5   Global Step: 90170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:44,206-Speed 9670.75 samples/sec   Loss 7.1408   LearningRate 0.0533   Epoch: 5   Global Step: 90180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:45,262-Speed 9695.23 samples/sec   Loss 7.2612   LearningRate 0.0533   Epoch: 5   Global Step: 90190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:46,328-Speed 9616.38 samples/sec   Loss 7.2398   LearningRate 0.0533   Epoch: 5   Global Step: 90200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:47,405-Speed 9518.12 samples/sec   Loss 7.2555   LearningRate 0.0533   Epoch: 5   Global Step: 90210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:48,505-Speed 9313.14 samples/sec   Loss 7.2702   LearningRate 0.0533   Epoch: 5   Global Step: 90220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:49,619-Speed 9197.80 samples/sec   Loss 7.3979   LearningRate 0.0532   Epoch: 5   Global Step: 90230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:50,701-Speed 9468.65 samples/sec   Loss 7.3166   LearningRate 0.0532   Epoch: 5   Global Step: 90240   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:10:51,771-Speed 9579.76 samples/sec   Loss 7.2650   LearningRate 0.0532   Epoch: 5   Global Step: 90250   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:10:52,844-Speed 9552.13 samples/sec   Loss 7.2679   LearningRate 0.0532   Epoch: 5   Global Step: 90260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:53,938-Speed 9361.52 samples/sec   Loss 7.3577   LearningRate 0.0532   Epoch: 5   Global Step: 90270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:54,983-Speed 9801.47 samples/sec   Loss 7.4056   LearningRate 0.0532   Epoch: 5   Global Step: 90280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:56,040-Speed 9693.52 samples/sec   Loss 7.3131   LearningRate 0.0532   Epoch: 5   Global Step: 90290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:57,082-Speed 9829.20 samples/sec   Loss 7.2398   LearningRate 0.0532   Epoch: 5   Global Step: 90300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:58,162-Speed 9491.69 samples/sec   Loss 7.3544   LearningRate 0.0532   Epoch: 5   Global Step: 90310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:10:59,228-Speed 9611.89 samples/sec   Loss 7.3471   LearningRate 0.0532   Epoch: 5   Global Step: 90320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:00,319-Speed 9391.76 samples/sec   Loss 7.2652   LearningRate 0.0532   Epoch: 5   Global Step: 90330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:01,453-Speed 9036.23 samples/sec   Loss 7.2111   LearningRate 0.0532   Epoch: 5   Global Step: 90340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:02,528-Speed 9530.50 samples/sec   Loss 7.1966   LearningRate 0.0532   Epoch: 5   Global Step: 90350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:03,599-Speed 9574.97 samples/sec   Loss 7.3015   LearningRate 0.0532   Epoch: 5   Global Step: 90360   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:04,681-Speed 9468.53 samples/sec   Loss 7.2499   LearningRate 0.0532   Epoch: 5   Global Step: 90370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:05,783-Speed 9297.51 samples/sec   Loss 7.3534   LearningRate 0.0532   Epoch: 5   Global Step: 90380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:06,846-Speed 9648.55 samples/sec   Loss 7.3242   LearningRate 0.0532   Epoch: 5   Global Step: 90390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:07,899-Speed 9724.89 samples/sec   Loss 7.2690   LearningRate 0.0532   Epoch: 5   Global Step: 90400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:08,959-Speed 9665.27 samples/sec   Loss 7.2614   LearningRate 0.0532   Epoch: 5   Global Step: 90410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:09,995-Speed 9888.91 samples/sec   Loss 7.2707   LearningRate 0.0532   Epoch: 5   Global Step: 90420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:11,058-Speed 9638.94 samples/sec   Loss 7.2717   LearningRate 0.0532   Epoch: 5   Global Step: 90430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:12,129-Speed 9569.18 samples/sec   Loss 7.2718   LearningRate 0.0532   Epoch: 5   Global Step: 90440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:13,208-Speed 9499.68 samples/sec   Loss 7.3581   LearningRate 0.0532   Epoch: 5   Global Step: 90450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:14,296-Speed 9415.86 samples/sec   Loss 7.2643   LearningRate 0.0531   Epoch: 5   Global Step: 90460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:15,386-Speed 9398.68 samples/sec   Loss 7.2733   LearningRate 0.0531   Epoch: 5   Global Step: 90470   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:16,443-Speed 9693.50 samples/sec   Loss 7.3848   LearningRate 0.0531   Epoch: 5   Global Step: 90480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:17,555-Speed 9213.13 samples/sec   Loss 7.2662   LearningRate 0.0531   Epoch: 5   Global Step: 90490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:18,661-Speed 9268.84 samples/sec   Loss 7.2872   LearningRate 0.0531   Epoch: 5   Global Step: 90500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:19,778-Speed 9173.77 samples/sec   Loss 7.2741   LearningRate 0.0531   Epoch: 5   Global Step: 90510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:20,896-Speed 9165.91 samples/sec   Loss 7.3066   LearningRate 0.0531   Epoch: 5   Global Step: 90520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:21,989-Speed 9373.77 samples/sec   Loss 7.2907   LearningRate 0.0531   Epoch: 5   Global Step: 90530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:23,092-Speed 9285.95 samples/sec   Loss 7.3326   LearningRate 0.0531   Epoch: 5   Global Step: 90540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:24,168-Speed 9523.38 samples/sec   Loss 7.3802   LearningRate 0.0531   Epoch: 5   Global Step: 90550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:25,297-Speed 9073.78 samples/sec   Loss 7.3531   LearningRate 0.0531   Epoch: 5   Global Step: 90560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:26,374-Speed 9522.20 samples/sec   Loss 7.2246   LearningRate 0.0531   Epoch: 5   Global Step: 90570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:27,421-Speed 9778.27 samples/sec   Loss 7.1992   LearningRate 0.0531   Epoch: 5   Global Step: 90580   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:28,474-Speed 9734.40 samples/sec   Loss 7.1514   LearningRate 0.0531   Epoch: 5   Global Step: 90590   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:29,567-Speed 9370.45 samples/sec   Loss 7.2038   LearningRate 0.0531   Epoch: 5   Global Step: 90600   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:30,670-Speed 9288.24 samples/sec   Loss 7.2748   LearningRate 0.0531   Epoch: 5   Global Step: 90610   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:31,781-Speed 9223.87 samples/sec   Loss 7.3119   LearningRate 0.0531   Epoch: 5   Global Step: 90620   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:32,878-Speed 9343.23 samples/sec   Loss 7.2323   LearningRate 0.0531   Epoch: 5   Global Step: 90630   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:33,946-Speed 9586.22 samples/sec   Loss 7.2638   LearningRate 0.0531   Epoch: 5   Global Step: 90640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:35,031-Speed 9448.86 samples/sec   Loss 7.3030   LearningRate 0.0531   Epoch: 5   Global Step: 90650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:36,117-Speed 9430.77 samples/sec   Loss 7.2508   LearningRate 0.0531   Epoch: 5   Global Step: 90660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:37,183-Speed 9617.21 samples/sec   Loss 7.3668   LearningRate 0.0531   Epoch: 5   Global Step: 90670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:38,274-Speed 9388.39 samples/sec   Loss 7.3343   LearningRate 0.0531   Epoch: 5   Global Step: 90680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:39,356-Speed 9469.42 samples/sec   Loss 7.2343   LearningRate 0.0530   Epoch: 5   Global Step: 90690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:40,434-Speed 9510.77 samples/sec   Loss 7.3088   LearningRate 0.0530   Epoch: 5   Global Step: 90700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:41,521-Speed 9424.50 samples/sec   Loss 7.2661   LearningRate 0.0530   Epoch: 5   Global Step: 90710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:42,620-Speed 9320.04 samples/sec   Loss 7.2227   LearningRate 0.0530   Epoch: 5   Global Step: 90720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:43,695-Speed 9532.94 samples/sec   Loss 7.3142   LearningRate 0.0530   Epoch: 5   Global Step: 90730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:44,777-Speed 9476.49 samples/sec   Loss 7.2932   LearningRate 0.0530   Epoch: 5   Global Step: 90740   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:45,844-Speed 9598.30 samples/sec   Loss 7.2476   LearningRate 0.0530   Epoch: 5   Global Step: 90750   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:46,969-Speed 9110.54 samples/sec   Loss 7.2018   LearningRate 0.0530   Epoch: 5   Global Step: 90760   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:11:48,074-Speed 9266.35 samples/sec   Loss 7.2021   LearningRate 0.0530   Epoch: 5   Global Step: 90770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:49,179-Speed 9277.58 samples/sec   Loss 7.2992   LearningRate 0.0530   Epoch: 5   Global Step: 90780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:50,238-Speed 9672.27 samples/sec   Loss 7.2137   LearningRate 0.0530   Epoch: 5   Global Step: 90790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:51,273-Speed 9900.21 samples/sec   Loss 7.3857   LearningRate 0.0530   Epoch: 5   Global Step: 90800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:52,398-Speed 9112.98 samples/sec   Loss 7.1940   LearningRate 0.0530   Epoch: 5   Global Step: 90810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:53,460-Speed 9647.70 samples/sec   Loss 7.2841   LearningRate 0.0530   Epoch: 5   Global Step: 90820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:54,561-Speed 9301.49 samples/sec   Loss 7.1402   LearningRate 0.0530   Epoch: 5   Global Step: 90830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:55,685-Speed 9112.68 samples/sec   Loss 7.2377   LearningRate 0.0530   Epoch: 5   Global Step: 90840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:56,787-Speed 9302.74 samples/sec   Loss 7.2046   LearningRate 0.0530   Epoch: 5   Global Step: 90850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:57,874-Speed 9431.36 samples/sec   Loss 7.2756   LearningRate 0.0530   Epoch: 5   Global Step: 90860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:11:58,974-Speed 9316.88 samples/sec   Loss 7.2893   LearningRate 0.0530   Epoch: 5   Global Step: 90870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:00,095-Speed 9135.42 samples/sec   Loss 7.2982   LearningRate 0.0530   Epoch: 5   Global Step: 90880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:01,135-Speed 9858.74 samples/sec   Loss 7.3057   LearningRate 0.0530   Epoch: 5   Global Step: 90890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:02,205-Speed 9575.29 samples/sec   Loss 7.1487   LearningRate 0.0530   Epoch: 5   Global Step: 90900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:03,336-Speed 9059.59 samples/sec   Loss 7.1653   LearningRate 0.0530   Epoch: 5   Global Step: 90910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:04,449-Speed 9207.01 samples/sec   Loss 7.2241   LearningRate 0.0529   Epoch: 5   Global Step: 90920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:05,520-Speed 9566.13 samples/sec   Loss 7.1790   LearningRate 0.0529   Epoch: 5   Global Step: 90930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:06,542-Speed 10025.40 samples/sec   Loss 7.3191   LearningRate 0.0529   Epoch: 5   Global Step: 90940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:07,632-Speed 9400.20 samples/sec   Loss 7.2962   LearningRate 0.0529   Epoch: 5   Global Step: 90950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:08,685-Speed 9732.68 samples/sec   Loss 7.2646   LearningRate 0.0529   Epoch: 5   Global Step: 90960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:09,742-Speed 9695.24 samples/sec   Loss 7.3647   LearningRate 0.0529   Epoch: 5   Global Step: 90970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:10,796-Speed 9721.44 samples/sec   Loss 7.1794   LearningRate 0.0529   Epoch: 5   Global Step: 90980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:11,893-Speed 9336.86 samples/sec   Loss 7.3358   LearningRate 0.0529   Epoch: 5   Global Step: 90990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:12,972-Speed 9495.95 samples/sec   Loss 7.2790   LearningRate 0.0529   Epoch: 5   Global Step: 91000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:14,016-Speed 9815.54 samples/sec   Loss 7.2462   LearningRate 0.0529   Epoch: 5   Global Step: 91010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:15,058-Speed 9829.92 samples/sec   Loss 7.2503   LearningRate 0.0529   Epoch: 5   Global Step: 91020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:16,134-Speed 9523.46 samples/sec   Loss 7.2165   LearningRate 0.0529   Epoch: 5   Global Step: 91030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:17,193-Speed 9673.39 samples/sec   Loss 7.2316   LearningRate 0.0529   Epoch: 5   Global Step: 91040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:18,235-Speed 9837.17 samples/sec   Loss 7.2712   LearningRate 0.0529   Epoch: 5   Global Step: 91050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:19,317-Speed 9470.08 samples/sec   Loss 7.2707   LearningRate 0.0529   Epoch: 5   Global Step: 91060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:20,427-Speed 9228.92 samples/sec   Loss 7.2563   LearningRate 0.0529   Epoch: 5   Global Step: 91070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:21,479-Speed 9743.05 samples/sec   Loss 7.3602   LearningRate 0.0529   Epoch: 5   Global Step: 91080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:22,546-Speed 9605.09 samples/sec   Loss 7.1826   LearningRate 0.0529   Epoch: 5   Global Step: 91090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:23,591-Speed 9803.54 samples/sec   Loss 7.2620   LearningRate 0.0529   Epoch: 5   Global Step: 91100   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:12:24,675-Speed 9450.49 samples/sec   Loss 7.3664   LearningRate 0.0529   Epoch: 5   Global Step: 91110   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:12:25,720-Speed 9808.62 samples/sec   Loss 7.2046   LearningRate 0.0529   Epoch: 5   Global Step: 91120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:26,780-Speed 9666.65 samples/sec   Loss 7.2754   LearningRate 0.0529   Epoch: 5   Global Step: 91130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:27,861-Speed 9480.67 samples/sec   Loss 7.2674   LearningRate 0.0528   Epoch: 5   Global Step: 91140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:28,964-Speed 9288.15 samples/sec   Loss 7.1683   LearningRate 0.0528   Epoch: 5   Global Step: 91150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:30,079-Speed 9192.32 samples/sec   Loss 7.2620   LearningRate 0.0528   Epoch: 5   Global Step: 91160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:31,115-Speed 9881.02 samples/sec   Loss 7.2032   LearningRate 0.0528   Epoch: 5   Global Step: 91170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:32,179-Speed 9633.44 samples/sec   Loss 7.3672   LearningRate 0.0528   Epoch: 5   Global Step: 91180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:33,281-Speed 9294.76 samples/sec   Loss 7.1983   LearningRate 0.0528   Epoch: 5   Global Step: 91190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:34,371-Speed 9404.67 samples/sec   Loss 7.2775   LearningRate 0.0528   Epoch: 5   Global Step: 91200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:35,439-Speed 9587.91 samples/sec   Loss 7.2119   LearningRate 0.0528   Epoch: 5   Global Step: 91210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:36,477-Speed 9867.84 samples/sec   Loss 7.2229   LearningRate 0.0528   Epoch: 5   Global Step: 91220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:37,569-Speed 9390.02 samples/sec   Loss 7.3142   LearningRate 0.0528   Epoch: 5   Global Step: 91230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:38,676-Speed 9251.91 samples/sec   Loss 7.1981   LearningRate 0.0528   Epoch: 5   Global Step: 91240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:39,720-Speed 9821.93 samples/sec   Loss 7.3254   LearningRate 0.0528   Epoch: 5   Global Step: 91250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:40,765-Speed 9802.52 samples/sec   Loss 7.3001   LearningRate 0.0528   Epoch: 5   Global Step: 91260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:41,838-Speed 9550.53 samples/sec   Loss 7.2309   LearningRate 0.0528   Epoch: 5   Global Step: 91270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:42,898-Speed 9668.76 samples/sec   Loss 7.1929   LearningRate 0.0528   Epoch: 5   Global Step: 91280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:43,947-Speed 9766.11 samples/sec   Loss 7.2254   LearningRate 0.0528   Epoch: 5   Global Step: 91290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:45,046-Speed 9319.73 samples/sec   Loss 7.3564   LearningRate 0.0528   Epoch: 5   Global Step: 91300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:46,139-Speed 9373.03 samples/sec   Loss 7.3663   LearningRate 0.0528   Epoch: 5   Global Step: 91310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:47,222-Speed 9459.40 samples/sec   Loss 7.3218   LearningRate 0.0528   Epoch: 5   Global Step: 91320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:12:48,323-Speed 9310.40 samples/sec   Loss 7.3162   LearningRate 0.0528   Epoch: 5   Global Step: 91330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:49,394-Speed 9571.24 samples/sec   Loss 7.1747   LearningRate 0.0528   Epoch: 5   Global Step: 91340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:50,460-Speed 9607.94 samples/sec   Loss 7.3337   LearningRate 0.0528   Epoch: 5   Global Step: 91350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:51,542-Speed 9472.13 samples/sec   Loss 7.2526   LearningRate 0.0528   Epoch: 5   Global Step: 91360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:52,605-Speed 9639.69 samples/sec   Loss 7.2713   LearningRate 0.0527   Epoch: 5   Global Step: 91370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:53,662-Speed 9687.72 samples/sec   Loss 7.1254   LearningRate 0.0527   Epoch: 5   Global Step: 91380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:54,737-Speed 9536.01 samples/sec   Loss 7.2435   LearningRate 0.0527   Epoch: 5   Global Step: 91390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:55,815-Speed 9509.20 samples/sec   Loss 7.3421   LearningRate 0.0527   Epoch: 5   Global Step: 91400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:56,904-Speed 9402.30 samples/sec   Loss 7.2123   LearningRate 0.0527   Epoch: 5   Global Step: 91410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:57,976-Speed 9557.47 samples/sec   Loss 7.3446   LearningRate 0.0527   Epoch: 5   Global Step: 91420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:12:59,034-Speed 9689.65 samples/sec   Loss 7.3795   LearningRate 0.0527   Epoch: 5   Global Step: 91430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:00,077-Speed 9824.78 samples/sec   Loss 7.2388   LearningRate 0.0527   Epoch: 5   Global Step: 91440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:01,113-Speed 9889.26 samples/sec   Loss 7.3252   LearningRate 0.0527   Epoch: 5   Global Step: 91450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:02,211-Speed 9331.21 samples/sec   Loss 7.3081   LearningRate 0.0527   Epoch: 5   Global Step: 91460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:03,292-Speed 9481.31 samples/sec   Loss 7.3657   LearningRate 0.0527   Epoch: 5   Global Step: 91470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:04,357-Speed 9614.65 samples/sec   Loss 7.2449   LearningRate 0.0527   Epoch: 5   Global Step: 91480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:05,413-Speed 9708.28 samples/sec   Loss 7.3098   LearningRate 0.0527   Epoch: 5   Global Step: 91490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:06,483-Speed 9574.49 samples/sec   Loss 7.4067   LearningRate 0.0527   Epoch: 5   Global Step: 91500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:07,574-Speed 9390.68 samples/sec   Loss 7.3310   LearningRate 0.0527   Epoch: 5   Global Step: 91510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:08,643-Speed 9584.34 samples/sec   Loss 7.4304   LearningRate 0.0527   Epoch: 5   Global Step: 91520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:09,689-Speed 9798.83 samples/sec   Loss 7.4062   LearningRate 0.0527   Epoch: 5   Global Step: 91530   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:13:10,755-Speed 9610.89 samples/sec   Loss 7.2591   LearningRate 0.0527   Epoch: 5   Global Step: 91540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:11,805-Speed 9758.90 samples/sec   Loss 7.2415   LearningRate 0.0527   Epoch: 5   Global Step: 91550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:12,872-Speed 9594.75 samples/sec   Loss 7.2952   LearningRate 0.0527   Epoch: 5   Global Step: 91560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:13,955-Speed 9468.03 samples/sec   Loss 7.2804   LearningRate 0.0527   Epoch: 5   Global Step: 91570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:15,035-Speed 9483.54 samples/sec   Loss 7.2771   LearningRate 0.0527   Epoch: 5   Global Step: 91580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:16,086-Speed 9746.37 samples/sec   Loss 7.2460   LearningRate 0.0527   Epoch: 5   Global Step: 91590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:17,173-Speed 9433.14 samples/sec   Loss 7.2716   LearningRate 0.0526   Epoch: 5   Global Step: 91600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:18,279-Speed 9260.66 samples/sec   Loss 7.3761   LearningRate 0.0526   Epoch: 5   Global Step: 91610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:19,379-Speed 9315.73 samples/sec   Loss 7.3919   LearningRate 0.0526   Epoch: 5   Global Step: 91620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:20,444-Speed 9627.04 samples/sec   Loss 7.3235   LearningRate 0.0526   Epoch: 5   Global Step: 91630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:21,511-Speed 9600.11 samples/sec   Loss 7.2882   LearningRate 0.0526   Epoch: 5   Global Step: 91640   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:13:22,583-Speed 9553.89 samples/sec   Loss 7.2226   LearningRate 0.0526   Epoch: 5   Global Step: 91650   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:13:23,645-Speed 9647.12 samples/sec   Loss 7.3404   LearningRate 0.0526   Epoch: 5   Global Step: 91660   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:13:24,711-Speed 9614.42 samples/sec   Loss 7.3163   LearningRate 0.0526   Epoch: 5   Global Step: 91670   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:13:25,769-Speed 9677.82 samples/sec   Loss 7.2647   LearningRate 0.0526   Epoch: 5   Global Step: 91680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:26,815-Speed 9806.08 samples/sec   Loss 7.2928   LearningRate 0.0526   Epoch: 5   Global Step: 91690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:13:27,868-Speed 9730.12 samples/sec   Loss 7.2937   LearningRate 0.0526   Epoch: 5   Global Step: 91700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:13:28,953-Speed 9438.74 samples/sec   Loss 7.3811   LearningRate 0.0526   Epoch: 5   Global Step: 91710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:13:30,055-Speed 9298.82 samples/sec   Loss 7.2759   LearningRate 0.0526   Epoch: 5   Global Step: 91720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:13:31,148-Speed 9374.80 samples/sec   Loss 7.2121   LearningRate 0.0526   Epoch: 5   Global Step: 91730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:13:32,222-Speed 9539.14 samples/sec   Loss 7.2860   LearningRate 0.0526   Epoch: 5   Global Step: 91740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:13:33,301-Speed 9494.00 samples/sec   Loss 7.3403   LearningRate 0.0526   Epoch: 5   Global Step: 91750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:13:34,397-Speed 9350.86 samples/sec   Loss 7.2865   LearningRate 0.0526   Epoch: 5   Global Step: 91760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:13:35,496-Speed 9319.94 samples/sec   Loss 7.1998   LearningRate 0.0526   Epoch: 5   Global Step: 91770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:13:36,596-Speed 9316.88 samples/sec   Loss 7.2617   LearningRate 0.0526   Epoch: 5   Global Step: 91780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:13:37,670-Speed 9538.23 samples/sec   Loss 7.2545   LearningRate 0.0526   Epoch: 5   Global Step: 91790   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:38,738-Speed 9589.41 samples/sec   Loss 7.1446   LearningRate 0.0526   Epoch: 5   Global Step: 91800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:39,791-Speed 9741.87 samples/sec   Loss 7.2807   LearningRate 0.0526   Epoch: 5   Global Step: 91810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:40,870-Speed 9495.13 samples/sec   Loss 7.2407   LearningRate 0.0526   Epoch: 5   Global Step: 91820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:41,927-Speed 9690.02 samples/sec   Loss 7.2696   LearningRate 0.0525   Epoch: 5   Global Step: 91830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:42,988-Speed 9651.80 samples/sec   Loss 7.1557   LearningRate 0.0525   Epoch: 5   Global Step: 91840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:44,064-Speed 9528.22 samples/sec   Loss 7.2475   LearningRate 0.0525   Epoch: 5   Global Step: 91850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:45,163-Speed 9318.95 samples/sec   Loss 7.2133   LearningRate 0.0525   Epoch: 5   Global Step: 91860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:46,262-Speed 9326.99 samples/sec   Loss 7.2308   LearningRate 0.0525   Epoch: 5   Global Step: 91870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:47,360-Speed 9324.08 samples/sec   Loss 7.2054   LearningRate 0.0525   Epoch: 5   Global Step: 91880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:48,470-Speed 9237.11 samples/sec   Loss 7.2057   LearningRate 0.0525   Epoch: 5   Global Step: 91890   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:13:49,555-Speed 9449.02 samples/sec   Loss 7.2521   LearningRate 0.0525   Epoch: 5   Global Step: 91900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:50,628-Speed 9546.55 samples/sec   Loss 7.2564   LearningRate 0.0525   Epoch: 5   Global Step: 91910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:51,714-Speed 9434.53 samples/sec   Loss 7.2924   LearningRate 0.0525   Epoch: 5   Global Step: 91920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:52,754-Speed 9851.88 samples/sec   Loss 7.2245   LearningRate 0.0525   Epoch: 5   Global Step: 91930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:53,830-Speed 9517.82 samples/sec   Loss 7.3215   LearningRate 0.0525   Epoch: 5   Global Step: 91940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:54,934-Speed 9285.34 samples/sec   Loss 7.2836   LearningRate 0.0525   Epoch: 5   Global Step: 91950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:56,018-Speed 9449.87 samples/sec   Loss 7.2976   LearningRate 0.0525   Epoch: 5   Global Step: 91960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:57,055-Speed 9885.77 samples/sec   Loss 7.2922   LearningRate 0.0525   Epoch: 5   Global Step: 91970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:58,136-Speed 9475.51 samples/sec   Loss 7.2366   LearningRate 0.0525   Epoch: 5   Global Step: 91980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:13:59,206-Speed 9576.84 samples/sec   Loss 7.2706   LearningRate 0.0525   Epoch: 5   Global Step: 91990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:14:00,265-Speed 9675.20 samples/sec   Loss 7.2436   LearningRate 0.0525   Epoch: 5   Global Step: 92000   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:14:22,278-[lfw][92000]XNorm: 11.842051
Training: 2022-04-11 15:14:22,279-[lfw][92000]Accuracy-Flip: 0.99583+-0.00281
Training: 2022-04-11 15:14:22,279-[lfw][92000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:14:47,773-[cfp_fp][92000]XNorm: 9.961729
Training: 2022-04-11 15:14:47,774-[cfp_fp][92000]Accuracy-Flip: 0.95043+-0.01409
Training: 2022-04-11 15:14:47,774-[cfp_fp][92000]Accuracy-Highest: 0.95729
Training: 2022-04-11 15:15:09,776-[agedb_30][92000]XNorm: 11.278383
Training: 2022-04-11 15:15:09,776-[agedb_30][92000]Accuracy-Flip: 0.96033+-0.01113
Training: 2022-04-11 15:15:09,777-[agedb_30][92000]Accuracy-Highest: 0.96317
Training: 2022-04-11 15:15:10,862-Speed 145.05 samples/sec   Loss 7.1602   LearningRate 0.0525   Epoch: 5   Global Step: 92010   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:15:11,925-Speed 9633.72 samples/sec   Loss 7.3851   LearningRate 0.0525   Epoch: 5   Global Step: 92020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:13,038-Speed 9205.39 samples/sec   Loss 7.2909   LearningRate 0.0525   Epoch: 5   Global Step: 92030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:14,149-Speed 9230.31 samples/sec   Loss 7.2382   LearningRate 0.0525   Epoch: 5   Global Step: 92040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:15,209-Speed 9663.82 samples/sec   Loss 7.4205   LearningRate 0.0525   Epoch: 5   Global Step: 92050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:16,316-Speed 9256.11 samples/sec   Loss 7.2505   LearningRate 0.0524   Epoch: 5   Global Step: 92060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:17,407-Speed 9387.39 samples/sec   Loss 7.2345   LearningRate 0.0524   Epoch: 5   Global Step: 92070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:18,514-Speed 9253.20 samples/sec   Loss 7.4649   LearningRate 0.0524   Epoch: 5   Global Step: 92080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:19,607-Speed 9381.17 samples/sec   Loss 7.2982   LearningRate 0.0524   Epoch: 5   Global Step: 92090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:20,664-Speed 9695.30 samples/sec   Loss 7.3126   LearningRate 0.0524   Epoch: 5   Global Step: 92100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:21,771-Speed 9253.33 samples/sec   Loss 7.2583   LearningRate 0.0524   Epoch: 5   Global Step: 92110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:22,858-Speed 9423.12 samples/sec   Loss 7.1699   LearningRate 0.0524   Epoch: 5   Global Step: 92120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:23,954-Speed 9353.02 samples/sec   Loss 7.3005   LearningRate 0.0524   Epoch: 5   Global Step: 92130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:25,042-Speed 9411.35 samples/sec   Loss 7.3493   LearningRate 0.0524   Epoch: 5   Global Step: 92140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:26,137-Speed 9362.24 samples/sec   Loss 7.2281   LearningRate 0.0524   Epoch: 5   Global Step: 92150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:27,228-Speed 9394.71 samples/sec   Loss 7.3323   LearningRate 0.0524   Epoch: 5   Global Step: 92160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:28,291-Speed 9632.34 samples/sec   Loss 7.2898   LearningRate 0.0524   Epoch: 5   Global Step: 92170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:29,360-Speed 9587.75 samples/sec   Loss 7.2910   LearningRate 0.0524   Epoch: 5   Global Step: 92180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:30,407-Speed 9779.97 samples/sec   Loss 7.2557   LearningRate 0.0524   Epoch: 5   Global Step: 92190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:31,471-Speed 9629.71 samples/sec   Loss 7.3398   LearningRate 0.0524   Epoch: 5   Global Step: 92200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:32,545-Speed 9541.05 samples/sec   Loss 7.3233   LearningRate 0.0524   Epoch: 5   Global Step: 92210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:33,591-Speed 9797.43 samples/sec   Loss 7.2620   LearningRate 0.0524   Epoch: 5   Global Step: 92220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:34,699-Speed 9249.94 samples/sec   Loss 7.2472   LearningRate 0.0524   Epoch: 5   Global Step: 92230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:35,780-Speed 9475.58 samples/sec   Loss 7.3407   LearningRate 0.0524   Epoch: 5   Global Step: 92240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:36,879-Speed 9324.11 samples/sec   Loss 7.2577   LearningRate 0.0524   Epoch: 5   Global Step: 92250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:37,979-Speed 9315.70 samples/sec   Loss 7.3771   LearningRate 0.0524   Epoch: 5   Global Step: 92260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:39,881-Speed 5385.41 samples/sec   Loss 7.2870   LearningRate 0.0524   Epoch: 5   Global Step: 92270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:40,961-Speed 9487.47 samples/sec   Loss 7.2712   LearningRate 0.0524   Epoch: 5   Global Step: 92280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:42,051-Speed 9399.50 samples/sec   Loss 7.2085   LearningRate 0.0524   Epoch: 5   Global Step: 92290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:43,122-Speed 9566.96 samples/sec   Loss 7.1194   LearningRate 0.0523   Epoch: 5   Global Step: 92300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:44,243-Speed 9137.28 samples/sec   Loss 7.3888   LearningRate 0.0523   Epoch: 5   Global Step: 92310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:45,318-Speed 9531.70 samples/sec   Loss 7.2790   LearningRate 0.0523   Epoch: 5   Global Step: 92320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:46,406-Speed 9421.22 samples/sec   Loss 7.2282   LearningRate 0.0523   Epoch: 5   Global Step: 92330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:47,487-Speed 9477.14 samples/sec   Loss 7.3558   LearningRate 0.0523   Epoch: 5   Global Step: 92340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:48,577-Speed 9401.10 samples/sec   Loss 7.2638   LearningRate 0.0523   Epoch: 5   Global Step: 92350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:49,714-Speed 9014.37 samples/sec   Loss 7.2286   LearningRate 0.0523   Epoch: 5   Global Step: 92360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:50,754-Speed 9850.90 samples/sec   Loss 7.1530   LearningRate 0.0523   Epoch: 5   Global Step: 92370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:15:51,803-Speed 9768.43 samples/sec   Loss 7.2143   LearningRate 0.0523   Epoch: 5   Global Step: 92380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:52,857-Speed 9719.22 samples/sec   Loss 7.3315   LearningRate 0.0523   Epoch: 5   Global Step: 92390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:53,916-Speed 9678.44 samples/sec   Loss 7.3690   LearningRate 0.0523   Epoch: 5   Global Step: 92400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:54,949-Speed 9914.85 samples/sec   Loss 7.3667   LearningRate 0.0523   Epoch: 5   Global Step: 92410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:56,033-Speed 9453.91 samples/sec   Loss 7.2011   LearningRate 0.0523   Epoch: 5   Global Step: 92420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:57,100-Speed 9604.78 samples/sec   Loss 7.3266   LearningRate 0.0523   Epoch: 5   Global Step: 92430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:58,217-Speed 9168.44 samples/sec   Loss 7.2599   LearningRate 0.0523   Epoch: 5   Global Step: 92440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:15:59,329-Speed 9217.04 samples/sec   Loss 7.3281   LearningRate 0.0523   Epoch: 5   Global Step: 92450   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:16:00,403-Speed 9540.94 samples/sec   Loss 7.2782   LearningRate 0.0523   Epoch: 5   Global Step: 92460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:16:01,483-Speed 9480.85 samples/sec   Loss 7.2907   LearningRate 0.0523   Epoch: 5   Global Step: 92470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:16:02,521-Speed 9873.47 samples/sec   Loss 7.3642   LearningRate 0.0523   Epoch: 5   Global Step: 92480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:03,607-Speed 9518.16 samples/sec   Loss 7.2981   LearningRate 0.0523   Epoch: 5   Global Step: 92490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:04,680-Speed 9550.85 samples/sec   Loss 7.3158   LearningRate 0.0523   Epoch: 5   Global Step: 92500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:05,744-Speed 9633.96 samples/sec   Loss 7.2059   LearningRate 0.0523   Epoch: 5   Global Step: 92510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:06,792-Speed 9775.64 samples/sec   Loss 7.3300   LearningRate 0.0523   Epoch: 5   Global Step: 92520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:07,882-Speed 9394.89 samples/sec   Loss 7.1649   LearningRate 0.0522   Epoch: 5   Global Step: 92530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:08,962-Speed 9490.55 samples/sec   Loss 7.2219   LearningRate 0.0522   Epoch: 5   Global Step: 92540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:10,046-Speed 9457.99 samples/sec   Loss 7.2939   LearningRate 0.0522   Epoch: 5   Global Step: 92550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:11,102-Speed 9695.86 samples/sec   Loss 7.2551   LearningRate 0.0522   Epoch: 5   Global Step: 92560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:12,175-Speed 9554.65 samples/sec   Loss 7.1909   LearningRate 0.0522   Epoch: 5   Global Step: 92570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:13,254-Speed 9491.04 samples/sec   Loss 7.3286   LearningRate 0.0522   Epoch: 5   Global Step: 92580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:14,354-Speed 9312.26 samples/sec   Loss 7.2250   LearningRate 0.0522   Epoch: 5   Global Step: 92590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:15,428-Speed 9541.49 samples/sec   Loss 7.2417   LearningRate 0.0522   Epoch: 5   Global Step: 92600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:16,516-Speed 9418.37 samples/sec   Loss 7.2162   LearningRate 0.0522   Epoch: 5   Global Step: 92610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:17,609-Speed 9375.73 samples/sec   Loss 7.2449   LearningRate 0.0522   Epoch: 5   Global Step: 92620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:18,670-Speed 9656.87 samples/sec   Loss 7.2585   LearningRate 0.0522   Epoch: 5   Global Step: 92630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:19,716-Speed 9798.53 samples/sec   Loss 7.1992   LearningRate 0.0522   Epoch: 5   Global Step: 92640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:20,797-Speed 9475.30 samples/sec   Loss 7.2771   LearningRate 0.0522   Epoch: 5   Global Step: 92650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:21,887-Speed 9398.94 samples/sec   Loss 7.2119   LearningRate 0.0522   Epoch: 5   Global Step: 92660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:22,983-Speed 9352.29 samples/sec   Loss 7.2077   LearningRate 0.0522   Epoch: 5   Global Step: 92670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:24,042-Speed 9678.53 samples/sec   Loss 7.2803   LearningRate 0.0522   Epoch: 5   Global Step: 92680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:25,109-Speed 9610.51 samples/sec   Loss 7.3733   LearningRate 0.0522   Epoch: 5   Global Step: 92690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:26,177-Speed 9596.38 samples/sec   Loss 7.2267   LearningRate 0.0522   Epoch: 5   Global Step: 92700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:27,234-Speed 9689.64 samples/sec   Loss 7.2206   LearningRate 0.0522   Epoch: 5   Global Step: 92710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:28,293-Speed 9676.36 samples/sec   Loss 7.1890   LearningRate 0.0522   Epoch: 5   Global Step: 92720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:29,362-Speed 9589.96 samples/sec   Loss 7.2429   LearningRate 0.0522   Epoch: 5   Global Step: 92730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:30,448-Speed 9426.26 samples/sec   Loss 7.3070   LearningRate 0.0522   Epoch: 5   Global Step: 92740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:31,532-Speed 9460.14 samples/sec   Loss 7.2512   LearningRate 0.0522   Epoch: 5   Global Step: 92750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:32,584-Speed 9737.97 samples/sec   Loss 7.2255   LearningRate 0.0521   Epoch: 5   Global Step: 92760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:33,636-Speed 9736.83 samples/sec   Loss 7.2425   LearningRate 0.0521   Epoch: 5   Global Step: 92770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:34,748-Speed 9218.91 samples/sec   Loss 7.2754   LearningRate 0.0521   Epoch: 5   Global Step: 92780   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:16:35,856-Speed 9240.61 samples/sec   Loss 7.2881   LearningRate 0.0521   Epoch: 5   Global Step: 92790   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:16:36,963-Speed 9255.00 samples/sec   Loss 7.3647   LearningRate 0.0521   Epoch: 5   Global Step: 92800   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:16:38,070-Speed 9255.90 samples/sec   Loss 7.2745   LearningRate 0.0521   Epoch: 5   Global Step: 92810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:39,128-Speed 9691.74 samples/sec   Loss 7.1993   LearningRate 0.0521   Epoch: 5   Global Step: 92820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:40,195-Speed 9598.51 samples/sec   Loss 7.2362   LearningRate 0.0521   Epoch: 5   Global Step: 92830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:41,253-Speed 9686.53 samples/sec   Loss 7.4307   LearningRate 0.0521   Epoch: 5   Global Step: 92840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:42,312-Speed 9670.67 samples/sec   Loss 7.2708   LearningRate 0.0521   Epoch: 5   Global Step: 92850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:43,336-Speed 10013.50 samples/sec   Loss 7.1941   LearningRate 0.0521   Epoch: 5   Global Step: 92860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:44,375-Speed 9863.46 samples/sec   Loss 7.1851   LearningRate 0.0521   Epoch: 5   Global Step: 92870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:45,458-Speed 9460.76 samples/sec   Loss 7.1148   LearningRate 0.0521   Epoch: 5   Global Step: 92880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:46,569-Speed 9215.04 samples/sec   Loss 7.2015   LearningRate 0.0521   Epoch: 5   Global Step: 92890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:47,641-Speed 9561.05 samples/sec   Loss 7.2358   LearningRate 0.0521   Epoch: 5   Global Step: 92900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:48,701-Speed 9663.62 samples/sec   Loss 7.2099   LearningRate 0.0521   Epoch: 5   Global Step: 92910   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:16:49,796-Speed 9355.32 samples/sec   Loss 7.2681   LearningRate 0.0521   Epoch: 5   Global Step: 92920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:50,904-Speed 9245.90 samples/sec   Loss 7.2919   LearningRate 0.0521   Epoch: 5   Global Step: 92930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:51,978-Speed 9545.35 samples/sec   Loss 7.1944   LearningRate 0.0521   Epoch: 5   Global Step: 92940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:53,066-Speed 9410.94 samples/sec   Loss 7.3017   LearningRate 0.0521   Epoch: 5   Global Step: 92950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:54,138-Speed 9565.28 samples/sec   Loss 7.3617   LearningRate 0.0521   Epoch: 5   Global Step: 92960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:55,214-Speed 9524.67 samples/sec   Loss 7.2564   LearningRate 0.0521   Epoch: 5   Global Step: 92970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:56,315-Speed 9299.10 samples/sec   Loss 7.2779   LearningRate 0.0521   Epoch: 5   Global Step: 92980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:57,427-Speed 9213.12 samples/sec   Loss 7.0942   LearningRate 0.0520   Epoch: 5   Global Step: 92990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:58,493-Speed 9618.16 samples/sec   Loss 7.2680   LearningRate 0.0520   Epoch: 5   Global Step: 93000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:16:59,556-Speed 9634.24 samples/sec   Loss 7.1932   LearningRate 0.0520   Epoch: 5   Global Step: 93010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:00,604-Speed 9778.03 samples/sec   Loss 7.1949   LearningRate 0.0520   Epoch: 5   Global Step: 93020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:01,651-Speed 9802.02 samples/sec   Loss 7.3069   LearningRate 0.0520   Epoch: 5   Global Step: 93030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:17:02,688-Speed 9883.56 samples/sec   Loss 7.4239   LearningRate 0.0520   Epoch: 5   Global Step: 93040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:17:03,729-Speed 9848.54 samples/sec   Loss 7.2749   LearningRate 0.0520   Epoch: 5   Global Step: 93050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:17:04,819-Speed 9394.47 samples/sec   Loss 7.1541   LearningRate 0.0520   Epoch: 5   Global Step: 93060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:17:05,881-Speed 9652.95 samples/sec   Loss 7.1868   LearningRate 0.0520   Epoch: 5   Global Step: 93070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:17:07,005-Speed 9116.80 samples/sec   Loss 7.1253   LearningRate 0.0520   Epoch: 5   Global Step: 93080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:17:08,115-Speed 9222.86 samples/sec   Loss 7.1741   LearningRate 0.0520   Epoch: 5   Global Step: 93090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:17:09,206-Speed 9395.58 samples/sec   Loss 7.3283   LearningRate 0.0520   Epoch: 5   Global Step: 93100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:17:10,287-Speed 9476.02 samples/sec   Loss 7.3402   LearningRate 0.0520   Epoch: 5   Global Step: 93110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:17:11,355-Speed 9590.11 samples/sec   Loss 7.1908   LearningRate 0.0520   Epoch: 5   Global Step: 93120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:17:12,394-Speed 9866.14 samples/sec   Loss 7.3524   LearningRate 0.0520   Epoch: 5   Global Step: 93130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:13,449-Speed 9709.37 samples/sec   Loss 7.3364   LearningRate 0.0520   Epoch: 5   Global Step: 93140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:14,540-Speed 9399.03 samples/sec   Loss 7.2243   LearningRate 0.0520   Epoch: 5   Global Step: 93150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:15,589-Speed 9768.92 samples/sec   Loss 7.2968   LearningRate 0.0520   Epoch: 5   Global Step: 93160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:16,660-Speed 9566.95 samples/sec   Loss 7.2604   LearningRate 0.0520   Epoch: 5   Global Step: 93170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:17,731-Speed 9564.47 samples/sec   Loss 7.3094   LearningRate 0.0520   Epoch: 5   Global Step: 93180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:18,817-Speed 9437.99 samples/sec   Loss 7.3440   LearningRate 0.0520   Epoch: 5   Global Step: 93190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:19,908-Speed 9390.88 samples/sec   Loss 7.2467   LearningRate 0.0520   Epoch: 5   Global Step: 93200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:20,981-Speed 9546.00 samples/sec   Loss 7.2548   LearningRate 0.0520   Epoch: 5   Global Step: 93210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:22,048-Speed 9601.95 samples/sec   Loss 7.2363   LearningRate 0.0519   Epoch: 5   Global Step: 93220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:23,091-Speed 9825.14 samples/sec   Loss 7.3356   LearningRate 0.0519   Epoch: 5   Global Step: 93230   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:17:24,195-Speed 9275.26 samples/sec   Loss 7.2642   LearningRate 0.0519   Epoch: 5   Global Step: 93240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:25,306-Speed 9240.00 samples/sec   Loss 7.2922   LearningRate 0.0519   Epoch: 5   Global Step: 93250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:26,359-Speed 9720.62 samples/sec   Loss 7.1427   LearningRate 0.0519   Epoch: 5   Global Step: 93260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:27,447-Speed 9419.54 samples/sec   Loss 7.3127   LearningRate 0.0519   Epoch: 5   Global Step: 93270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:28,517-Speed 9578.11 samples/sec   Loss 7.2845   LearningRate 0.0519   Epoch: 5   Global Step: 93280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:29,572-Speed 9713.48 samples/sec   Loss 7.3453   LearningRate 0.0519   Epoch: 5   Global Step: 93290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:30,668-Speed 9345.52 samples/sec   Loss 7.2499   LearningRate 0.0519   Epoch: 5   Global Step: 93300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:31,729-Speed 9658.82 samples/sec   Loss 7.1967   LearningRate 0.0519   Epoch: 5   Global Step: 93310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:32,809-Speed 9487.98 samples/sec   Loss 7.3151   LearningRate 0.0519   Epoch: 5   Global Step: 93320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:33,918-Speed 9239.87 samples/sec   Loss 7.3138   LearningRate 0.0519   Epoch: 5   Global Step: 93330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:35,007-Speed 9409.69 samples/sec   Loss 7.2115   LearningRate 0.0519   Epoch: 5   Global Step: 93340   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:17:36,061-Speed 9721.18 samples/sec   Loss 7.2936   LearningRate 0.0519   Epoch: 5   Global Step: 93350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:37,166-Speed 9277.01 samples/sec   Loss 7.3365   LearningRate 0.0519   Epoch: 5   Global Step: 93360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:38,206-Speed 9852.20 samples/sec   Loss 7.1924   LearningRate 0.0519   Epoch: 5   Global Step: 93370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:39,277-Speed 9568.94 samples/sec   Loss 7.3251   LearningRate 0.0519   Epoch: 5   Global Step: 93380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:40,407-Speed 9063.10 samples/sec   Loss 7.3191   LearningRate 0.0519   Epoch: 5   Global Step: 93390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:41,493-Speed 9438.64 samples/sec   Loss 7.3192   LearningRate 0.0519   Epoch: 5   Global Step: 93400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:42,583-Speed 9395.85 samples/sec   Loss 7.2549   LearningRate 0.0519   Epoch: 5   Global Step: 93410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:43,650-Speed 9604.60 samples/sec   Loss 7.2787   LearningRate 0.0519   Epoch: 5   Global Step: 93420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:44,690-Speed 9850.47 samples/sec   Loss 7.2539   LearningRate 0.0519   Epoch: 5   Global Step: 93430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:45,761-Speed 9569.79 samples/sec   Loss 7.2845   LearningRate 0.0519   Epoch: 5   Global Step: 93440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:46,786-Speed 9992.50 samples/sec   Loss 7.2538   LearningRate 0.0518   Epoch: 5   Global Step: 93450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:47,908-Speed 9132.72 samples/sec   Loss 7.2737   LearningRate 0.0518   Epoch: 5   Global Step: 93460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:48,975-Speed 9606.25 samples/sec   Loss 7.2105   LearningRate 0.0518   Epoch: 5   Global Step: 93470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:50,097-Speed 9136.62 samples/sec   Loss 7.3232   LearningRate 0.0518   Epoch: 5   Global Step: 93480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:51,168-Speed 9560.28 samples/sec   Loss 7.2363   LearningRate 0.0518   Epoch: 5   Global Step: 93490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:52,284-Speed 9187.42 samples/sec   Loss 7.3520   LearningRate 0.0518   Epoch: 5   Global Step: 93500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:53,395-Speed 9217.09 samples/sec   Loss 7.3723   LearningRate 0.0518   Epoch: 5   Global Step: 93510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:54,448-Speed 9733.05 samples/sec   Loss 7.2584   LearningRate 0.0518   Epoch: 5   Global Step: 93520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:55,506-Speed 9694.70 samples/sec   Loss 7.1789   LearningRate 0.0518   Epoch: 5   Global Step: 93530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:56,555-Speed 9761.88 samples/sec   Loss 7.2553   LearningRate 0.0518   Epoch: 5   Global Step: 93540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:57,621-Speed 9618.80 samples/sec   Loss 7.2116   LearningRate 0.0518   Epoch: 5   Global Step: 93550   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:17:58,699-Speed 9501.13 samples/sec   Loss 7.2760   LearningRate 0.0518   Epoch: 5   Global Step: 93560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:17:59,772-Speed 9546.15 samples/sec   Loss 7.3506   LearningRate 0.0518   Epoch: 5   Global Step: 93570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:00,840-Speed 9592.06 samples/sec   Loss 7.2152   LearningRate 0.0518   Epoch: 5   Global Step: 93580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:01,905-Speed 9624.35 samples/sec   Loss 7.3301   LearningRate 0.0518   Epoch: 5   Global Step: 93590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:02,990-Speed 9450.93 samples/sec   Loss 7.2760   LearningRate 0.0518   Epoch: 5   Global Step: 93600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:04,033-Speed 9820.31 samples/sec   Loss 7.2737   LearningRate 0.0518   Epoch: 5   Global Step: 93610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:05,076-Speed 9823.62 samples/sec   Loss 7.1812   LearningRate 0.0518   Epoch: 5   Global Step: 93620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:06,152-Speed 9517.47 samples/sec   Loss 7.4139   LearningRate 0.0518   Epoch: 5   Global Step: 93630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:07,225-Speed 9547.19 samples/sec   Loss 7.2643   LearningRate 0.0518   Epoch: 5   Global Step: 93640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:08,255-Speed 9954.53 samples/sec   Loss 7.2473   LearningRate 0.0518   Epoch: 5   Global Step: 93650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:09,341-Speed 9433.99 samples/sec   Loss 7.0295   LearningRate 0.0518   Epoch: 5   Global Step: 93660   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:18:10,434-Speed 9370.37 samples/sec   Loss 7.2255   LearningRate 0.0518   Epoch: 5   Global Step: 93670   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:18:11,531-Speed 9339.28 samples/sec   Loss 7.2820   LearningRate 0.0517   Epoch: 5   Global Step: 93680   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:18:12,605-Speed 9541.49 samples/sec   Loss 7.2097   LearningRate 0.0517   Epoch: 5   Global Step: 93690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:13,643-Speed 9876.00 samples/sec   Loss 7.1641   LearningRate 0.0517   Epoch: 5   Global Step: 93700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:14,698-Speed 9715.92 samples/sec   Loss 7.3156   LearningRate 0.0517   Epoch: 5   Global Step: 93710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:15,725-Speed 9981.00 samples/sec   Loss 7.3010   LearningRate 0.0517   Epoch: 5   Global Step: 93720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:16,792-Speed 9601.43 samples/sec   Loss 7.2330   LearningRate 0.0517   Epoch: 5   Global Step: 93730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:17,916-Speed 9115.53 samples/sec   Loss 7.2863   LearningRate 0.0517   Epoch: 5   Global Step: 93740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:19,010-Speed 9366.61 samples/sec   Loss 7.3060   LearningRate 0.0517   Epoch: 5   Global Step: 93750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:20,110-Speed 9308.64 samples/sec   Loss 7.2117   LearningRate 0.0517   Epoch: 5   Global Step: 93760   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:21,184-Speed 9542.86 samples/sec   Loss 7.3899   LearningRate 0.0517   Epoch: 5   Global Step: 93770   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:22,271-Speed 9423.31 samples/sec   Loss 7.2149   LearningRate 0.0517   Epoch: 5   Global Step: 93780   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:23,337-Speed 9615.21 samples/sec   Loss 7.2892   LearningRate 0.0517   Epoch: 5   Global Step: 93790   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:18:24,416-Speed 9488.33 samples/sec   Loss 7.3211   LearningRate 0.0517   Epoch: 5   Global Step: 93800   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:18:25,529-Speed 9209.54 samples/sec   Loss 7.2420   LearningRate 0.0517   Epoch: 5   Global Step: 93810   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:18:26,602-Speed 9550.59 samples/sec   Loss 7.2322   LearningRate 0.0517   Epoch: 5   Global Step: 93820   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:18:27,645-Speed 9825.13 samples/sec   Loss 7.2794   LearningRate 0.0517   Epoch: 5   Global Step: 93830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:18:28,704-Speed 9676.30 samples/sec   Loss 7.2325   LearningRate 0.0517   Epoch: 5   Global Step: 93840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:18:29,829-Speed 9107.12 samples/sec   Loss 7.2570   LearningRate 0.0517   Epoch: 5   Global Step: 93850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:18:30,899-Speed 9574.17 samples/sec   Loss 7.2495   LearningRate 0.0517   Epoch: 5   Global Step: 93860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:18:31,983-Speed 9452.69 samples/sec   Loss 7.1998   LearningRate 0.0517   Epoch: 5   Global Step: 93870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:18:33,044-Speed 9665.47 samples/sec   Loss 7.3332   LearningRate 0.0517   Epoch: 5   Global Step: 93880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:18:34,128-Speed 9450.58 samples/sec   Loss 7.2981   LearningRate 0.0517   Epoch: 5   Global Step: 93890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:18:35,200-Speed 9554.88 samples/sec   Loss 7.2709   LearningRate 0.0517   Epoch: 5   Global Step: 93900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:18:36,264-Speed 9629.07 samples/sec   Loss 7.3341   LearningRate 0.0517   Epoch: 5   Global Step: 93910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:18:37,393-Speed 9080.94 samples/sec   Loss 7.3058   LearningRate 0.0516   Epoch: 5   Global Step: 93920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:18:38,437-Speed 9808.58 samples/sec   Loss 7.3302   LearningRate 0.0516   Epoch: 5   Global Step: 93930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:39,476-Speed 9864.15 samples/sec   Loss 7.2548   LearningRate 0.0516   Epoch: 5   Global Step: 93940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:40,570-Speed 9364.15 samples/sec   Loss 7.2119   LearningRate 0.0516   Epoch: 5   Global Step: 93950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:41,605-Speed 9900.52 samples/sec   Loss 7.3203   LearningRate 0.0516   Epoch: 5   Global Step: 93960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:42,637-Speed 9929.31 samples/sec   Loss 7.3086   LearningRate 0.0516   Epoch: 5   Global Step: 93970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:43,666-Speed 9957.14 samples/sec   Loss 7.2817   LearningRate 0.0516   Epoch: 5   Global Step: 93980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:44,741-Speed 9530.02 samples/sec   Loss 7.3340   LearningRate 0.0516   Epoch: 5   Global Step: 93990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:18:45,804-Speed 9638.35 samples/sec   Loss 7.1080   LearningRate 0.0516   Epoch: 5   Global Step: 94000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:19:07,695-[lfw][94000]XNorm: 11.538984
Training: 2022-04-11 15:19:07,696-[lfw][94000]Accuracy-Flip: 0.99617+-0.00248
Training: 2022-04-11 15:19:07,696-[lfw][94000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:19:32,991-[cfp_fp][94000]XNorm: 9.872888
Training: 2022-04-11 15:19:32,991-[cfp_fp][94000]Accuracy-Flip: 0.95314+-0.01144
Training: 2022-04-11 15:19:32,992-[cfp_fp][94000]Accuracy-Highest: 0.95729
Training: 2022-04-11 15:19:54,812-[agedb_30][94000]XNorm: 11.215298
Training: 2022-04-11 15:19:54,813-[agedb_30][94000]Accuracy-Flip: 0.96233+-0.00810
Training: 2022-04-11 15:19:54,814-[agedb_30][94000]Accuracy-Highest: 0.96317
Training: 2022-04-11 15:19:55,872-Speed 146.15 samples/sec   Loss 7.2117   LearningRate 0.0516   Epoch: 5   Global Step: 94010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:19:56,921-Speed 9771.10 samples/sec   Loss 7.3214   LearningRate 0.0516   Epoch: 5   Global Step: 94020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:19:57,977-Speed 9700.86 samples/sec   Loss 7.3417   LearningRate 0.0516   Epoch: 5   Global Step: 94030   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:19:59,068-Speed 9390.78 samples/sec   Loss 7.2362   LearningRate 0.0516   Epoch: 5   Global Step: 94040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:00,103-Speed 9902.39 samples/sec   Loss 7.1644   LearningRate 0.0516   Epoch: 5   Global Step: 94050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:01,180-Speed 9509.39 samples/sec   Loss 7.1496   LearningRate 0.0516   Epoch: 5   Global Step: 94060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:02,275-Speed 9359.37 samples/sec   Loss 7.2382   LearningRate 0.0516   Epoch: 5   Global Step: 94070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:03,357-Speed 9472.75 samples/sec   Loss 7.2335   LearningRate 0.0516   Epoch: 5   Global Step: 94080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:04,417-Speed 9658.28 samples/sec   Loss 7.2954   LearningRate 0.0516   Epoch: 5   Global Step: 94090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:05,476-Speed 9678.90 samples/sec   Loss 7.3134   LearningRate 0.0516   Epoch: 5   Global Step: 94100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:06,545-Speed 9582.45 samples/sec   Loss 7.3488   LearningRate 0.0516   Epoch: 5   Global Step: 94110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:07,603-Speed 9686.34 samples/sec   Loss 7.2306   LearningRate 0.0516   Epoch: 5   Global Step: 94120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:08,667-Speed 9628.94 samples/sec   Loss 7.2696   LearningRate 0.0516   Epoch: 5   Global Step: 94130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:09,719-Speed 9737.17 samples/sec   Loss 7.2409   LearningRate 0.0516   Epoch: 5   Global Step: 94140   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:20:10,824-Speed 9277.21 samples/sec   Loss 7.1750   LearningRate 0.0515   Epoch: 5   Global Step: 94150   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:20:11,897-Speed 9542.99 samples/sec   Loss 7.2882   LearningRate 0.0515   Epoch: 5   Global Step: 94160   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:20:12,980-Speed 9457.88 samples/sec   Loss 7.2684   LearningRate 0.0515   Epoch: 5   Global Step: 94170   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:20:14,077-Speed 9345.11 samples/sec   Loss 7.3992   LearningRate 0.0515   Epoch: 5   Global Step: 94180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:15,151-Speed 9546.95 samples/sec   Loss 7.3612   LearningRate 0.0515   Epoch: 5   Global Step: 94190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:16,208-Speed 9693.34 samples/sec   Loss 7.3759   LearningRate 0.0515   Epoch: 5   Global Step: 94200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:17,301-Speed 9368.73 samples/sec   Loss 7.2933   LearningRate 0.0515   Epoch: 5   Global Step: 94210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:18,391-Speed 9402.93 samples/sec   Loss 7.2655   LearningRate 0.0515   Epoch: 5   Global Step: 94220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:19,494-Speed 9290.57 samples/sec   Loss 7.1740   LearningRate 0.0515   Epoch: 5   Global Step: 94230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:20,585-Speed 9388.90 samples/sec   Loss 7.3317   LearningRate 0.0515   Epoch: 5   Global Step: 94240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:21,660-Speed 9529.46 samples/sec   Loss 7.2499   LearningRate 0.0515   Epoch: 5   Global Step: 94250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:22,738-Speed 9508.95 samples/sec   Loss 7.2038   LearningRate 0.0515   Epoch: 5   Global Step: 94260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:23,832-Speed 9363.31 samples/sec   Loss 7.3085   LearningRate 0.0515   Epoch: 5   Global Step: 94270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:24,906-Speed 9540.89 samples/sec   Loss 7.3423   LearningRate 0.0515   Epoch: 5   Global Step: 94280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:26,000-Speed 9366.77 samples/sec   Loss 7.2689   LearningRate 0.0515   Epoch: 5   Global Step: 94290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:27,066-Speed 9616.61 samples/sec   Loss 7.2511   LearningRate 0.0515   Epoch: 5   Global Step: 94300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:28,134-Speed 9598.32 samples/sec   Loss 7.1060   LearningRate 0.0515   Epoch: 5   Global Step: 94310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:29,207-Speed 9552.05 samples/sec   Loss 7.2853   LearningRate 0.0515   Epoch: 5   Global Step: 94320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:30,326-Speed 9155.18 samples/sec   Loss 7.2832   LearningRate 0.0515   Epoch: 5   Global Step: 94330   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:31,436-Speed 9229.25 samples/sec   Loss 7.2086   LearningRate 0.0515   Epoch: 5   Global Step: 94340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:32,559-Speed 9123.13 samples/sec   Loss 7.3774   LearningRate 0.0515   Epoch: 5   Global Step: 94350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:33,693-Speed 9033.82 samples/sec   Loss 7.1820   LearningRate 0.0515   Epoch: 5   Global Step: 94360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:34,796-Speed 9293.34 samples/sec   Loss 7.2552   LearningRate 0.0515   Epoch: 5   Global Step: 94370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:35,866-Speed 9570.97 samples/sec   Loss 7.3527   LearningRate 0.0514   Epoch: 5   Global Step: 94380   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:20:36,975-Speed 9241.56 samples/sec   Loss 7.3019   LearningRate 0.0514   Epoch: 5   Global Step: 94390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:38,052-Speed 9513.80 samples/sec   Loss 7.2157   LearningRate 0.0514   Epoch: 5   Global Step: 94400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:39,108-Speed 9703.53 samples/sec   Loss 7.3023   LearningRate 0.0514   Epoch: 5   Global Step: 94410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:40,166-Speed 9687.92 samples/sec   Loss 7.2761   LearningRate 0.0514   Epoch: 5   Global Step: 94420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:41,292-Speed 9099.78 samples/sec   Loss 7.3058   LearningRate 0.0514   Epoch: 5   Global Step: 94430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:42,374-Speed 9465.49 samples/sec   Loss 7.2746   LearningRate 0.0514   Epoch: 5   Global Step: 94440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:43,410-Speed 9887.51 samples/sec   Loss 7.2957   LearningRate 0.0514   Epoch: 5   Global Step: 94450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:44,478-Speed 9595.77 samples/sec   Loss 7.2323   LearningRate 0.0514   Epoch: 5   Global Step: 94460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:45,612-Speed 9033.43 samples/sec   Loss 7.2612   LearningRate 0.0514   Epoch: 5   Global Step: 94470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:46,721-Speed 9240.46 samples/sec   Loss 7.1993   LearningRate 0.0514   Epoch: 5   Global Step: 94480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:47,825-Speed 9282.49 samples/sec   Loss 7.3439   LearningRate 0.0514   Epoch: 5   Global Step: 94490   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:20:48,943-Speed 9161.13 samples/sec   Loss 7.1951   LearningRate 0.0514   Epoch: 5   Global Step: 94500   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:20:50,043-Speed 9320.34 samples/sec   Loss 7.1476   LearningRate 0.0514   Epoch: 5   Global Step: 94510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:51,119-Speed 9520.22 samples/sec   Loss 7.1563   LearningRate 0.0514   Epoch: 5   Global Step: 94520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:52,195-Speed 9518.75 samples/sec   Loss 7.2916   LearningRate 0.0514   Epoch: 5   Global Step: 94530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:53,284-Speed 9406.28 samples/sec   Loss 7.3658   LearningRate 0.0514   Epoch: 5   Global Step: 94540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:54,394-Speed 9234.89 samples/sec   Loss 7.3338   LearningRate 0.0514   Epoch: 5   Global Step: 94550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:55,479-Speed 9450.19 samples/sec   Loss 7.2311   LearningRate 0.0514   Epoch: 5   Global Step: 94560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:56,547-Speed 9590.75 samples/sec   Loss 7.3451   LearningRate 0.0514   Epoch: 5   Global Step: 94570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:57,653-Speed 9265.32 samples/sec   Loss 7.2642   LearningRate 0.0514   Epoch: 5   Global Step: 94580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:58,731-Speed 9498.89 samples/sec   Loss 7.2415   LearningRate 0.0514   Epoch: 5   Global Step: 94590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:20:59,831-Speed 9322.02 samples/sec   Loss 7.2722   LearningRate 0.0514   Epoch: 5   Global Step: 94600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:00,889-Speed 9684.50 samples/sec   Loss 7.2643   LearningRate 0.0513   Epoch: 5   Global Step: 94610   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:21:01,990-Speed 9305.71 samples/sec   Loss 7.1472   LearningRate 0.0513   Epoch: 5   Global Step: 94620   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:21:03,077-Speed 9425.90 samples/sec   Loss 7.2660   LearningRate 0.0513   Epoch: 5   Global Step: 94630   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:21:04,154-Speed 9516.08 samples/sec   Loss 7.2548   LearningRate 0.0513   Epoch: 5   Global Step: 94640   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:21:05,218-Speed 9624.51 samples/sec   Loss 7.3149   LearningRate 0.0513   Epoch: 5   Global Step: 94650   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:21:06,293-Speed 9531.35 samples/sec   Loss 7.2639   LearningRate 0.0513   Epoch: 5   Global Step: 94660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:07,384-Speed 9395.13 samples/sec   Loss 7.2155   LearningRate 0.0513   Epoch: 5   Global Step: 94670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:08,503-Speed 9149.06 samples/sec   Loss 7.2347   LearningRate 0.0513   Epoch: 5   Global Step: 94680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:09,556-Speed 9732.60 samples/sec   Loss 7.2148   LearningRate 0.0513   Epoch: 5   Global Step: 94690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:10,606-Speed 9755.72 samples/sec   Loss 7.3323   LearningRate 0.0513   Epoch: 5   Global Step: 94700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:11,648-Speed 9836.78 samples/sec   Loss 7.2262   LearningRate 0.0513   Epoch: 5   Global Step: 94710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:12,724-Speed 9523.34 samples/sec   Loss 7.1859   LearningRate 0.0513   Epoch: 5   Global Step: 94720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:13,860-Speed 9020.17 samples/sec   Loss 7.1787   LearningRate 0.0513   Epoch: 5   Global Step: 94730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:14,956-Speed 9352.84 samples/sec   Loss 7.2890   LearningRate 0.0513   Epoch: 5   Global Step: 94740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:16,033-Speed 9514.92 samples/sec   Loss 7.1948   LearningRate 0.0513   Epoch: 5   Global Step: 94750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:17,086-Speed 9730.02 samples/sec   Loss 7.2450   LearningRate 0.0513   Epoch: 5   Global Step: 94760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:18,162-Speed 9520.07 samples/sec   Loss 7.2357   LearningRate 0.0513   Epoch: 5   Global Step: 94770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:19,235-Speed 9550.79 samples/sec   Loss 7.2985   LearningRate 0.0513   Epoch: 5   Global Step: 94780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:20,309-Speed 9540.56 samples/sec   Loss 7.1532   LearningRate 0.0513   Epoch: 5   Global Step: 94790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:21,410-Speed 9298.76 samples/sec   Loss 7.2447   LearningRate 0.0513   Epoch: 5   Global Step: 94800   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:22,450-Speed 9859.47 samples/sec   Loss 7.1616   LearningRate 0.0513   Epoch: 5   Global Step: 94810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:23,571-Speed 9135.87 samples/sec   Loss 7.2865   LearningRate 0.0513   Epoch: 5   Global Step: 94820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:24,669-Speed 9331.52 samples/sec   Loss 7.2699   LearningRate 0.0513   Epoch: 5   Global Step: 94830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:25,731-Speed 9656.61 samples/sec   Loss 7.2241   LearningRate 0.0513   Epoch: 5   Global Step: 94840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:26,842-Speed 9217.62 samples/sec   Loss 7.3052   LearningRate 0.0512   Epoch: 5   Global Step: 94850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:27,911-Speed 9585.31 samples/sec   Loss 7.1922   LearningRate 0.0512   Epoch: 5   Global Step: 94860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:28,980-Speed 9587.74 samples/sec   Loss 7.1621   LearningRate 0.0512   Epoch: 5   Global Step: 94870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:30,076-Speed 9345.64 samples/sec   Loss 7.3112   LearningRate 0.0512   Epoch: 5   Global Step: 94880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:31,150-Speed 9544.74 samples/sec   Loss 7.1854   LearningRate 0.0512   Epoch: 5   Global Step: 94890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:32,222-Speed 9561.54 samples/sec   Loss 7.1720   LearningRate 0.0512   Epoch: 5   Global Step: 94900   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:33,292-Speed 9577.90 samples/sec   Loss 7.3005   LearningRate 0.0512   Epoch: 5   Global Step: 94910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:34,358-Speed 9612.17 samples/sec   Loss 7.3070   LearningRate 0.0512   Epoch: 5   Global Step: 94920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:35,462-Speed 9277.16 samples/sec   Loss 7.0983   LearningRate 0.0512   Epoch: 5   Global Step: 94930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:36,578-Speed 9184.50 samples/sec   Loss 7.1559   LearningRate 0.0512   Epoch: 5   Global Step: 94940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:37,661-Speed 9458.66 samples/sec   Loss 7.3197   LearningRate 0.0512   Epoch: 5   Global Step: 94950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:38,719-Speed 9681.56 samples/sec   Loss 7.2492   LearningRate 0.0512   Epoch: 5   Global Step: 94960   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:21:39,765-Speed 9794.92 samples/sec   Loss 7.1292   LearningRate 0.0512   Epoch: 5   Global Step: 94970   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:21:42,719-Speed 3467.93 samples/sec   Loss 7.1801   LearningRate 0.0512   Epoch: 5   Global Step: 94980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:44,677-Speed 5230.68 samples/sec   Loss 7.1309   LearningRate 0.0512   Epoch: 5   Global Step: 94990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:45,715-Speed 9875.57 samples/sec   Loss 7.2145   LearningRate 0.0512   Epoch: 5   Global Step: 95000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:46,822-Speed 9252.04 samples/sec   Loss 7.2219   LearningRate 0.0512   Epoch: 5   Global Step: 95010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:47,891-Speed 9585.62 samples/sec   Loss 7.2389   LearningRate 0.0512   Epoch: 5   Global Step: 95020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:49,025-Speed 9036.77 samples/sec   Loss 7.2147   LearningRate 0.0512   Epoch: 5   Global Step: 95030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:50,119-Speed 9362.66 samples/sec   Loss 7.2867   LearningRate 0.0512   Epoch: 5   Global Step: 95040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:51,222-Speed 9289.38 samples/sec   Loss 7.2651   LearningRate 0.0512   Epoch: 5   Global Step: 95050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:52,272-Speed 9764.61 samples/sec   Loss 7.2537   LearningRate 0.0512   Epoch: 5   Global Step: 95060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:53,366-Speed 9361.73 samples/sec   Loss 7.3272   LearningRate 0.0512   Epoch: 5   Global Step: 95070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:54,433-Speed 9598.39 samples/sec   Loss 7.2592   LearningRate 0.0511   Epoch: 5   Global Step: 95080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:55,548-Speed 9192.58 samples/sec   Loss 7.2688   LearningRate 0.0511   Epoch: 5   Global Step: 95090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:21:56,632-Speed 9453.87 samples/sec   Loss 7.3311   LearningRate 0.0511   Epoch: 5   Global Step: 95100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:57,709-Speed 9511.89 samples/sec   Loss 7.2136   LearningRate 0.0511   Epoch: 5   Global Step: 95110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:58,811-Speed 9299.18 samples/sec   Loss 7.2886   LearningRate 0.0511   Epoch: 5   Global Step: 95120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:21:59,909-Speed 9330.42 samples/sec   Loss 7.3114   LearningRate 0.0511   Epoch: 5   Global Step: 95130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:01,017-Speed 9250.46 samples/sec   Loss 7.1660   LearningRate 0.0511   Epoch: 5   Global Step: 95140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:02,395-Speed 7434.65 samples/sec   Loss 7.2405   LearningRate 0.0511   Epoch: 5   Global Step: 95150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:03,489-Speed 9365.19 samples/sec   Loss 7.1711   LearningRate 0.0511   Epoch: 5   Global Step: 95160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:04,569-Speed 9489.43 samples/sec   Loss 7.1515   LearningRate 0.0511   Epoch: 5   Global Step: 95170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:05,641-Speed 9550.65 samples/sec   Loss 7.1315   LearningRate 0.0511   Epoch: 5   Global Step: 95180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:06,774-Speed 9048.53 samples/sec   Loss 7.2060   LearningRate 0.0511   Epoch: 5   Global Step: 95190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:07,829-Speed 9710.80 samples/sec   Loss 7.2559   LearningRate 0.0511   Epoch: 5   Global Step: 95200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:08,926-Speed 9339.82 samples/sec   Loss 7.2107   LearningRate 0.0511   Epoch: 5   Global Step: 95210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:09,993-Speed 9598.58 samples/sec   Loss 7.2119   LearningRate 0.0511   Epoch: 5   Global Step: 95220   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:11,068-Speed 9536.21 samples/sec   Loss 7.2244   LearningRate 0.0511   Epoch: 5   Global Step: 95230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:12,189-Speed 9139.23 samples/sec   Loss 7.2400   LearningRate 0.0511   Epoch: 5   Global Step: 95240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:13,293-Speed 9276.68 samples/sec   Loss 7.1775   LearningRate 0.0511   Epoch: 5   Global Step: 95250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:14,357-Speed 9634.67 samples/sec   Loss 7.1845   LearningRate 0.0511   Epoch: 5   Global Step: 95260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:15,429-Speed 9554.01 samples/sec   Loss 7.2781   LearningRate 0.0511   Epoch: 5   Global Step: 95270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:22:16,506-Speed 9514.23 samples/sec   Loss 7.2358   LearningRate 0.0511   Epoch: 5   Global Step: 95280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:22:17,579-Speed 9546.77 samples/sec   Loss 7.2445   LearningRate 0.0511   Epoch: 5   Global Step: 95290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:22:18,677-Speed 9332.97 samples/sec   Loss 7.2149   LearningRate 0.0511   Epoch: 5   Global Step: 95300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:22:19,753-Speed 9521.58 samples/sec   Loss 7.2909   LearningRate 0.0510   Epoch: 5   Global Step: 95310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:22:20,822-Speed 9585.37 samples/sec   Loss 7.2423   LearningRate 0.0510   Epoch: 5   Global Step: 95320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:22:21,914-Speed 9383.89 samples/sec   Loss 7.1773   LearningRate 0.0510   Epoch: 5   Global Step: 95330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:22:23,024-Speed 9228.92 samples/sec   Loss 7.1499   LearningRate 0.0510   Epoch: 5   Global Step: 95340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:22:24,136-Speed 9211.22 samples/sec   Loss 7.1335   LearningRate 0.0510   Epoch: 5   Global Step: 95350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:22:25,217-Speed 9484.29 samples/sec   Loss 7.3404   LearningRate 0.0510   Epoch: 5   Global Step: 95360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:22:26,333-Speed 9178.73 samples/sec   Loss 7.1902   LearningRate 0.0510   Epoch: 5   Global Step: 95370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:27,424-Speed 9392.02 samples/sec   Loss 7.2168   LearningRate 0.0510   Epoch: 5   Global Step: 95380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:28,510-Speed 9430.76 samples/sec   Loss 7.2904   LearningRate 0.0510   Epoch: 5   Global Step: 95390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:29,579-Speed 9588.50 samples/sec   Loss 7.1936   LearningRate 0.0510   Epoch: 5   Global Step: 95400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:30,683-Speed 9281.94 samples/sec   Loss 7.0944   LearningRate 0.0510   Epoch: 5   Global Step: 95410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:31,780-Speed 9343.25 samples/sec   Loss 7.1618   LearningRate 0.0510   Epoch: 5   Global Step: 95420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:32,817-Speed 9882.40 samples/sec   Loss 7.1852   LearningRate 0.0510   Epoch: 5   Global Step: 95430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:33,891-Speed 9538.90 samples/sec   Loss 7.2356   LearningRate 0.0510   Epoch: 5   Global Step: 95440   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:34,993-Speed 9296.46 samples/sec   Loss 7.2145   LearningRate 0.0510   Epoch: 5   Global Step: 95450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:36,106-Speed 9202.97 samples/sec   Loss 7.1091   LearningRate 0.0510   Epoch: 5   Global Step: 95460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:37,187-Speed 9481.73 samples/sec   Loss 7.3430   LearningRate 0.0510   Epoch: 5   Global Step: 95470   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:22:38,303-Speed 9178.20 samples/sec   Loss 7.3053   LearningRate 0.0510   Epoch: 5   Global Step: 95480   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:22:39,393-Speed 9402.90 samples/sec   Loss 7.2268   LearningRate 0.0510   Epoch: 5   Global Step: 95490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:40,542-Speed 8921.25 samples/sec   Loss 7.2200   LearningRate 0.0510   Epoch: 5   Global Step: 95500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:41,650-Speed 9240.26 samples/sec   Loss 7.1882   LearningRate 0.0510   Epoch: 5   Global Step: 95510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:42,770-Speed 9151.24 samples/sec   Loss 7.2520   LearningRate 0.0510   Epoch: 5   Global Step: 95520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:43,880-Speed 9230.65 samples/sec   Loss 7.3749   LearningRate 0.0510   Epoch: 5   Global Step: 95530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:44,956-Speed 9519.72 samples/sec   Loss 7.1719   LearningRate 0.0510   Epoch: 5   Global Step: 95540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:46,047-Speed 9393.84 samples/sec   Loss 7.2464   LearningRate 0.0509   Epoch: 5   Global Step: 95550   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:47,120-Speed 9545.81 samples/sec   Loss 7.2253   LearningRate 0.0509   Epoch: 5   Global Step: 95560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:48,205-Speed 9449.49 samples/sec   Loss 7.1994   LearningRate 0.0509   Epoch: 5   Global Step: 95570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:49,295-Speed 9396.29 samples/sec   Loss 7.1146   LearningRate 0.0509   Epoch: 5   Global Step: 95580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:50,351-Speed 9707.05 samples/sec   Loss 7.3213   LearningRate 0.0509   Epoch: 5   Global Step: 95590   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:22:51,410-Speed 9676.41 samples/sec   Loss 7.2016   LearningRate 0.0509   Epoch: 5   Global Step: 95600   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:22:52,500-Speed 9402.46 samples/sec   Loss 7.1851   LearningRate 0.0509   Epoch: 5   Global Step: 95610   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:22:53,581-Speed 9471.02 samples/sec   Loss 7.3399   LearningRate 0.0509   Epoch: 5   Global Step: 95620   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:22:54,660-Speed 9496.81 samples/sec   Loss 7.3452   LearningRate 0.0509   Epoch: 5   Global Step: 95630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:55,768-Speed 9255.61 samples/sec   Loss 7.2640   LearningRate 0.0509   Epoch: 5   Global Step: 95640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:56,838-Speed 9578.58 samples/sec   Loss 7.3817   LearningRate 0.0509   Epoch: 5   Global Step: 95650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:57,917-Speed 9490.44 samples/sec   Loss 7.2700   LearningRate 0.0509   Epoch: 5   Global Step: 95660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:22:58,995-Speed 9501.18 samples/sec   Loss 7.1833   LearningRate 0.0509   Epoch: 5   Global Step: 95670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:00,089-Speed 9369.17 samples/sec   Loss 7.1516   LearningRate 0.0509   Epoch: 5   Global Step: 95680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:01,139-Speed 9754.26 samples/sec   Loss 7.2402   LearningRate 0.0509   Epoch: 5   Global Step: 95690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:02,244-Speed 9278.68 samples/sec   Loss 7.2611   LearningRate 0.0509   Epoch: 5   Global Step: 95700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:23:03,295-Speed 9751.33 samples/sec   Loss 7.2000   LearningRate 0.0509   Epoch: 5   Global Step: 95710   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:23:04,358-Speed 9635.59 samples/sec   Loss 7.2946   LearningRate 0.0509   Epoch: 5   Global Step: 95720   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:23:05,456-Speed 9330.66 samples/sec   Loss 7.1466   LearningRate 0.0509   Epoch: 5   Global Step: 95730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:23:06,550-Speed 9366.16 samples/sec   Loss 7.1815   LearningRate 0.0509   Epoch: 5   Global Step: 95740   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:23:08,568-Speed 5075.06 samples/sec   Loss 7.2086   LearningRate 0.0509   Epoch: 5   Global Step: 95750   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:23:09,627-Speed 9677.49 samples/sec   Loss 7.2612   LearningRate 0.0509   Epoch: 5   Global Step: 95760   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:23:11,523-Speed 5403.30 samples/sec   Loss 7.1271   LearningRate 0.0509   Epoch: 5   Global Step: 95770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:23:13,400-Speed 5456.82 samples/sec   Loss 7.2394   LearningRate 0.0508   Epoch: 5   Global Step: 95780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:23:14,453-Speed 9737.68 samples/sec   Loss 7.2291   LearningRate 0.0508   Epoch: 5   Global Step: 95790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-11 15:23:15,508-Speed 9704.42 samples/sec   Loss 7.2288   LearningRate 0.0508   Epoch: 5   Global Step: 95800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:16,564-Speed 9709.95 samples/sec   Loss 7.1911   LearningRate 0.0508   Epoch: 5   Global Step: 95810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:17,660-Speed 9340.21 samples/sec   Loss 7.1800   LearningRate 0.0508   Epoch: 5   Global Step: 95820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:18,762-Speed 9299.31 samples/sec   Loss 7.1810   LearningRate 0.0508   Epoch: 5   Global Step: 95830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:19,849-Speed 9429.27 samples/sec   Loss 7.1374   LearningRate 0.0508   Epoch: 5   Global Step: 95840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:20,934-Speed 9444.67 samples/sec   Loss 7.2284   LearningRate 0.0508   Epoch: 5   Global Step: 95850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:22,011-Speed 9507.38 samples/sec   Loss 7.1055   LearningRate 0.0508   Epoch: 5   Global Step: 95860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:23,112-Speed 9309.04 samples/sec   Loss 7.2248   LearningRate 0.0508   Epoch: 5   Global Step: 95870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:24,189-Speed 9505.79 samples/sec   Loss 7.2472   LearningRate 0.0508   Epoch: 5   Global Step: 95880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:25,272-Speed 9467.69 samples/sec   Loss 7.3475   LearningRate 0.0508   Epoch: 5   Global Step: 95890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:26,355-Speed 9461.13 samples/sec   Loss 7.2769   LearningRate 0.0508   Epoch: 5   Global Step: 95900   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:23:27,420-Speed 9616.10 samples/sec   Loss 7.2730   LearningRate 0.0508   Epoch: 5   Global Step: 95910   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:23:28,496-Speed 9528.61 samples/sec   Loss 7.1783   LearningRate 0.0508   Epoch: 5   Global Step: 95920   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:29,533-Speed 9878.30 samples/sec   Loss 7.1702   LearningRate 0.0508   Epoch: 5   Global Step: 95930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:30,615-Speed 9468.97 samples/sec   Loss 7.2998   LearningRate 0.0508   Epoch: 5   Global Step: 95940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:31,740-Speed 9109.15 samples/sec   Loss 7.1866   LearningRate 0.0508   Epoch: 5   Global Step: 95950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:32,820-Speed 9488.35 samples/sec   Loss 7.1852   LearningRate 0.0508   Epoch: 5   Global Step: 95960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:33,880-Speed 9663.48 samples/sec   Loss 7.2241   LearningRate 0.0508   Epoch: 5   Global Step: 95970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:34,925-Speed 9804.56 samples/sec   Loss 7.1063   LearningRate 0.0508   Epoch: 5   Global Step: 95980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:35,965-Speed 9852.13 samples/sec   Loss 7.2969   LearningRate 0.0508   Epoch: 5   Global Step: 95990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:37,046-Speed 9483.49 samples/sec   Loss 7.1934   LearningRate 0.0508   Epoch: 5   Global Step: 96000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:23:59,103-[lfw][96000]XNorm: 11.380481
Training: 2022-04-11 15:23:59,103-[lfw][96000]Accuracy-Flip: 0.99517+-0.00241
Training: 2022-04-11 15:23:59,104-[lfw][96000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:24:24,641-[cfp_fp][96000]XNorm: 9.756551
Training: 2022-04-11 15:24:24,642-[cfp_fp][96000]Accuracy-Flip: 0.95471+-0.01250
Training: 2022-04-11 15:24:24,642-[cfp_fp][96000]Accuracy-Highest: 0.95729
Training: 2022-04-11 15:24:46,695-[agedb_30][96000]XNorm: 11.009636
Training: 2022-04-11 15:24:46,695-[agedb_30][96000]Accuracy-Flip: 0.95867+-0.00862
Training: 2022-04-11 15:24:46,696-[agedb_30][96000]Accuracy-Highest: 0.96317
Training: 2022-04-11 15:24:47,788-Speed 144.75 samples/sec   Loss 7.1743   LearningRate 0.0507   Epoch: 5   Global Step: 96010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:48,851-Speed 9641.02 samples/sec   Loss 7.3405   LearningRate 0.0507   Epoch: 5   Global Step: 96020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:49,989-Speed 9006.71 samples/sec   Loss 7.1791   LearningRate 0.0507   Epoch: 5   Global Step: 96030   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:51,103-Speed 9193.31 samples/sec   Loss 7.1436   LearningRate 0.0507   Epoch: 5   Global Step: 96040   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:52,198-Speed 9362.88 samples/sec   Loss 7.1933   LearningRate 0.0507   Epoch: 5   Global Step: 96050   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:53,243-Speed 9801.07 samples/sec   Loss 7.2383   LearningRate 0.0507   Epoch: 5   Global Step: 96060   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:54,319-Speed 9518.21 samples/sec   Loss 7.3854   LearningRate 0.0507   Epoch: 5   Global Step: 96070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:55,407-Speed 9416.94 samples/sec   Loss 7.3345   LearningRate 0.0507   Epoch: 5   Global Step: 96080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:56,485-Speed 9511.59 samples/sec   Loss 7.2706   LearningRate 0.0507   Epoch: 5   Global Step: 96090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:57,562-Speed 9518.22 samples/sec   Loss 7.1717   LearningRate 0.0507   Epoch: 5   Global Step: 96100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:58,620-Speed 9683.48 samples/sec   Loss 7.2561   LearningRate 0.0507   Epoch: 5   Global Step: 96110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:24:59,656-Speed 9882.05 samples/sec   Loss 7.1685   LearningRate 0.0507   Epoch: 5   Global Step: 96120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:00,799-Speed 8967.34 samples/sec   Loss 7.2044   LearningRate 0.0507   Epoch: 5   Global Step: 96130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:01,905-Speed 9258.36 samples/sec   Loss 7.2961   LearningRate 0.0507   Epoch: 5   Global Step: 96140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:02,998-Speed 9380.22 samples/sec   Loss 7.1834   LearningRate 0.0507   Epoch: 5   Global Step: 96150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:04,084-Speed 9430.13 samples/sec   Loss 7.2434   LearningRate 0.0507   Epoch: 5   Global Step: 96160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:05,207-Speed 9125.98 samples/sec   Loss 7.1095   LearningRate 0.0507   Epoch: 5   Global Step: 96170   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:06,290-Speed 9466.03 samples/sec   Loss 7.2423   LearningRate 0.0507   Epoch: 5   Global Step: 96180   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:07,401-Speed 9217.30 samples/sec   Loss 7.3537   LearningRate 0.0507   Epoch: 5   Global Step: 96190   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:08,457-Speed 9703.81 samples/sec   Loss 7.1609   LearningRate 0.0507   Epoch: 5   Global Step: 96200   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:09,534-Speed 9512.64 samples/sec   Loss 7.2767   LearningRate 0.0507   Epoch: 5   Global Step: 96210   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:10,640-Speed 9264.40 samples/sec   Loss 7.2812   LearningRate 0.0507   Epoch: 5   Global Step: 96220   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:25:11,726-Speed 9434.83 samples/sec   Loss 7.2207   LearningRate 0.0507   Epoch: 5   Global Step: 96230   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:12,820-Speed 9366.68 samples/sec   Loss 7.2261   LearningRate 0.0507   Epoch: 5   Global Step: 96240   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:13,913-Speed 9369.59 samples/sec   Loss 7.1392   LearningRate 0.0506   Epoch: 5   Global Step: 96250   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:15,006-Speed 9382.72 samples/sec   Loss 7.2908   LearningRate 0.0506   Epoch: 5   Global Step: 96260   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:16,112-Speed 9259.34 samples/sec   Loss 7.2625   LearningRate 0.0506   Epoch: 5   Global Step: 96270   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:17,237-Speed 9109.07 samples/sec   Loss 7.0717   LearningRate 0.0506   Epoch: 5   Global Step: 96280   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:18,309-Speed 9558.24 samples/sec   Loss 7.1139   LearningRate 0.0506   Epoch: 5   Global Step: 96290   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:19,393-Speed 9456.01 samples/sec   Loss 7.2635   LearningRate 0.0506   Epoch: 5   Global Step: 96300   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:20,455-Speed 9649.30 samples/sec   Loss 7.2037   LearningRate 0.0506   Epoch: 5   Global Step: 96310   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:21,529-Speed 9533.82 samples/sec   Loss 7.1489   LearningRate 0.0506   Epoch: 5   Global Step: 96320   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:22,652-Speed 9124.73 samples/sec   Loss 7.3309   LearningRate 0.0506   Epoch: 5   Global Step: 96330   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:25:23,718-Speed 9614.89 samples/sec   Loss 7.2123   LearningRate 0.0506   Epoch: 5   Global Step: 96340   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:24,798-Speed 9483.15 samples/sec   Loss 7.2087   LearningRate 0.0506   Epoch: 5   Global Step: 96350   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:25,892-Speed 9369.47 samples/sec   Loss 7.2535   LearningRate 0.0506   Epoch: 5   Global Step: 96360   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:26,988-Speed 9347.47 samples/sec   Loss 7.2190   LearningRate 0.0506   Epoch: 5   Global Step: 96370   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:28,111-Speed 9120.61 samples/sec   Loss 7.2488   LearningRate 0.0506   Epoch: 5   Global Step: 96380   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:29,209-Speed 9336.39 samples/sec   Loss 7.1824   LearningRate 0.0506   Epoch: 5   Global Step: 96390   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:30,372-Speed 8807.69 samples/sec   Loss 7.2009   LearningRate 0.0506   Epoch: 5   Global Step: 96400   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:31,485-Speed 9200.80 samples/sec   Loss 7.1208   LearningRate 0.0506   Epoch: 5   Global Step: 96410   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:32,599-Speed 9200.38 samples/sec   Loss 7.1604   LearningRate 0.0506   Epoch: 5   Global Step: 96420   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:33,699-Speed 9315.45 samples/sec   Loss 7.2575   LearningRate 0.0506   Epoch: 5   Global Step: 96430   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:34,790-Speed 9393.89 samples/sec   Loss 7.1527   LearningRate 0.0506   Epoch: 5   Global Step: 96440   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:25:35,822-Speed 9923.56 samples/sec   Loss 7.1546   LearningRate 0.0506   Epoch: 5   Global Step: 96450   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:36,931-Speed 9239.94 samples/sec   Loss 7.1514   LearningRate 0.0506   Epoch: 5   Global Step: 96460   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:38,009-Speed 9503.75 samples/sec   Loss 7.2613   LearningRate 0.0506   Epoch: 5   Global Step: 96470   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:39,068-Speed 9678.71 samples/sec   Loss 7.1566   LearningRate 0.0505   Epoch: 5   Global Step: 96480   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:40,127-Speed 9678.41 samples/sec   Loss 7.1983   LearningRate 0.0505   Epoch: 5   Global Step: 96490   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:41,180-Speed 9723.76 samples/sec   Loss 7.2273   LearningRate 0.0505   Epoch: 5   Global Step: 96500   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:42,277-Speed 9343.29 samples/sec   Loss 7.3361   LearningRate 0.0505   Epoch: 5   Global Step: 96510   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:43,346-Speed 9581.36 samples/sec   Loss 7.2559   LearningRate 0.0505   Epoch: 5   Global Step: 96520   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:44,384-Speed 9873.16 samples/sec   Loss 7.1762   LearningRate 0.0505   Epoch: 5   Global Step: 96530   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:45,418-Speed 9908.84 samples/sec   Loss 7.1950   LearningRate 0.0505   Epoch: 5   Global Step: 96540   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:46,476-Speed 9689.05 samples/sec   Loss 7.2399   LearningRate 0.0505   Epoch: 5   Global Step: 96550   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:25:47,551-Speed 9531.16 samples/sec   Loss 7.3249   LearningRate 0.0505   Epoch: 5   Global Step: 96560   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:48,632-Speed 9475.06 samples/sec   Loss 7.1950   LearningRate 0.0505   Epoch: 5   Global Step: 96570   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:49,739-Speed 9256.28 samples/sec   Loss 7.2380   LearningRate 0.0505   Epoch: 5   Global Step: 96580   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:50,839-Speed 9314.89 samples/sec   Loss 7.1161   LearningRate 0.0505   Epoch: 5   Global Step: 96590   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:51,922-Speed 9455.71 samples/sec   Loss 7.3035   LearningRate 0.0505   Epoch: 5   Global Step: 96600   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:53,011-Speed 9411.32 samples/sec   Loss 7.1060   LearningRate 0.0505   Epoch: 5   Global Step: 96610   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:54,109-Speed 9328.77 samples/sec   Loss 7.3096   LearningRate 0.0505   Epoch: 5   Global Step: 96620   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:55,213-Speed 9280.99 samples/sec   Loss 7.2105   LearningRate 0.0505   Epoch: 5   Global Step: 96630   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:56,265-Speed 9751.33 samples/sec   Loss 7.2210   LearningRate 0.0505   Epoch: 5   Global Step: 96640   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:57,340-Speed 9526.09 samples/sec   Loss 7.3183   LearningRate 0.0505   Epoch: 5   Global Step: 96650   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:58,393-Speed 9730.36 samples/sec   Loss 7.2727   LearningRate 0.0505   Epoch: 5   Global Step: 96660   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:25:59,476-Speed 9459.81 samples/sec   Loss 7.1588   LearningRate 0.0505   Epoch: 5   Global Step: 96670   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:00,559-Speed 9467.24 samples/sec   Loss 7.2935   LearningRate 0.0505   Epoch: 5   Global Step: 96680   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:01,611-Speed 9736.23 samples/sec   Loss 7.2827   LearningRate 0.0505   Epoch: 5   Global Step: 96690   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:02,703-Speed 9387.60 samples/sec   Loss 7.2052   LearningRate 0.0505   Epoch: 5   Global Step: 96700   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:03,754-Speed 9741.10 samples/sec   Loss 7.2308   LearningRate 0.0505   Epoch: 5   Global Step: 96710   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:04,838-Speed 9455.79 samples/sec   Loss 7.2463   LearningRate 0.0504   Epoch: 5   Global Step: 96720   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:05,913-Speed 9527.10 samples/sec   Loss 7.1357   LearningRate 0.0504   Epoch: 5   Global Step: 96730   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:06,974-Speed 9658.53 samples/sec   Loss 7.1473   LearningRate 0.0504   Epoch: 5   Global Step: 96740   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:08,047-Speed 9550.25 samples/sec   Loss 7.2076   LearningRate 0.0504   Epoch: 5   Global Step: 96750   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:09,169-Speed 9134.70 samples/sec   Loss 7.2625   LearningRate 0.0504   Epoch: 5   Global Step: 96760   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:10,273-Speed 9275.96 samples/sec   Loss 7.2599   LearningRate 0.0504   Epoch: 5   Global Step: 96770   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:11,378-Speed 9267.52 samples/sec   Loss 7.1840   LearningRate 0.0504   Epoch: 5   Global Step: 96780   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:12,456-Speed 9511.24 samples/sec   Loss 7.1098   LearningRate 0.0504   Epoch: 5   Global Step: 96790   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:13,517-Speed 9656.17 samples/sec   Loss 7.2385   LearningRate 0.0504   Epoch: 5   Global Step: 96800   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:14,584-Speed 9598.84 samples/sec   Loss 7.1214   LearningRate 0.0504   Epoch: 5   Global Step: 96810   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:15,637-Speed 9739.97 samples/sec   Loss 7.2192   LearningRate 0.0504   Epoch: 5   Global Step: 96820   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:16,666-Speed 9960.29 samples/sec   Loss 7.1819   LearningRate 0.0504   Epoch: 5   Global Step: 96830   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:17,721-Speed 9705.40 samples/sec   Loss 7.1518   LearningRate 0.0504   Epoch: 5   Global Step: 96840   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:18,824-Speed 9291.15 samples/sec   Loss 7.1400   LearningRate 0.0504   Epoch: 5   Global Step: 96850   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:19,908-Speed 9455.08 samples/sec   Loss 7.2898   LearningRate 0.0504   Epoch: 5   Global Step: 96860   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:20,962-Speed 9714.05 samples/sec   Loss 7.3221   LearningRate 0.0504   Epoch: 5   Global Step: 96870   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:22,062-Speed 9316.64 samples/sec   Loss 7.2841   LearningRate 0.0504   Epoch: 5   Global Step: 96880   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:23,137-Speed 9527.58 samples/sec   Loss 7.2167   LearningRate 0.0504   Epoch: 5   Global Step: 96890   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:24,225-Speed 9417.55 samples/sec   Loss 7.3017   LearningRate 0.0504   Epoch: 5   Global Step: 96900   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:25,315-Speed 9398.26 samples/sec   Loss 7.1732   LearningRate 0.0504   Epoch: 5   Global Step: 96910   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:26,387-Speed 9560.79 samples/sec   Loss 7.1389   LearningRate 0.0504   Epoch: 5   Global Step: 96920   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:27,460-Speed 9554.94 samples/sec   Loss 7.2703   LearningRate 0.0504   Epoch: 5   Global Step: 96930   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:28,534-Speed 9531.20 samples/sec   Loss 7.1259   LearningRate 0.0504   Epoch: 5   Global Step: 96940   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:29,622-Speed 9417.98 samples/sec   Loss 7.1453   LearningRate 0.0503   Epoch: 5   Global Step: 96950   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:30,693-Speed 9572.85 samples/sec   Loss 7.0821   LearningRate 0.0503   Epoch: 5   Global Step: 96960   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:31,736-Speed 9815.26 samples/sec   Loss 7.2795   LearningRate 0.0503   Epoch: 5   Global Step: 96970   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:32,832-Speed 9364.28 samples/sec   Loss 7.0830   LearningRate 0.0503   Epoch: 5   Global Step: 96980   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:33,964-Speed 9051.14 samples/sec   Loss 7.2427   LearningRate 0.0503   Epoch: 5   Global Step: 96990   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:35,053-Speed 9409.15 samples/sec   Loss 7.2598   LearningRate 0.0503   Epoch: 5   Global Step: 97000   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:36,125-Speed 9559.53 samples/sec   Loss 7.3036   LearningRate 0.0503   Epoch: 5   Global Step: 97010   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:37,183-Speed 9679.43 samples/sec   Loss 7.1941   LearningRate 0.0503   Epoch: 5   Global Step: 97020   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:38,246-Speed 9643.01 samples/sec   Loss 7.2606   LearningRate 0.0503   Epoch: 5   Global Step: 97030   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:39,317-Speed 9565.50 samples/sec   Loss 7.2891   LearningRate 0.0503   Epoch: 5   Global Step: 97040   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:40,434-Speed 9173.10 samples/sec   Loss 7.2167   LearningRate 0.0503   Epoch: 5   Global Step: 97050   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:41,530-Speed 9345.77 samples/sec   Loss 7.1342   LearningRate 0.0503   Epoch: 5   Global Step: 97060   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:42,618-Speed 9416.82 samples/sec   Loss 7.2010   LearningRate 0.0503   Epoch: 5   Global Step: 97070   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:43,671-Speed 9726.93 samples/sec   Loss 7.1337   LearningRate 0.0503   Epoch: 5   Global Step: 97080   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:44,749-Speed 9504.26 samples/sec   Loss 7.1794   LearningRate 0.0503   Epoch: 5   Global Step: 97090   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:45,804-Speed 9719.39 samples/sec   Loss 7.1227   LearningRate 0.0503   Epoch: 5   Global Step: 97100   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:46,930-Speed 9097.58 samples/sec   Loss 7.1903   LearningRate 0.0503   Epoch: 5   Global Step: 97110   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:48,004-Speed 9539.87 samples/sec   Loss 7.2422   LearningRate 0.0503   Epoch: 5   Global Step: 97120   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:49,070-Speed 9613.59 samples/sec   Loss 7.2460   LearningRate 0.0503   Epoch: 5   Global Step: 97130   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:50,140-Speed 9574.68 samples/sec   Loss 7.3515   LearningRate 0.0503   Epoch: 5   Global Step: 97140   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:51,230-Speed 9399.25 samples/sec   Loss 7.2400   LearningRate 0.0503   Epoch: 5   Global Step: 97150   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:52,283-Speed 9726.29 samples/sec   Loss 7.2929   LearningRate 0.0503   Epoch: 5   Global Step: 97160   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-04-11 15:26:53,371-Speed 9419.42 samples/sec   Loss 7.3127   LearningRate 0.0503   Epoch: 5   Global Step: 97170   Fp16 Grad Scale: 262144   Required: 10 hours
Training: 2022-04-11 15:26:54,455-Speed 9450.82 samples/sec   Loss 7.2349   LearningRate 0.0503   Epoch: 5   Global Step: 97180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:26:55,519-Speed 9627.37 samples/sec   Loss 7.2452   LearningRate 0.0502   Epoch: 5   Global Step: 97190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:26:56,638-Speed 9165.71 samples/sec   Loss 7.2908   LearningRate 0.0502   Epoch: 5   Global Step: 97200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:26:57,680-Speed 9833.63 samples/sec   Loss 7.1538   LearningRate 0.0502   Epoch: 5   Global Step: 97210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:26:58,750-Speed 9576.39 samples/sec   Loss 7.2549   LearningRate 0.0502   Epoch: 5   Global Step: 97220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:26:59,841-Speed 9390.12 samples/sec   Loss 7.3053   LearningRate 0.0502   Epoch: 5   Global Step: 97230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:00,926-Speed 9438.77 samples/sec   Loss 7.1610   LearningRate 0.0502   Epoch: 5   Global Step: 97240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:02,055-Speed 9073.58 samples/sec   Loss 7.1817   LearningRate 0.0502   Epoch: 5   Global Step: 97250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:03,188-Speed 9049.17 samples/sec   Loss 7.2178   LearningRate 0.0502   Epoch: 5   Global Step: 97260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:04,240-Speed 9733.84 samples/sec   Loss 7.2154   LearningRate 0.0502   Epoch: 5   Global Step: 97270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:05,336-Speed 9354.65 samples/sec   Loss 7.0629   LearningRate 0.0502   Epoch: 5   Global Step: 97280   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:27:06,415-Speed 9488.09 samples/sec   Loss 7.2974   LearningRate 0.0502   Epoch: 5   Global Step: 97290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:07,544-Speed 9078.56 samples/sec   Loss 7.1511   LearningRate 0.0502   Epoch: 5   Global Step: 97300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:08,602-Speed 9683.15 samples/sec   Loss 7.1657   LearningRate 0.0502   Epoch: 5   Global Step: 97310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:09,718-Speed 9182.98 samples/sec   Loss 7.1312   LearningRate 0.0502   Epoch: 5   Global Step: 97320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:10,761-Speed 9822.72 samples/sec   Loss 7.2228   LearningRate 0.0502   Epoch: 5   Global Step: 97330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:11,869-Speed 9247.35 samples/sec   Loss 7.1717   LearningRate 0.0502   Epoch: 5   Global Step: 97340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:13,000-Speed 9054.13 samples/sec   Loss 7.2045   LearningRate 0.0502   Epoch: 5   Global Step: 97350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:14,114-Speed 9197.62 samples/sec   Loss 7.0325   LearningRate 0.0502   Epoch: 5   Global Step: 97360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:15,220-Speed 9273.28 samples/sec   Loss 7.2191   LearningRate 0.0502   Epoch: 5   Global Step: 97370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:16,331-Speed 9217.29 samples/sec   Loss 7.1898   LearningRate 0.0502   Epoch: 5   Global Step: 97380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:17,420-Speed 9412.01 samples/sec   Loss 7.2279   LearningRate 0.0502   Epoch: 5   Global Step: 97390   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:27:18,514-Speed 9364.35 samples/sec   Loss 7.1886   LearningRate 0.0502   Epoch: 5   Global Step: 97400   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:27:19,603-Speed 9411.66 samples/sec   Loss 7.1346   LearningRate 0.0502   Epoch: 5   Global Step: 97410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:20,655-Speed 9732.47 samples/sec   Loss 7.1692   LearningRate 0.0501   Epoch: 5   Global Step: 97420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:21,752-Speed 9345.84 samples/sec   Loss 7.1837   LearningRate 0.0501   Epoch: 5   Global Step: 97430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:22,827-Speed 9523.76 samples/sec   Loss 7.1700   LearningRate 0.0501   Epoch: 5   Global Step: 97440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:23,915-Speed 9423.36 samples/sec   Loss 7.2556   LearningRate 0.0501   Epoch: 5   Global Step: 97450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:25,010-Speed 9356.78 samples/sec   Loss 7.1507   LearningRate 0.0501   Epoch: 5   Global Step: 97460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:26,097-Speed 9423.06 samples/sec   Loss 7.0979   LearningRate 0.0501   Epoch: 5   Global Step: 97470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:27,212-Speed 9191.72 samples/sec   Loss 7.2846   LearningRate 0.0501   Epoch: 5   Global Step: 97480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:28,288-Speed 9525.02 samples/sec   Loss 7.2369   LearningRate 0.0501   Epoch: 5   Global Step: 97490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:29,372-Speed 9450.36 samples/sec   Loss 7.1664   LearningRate 0.0501   Epoch: 5   Global Step: 97500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:30,444-Speed 9552.12 samples/sec   Loss 7.1054   LearningRate 0.0501   Epoch: 5   Global Step: 97510   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:27:31,481-Speed 9881.16 samples/sec   Loss 7.1505   LearningRate 0.0501   Epoch: 5   Global Step: 97520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:32,539-Speed 9687.59 samples/sec   Loss 7.1081   LearningRate 0.0501   Epoch: 5   Global Step: 97530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:33,625-Speed 9433.57 samples/sec   Loss 7.1958   LearningRate 0.0501   Epoch: 5   Global Step: 97540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:34,753-Speed 9082.09 samples/sec   Loss 7.1827   LearningRate 0.0501   Epoch: 5   Global Step: 97550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:35,818-Speed 9628.21 samples/sec   Loss 7.2700   LearningRate 0.0501   Epoch: 5   Global Step: 97560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:36,898-Speed 9484.29 samples/sec   Loss 7.1665   LearningRate 0.0501   Epoch: 5   Global Step: 97570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:37,989-Speed 9393.25 samples/sec   Loss 7.1497   LearningRate 0.0501   Epoch: 5   Global Step: 97580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:39,073-Speed 9452.77 samples/sec   Loss 7.1935   LearningRate 0.0501   Epoch: 5   Global Step: 97590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:40,144-Speed 9571.14 samples/sec   Loss 7.2459   LearningRate 0.0501   Epoch: 5   Global Step: 97600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:27:41,208-Speed 9626.65 samples/sec   Loss 7.1491   LearningRate 0.0501   Epoch: 5   Global Step: 97610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:27:42,259-Speed 9747.05 samples/sec   Loss 7.1793   LearningRate 0.0501   Epoch: 5   Global Step: 97620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:27:43,315-Speed 9703.18 samples/sec   Loss 7.2410   LearningRate 0.0501   Epoch: 5   Global Step: 97630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:27:44,363-Speed 9769.93 samples/sec   Loss 7.2333   LearningRate 0.0501   Epoch: 5   Global Step: 97640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:27:45,445-Speed 9474.67 samples/sec   Loss 7.2103   LearningRate 0.0501   Epoch: 5   Global Step: 97650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:27:46,506-Speed 9663.56 samples/sec   Loss 7.1485   LearningRate 0.0500   Epoch: 5   Global Step: 97660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:27:47,603-Speed 9337.26 samples/sec   Loss 7.1413   LearningRate 0.0500   Epoch: 5   Global Step: 97670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:27:48,728-Speed 9106.23 samples/sec   Loss 7.2206   LearningRate 0.0500   Epoch: 5   Global Step: 97680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:27:49,807-Speed 9493.98 samples/sec   Loss 7.0966   LearningRate 0.0500   Epoch: 5   Global Step: 97690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:27:50,843-Speed 9892.84 samples/sec   Loss 7.1590   LearningRate 0.0500   Epoch: 5   Global Step: 97700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:51,907-Speed 9631.47 samples/sec   Loss 7.1917   LearningRate 0.0500   Epoch: 5   Global Step: 97710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:52,949-Speed 9829.19 samples/sec   Loss 7.2038   LearningRate 0.0500   Epoch: 5   Global Step: 97720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:54,069-Speed 9142.54 samples/sec   Loss 7.2131   LearningRate 0.0500   Epoch: 5   Global Step: 97730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:55,151-Speed 9473.87 samples/sec   Loss 7.2299   LearningRate 0.0500   Epoch: 5   Global Step: 97740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:56,258-Speed 9264.75 samples/sec   Loss 7.0421   LearningRate 0.0500   Epoch: 5   Global Step: 97750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:57,404-Speed 8938.95 samples/sec   Loss 7.2244   LearningRate 0.0500   Epoch: 5   Global Step: 97760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:58,460-Speed 9705.84 samples/sec   Loss 7.2462   LearningRate 0.0500   Epoch: 5   Global Step: 97770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:27:59,503-Speed 9825.19 samples/sec   Loss 7.2485   LearningRate 0.0500   Epoch: 5   Global Step: 97780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:00,596-Speed 9369.90 samples/sec   Loss 7.2243   LearningRate 0.0500   Epoch: 5   Global Step: 97790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:01,697-Speed 9305.62 samples/sec   Loss 7.2120   LearningRate 0.0500   Epoch: 5   Global Step: 97800   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:28:02,785-Speed 9422.74 samples/sec   Loss 7.2168   LearningRate 0.0500   Epoch: 5   Global Step: 97810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:03,862-Speed 9511.39 samples/sec   Loss 7.2019   LearningRate 0.0500   Epoch: 5   Global Step: 97820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:04,928-Speed 9608.70 samples/sec   Loss 7.2587   LearningRate 0.0500   Epoch: 5   Global Step: 97830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:06,024-Speed 9353.63 samples/sec   Loss 7.2685   LearningRate 0.0500   Epoch: 5   Global Step: 97840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:07,129-Speed 9269.80 samples/sec   Loss 7.1431   LearningRate 0.0500   Epoch: 5   Global Step: 97850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:08,230-Speed 9304.51 samples/sec   Loss 7.2586   LearningRate 0.0500   Epoch: 5   Global Step: 97860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:09,325-Speed 9355.11 samples/sec   Loss 7.2250   LearningRate 0.0500   Epoch: 5   Global Step: 97870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:10,444-Speed 9158.49 samples/sec   Loss 7.1324   LearningRate 0.0500   Epoch: 5   Global Step: 97880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:11,577-Speed 9045.04 samples/sec   Loss 7.0383   LearningRate 0.0500   Epoch: 5   Global Step: 97890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:12,657-Speed 9483.82 samples/sec   Loss 7.2495   LearningRate 0.0499   Epoch: 5   Global Step: 97900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:13,742-Speed 9440.17 samples/sec   Loss 7.1768   LearningRate 0.0499   Epoch: 5   Global Step: 97910   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:28:14,834-Speed 9390.16 samples/sec   Loss 7.1275   LearningRate 0.0499   Epoch: 5   Global Step: 97920   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:28:15,940-Speed 9267.73 samples/sec   Loss 7.1562   LearningRate 0.0499   Epoch: 5   Global Step: 97930   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:28:17,027-Speed 9424.93 samples/sec   Loss 7.2049   LearningRate 0.0499   Epoch: 5   Global Step: 97940   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:28:18,093-Speed 9603.39 samples/sec   Loss 7.0857   LearningRate 0.0499   Epoch: 5   Global Step: 97950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:19,187-Speed 9366.47 samples/sec   Loss 7.1705   LearningRate 0.0499   Epoch: 5   Global Step: 97960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:20,287-Speed 9312.76 samples/sec   Loss 7.1475   LearningRate 0.0499   Epoch: 5   Global Step: 97970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:21,365-Speed 9504.06 samples/sec   Loss 7.1852   LearningRate 0.0499   Epoch: 5   Global Step: 97980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:22,474-Speed 9242.13 samples/sec   Loss 7.2440   LearningRate 0.0499   Epoch: 5   Global Step: 97990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:23,585-Speed 9223.17 samples/sec   Loss 7.1621   LearningRate 0.0499   Epoch: 5   Global Step: 98000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:28:45,655-[lfw][98000]XNorm: 11.242619
Training: 2022-04-11 15:28:45,656-[lfw][98000]Accuracy-Flip: 0.99450+-0.00299
Training: 2022-04-11 15:28:45,657-[lfw][98000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:29:11,162-[cfp_fp][98000]XNorm: 9.602968
Training: 2022-04-11 15:29:11,163-[cfp_fp][98000]Accuracy-Flip: 0.95414+-0.01179
Training: 2022-04-11 15:29:11,164-[cfp_fp][98000]Accuracy-Highest: 0.95729
Training: 2022-04-11 15:29:33,200-[agedb_30][98000]XNorm: 10.926428
Training: 2022-04-11 15:29:33,201-[agedb_30][98000]Accuracy-Flip: 0.95817+-0.00935
Training: 2022-04-11 15:29:33,201-[agedb_30][98000]Accuracy-Highest: 0.96317
Training: 2022-04-11 15:29:34,286-Speed 144.84 samples/sec   Loss 7.0843   LearningRate 0.0499   Epoch: 5   Global Step: 98010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:29:35,359-Speed 9548.15 samples/sec   Loss 7.1387   LearningRate 0.0499   Epoch: 5   Global Step: 98020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:29:36,422-Speed 9637.52 samples/sec   Loss 7.1830   LearningRate 0.0499   Epoch: 5   Global Step: 98030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:29:37,510-Speed 9417.36 samples/sec   Loss 7.1920   LearningRate 0.0499   Epoch: 5   Global Step: 98040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:29:38,612-Speed 9301.58 samples/sec   Loss 7.1992   LearningRate 0.0499   Epoch: 5   Global Step: 98050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:29:39,695-Speed 9458.50 samples/sec   Loss 7.0547   LearningRate 0.0499   Epoch: 5   Global Step: 98060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:29:40,788-Speed 9377.14 samples/sec   Loss 7.1584   LearningRate 0.0499   Epoch: 5   Global Step: 98070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:29:41,848-Speed 9669.64 samples/sec   Loss 7.2171   LearningRate 0.0499   Epoch: 5   Global Step: 98080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:29:42,944-Speed 9344.31 samples/sec   Loss 7.2410   LearningRate 0.0499   Epoch: 5   Global Step: 98090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:29:44,034-Speed 9399.45 samples/sec   Loss 7.1006   LearningRate 0.0499   Epoch: 5   Global Step: 98100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:29:45,119-Speed 9443.36 samples/sec   Loss 7.0862   LearningRate 0.0499   Epoch: 5   Global Step: 98110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:46,197-Speed 9505.40 samples/sec   Loss 7.2215   LearningRate 0.0499   Epoch: 5   Global Step: 98120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:47,308-Speed 9225.59 samples/sec   Loss 7.1109   LearningRate 0.0498   Epoch: 5   Global Step: 98130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:48,399-Speed 9387.93 samples/sec   Loss 7.2681   LearningRate 0.0498   Epoch: 5   Global Step: 98140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:49,487-Speed 9418.27 samples/sec   Loss 7.1271   LearningRate 0.0498   Epoch: 5   Global Step: 98150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:50,575-Speed 9411.66 samples/sec   Loss 7.1265   LearningRate 0.0498   Epoch: 5   Global Step: 98160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:51,656-Speed 9477.92 samples/sec   Loss 7.0474   LearningRate 0.0498   Epoch: 5   Global Step: 98170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:52,689-Speed 9918.90 samples/sec   Loss 7.2549   LearningRate 0.0498   Epoch: 5   Global Step: 98180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:53,770-Speed 9482.20 samples/sec   Loss 7.1418   LearningRate 0.0498   Epoch: 5   Global Step: 98190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:54,817-Speed 9791.18 samples/sec   Loss 7.1389   LearningRate 0.0498   Epoch: 5   Global Step: 98200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:55,913-Speed 9348.35 samples/sec   Loss 7.1150   LearningRate 0.0498   Epoch: 5   Global Step: 98210   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:29:56,981-Speed 9589.99 samples/sec   Loss 7.2225   LearningRate 0.0498   Epoch: 5   Global Step: 98220   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:29:58,015-Speed 9909.68 samples/sec   Loss 7.1891   LearningRate 0.0498   Epoch: 5   Global Step: 98230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:29:59,123-Speed 9244.56 samples/sec   Loss 7.0658   LearningRate 0.0498   Epoch: 5   Global Step: 98240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:00,262-Speed 8995.43 samples/sec   Loss 7.2784   LearningRate 0.0498   Epoch: 5   Global Step: 98250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:01,308-Speed 9798.02 samples/sec   Loss 7.1840   LearningRate 0.0498   Epoch: 5   Global Step: 98260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:02,390-Speed 9473.85 samples/sec   Loss 7.2719   LearningRate 0.0498   Epoch: 5   Global Step: 98270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:03,480-Speed 9399.18 samples/sec   Loss 7.1324   LearningRate 0.0498   Epoch: 5   Global Step: 98280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:04,576-Speed 9350.21 samples/sec   Loss 7.1741   LearningRate 0.0498   Epoch: 5   Global Step: 98290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:05,708-Speed 9049.77 samples/sec   Loss 7.2224   LearningRate 0.0498   Epoch: 5   Global Step: 98300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:06,800-Speed 9388.54 samples/sec   Loss 7.1624   LearningRate 0.0498   Epoch: 5   Global Step: 98310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:07,876-Speed 9520.08 samples/sec   Loss 7.1452   LearningRate 0.0498   Epoch: 5   Global Step: 98320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:08,952-Speed 9527.03 samples/sec   Loss 7.1856   LearningRate 0.0498   Epoch: 5   Global Step: 98330   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:30:09,989-Speed 9876.36 samples/sec   Loss 7.1211   LearningRate 0.0498   Epoch: 5   Global Step: 98340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:11,046-Speed 9691.30 samples/sec   Loss 7.2812   LearningRate 0.0498   Epoch: 5   Global Step: 98350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:12,170-Speed 9114.05 samples/sec   Loss 7.1984   LearningRate 0.0498   Epoch: 5   Global Step: 98360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:13,239-Speed 9586.21 samples/sec   Loss 7.1640   LearningRate 0.0497   Epoch: 5   Global Step: 98370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:14,312-Speed 9549.29 samples/sec   Loss 7.1900   LearningRate 0.0497   Epoch: 5   Global Step: 98380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:15,409-Speed 9341.19 samples/sec   Loss 7.2164   LearningRate 0.0497   Epoch: 5   Global Step: 98390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:16,480-Speed 9566.15 samples/sec   Loss 7.0084   LearningRate 0.0497   Epoch: 5   Global Step: 98400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:17,552-Speed 9554.24 samples/sec   Loss 7.2203   LearningRate 0.0497   Epoch: 5   Global Step: 98410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:18,592-Speed 9850.39 samples/sec   Loss 7.0973   LearningRate 0.0497   Epoch: 5   Global Step: 98420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:19,654-Speed 9647.03 samples/sec   Loss 7.1246   LearningRate 0.0497   Epoch: 5   Global Step: 98430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:20,749-Speed 9362.25 samples/sec   Loss 7.1131   LearningRate 0.0497   Epoch: 5   Global Step: 98440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:21,847-Speed 9334.44 samples/sec   Loss 7.1684   LearningRate 0.0497   Epoch: 5   Global Step: 98450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:22,926-Speed 9495.30 samples/sec   Loss 7.1186   LearningRate 0.0497   Epoch: 5   Global Step: 98460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:23,968-Speed 9831.94 samples/sec   Loss 7.1953   LearningRate 0.0497   Epoch: 5   Global Step: 98470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:25,075-Speed 9250.95 samples/sec   Loss 7.0961   LearningRate 0.0497   Epoch: 5   Global Step: 98480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:26,156-Speed 9483.54 samples/sec   Loss 7.2199   LearningRate 0.0497   Epoch: 5   Global Step: 98490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:27,223-Speed 9598.37 samples/sec   Loss 7.1680   LearningRate 0.0497   Epoch: 5   Global Step: 98500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:28,282-Speed 9676.38 samples/sec   Loss 7.2294   LearningRate 0.0497   Epoch: 5   Global Step: 98510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:29,364-Speed 9463.50 samples/sec   Loss 7.1796   LearningRate 0.0497   Epoch: 5   Global Step: 98520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:30,453-Speed 9408.62 samples/sec   Loss 7.1495   LearningRate 0.0497   Epoch: 5   Global Step: 98530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:31,584-Speed 9061.78 samples/sec   Loss 7.2211   LearningRate 0.0497   Epoch: 5   Global Step: 98540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:32,627-Speed 9825.92 samples/sec   Loss 7.1797   LearningRate 0.0497   Epoch: 5   Global Step: 98550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:30:33,744-Speed 9171.51 samples/sec   Loss 7.1974   LearningRate 0.0497   Epoch: 5   Global Step: 98560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:34,847-Speed 9289.19 samples/sec   Loss 7.1880   LearningRate 0.0497   Epoch: 5   Global Step: 98570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:35,929-Speed 9469.73 samples/sec   Loss 7.2616   LearningRate 0.0497   Epoch: 5   Global Step: 98580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:37,013-Speed 9458.57 samples/sec   Loss 7.2973   LearningRate 0.0497   Epoch: 5   Global Step: 98590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:38,104-Speed 9394.94 samples/sec   Loss 7.1204   LearningRate 0.0497   Epoch: 5   Global Step: 98600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:39,176-Speed 9555.03 samples/sec   Loss 7.2634   LearningRate 0.0496   Epoch: 5   Global Step: 98610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:40,229-Speed 9729.43 samples/sec   Loss 7.1121   LearningRate 0.0496   Epoch: 5   Global Step: 98620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:41,294-Speed 9620.11 samples/sec   Loss 7.3123   LearningRate 0.0496   Epoch: 5   Global Step: 98630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:42,397-Speed 9291.36 samples/sec   Loss 7.1080   LearningRate 0.0496   Epoch: 5   Global Step: 98640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:43,477-Speed 9483.48 samples/sec   Loss 6.9837   LearningRate 0.0496   Epoch: 5   Global Step: 98650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:44,605-Speed 9089.42 samples/sec   Loss 7.1595   LearningRate 0.0496   Epoch: 5   Global Step: 98660   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:30:45,696-Speed 9386.22 samples/sec   Loss 7.2427   LearningRate 0.0496   Epoch: 5   Global Step: 98670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:46,786-Speed 9398.42 samples/sec   Loss 7.3039   LearningRate 0.0496   Epoch: 5   Global Step: 98680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:47,867-Speed 9483.44 samples/sec   Loss 7.1276   LearningRate 0.0496   Epoch: 5   Global Step: 98690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:48,973-Speed 9257.26 samples/sec   Loss 7.2526   LearningRate 0.0496   Epoch: 5   Global Step: 98700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:50,032-Speed 9676.72 samples/sec   Loss 7.0086   LearningRate 0.0496   Epoch: 5   Global Step: 98710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:51,121-Speed 9413.38 samples/sec   Loss 7.2582   LearningRate 0.0496   Epoch: 5   Global Step: 98720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:52,261-Speed 8987.53 samples/sec   Loss 7.1471   LearningRate 0.0496   Epoch: 5   Global Step: 98730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:53,396-Speed 9024.04 samples/sec   Loss 7.1007   LearningRate 0.0496   Epoch: 5   Global Step: 98740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:54,482-Speed 9432.79 samples/sec   Loss 7.1801   LearningRate 0.0496   Epoch: 5   Global Step: 98750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:55,568-Speed 9440.13 samples/sec   Loss 7.1681   LearningRate 0.0496   Epoch: 5   Global Step: 98760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:56,661-Speed 9370.62 samples/sec   Loss 7.1479   LearningRate 0.0496   Epoch: 5   Global Step: 98770   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:30:57,730-Speed 9586.39 samples/sec   Loss 7.1730   LearningRate 0.0496   Epoch: 5   Global Step: 98780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:58,825-Speed 9360.03 samples/sec   Loss 7.0826   LearningRate 0.0496   Epoch: 5   Global Step: 98790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:30:59,896-Speed 9566.07 samples/sec   Loss 7.2082   LearningRate 0.0496   Epoch: 5   Global Step: 98800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:31:00,954-Speed 9678.75 samples/sec   Loss 7.1871   LearningRate 0.0496   Epoch: 5   Global Step: 98810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:31:02,038-Speed 9461.92 samples/sec   Loss 7.2206   LearningRate 0.0496   Epoch: 5   Global Step: 98820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:31:03,107-Speed 9581.26 samples/sec   Loss 7.1762   LearningRate 0.0496   Epoch: 5   Global Step: 98830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:31:04,188-Speed 9480.36 samples/sec   Loss 7.1960   LearningRate 0.0495   Epoch: 5   Global Step: 98840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:31:05,255-Speed 9598.43 samples/sec   Loss 7.1867   LearningRate 0.0495   Epoch: 5   Global Step: 98850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:31:06,353-Speed 9330.22 samples/sec   Loss 7.1797   LearningRate 0.0495   Epoch: 5   Global Step: 98860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:31:07,438-Speed 9444.65 samples/sec   Loss 7.1689   LearningRate 0.0495   Epoch: 5   Global Step: 98870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:31:08,548-Speed 9234.76 samples/sec   Loss 7.2660   LearningRate 0.0495   Epoch: 5   Global Step: 98880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:31:09,624-Speed 9514.48 samples/sec   Loss 7.2012   LearningRate 0.0495   Epoch: 5   Global Step: 98890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:31:10,728-Speed 9288.10 samples/sec   Loss 7.1435   LearningRate 0.0495   Epoch: 5   Global Step: 98900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:11,828-Speed 9308.03 samples/sec   Loss 7.1368   LearningRate 0.0495   Epoch: 5   Global Step: 98910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:12,926-Speed 9333.13 samples/sec   Loss 7.1583   LearningRate 0.0495   Epoch: 5   Global Step: 98920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:14,061-Speed 9024.59 samples/sec   Loss 7.1144   LearningRate 0.0495   Epoch: 5   Global Step: 98930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:15,150-Speed 9409.37 samples/sec   Loss 7.1878   LearningRate 0.0495   Epoch: 5   Global Step: 98940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:16,222-Speed 9558.77 samples/sec   Loss 7.2765   LearningRate 0.0495   Epoch: 5   Global Step: 98950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:17,278-Speed 9708.82 samples/sec   Loss 7.2308   LearningRate 0.0495   Epoch: 5   Global Step: 98960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:18,394-Speed 9177.31 samples/sec   Loss 7.0797   LearningRate 0.0495   Epoch: 5   Global Step: 98970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:19,519-Speed 9108.16 samples/sec   Loss 7.3203   LearningRate 0.0495   Epoch: 5   Global Step: 98980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:20,613-Speed 9369.86 samples/sec   Loss 7.2466   LearningRate 0.0495   Epoch: 5   Global Step: 98990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:21,667-Speed 9715.97 samples/sec   Loss 7.0735   LearningRate 0.0495   Epoch: 5   Global Step: 99000   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:31:22,761-Speed 9364.14 samples/sec   Loss 7.2686   LearningRate 0.0495   Epoch: 5   Global Step: 99010   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:31:23,841-Speed 9487.53 samples/sec   Loss 7.2727   LearningRate 0.0495   Epoch: 5   Global Step: 99020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:24,958-Speed 9170.58 samples/sec   Loss 7.0370   LearningRate 0.0495   Epoch: 5   Global Step: 99030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:26,063-Speed 9278.24 samples/sec   Loss 7.1270   LearningRate 0.0495   Epoch: 5   Global Step: 99040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:27,162-Speed 9317.93 samples/sec   Loss 7.2064   LearningRate 0.0495   Epoch: 5   Global Step: 99050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:28,254-Speed 9388.45 samples/sec   Loss 7.1879   LearningRate 0.0495   Epoch: 5   Global Step: 99060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:29,325-Speed 9565.90 samples/sec   Loss 7.2728   LearningRate 0.0495   Epoch: 5   Global Step: 99070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:30,419-Speed 9363.36 samples/sec   Loss 7.1670   LearningRate 0.0494   Epoch: 5   Global Step: 99080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:31,514-Speed 9360.56 samples/sec   Loss 7.1482   LearningRate 0.0494   Epoch: 5   Global Step: 99090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:32,588-Speed 9534.71 samples/sec   Loss 7.1468   LearningRate 0.0494   Epoch: 5   Global Step: 99100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:33,647-Speed 9673.93 samples/sec   Loss 7.1440   LearningRate 0.0494   Epoch: 5   Global Step: 99110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:34,741-Speed 9370.78 samples/sec   Loss 7.0612   LearningRate 0.0494   Epoch: 5   Global Step: 99120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:35,839-Speed 9337.24 samples/sec   Loss 7.0986   LearningRate 0.0494   Epoch: 5   Global Step: 99130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:36,945-Speed 9261.22 samples/sec   Loss 7.1776   LearningRate 0.0494   Epoch: 5   Global Step: 99140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:38,063-Speed 9167.43 samples/sec   Loss 7.1946   LearningRate 0.0494   Epoch: 5   Global Step: 99150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:39,154-Speed 9390.18 samples/sec   Loss 7.0341   LearningRate 0.0494   Epoch: 5   Global Step: 99160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:40,248-Speed 9364.56 samples/sec   Loss 7.2615   LearningRate 0.0494   Epoch: 5   Global Step: 99170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:41,388-Speed 8984.36 samples/sec   Loss 7.2406   LearningRate 0.0494   Epoch: 5   Global Step: 99180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:42,479-Speed 9395.61 samples/sec   Loss 7.0872   LearningRate 0.0494   Epoch: 5   Global Step: 99190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:43,558-Speed 9490.23 samples/sec   Loss 7.1940   LearningRate 0.0494   Epoch: 5   Global Step: 99200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:44,653-Speed 9362.68 samples/sec   Loss 7.1506   LearningRate 0.0494   Epoch: 5   Global Step: 99210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:45,724-Speed 9568.06 samples/sec   Loss 7.1784   LearningRate 0.0494   Epoch: 5   Global Step: 99220   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:31:46,790-Speed 9604.04 samples/sec   Loss 7.1711   LearningRate 0.0494   Epoch: 5   Global Step: 99230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:47,896-Speed 9263.19 samples/sec   Loss 7.0766   LearningRate 0.0494   Epoch: 5   Global Step: 99240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:48,996-Speed 9317.75 samples/sec   Loss 7.1122   LearningRate 0.0494   Epoch: 5   Global Step: 99250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:50,089-Speed 9381.19 samples/sec   Loss 7.2151   LearningRate 0.0494   Epoch: 5   Global Step: 99260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:51,196-Speed 9255.86 samples/sec   Loss 7.1016   LearningRate 0.0494   Epoch: 5   Global Step: 99270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:52,301-Speed 9267.54 samples/sec   Loss 7.1409   LearningRate 0.0494   Epoch: 5   Global Step: 99280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:53,469-Speed 8774.67 samples/sec   Loss 7.2589   LearningRate 0.0494   Epoch: 5   Global Step: 99290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:54,553-Speed 9452.29 samples/sec   Loss 7.1528   LearningRate 0.0494   Epoch: 5   Global Step: 99300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:55,628-Speed 9531.47 samples/sec   Loss 7.0630   LearningRate 0.0494   Epoch: 5   Global Step: 99310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:56,710-Speed 9470.23 samples/sec   Loss 7.1356   LearningRate 0.0493   Epoch: 5   Global Step: 99320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:31:57,797-Speed 9421.19 samples/sec   Loss 7.2461   LearningRate 0.0493   Epoch: 5   Global Step: 99330   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:31:58,925-Speed 9084.08 samples/sec   Loss 7.1477   LearningRate 0.0493   Epoch: 5   Global Step: 99340   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:32:00,022-Speed 9337.17 samples/sec   Loss 7.1434   LearningRate 0.0493   Epoch: 5   Global Step: 99350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:01,118-Speed 9353.60 samples/sec   Loss 7.1724   LearningRate 0.0493   Epoch: 5   Global Step: 99360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:02,201-Speed 9465.97 samples/sec   Loss 7.1509   LearningRate 0.0493   Epoch: 5   Global Step: 99370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:03,294-Speed 9376.52 samples/sec   Loss 7.2702   LearningRate 0.0493   Epoch: 5   Global Step: 99380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:04,422-Speed 9078.32 samples/sec   Loss 7.0814   LearningRate 0.0493   Epoch: 5   Global Step: 99390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:05,523-Speed 9310.70 samples/sec   Loss 7.1887   LearningRate 0.0493   Epoch: 5   Global Step: 99400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:06,625-Speed 9289.74 samples/sec   Loss 7.1149   LearningRate 0.0493   Epoch: 5   Global Step: 99410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:07,727-Speed 9303.58 samples/sec   Loss 7.0609   LearningRate 0.0493   Epoch: 5   Global Step: 99420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:08,792-Speed 9623.57 samples/sec   Loss 7.1522   LearningRate 0.0493   Epoch: 5   Global Step: 99430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:09,873-Speed 9473.65 samples/sec   Loss 7.1623   LearningRate 0.0493   Epoch: 5   Global Step: 99440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:10,954-Speed 9478.89 samples/sec   Loss 7.2333   LearningRate 0.0493   Epoch: 5   Global Step: 99450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:12,060-Speed 9266.19 samples/sec   Loss 7.1453   LearningRate 0.0493   Epoch: 5   Global Step: 99460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:13,141-Speed 9482.61 samples/sec   Loss 7.1095   LearningRate 0.0493   Epoch: 5   Global Step: 99470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:14,194-Speed 9725.75 samples/sec   Loss 7.2565   LearningRate 0.0493   Epoch: 5   Global Step: 99480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:15,265-Speed 9567.49 samples/sec   Loss 7.0223   LearningRate 0.0493   Epoch: 5   Global Step: 99490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:16,369-Speed 9278.14 samples/sec   Loss 7.2566   LearningRate 0.0493   Epoch: 5   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:17,462-Speed 9375.10 samples/sec   Loss 7.3551   LearningRate 0.0493   Epoch: 5   Global Step: 99510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:18,560-Speed 9335.20 samples/sec   Loss 7.2019   LearningRate 0.0493   Epoch: 5   Global Step: 99520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:19,655-Speed 9352.34 samples/sec   Loss 7.1748   LearningRate 0.0493   Epoch: 5   Global Step: 99530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:20,716-Speed 9654.91 samples/sec   Loss 7.1640   LearningRate 0.0493   Epoch: 5   Global Step: 99540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:21,752-Speed 9900.08 samples/sec   Loss 7.1392   LearningRate 0.0493   Epoch: 5   Global Step: 99550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:22,823-Speed 9559.43 samples/sec   Loss 7.1114   LearningRate 0.0492   Epoch: 5   Global Step: 99560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:23,858-Speed 9897.67 samples/sec   Loss 7.1829   LearningRate 0.0492   Epoch: 5   Global Step: 99570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:24,975-Speed 9175.96 samples/sec   Loss 7.0953   LearningRate 0.0492   Epoch: 5   Global Step: 99580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:26,063-Speed 9415.22 samples/sec   Loss 7.0941   LearningRate 0.0492   Epoch: 5   Global Step: 99590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:27,139-Speed 9525.17 samples/sec   Loss 7.0166   LearningRate 0.0492   Epoch: 5   Global Step: 99600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:28,166-Speed 9970.45 samples/sec   Loss 7.1227   LearningRate 0.0492   Epoch: 5   Global Step: 99610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:29,202-Speed 9890.28 samples/sec   Loss 7.0066   LearningRate 0.0492   Epoch: 5   Global Step: 99620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:30,247-Speed 9805.04 samples/sec   Loss 7.2607   LearningRate 0.0492   Epoch: 5   Global Step: 99630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:31,338-Speed 9404.99 samples/sec   Loss 7.2163   LearningRate 0.0492   Epoch: 5   Global Step: 99640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:32,436-Speed 9329.98 samples/sec   Loss 7.1134   LearningRate 0.0492   Epoch: 5   Global Step: 99650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:33,483-Speed 9779.00 samples/sec   Loss 7.0900   LearningRate 0.0492   Epoch: 5   Global Step: 99660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:34,579-Speed 9352.78 samples/sec   Loss 7.2089   LearningRate 0.0492   Epoch: 5   Global Step: 99670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:35,723-Speed 8953.69 samples/sec   Loss 7.0691   LearningRate 0.0492   Epoch: 5   Global Step: 99680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:36,792-Speed 9582.95 samples/sec   Loss 7.1981   LearningRate 0.0492   Epoch: 5   Global Step: 99690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:37,846-Speed 9723.50 samples/sec   Loss 7.2326   LearningRate 0.0492   Epoch: 5   Global Step: 99700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:38,895-Speed 9768.98 samples/sec   Loss 7.2606   LearningRate 0.0492   Epoch: 5   Global Step: 99710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:39,988-Speed 9381.42 samples/sec   Loss 7.2275   LearningRate 0.0492   Epoch: 5   Global Step: 99720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:41,081-Speed 9374.17 samples/sec   Loss 7.2022   LearningRate 0.0492   Epoch: 5   Global Step: 99730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:42,192-Speed 9215.26 samples/sec   Loss 7.0780   LearningRate 0.0492   Epoch: 5   Global Step: 99740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:43,240-Speed 9780.08 samples/sec   Loss 7.1420   LearningRate 0.0492   Epoch: 5   Global Step: 99750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:44,268-Speed 9965.40 samples/sec   Loss 7.1065   LearningRate 0.0492   Epoch: 5   Global Step: 99760   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:32:45,388-Speed 9147.08 samples/sec   Loss 7.1549   LearningRate 0.0492   Epoch: 5   Global Step: 99770   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:32:46,457-Speed 9589.24 samples/sec   Loss 7.1441   LearningRate 0.0492   Epoch: 5   Global Step: 99780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:47,569-Speed 9207.95 samples/sec   Loss 7.0667   LearningRate 0.0491   Epoch: 5   Global Step: 99790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:48,647-Speed 9507.57 samples/sec   Loss 7.1829   LearningRate 0.0491   Epoch: 5   Global Step: 99800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:49,745-Speed 9333.71 samples/sec   Loss 7.1955   LearningRate 0.0491   Epoch: 5   Global Step: 99810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:50,809-Speed 9636.33 samples/sec   Loss 7.1572   LearningRate 0.0491   Epoch: 5   Global Step: 99820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:51,852-Speed 9820.86 samples/sec   Loss 7.1531   LearningRate 0.0491   Epoch: 5   Global Step: 99830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:52,892-Speed 9847.27 samples/sec   Loss 7.2235   LearningRate 0.0491   Epoch: 5   Global Step: 99840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:53,976-Speed 9458.36 samples/sec   Loss 7.1820   LearningRate 0.0491   Epoch: 5   Global Step: 99850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:55,103-Speed 9091.72 samples/sec   Loss 7.2633   LearningRate 0.0491   Epoch: 5   Global Step: 99860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:56,147-Speed 9809.59 samples/sec   Loss 7.2740   LearningRate 0.0491   Epoch: 5   Global Step: 99870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:32:57,241-Speed 9367.37 samples/sec   Loss 7.2660   LearningRate 0.0491   Epoch: 5   Global Step: 99880   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:32:58,331-Speed 9396.98 samples/sec   Loss 7.2134   LearningRate 0.0491   Epoch: 5   Global Step: 99890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:32:59,427-Speed 9347.58 samples/sec   Loss 7.0772   LearningRate 0.0491   Epoch: 5   Global Step: 99900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:33:00,503-Speed 9527.56 samples/sec   Loss 7.1660   LearningRate 0.0491   Epoch: 5   Global Step: 99910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:33:01,548-Speed 9803.17 samples/sec   Loss 7.1317   LearningRate 0.0491   Epoch: 5   Global Step: 99920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:33:02,621-Speed 9552.31 samples/sec   Loss 7.0761   LearningRate 0.0491   Epoch: 5   Global Step: 99930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:33:03,718-Speed 9336.33 samples/sec   Loss 7.0238   LearningRate 0.0491   Epoch: 5   Global Step: 99940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:33:04,775-Speed 9698.29 samples/sec   Loss 7.1338   LearningRate 0.0491   Epoch: 5   Global Step: 99950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:33:05,871-Speed 9342.08 samples/sec   Loss 7.0742   LearningRate 0.0491   Epoch: 5   Global Step: 99960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:33:06,958-Speed 9425.32 samples/sec   Loss 7.1054   LearningRate 0.0491   Epoch: 5   Global Step: 99970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:33:08,038-Speed 9496.39 samples/sec   Loss 7.1670   LearningRate 0.0491   Epoch: 5   Global Step: 99980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:33:09,129-Speed 9387.60 samples/sec   Loss 7.1467   LearningRate 0.0491   Epoch: 5   Global Step: 99990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:33:10,193-Speed 9632.83 samples/sec   Loss 7.1642   LearningRate 0.0491   Epoch: 5   Global Step: 100000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:33:32,597-[lfw][100000]XNorm: 11.507255
Training: 2022-04-11 15:33:32,598-[lfw][100000]Accuracy-Flip: 0.99550+-0.00289
Training: 2022-04-11 15:33:32,598-[lfw][100000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:33:58,166-[cfp_fp][100000]XNorm: 9.890199
Training: 2022-04-11 15:33:58,167-[cfp_fp][100000]Accuracy-Flip: 0.95857+-0.00852
Training: 2022-04-11 15:33:58,167-[cfp_fp][100000]Accuracy-Highest: 0.95857
Training: 2022-04-11 15:34:19,909-[agedb_30][100000]XNorm: 11.149160
Training: 2022-04-11 15:34:19,909-[agedb_30][100000]Accuracy-Flip: 0.96133+-0.00862
Training: 2022-04-11 15:34:19,910-[agedb_30][100000]Accuracy-Highest: 0.96317
Training: 2022-04-11 15:34:20,979-Speed 144.66 samples/sec   Loss 7.1048   LearningRate 0.0491   Epoch: 5   Global Step: 100010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:34:22,042-Speed 9639.67 samples/sec   Loss 7.1667   LearningRate 0.0491   Epoch: 5   Global Step: 100020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:34:23,132-Speed 9404.87 samples/sec   Loss 7.0967   LearningRate 0.0490   Epoch: 5   Global Step: 100030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:34:24,237-Speed 9272.13 samples/sec   Loss 7.1738   LearningRate 0.0490   Epoch: 5   Global Step: 100040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:34:25,294-Speed 9688.74 samples/sec   Loss 7.0243   LearningRate 0.0490   Epoch: 5   Global Step: 100050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:34:26,330-Speed 9890.55 samples/sec   Loss 7.2009   LearningRate 0.0490   Epoch: 5   Global Step: 100060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:34:27,399-Speed 9590.82 samples/sec   Loss 7.1478   LearningRate 0.0490   Epoch: 5   Global Step: 100070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:34:28,561-Speed 8813.23 samples/sec   Loss 7.1320   LearningRate 0.0490   Epoch: 5   Global Step: 100080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:34:29,625-Speed 9636.35 samples/sec   Loss 7.1917   LearningRate 0.0490   Epoch: 5   Global Step: 100090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:34:30,718-Speed 9371.02 samples/sec   Loss 7.0783   LearningRate 0.0490   Epoch: 5   Global Step: 100100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:34:31,783-Speed 9625.16 samples/sec   Loss 7.1734   LearningRate 0.0490   Epoch: 5   Global Step: 100110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:34:32,883-Speed 9315.57 samples/sec   Loss 7.1691   LearningRate 0.0490   Epoch: 5   Global Step: 100120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:34:33,962-Speed 9490.33 samples/sec   Loss 7.1552   LearningRate 0.0490   Epoch: 5   Global Step: 100130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:34:35,371-Speed 7270.06 samples/sec   Loss 7.1646   LearningRate 0.0490   Epoch: 5   Global Step: 100140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:02,552-Speed 376.76 samples/sec   Loss 6.7753   LearningRate 0.0490   Epoch: 6   Global Step: 100150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:03,980-Speed 7178.25 samples/sec   Loss 6.2424   LearningRate 0.0490   Epoch: 6   Global Step: 100160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:05,889-Speed 5364.24 samples/sec   Loss 6.3282   LearningRate 0.0490   Epoch: 6   Global Step: 100170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:07,182-Speed 7925.56 samples/sec   Loss 6.3607   LearningRate 0.0490   Epoch: 6   Global Step: 100180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:08,292-Speed 9239.98 samples/sec   Loss 6.3298   LearningRate 0.0490   Epoch: 6   Global Step: 100190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:09,711-Speed 7220.88 samples/sec   Loss 6.3390   LearningRate 0.0490   Epoch: 6   Global Step: 100200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:10,823-Speed 9209.55 samples/sec   Loss 6.2855   LearningRate 0.0490   Epoch: 6   Global Step: 100210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:11,878-Speed 9711.06 samples/sec   Loss 6.2769   LearningRate 0.0490   Epoch: 6   Global Step: 100220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:12,935-Speed 9693.40 samples/sec   Loss 6.2828   LearningRate 0.0490   Epoch: 6   Global Step: 100230   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:35:14,032-Speed 9345.05 samples/sec   Loss 6.3434   LearningRate 0.0490   Epoch: 6   Global Step: 100240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:15,135-Speed 9293.54 samples/sec   Loss 6.3957   LearningRate 0.0490   Epoch: 6   Global Step: 100250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:16,244-Speed 9238.27 samples/sec   Loss 6.3142   LearningRate 0.0490   Epoch: 6   Global Step: 100260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:17,338-Speed 9364.52 samples/sec   Loss 6.3122   LearningRate 0.0489   Epoch: 6   Global Step: 100270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:18,441-Speed 9289.52 samples/sec   Loss 6.3668   LearningRate 0.0489   Epoch: 6   Global Step: 100280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:19,938-Speed 6840.11 samples/sec   Loss 6.3780   LearningRate 0.0489   Epoch: 6   Global Step: 100290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:21,262-Speed 7735.77 samples/sec   Loss 6.3039   LearningRate 0.0489   Epoch: 6   Global Step: 100300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:22,526-Speed 8106.14 samples/sec   Loss 6.4319   LearningRate 0.0489   Epoch: 6   Global Step: 100310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:23,817-Speed 7937.90 samples/sec   Loss 6.3998   LearningRate 0.0489   Epoch: 6   Global Step: 100320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:24,900-Speed 9468.76 samples/sec   Loss 6.2926   LearningRate 0.0489   Epoch: 6   Global Step: 100330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:25,948-Speed 9777.15 samples/sec   Loss 6.3244   LearningRate 0.0489   Epoch: 6   Global Step: 100340   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:35:27,040-Speed 9383.36 samples/sec   Loss 6.4247   LearningRate 0.0489   Epoch: 6   Global Step: 100350   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:35:28,127-Speed 9423.63 samples/sec   Loss 6.4300   LearningRate 0.0489   Epoch: 6   Global Step: 100360   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:35:29,236-Speed 9237.22 samples/sec   Loss 6.3512   LearningRate 0.0489   Epoch: 6   Global Step: 100370   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:35:30,352-Speed 9183.83 samples/sec   Loss 6.2816   LearningRate 0.0489   Epoch: 6   Global Step: 100380   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:35:31,389-Speed 9881.33 samples/sec   Loss 6.3437   LearningRate 0.0489   Epoch: 6   Global Step: 100390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:32,480-Speed 9393.61 samples/sec   Loss 6.3935   LearningRate 0.0489   Epoch: 6   Global Step: 100400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:33,543-Speed 9637.26 samples/sec   Loss 6.2890   LearningRate 0.0489   Epoch: 6   Global Step: 100410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:34,655-Speed 9213.66 samples/sec   Loss 6.4398   LearningRate 0.0489   Epoch: 6   Global Step: 100420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:35,712-Speed 9689.33 samples/sec   Loss 6.3645   LearningRate 0.0489   Epoch: 6   Global Step: 100430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:36,836-Speed 9122.16 samples/sec   Loss 6.4034   LearningRate 0.0489   Epoch: 6   Global Step: 100440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:38,129-Speed 7920.95 samples/sec   Loss 6.3778   LearningRate 0.0489   Epoch: 6   Global Step: 100450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:39,254-Speed 9113.49 samples/sec   Loss 6.3774   LearningRate 0.0489   Epoch: 6   Global Step: 100460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:40,335-Speed 9478.14 samples/sec   Loss 6.4234   LearningRate 0.0489   Epoch: 6   Global Step: 100470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:41,410-Speed 9524.59 samples/sec   Loss 6.3922   LearningRate 0.0489   Epoch: 6   Global Step: 100480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:42,488-Speed 9506.76 samples/sec   Loss 6.4126   LearningRate 0.0489   Epoch: 6   Global Step: 100490   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:35:43,538-Speed 9758.38 samples/sec   Loss 6.3613   LearningRate 0.0489   Epoch: 6   Global Step: 100500   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:35:44,573-Speed 9906.36 samples/sec   Loss 6.4050   LearningRate 0.0488   Epoch: 6   Global Step: 100510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:45,925-Speed 7579.08 samples/sec   Loss 6.4746   LearningRate 0.0488   Epoch: 6   Global Step: 100520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:46,996-Speed 9567.18 samples/sec   Loss 6.4278   LearningRate 0.0488   Epoch: 6   Global Step: 100530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:48,111-Speed 9187.47 samples/sec   Loss 6.3896   LearningRate 0.0488   Epoch: 6   Global Step: 100540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:49,149-Speed 9868.33 samples/sec   Loss 6.4544   LearningRate 0.0488   Epoch: 6   Global Step: 100550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:50,212-Speed 9637.53 samples/sec   Loss 6.4487   LearningRate 0.0488   Epoch: 6   Global Step: 100560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:51,269-Speed 9694.78 samples/sec   Loss 6.4908   LearningRate 0.0488   Epoch: 6   Global Step: 100570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:52,333-Speed 9629.93 samples/sec   Loss 6.4951   LearningRate 0.0488   Epoch: 6   Global Step: 100580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:53,372-Speed 9863.17 samples/sec   Loss 6.4594   LearningRate 0.0488   Epoch: 6   Global Step: 100590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:54,424-Speed 9742.98 samples/sec   Loss 6.4653   LearningRate 0.0488   Epoch: 6   Global Step: 100600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:55,537-Speed 9199.97 samples/sec   Loss 6.5609   LearningRate 0.0488   Epoch: 6   Global Step: 100610   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:35:56,628-Speed 9398.89 samples/sec   Loss 6.4365   LearningRate 0.0488   Epoch: 6   Global Step: 100620   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:35:57,715-Speed 9420.43 samples/sec   Loss 6.4960   LearningRate 0.0488   Epoch: 6   Global Step: 100630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:58,757-Speed 9837.65 samples/sec   Loss 6.4455   LearningRate 0.0488   Epoch: 6   Global Step: 100640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:35:59,806-Speed 9763.98 samples/sec   Loss 6.5129   LearningRate 0.0488   Epoch: 6   Global Step: 100650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:00,891-Speed 9439.20 samples/sec   Loss 6.5260   LearningRate 0.0488   Epoch: 6   Global Step: 100660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:01,995-Speed 9283.74 samples/sec   Loss 6.3969   LearningRate 0.0488   Epoch: 6   Global Step: 100670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:03,091-Speed 9349.33 samples/sec   Loss 6.5071   LearningRate 0.0488   Epoch: 6   Global Step: 100680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:04,170-Speed 9497.12 samples/sec   Loss 6.5894   LearningRate 0.0488   Epoch: 6   Global Step: 100690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:05,213-Speed 9819.38 samples/sec   Loss 6.4779   LearningRate 0.0488   Epoch: 6   Global Step: 100700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:06,251-Speed 9871.37 samples/sec   Loss 6.5123   LearningRate 0.0488   Epoch: 6   Global Step: 100710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:07,338-Speed 9431.93 samples/sec   Loss 6.4144   LearningRate 0.0488   Epoch: 6   Global Step: 100720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:08,416-Speed 9508.28 samples/sec   Loss 6.4426   LearningRate 0.0488   Epoch: 6   Global Step: 100730   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:09,492-Speed 9522.11 samples/sec   Loss 6.5083   LearningRate 0.0488   Epoch: 6   Global Step: 100740   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:10,528-Speed 9889.63 samples/sec   Loss 6.4722   LearningRate 0.0487   Epoch: 6   Global Step: 100750   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:11,650-Speed 9125.85 samples/sec   Loss 6.5401   LearningRate 0.0487   Epoch: 6   Global Step: 100760   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:12,762-Speed 9219.28 samples/sec   Loss 6.4676   LearningRate 0.0487   Epoch: 6   Global Step: 100770   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:13,855-Speed 9368.88 samples/sec   Loss 6.4805   LearningRate 0.0487   Epoch: 6   Global Step: 100780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:14,925-Speed 9577.09 samples/sec   Loss 6.4548   LearningRate 0.0487   Epoch: 6   Global Step: 100790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:16,024-Speed 9325.31 samples/sec   Loss 6.4793   LearningRate 0.0487   Epoch: 6   Global Step: 100800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:17,124-Speed 9312.37 samples/sec   Loss 6.5249   LearningRate 0.0487   Epoch: 6   Global Step: 100810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:18,177-Speed 9731.54 samples/sec   Loss 6.5479   LearningRate 0.0487   Epoch: 6   Global Step: 100820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:19,278-Speed 9309.13 samples/sec   Loss 6.4979   LearningRate 0.0487   Epoch: 6   Global Step: 100830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:20,409-Speed 9056.21 samples/sec   Loss 6.4764   LearningRate 0.0487   Epoch: 6   Global Step: 100840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:21,479-Speed 9580.44 samples/sec   Loss 6.5081   LearningRate 0.0487   Epoch: 6   Global Step: 100850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:22,538-Speed 9672.36 samples/sec   Loss 6.5621   LearningRate 0.0487   Epoch: 6   Global Step: 100860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:23,595-Speed 9688.93 samples/sec   Loss 6.4956   LearningRate 0.0487   Epoch: 6   Global Step: 100870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:24,662-Speed 9604.35 samples/sec   Loss 6.6318   LearningRate 0.0487   Epoch: 6   Global Step: 100880   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:25,730-Speed 9594.27 samples/sec   Loss 6.5095   LearningRate 0.0487   Epoch: 6   Global Step: 100890   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:26,810-Speed 9489.08 samples/sec   Loss 6.5175   LearningRate 0.0487   Epoch: 6   Global Step: 100900   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:27,852-Speed 9833.48 samples/sec   Loss 6.4967   LearningRate 0.0487   Epoch: 6   Global Step: 100910   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:28,922-Speed 9580.31 samples/sec   Loss 6.4652   LearningRate 0.0487   Epoch: 6   Global Step: 100920   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:29,987-Speed 9617.94 samples/sec   Loss 6.4706   LearningRate 0.0487   Epoch: 6   Global Step: 100930   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:31,062-Speed 9527.20 samples/sec   Loss 6.5497   LearningRate 0.0487   Epoch: 6   Global Step: 100940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:32,133-Speed 9569.79 samples/sec   Loss 6.5145   LearningRate 0.0487   Epoch: 6   Global Step: 100950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:33,233-Speed 9315.67 samples/sec   Loss 6.5337   LearningRate 0.0487   Epoch: 6   Global Step: 100960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:34,287-Speed 9717.54 samples/sec   Loss 6.5009   LearningRate 0.0487   Epoch: 6   Global Step: 100970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:35,332-Speed 9805.03 samples/sec   Loss 6.6729   LearningRate 0.0487   Epoch: 6   Global Step: 100980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:36,419-Speed 9425.52 samples/sec   Loss 6.4456   LearningRate 0.0486   Epoch: 6   Global Step: 100990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:37,512-Speed 9371.80 samples/sec   Loss 6.4487   LearningRate 0.0486   Epoch: 6   Global Step: 101000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:38,580-Speed 9599.16 samples/sec   Loss 6.6140   LearningRate 0.0486   Epoch: 6   Global Step: 101010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:39,658-Speed 9503.75 samples/sec   Loss 6.5603   LearningRate 0.0486   Epoch: 6   Global Step: 101020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:40,695-Speed 9884.83 samples/sec   Loss 6.4152   LearningRate 0.0486   Epoch: 6   Global Step: 101030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:41,759-Speed 9623.54 samples/sec   Loss 6.5477   LearningRate 0.0486   Epoch: 6   Global Step: 101040   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:42,836-Speed 9517.53 samples/sec   Loss 6.5355   LearningRate 0.0486   Epoch: 6   Global Step: 101050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:43,928-Speed 9386.50 samples/sec   Loss 6.5196   LearningRate 0.0486   Epoch: 6   Global Step: 101060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:45,037-Speed 9235.47 samples/sec   Loss 6.5840   LearningRate 0.0486   Epoch: 6   Global Step: 101070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:46,120-Speed 9465.46 samples/sec   Loss 6.4129   LearningRate 0.0486   Epoch: 6   Global Step: 101080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:47,220-Speed 9312.04 samples/sec   Loss 6.5196   LearningRate 0.0486   Epoch: 6   Global Step: 101090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:48,363-Speed 8961.98 samples/sec   Loss 6.4706   LearningRate 0.0486   Epoch: 6   Global Step: 101100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:49,430-Speed 9600.36 samples/sec   Loss 6.5300   LearningRate 0.0486   Epoch: 6   Global Step: 101110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:50,474-Speed 9822.29 samples/sec   Loss 6.5907   LearningRate 0.0486   Epoch: 6   Global Step: 101120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:51,516-Speed 9824.58 samples/sec   Loss 6.6160   LearningRate 0.0486   Epoch: 6   Global Step: 101130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:52,632-Speed 9181.20 samples/sec   Loss 6.5209   LearningRate 0.0486   Epoch: 6   Global Step: 101140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:53,728-Speed 9355.44 samples/sec   Loss 6.5055   LearningRate 0.0486   Epoch: 6   Global Step: 101150   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:54,788-Speed 9657.50 samples/sec   Loss 6.6357   LearningRate 0.0486   Epoch: 6   Global Step: 101160   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:55,881-Speed 9373.17 samples/sec   Loss 6.4682   LearningRate 0.0486   Epoch: 6   Global Step: 101170   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:56,989-Speed 9255.26 samples/sec   Loss 6.5654   LearningRate 0.0486   Epoch: 6   Global Step: 101180   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:36:58,047-Speed 9682.61 samples/sec   Loss 6.4819   LearningRate 0.0486   Epoch: 6   Global Step: 101190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:36:59,117-Speed 9577.50 samples/sec   Loss 6.5545   LearningRate 0.0486   Epoch: 6   Global Step: 101200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:00,225-Speed 9246.77 samples/sec   Loss 6.4913   LearningRate 0.0486   Epoch: 6   Global Step: 101210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:01,355-Speed 9068.19 samples/sec   Loss 6.6145   LearningRate 0.0486   Epoch: 6   Global Step: 101220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:02,413-Speed 9681.68 samples/sec   Loss 6.5433   LearningRate 0.0485   Epoch: 6   Global Step: 101230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:03,485-Speed 9558.40 samples/sec   Loss 6.6456   LearningRate 0.0485   Epoch: 6   Global Step: 101240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:04,573-Speed 9423.50 samples/sec   Loss 6.6238   LearningRate 0.0485   Epoch: 6   Global Step: 101250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:05,628-Speed 9711.49 samples/sec   Loss 6.5620   LearningRate 0.0485   Epoch: 6   Global Step: 101260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:06,679-Speed 9748.44 samples/sec   Loss 6.5409   LearningRate 0.0485   Epoch: 6   Global Step: 101270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:07,763-Speed 9448.88 samples/sec   Loss 6.5808   LearningRate 0.0485   Epoch: 6   Global Step: 101280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:08,811-Speed 9782.03 samples/sec   Loss 6.5807   LearningRate 0.0485   Epoch: 6   Global Step: 101290   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:37:09,905-Speed 9359.97 samples/sec   Loss 6.5752   LearningRate 0.0485   Epoch: 6   Global Step: 101300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:11,007-Speed 9297.69 samples/sec   Loss 6.4794   LearningRate 0.0485   Epoch: 6   Global Step: 101310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:12,131-Speed 9119.99 samples/sec   Loss 6.5959   LearningRate 0.0485   Epoch: 6   Global Step: 101320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:13,158-Speed 9972.61 samples/sec   Loss 6.5398   LearningRate 0.0485   Epoch: 6   Global Step: 101330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:14,229-Speed 9569.92 samples/sec   Loss 6.6226   LearningRate 0.0485   Epoch: 6   Global Step: 101340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:15,322-Speed 9378.78 samples/sec   Loss 6.5804   LearningRate 0.0485   Epoch: 6   Global Step: 101350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:16,409-Speed 9425.86 samples/sec   Loss 6.7295   LearningRate 0.0485   Epoch: 6   Global Step: 101360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:17,473-Speed 9632.17 samples/sec   Loss 6.6135   LearningRate 0.0485   Epoch: 6   Global Step: 101370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:18,521-Speed 9776.18 samples/sec   Loss 6.5503   LearningRate 0.0485   Epoch: 6   Global Step: 101380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:19,593-Speed 9555.06 samples/sec   Loss 6.6333   LearningRate 0.0485   Epoch: 6   Global Step: 101390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:20,670-Speed 9509.91 samples/sec   Loss 6.6415   LearningRate 0.0485   Epoch: 6   Global Step: 101400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:21,779-Speed 9242.09 samples/sec   Loss 6.6337   LearningRate 0.0485   Epoch: 6   Global Step: 101410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:22,850-Speed 9567.84 samples/sec   Loss 6.6279   LearningRate 0.0485   Epoch: 6   Global Step: 101420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:23,926-Speed 9518.06 samples/sec   Loss 6.4970   LearningRate 0.0485   Epoch: 6   Global Step: 101430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:25,028-Speed 9298.01 samples/sec   Loss 6.5496   LearningRate 0.0485   Epoch: 6   Global Step: 101440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:26,122-Speed 9369.87 samples/sec   Loss 6.5622   LearningRate 0.0485   Epoch: 6   Global Step: 101450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:27,177-Speed 9714.34 samples/sec   Loss 6.6251   LearningRate 0.0485   Epoch: 6   Global Step: 101460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:28,257-Speed 9483.62 samples/sec   Loss 6.6904   LearningRate 0.0484   Epoch: 6   Global Step: 101470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:29,339-Speed 9468.91 samples/sec   Loss 6.5904   LearningRate 0.0484   Epoch: 6   Global Step: 101480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:30,459-Speed 9145.28 samples/sec   Loss 6.6516   LearningRate 0.0484   Epoch: 6   Global Step: 101490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:31,538-Speed 9500.48 samples/sec   Loss 6.5304   LearningRate 0.0484   Epoch: 6   Global Step: 101500   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:37:32,607-Speed 9576.99 samples/sec   Loss 6.6002   LearningRate 0.0484   Epoch: 6   Global Step: 101510   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:37:33,678-Speed 9567.17 samples/sec   Loss 6.6424   LearningRate 0.0484   Epoch: 6   Global Step: 101520   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:37:34,765-Speed 9424.99 samples/sec   Loss 6.5580   LearningRate 0.0484   Epoch: 6   Global Step: 101530   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:37:35,821-Speed 9707.09 samples/sec   Loss 6.4866   LearningRate 0.0484   Epoch: 6   Global Step: 101540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:36,914-Speed 9380.89 samples/sec   Loss 6.6339   LearningRate 0.0484   Epoch: 6   Global Step: 101550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:38,032-Speed 9163.63 samples/sec   Loss 6.6213   LearningRate 0.0484   Epoch: 6   Global Step: 101560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:39,112-Speed 9486.21 samples/sec   Loss 6.6471   LearningRate 0.0484   Epoch: 6   Global Step: 101570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:40,179-Speed 9601.06 samples/sec   Loss 6.5920   LearningRate 0.0484   Epoch: 6   Global Step: 101580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:41,231-Speed 9737.91 samples/sec   Loss 6.6761   LearningRate 0.0484   Epoch: 6   Global Step: 101590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:42,307-Speed 9526.54 samples/sec   Loss 6.6535   LearningRate 0.0484   Epoch: 6   Global Step: 101600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:43,436-Speed 9076.30 samples/sec   Loss 6.6055   LearningRate 0.0484   Epoch: 6   Global Step: 101610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:44,467-Speed 9939.14 samples/sec   Loss 6.5502   LearningRate 0.0484   Epoch: 6   Global Step: 101620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:45,539-Speed 9555.76 samples/sec   Loss 6.6003   LearningRate 0.0484   Epoch: 6   Global Step: 101630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:46,643-Speed 9279.69 samples/sec   Loss 6.6016   LearningRate 0.0484   Epoch: 6   Global Step: 101640   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:37:47,761-Speed 9167.45 samples/sec   Loss 6.6613   LearningRate 0.0484   Epoch: 6   Global Step: 101650   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:37:48,877-Speed 9175.06 samples/sec   Loss 6.5654   LearningRate 0.0484   Epoch: 6   Global Step: 101660   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:37:49,963-Speed 9438.90 samples/sec   Loss 6.6255   LearningRate 0.0484   Epoch: 6   Global Step: 101670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:51,061-Speed 9329.70 samples/sec   Loss 6.6824   LearningRate 0.0484   Epoch: 6   Global Step: 101680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:52,147-Speed 9434.46 samples/sec   Loss 6.5799   LearningRate 0.0484   Epoch: 6   Global Step: 101690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:53,200-Speed 9729.26 samples/sec   Loss 6.5968   LearningRate 0.0484   Epoch: 6   Global Step: 101700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:54,274-Speed 9546.25 samples/sec   Loss 6.6680   LearningRate 0.0483   Epoch: 6   Global Step: 101710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:55,358-Speed 9449.37 samples/sec   Loss 6.6267   LearningRate 0.0483   Epoch: 6   Global Step: 101720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:56,483-Speed 9107.53 samples/sec   Loss 6.5633   LearningRate 0.0483   Epoch: 6   Global Step: 101730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:57,570-Speed 9429.24 samples/sec   Loss 6.7236   LearningRate 0.0483   Epoch: 6   Global Step: 101740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:58,616-Speed 9791.06 samples/sec   Loss 6.5795   LearningRate 0.0483   Epoch: 6   Global Step: 101750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:37:59,669-Speed 9735.15 samples/sec   Loss 6.6946   LearningRate 0.0483   Epoch: 6   Global Step: 101760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:00,793-Speed 9108.83 samples/sec   Loss 6.6274   LearningRate 0.0483   Epoch: 6   Global Step: 101770   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:38:01,895-Speed 9302.35 samples/sec   Loss 6.6020   LearningRate 0.0483   Epoch: 6   Global Step: 101780   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:38:02,972-Speed 9515.26 samples/sec   Loss 6.6383   LearningRate 0.0483   Epoch: 6   Global Step: 101790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:04,036-Speed 9624.70 samples/sec   Loss 6.6594   LearningRate 0.0483   Epoch: 6   Global Step: 101800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:05,150-Speed 9194.98 samples/sec   Loss 6.6773   LearningRate 0.0483   Epoch: 6   Global Step: 101810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:06,251-Speed 9309.84 samples/sec   Loss 6.6852   LearningRate 0.0483   Epoch: 6   Global Step: 101820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:07,350-Speed 9325.85 samples/sec   Loss 6.5721   LearningRate 0.0483   Epoch: 6   Global Step: 101830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:08,411-Speed 9653.72 samples/sec   Loss 6.6321   LearningRate 0.0483   Epoch: 6   Global Step: 101840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:09,496-Speed 9442.28 samples/sec   Loss 6.5596   LearningRate 0.0483   Epoch: 6   Global Step: 101850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:10,566-Speed 9576.17 samples/sec   Loss 6.5883   LearningRate 0.0483   Epoch: 6   Global Step: 101860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:11,686-Speed 9154.61 samples/sec   Loss 6.5801   LearningRate 0.0483   Epoch: 6   Global Step: 101870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:12,778-Speed 9383.16 samples/sec   Loss 6.6351   LearningRate 0.0483   Epoch: 6   Global Step: 101880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:13,854-Speed 9518.82 samples/sec   Loss 6.7735   LearningRate 0.0483   Epoch: 6   Global Step: 101890   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:38:14,973-Speed 9160.46 samples/sec   Loss 6.6537   LearningRate 0.0483   Epoch: 6   Global Step: 101900   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:38:16,070-Speed 9341.73 samples/sec   Loss 6.7815   LearningRate 0.0483   Epoch: 6   Global Step: 101910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:17,166-Speed 9344.60 samples/sec   Loss 6.7402   LearningRate 0.0483   Epoch: 6   Global Step: 101920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:18,277-Speed 9226.21 samples/sec   Loss 6.7623   LearningRate 0.0483   Epoch: 6   Global Step: 101930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:19,359-Speed 9463.21 samples/sec   Loss 6.5822   LearningRate 0.0483   Epoch: 6   Global Step: 101940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:20,466-Speed 9259.29 samples/sec   Loss 6.6126   LearningRate 0.0482   Epoch: 6   Global Step: 101950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:21,554-Speed 9415.52 samples/sec   Loss 6.7340   LearningRate 0.0482   Epoch: 6   Global Step: 101960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:22,683-Speed 9068.41 samples/sec   Loss 6.6381   LearningRate 0.0482   Epoch: 6   Global Step: 101970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:23,782-Speed 9329.41 samples/sec   Loss 6.5873   LearningRate 0.0482   Epoch: 6   Global Step: 101980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:24,877-Speed 9353.97 samples/sec   Loss 6.6644   LearningRate 0.0482   Epoch: 6   Global Step: 101990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:26,000-Speed 9129.17 samples/sec   Loss 6.7062   LearningRate 0.0482   Epoch: 6   Global Step: 102000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:38:48,138-[lfw][102000]XNorm: 11.438304
Training: 2022-04-11 15:38:48,139-[lfw][102000]Accuracy-Flip: 0.99550+-0.00269
Training: 2022-04-11 15:38:48,140-[lfw][102000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:39:13,728-[cfp_fp][102000]XNorm: 9.747645
Training: 2022-04-11 15:39:13,729-[cfp_fp][102000]Accuracy-Flip: 0.95600+-0.01104
Training: 2022-04-11 15:39:13,729-[cfp_fp][102000]Accuracy-Highest: 0.95857
Training: 2022-04-11 15:39:35,829-[agedb_30][102000]XNorm: 11.100109
Training: 2022-04-11 15:39:35,829-[agedb_30][102000]Accuracy-Flip: 0.96483+-0.00880
Training: 2022-04-11 15:39:35,829-[agedb_30][102000]Accuracy-Highest: 0.96483
Training: 2022-04-11 15:39:36,888-Speed 144.45 samples/sec   Loss 6.6428   LearningRate 0.0482   Epoch: 6   Global Step: 102010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:37,969-Speed 9483.99 samples/sec   Loss 6.6569   LearningRate 0.0482   Epoch: 6   Global Step: 102020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:39,068-Speed 9317.22 samples/sec   Loss 6.6365   LearningRate 0.0482   Epoch: 6   Global Step: 102030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:40,148-Speed 9489.25 samples/sec   Loss 6.7594   LearningRate 0.0482   Epoch: 6   Global Step: 102040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:41,221-Speed 9547.13 samples/sec   Loss 6.7113   LearningRate 0.0482   Epoch: 6   Global Step: 102050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:42,285-Speed 9635.72 samples/sec   Loss 6.7044   LearningRate 0.0482   Epoch: 6   Global Step: 102060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:43,340-Speed 9711.96 samples/sec   Loss 6.6895   LearningRate 0.0482   Epoch: 6   Global Step: 102070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:44,464-Speed 9115.68 samples/sec   Loss 6.6305   LearningRate 0.0482   Epoch: 6   Global Step: 102080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:45,538-Speed 9536.83 samples/sec   Loss 6.7176   LearningRate 0.0482   Epoch: 6   Global Step: 102090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:46,624-Speed 9434.92 samples/sec   Loss 6.6154   LearningRate 0.0482   Epoch: 6   Global Step: 102100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:47,687-Speed 9638.99 samples/sec   Loss 6.5471   LearningRate 0.0482   Epoch: 6   Global Step: 102110   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:39:48,749-Speed 9646.33 samples/sec   Loss 6.7426   LearningRate 0.0482   Epoch: 6   Global Step: 102120   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:39:49,822-Speed 9555.87 samples/sec   Loss 6.6356   LearningRate 0.0482   Epoch: 6   Global Step: 102130   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:39:50,920-Speed 9332.79 samples/sec   Loss 6.6582   LearningRate 0.0482   Epoch: 6   Global Step: 102140   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:39:51,990-Speed 9574.50 samples/sec   Loss 6.6904   LearningRate 0.0482   Epoch: 6   Global Step: 102150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:39:53,060-Speed 9576.30 samples/sec   Loss 6.7233   LearningRate 0.0482   Epoch: 6   Global Step: 102160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:39:54,104-Speed 9810.84 samples/sec   Loss 6.6794   LearningRate 0.0482   Epoch: 6   Global Step: 102170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:39:55,167-Speed 9635.86 samples/sec   Loss 6.7043   LearningRate 0.0482   Epoch: 6   Global Step: 102180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:39:56,215-Speed 9778.85 samples/sec   Loss 6.6325   LearningRate 0.0481   Epoch: 6   Global Step: 102190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:39:57,267-Speed 9742.60 samples/sec   Loss 6.7565   LearningRate 0.0481   Epoch: 6   Global Step: 102200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:39:58,317-Speed 9755.29 samples/sec   Loss 6.7049   LearningRate 0.0481   Epoch: 6   Global Step: 102210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:39:59,393-Speed 9521.93 samples/sec   Loss 6.5913   LearningRate 0.0481   Epoch: 6   Global Step: 102220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:40:00,481-Speed 9416.93 samples/sec   Loss 6.7253   LearningRate 0.0481   Epoch: 6   Global Step: 102230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:40:01,561-Speed 9491.69 samples/sec   Loss 6.6418   LearningRate 0.0481   Epoch: 6   Global Step: 102240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:40:02,636-Speed 9535.73 samples/sec   Loss 6.6971   LearningRate 0.0481   Epoch: 6   Global Step: 102250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:40:03,698-Speed 9643.03 samples/sec   Loss 6.4956   LearningRate 0.0481   Epoch: 6   Global Step: 102260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:04,755-Speed 9694.25 samples/sec   Loss 6.6694   LearningRate 0.0481   Epoch: 6   Global Step: 102270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:05,840-Speed 9448.73 samples/sec   Loss 6.7681   LearningRate 0.0481   Epoch: 6   Global Step: 102280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:06,919-Speed 9492.56 samples/sec   Loss 6.6669   LearningRate 0.0481   Epoch: 6   Global Step: 102290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:08,002-Speed 9456.10 samples/sec   Loss 6.7102   LearningRate 0.0481   Epoch: 6   Global Step: 102300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:09,050-Speed 9781.15 samples/sec   Loss 6.5827   LearningRate 0.0481   Epoch: 6   Global Step: 102310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:10,126-Speed 9516.17 samples/sec   Loss 6.7632   LearningRate 0.0481   Epoch: 6   Global Step: 102320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:11,226-Speed 9321.47 samples/sec   Loss 6.8106   LearningRate 0.0481   Epoch: 6   Global Step: 102330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:12,304-Speed 9510.47 samples/sec   Loss 6.7675   LearningRate 0.0481   Epoch: 6   Global Step: 102340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:13,389-Speed 9445.20 samples/sec   Loss 6.6808   LearningRate 0.0481   Epoch: 6   Global Step: 102350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:14,492-Speed 9286.04 samples/sec   Loss 6.6957   LearningRate 0.0481   Epoch: 6   Global Step: 102360   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:15,588-Speed 9357.79 samples/sec   Loss 6.6908   LearningRate 0.0481   Epoch: 6   Global Step: 102370   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:16,627-Speed 9861.83 samples/sec   Loss 6.7873   LearningRate 0.0481   Epoch: 6   Global Step: 102380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:17,704-Speed 9514.69 samples/sec   Loss 6.6692   LearningRate 0.0481   Epoch: 6   Global Step: 102390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:18,785-Speed 9479.87 samples/sec   Loss 6.6677   LearningRate 0.0481   Epoch: 6   Global Step: 102400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:19,822-Speed 9874.85 samples/sec   Loss 6.6354   LearningRate 0.0481   Epoch: 6   Global Step: 102410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:20,945-Speed 9127.94 samples/sec   Loss 6.6537   LearningRate 0.0481   Epoch: 6   Global Step: 102420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:22,022-Speed 9514.38 samples/sec   Loss 6.6628   LearningRate 0.0480   Epoch: 6   Global Step: 102430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:23,082-Speed 9669.87 samples/sec   Loss 6.8360   LearningRate 0.0480   Epoch: 6   Global Step: 102440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:24,152-Speed 9576.37 samples/sec   Loss 6.6450   LearningRate 0.0480   Epoch: 6   Global Step: 102450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:25,225-Speed 9548.10 samples/sec   Loss 6.7586   LearningRate 0.0480   Epoch: 6   Global Step: 102460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:26,300-Speed 9533.90 samples/sec   Loss 6.7151   LearningRate 0.0480   Epoch: 6   Global Step: 102470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:27,423-Speed 9122.91 samples/sec   Loss 6.7168   LearningRate 0.0480   Epoch: 6   Global Step: 102480   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:28,496-Speed 9543.73 samples/sec   Loss 6.6742   LearningRate 0.0480   Epoch: 6   Global Step: 102490   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:29,584-Speed 9434.51 samples/sec   Loss 6.6666   LearningRate 0.0480   Epoch: 6   Global Step: 102500   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:30,702-Speed 9162.67 samples/sec   Loss 6.8178   LearningRate 0.0480   Epoch: 6   Global Step: 102510   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:31,791-Speed 9412.39 samples/sec   Loss 6.6427   LearningRate 0.0480   Epoch: 6   Global Step: 102520   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:32,851-Speed 9663.51 samples/sec   Loss 6.7447   LearningRate 0.0480   Epoch: 6   Global Step: 102530   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:33,936-Speed 9442.55 samples/sec   Loss 6.7688   LearningRate 0.0480   Epoch: 6   Global Step: 102540   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:35,019-Speed 9459.63 samples/sec   Loss 6.7407   LearningRate 0.0480   Epoch: 6   Global Step: 102550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:36,093-Speed 9537.45 samples/sec   Loss 6.7314   LearningRate 0.0480   Epoch: 6   Global Step: 102560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:37,199-Speed 9274.68 samples/sec   Loss 6.7051   LearningRate 0.0480   Epoch: 6   Global Step: 102570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:38,248-Speed 9763.51 samples/sec   Loss 6.7368   LearningRate 0.0480   Epoch: 6   Global Step: 102580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:39,290-Speed 9831.99 samples/sec   Loss 6.7351   LearningRate 0.0480   Epoch: 6   Global Step: 102590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:40,418-Speed 9082.20 samples/sec   Loss 6.7309   LearningRate 0.0480   Epoch: 6   Global Step: 102600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:41,502-Speed 9459.35 samples/sec   Loss 6.7070   LearningRate 0.0480   Epoch: 6   Global Step: 102610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:42,594-Speed 9380.66 samples/sec   Loss 6.7494   LearningRate 0.0480   Epoch: 6   Global Step: 102620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:43,674-Speed 9485.04 samples/sec   Loss 6.6804   LearningRate 0.0480   Epoch: 6   Global Step: 102630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:44,785-Speed 9222.14 samples/sec   Loss 6.6840   LearningRate 0.0480   Epoch: 6   Global Step: 102640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:45,896-Speed 9219.13 samples/sec   Loss 6.7474   LearningRate 0.0480   Epoch: 6   Global Step: 102650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:46,971-Speed 9536.53 samples/sec   Loss 6.7671   LearningRate 0.0480   Epoch: 6   Global Step: 102660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:47,995-Speed 10014.40 samples/sec   Loss 6.6487   LearningRate 0.0479   Epoch: 6   Global Step: 102670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:49,037-Speed 9832.74 samples/sec   Loss 6.7948   LearningRate 0.0479   Epoch: 6   Global Step: 102680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:50,145-Speed 9242.21 samples/sec   Loss 6.7775   LearningRate 0.0479   Epoch: 6   Global Step: 102690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:51,250-Speed 9274.79 samples/sec   Loss 6.7850   LearningRate 0.0479   Epoch: 6   Global Step: 102700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:52,330-Speed 9486.03 samples/sec   Loss 6.7728   LearningRate 0.0479   Epoch: 6   Global Step: 102710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:53,355-Speed 9992.53 samples/sec   Loss 6.6145   LearningRate 0.0479   Epoch: 6   Global Step: 102720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:54,405-Speed 9764.01 samples/sec   Loss 6.6811   LearningRate 0.0479   Epoch: 6   Global Step: 102730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:55,476-Speed 9562.96 samples/sec   Loss 6.7356   LearningRate 0.0479   Epoch: 6   Global Step: 102740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:40:56,549-Speed 9548.59 samples/sec   Loss 6.7809   LearningRate 0.0479   Epoch: 6   Global Step: 102750   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:57,624-Speed 9534.62 samples/sec   Loss 6.7271   LearningRate 0.0479   Epoch: 6   Global Step: 102760   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:58,712-Speed 9416.66 samples/sec   Loss 6.8789   LearningRate 0.0479   Epoch: 6   Global Step: 102770   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:40:59,781-Speed 9584.49 samples/sec   Loss 6.7271   LearningRate 0.0479   Epoch: 6   Global Step: 102780   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:41:00,849-Speed 9587.81 samples/sec   Loss 6.7331   LearningRate 0.0479   Epoch: 6   Global Step: 102790   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:41:01,942-Speed 9378.03 samples/sec   Loss 6.6080   LearningRate 0.0479   Epoch: 6   Global Step: 102800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:03,027-Speed 9440.81 samples/sec   Loss 6.6751   LearningRate 0.0479   Epoch: 6   Global Step: 102810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:04,063-Speed 9892.14 samples/sec   Loss 6.7919   LearningRate 0.0479   Epoch: 6   Global Step: 102820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:05,100-Speed 9877.39 samples/sec   Loss 6.6499   LearningRate 0.0479   Epoch: 6   Global Step: 102830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:06,141-Speed 9841.03 samples/sec   Loss 6.7558   LearningRate 0.0479   Epoch: 6   Global Step: 102840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:07,205-Speed 9641.39 samples/sec   Loss 6.7749   LearningRate 0.0479   Epoch: 6   Global Step: 102850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:08,293-Speed 9413.19 samples/sec   Loss 6.6420   LearningRate 0.0479   Epoch: 6   Global Step: 102860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:09,403-Speed 9227.21 samples/sec   Loss 6.7632   LearningRate 0.0479   Epoch: 6   Global Step: 102870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:10,471-Speed 9595.54 samples/sec   Loss 6.7817   LearningRate 0.0479   Epoch: 6   Global Step: 102880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:11,559-Speed 9417.15 samples/sec   Loss 6.7163   LearningRate 0.0479   Epoch: 6   Global Step: 102890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:12,656-Speed 9339.33 samples/sec   Loss 6.8031   LearningRate 0.0479   Epoch: 6   Global Step: 102900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:13,748-Speed 9388.37 samples/sec   Loss 6.7880   LearningRate 0.0478   Epoch: 6   Global Step: 102910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:14,851-Speed 9288.54 samples/sec   Loss 6.6509   LearningRate 0.0478   Epoch: 6   Global Step: 102920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:15,960-Speed 9235.47 samples/sec   Loss 6.5729   LearningRate 0.0478   Epoch: 6   Global Step: 102930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:17,088-Speed 9086.70 samples/sec   Loss 6.7874   LearningRate 0.0478   Epoch: 6   Global Step: 102940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:18,151-Speed 9641.84 samples/sec   Loss 6.7590   LearningRate 0.0478   Epoch: 6   Global Step: 102950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:19,228-Speed 9506.32 samples/sec   Loss 6.7601   LearningRate 0.0478   Epoch: 6   Global Step: 102960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:20,292-Speed 9629.94 samples/sec   Loss 6.8149   LearningRate 0.0478   Epoch: 6   Global Step: 102970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:21,428-Speed 9019.43 samples/sec   Loss 6.7421   LearningRate 0.0478   Epoch: 6   Global Step: 102980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:22,535-Speed 9260.01 samples/sec   Loss 6.7314   LearningRate 0.0478   Epoch: 6   Global Step: 102990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:23,573-Speed 9866.76 samples/sec   Loss 6.6753   LearningRate 0.0478   Epoch: 6   Global Step: 103000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:24,685-Speed 9220.42 samples/sec   Loss 6.8348   LearningRate 0.0478   Epoch: 6   Global Step: 103010   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:41:25,736-Speed 9758.14 samples/sec   Loss 6.7694   LearningRate 0.0478   Epoch: 6   Global Step: 103020   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:41:26,891-Speed 8864.29 samples/sec   Loss 6.7925   LearningRate 0.0478   Epoch: 6   Global Step: 103030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:27,981-Speed 9404.03 samples/sec   Loss 6.7830   LearningRate 0.0478   Epoch: 6   Global Step: 103040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:29,075-Speed 9363.53 samples/sec   Loss 6.8010   LearningRate 0.0478   Epoch: 6   Global Step: 103050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:30,147-Speed 9554.32 samples/sec   Loss 6.7065   LearningRate 0.0478   Epoch: 6   Global Step: 103060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:31,217-Speed 9576.15 samples/sec   Loss 6.7137   LearningRate 0.0478   Epoch: 6   Global Step: 103070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:32,276-Speed 9678.48 samples/sec   Loss 6.7952   LearningRate 0.0478   Epoch: 6   Global Step: 103080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:33,307-Speed 9940.19 samples/sec   Loss 6.7151   LearningRate 0.0478   Epoch: 6   Global Step: 103090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:34,353-Speed 9795.45 samples/sec   Loss 6.6993   LearningRate 0.0478   Epoch: 6   Global Step: 103100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:35,434-Speed 9471.08 samples/sec   Loss 6.7877   LearningRate 0.0478   Epoch: 6   Global Step: 103110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:36,530-Speed 9350.98 samples/sec   Loss 6.8464   LearningRate 0.0478   Epoch: 6   Global Step: 103120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:37,679-Speed 8921.40 samples/sec   Loss 6.7890   LearningRate 0.0478   Epoch: 6   Global Step: 103130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:38,736-Speed 9688.97 samples/sec   Loss 6.7152   LearningRate 0.0478   Epoch: 6   Global Step: 103140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:39,853-Speed 9173.97 samples/sec   Loss 6.7549   LearningRate 0.0477   Epoch: 6   Global Step: 103150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:40,919-Speed 9613.15 samples/sec   Loss 6.6749   LearningRate 0.0477   Epoch: 6   Global Step: 103160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:41:42,008-Speed 9410.37 samples/sec   Loss 6.7567   LearningRate 0.0477   Epoch: 6   Global Step: 103170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:43,103-Speed 9357.92 samples/sec   Loss 6.8723   LearningRate 0.0477   Epoch: 6   Global Step: 103180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:44,204-Speed 9310.20 samples/sec   Loss 6.7300   LearningRate 0.0477   Epoch: 6   Global Step: 103190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:45,297-Speed 9370.38 samples/sec   Loss 6.8384   LearningRate 0.0477   Epoch: 6   Global Step: 103200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:46,373-Speed 9524.80 samples/sec   Loss 6.7392   LearningRate 0.0477   Epoch: 6   Global Step: 103210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:47,432-Speed 9671.98 samples/sec   Loss 6.8214   LearningRate 0.0477   Epoch: 6   Global Step: 103220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:48,474-Speed 9834.06 samples/sec   Loss 6.7521   LearningRate 0.0477   Epoch: 6   Global Step: 103230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:49,519-Speed 9811.60 samples/sec   Loss 6.7538   LearningRate 0.0477   Epoch: 6   Global Step: 103240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:50,631-Speed 9207.07 samples/sec   Loss 6.7744   LearningRate 0.0477   Epoch: 6   Global Step: 103250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:51,726-Speed 9357.61 samples/sec   Loss 6.7735   LearningRate 0.0477   Epoch: 6   Global Step: 103260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:52,785-Speed 9675.32 samples/sec   Loss 6.8197   LearningRate 0.0477   Epoch: 6   Global Step: 103270   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:41:53,844-Speed 9675.83 samples/sec   Loss 6.7398   LearningRate 0.0477   Epoch: 6   Global Step: 103280   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:41:54,930-Speed 9442.01 samples/sec   Loss 6.7355   LearningRate 0.0477   Epoch: 6   Global Step: 103290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:56,046-Speed 9175.88 samples/sec   Loss 6.7689   LearningRate 0.0477   Epoch: 6   Global Step: 103300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:57,132-Speed 9438.77 samples/sec   Loss 6.8720   LearningRate 0.0477   Epoch: 6   Global Step: 103310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:58,226-Speed 9363.09 samples/sec   Loss 6.8047   LearningRate 0.0477   Epoch: 6   Global Step: 103320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:41:59,341-Speed 9191.40 samples/sec   Loss 6.7416   LearningRate 0.0477   Epoch: 6   Global Step: 103330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:00,430-Speed 9405.34 samples/sec   Loss 6.7240   LearningRate 0.0477   Epoch: 6   Global Step: 103340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:01,502-Speed 9557.40 samples/sec   Loss 6.7867   LearningRate 0.0477   Epoch: 6   Global Step: 103350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:02,605-Speed 9294.16 samples/sec   Loss 6.7908   LearningRate 0.0477   Epoch: 6   Global Step: 103360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:03,694-Speed 9409.72 samples/sec   Loss 6.6927   LearningRate 0.0477   Epoch: 6   Global Step: 103370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:04,768-Speed 9534.17 samples/sec   Loss 6.8246   LearningRate 0.0477   Epoch: 6   Global Step: 103380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:05,798-Speed 9947.07 samples/sec   Loss 6.8396   LearningRate 0.0476   Epoch: 6   Global Step: 103390   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:06,853-Speed 9711.59 samples/sec   Loss 6.7770   LearningRate 0.0476   Epoch: 6   Global Step: 103400   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:07,913-Speed 9674.70 samples/sec   Loss 6.7763   LearningRate 0.0476   Epoch: 6   Global Step: 103410   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:08,971-Speed 9680.04 samples/sec   Loss 6.7624   LearningRate 0.0476   Epoch: 6   Global Step: 103420   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:10,035-Speed 9635.15 samples/sec   Loss 6.7107   LearningRate 0.0476   Epoch: 6   Global Step: 103430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:11,081-Speed 9789.39 samples/sec   Loss 6.7978   LearningRate 0.0476   Epoch: 6   Global Step: 103440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:42:12,125-Speed 9813.15 samples/sec   Loss 6.8402   LearningRate 0.0476   Epoch: 6   Global Step: 103450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:42:13,222-Speed 9343.98 samples/sec   Loss 6.8422   LearningRate 0.0476   Epoch: 6   Global Step: 103460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:42:14,304-Speed 9467.76 samples/sec   Loss 6.7067   LearningRate 0.0476   Epoch: 6   Global Step: 103470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:42:15,372-Speed 9597.44 samples/sec   Loss 6.7254   LearningRate 0.0476   Epoch: 6   Global Step: 103480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:42:16,413-Speed 9832.98 samples/sec   Loss 6.6319   LearningRate 0.0476   Epoch: 6   Global Step: 103490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:42:17,507-Speed 9372.93 samples/sec   Loss 6.6798   LearningRate 0.0476   Epoch: 6   Global Step: 103500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:42:18,540-Speed 9921.41 samples/sec   Loss 6.6906   LearningRate 0.0476   Epoch: 6   Global Step: 103510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:42:19,598-Speed 9679.46 samples/sec   Loss 6.7693   LearningRate 0.0476   Epoch: 6   Global Step: 103520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:42:20,695-Speed 9341.49 samples/sec   Loss 6.7915   LearningRate 0.0476   Epoch: 6   Global Step: 103530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:42:21,764-Speed 9589.26 samples/sec   Loss 6.7768   LearningRate 0.0476   Epoch: 6   Global Step: 103540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:22,861-Speed 9337.83 samples/sec   Loss 6.7863   LearningRate 0.0476   Epoch: 6   Global Step: 103550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:23,950-Speed 9405.63 samples/sec   Loss 6.8472   LearningRate 0.0476   Epoch: 6   Global Step: 103560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:25,026-Speed 9529.43 samples/sec   Loss 6.9266   LearningRate 0.0476   Epoch: 6   Global Step: 103570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:26,126-Speed 9309.30 samples/sec   Loss 6.9393   LearningRate 0.0476   Epoch: 6   Global Step: 103580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:27,210-Speed 9455.68 samples/sec   Loss 6.7267   LearningRate 0.0476   Epoch: 6   Global Step: 103590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:28,320-Speed 9230.21 samples/sec   Loss 6.8164   LearningRate 0.0476   Epoch: 6   Global Step: 103600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:29,441-Speed 9139.16 samples/sec   Loss 6.7571   LearningRate 0.0476   Epoch: 6   Global Step: 103610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:30,530-Speed 9406.93 samples/sec   Loss 6.7351   LearningRate 0.0476   Epoch: 6   Global Step: 103620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:31,606-Speed 9522.25 samples/sec   Loss 6.8285   LearningRate 0.0475   Epoch: 6   Global Step: 103630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:32,689-Speed 9460.34 samples/sec   Loss 6.7834   LearningRate 0.0475   Epoch: 6   Global Step: 103640   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:33,812-Speed 9127.96 samples/sec   Loss 6.8245   LearningRate 0.0475   Epoch: 6   Global Step: 103650   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:34,891-Speed 9490.93 samples/sec   Loss 6.8576   LearningRate 0.0475   Epoch: 6   Global Step: 103660   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:35,963-Speed 9555.78 samples/sec   Loss 6.8551   LearningRate 0.0475   Epoch: 6   Global Step: 103670   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:37,027-Speed 9636.13 samples/sec   Loss 6.8072   LearningRate 0.0475   Epoch: 6   Global Step: 103680   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:38,089-Speed 9650.28 samples/sec   Loss 6.8435   LearningRate 0.0475   Epoch: 6   Global Step: 103690   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:39,197-Speed 9246.03 samples/sec   Loss 6.7997   LearningRate 0.0475   Epoch: 6   Global Step: 103700   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:40,297-Speed 9316.72 samples/sec   Loss 6.7905   LearningRate 0.0475   Epoch: 6   Global Step: 103710   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:41,396-Speed 9326.11 samples/sec   Loss 6.8951   LearningRate 0.0475   Epoch: 6   Global Step: 103720   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:42,453-Speed 9687.64 samples/sec   Loss 6.8170   LearningRate 0.0475   Epoch: 6   Global Step: 103730   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:43,510-Speed 9696.67 samples/sec   Loss 6.7983   LearningRate 0.0475   Epoch: 6   Global Step: 103740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:44,585-Speed 9531.58 samples/sec   Loss 6.9286   LearningRate 0.0475   Epoch: 6   Global Step: 103750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:45,714-Speed 9074.97 samples/sec   Loss 6.8264   LearningRate 0.0475   Epoch: 6   Global Step: 103760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:46,814-Speed 9317.13 samples/sec   Loss 6.8553   LearningRate 0.0475   Epoch: 6   Global Step: 103770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:47,838-Speed 10003.25 samples/sec   Loss 6.7980   LearningRate 0.0475   Epoch: 6   Global Step: 103780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:48,924-Speed 9438.99 samples/sec   Loss 6.7676   LearningRate 0.0475   Epoch: 6   Global Step: 103790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:50,045-Speed 9139.11 samples/sec   Loss 6.8025   LearningRate 0.0475   Epoch: 6   Global Step: 103800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:51,107-Speed 9648.75 samples/sec   Loss 6.8186   LearningRate 0.0475   Epoch: 6   Global Step: 103810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:52,154-Speed 9777.55 samples/sec   Loss 6.8947   LearningRate 0.0475   Epoch: 6   Global Step: 103820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:53,259-Speed 9276.77 samples/sec   Loss 6.7046   LearningRate 0.0475   Epoch: 6   Global Step: 103830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:54,342-Speed 9465.20 samples/sec   Loss 6.8055   LearningRate 0.0475   Epoch: 6   Global Step: 103840   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:42:55,367-Speed 9988.07 samples/sec   Loss 6.7846   LearningRate 0.0475   Epoch: 6   Global Step: 103850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:56,435-Speed 9599.52 samples/sec   Loss 6.8189   LearningRate 0.0475   Epoch: 6   Global Step: 103860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:57,517-Speed 9468.35 samples/sec   Loss 6.8721   LearningRate 0.0475   Epoch: 6   Global Step: 103870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:58,570-Speed 9735.08 samples/sec   Loss 6.9358   LearningRate 0.0474   Epoch: 6   Global Step: 103880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:42:59,623-Speed 9731.03 samples/sec   Loss 6.7626   LearningRate 0.0474   Epoch: 6   Global Step: 103890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:00,739-Speed 9175.08 samples/sec   Loss 6.8872   LearningRate 0.0474   Epoch: 6   Global Step: 103900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:01,784-Speed 9808.70 samples/sec   Loss 6.8177   LearningRate 0.0474   Epoch: 6   Global Step: 103910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:02,885-Speed 9302.18 samples/sec   Loss 6.7314   LearningRate 0.0474   Epoch: 6   Global Step: 103920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:04,003-Speed 9166.24 samples/sec   Loss 6.8024   LearningRate 0.0474   Epoch: 6   Global Step: 103930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:05,073-Speed 9573.09 samples/sec   Loss 6.7523   LearningRate 0.0474   Epoch: 6   Global Step: 103940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:06,169-Speed 9353.78 samples/sec   Loss 6.8983   LearningRate 0.0474   Epoch: 6   Global Step: 103950   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:43:07,276-Speed 9252.38 samples/sec   Loss 6.8345   LearningRate 0.0474   Epoch: 6   Global Step: 103960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:08,376-Speed 9316.27 samples/sec   Loss 6.8763   LearningRate 0.0474   Epoch: 6   Global Step: 103970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:09,482-Speed 9262.19 samples/sec   Loss 6.8156   LearningRate 0.0474   Epoch: 6   Global Step: 103980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:10,574-Speed 9383.14 samples/sec   Loss 6.9440   LearningRate 0.0474   Epoch: 6   Global Step: 103990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:11,638-Speed 9632.91 samples/sec   Loss 6.8301   LearningRate 0.0474   Epoch: 6   Global Step: 104000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:43:33,990-[lfw][104000]XNorm: 11.199217
Training: 2022-04-11 15:43:33,991-[lfw][104000]Accuracy-Flip: 0.99617+-0.00269
Training: 2022-04-11 15:43:33,991-[lfw][104000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:43:59,870-[cfp_fp][104000]XNorm: 9.498876
Training: 2022-04-11 15:43:59,870-[cfp_fp][104000]Accuracy-Flip: 0.95700+-0.00959
Training: 2022-04-11 15:43:59,871-[cfp_fp][104000]Accuracy-Highest: 0.95857
Training: 2022-04-11 15:44:22,133-[agedb_30][104000]XNorm: 10.824177
Training: 2022-04-11 15:44:22,134-[agedb_30][104000]Accuracy-Flip: 0.96383+-0.00995
Training: 2022-04-11 15:44:22,134-[agedb_30][104000]Accuracy-Highest: 0.96483
Training: 2022-04-11 15:44:23,225-Speed 143.04 samples/sec   Loss 6.9240   LearningRate 0.0474   Epoch: 6   Global Step: 104010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:24,310-Speed 9436.07 samples/sec   Loss 6.7355   LearningRate 0.0474   Epoch: 6   Global Step: 104020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:25,422-Speed 9217.45 samples/sec   Loss 6.7697   LearningRate 0.0474   Epoch: 6   Global Step: 104030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:26,548-Speed 9097.44 samples/sec   Loss 6.8112   LearningRate 0.0474   Epoch: 6   Global Step: 104040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:27,600-Speed 9734.65 samples/sec   Loss 6.8464   LearningRate 0.0474   Epoch: 6   Global Step: 104050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:28,677-Speed 9517.29 samples/sec   Loss 6.8585   LearningRate 0.0474   Epoch: 6   Global Step: 104060   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:44:29,772-Speed 9358.93 samples/sec   Loss 6.8592   LearningRate 0.0474   Epoch: 6   Global Step: 104070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:30,830-Speed 9679.86 samples/sec   Loss 6.8474   LearningRate 0.0474   Epoch: 6   Global Step: 104080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:31,876-Speed 9795.88 samples/sec   Loss 6.7884   LearningRate 0.0474   Epoch: 6   Global Step: 104090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:32,962-Speed 9431.29 samples/sec   Loss 6.8124   LearningRate 0.0474   Epoch: 6   Global Step: 104100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:34,042-Speed 9495.95 samples/sec   Loss 6.8388   LearningRate 0.0474   Epoch: 6   Global Step: 104110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:35,160-Speed 9165.22 samples/sec   Loss 6.8529   LearningRate 0.0473   Epoch: 6   Global Step: 104120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:44:36,250-Speed 9397.46 samples/sec   Loss 6.9060   LearningRate 0.0473   Epoch: 6   Global Step: 104130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:44:37,355-Speed 9273.61 samples/sec   Loss 6.8562   LearningRate 0.0473   Epoch: 6   Global Step: 104140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:44:38,403-Speed 9768.72 samples/sec   Loss 6.8456   LearningRate 0.0473   Epoch: 6   Global Step: 104150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:44:39,469-Speed 9617.02 samples/sec   Loss 6.7935   LearningRate 0.0473   Epoch: 6   Global Step: 104160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:44:40,598-Speed 9073.91 samples/sec   Loss 6.6845   LearningRate 0.0473   Epoch: 6   Global Step: 104170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:44:41,700-Speed 9294.81 samples/sec   Loss 6.8292   LearningRate 0.0473   Epoch: 6   Global Step: 104180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:44:42,757-Speed 9699.62 samples/sec   Loss 6.8528   LearningRate 0.0473   Epoch: 6   Global Step: 104190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:44:43,816-Speed 9677.19 samples/sec   Loss 6.8091   LearningRate 0.0473   Epoch: 6   Global Step: 104200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:44:44,902-Speed 9430.85 samples/sec   Loss 6.9329   LearningRate 0.0473   Epoch: 6   Global Step: 104210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:44:46,026-Speed 9119.56 samples/sec   Loss 6.8128   LearningRate 0.0473   Epoch: 6   Global Step: 104220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:47,097-Speed 9560.08 samples/sec   Loss 6.9614   LearningRate 0.0473   Epoch: 6   Global Step: 104230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:48,142-Speed 9805.74 samples/sec   Loss 6.7955   LearningRate 0.0473   Epoch: 6   Global Step: 104240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:49,241-Speed 9327.24 samples/sec   Loss 6.7545   LearningRate 0.0473   Epoch: 6   Global Step: 104250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:50,330-Speed 9409.92 samples/sec   Loss 6.7622   LearningRate 0.0473   Epoch: 6   Global Step: 104260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:51,418-Speed 9412.78 samples/sec   Loss 6.8664   LearningRate 0.0473   Epoch: 6   Global Step: 104270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:52,510-Speed 9388.12 samples/sec   Loss 6.8287   LearningRate 0.0473   Epoch: 6   Global Step: 104280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:53,624-Speed 9196.76 samples/sec   Loss 6.7677   LearningRate 0.0473   Epoch: 6   Global Step: 104290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:54,726-Speed 9297.37 samples/sec   Loss 6.7228   LearningRate 0.0473   Epoch: 6   Global Step: 104300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:55,780-Speed 9725.92 samples/sec   Loss 6.8678   LearningRate 0.0473   Epoch: 6   Global Step: 104310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:44:56,845-Speed 9616.30 samples/sec   Loss 6.8192   LearningRate 0.0473   Epoch: 6   Global Step: 104320   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:44:57,999-Speed 8877.51 samples/sec   Loss 6.9158   LearningRate 0.0473   Epoch: 6   Global Step: 104330   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:44:59,129-Speed 9067.41 samples/sec   Loss 6.8182   LearningRate 0.0473   Epoch: 6   Global Step: 104340   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:45:00,211-Speed 9469.45 samples/sec   Loss 6.9023   LearningRate 0.0473   Epoch: 6   Global Step: 104350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:01,300-Speed 9408.43 samples/sec   Loss 6.9427   LearningRate 0.0472   Epoch: 6   Global Step: 104360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:02,382-Speed 9477.98 samples/sec   Loss 6.8512   LearningRate 0.0472   Epoch: 6   Global Step: 104370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:03,510-Speed 9077.24 samples/sec   Loss 6.8824   LearningRate 0.0472   Epoch: 6   Global Step: 104380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:04,589-Speed 9501.42 samples/sec   Loss 6.8960   LearningRate 0.0472   Epoch: 6   Global Step: 104390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:05,646-Speed 9694.02 samples/sec   Loss 6.8969   LearningRate 0.0472   Epoch: 6   Global Step: 104400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:06,730-Speed 9451.78 samples/sec   Loss 6.8697   LearningRate 0.0472   Epoch: 6   Global Step: 104410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:07,825-Speed 9358.70 samples/sec   Loss 6.9398   LearningRate 0.0472   Epoch: 6   Global Step: 104420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:08,905-Speed 9484.21 samples/sec   Loss 6.8170   LearningRate 0.0472   Epoch: 6   Global Step: 104430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:10,006-Speed 9302.20 samples/sec   Loss 7.0257   LearningRate 0.0472   Epoch: 6   Global Step: 104440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:11,092-Speed 9440.62 samples/sec   Loss 6.8473   LearningRate 0.0472   Epoch: 6   Global Step: 104450   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:45:12,215-Speed 9127.44 samples/sec   Loss 6.7976   LearningRate 0.0472   Epoch: 6   Global Step: 104460   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:45:13,342-Speed 9087.97 samples/sec   Loss 6.9312   LearningRate 0.0472   Epoch: 6   Global Step: 104470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:14,451-Speed 9237.28 samples/sec   Loss 6.9118   LearningRate 0.0472   Epoch: 6   Global Step: 104480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:15,550-Speed 9328.91 samples/sec   Loss 6.7221   LearningRate 0.0472   Epoch: 6   Global Step: 104490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:16,600-Speed 9757.13 samples/sec   Loss 6.6521   LearningRate 0.0472   Epoch: 6   Global Step: 104500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:17,667-Speed 9599.85 samples/sec   Loss 6.8312   LearningRate 0.0472   Epoch: 6   Global Step: 104510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:18,731-Speed 9629.65 samples/sec   Loss 6.8137   LearningRate 0.0472   Epoch: 6   Global Step: 104520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:19,867-Speed 9021.52 samples/sec   Loss 6.9197   LearningRate 0.0472   Epoch: 6   Global Step: 104530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:20,968-Speed 9305.65 samples/sec   Loss 6.8351   LearningRate 0.0472   Epoch: 6   Global Step: 104540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:22,036-Speed 9590.55 samples/sec   Loss 6.8494   LearningRate 0.0472   Epoch: 6   Global Step: 104550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:23,103-Speed 9606.63 samples/sec   Loss 6.7745   LearningRate 0.0472   Epoch: 6   Global Step: 104560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:24,199-Speed 9347.19 samples/sec   Loss 6.9114   LearningRate 0.0472   Epoch: 6   Global Step: 104570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:25,289-Speed 9403.97 samples/sec   Loss 6.8293   LearningRate 0.0472   Epoch: 6   Global Step: 104580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:26,348-Speed 9673.49 samples/sec   Loss 6.7360   LearningRate 0.0472   Epoch: 6   Global Step: 104590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:27,426-Speed 9500.31 samples/sec   Loss 6.8777   LearningRate 0.0471   Epoch: 6   Global Step: 104600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:28,467-Speed 9847.40 samples/sec   Loss 6.8639   LearningRate 0.0471   Epoch: 6   Global Step: 104610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:29,543-Speed 9513.98 samples/sec   Loss 6.8611   LearningRate 0.0471   Epoch: 6   Global Step: 104620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:30,632-Speed 9417.39 samples/sec   Loss 6.8167   LearningRate 0.0471   Epoch: 6   Global Step: 104630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:31,706-Speed 9536.01 samples/sec   Loss 6.7649   LearningRate 0.0471   Epoch: 6   Global Step: 104640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:32,828-Speed 9129.88 samples/sec   Loss 6.8664   LearningRate 0.0471   Epoch: 6   Global Step: 104650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:33,899-Speed 9568.28 samples/sec   Loss 6.8979   LearningRate 0.0471   Epoch: 6   Global Step: 104660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:34,983-Speed 9460.48 samples/sec   Loss 6.8146   LearningRate 0.0471   Epoch: 6   Global Step: 104670   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:45:36,034-Speed 9750.70 samples/sec   Loss 6.9084   LearningRate 0.0471   Epoch: 6   Global Step: 104680   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:45:37,063-Speed 9956.87 samples/sec   Loss 6.7948   LearningRate 0.0471   Epoch: 6   Global Step: 104690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:38,120-Speed 9686.59 samples/sec   Loss 6.8005   LearningRate 0.0471   Epoch: 6   Global Step: 104700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:39,170-Speed 9764.26 samples/sec   Loss 6.8946   LearningRate 0.0471   Epoch: 6   Global Step: 104710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:40,226-Speed 9699.83 samples/sec   Loss 6.8343   LearningRate 0.0471   Epoch: 6   Global Step: 104720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:41,321-Speed 9363.41 samples/sec   Loss 6.8363   LearningRate 0.0471   Epoch: 6   Global Step: 104730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:42,404-Speed 9460.63 samples/sec   Loss 6.9499   LearningRate 0.0471   Epoch: 6   Global Step: 104740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:43,477-Speed 9549.00 samples/sec   Loss 6.7827   LearningRate 0.0471   Epoch: 6   Global Step: 104750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:44,559-Speed 9467.08 samples/sec   Loss 6.8785   LearningRate 0.0471   Epoch: 6   Global Step: 104760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:45,633-Speed 9538.57 samples/sec   Loss 6.8960   LearningRate 0.0471   Epoch: 6   Global Step: 104770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:46,714-Speed 9481.74 samples/sec   Loss 6.9169   LearningRate 0.0471   Epoch: 6   Global Step: 104780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:47,824-Speed 9228.50 samples/sec   Loss 6.9895   LearningRate 0.0471   Epoch: 6   Global Step: 104790   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:45:48,879-Speed 9707.93 samples/sec   Loss 6.7549   LearningRate 0.0471   Epoch: 6   Global Step: 104800   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:45:49,942-Speed 9643.83 samples/sec   Loss 6.7524   LearningRate 0.0471   Epoch: 6   Global Step: 104810   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:45:50,997-Speed 9711.54 samples/sec   Loss 6.8573   LearningRate 0.0471   Epoch: 6   Global Step: 104820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:52,044-Speed 9783.40 samples/sec   Loss 6.8799   LearningRate 0.0471   Epoch: 6   Global Step: 104830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:53,068-Speed 10009.55 samples/sec   Loss 6.8975   LearningRate 0.0471   Epoch: 6   Global Step: 104840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:54,123-Speed 9716.78 samples/sec   Loss 6.7256   LearningRate 0.0470   Epoch: 6   Global Step: 104850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:55,216-Speed 9373.58 samples/sec   Loss 6.7956   LearningRate 0.0470   Epoch: 6   Global Step: 104860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:56,290-Speed 9543.88 samples/sec   Loss 6.9019   LearningRate 0.0470   Epoch: 6   Global Step: 104870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:57,355-Speed 9619.42 samples/sec   Loss 6.8381   LearningRate 0.0470   Epoch: 6   Global Step: 104880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:58,418-Speed 9638.96 samples/sec   Loss 6.7964   LearningRate 0.0470   Epoch: 6   Global Step: 104890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:45:59,537-Speed 9155.84 samples/sec   Loss 6.8796   LearningRate 0.0470   Epoch: 6   Global Step: 104900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:00,619-Speed 9467.59 samples/sec   Loss 6.7952   LearningRate 0.0470   Epoch: 6   Global Step: 104910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:01,708-Speed 9405.28 samples/sec   Loss 6.9066   LearningRate 0.0470   Epoch: 6   Global Step: 104920   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:02,813-Speed 9271.66 samples/sec   Loss 6.9620   LearningRate 0.0470   Epoch: 6   Global Step: 104930   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:03,925-Speed 9219.78 samples/sec   Loss 6.8241   LearningRate 0.0470   Epoch: 6   Global Step: 104940   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:04,993-Speed 9594.17 samples/sec   Loss 6.8614   LearningRate 0.0470   Epoch: 6   Global Step: 104950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:06,060-Speed 9597.98 samples/sec   Loss 6.7891   LearningRate 0.0470   Epoch: 6   Global Step: 104960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:07,124-Speed 9637.03 samples/sec   Loss 6.9506   LearningRate 0.0470   Epoch: 6   Global Step: 104970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:08,231-Speed 9252.44 samples/sec   Loss 6.7641   LearningRate 0.0470   Epoch: 6   Global Step: 104980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:09,297-Speed 9605.71 samples/sec   Loss 6.9203   LearningRate 0.0470   Epoch: 6   Global Step: 104990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:10,383-Speed 9436.76 samples/sec   Loss 6.7854   LearningRate 0.0470   Epoch: 6   Global Step: 105000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:11,495-Speed 9219.14 samples/sec   Loss 6.7799   LearningRate 0.0470   Epoch: 6   Global Step: 105010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:12,585-Speed 9399.49 samples/sec   Loss 6.8693   LearningRate 0.0470   Epoch: 6   Global Step: 105020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:13,652-Speed 9606.21 samples/sec   Loss 6.7474   LearningRate 0.0470   Epoch: 6   Global Step: 105030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:14,761-Speed 9241.39 samples/sec   Loss 6.9150   LearningRate 0.0470   Epoch: 6   Global Step: 105040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:15,861-Speed 9311.96 samples/sec   Loss 7.0376   LearningRate 0.0470   Epoch: 6   Global Step: 105050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:16,945-Speed 9450.53 samples/sec   Loss 6.8472   LearningRate 0.0470   Epoch: 6   Global Step: 105060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:18,022-Speed 9515.52 samples/sec   Loss 6.8901   LearningRate 0.0470   Epoch: 6   Global Step: 105070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:46:19,100-Speed 9503.46 samples/sec   Loss 6.8584   LearningRate 0.0470   Epoch: 6   Global Step: 105080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:46:20,193-Speed 9377.52 samples/sec   Loss 6.9077   LearningRate 0.0469   Epoch: 6   Global Step: 105090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:46:21,263-Speed 9568.19 samples/sec   Loss 6.9215   LearningRate 0.0469   Epoch: 6   Global Step: 105100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:46:22,345-Speed 9474.30 samples/sec   Loss 7.0051   LearningRate 0.0469   Epoch: 6   Global Step: 105110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:46:23,432-Speed 9424.88 samples/sec   Loss 6.9291   LearningRate 0.0469   Epoch: 6   Global Step: 105120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:46:24,527-Speed 9359.94 samples/sec   Loss 6.8483   LearningRate 0.0469   Epoch: 6   Global Step: 105130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:46:25,613-Speed 9430.01 samples/sec   Loss 6.9642   LearningRate 0.0469   Epoch: 6   Global Step: 105140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:46:26,734-Speed 9146.50 samples/sec   Loss 6.7915   LearningRate 0.0469   Epoch: 6   Global Step: 105150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:46:27,856-Speed 9127.10 samples/sec   Loss 6.8787   LearningRate 0.0469   Epoch: 6   Global Step: 105160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:46:28,947-Speed 9391.85 samples/sec   Loss 6.8345   LearningRate 0.0469   Epoch: 6   Global Step: 105170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:30,032-Speed 9445.68 samples/sec   Loss 6.9377   LearningRate 0.0469   Epoch: 6   Global Step: 105180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:31,098-Speed 9610.90 samples/sec   Loss 6.9226   LearningRate 0.0469   Epoch: 6   Global Step: 105190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:32,177-Speed 9495.16 samples/sec   Loss 6.9391   LearningRate 0.0469   Epoch: 6   Global Step: 105200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:33,285-Speed 9251.14 samples/sec   Loss 6.8311   LearningRate 0.0469   Epoch: 6   Global Step: 105210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:34,375-Speed 9398.66 samples/sec   Loss 6.8178   LearningRate 0.0469   Epoch: 6   Global Step: 105220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:35,474-Speed 9321.71 samples/sec   Loss 6.9530   LearningRate 0.0469   Epoch: 6   Global Step: 105230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:36,516-Speed 9835.25 samples/sec   Loss 6.8243   LearningRate 0.0469   Epoch: 6   Global Step: 105240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:37,554-Speed 9875.50 samples/sec   Loss 6.8313   LearningRate 0.0469   Epoch: 6   Global Step: 105250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:38,589-Speed 9895.40 samples/sec   Loss 6.8419   LearningRate 0.0469   Epoch: 6   Global Step: 105260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:39,666-Speed 9511.22 samples/sec   Loss 6.9056   LearningRate 0.0469   Epoch: 6   Global Step: 105270   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:40,713-Speed 9789.30 samples/sec   Loss 6.9827   LearningRate 0.0469   Epoch: 6   Global Step: 105280   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:41,809-Speed 9345.48 samples/sec   Loss 6.8695   LearningRate 0.0469   Epoch: 6   Global Step: 105290   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:42,908-Speed 9326.33 samples/sec   Loss 6.9640   LearningRate 0.0469   Epoch: 6   Global Step: 105300   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:44,012-Speed 9283.38 samples/sec   Loss 6.9096   LearningRate 0.0469   Epoch: 6   Global Step: 105310   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:45,037-Speed 9993.45 samples/sec   Loss 6.7880   LearningRate 0.0469   Epoch: 6   Global Step: 105320   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:46,127-Speed 9401.97 samples/sec   Loss 6.8605   LearningRate 0.0469   Epoch: 6   Global Step: 105330   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:47,193-Speed 9611.58 samples/sec   Loss 6.8415   LearningRate 0.0468   Epoch: 6   Global Step: 105340   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:48,301-Speed 9244.78 samples/sec   Loss 6.7744   LearningRate 0.0468   Epoch: 6   Global Step: 105350   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:49,393-Speed 9379.59 samples/sec   Loss 6.8163   LearningRate 0.0468   Epoch: 6   Global Step: 105360   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:50,468-Speed 9539.23 samples/sec   Loss 6.8801   LearningRate 0.0468   Epoch: 6   Global Step: 105370   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:46:51,528-Speed 9662.55 samples/sec   Loss 6.9143   LearningRate 0.0468   Epoch: 6   Global Step: 105380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:52,624-Speed 9354.23 samples/sec   Loss 6.9459   LearningRate 0.0468   Epoch: 6   Global Step: 105390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:53,714-Speed 9400.61 samples/sec   Loss 6.8932   LearningRate 0.0468   Epoch: 6   Global Step: 105400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:54,824-Speed 9230.06 samples/sec   Loss 6.8807   LearningRate 0.0468   Epoch: 6   Global Step: 105410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:55,915-Speed 9389.21 samples/sec   Loss 6.7844   LearningRate 0.0468   Epoch: 6   Global Step: 105420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:56,969-Speed 9719.30 samples/sec   Loss 6.8813   LearningRate 0.0468   Epoch: 6   Global Step: 105430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:58,045-Speed 9526.39 samples/sec   Loss 6.8213   LearningRate 0.0468   Epoch: 6   Global Step: 105440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:46:59,148-Speed 9284.64 samples/sec   Loss 6.8564   LearningRate 0.0468   Epoch: 6   Global Step: 105450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:00,203-Speed 9717.15 samples/sec   Loss 6.8472   LearningRate 0.0468   Epoch: 6   Global Step: 105460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:01,301-Speed 9325.35 samples/sec   Loss 6.9226   LearningRate 0.0468   Epoch: 6   Global Step: 105470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:02,396-Speed 9356.58 samples/sec   Loss 6.8355   LearningRate 0.0468   Epoch: 6   Global Step: 105480   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:47:03,491-Speed 9363.76 samples/sec   Loss 6.9070   LearningRate 0.0468   Epoch: 6   Global Step: 105490   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:47:04,558-Speed 9662.94 samples/sec   Loss 6.8892   LearningRate 0.0468   Epoch: 6   Global Step: 105500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:05,650-Speed 9381.91 samples/sec   Loss 6.9279   LearningRate 0.0468   Epoch: 6   Global Step: 105510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:06,740-Speed 9401.42 samples/sec   Loss 6.8939   LearningRate 0.0468   Epoch: 6   Global Step: 105520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:07,812-Speed 9556.28 samples/sec   Loss 6.8392   LearningRate 0.0468   Epoch: 6   Global Step: 105530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:08,872-Speed 9672.96 samples/sec   Loss 6.9101   LearningRate 0.0468   Epoch: 6   Global Step: 105540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:09,929-Speed 9691.35 samples/sec   Loss 6.8361   LearningRate 0.0468   Epoch: 6   Global Step: 105550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:11,032-Speed 9290.83 samples/sec   Loss 6.8482   LearningRate 0.0468   Epoch: 6   Global Step: 105560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:12,110-Speed 9507.18 samples/sec   Loss 6.8015   LearningRate 0.0468   Epoch: 6   Global Step: 105570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:13,205-Speed 9351.43 samples/sec   Loss 6.7878   LearningRate 0.0467   Epoch: 6   Global Step: 105580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:14,277-Speed 9559.80 samples/sec   Loss 6.8817   LearningRate 0.0467   Epoch: 6   Global Step: 105590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:15,406-Speed 9075.14 samples/sec   Loss 6.8196   LearningRate 0.0467   Epoch: 6   Global Step: 105600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:16,485-Speed 9490.80 samples/sec   Loss 6.8786   LearningRate 0.0467   Epoch: 6   Global Step: 105610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:17,574-Speed 9410.47 samples/sec   Loss 6.9375   LearningRate 0.0467   Epoch: 6   Global Step: 105620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:18,629-Speed 9709.02 samples/sec   Loss 6.8388   LearningRate 0.0467   Epoch: 6   Global Step: 105630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:19,703-Speed 9548.93 samples/sec   Loss 6.7723   LearningRate 0.0467   Epoch: 6   Global Step: 105640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:20,776-Speed 9548.08 samples/sec   Loss 6.8223   LearningRate 0.0467   Epoch: 6   Global Step: 105650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:21,866-Speed 9394.23 samples/sec   Loss 6.9528   LearningRate 0.0467   Epoch: 6   Global Step: 105660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:22,918-Speed 9742.54 samples/sec   Loss 6.8889   LearningRate 0.0467   Epoch: 6   Global Step: 105670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:24,023-Speed 9279.96 samples/sec   Loss 6.8597   LearningRate 0.0467   Epoch: 6   Global Step: 105680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:25,123-Speed 9310.76 samples/sec   Loss 6.9508   LearningRate 0.0467   Epoch: 6   Global Step: 105690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:26,203-Speed 9483.40 samples/sec   Loss 6.8067   LearningRate 0.0467   Epoch: 6   Global Step: 105700   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:47:27,262-Speed 9682.14 samples/sec   Loss 6.8458   LearningRate 0.0467   Epoch: 6   Global Step: 105710   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:47:28,354-Speed 9381.32 samples/sec   Loss 6.8282   LearningRate 0.0467   Epoch: 6   Global Step: 105720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:29,432-Speed 9502.11 samples/sec   Loss 6.9799   LearningRate 0.0467   Epoch: 6   Global Step: 105730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:30,517-Speed 9438.33 samples/sec   Loss 6.8523   LearningRate 0.0467   Epoch: 6   Global Step: 105740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:31,597-Speed 9489.09 samples/sec   Loss 6.8194   LearningRate 0.0467   Epoch: 6   Global Step: 105750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:32,700-Speed 9291.12 samples/sec   Loss 6.7926   LearningRate 0.0467   Epoch: 6   Global Step: 105760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:33,785-Speed 9442.72 samples/sec   Loss 7.0017   LearningRate 0.0467   Epoch: 6   Global Step: 105770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:34,876-Speed 9391.31 samples/sec   Loss 6.8716   LearningRate 0.0467   Epoch: 6   Global Step: 105780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:35,951-Speed 9535.25 samples/sec   Loss 7.0103   LearningRate 0.0467   Epoch: 6   Global Step: 105790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:37,009-Speed 9681.27 samples/sec   Loss 6.8317   LearningRate 0.0467   Epoch: 6   Global Step: 105800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:38,101-Speed 9381.53 samples/sec   Loss 6.8673   LearningRate 0.0467   Epoch: 6   Global Step: 105810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:39,198-Speed 9346.42 samples/sec   Loss 6.9477   LearningRate 0.0466   Epoch: 6   Global Step: 105820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:40,308-Speed 9229.26 samples/sec   Loss 6.9323   LearningRate 0.0466   Epoch: 6   Global Step: 105830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:41,369-Speed 9664.81 samples/sec   Loss 6.8762   LearningRate 0.0466   Epoch: 6   Global Step: 105840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:42,449-Speed 9479.73 samples/sec   Loss 6.8220   LearningRate 0.0466   Epoch: 6   Global Step: 105850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:43,513-Speed 9630.49 samples/sec   Loss 6.8913   LearningRate 0.0466   Epoch: 6   Global Step: 105860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:44,568-Speed 9715.42 samples/sec   Loss 6.9206   LearningRate 0.0466   Epoch: 6   Global Step: 105870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:45,627-Speed 9672.99 samples/sec   Loss 6.8249   LearningRate 0.0466   Epoch: 6   Global Step: 105880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:46,667-Speed 9854.34 samples/sec   Loss 6.8784   LearningRate 0.0466   Epoch: 6   Global Step: 105890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:47,752-Speed 9442.47 samples/sec   Loss 6.9954   LearningRate 0.0466   Epoch: 6   Global Step: 105900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:48,831-Speed 9492.88 samples/sec   Loss 6.9064   LearningRate 0.0466   Epoch: 6   Global Step: 105910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:49,902-Speed 9568.45 samples/sec   Loss 6.8380   LearningRate 0.0466   Epoch: 6   Global Step: 105920   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:47:50,988-Speed 9430.69 samples/sec   Loss 6.9003   LearningRate 0.0466   Epoch: 6   Global Step: 105930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:52,079-Speed 9389.49 samples/sec   Loss 6.9558   LearningRate 0.0466   Epoch: 6   Global Step: 105940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:53,145-Speed 9615.29 samples/sec   Loss 6.8939   LearningRate 0.0466   Epoch: 6   Global Step: 105950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:54,220-Speed 9535.27 samples/sec   Loss 7.0385   LearningRate 0.0466   Epoch: 6   Global Step: 105960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:55,279-Speed 9669.71 samples/sec   Loss 6.9641   LearningRate 0.0466   Epoch: 6   Global Step: 105970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:56,375-Speed 9350.58 samples/sec   Loss 6.8012   LearningRate 0.0466   Epoch: 6   Global Step: 105980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:57,476-Speed 9311.99 samples/sec   Loss 6.9365   LearningRate 0.0466   Epoch: 6   Global Step: 105990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:47:58,558-Speed 9466.93 samples/sec   Loss 6.8786   LearningRate 0.0466   Epoch: 6   Global Step: 106000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:48:20,355-[lfw][106000]XNorm: 11.342331
Training: 2022-04-11 15:48:20,356-[lfw][106000]Accuracy-Flip: 0.99617+-0.00279
Training: 2022-04-11 15:48:20,356-[lfw][106000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:48:45,547-[cfp_fp][106000]XNorm: 9.626601
Training: 2022-04-11 15:48:45,548-[cfp_fp][106000]Accuracy-Flip: 0.95614+-0.01169
Training: 2022-04-11 15:48:45,548-[cfp_fp][106000]Accuracy-Highest: 0.95857
Training: 2022-04-11 15:49:07,360-[agedb_30][106000]XNorm: 10.892514
Training: 2022-04-11 15:49:07,361-[agedb_30][106000]Accuracy-Flip: 0.96250+-0.00932
Training: 2022-04-11 15:49:07,362-[agedb_30][106000]Accuracy-Highest: 0.96483
Training: 2022-04-11 15:49:08,435-Speed 146.54 samples/sec   Loss 6.9093   LearningRate 0.0466   Epoch: 6   Global Step: 106010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:09,480-Speed 9803.29 samples/sec   Loss 6.8598   LearningRate 0.0466   Epoch: 6   Global Step: 106020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:10,557-Speed 9513.81 samples/sec   Loss 6.8489   LearningRate 0.0466   Epoch: 6   Global Step: 106030   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:49:11,644-Speed 9432.38 samples/sec   Loss 6.9421   LearningRate 0.0466   Epoch: 6   Global Step: 106040   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:49:12,699-Speed 9711.01 samples/sec   Loss 6.8313   LearningRate 0.0466   Epoch: 6   Global Step: 106050   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:49:13,803-Speed 9274.54 samples/sec   Loss 6.7791   LearningRate 0.0466   Epoch: 6   Global Step: 106060   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:49:14,915-Speed 9221.42 samples/sec   Loss 6.9308   LearningRate 0.0465   Epoch: 6   Global Step: 106070   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:49:15,964-Speed 9767.84 samples/sec   Loss 6.8382   LearningRate 0.0465   Epoch: 6   Global Step: 106080   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:49:17,020-Speed 9697.92 samples/sec   Loss 6.9276   LearningRate 0.0465   Epoch: 6   Global Step: 106090   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:49:18,095-Speed 9529.51 samples/sec   Loss 6.8956   LearningRate 0.0465   Epoch: 6   Global Step: 106100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:19,163-Speed 9590.79 samples/sec   Loss 6.9513   LearningRate 0.0465   Epoch: 6   Global Step: 106110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:20,249-Speed 9436.34 samples/sec   Loss 6.8384   LearningRate 0.0465   Epoch: 6   Global Step: 106120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:21,344-Speed 9356.87 samples/sec   Loss 6.8742   LearningRate 0.0465   Epoch: 6   Global Step: 106130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:22,444-Speed 9318.34 samples/sec   Loss 6.7840   LearningRate 0.0465   Epoch: 6   Global Step: 106140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:23,537-Speed 9366.90 samples/sec   Loss 6.9060   LearningRate 0.0465   Epoch: 6   Global Step: 106150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:24,653-Speed 9183.14 samples/sec   Loss 6.9492   LearningRate 0.0465   Epoch: 6   Global Step: 106160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:25,751-Speed 9331.42 samples/sec   Loss 6.8420   LearningRate 0.0465   Epoch: 6   Global Step: 106170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:26,871-Speed 9150.16 samples/sec   Loss 6.9556   LearningRate 0.0465   Epoch: 6   Global Step: 106180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:27,966-Speed 9355.28 samples/sec   Loss 6.9018   LearningRate 0.0465   Epoch: 6   Global Step: 106190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:29,045-Speed 9500.86 samples/sec   Loss 6.9526   LearningRate 0.0465   Epoch: 6   Global Step: 106200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:30,114-Speed 9582.52 samples/sec   Loss 6.9538   LearningRate 0.0465   Epoch: 6   Global Step: 106210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:31,158-Speed 9817.07 samples/sec   Loss 6.9679   LearningRate 0.0465   Epoch: 6   Global Step: 106220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:32,241-Speed 9464.03 samples/sec   Loss 6.8943   LearningRate 0.0465   Epoch: 6   Global Step: 106230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:33,268-Speed 9970.07 samples/sec   Loss 6.9103   LearningRate 0.0465   Epoch: 6   Global Step: 106240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:34,371-Speed 9293.38 samples/sec   Loss 6.8863   LearningRate 0.0465   Epoch: 6   Global Step: 106250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:35,461-Speed 9398.33 samples/sec   Loss 6.8861   LearningRate 0.0465   Epoch: 6   Global Step: 106260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:36,545-Speed 9448.01 samples/sec   Loss 6.8663   LearningRate 0.0465   Epoch: 6   Global Step: 106270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:37,612-Speed 9605.27 samples/sec   Loss 7.0419   LearningRate 0.0465   Epoch: 6   Global Step: 106280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:38,709-Speed 9339.29 samples/sec   Loss 6.9604   LearningRate 0.0465   Epoch: 6   Global Step: 106290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:39,833-Speed 9117.85 samples/sec   Loss 6.9788   LearningRate 0.0465   Epoch: 6   Global Step: 106300   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:49:40,915-Speed 9468.16 samples/sec   Loss 6.9188   LearningRate 0.0464   Epoch: 6   Global Step: 106310   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:49:42,024-Speed 9241.65 samples/sec   Loss 6.8250   LearningRate 0.0464   Epoch: 6   Global Step: 106320   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:49:43,107-Speed 9454.10 samples/sec   Loss 6.8998   LearningRate 0.0464   Epoch: 6   Global Step: 106330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:44,173-Speed 9621.20 samples/sec   Loss 6.8594   LearningRate 0.0464   Epoch: 6   Global Step: 106340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:45,219-Speed 9795.16 samples/sec   Loss 6.9432   LearningRate 0.0464   Epoch: 6   Global Step: 106350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:46,271-Speed 9737.20 samples/sec   Loss 6.9162   LearningRate 0.0464   Epoch: 6   Global Step: 106360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:47,352-Speed 9480.86 samples/sec   Loss 6.8445   LearningRate 0.0464   Epoch: 6   Global Step: 106370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:48,423-Speed 9566.27 samples/sec   Loss 6.9484   LearningRate 0.0464   Epoch: 6   Global Step: 106380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:49,464-Speed 9847.10 samples/sec   Loss 6.9326   LearningRate 0.0464   Epoch: 6   Global Step: 106390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:50,490-Speed 9982.35 samples/sec   Loss 6.8154   LearningRate 0.0464   Epoch: 6   Global Step: 106400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:49:51,536-Speed 9793.75 samples/sec   Loss 6.8382   LearningRate 0.0464   Epoch: 6   Global Step: 106410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:49:52,628-Speed 9384.98 samples/sec   Loss 6.9795   LearningRate 0.0464   Epoch: 6   Global Step: 106420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:49:53,685-Speed 9692.74 samples/sec   Loss 6.9277   LearningRate 0.0464   Epoch: 6   Global Step: 106430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:49:54,777-Speed 9389.73 samples/sec   Loss 6.8959   LearningRate 0.0464   Epoch: 6   Global Step: 106440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:49:55,806-Speed 9951.65 samples/sec   Loss 7.0154   LearningRate 0.0464   Epoch: 6   Global Step: 106450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:49:56,865-Speed 9674.45 samples/sec   Loss 6.8655   LearningRate 0.0464   Epoch: 6   Global Step: 106460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:49:57,953-Speed 9420.57 samples/sec   Loss 6.8256   LearningRate 0.0464   Epoch: 6   Global Step: 106470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:49:59,016-Speed 9639.46 samples/sec   Loss 6.8865   LearningRate 0.0464   Epoch: 6   Global Step: 106480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:00,116-Speed 9309.12 samples/sec   Loss 6.9659   LearningRate 0.0464   Epoch: 6   Global Step: 106490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:01,178-Speed 9651.48 samples/sec   Loss 6.9429   LearningRate 0.0464   Epoch: 6   Global Step: 106500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:02,268-Speed 9403.72 samples/sec   Loss 6.9039   LearningRate 0.0464   Epoch: 6   Global Step: 106510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:03,341-Speed 9547.63 samples/sec   Loss 6.8493   LearningRate 0.0464   Epoch: 6   Global Step: 106520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:04,457-Speed 9181.28 samples/sec   Loss 6.9361   LearningRate 0.0464   Epoch: 6   Global Step: 106530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:05,537-Speed 9478.81 samples/sec   Loss 7.0226   LearningRate 0.0464   Epoch: 6   Global Step: 106540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:06,641-Speed 9287.24 samples/sec   Loss 6.9307   LearningRate 0.0464   Epoch: 6   Global Step: 106550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:07,683-Speed 9831.60 samples/sec   Loss 6.9026   LearningRate 0.0463   Epoch: 6   Global Step: 106560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:08,755-Speed 9559.70 samples/sec   Loss 6.8376   LearningRate 0.0463   Epoch: 6   Global Step: 106570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:09,845-Speed 9401.11 samples/sec   Loss 6.8133   LearningRate 0.0463   Epoch: 6   Global Step: 106580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:10,957-Speed 9209.05 samples/sec   Loss 6.8294   LearningRate 0.0463   Epoch: 6   Global Step: 106590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:12,017-Speed 9670.57 samples/sec   Loss 6.9144   LearningRate 0.0463   Epoch: 6   Global Step: 106600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:13,059-Speed 9828.20 samples/sec   Loss 6.9551   LearningRate 0.0463   Epoch: 6   Global Step: 106610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:14,114-Speed 9716.03 samples/sec   Loss 6.9371   LearningRate 0.0463   Epoch: 6   Global Step: 106620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:15,211-Speed 9345.08 samples/sec   Loss 6.9863   LearningRate 0.0463   Epoch: 6   Global Step: 106630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:16,300-Speed 9407.64 samples/sec   Loss 6.9304   LearningRate 0.0463   Epoch: 6   Global Step: 106640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:17,357-Speed 9687.81 samples/sec   Loss 6.9753   LearningRate 0.0463   Epoch: 6   Global Step: 106650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:18,489-Speed 9053.80 samples/sec   Loss 6.8581   LearningRate 0.0463   Epoch: 6   Global Step: 106660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:19,530-Speed 9837.13 samples/sec   Loss 6.9243   LearningRate 0.0463   Epoch: 6   Global Step: 106670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:20,613-Speed 9463.68 samples/sec   Loss 6.8502   LearningRate 0.0463   Epoch: 6   Global Step: 106680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:21,650-Speed 9879.68 samples/sec   Loss 6.8702   LearningRate 0.0463   Epoch: 6   Global Step: 106690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:22,776-Speed 9106.18 samples/sec   Loss 6.9380   LearningRate 0.0463   Epoch: 6   Global Step: 106700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:23,907-Speed 9057.18 samples/sec   Loss 6.8806   LearningRate 0.0463   Epoch: 6   Global Step: 106710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:24,988-Speed 9488.86 samples/sec   Loss 6.9101   LearningRate 0.0463   Epoch: 6   Global Step: 106720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:26,079-Speed 9390.33 samples/sec   Loss 6.8977   LearningRate 0.0463   Epoch: 6   Global Step: 106730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:27,159-Speed 9488.95 samples/sec   Loss 6.9619   LearningRate 0.0463   Epoch: 6   Global Step: 106740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:28,254-Speed 9350.91 samples/sec   Loss 7.0063   LearningRate 0.0463   Epoch: 6   Global Step: 106750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:29,329-Speed 9537.20 samples/sec   Loss 6.8830   LearningRate 0.0463   Epoch: 6   Global Step: 106760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:30,407-Speed 9503.89 samples/sec   Loss 6.8354   LearningRate 0.0463   Epoch: 6   Global Step: 106770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:31,483-Speed 9518.71 samples/sec   Loss 6.8491   LearningRate 0.0463   Epoch: 6   Global Step: 106780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:32,552-Speed 9585.94 samples/sec   Loss 6.9226   LearningRate 0.0463   Epoch: 6   Global Step: 106790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:33,658-Speed 9264.30 samples/sec   Loss 6.9748   LearningRate 0.0462   Epoch: 6   Global Step: 106800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:34,717-Speed 9674.77 samples/sec   Loss 6.9613   LearningRate 0.0462   Epoch: 6   Global Step: 106810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:50:35,807-Speed 9401.55 samples/sec   Loss 6.7876   LearningRate 0.0462   Epoch: 6   Global Step: 106820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:36,907-Speed 9314.25 samples/sec   Loss 6.9500   LearningRate 0.0462   Epoch: 6   Global Step: 106830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:37,994-Speed 9421.00 samples/sec   Loss 6.8395   LearningRate 0.0462   Epoch: 6   Global Step: 106840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:39,098-Speed 9281.60 samples/sec   Loss 6.8987   LearningRate 0.0462   Epoch: 6   Global Step: 106850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:40,157-Speed 9680.12 samples/sec   Loss 6.7959   LearningRate 0.0462   Epoch: 6   Global Step: 106860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:41,235-Speed 9497.98 samples/sec   Loss 6.8959   LearningRate 0.0462   Epoch: 6   Global Step: 106870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:42,311-Speed 9525.18 samples/sec   Loss 6.8802   LearningRate 0.0462   Epoch: 6   Global Step: 106880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:43,394-Speed 9462.77 samples/sec   Loss 6.9170   LearningRate 0.0462   Epoch: 6   Global Step: 106890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:44,431-Speed 9888.73 samples/sec   Loss 6.8729   LearningRate 0.0462   Epoch: 6   Global Step: 106900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:45,506-Speed 9532.86 samples/sec   Loss 6.9153   LearningRate 0.0462   Epoch: 6   Global Step: 106910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:46,536-Speed 9942.36 samples/sec   Loss 6.8835   LearningRate 0.0462   Epoch: 6   Global Step: 106920   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:50:47,654-Speed 9167.35 samples/sec   Loss 6.9002   LearningRate 0.0462   Epoch: 6   Global Step: 106930   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:50:48,732-Speed 9499.77 samples/sec   Loss 6.7789   LearningRate 0.0462   Epoch: 6   Global Step: 106940   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:50:49,831-Speed 9323.47 samples/sec   Loss 6.7649   LearningRate 0.0462   Epoch: 6   Global Step: 106950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:50,917-Speed 9437.08 samples/sec   Loss 6.8751   LearningRate 0.0462   Epoch: 6   Global Step: 106960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:51,974-Speed 9693.34 samples/sec   Loss 6.8417   LearningRate 0.0462   Epoch: 6   Global Step: 106970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:53,086-Speed 9213.49 samples/sec   Loss 6.9046   LearningRate 0.0462   Epoch: 6   Global Step: 106980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:54,151-Speed 9618.88 samples/sec   Loss 6.8505   LearningRate 0.0462   Epoch: 6   Global Step: 106990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:55,259-Speed 9247.48 samples/sec   Loss 6.9250   LearningRate 0.0462   Epoch: 6   Global Step: 107000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:56,367-Speed 9253.34 samples/sec   Loss 6.8886   LearningRate 0.0462   Epoch: 6   Global Step: 107010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:57,480-Speed 9207.41 samples/sec   Loss 6.8825   LearningRate 0.0462   Epoch: 6   Global Step: 107020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:58,600-Speed 9146.00 samples/sec   Loss 6.8469   LearningRate 0.0462   Epoch: 6   Global Step: 107030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:50:59,699-Speed 9320.26 samples/sec   Loss 6.7819   LearningRate 0.0462   Epoch: 6   Global Step: 107040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:00,839-Speed 8989.43 samples/sec   Loss 7.0548   LearningRate 0.0461   Epoch: 6   Global Step: 107050   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:51:01,932-Speed 9374.61 samples/sec   Loss 6.8545   LearningRate 0.0461   Epoch: 6   Global Step: 107060   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:51:02,995-Speed 9640.57 samples/sec   Loss 6.8374   LearningRate 0.0461   Epoch: 6   Global Step: 107070   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:51:04,076-Speed 9483.58 samples/sec   Loss 6.9003   LearningRate 0.0461   Epoch: 6   Global Step: 107080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:05,173-Speed 9337.18 samples/sec   Loss 6.9073   LearningRate 0.0461   Epoch: 6   Global Step: 107090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:06,251-Speed 9500.54 samples/sec   Loss 6.9002   LearningRate 0.0461   Epoch: 6   Global Step: 107100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:07,310-Speed 9683.18 samples/sec   Loss 6.8883   LearningRate 0.0461   Epoch: 6   Global Step: 107110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:08,344-Speed 9899.97 samples/sec   Loss 6.9166   LearningRate 0.0461   Epoch: 6   Global Step: 107120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:09,392-Speed 9781.50 samples/sec   Loss 7.0155   LearningRate 0.0461   Epoch: 6   Global Step: 107130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:10,467-Speed 9526.38 samples/sec   Loss 6.9027   LearningRate 0.0461   Epoch: 6   Global Step: 107140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:11,543-Speed 9522.00 samples/sec   Loss 6.8614   LearningRate 0.0461   Epoch: 6   Global Step: 107150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:12,646-Speed 9288.80 samples/sec   Loss 6.9425   LearningRate 0.0461   Epoch: 6   Global Step: 107160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:13,703-Speed 9694.98 samples/sec   Loss 6.9167   LearningRate 0.0461   Epoch: 6   Global Step: 107170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:14,767-Speed 9630.48 samples/sec   Loss 6.9423   LearningRate 0.0461   Epoch: 6   Global Step: 107180   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:51:15,876-Speed 9239.53 samples/sec   Loss 6.7966   LearningRate 0.0461   Epoch: 6   Global Step: 107190   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:51:16,979-Speed 9288.81 samples/sec   Loss 7.0305   LearningRate 0.0461   Epoch: 6   Global Step: 107200   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:51:18,071-Speed 9381.36 samples/sec   Loss 6.8593   LearningRate 0.0461   Epoch: 6   Global Step: 107210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:19,105-Speed 9912.21 samples/sec   Loss 6.9257   LearningRate 0.0461   Epoch: 6   Global Step: 107220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:20,180-Speed 9531.90 samples/sec   Loss 6.9001   LearningRate 0.0461   Epoch: 6   Global Step: 107230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:21,300-Speed 9151.52 samples/sec   Loss 6.9468   LearningRate 0.0461   Epoch: 6   Global Step: 107240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:22,361-Speed 9659.42 samples/sec   Loss 6.9643   LearningRate 0.0461   Epoch: 6   Global Step: 107250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:23,448-Speed 9425.24 samples/sec   Loss 6.8523   LearningRate 0.0461   Epoch: 6   Global Step: 107260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:24,484-Speed 9887.52 samples/sec   Loss 6.7992   LearningRate 0.0461   Epoch: 6   Global Step: 107270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:25,528-Speed 9815.24 samples/sec   Loss 6.8781   LearningRate 0.0461   Epoch: 6   Global Step: 107280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:26,582-Speed 9722.58 samples/sec   Loss 6.8833   LearningRate 0.0460   Epoch: 6   Global Step: 107290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:27,676-Speed 9365.33 samples/sec   Loss 6.9153   LearningRate 0.0460   Epoch: 6   Global Step: 107300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:28,736-Speed 9664.46 samples/sec   Loss 6.9290   LearningRate 0.0460   Epoch: 6   Global Step: 107310   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:51:29,801-Speed 9618.23 samples/sec   Loss 6.9796   LearningRate 0.0460   Epoch: 6   Global Step: 107320   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:51:30,864-Speed 9639.74 samples/sec   Loss 6.9332   LearningRate 0.0460   Epoch: 6   Global Step: 107330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:31,929-Speed 9624.82 samples/sec   Loss 7.0110   LearningRate 0.0460   Epoch: 6   Global Step: 107340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:32,991-Speed 9651.85 samples/sec   Loss 6.8885   LearningRate 0.0460   Epoch: 6   Global Step: 107350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:34,057-Speed 9609.87 samples/sec   Loss 6.9600   LearningRate 0.0460   Epoch: 6   Global Step: 107360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:35,174-Speed 9172.19 samples/sec   Loss 7.0113   LearningRate 0.0460   Epoch: 6   Global Step: 107370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:36,230-Speed 9705.60 samples/sec   Loss 6.9293   LearningRate 0.0460   Epoch: 6   Global Step: 107380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:37,304-Speed 9535.31 samples/sec   Loss 6.8396   LearningRate 0.0460   Epoch: 6   Global Step: 107390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:38,402-Speed 9331.32 samples/sec   Loss 6.9062   LearningRate 0.0460   Epoch: 6   Global Step: 107400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:39,479-Speed 9513.77 samples/sec   Loss 6.9177   LearningRate 0.0460   Epoch: 6   Global Step: 107410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:40,589-Speed 9231.56 samples/sec   Loss 6.9280   LearningRate 0.0460   Epoch: 6   Global Step: 107420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:41,675-Speed 9436.90 samples/sec   Loss 6.8322   LearningRate 0.0460   Epoch: 6   Global Step: 107430   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:51:42,768-Speed 9376.33 samples/sec   Loss 6.9270   LearningRate 0.0460   Epoch: 6   Global Step: 107440   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:51:43,841-Speed 9545.22 samples/sec   Loss 6.8333   LearningRate 0.0460   Epoch: 6   Global Step: 107450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:44,911-Speed 9579.81 samples/sec   Loss 6.9451   LearningRate 0.0460   Epoch: 6   Global Step: 107460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:46,024-Speed 9201.40 samples/sec   Loss 6.9931   LearningRate 0.0460   Epoch: 6   Global Step: 107470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:47,087-Speed 9642.55 samples/sec   Loss 6.9764   LearningRate 0.0460   Epoch: 6   Global Step: 107480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:48,149-Speed 9650.60 samples/sec   Loss 6.8643   LearningRate 0.0460   Epoch: 6   Global Step: 107490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:49,215-Speed 9606.28 samples/sec   Loss 6.9312   LearningRate 0.0460   Epoch: 6   Global Step: 107500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:50,266-Speed 9753.77 samples/sec   Loss 6.8361   LearningRate 0.0460   Epoch: 6   Global Step: 107510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:51,309-Speed 9821.32 samples/sec   Loss 6.9064   LearningRate 0.0460   Epoch: 6   Global Step: 107520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:52,411-Speed 9298.90 samples/sec   Loss 6.9725   LearningRate 0.0460   Epoch: 6   Global Step: 107530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:53,562-Speed 8895.33 samples/sec   Loss 6.8538   LearningRate 0.0459   Epoch: 6   Global Step: 107540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:54,645-Speed 9463.74 samples/sec   Loss 7.0254   LearningRate 0.0459   Epoch: 6   Global Step: 107550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:55,709-Speed 9631.86 samples/sec   Loss 6.8783   LearningRate 0.0459   Epoch: 6   Global Step: 107560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:56,770-Speed 9655.04 samples/sec   Loss 6.9237   LearningRate 0.0459   Epoch: 6   Global Step: 107570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:51:57,862-Speed 9387.01 samples/sec   Loss 7.0443   LearningRate 0.0459   Epoch: 6   Global Step: 107580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:58,918-Speed 9705.72 samples/sec   Loss 6.9373   LearningRate 0.0459   Epoch: 6   Global Step: 107590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:51:59,981-Speed 9633.29 samples/sec   Loss 6.9914   LearningRate 0.0459   Epoch: 6   Global Step: 107600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:01,061-Speed 9493.68 samples/sec   Loss 6.8867   LearningRate 0.0459   Epoch: 6   Global Step: 107610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:02,138-Speed 9513.28 samples/sec   Loss 6.9129   LearningRate 0.0459   Epoch: 6   Global Step: 107620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:03,184-Speed 9794.31 samples/sec   Loss 6.8688   LearningRate 0.0459   Epoch: 6   Global Step: 107630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:04,223-Speed 9860.26 samples/sec   Loss 6.9110   LearningRate 0.0459   Epoch: 6   Global Step: 107640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:05,268-Speed 9807.47 samples/sec   Loss 6.8690   LearningRate 0.0459   Epoch: 6   Global Step: 107650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:06,406-Speed 9005.18 samples/sec   Loss 6.9379   LearningRate 0.0459   Epoch: 6   Global Step: 107660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:07,494-Speed 9416.46 samples/sec   Loss 7.0150   LearningRate 0.0459   Epoch: 6   Global Step: 107670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:08,581-Speed 9419.60 samples/sec   Loss 6.9038   LearningRate 0.0459   Epoch: 6   Global Step: 107680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:09,646-Speed 9623.26 samples/sec   Loss 6.8142   LearningRate 0.0459   Epoch: 6   Global Step: 107690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:10,710-Speed 9632.01 samples/sec   Loss 6.9370   LearningRate 0.0459   Epoch: 6   Global Step: 107700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:11,768-Speed 9682.99 samples/sec   Loss 6.8504   LearningRate 0.0459   Epoch: 6   Global Step: 107710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:12,891-Speed 9122.42 samples/sec   Loss 6.9116   LearningRate 0.0459   Epoch: 6   Global Step: 107720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:13,977-Speed 9433.76 samples/sec   Loss 6.9939   LearningRate 0.0459   Epoch: 6   Global Step: 107730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:15,048-Speed 9567.96 samples/sec   Loss 7.0023   LearningRate 0.0459   Epoch: 6   Global Step: 107740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:16,106-Speed 9682.32 samples/sec   Loss 6.7953   LearningRate 0.0459   Epoch: 6   Global Step: 107750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:17,136-Speed 9955.41 samples/sec   Loss 6.8372   LearningRate 0.0459   Epoch: 6   Global Step: 107760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:18,220-Speed 9450.03 samples/sec   Loss 6.8268   LearningRate 0.0459   Epoch: 6   Global Step: 107770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:19,250-Speed 9949.59 samples/sec   Loss 6.9161   LearningRate 0.0459   Epoch: 6   Global Step: 107780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:20,329-Speed 9494.82 samples/sec   Loss 7.0035   LearningRate 0.0458   Epoch: 6   Global Step: 107790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:21,396-Speed 9601.79 samples/sec   Loss 6.8937   LearningRate 0.0458   Epoch: 6   Global Step: 107800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:22,474-Speed 9504.57 samples/sec   Loss 6.9094   LearningRate 0.0458   Epoch: 6   Global Step: 107810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:23,569-Speed 9354.69 samples/sec   Loss 6.9917   LearningRate 0.0458   Epoch: 6   Global Step: 107820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:24,645-Speed 9524.82 samples/sec   Loss 6.8572   LearningRate 0.0458   Epoch: 6   Global Step: 107830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:25,702-Speed 9701.50 samples/sec   Loss 6.8916   LearningRate 0.0458   Epoch: 6   Global Step: 107840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:26,740-Speed 9867.93 samples/sec   Loss 6.8359   LearningRate 0.0458   Epoch: 6   Global Step: 107850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:27,799-Speed 9669.02 samples/sec   Loss 6.7758   LearningRate 0.0458   Epoch: 6   Global Step: 107860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:28,926-Speed 9094.40 samples/sec   Loss 6.8662   LearningRate 0.0458   Epoch: 6   Global Step: 107870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:30,021-Speed 9364.17 samples/sec   Loss 7.0419   LearningRate 0.0458   Epoch: 6   Global Step: 107880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:31,112-Speed 9383.82 samples/sec   Loss 6.7906   LearningRate 0.0458   Epoch: 6   Global Step: 107890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:32,221-Speed 9242.87 samples/sec   Loss 6.9107   LearningRate 0.0458   Epoch: 6   Global Step: 107900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:33,343-Speed 9137.75 samples/sec   Loss 6.7910   LearningRate 0.0458   Epoch: 6   Global Step: 107910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:34,382-Speed 9852.12 samples/sec   Loss 6.8898   LearningRate 0.0458   Epoch: 6   Global Step: 107920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:35,490-Speed 9249.78 samples/sec   Loss 6.9576   LearningRate 0.0458   Epoch: 6   Global Step: 107930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:36,586-Speed 9349.06 samples/sec   Loss 6.9216   LearningRate 0.0458   Epoch: 6   Global Step: 107940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:37,674-Speed 9426.24 samples/sec   Loss 6.8689   LearningRate 0.0458   Epoch: 6   Global Step: 107950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:38,780-Speed 9259.76 samples/sec   Loss 6.7596   LearningRate 0.0458   Epoch: 6   Global Step: 107960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:52:39,868-Speed 9417.84 samples/sec   Loss 6.8974   LearningRate 0.0458   Epoch: 6   Global Step: 107970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:40,957-Speed 9405.15 samples/sec   Loss 6.8233   LearningRate 0.0458   Epoch: 6   Global Step: 107980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:42,044-Speed 9429.11 samples/sec   Loss 6.7681   LearningRate 0.0458   Epoch: 6   Global Step: 107990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:52:43,158-Speed 9198.10 samples/sec   Loss 6.8618   LearningRate 0.0458   Epoch: 6   Global Step: 108000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:53:04,915-[lfw][108000]XNorm: 11.272427
Training: 2022-04-11 15:53:04,915-[lfw][108000]Accuracy-Flip: 0.99600+-0.00260
Training: 2022-04-11 15:53:04,916-[lfw][108000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:53:30,093-[cfp_fp][108000]XNorm: 9.577855
Training: 2022-04-11 15:53:30,094-[cfp_fp][108000]Accuracy-Flip: 0.95700+-0.01003
Training: 2022-04-11 15:53:30,094-[cfp_fp][108000]Accuracy-Highest: 0.95857
Training: 2022-04-11 15:53:51,843-[agedb_30][108000]XNorm: 10.826729
Training: 2022-04-11 15:53:51,843-[agedb_30][108000]Accuracy-Flip: 0.96317+-0.00838
Training: 2022-04-11 15:53:51,843-[agedb_30][108000]Accuracy-Highest: 0.96483
Training: 2022-04-11 15:53:52,950-Speed 146.72 samples/sec   Loss 6.9105   LearningRate 0.0458   Epoch: 6   Global Step: 108010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:53:54,036-Speed 9430.62 samples/sec   Loss 6.8513   LearningRate 0.0458   Epoch: 6   Global Step: 108020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:53:55,078-Speed 9839.25 samples/sec   Loss 6.9590   LearningRate 0.0457   Epoch: 6   Global Step: 108030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:53:56,193-Speed 9188.66 samples/sec   Loss 6.8845   LearningRate 0.0457   Epoch: 6   Global Step: 108040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:53:57,285-Speed 9386.88 samples/sec   Loss 6.8457   LearningRate 0.0457   Epoch: 6   Global Step: 108050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:53:58,402-Speed 9173.29 samples/sec   Loss 6.9052   LearningRate 0.0457   Epoch: 6   Global Step: 108060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:53:59,470-Speed 9593.58 samples/sec   Loss 6.9710   LearningRate 0.0457   Epoch: 6   Global Step: 108070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:00,552-Speed 9466.88 samples/sec   Loss 6.8759   LearningRate 0.0457   Epoch: 6   Global Step: 108080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:01,599-Speed 9783.65 samples/sec   Loss 6.9611   LearningRate 0.0457   Epoch: 6   Global Step: 108090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:02,693-Speed 9370.02 samples/sec   Loss 6.8934   LearningRate 0.0457   Epoch: 6   Global Step: 108100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:03,784-Speed 9391.18 samples/sec   Loss 7.0359   LearningRate 0.0457   Epoch: 6   Global Step: 108110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:04,847-Speed 9634.74 samples/sec   Loss 6.8872   LearningRate 0.0457   Epoch: 6   Global Step: 108120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:05,923-Speed 9521.01 samples/sec   Loss 6.9729   LearningRate 0.0457   Epoch: 6   Global Step: 108130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:06,999-Speed 9526.75 samples/sec   Loss 6.8655   LearningRate 0.0457   Epoch: 6   Global Step: 108140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:08,066-Speed 9597.26 samples/sec   Loss 6.9494   LearningRate 0.0457   Epoch: 6   Global Step: 108150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:09,112-Speed 9801.41 samples/sec   Loss 6.9066   LearningRate 0.0457   Epoch: 6   Global Step: 108160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:10,159-Speed 9781.83 samples/sec   Loss 6.8944   LearningRate 0.0457   Epoch: 6   Global Step: 108170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:11,211-Speed 9744.27 samples/sec   Loss 6.9251   LearningRate 0.0457   Epoch: 6   Global Step: 108180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:12,255-Speed 9807.15 samples/sec   Loss 6.9315   LearningRate 0.0457   Epoch: 6   Global Step: 108190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:13,316-Speed 9662.29 samples/sec   Loss 6.7143   LearningRate 0.0457   Epoch: 6   Global Step: 108200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:14,388-Speed 9559.22 samples/sec   Loss 6.9271   LearningRate 0.0457   Epoch: 6   Global Step: 108210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:15,465-Speed 9514.97 samples/sec   Loss 7.0072   LearningRate 0.0457   Epoch: 6   Global Step: 108220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:16,611-Speed 8940.53 samples/sec   Loss 6.9331   LearningRate 0.0457   Epoch: 6   Global Step: 108230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:17,688-Speed 9517.55 samples/sec   Loss 6.8820   LearningRate 0.0457   Epoch: 6   Global Step: 108240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:18,767-Speed 9500.32 samples/sec   Loss 6.9420   LearningRate 0.0457   Epoch: 6   Global Step: 108250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:19,824-Speed 9690.27 samples/sec   Loss 6.9073   LearningRate 0.0457   Epoch: 6   Global Step: 108260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:20,935-Speed 9221.55 samples/sec   Loss 6.8654   LearningRate 0.0457   Epoch: 6   Global Step: 108270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:22,076-Speed 8978.42 samples/sec   Loss 6.8185   LearningRate 0.0456   Epoch: 6   Global Step: 108280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:23,168-Speed 9386.11 samples/sec   Loss 6.8052   LearningRate 0.0456   Epoch: 6   Global Step: 108290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:24,283-Speed 9189.11 samples/sec   Loss 6.9546   LearningRate 0.0456   Epoch: 6   Global Step: 108300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:25,366-Speed 9456.05 samples/sec   Loss 6.8604   LearningRate 0.0456   Epoch: 6   Global Step: 108310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:26,451-Speed 9445.83 samples/sec   Loss 6.8522   LearningRate 0.0456   Epoch: 6   Global Step: 108320   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:54:27,521-Speed 9581.26 samples/sec   Loss 6.9645   LearningRate 0.0456   Epoch: 6   Global Step: 108330   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:54:28,582-Speed 9651.47 samples/sec   Loss 6.8900   LearningRate 0.0456   Epoch: 6   Global Step: 108340   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:54:29,647-Speed 9618.17 samples/sec   Loss 6.9794   LearningRate 0.0456   Epoch: 6   Global Step: 108350   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:54:30,685-Speed 9877.83 samples/sec   Loss 6.9663   LearningRate 0.0456   Epoch: 6   Global Step: 108360   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:54:31,795-Speed 9227.56 samples/sec   Loss 6.8808   LearningRate 0.0456   Epoch: 6   Global Step: 108370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:32,874-Speed 9498.93 samples/sec   Loss 6.9947   LearningRate 0.0456   Epoch: 6   Global Step: 108380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:33,939-Speed 9621.46 samples/sec   Loss 6.8456   LearningRate 0.0456   Epoch: 6   Global Step: 108390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:35,006-Speed 9602.27 samples/sec   Loss 6.8946   LearningRate 0.0456   Epoch: 6   Global Step: 108400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:36,093-Speed 9429.13 samples/sec   Loss 6.8375   LearningRate 0.0456   Epoch: 6   Global Step: 108410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:37,164-Speed 9566.56 samples/sec   Loss 6.9271   LearningRate 0.0456   Epoch: 6   Global Step: 108420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:38,241-Speed 9509.47 samples/sec   Loss 6.9776   LearningRate 0.0456   Epoch: 6   Global Step: 108430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:39,344-Speed 9291.37 samples/sec   Loss 6.8723   LearningRate 0.0456   Epoch: 6   Global Step: 108440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:40,412-Speed 9593.36 samples/sec   Loss 6.9896   LearningRate 0.0456   Epoch: 6   Global Step: 108450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:41,486-Speed 9543.67 samples/sec   Loss 6.9407   LearningRate 0.0456   Epoch: 6   Global Step: 108460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:42,566-Speed 9481.51 samples/sec   Loss 6.8686   LearningRate 0.0456   Epoch: 6   Global Step: 108470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:43,677-Speed 9223.52 samples/sec   Loss 6.7859   LearningRate 0.0456   Epoch: 6   Global Step: 108480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:54:44,775-Speed 9335.07 samples/sec   Loss 6.8148   LearningRate 0.0456   Epoch: 6   Global Step: 108490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:45,895-Speed 9148.43 samples/sec   Loss 6.9652   LearningRate 0.0456   Epoch: 6   Global Step: 108500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:46,953-Speed 9687.43 samples/sec   Loss 6.9834   LearningRate 0.0456   Epoch: 6   Global Step: 108510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:47,993-Speed 9845.79 samples/sec   Loss 6.8957   LearningRate 0.0456   Epoch: 6   Global Step: 108520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:49,097-Speed 9285.49 samples/sec   Loss 6.8408   LearningRate 0.0455   Epoch: 6   Global Step: 108530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:50,167-Speed 9572.45 samples/sec   Loss 6.9186   LearningRate 0.0455   Epoch: 6   Global Step: 108540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:51,266-Speed 9322.74 samples/sec   Loss 6.9604   LearningRate 0.0455   Epoch: 6   Global Step: 108550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:52,344-Speed 9500.52 samples/sec   Loss 6.8550   LearningRate 0.0455   Epoch: 6   Global Step: 108560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:53,392-Speed 9773.97 samples/sec   Loss 6.9077   LearningRate 0.0455   Epoch: 6   Global Step: 108570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:54,483-Speed 9398.57 samples/sec   Loss 6.8861   LearningRate 0.0455   Epoch: 6   Global Step: 108580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:55,544-Speed 9656.68 samples/sec   Loss 6.9363   LearningRate 0.0455   Epoch: 6   Global Step: 108590   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:54:56,644-Speed 9316.78 samples/sec   Loss 6.9250   LearningRate 0.0455   Epoch: 6   Global Step: 108600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:57,744-Speed 9311.51 samples/sec   Loss 6.8066   LearningRate 0.0455   Epoch: 6   Global Step: 108610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:58,839-Speed 9355.37 samples/sec   Loss 6.9258   LearningRate 0.0455   Epoch: 6   Global Step: 108620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:54:59,916-Speed 9513.74 samples/sec   Loss 6.8127   LearningRate 0.0455   Epoch: 6   Global Step: 108630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:00,971-Speed 9713.51 samples/sec   Loss 6.8613   LearningRate 0.0455   Epoch: 6   Global Step: 108640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:02,026-Speed 9718.16 samples/sec   Loss 6.8932   LearningRate 0.0455   Epoch: 6   Global Step: 108650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:03,104-Speed 9502.42 samples/sec   Loss 6.8646   LearningRate 0.0455   Epoch: 6   Global Step: 108660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:04,183-Speed 9498.87 samples/sec   Loss 6.8460   LearningRate 0.0455   Epoch: 6   Global Step: 108670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:05,285-Speed 9293.16 samples/sec   Loss 6.7700   LearningRate 0.0455   Epoch: 6   Global Step: 108680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:06,348-Speed 9645.35 samples/sec   Loss 6.9303   LearningRate 0.0455   Epoch: 6   Global Step: 108690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:07,426-Speed 9496.12 samples/sec   Loss 6.8386   LearningRate 0.0455   Epoch: 6   Global Step: 108700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:55:08,500-Speed 9541.56 samples/sec   Loss 6.8077   LearningRate 0.0455   Epoch: 6   Global Step: 108710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:55:09,573-Speed 9554.55 samples/sec   Loss 6.9255   LearningRate 0.0455   Epoch: 6   Global Step: 108720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:55:10,673-Speed 9313.24 samples/sec   Loss 6.8308   LearningRate 0.0455   Epoch: 6   Global Step: 108730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:55:11,740-Speed 9601.26 samples/sec   Loss 6.8333   LearningRate 0.0455   Epoch: 6   Global Step: 108740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:55:12,843-Speed 9283.71 samples/sec   Loss 6.8551   LearningRate 0.0455   Epoch: 6   Global Step: 108750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:55:13,932-Speed 9410.89 samples/sec   Loss 6.8418   LearningRate 0.0455   Epoch: 6   Global Step: 108760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:55:14,967-Speed 9902.27 samples/sec   Loss 6.8954   LearningRate 0.0454   Epoch: 6   Global Step: 108770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:55:16,039-Speed 9558.93 samples/sec   Loss 6.9495   LearningRate 0.0454   Epoch: 6   Global Step: 108780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:55:17,137-Speed 9330.69 samples/sec   Loss 6.9225   LearningRate 0.0454   Epoch: 6   Global Step: 108790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:55:18,226-Speed 9411.31 samples/sec   Loss 6.9204   LearningRate 0.0454   Epoch: 6   Global Step: 108800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:19,316-Speed 9397.92 samples/sec   Loss 6.9373   LearningRate 0.0454   Epoch: 6   Global Step: 108810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:20,409-Speed 9373.30 samples/sec   Loss 6.8747   LearningRate 0.0454   Epoch: 6   Global Step: 108820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:21,507-Speed 9333.61 samples/sec   Loss 6.8496   LearningRate 0.0454   Epoch: 6   Global Step: 108830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:22,566-Speed 9676.06 samples/sec   Loss 6.8767   LearningRate 0.0454   Epoch: 6   Global Step: 108840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:23,645-Speed 9493.73 samples/sec   Loss 6.8052   LearningRate 0.0454   Epoch: 6   Global Step: 108850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:24,724-Speed 9499.15 samples/sec   Loss 6.9075   LearningRate 0.0454   Epoch: 6   Global Step: 108860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:25,842-Speed 9165.95 samples/sec   Loss 6.9076   LearningRate 0.0454   Epoch: 6   Global Step: 108870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:26,948-Speed 9264.29 samples/sec   Loss 6.8992   LearningRate 0.0454   Epoch: 6   Global Step: 108880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:28,024-Speed 9520.52 samples/sec   Loss 6.8721   LearningRate 0.0454   Epoch: 6   Global Step: 108890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:29,073-Speed 9766.93 samples/sec   Loss 6.8815   LearningRate 0.0454   Epoch: 6   Global Step: 108900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:30,164-Speed 9397.39 samples/sec   Loss 6.8941   LearningRate 0.0454   Epoch: 6   Global Step: 108910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:31,247-Speed 9460.57 samples/sec   Loss 6.9094   LearningRate 0.0454   Epoch: 6   Global Step: 108920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:32,289-Speed 9830.67 samples/sec   Loss 6.9196   LearningRate 0.0454   Epoch: 6   Global Step: 108930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:33,353-Speed 9630.63 samples/sec   Loss 6.9383   LearningRate 0.0454   Epoch: 6   Global Step: 108940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:34,414-Speed 9660.47 samples/sec   Loss 6.9224   LearningRate 0.0454   Epoch: 6   Global Step: 108950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:35,457-Speed 9823.07 samples/sec   Loss 6.8462   LearningRate 0.0454   Epoch: 6   Global Step: 108960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:36,510-Speed 9726.09 samples/sec   Loss 6.9460   LearningRate 0.0454   Epoch: 6   Global Step: 108970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:37,584-Speed 9542.66 samples/sec   Loss 6.8231   LearningRate 0.0454   Epoch: 6   Global Step: 108980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:38,669-Speed 9439.73 samples/sec   Loss 6.9970   LearningRate 0.0454   Epoch: 6   Global Step: 108990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:39,736-Speed 9600.79 samples/sec   Loss 6.9377   LearningRate 0.0454   Epoch: 6   Global Step: 109000   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:55:40,855-Speed 9160.81 samples/sec   Loss 6.9547   LearningRate 0.0454   Epoch: 6   Global Step: 109010   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:55:41,941-Speed 9426.68 samples/sec   Loss 6.8352   LearningRate 0.0453   Epoch: 6   Global Step: 109020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:43,022-Speed 9478.30 samples/sec   Loss 6.9196   LearningRate 0.0453   Epoch: 6   Global Step: 109030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:44,113-Speed 9395.62 samples/sec   Loss 7.0188   LearningRate 0.0453   Epoch: 6   Global Step: 109040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:45,166-Speed 9740.14 samples/sec   Loss 6.9821   LearningRate 0.0453   Epoch: 6   Global Step: 109050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:46,226-Speed 9664.48 samples/sec   Loss 6.8727   LearningRate 0.0453   Epoch: 6   Global Step: 109060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:47,301-Speed 9531.48 samples/sec   Loss 6.8130   LearningRate 0.0453   Epoch: 6   Global Step: 109070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:48,360-Speed 9668.02 samples/sec   Loss 6.9308   LearningRate 0.0453   Epoch: 6   Global Step: 109080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:49,484-Speed 9117.97 samples/sec   Loss 6.8334   LearningRate 0.0453   Epoch: 6   Global Step: 109090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:50,567-Speed 9458.31 samples/sec   Loss 6.8326   LearningRate 0.0453   Epoch: 6   Global Step: 109100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:51,652-Speed 9442.35 samples/sec   Loss 6.8751   LearningRate 0.0453   Epoch: 6   Global Step: 109110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:52,733-Speed 9484.08 samples/sec   Loss 6.9450   LearningRate 0.0453   Epoch: 6   Global Step: 109120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:53,806-Speed 9546.04 samples/sec   Loss 6.9295   LearningRate 0.0453   Epoch: 6   Global Step: 109130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:54,910-Speed 9278.05 samples/sec   Loss 6.7746   LearningRate 0.0453   Epoch: 6   Global Step: 109140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:55,970-Speed 9669.45 samples/sec   Loss 6.8338   LearningRate 0.0453   Epoch: 6   Global Step: 109150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:57,077-Speed 9253.24 samples/sec   Loss 6.9083   LearningRate 0.0453   Epoch: 6   Global Step: 109160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:58,174-Speed 9342.65 samples/sec   Loss 6.8695   LearningRate 0.0453   Epoch: 6   Global Step: 109170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:55:59,242-Speed 9598.62 samples/sec   Loss 6.9157   LearningRate 0.0453   Epoch: 6   Global Step: 109180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:00,337-Speed 9351.02 samples/sec   Loss 6.9443   LearningRate 0.0453   Epoch: 6   Global Step: 109190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:01,401-Speed 9630.84 samples/sec   Loss 6.9002   LearningRate 0.0453   Epoch: 6   Global Step: 109200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:02,487-Speed 9439.83 samples/sec   Loss 6.7999   LearningRate 0.0453   Epoch: 6   Global Step: 109210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:03,514-Speed 9983.18 samples/sec   Loss 6.8635   LearningRate 0.0453   Epoch: 6   Global Step: 109220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:04,551-Speed 9882.67 samples/sec   Loss 6.9387   LearningRate 0.0453   Epoch: 6   Global Step: 109230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:05,614-Speed 9638.21 samples/sec   Loss 6.9817   LearningRate 0.0453   Epoch: 6   Global Step: 109240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:06,691-Speed 9508.24 samples/sec   Loss 7.0278   LearningRate 0.0453   Epoch: 6   Global Step: 109250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:07,755-Speed 9628.44 samples/sec   Loss 6.9788   LearningRate 0.0453   Epoch: 6   Global Step: 109260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:08,840-Speed 9445.43 samples/sec   Loss 6.9895   LearningRate 0.0452   Epoch: 6   Global Step: 109270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:09,892-Speed 9736.22 samples/sec   Loss 6.9225   LearningRate 0.0452   Epoch: 6   Global Step: 109280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:10,952-Speed 9667.64 samples/sec   Loss 7.0147   LearningRate 0.0452   Epoch: 6   Global Step: 109290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:12,011-Speed 9670.73 samples/sec   Loss 6.8780   LearningRate 0.0452   Epoch: 6   Global Step: 109300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:13,093-Speed 9469.58 samples/sec   Loss 6.8738   LearningRate 0.0452   Epoch: 6   Global Step: 109310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:14,174-Speed 9483.75 samples/sec   Loss 6.9077   LearningRate 0.0452   Epoch: 6   Global Step: 109320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:15,243-Speed 9588.88 samples/sec   Loss 6.9149   LearningRate 0.0452   Epoch: 6   Global Step: 109330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:16,281-Speed 9871.62 samples/sec   Loss 6.8652   LearningRate 0.0452   Epoch: 6   Global Step: 109340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:17,337-Speed 9697.39 samples/sec   Loss 6.9232   LearningRate 0.0452   Epoch: 6   Global Step: 109350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:18,414-Speed 9513.17 samples/sec   Loss 6.9019   LearningRate 0.0452   Epoch: 6   Global Step: 109360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:19,522-Speed 9243.60 samples/sec   Loss 6.9882   LearningRate 0.0452   Epoch: 6   Global Step: 109370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:20,633-Speed 9226.35 samples/sec   Loss 6.8536   LearningRate 0.0452   Epoch: 6   Global Step: 109380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:21,724-Speed 9388.67 samples/sec   Loss 7.0047   LearningRate 0.0452   Epoch: 6   Global Step: 109390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:22,763-Speed 9862.70 samples/sec   Loss 6.8820   LearningRate 0.0452   Epoch: 6   Global Step: 109400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:23,848-Speed 9448.42 samples/sec   Loss 6.8486   LearningRate 0.0452   Epoch: 6   Global Step: 109410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:24,911-Speed 9637.85 samples/sec   Loss 6.8291   LearningRate 0.0452   Epoch: 6   Global Step: 109420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:25,990-Speed 9498.67 samples/sec   Loss 6.8468   LearningRate 0.0452   Epoch: 6   Global Step: 109430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:27,051-Speed 9653.31 samples/sec   Loss 6.8386   LearningRate 0.0452   Epoch: 6   Global Step: 109440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:28,125-Speed 9538.23 samples/sec   Loss 6.9101   LearningRate 0.0452   Epoch: 6   Global Step: 109450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:29,173-Speed 9778.40 samples/sec   Loss 6.8689   LearningRate 0.0452   Epoch: 6   Global Step: 109460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:30,249-Speed 9526.92 samples/sec   Loss 6.8252   LearningRate 0.0452   Epoch: 6   Global Step: 109470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:31,358-Speed 9231.50 samples/sec   Loss 6.9764   LearningRate 0.0452   Epoch: 6   Global Step: 109480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:32,458-Speed 9320.51 samples/sec   Loss 6.9051   LearningRate 0.0452   Epoch: 6   Global Step: 109490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:33,517-Speed 9671.13 samples/sec   Loss 6.7911   LearningRate 0.0452   Epoch: 6   Global Step: 109500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:34,587-Speed 9574.32 samples/sec   Loss 6.9636   LearningRate 0.0452   Epoch: 6   Global Step: 109510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:35,675-Speed 9423.01 samples/sec   Loss 6.8885   LearningRate 0.0451   Epoch: 6   Global Step: 109520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:36,800-Speed 9105.72 samples/sec   Loss 6.8755   LearningRate 0.0451   Epoch: 6   Global Step: 109530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:37,936-Speed 9020.67 samples/sec   Loss 6.9948   LearningRate 0.0451   Epoch: 6   Global Step: 109540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:39,001-Speed 9617.76 samples/sec   Loss 6.9084   LearningRate 0.0451   Epoch: 6   Global Step: 109550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:40,034-Speed 9917.83 samples/sec   Loss 6.8867   LearningRate 0.0451   Epoch: 6   Global Step: 109560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:41,095-Speed 9662.40 samples/sec   Loss 6.7562   LearningRate 0.0451   Epoch: 6   Global Step: 109570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:42,201-Speed 9265.47 samples/sec   Loss 6.7567   LearningRate 0.0451   Epoch: 6   Global Step: 109580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:43,306-Speed 9265.63 samples/sec   Loss 6.8932   LearningRate 0.0451   Epoch: 6   Global Step: 109590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:44,403-Speed 9343.02 samples/sec   Loss 6.9921   LearningRate 0.0451   Epoch: 6   Global Step: 109600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:45,526-Speed 9125.13 samples/sec   Loss 6.9753   LearningRate 0.0451   Epoch: 6   Global Step: 109610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:46,563-Speed 9879.08 samples/sec   Loss 6.8676   LearningRate 0.0451   Epoch: 6   Global Step: 109620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:47,626-Speed 9637.86 samples/sec   Loss 6.8353   LearningRate 0.0451   Epoch: 6   Global Step: 109630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:48,707-Speed 9478.75 samples/sec   Loss 6.9690   LearningRate 0.0451   Epoch: 6   Global Step: 109640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:49,804-Speed 9339.44 samples/sec   Loss 6.9677   LearningRate 0.0451   Epoch: 6   Global Step: 109650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:56:50,882-Speed 9504.22 samples/sec   Loss 6.9863   LearningRate 0.0451   Epoch: 6   Global Step: 109660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:51,959-Speed 9514.45 samples/sec   Loss 6.9162   LearningRate 0.0451   Epoch: 6   Global Step: 109670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:53,031-Speed 9556.63 samples/sec   Loss 6.8396   LearningRate 0.0451   Epoch: 6   Global Step: 109680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:54,134-Speed 9291.71 samples/sec   Loss 6.9646   LearningRate 0.0451   Epoch: 6   Global Step: 109690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:55,210-Speed 9521.98 samples/sec   Loss 6.8023   LearningRate 0.0451   Epoch: 6   Global Step: 109700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:56,327-Speed 9171.82 samples/sec   Loss 6.7901   LearningRate 0.0451   Epoch: 6   Global Step: 109710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:57,414-Speed 9427.84 samples/sec   Loss 6.9287   LearningRate 0.0451   Epoch: 6   Global Step: 109720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:58,496-Speed 9472.38 samples/sec   Loss 6.7726   LearningRate 0.0451   Epoch: 6   Global Step: 109730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:56:59,558-Speed 9645.59 samples/sec   Loss 6.9584   LearningRate 0.0451   Epoch: 6   Global Step: 109740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:00,633-Speed 9538.66 samples/sec   Loss 6.8575   LearningRate 0.0451   Epoch: 6   Global Step: 109750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:01,714-Speed 9476.80 samples/sec   Loss 6.9024   LearningRate 0.0451   Epoch: 6   Global Step: 109760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:02,801-Speed 9423.94 samples/sec   Loss 6.8551   LearningRate 0.0450   Epoch: 6   Global Step: 109770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:03,865-Speed 9630.22 samples/sec   Loss 6.9025   LearningRate 0.0450   Epoch: 6   Global Step: 109780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:04,908-Speed 9822.08 samples/sec   Loss 6.9496   LearningRate 0.0450   Epoch: 6   Global Step: 109790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:06,018-Speed 9234.92 samples/sec   Loss 6.9164   LearningRate 0.0450   Epoch: 6   Global Step: 109800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:07,077-Speed 9673.50 samples/sec   Loss 6.9135   LearningRate 0.0450   Epoch: 6   Global Step: 109810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:08,120-Speed 9823.04 samples/sec   Loss 6.8166   LearningRate 0.0450   Epoch: 6   Global Step: 109820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:09,187-Speed 9603.59 samples/sec   Loss 6.8695   LearningRate 0.0450   Epoch: 6   Global Step: 109830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:10,249-Speed 9642.77 samples/sec   Loss 6.8529   LearningRate 0.0450   Epoch: 6   Global Step: 109840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:11,347-Speed 9335.04 samples/sec   Loss 6.8898   LearningRate 0.0450   Epoch: 6   Global Step: 109850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:12,461-Speed 9196.80 samples/sec   Loss 6.8423   LearningRate 0.0450   Epoch: 6   Global Step: 109860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:13,558-Speed 9342.19 samples/sec   Loss 6.8979   LearningRate 0.0450   Epoch: 6   Global Step: 109870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:14,635-Speed 9513.63 samples/sec   Loss 6.9578   LearningRate 0.0450   Epoch: 6   Global Step: 109880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:57:15,721-Speed 9438.76 samples/sec   Loss 6.7252   LearningRate 0.0450   Epoch: 6   Global Step: 109890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:16,823-Speed 9293.69 samples/sec   Loss 6.8842   LearningRate 0.0450   Epoch: 6   Global Step: 109900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:17,900-Speed 9516.16 samples/sec   Loss 6.9378   LearningRate 0.0450   Epoch: 6   Global Step: 109910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:18,949-Speed 9770.28 samples/sec   Loss 6.8465   LearningRate 0.0450   Epoch: 6   Global Step: 109920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:20,061-Speed 9212.92 samples/sec   Loss 6.8624   LearningRate 0.0450   Epoch: 6   Global Step: 109930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:21,177-Speed 9182.19 samples/sec   Loss 6.9315   LearningRate 0.0450   Epoch: 6   Global Step: 109940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:22,236-Speed 9676.27 samples/sec   Loss 6.8422   LearningRate 0.0450   Epoch: 6   Global Step: 109950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:23,272-Speed 9892.34 samples/sec   Loss 6.9037   LearningRate 0.0450   Epoch: 6   Global Step: 109960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:24,348-Speed 9518.24 samples/sec   Loss 6.8485   LearningRate 0.0450   Epoch: 6   Global Step: 109970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:25,408-Speed 9669.14 samples/sec   Loss 6.9500   LearningRate 0.0450   Epoch: 6   Global Step: 109980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:26,508-Speed 9316.50 samples/sec   Loss 6.9742   LearningRate 0.0450   Epoch: 6   Global Step: 109990   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 15:57:27,558-Speed 9757.06 samples/sec   Loss 6.9436   LearningRate 0.0450   Epoch: 6   Global Step: 110000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:57:49,353-[lfw][110000]XNorm: 10.955663
Training: 2022-04-11 15:57:49,354-[lfw][110000]Accuracy-Flip: 0.99500+-0.00342
Training: 2022-04-11 15:57:49,354-[lfw][110000]Accuracy-Highest: 0.99683
Training: 2022-04-11 15:58:14,839-[cfp_fp][110000]XNorm: 9.328135
Training: 2022-04-11 15:58:14,840-[cfp_fp][110000]Accuracy-Flip: 0.96157+-0.00987
Training: 2022-04-11 15:58:14,840-[cfp_fp][110000]Accuracy-Highest: 0.96157
Training: 2022-04-11 15:58:36,856-[agedb_30][110000]XNorm: 10.627442
Training: 2022-04-11 15:58:36,856-[agedb_30][110000]Accuracy-Flip: 0.96317+-0.00728
Training: 2022-04-11 15:58:36,857-[agedb_30][110000]Accuracy-Highest: 0.96483
Training: 2022-04-11 15:58:37,920-Speed 145.53 samples/sec   Loss 6.8450   LearningRate 0.0450   Epoch: 6   Global Step: 110010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:38,981-Speed 9659.02 samples/sec   Loss 6.9452   LearningRate 0.0449   Epoch: 6   Global Step: 110020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:40,016-Speed 9903.35 samples/sec   Loss 6.9396   LearningRate 0.0449   Epoch: 6   Global Step: 110030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:41,088-Speed 9556.92 samples/sec   Loss 6.9088   LearningRate 0.0449   Epoch: 6   Global Step: 110040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:42,193-Speed 9269.30 samples/sec   Loss 6.9043   LearningRate 0.0449   Epoch: 6   Global Step: 110050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:43,242-Speed 9774.81 samples/sec   Loss 6.9385   LearningRate 0.0449   Epoch: 6   Global Step: 110060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:44,267-Speed 9995.75 samples/sec   Loss 6.9062   LearningRate 0.0449   Epoch: 6   Global Step: 110070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:45,328-Speed 9659.08 samples/sec   Loss 6.8470   LearningRate 0.0449   Epoch: 6   Global Step: 110080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:46,458-Speed 9063.88 samples/sec   Loss 6.9333   LearningRate 0.0449   Epoch: 6   Global Step: 110090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:47,556-Speed 9332.39 samples/sec   Loss 6.9626   LearningRate 0.0449   Epoch: 6   Global Step: 110100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:48,632-Speed 9521.81 samples/sec   Loss 6.8640   LearningRate 0.0449   Epoch: 6   Global Step: 110110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:49,734-Speed 9299.55 samples/sec   Loss 6.9086   LearningRate 0.0449   Epoch: 6   Global Step: 110120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:58:50,762-Speed 9962.63 samples/sec   Loss 6.9801   LearningRate 0.0449   Epoch: 6   Global Step: 110130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:58:51,821-Speed 9675.13 samples/sec   Loss 6.9158   LearningRate 0.0449   Epoch: 6   Global Step: 110140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:58:52,861-Speed 9848.86 samples/sec   Loss 6.8287   LearningRate 0.0449   Epoch: 6   Global Step: 110150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:58:53,971-Speed 9238.05 samples/sec   Loss 6.8217   LearningRate 0.0449   Epoch: 6   Global Step: 110160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:58:55,032-Speed 9652.67 samples/sec   Loss 6.7674   LearningRate 0.0449   Epoch: 6   Global Step: 110170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:58:56,112-Speed 9489.94 samples/sec   Loss 6.8269   LearningRate 0.0449   Epoch: 6   Global Step: 110180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:58:57,181-Speed 9585.55 samples/sec   Loss 6.8820   LearningRate 0.0449   Epoch: 6   Global Step: 110190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:58:58,245-Speed 9623.74 samples/sec   Loss 6.8660   LearningRate 0.0449   Epoch: 6   Global Step: 110200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:58:59,318-Speed 9553.38 samples/sec   Loss 7.0440   LearningRate 0.0449   Epoch: 6   Global Step: 110210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:00,451-Speed 9039.45 samples/sec   Loss 6.8765   LearningRate 0.0449   Epoch: 6   Global Step: 110220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:01,525-Speed 9545.78 samples/sec   Loss 6.9687   LearningRate 0.0449   Epoch: 6   Global Step: 110230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:02,580-Speed 9710.28 samples/sec   Loss 6.8568   LearningRate 0.0449   Epoch: 6   Global Step: 110240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:03,664-Speed 9450.17 samples/sec   Loss 6.9363   LearningRate 0.0449   Epoch: 6   Global Step: 110250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:04,745-Speed 9479.57 samples/sec   Loss 6.9002   LearningRate 0.0449   Epoch: 6   Global Step: 110260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:05,814-Speed 9590.90 samples/sec   Loss 6.8515   LearningRate 0.0448   Epoch: 6   Global Step: 110270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:06,873-Speed 9675.25 samples/sec   Loss 6.9163   LearningRate 0.0448   Epoch: 6   Global Step: 110280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:07,952-Speed 9498.44 samples/sec   Loss 6.8935   LearningRate 0.0448   Epoch: 6   Global Step: 110290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:09,030-Speed 9501.69 samples/sec   Loss 6.8920   LearningRate 0.0448   Epoch: 6   Global Step: 110300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:10,082-Speed 9739.23 samples/sec   Loss 7.0098   LearningRate 0.0448   Epoch: 6   Global Step: 110310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:11,178-Speed 9350.05 samples/sec   Loss 6.8355   LearningRate 0.0448   Epoch: 6   Global Step: 110320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:12,280-Speed 9299.15 samples/sec   Loss 6.9188   LearningRate 0.0448   Epoch: 6   Global Step: 110330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:13,340-Speed 9664.53 samples/sec   Loss 6.8768   LearningRate 0.0448   Epoch: 6   Global Step: 110340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:14,418-Speed 9503.65 samples/sec   Loss 6.8298   LearningRate 0.0448   Epoch: 6   Global Step: 110350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:15,519-Speed 9304.96 samples/sec   Loss 6.8040   LearningRate 0.0448   Epoch: 6   Global Step: 110360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:16,620-Speed 9310.06 samples/sec   Loss 6.8542   LearningRate 0.0448   Epoch: 6   Global Step: 110370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:17,707-Speed 9429.81 samples/sec   Loss 7.0409   LearningRate 0.0448   Epoch: 6   Global Step: 110380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:18,794-Speed 9420.82 samples/sec   Loss 6.9238   LearningRate 0.0448   Epoch: 6   Global Step: 110390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:19,875-Speed 9481.50 samples/sec   Loss 6.9305   LearningRate 0.0448   Epoch: 6   Global Step: 110400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:20,980-Speed 9268.29 samples/sec   Loss 6.8853   LearningRate 0.0448   Epoch: 6   Global Step: 110410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:22,090-Speed 9228.84 samples/sec   Loss 6.9630   LearningRate 0.0448   Epoch: 6   Global Step: 110420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:23,190-Speed 9318.79 samples/sec   Loss 6.9391   LearningRate 0.0448   Epoch: 6   Global Step: 110430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:24,302-Speed 9218.28 samples/sec   Loss 6.9487   LearningRate 0.0448   Epoch: 6   Global Step: 110440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:25,372-Speed 9574.45 samples/sec   Loss 7.0150   LearningRate 0.0448   Epoch: 6   Global Step: 110450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:26,458-Speed 9458.13 samples/sec   Loss 6.8890   LearningRate 0.0448   Epoch: 6   Global Step: 110460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:27,500-Speed 9828.48 samples/sec   Loss 6.7512   LearningRate 0.0448   Epoch: 6   Global Step: 110470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:28,603-Speed 9291.93 samples/sec   Loss 6.6332   LearningRate 0.0448   Epoch: 6   Global Step: 110480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:29,680-Speed 9513.66 samples/sec   Loss 6.9839   LearningRate 0.0448   Epoch: 6   Global Step: 110490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:30,777-Speed 9332.12 samples/sec   Loss 6.9242   LearningRate 0.0448   Epoch: 6   Global Step: 110500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:31,887-Speed 9229.85 samples/sec   Loss 6.9116   LearningRate 0.0447   Epoch: 6   Global Step: 110510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:32,975-Speed 9426.86 samples/sec   Loss 6.9609   LearningRate 0.0447   Epoch: 6   Global Step: 110520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:34,096-Speed 9138.81 samples/sec   Loss 6.8553   LearningRate 0.0447   Epoch: 6   Global Step: 110530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:35,147-Speed 9745.64 samples/sec   Loss 6.9421   LearningRate 0.0447   Epoch: 6   Global Step: 110540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:36,208-Speed 9661.27 samples/sec   Loss 6.9066   LearningRate 0.0447   Epoch: 6   Global Step: 110550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:37,280-Speed 9555.59 samples/sec   Loss 6.9421   LearningRate 0.0447   Epoch: 6   Global Step: 110560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:38,401-Speed 9143.36 samples/sec   Loss 6.9372   LearningRate 0.0447   Epoch: 6   Global Step: 110570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:39,474-Speed 9546.88 samples/sec   Loss 6.8903   LearningRate 0.0447   Epoch: 6   Global Step: 110580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:40,516-Speed 9830.19 samples/sec   Loss 6.8421   LearningRate 0.0447   Epoch: 6   Global Step: 110590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:41,626-Speed 9233.41 samples/sec   Loss 6.9236   LearningRate 0.0447   Epoch: 6   Global Step: 110600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:42,699-Speed 9543.60 samples/sec   Loss 6.8544   LearningRate 0.0447   Epoch: 6   Global Step: 110610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:43,807-Speed 9253.68 samples/sec   Loss 6.9962   LearningRate 0.0447   Epoch: 6   Global Step: 110620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:44,913-Speed 9266.20 samples/sec   Loss 6.9310   LearningRate 0.0447   Epoch: 6   Global Step: 110630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:46,005-Speed 9382.87 samples/sec   Loss 6.9095   LearningRate 0.0447   Epoch: 6   Global Step: 110640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:47,091-Speed 9435.79 samples/sec   Loss 6.9357   LearningRate 0.0447   Epoch: 6   Global Step: 110650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:48,166-Speed 9533.24 samples/sec   Loss 6.8240   LearningRate 0.0447   Epoch: 6   Global Step: 110660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:49,230-Speed 9628.46 samples/sec   Loss 6.8466   LearningRate 0.0447   Epoch: 6   Global Step: 110670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 15:59:50,298-Speed 9591.01 samples/sec   Loss 6.8226   LearningRate 0.0447   Epoch: 6   Global Step: 110680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:51,381-Speed 9459.53 samples/sec   Loss 6.8362   LearningRate 0.0447   Epoch: 6   Global Step: 110690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:52,498-Speed 9175.12 samples/sec   Loss 6.8400   LearningRate 0.0447   Epoch: 6   Global Step: 110700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:53,621-Speed 9122.78 samples/sec   Loss 6.7968   LearningRate 0.0447   Epoch: 6   Global Step: 110710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:54,660-Speed 9862.75 samples/sec   Loss 6.7806   LearningRate 0.0447   Epoch: 6   Global Step: 110720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:55,738-Speed 9508.28 samples/sec   Loss 6.8453   LearningRate 0.0447   Epoch: 6   Global Step: 110730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:56,848-Speed 9225.87 samples/sec   Loss 6.9852   LearningRate 0.0447   Epoch: 6   Global Step: 110740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:57,919-Speed 9568.09 samples/sec   Loss 6.8636   LearningRate 0.0447   Epoch: 6   Global Step: 110750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 15:59:58,979-Speed 9669.65 samples/sec   Loss 6.8575   LearningRate 0.0446   Epoch: 6   Global Step: 110760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:00,068-Speed 9408.70 samples/sec   Loss 6.8366   LearningRate 0.0446   Epoch: 6   Global Step: 110770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:01,168-Speed 9311.14 samples/sec   Loss 6.8745   LearningRate 0.0446   Epoch: 6   Global Step: 110780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:02,233-Speed 9626.02 samples/sec   Loss 6.9046   LearningRate 0.0446   Epoch: 6   Global Step: 110790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:03,329-Speed 9350.73 samples/sec   Loss 6.7753   LearningRate 0.0446   Epoch: 6   Global Step: 110800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:04,420-Speed 9393.57 samples/sec   Loss 6.9127   LearningRate 0.0446   Epoch: 6   Global Step: 110810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:05,517-Speed 9331.49 samples/sec   Loss 6.9142   LearningRate 0.0446   Epoch: 6   Global Step: 110820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:06,603-Speed 9442.06 samples/sec   Loss 6.8911   LearningRate 0.0446   Epoch: 6   Global Step: 110830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:07,686-Speed 9454.27 samples/sec   Loss 6.8835   LearningRate 0.0446   Epoch: 6   Global Step: 110840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:08,737-Speed 9749.63 samples/sec   Loss 6.7294   LearningRate 0.0446   Epoch: 6   Global Step: 110850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:09,832-Speed 9361.31 samples/sec   Loss 6.9183   LearningRate 0.0446   Epoch: 6   Global Step: 110860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:10,930-Speed 9331.73 samples/sec   Loss 6.8751   LearningRate 0.0446   Epoch: 6   Global Step: 110870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:11,988-Speed 9684.83 samples/sec   Loss 6.8897   LearningRate 0.0446   Epoch: 6   Global Step: 110880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:13,147-Speed 8834.69 samples/sec   Loss 6.9579   LearningRate 0.0446   Epoch: 6   Global Step: 110890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:14,238-Speed 9391.11 samples/sec   Loss 6.9349   LearningRate 0.0446   Epoch: 6   Global Step: 110900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:15,341-Speed 9296.05 samples/sec   Loss 7.0523   LearningRate 0.0446   Epoch: 6   Global Step: 110910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:16,416-Speed 9530.26 samples/sec   Loss 6.8797   LearningRate 0.0446   Epoch: 6   Global Step: 110920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:17,509-Speed 9370.16 samples/sec   Loss 6.9291   LearningRate 0.0446   Epoch: 6   Global Step: 110930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:18,607-Speed 9334.21 samples/sec   Loss 6.8774   LearningRate 0.0446   Epoch: 6   Global Step: 110940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:19,685-Speed 9503.12 samples/sec   Loss 6.7198   LearningRate 0.0446   Epoch: 6   Global Step: 110950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:20,807-Speed 9132.75 samples/sec   Loss 6.8502   LearningRate 0.0446   Epoch: 6   Global Step: 110960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:21,922-Speed 9196.41 samples/sec   Loss 6.7989   LearningRate 0.0446   Epoch: 6   Global Step: 110970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:22,975-Speed 9730.57 samples/sec   Loss 6.9418   LearningRate 0.0446   Epoch: 6   Global Step: 110980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:24,050-Speed 9531.61 samples/sec   Loss 6.8226   LearningRate 0.0446   Epoch: 6   Global Step: 110990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:25,140-Speed 9396.70 samples/sec   Loss 6.9454   LearningRate 0.0446   Epoch: 6   Global Step: 111000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:26,218-Speed 9507.71 samples/sec   Loss 6.8808   LearningRate 0.0445   Epoch: 6   Global Step: 111010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:27,329-Speed 9217.16 samples/sec   Loss 6.8205   LearningRate 0.0445   Epoch: 6   Global Step: 111020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:28,372-Speed 9826.16 samples/sec   Loss 6.9571   LearningRate 0.0445   Epoch: 6   Global Step: 111030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:29,422-Speed 9760.25 samples/sec   Loss 6.9653   LearningRate 0.0445   Epoch: 6   Global Step: 111040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:30,463-Speed 9838.29 samples/sec   Loss 6.8645   LearningRate 0.0445   Epoch: 6   Global Step: 111050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:31,553-Speed 9403.24 samples/sec   Loss 6.8191   LearningRate 0.0445   Epoch: 6   Global Step: 111060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:32,620-Speed 9602.90 samples/sec   Loss 6.8593   LearningRate 0.0445   Epoch: 6   Global Step: 111070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:33,711-Speed 9389.22 samples/sec   Loss 6.9639   LearningRate 0.0445   Epoch: 6   Global Step: 111080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:34,846-Speed 9028.25 samples/sec   Loss 6.9011   LearningRate 0.0445   Epoch: 6   Global Step: 111090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:35,954-Speed 9247.88 samples/sec   Loss 6.9964   LearningRate 0.0445   Epoch: 6   Global Step: 111100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:37,098-Speed 8953.26 samples/sec   Loss 6.8539   LearningRate 0.0445   Epoch: 6   Global Step: 111110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:38,234-Speed 9023.29 samples/sec   Loss 6.9475   LearningRate 0.0445   Epoch: 6   Global Step: 111120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:39,325-Speed 9399.25 samples/sec   Loss 6.9278   LearningRate 0.0445   Epoch: 6   Global Step: 111130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:40,405-Speed 9486.75 samples/sec   Loss 6.7814   LearningRate 0.0445   Epoch: 6   Global Step: 111140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:41,499-Speed 9365.60 samples/sec   Loss 6.7481   LearningRate 0.0445   Epoch: 6   Global Step: 111150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:42,541-Speed 9826.06 samples/sec   Loss 6.9546   LearningRate 0.0445   Epoch: 6   Global Step: 111160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:43,562-Speed 10043.77 samples/sec   Loss 6.8539   LearningRate 0.0445   Epoch: 6   Global Step: 111170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:44,635-Speed 9548.22 samples/sec   Loss 6.8159   LearningRate 0.0445   Epoch: 6   Global Step: 111180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:45,697-Speed 9646.35 samples/sec   Loss 6.8115   LearningRate 0.0445   Epoch: 6   Global Step: 111190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:46,758-Speed 9650.53 samples/sec   Loss 6.9414   LearningRate 0.0445   Epoch: 6   Global Step: 111200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:47,787-Speed 9964.44 samples/sec   Loss 6.8090   LearningRate 0.0445   Epoch: 6   Global Step: 111210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:48,860-Speed 9542.51 samples/sec   Loss 6.9167   LearningRate 0.0445   Epoch: 6   Global Step: 111220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:49,957-Speed 9344.78 samples/sec   Loss 7.0247   LearningRate 0.0445   Epoch: 6   Global Step: 111230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:00:51,029-Speed 9559.13 samples/sec   Loss 6.7724   LearningRate 0.0445   Epoch: 6   Global Step: 111240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:52,083-Speed 9716.17 samples/sec   Loss 6.8972   LearningRate 0.0445   Epoch: 6   Global Step: 111250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:53,157-Speed 9545.36 samples/sec   Loss 6.8561   LearningRate 0.0444   Epoch: 6   Global Step: 111260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:54,242-Speed 9444.87 samples/sec   Loss 6.9220   LearningRate 0.0444   Epoch: 6   Global Step: 111270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:55,321-Speed 9490.91 samples/sec   Loss 6.7823   LearningRate 0.0444   Epoch: 6   Global Step: 111280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:56,410-Speed 9412.81 samples/sec   Loss 6.9358   LearningRate 0.0444   Epoch: 6   Global Step: 111290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:57,504-Speed 9359.49 samples/sec   Loss 6.8823   LearningRate 0.0444   Epoch: 6   Global Step: 111300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:58,621-Speed 9173.43 samples/sec   Loss 7.0397   LearningRate 0.0444   Epoch: 6   Global Step: 111310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:00:59,755-Speed 9036.59 samples/sec   Loss 6.9719   LearningRate 0.0444   Epoch: 6   Global Step: 111320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:00,831-Speed 9531.23 samples/sec   Loss 6.9896   LearningRate 0.0444   Epoch: 6   Global Step: 111330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:01,905-Speed 9539.14 samples/sec   Loss 6.8588   LearningRate 0.0444   Epoch: 6   Global Step: 111340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:02,988-Speed 9458.96 samples/sec   Loss 6.8462   LearningRate 0.0444   Epoch: 6   Global Step: 111350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:04,055-Speed 9602.58 samples/sec   Loss 6.9406   LearningRate 0.0444   Epoch: 6   Global Step: 111360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:05,105-Speed 9749.94 samples/sec   Loss 6.9653   LearningRate 0.0444   Epoch: 6   Global Step: 111370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:06,167-Speed 9655.29 samples/sec   Loss 6.9334   LearningRate 0.0444   Epoch: 6   Global Step: 111380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:07,217-Speed 9757.09 samples/sec   Loss 6.8712   LearningRate 0.0444   Epoch: 6   Global Step: 111390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:08,279-Speed 9644.85 samples/sec   Loss 6.9684   LearningRate 0.0444   Epoch: 6   Global Step: 111400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:09,322-Speed 9829.97 samples/sec   Loss 6.8326   LearningRate 0.0444   Epoch: 6   Global Step: 111410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:10,395-Speed 9541.46 samples/sec   Loss 6.9074   LearningRate 0.0444   Epoch: 6   Global Step: 111420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:11,497-Speed 9299.68 samples/sec   Loss 6.9401   LearningRate 0.0444   Epoch: 6   Global Step: 111430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:12,560-Speed 9638.13 samples/sec   Loss 6.8704   LearningRate 0.0444   Epoch: 6   Global Step: 111440   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:01:13,629-Speed 9588.07 samples/sec   Loss 6.8483   LearningRate 0.0444   Epoch: 6   Global Step: 111450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:14,731-Speed 9295.53 samples/sec   Loss 6.9258   LearningRate 0.0444   Epoch: 6   Global Step: 111460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:15,830-Speed 9318.70 samples/sec   Loss 6.8595   LearningRate 0.0444   Epoch: 6   Global Step: 111470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:16,899-Speed 9584.77 samples/sec   Loss 6.8672   LearningRate 0.0444   Epoch: 6   Global Step: 111480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:18,013-Speed 9203.45 samples/sec   Loss 6.8938   LearningRate 0.0444   Epoch: 6   Global Step: 111490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:19,096-Speed 9461.76 samples/sec   Loss 6.9371   LearningRate 0.0444   Epoch: 6   Global Step: 111500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:20,163-Speed 9604.96 samples/sec   Loss 6.9452   LearningRate 0.0443   Epoch: 6   Global Step: 111510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:21,235-Speed 9563.28 samples/sec   Loss 6.8259   LearningRate 0.0443   Epoch: 6   Global Step: 111520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:22,316-Speed 9469.43 samples/sec   Loss 6.9966   LearningRate 0.0443   Epoch: 6   Global Step: 111530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:23,369-Speed 9733.50 samples/sec   Loss 6.9797   LearningRate 0.0443   Epoch: 6   Global Step: 111540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:24,466-Speed 9342.86 samples/sec   Loss 6.8097   LearningRate 0.0443   Epoch: 6   Global Step: 111550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:25,508-Speed 9827.94 samples/sec   Loss 6.8132   LearningRate 0.0443   Epoch: 6   Global Step: 111560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:26,574-Speed 9613.71 samples/sec   Loss 6.8127   LearningRate 0.0443   Epoch: 6   Global Step: 111570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:27,646-Speed 9561.84 samples/sec   Loss 6.8742   LearningRate 0.0443   Epoch: 6   Global Step: 111580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:28,705-Speed 9671.34 samples/sec   Loss 6.8303   LearningRate 0.0443   Epoch: 6   Global Step: 111590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:29,743-Speed 9867.88 samples/sec   Loss 6.9115   LearningRate 0.0443   Epoch: 6   Global Step: 111600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:30,765-Speed 10036.58 samples/sec   Loss 6.8400   LearningRate 0.0443   Epoch: 6   Global Step: 111610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:31,872-Speed 9249.49 samples/sec   Loss 6.8820   LearningRate 0.0443   Epoch: 6   Global Step: 111620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:32,952-Speed 9488.30 samples/sec   Loss 6.8438   LearningRate 0.0443   Epoch: 6   Global Step: 111630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:33,980-Speed 9969.06 samples/sec   Loss 6.8898   LearningRate 0.0443   Epoch: 6   Global Step: 111640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:35,039-Speed 9670.46 samples/sec   Loss 6.7121   LearningRate 0.0443   Epoch: 6   Global Step: 111650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:36,145-Speed 9266.10 samples/sec   Loss 6.9649   LearningRate 0.0443   Epoch: 6   Global Step: 111660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:37,261-Speed 9187.45 samples/sec   Loss 6.6999   LearningRate 0.0443   Epoch: 6   Global Step: 111670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:38,388-Speed 9086.69 samples/sec   Loss 6.8389   LearningRate 0.0443   Epoch: 6   Global Step: 111680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:39,425-Speed 9884.12 samples/sec   Loss 6.8039   LearningRate 0.0443   Epoch: 6   Global Step: 111690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:40,520-Speed 9359.98 samples/sec   Loss 6.9100   LearningRate 0.0443   Epoch: 6   Global Step: 111700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:41,579-Speed 9666.59 samples/sec   Loss 6.8757   LearningRate 0.0443   Epoch: 6   Global Step: 111710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:42,618-Speed 9866.27 samples/sec   Loss 6.8287   LearningRate 0.0443   Epoch: 6   Global Step: 111720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:43,726-Speed 9248.26 samples/sec   Loss 6.9103   LearningRate 0.0443   Epoch: 6   Global Step: 111730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:44,796-Speed 9573.05 samples/sec   Loss 6.8765   LearningRate 0.0443   Epoch: 6   Global Step: 111740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:45,846-Speed 9759.17 samples/sec   Loss 6.8063   LearningRate 0.0443   Epoch: 6   Global Step: 111750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:46,934-Speed 9411.73 samples/sec   Loss 6.8438   LearningRate 0.0443   Epoch: 6   Global Step: 111760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:48,016-Speed 9474.50 samples/sec   Loss 6.8698   LearningRate 0.0442   Epoch: 6   Global Step: 111770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:49,109-Speed 9373.86 samples/sec   Loss 6.8321   LearningRate 0.0442   Epoch: 6   Global Step: 111780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:50,190-Speed 9482.24 samples/sec   Loss 6.8198   LearningRate 0.0442   Epoch: 6   Global Step: 111790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:51,325-Speed 9028.84 samples/sec   Loss 6.9205   LearningRate 0.0442   Epoch: 6   Global Step: 111800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:52,403-Speed 9501.74 samples/sec   Loss 6.9502   LearningRate 0.0442   Epoch: 6   Global Step: 111810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:53,491-Speed 9413.56 samples/sec   Loss 6.8588   LearningRate 0.0442   Epoch: 6   Global Step: 111820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:01:54,580-Speed 9408.19 samples/sec   Loss 6.7928   LearningRate 0.0442   Epoch: 6   Global Step: 111830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:55,666-Speed 9440.16 samples/sec   Loss 6.9436   LearningRate 0.0442   Epoch: 6   Global Step: 111840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:56,754-Speed 9412.36 samples/sec   Loss 6.8722   LearningRate 0.0442   Epoch: 6   Global Step: 111850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:57,873-Speed 9159.42 samples/sec   Loss 6.8772   LearningRate 0.0442   Epoch: 6   Global Step: 111860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:01:58,943-Speed 9572.53 samples/sec   Loss 6.9544   LearningRate 0.0442   Epoch: 6   Global Step: 111870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:02:00,064-Speed 9144.95 samples/sec   Loss 6.9225   LearningRate 0.0442   Epoch: 6   Global Step: 111880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:02:01,166-Speed 9299.48 samples/sec   Loss 6.8580   LearningRate 0.0442   Epoch: 6   Global Step: 111890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:02:02,239-Speed 9548.97 samples/sec   Loss 6.9132   LearningRate 0.0442   Epoch: 6   Global Step: 111900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:02:03,272-Speed 9911.10 samples/sec   Loss 6.8217   LearningRate 0.0442   Epoch: 6   Global Step: 111910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:02:04,322-Speed 9762.31 samples/sec   Loss 6.8490   LearningRate 0.0442   Epoch: 6   Global Step: 111920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:02:05,390-Speed 9594.88 samples/sec   Loss 6.9522   LearningRate 0.0442   Epoch: 6   Global Step: 111930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:02:06,507-Speed 9169.90 samples/sec   Loss 6.8968   LearningRate 0.0442   Epoch: 6   Global Step: 111940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:02:07,556-Speed 9771.36 samples/sec   Loss 6.9628   LearningRate 0.0442   Epoch: 6   Global Step: 111950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:02:08,650-Speed 9363.24 samples/sec   Loss 6.9879   LearningRate 0.0442   Epoch: 6   Global Step: 111960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:02:09,730-Speed 9489.96 samples/sec   Loss 6.8515   LearningRate 0.0442   Epoch: 6   Global Step: 111970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:02:10,786-Speed 9698.86 samples/sec   Loss 6.9190   LearningRate 0.0442   Epoch: 6   Global Step: 111980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:02:11,837-Speed 9746.61 samples/sec   Loss 6.8941   LearningRate 0.0442   Epoch: 6   Global Step: 111990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:02:12,917-Speed 9491.76 samples/sec   Loss 6.9265   LearningRate 0.0442   Epoch: 6   Global Step: 112000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:02:34,791-[lfw][112000]XNorm: 10.958596
Training: 2022-04-11 16:02:34,791-[lfw][112000]Accuracy-Flip: 0.99633+-0.00296
Training: 2022-04-11 16:02:34,792-[lfw][112000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:03:00,101-[cfp_fp][112000]XNorm: 9.443154
Training: 2022-04-11 16:03:00,102-[cfp_fp][112000]Accuracy-Flip: 0.95257+-0.01326
Training: 2022-04-11 16:03:00,103-[cfp_fp][112000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:03:21,988-[agedb_30][112000]XNorm: 10.618512
Training: 2022-04-11 16:03:21,989-[agedb_30][112000]Accuracy-Flip: 0.96017+-0.00970
Training: 2022-04-11 16:03:21,990-[agedb_30][112000]Accuracy-Highest: 0.96483
Training: 2022-04-11 16:03:23,060-Speed 145.99 samples/sec   Loss 6.8568   LearningRate 0.0442   Epoch: 6   Global Step: 112010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:24,166-Speed 9261.37 samples/sec   Loss 6.9093   LearningRate 0.0441   Epoch: 6   Global Step: 112020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:25,239-Speed 9549.75 samples/sec   Loss 6.8922   LearningRate 0.0441   Epoch: 6   Global Step: 112030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:26,271-Speed 9921.72 samples/sec   Loss 6.9085   LearningRate 0.0441   Epoch: 6   Global Step: 112040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:27,312-Speed 9847.32 samples/sec   Loss 6.9158   LearningRate 0.0441   Epoch: 6   Global Step: 112050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:28,364-Speed 9739.26 samples/sec   Loss 6.8637   LearningRate 0.0441   Epoch: 6   Global Step: 112060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:29,401-Speed 9884.63 samples/sec   Loss 6.8420   LearningRate 0.0441   Epoch: 6   Global Step: 112070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:30,442-Speed 9837.99 samples/sec   Loss 6.9096   LearningRate 0.0441   Epoch: 6   Global Step: 112080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:31,486-Speed 9813.51 samples/sec   Loss 6.9852   LearningRate 0.0441   Epoch: 6   Global Step: 112090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:32,542-Speed 9706.27 samples/sec   Loss 6.8462   LearningRate 0.0441   Epoch: 6   Global Step: 112100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:33,621-Speed 9494.48 samples/sec   Loss 6.8572   LearningRate 0.0441   Epoch: 6   Global Step: 112110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:34,664-Speed 9821.89 samples/sec   Loss 6.7703   LearningRate 0.0441   Epoch: 6   Global Step: 112120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:35,744-Speed 9490.17 samples/sec   Loss 6.7991   LearningRate 0.0441   Epoch: 6   Global Step: 112130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:36,874-Speed 9065.87 samples/sec   Loss 6.9030   LearningRate 0.0441   Epoch: 6   Global Step: 112140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:37,990-Speed 9183.85 samples/sec   Loss 6.9687   LearningRate 0.0441   Epoch: 6   Global Step: 112150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:39,098-Speed 9246.70 samples/sec   Loss 6.9746   LearningRate 0.0441   Epoch: 6   Global Step: 112160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:40,185-Speed 9423.63 samples/sec   Loss 6.8918   LearningRate 0.0441   Epoch: 6   Global Step: 112170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:41,252-Speed 9606.95 samples/sec   Loss 6.8111   LearningRate 0.0441   Epoch: 6   Global Step: 112180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:42,302-Speed 9765.71 samples/sec   Loss 6.7345   LearningRate 0.0441   Epoch: 6   Global Step: 112190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:43,370-Speed 9585.28 samples/sec   Loss 6.7768   LearningRate 0.0441   Epoch: 6   Global Step: 112200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:44,423-Speed 9734.49 samples/sec   Loss 6.9237   LearningRate 0.0441   Epoch: 6   Global Step: 112210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:45,499-Speed 9519.75 samples/sec   Loss 6.8183   LearningRate 0.0441   Epoch: 6   Global Step: 112220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:46,546-Speed 9786.61 samples/sec   Loss 6.9109   LearningRate 0.0441   Epoch: 6   Global Step: 112230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:47,608-Speed 9651.39 samples/sec   Loss 6.8918   LearningRate 0.0441   Epoch: 6   Global Step: 112240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:48,670-Speed 9647.94 samples/sec   Loss 6.9182   LearningRate 0.0441   Epoch: 6   Global Step: 112250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:03:49,698-Speed 9973.17 samples/sec   Loss 6.8322   LearningRate 0.0441   Epoch: 6   Global Step: 112260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:50,774-Speed 9521.10 samples/sec   Loss 6.9543   LearningRate 0.0440   Epoch: 6   Global Step: 112270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:51,842-Speed 9592.93 samples/sec   Loss 6.9652   LearningRate 0.0440   Epoch: 6   Global Step: 112280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:52,891-Speed 9765.44 samples/sec   Loss 6.8450   LearningRate 0.0440   Epoch: 6   Global Step: 112290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:53,959-Speed 9590.34 samples/sec   Loss 6.8746   LearningRate 0.0440   Epoch: 6   Global Step: 112300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:55,049-Speed 9399.25 samples/sec   Loss 6.7918   LearningRate 0.0440   Epoch: 6   Global Step: 112310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:56,131-Speed 9467.75 samples/sec   Loss 6.8601   LearningRate 0.0440   Epoch: 6   Global Step: 112320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:57,279-Speed 8928.42 samples/sec   Loss 6.8623   LearningRate 0.0440   Epoch: 6   Global Step: 112330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:58,361-Speed 9468.86 samples/sec   Loss 6.8906   LearningRate 0.0440   Epoch: 6   Global Step: 112340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:03:59,407-Speed 9801.41 samples/sec   Loss 6.9064   LearningRate 0.0440   Epoch: 6   Global Step: 112350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:00,489-Speed 9475.15 samples/sec   Loss 6.7749   LearningRate 0.0440   Epoch: 6   Global Step: 112360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:01,595-Speed 9260.78 samples/sec   Loss 6.7579   LearningRate 0.0440   Epoch: 6   Global Step: 112370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:02,675-Speed 9484.95 samples/sec   Loss 6.8525   LearningRate 0.0440   Epoch: 6   Global Step: 112380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:03,752-Speed 9509.98 samples/sec   Loss 6.8831   LearningRate 0.0440   Epoch: 6   Global Step: 112390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:04,839-Speed 9426.44 samples/sec   Loss 6.8487   LearningRate 0.0440   Epoch: 6   Global Step: 112400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:05,878-Speed 9859.85 samples/sec   Loss 6.8909   LearningRate 0.0440   Epoch: 6   Global Step: 112410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:06,947-Speed 9589.29 samples/sec   Loss 6.8370   LearningRate 0.0440   Epoch: 6   Global Step: 112420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:07,999-Speed 9733.80 samples/sec   Loss 6.9084   LearningRate 0.0440   Epoch: 6   Global Step: 112430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:09,056-Speed 9692.65 samples/sec   Loss 6.8248   LearningRate 0.0440   Epoch: 6   Global Step: 112440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:10,117-Speed 9664.78 samples/sec   Loss 6.8918   LearningRate 0.0440   Epoch: 6   Global Step: 112450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:11,191-Speed 9535.84 samples/sec   Loss 6.7859   LearningRate 0.0440   Epoch: 6   Global Step: 112460   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:04:12,232-Speed 9845.03 samples/sec   Loss 6.9633   LearningRate 0.0440   Epoch: 6   Global Step: 112470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:13,318-Speed 9431.08 samples/sec   Loss 6.8837   LearningRate 0.0440   Epoch: 6   Global Step: 112480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:14,427-Speed 9240.93 samples/sec   Loss 6.8832   LearningRate 0.0440   Epoch: 6   Global Step: 112490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:15,546-Speed 9155.08 samples/sec   Loss 7.0007   LearningRate 0.0440   Epoch: 6   Global Step: 112500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:16,623-Speed 9520.52 samples/sec   Loss 6.8612   LearningRate 0.0440   Epoch: 6   Global Step: 112510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:17,705-Speed 9537.58 samples/sec   Loss 6.8551   LearningRate 0.0439   Epoch: 6   Global Step: 112520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:18,763-Speed 9680.44 samples/sec   Loss 6.7893   LearningRate 0.0439   Epoch: 6   Global Step: 112530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:19,822-Speed 9676.29 samples/sec   Loss 6.8326   LearningRate 0.0439   Epoch: 6   Global Step: 112540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:20,854-Speed 9923.42 samples/sec   Loss 6.8176   LearningRate 0.0439   Epoch: 6   Global Step: 112550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:21,915-Speed 9659.21 samples/sec   Loss 6.7855   LearningRate 0.0439   Epoch: 6   Global Step: 112560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:23,032-Speed 9173.94 samples/sec   Loss 6.8726   LearningRate 0.0439   Epoch: 6   Global Step: 112570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:24,154-Speed 9133.05 samples/sec   Loss 6.8729   LearningRate 0.0439   Epoch: 6   Global Step: 112580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:25,199-Speed 9801.05 samples/sec   Loss 6.8374   LearningRate 0.0439   Epoch: 6   Global Step: 112590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:26,249-Speed 9759.23 samples/sec   Loss 6.8448   LearningRate 0.0439   Epoch: 6   Global Step: 112600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:27,319-Speed 9574.18 samples/sec   Loss 6.8476   LearningRate 0.0439   Epoch: 6   Global Step: 112610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:28,371-Speed 9734.66 samples/sec   Loss 6.8083   LearningRate 0.0439   Epoch: 6   Global Step: 112620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:29,416-Speed 9807.91 samples/sec   Loss 6.7934   LearningRate 0.0439   Epoch: 6   Global Step: 112630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:30,508-Speed 9386.08 samples/sec   Loss 6.9124   LearningRate 0.0439   Epoch: 6   Global Step: 112640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:31,569-Speed 9654.22 samples/sec   Loss 6.8770   LearningRate 0.0439   Epoch: 6   Global Step: 112650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:32,634-Speed 9625.64 samples/sec   Loss 6.8555   LearningRate 0.0439   Epoch: 6   Global Step: 112660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:33,690-Speed 9699.81 samples/sec   Loss 6.8761   LearningRate 0.0439   Epoch: 6   Global Step: 112670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:34,731-Speed 9843.88 samples/sec   Loss 6.8279   LearningRate 0.0439   Epoch: 6   Global Step: 112680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:35,761-Speed 9949.75 samples/sec   Loss 6.8648   LearningRate 0.0439   Epoch: 6   Global Step: 112690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:36,826-Speed 9619.03 samples/sec   Loss 6.7789   LearningRate 0.0439   Epoch: 6   Global Step: 112700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:37,868-Speed 9833.80 samples/sec   Loss 6.8849   LearningRate 0.0439   Epoch: 6   Global Step: 112710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:38,905-Speed 9878.67 samples/sec   Loss 6.8620   LearningRate 0.0439   Epoch: 6   Global Step: 112720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:39,961-Speed 9700.79 samples/sec   Loss 6.8454   LearningRate 0.0439   Epoch: 6   Global Step: 112730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:41,005-Speed 9827.25 samples/sec   Loss 7.0040   LearningRate 0.0439   Epoch: 6   Global Step: 112740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:42,101-Speed 9345.60 samples/sec   Loss 6.8671   LearningRate 0.0439   Epoch: 6   Global Step: 112750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:43,177-Speed 9522.76 samples/sec   Loss 6.8715   LearningRate 0.0439   Epoch: 6   Global Step: 112760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:44,243-Speed 9608.38 samples/sec   Loss 6.9838   LearningRate 0.0438   Epoch: 6   Global Step: 112770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:45,337-Speed 9368.42 samples/sec   Loss 6.8367   LearningRate 0.0438   Epoch: 6   Global Step: 112780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:46,431-Speed 9359.37 samples/sec   Loss 6.8663   LearningRate 0.0438   Epoch: 6   Global Step: 112790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:47,491-Speed 9679.92 samples/sec   Loss 6.8439   LearningRate 0.0438   Epoch: 6   Global Step: 112800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:48,568-Speed 9506.28 samples/sec   Loss 6.9632   LearningRate 0.0438   Epoch: 6   Global Step: 112810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:49,662-Speed 9364.74 samples/sec   Loss 6.9588   LearningRate 0.0438   Epoch: 6   Global Step: 112820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:50,806-Speed 8960.07 samples/sec   Loss 6.8738   LearningRate 0.0438   Epoch: 6   Global Step: 112830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:51,900-Speed 9367.65 samples/sec   Loss 6.9314   LearningRate 0.0438   Epoch: 6   Global Step: 112840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:53,007-Speed 9256.02 samples/sec   Loss 6.7429   LearningRate 0.0438   Epoch: 6   Global Step: 112850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:54,108-Speed 9301.79 samples/sec   Loss 6.8508   LearningRate 0.0438   Epoch: 6   Global Step: 112860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:55,208-Speed 9315.70 samples/sec   Loss 6.8397   LearningRate 0.0438   Epoch: 6   Global Step: 112870   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:04:56,332-Speed 9114.59 samples/sec   Loss 6.8224   LearningRate 0.0438   Epoch: 6   Global Step: 112880   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:04:57,405-Speed 9552.90 samples/sec   Loss 6.7735   LearningRate 0.0438   Epoch: 6   Global Step: 112890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:04:58,496-Speed 9386.50 samples/sec   Loss 6.8806   LearningRate 0.0438   Epoch: 6   Global Step: 112900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:04:59,608-Speed 9218.81 samples/sec   Loss 6.8589   LearningRate 0.0438   Epoch: 6   Global Step: 112910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:00,672-Speed 9634.35 samples/sec   Loss 6.8716   LearningRate 0.0438   Epoch: 6   Global Step: 112920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:01,771-Speed 9320.76 samples/sec   Loss 6.7288   LearningRate 0.0438   Epoch: 6   Global Step: 112930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:02,842-Speed 9563.71 samples/sec   Loss 7.0437   LearningRate 0.0438   Epoch: 6   Global Step: 112940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:03,909-Speed 9602.12 samples/sec   Loss 6.8021   LearningRate 0.0438   Epoch: 6   Global Step: 112950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:05,003-Speed 9370.44 samples/sec   Loss 6.8515   LearningRate 0.0438   Epoch: 6   Global Step: 112960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:06,105-Speed 9291.85 samples/sec   Loss 6.8994   LearningRate 0.0438   Epoch: 6   Global Step: 112970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:07,231-Speed 9106.26 samples/sec   Loss 6.8206   LearningRate 0.0438   Epoch: 6   Global Step: 112980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:08,300-Speed 9584.88 samples/sec   Loss 6.8237   LearningRate 0.0438   Epoch: 6   Global Step: 112990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:09,405-Speed 9270.49 samples/sec   Loss 6.8414   LearningRate 0.0438   Epoch: 6   Global Step: 113000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:10,475-Speed 9573.83 samples/sec   Loss 6.8664   LearningRate 0.0438   Epoch: 6   Global Step: 113010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:11,540-Speed 9623.22 samples/sec   Loss 6.9245   LearningRate 0.0437   Epoch: 6   Global Step: 113020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:12,653-Speed 9211.52 samples/sec   Loss 6.8395   LearningRate 0.0437   Epoch: 6   Global Step: 113030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:13,742-Speed 9407.96 samples/sec   Loss 6.9235   LearningRate 0.0437   Epoch: 6   Global Step: 113040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:14,796-Speed 9725.91 samples/sec   Loss 6.8534   LearningRate 0.0437   Epoch: 6   Global Step: 113050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:15,889-Speed 9373.12 samples/sec   Loss 6.9424   LearningRate 0.0437   Epoch: 6   Global Step: 113060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:16,955-Speed 9614.15 samples/sec   Loss 6.9442   LearningRate 0.0437   Epoch: 6   Global Step: 113070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:18,030-Speed 9531.47 samples/sec   Loss 7.0037   LearningRate 0.0437   Epoch: 6   Global Step: 113080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:19,105-Speed 9527.94 samples/sec   Loss 6.7850   LearningRate 0.0437   Epoch: 6   Global Step: 113090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:20,199-Speed 9371.02 samples/sec   Loss 6.8787   LearningRate 0.0437   Epoch: 6   Global Step: 113100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:21,247-Speed 9771.28 samples/sec   Loss 6.9058   LearningRate 0.0437   Epoch: 6   Global Step: 113110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:22,365-Speed 9168.49 samples/sec   Loss 6.8426   LearningRate 0.0437   Epoch: 6   Global Step: 113120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:23,434-Speed 9580.55 samples/sec   Loss 6.8613   LearningRate 0.0437   Epoch: 6   Global Step: 113130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:24,537-Speed 9288.40 samples/sec   Loss 6.9343   LearningRate 0.0437   Epoch: 6   Global Step: 113140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:25,631-Speed 9369.15 samples/sec   Loss 6.9522   LearningRate 0.0437   Epoch: 6   Global Step: 113150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:26,723-Speed 9378.10 samples/sec   Loss 6.7783   LearningRate 0.0437   Epoch: 6   Global Step: 113160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:27,822-Speed 9325.19 samples/sec   Loss 6.8294   LearningRate 0.0437   Epoch: 6   Global Step: 113170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:28,894-Speed 9553.55 samples/sec   Loss 6.8603   LearningRate 0.0437   Epoch: 6   Global Step: 113180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:30,013-Speed 9158.90 samples/sec   Loss 6.8222   LearningRate 0.0437   Epoch: 6   Global Step: 113190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:31,067-Speed 9721.62 samples/sec   Loss 6.7855   LearningRate 0.0437   Epoch: 6   Global Step: 113200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:32,124-Speed 9694.97 samples/sec   Loss 6.8958   LearningRate 0.0437   Epoch: 6   Global Step: 113210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:33,163-Speed 9868.71 samples/sec   Loss 6.9713   LearningRate 0.0437   Epoch: 6   Global Step: 113220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:34,200-Speed 9871.20 samples/sec   Loss 6.8220   LearningRate 0.0437   Epoch: 6   Global Step: 113230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:35,320-Speed 9154.92 samples/sec   Loss 6.8761   LearningRate 0.0437   Epoch: 6   Global Step: 113240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:36,412-Speed 9376.08 samples/sec   Loss 6.9850   LearningRate 0.0437   Epoch: 6   Global Step: 113250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:37,490-Speed 9509.91 samples/sec   Loss 6.9873   LearningRate 0.0437   Epoch: 6   Global Step: 113260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:38,527-Speed 9878.28 samples/sec   Loss 6.8707   LearningRate 0.0437   Epoch: 6   Global Step: 113270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:39,563-Speed 9894.46 samples/sec   Loss 6.9093   LearningRate 0.0436   Epoch: 6   Global Step: 113280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:40,672-Speed 9239.16 samples/sec   Loss 6.8529   LearningRate 0.0436   Epoch: 6   Global Step: 113290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:41,769-Speed 9341.81 samples/sec   Loss 6.8712   LearningRate 0.0436   Epoch: 6   Global Step: 113300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:42,863-Speed 9364.42 samples/sec   Loss 6.8188   LearningRate 0.0436   Epoch: 6   Global Step: 113310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:43,938-Speed 9530.11 samples/sec   Loss 6.8011   LearningRate 0.0436   Epoch: 6   Global Step: 113320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:45,029-Speed 9398.34 samples/sec   Loss 6.9563   LearningRate 0.0436   Epoch: 6   Global Step: 113330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:46,092-Speed 9631.60 samples/sec   Loss 6.8075   LearningRate 0.0436   Epoch: 6   Global Step: 113340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:47,163-Speed 9568.68 samples/sec   Loss 6.8533   LearningRate 0.0436   Epoch: 6   Global Step: 113350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:48,260-Speed 9343.17 samples/sec   Loss 6.7580   LearningRate 0.0436   Epoch: 6   Global Step: 113360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:49,306-Speed 9797.46 samples/sec   Loss 6.8635   LearningRate 0.0436   Epoch: 6   Global Step: 113370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:50,349-Speed 9821.01 samples/sec   Loss 6.9523   LearningRate 0.0436   Epoch: 6   Global Step: 113380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:51,386-Speed 9882.06 samples/sec   Loss 6.7795   LearningRate 0.0436   Epoch: 6   Global Step: 113390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:52,473-Speed 9428.76 samples/sec   Loss 6.8667   LearningRate 0.0436   Epoch: 6   Global Step: 113400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:53,527-Speed 9722.41 samples/sec   Loss 6.7915   LearningRate 0.0436   Epoch: 6   Global Step: 113410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:54,576-Speed 9764.90 samples/sec   Loss 6.8690   LearningRate 0.0436   Epoch: 6   Global Step: 113420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:55,662-Speed 9437.72 samples/sec   Loss 6.8969   LearningRate 0.0436   Epoch: 6   Global Step: 113430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:05:56,738-Speed 9515.13 samples/sec   Loss 6.9839   LearningRate 0.0436   Epoch: 6   Global Step: 113440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:57,831-Speed 9374.72 samples/sec   Loss 6.9241   LearningRate 0.0436   Epoch: 6   Global Step: 113450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:58,905-Speed 9540.57 samples/sec   Loss 6.7680   LearningRate 0.0436   Epoch: 6   Global Step: 113460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:05:59,966-Speed 9659.29 samples/sec   Loss 6.8311   LearningRate 0.0436   Epoch: 6   Global Step: 113470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:01,014-Speed 9785.31 samples/sec   Loss 6.7965   LearningRate 0.0436   Epoch: 6   Global Step: 113480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:02,079-Speed 9616.38 samples/sec   Loss 6.7541   LearningRate 0.0436   Epoch: 6   Global Step: 113490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:03,138-Speed 9673.73 samples/sec   Loss 6.8375   LearningRate 0.0436   Epoch: 6   Global Step: 113500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:04,226-Speed 9418.36 samples/sec   Loss 6.8163   LearningRate 0.0436   Epoch: 6   Global Step: 113510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:05,300-Speed 9536.97 samples/sec   Loss 6.8234   LearningRate 0.0436   Epoch: 6   Global Step: 113520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:06,382-Speed 9471.99 samples/sec   Loss 6.8283   LearningRate 0.0435   Epoch: 6   Global Step: 113530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:07,449-Speed 9597.97 samples/sec   Loss 6.7667   LearningRate 0.0435   Epoch: 6   Global Step: 113540   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:06:08,536-Speed 9433.62 samples/sec   Loss 6.9030   LearningRate 0.0435   Epoch: 6   Global Step: 113550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:09,679-Speed 8964.64 samples/sec   Loss 6.7968   LearningRate 0.0435   Epoch: 6   Global Step: 113560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:10,789-Speed 9232.11 samples/sec   Loss 6.8331   LearningRate 0.0435   Epoch: 6   Global Step: 113570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:11,873-Speed 9448.54 samples/sec   Loss 6.8697   LearningRate 0.0435   Epoch: 6   Global Step: 113580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:12,974-Speed 9312.92 samples/sec   Loss 6.8072   LearningRate 0.0435   Epoch: 6   Global Step: 113590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:14,059-Speed 9439.56 samples/sec   Loss 6.8128   LearningRate 0.0435   Epoch: 6   Global Step: 113600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:15,093-Speed 9910.21 samples/sec   Loss 6.8948   LearningRate 0.0435   Epoch: 6   Global Step: 113610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:16,123-Speed 9939.99 samples/sec   Loss 6.8874   LearningRate 0.0435   Epoch: 6   Global Step: 113620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:17,176-Speed 9732.54 samples/sec   Loss 6.7947   LearningRate 0.0435   Epoch: 6   Global Step: 113630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:18,223-Speed 9787.94 samples/sec   Loss 6.8225   LearningRate 0.0435   Epoch: 6   Global Step: 113640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:19,327-Speed 9280.72 samples/sec   Loss 6.8263   LearningRate 0.0435   Epoch: 6   Global Step: 113650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:20,437-Speed 9234.43 samples/sec   Loss 6.8228   LearningRate 0.0435   Epoch: 6   Global Step: 113660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:21,500-Speed 9634.11 samples/sec   Loss 6.8152   LearningRate 0.0435   Epoch: 6   Global Step: 113670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:22,568-Speed 9593.35 samples/sec   Loss 6.8019   LearningRate 0.0435   Epoch: 6   Global Step: 113680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:23,624-Speed 9698.78 samples/sec   Loss 6.8817   LearningRate 0.0435   Epoch: 6   Global Step: 113690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:24,688-Speed 9631.79 samples/sec   Loss 6.7638   LearningRate 0.0435   Epoch: 6   Global Step: 113700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:25,753-Speed 9620.03 samples/sec   Loss 6.9047   LearningRate 0.0435   Epoch: 6   Global Step: 113710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:26,817-Speed 9633.78 samples/sec   Loss 6.8736   LearningRate 0.0435   Epoch: 6   Global Step: 113720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:27,901-Speed 9452.36 samples/sec   Loss 6.7726   LearningRate 0.0435   Epoch: 6   Global Step: 113730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:29,035-Speed 9038.82 samples/sec   Loss 6.7801   LearningRate 0.0435   Epoch: 6   Global Step: 113740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:30,102-Speed 9599.36 samples/sec   Loss 6.8822   LearningRate 0.0435   Epoch: 6   Global Step: 113750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:31,171-Speed 9593.92 samples/sec   Loss 6.7772   LearningRate 0.0435   Epoch: 6   Global Step: 113760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:32,354-Speed 8653.98 samples/sec   Loss 6.9146   LearningRate 0.0435   Epoch: 6   Global Step: 113770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:33,411-Speed 9696.18 samples/sec   Loss 6.8935   LearningRate 0.0434   Epoch: 6   Global Step: 113780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:34,431-Speed 10050.69 samples/sec   Loss 6.8482   LearningRate 0.0434   Epoch: 6   Global Step: 113790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:35,501-Speed 9570.62 samples/sec   Loss 6.8577   LearningRate 0.0434   Epoch: 6   Global Step: 113800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:36,611-Speed 9226.90 samples/sec   Loss 6.9195   LearningRate 0.0434   Epoch: 6   Global Step: 113810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:37,692-Speed 9483.47 samples/sec   Loss 6.8325   LearningRate 0.0434   Epoch: 6   Global Step: 113820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:38,826-Speed 9032.32 samples/sec   Loss 6.9460   LearningRate 0.0434   Epoch: 6   Global Step: 113830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:39,906-Speed 9483.30 samples/sec   Loss 6.7879   LearningRate 0.0434   Epoch: 6   Global Step: 113840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:41,008-Speed 9303.57 samples/sec   Loss 6.9016   LearningRate 0.0434   Epoch: 6   Global Step: 113850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:42,105-Speed 9338.50 samples/sec   Loss 6.8857   LearningRate 0.0434   Epoch: 6   Global Step: 113860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:43,208-Speed 9288.19 samples/sec   Loss 6.7898   LearningRate 0.0434   Epoch: 6   Global Step: 113870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:44,334-Speed 9100.75 samples/sec   Loss 6.9211   LearningRate 0.0434   Epoch: 6   Global Step: 113880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:45,406-Speed 9557.58 samples/sec   Loss 6.9295   LearningRate 0.0434   Epoch: 6   Global Step: 113890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:46,498-Speed 9388.27 samples/sec   Loss 6.8283   LearningRate 0.0434   Epoch: 6   Global Step: 113900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:47,538-Speed 9845.97 samples/sec   Loss 6.9163   LearningRate 0.0434   Epoch: 6   Global Step: 113910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:48,634-Speed 9354.67 samples/sec   Loss 6.8433   LearningRate 0.0434   Epoch: 6   Global Step: 113920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:49,727-Speed 9374.45 samples/sec   Loss 6.8531   LearningRate 0.0434   Epoch: 6   Global Step: 113930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:50,822-Speed 9353.61 samples/sec   Loss 6.8794   LearningRate 0.0434   Epoch: 6   Global Step: 113940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:51,921-Speed 9323.56 samples/sec   Loss 6.7360   LearningRate 0.0434   Epoch: 6   Global Step: 113950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:52,989-Speed 9592.86 samples/sec   Loss 6.8186   LearningRate 0.0434   Epoch: 6   Global Step: 113960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:54,078-Speed 9404.30 samples/sec   Loss 6.7877   LearningRate 0.0434   Epoch: 6   Global Step: 113970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:06:55,154-Speed 9527.13 samples/sec   Loss 6.7925   LearningRate 0.0434   Epoch: 6   Global Step: 113980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:56,224-Speed 9573.59 samples/sec   Loss 6.8506   LearningRate 0.0434   Epoch: 6   Global Step: 113990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:06:57,287-Speed 9641.68 samples/sec   Loss 6.7944   LearningRate 0.0434   Epoch: 6   Global Step: 114000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:07:19,335-[lfw][114000]XNorm: 11.049207
Training: 2022-04-11 16:07:19,336-[lfw][114000]Accuracy-Flip: 0.99600+-0.00260
Training: 2022-04-11 16:07:19,337-[lfw][114000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:07:44,774-[cfp_fp][114000]XNorm: 9.426555
Training: 2022-04-11 16:07:44,775-[cfp_fp][114000]Accuracy-Flip: 0.96071+-0.00978
Training: 2022-04-11 16:07:44,775-[cfp_fp][114000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:08:06,736-[agedb_30][114000]XNorm: 10.742994
Training: 2022-04-11 16:08:06,737-[agedb_30][114000]Accuracy-Flip: 0.96217+-0.01019
Training: 2022-04-11 16:08:06,737-[agedb_30][114000]Accuracy-Highest: 0.96483
Training: 2022-04-11 16:08:07,815-Speed 145.19 samples/sec   Loss 6.9462   LearningRate 0.0434   Epoch: 6   Global Step: 114010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:08,869-Speed 9727.13 samples/sec   Loss 6.9183   LearningRate 0.0434   Epoch: 6   Global Step: 114020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:09,929-Speed 9662.77 samples/sec   Loss 6.8832   LearningRate 0.0434   Epoch: 6   Global Step: 114030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:11,000-Speed 9568.92 samples/sec   Loss 6.7718   LearningRate 0.0433   Epoch: 6   Global Step: 114040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:08:12,127-Speed 9096.65 samples/sec   Loss 6.9774   LearningRate 0.0433   Epoch: 6   Global Step: 114050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:08:13,180-Speed 9721.81 samples/sec   Loss 6.8830   LearningRate 0.0433   Epoch: 6   Global Step: 114060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:08:14,259-Speed 9500.86 samples/sec   Loss 6.8161   LearningRate 0.0433   Epoch: 6   Global Step: 114070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:08:15,340-Speed 9481.94 samples/sec   Loss 6.8143   LearningRate 0.0433   Epoch: 6   Global Step: 114080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:08:16,422-Speed 9465.00 samples/sec   Loss 7.0218   LearningRate 0.0433   Epoch: 6   Global Step: 114090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:08:17,499-Speed 9519.47 samples/sec   Loss 6.9187   LearningRate 0.0433   Epoch: 6   Global Step: 114100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:08:18,560-Speed 9658.92 samples/sec   Loss 6.8028   LearningRate 0.0433   Epoch: 6   Global Step: 114110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:08:19,622-Speed 9641.95 samples/sec   Loss 6.8369   LearningRate 0.0433   Epoch: 6   Global Step: 114120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:08:20,666-Speed 9818.48 samples/sec   Loss 6.7271   LearningRate 0.0433   Epoch: 6   Global Step: 114130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:08:21,739-Speed 9550.97 samples/sec   Loss 6.7614   LearningRate 0.0433   Epoch: 6   Global Step: 114140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:22,788-Speed 9766.49 samples/sec   Loss 6.8543   LearningRate 0.0433   Epoch: 6   Global Step: 114150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:23,833-Speed 9803.78 samples/sec   Loss 6.8941   LearningRate 0.0433   Epoch: 6   Global Step: 114160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:24,951-Speed 9165.76 samples/sec   Loss 6.7954   LearningRate 0.0433   Epoch: 6   Global Step: 114170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:26,018-Speed 9600.37 samples/sec   Loss 6.7496   LearningRate 0.0433   Epoch: 6   Global Step: 114180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:27,090-Speed 9557.99 samples/sec   Loss 6.7684   LearningRate 0.0433   Epoch: 6   Global Step: 114190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:28,175-Speed 9445.54 samples/sec   Loss 6.8029   LearningRate 0.0433   Epoch: 6   Global Step: 114200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:29,236-Speed 9655.04 samples/sec   Loss 6.8080   LearningRate 0.0433   Epoch: 6   Global Step: 114210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:30,278-Speed 9840.14 samples/sec   Loss 6.8040   LearningRate 0.0433   Epoch: 6   Global Step: 114220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:31,334-Speed 9697.20 samples/sec   Loss 6.7976   LearningRate 0.0433   Epoch: 6   Global Step: 114230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:32,369-Speed 9903.78 samples/sec   Loss 6.7872   LearningRate 0.0433   Epoch: 6   Global Step: 114240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:33,465-Speed 9349.48 samples/sec   Loss 6.8554   LearningRate 0.0433   Epoch: 6   Global Step: 114250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:34,583-Speed 9164.36 samples/sec   Loss 6.8822   LearningRate 0.0433   Epoch: 6   Global Step: 114260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:35,714-Speed 9056.78 samples/sec   Loss 6.9090   LearningRate 0.0433   Epoch: 6   Global Step: 114270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:36,834-Speed 9145.81 samples/sec   Loss 6.7741   LearningRate 0.0433   Epoch: 6   Global Step: 114280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:37,925-Speed 9396.26 samples/sec   Loss 6.7206   LearningRate 0.0432   Epoch: 6   Global Step: 114290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:39,003-Speed 9500.52 samples/sec   Loss 6.7960   LearningRate 0.0432   Epoch: 6   Global Step: 114300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:40,108-Speed 9271.27 samples/sec   Loss 6.9794   LearningRate 0.0432   Epoch: 6   Global Step: 114310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:41,195-Speed 9430.17 samples/sec   Loss 6.7242   LearningRate 0.0432   Epoch: 6   Global Step: 114320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:42,329-Speed 9034.02 samples/sec   Loss 6.8284   LearningRate 0.0432   Epoch: 6   Global Step: 114330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:43,365-Speed 9896.16 samples/sec   Loss 6.7879   LearningRate 0.0432   Epoch: 6   Global Step: 114340   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:08:44,436-Speed 9569.54 samples/sec   Loss 6.8012   LearningRate 0.0432   Epoch: 6   Global Step: 114350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:45,511-Speed 9530.31 samples/sec   Loss 6.7594   LearningRate 0.0432   Epoch: 6   Global Step: 114360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:46,605-Speed 9357.89 samples/sec   Loss 6.8420   LearningRate 0.0432   Epoch: 6   Global Step: 114370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:47,733-Speed 9084.82 samples/sec   Loss 6.8703   LearningRate 0.0432   Epoch: 6   Global Step: 114380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:48,796-Speed 9641.11 samples/sec   Loss 6.8474   LearningRate 0.0432   Epoch: 6   Global Step: 114390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:49,874-Speed 9500.65 samples/sec   Loss 6.8054   LearningRate 0.0432   Epoch: 6   Global Step: 114400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:50,949-Speed 9534.99 samples/sec   Loss 6.8133   LearningRate 0.0432   Epoch: 6   Global Step: 114410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:52,029-Speed 9482.10 samples/sec   Loss 6.8045   LearningRate 0.0432   Epoch: 6   Global Step: 114420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:53,083-Speed 9725.16 samples/sec   Loss 6.9039   LearningRate 0.0432   Epoch: 6   Global Step: 114430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:54,177-Speed 9371.20 samples/sec   Loss 6.8645   LearningRate 0.0432   Epoch: 6   Global Step: 114440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:55,241-Speed 9633.05 samples/sec   Loss 6.9066   LearningRate 0.0432   Epoch: 6   Global Step: 114450   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:08:56,314-Speed 9552.32 samples/sec   Loss 6.7391   LearningRate 0.0432   Epoch: 6   Global Step: 114460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:57,383-Speed 9582.76 samples/sec   Loss 6.8543   LearningRate 0.0432   Epoch: 6   Global Step: 114470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:58,486-Speed 9293.47 samples/sec   Loss 6.7097   LearningRate 0.0432   Epoch: 6   Global Step: 114480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:08:59,541-Speed 9709.50 samples/sec   Loss 6.8443   LearningRate 0.0432   Epoch: 6   Global Step: 114490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:00,564-Speed 10015.32 samples/sec   Loss 6.9483   LearningRate 0.0432   Epoch: 6   Global Step: 114500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:01,625-Speed 9654.83 samples/sec   Loss 6.8726   LearningRate 0.0432   Epoch: 6   Global Step: 114510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:02,723-Speed 9338.39 samples/sec   Loss 6.7328   LearningRate 0.0432   Epoch: 6   Global Step: 114520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:03,761-Speed 9864.10 samples/sec   Loss 6.7743   LearningRate 0.0432   Epoch: 6   Global Step: 114530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:04,826-Speed 9624.88 samples/sec   Loss 6.7984   LearningRate 0.0431   Epoch: 6   Global Step: 114540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:05,856-Speed 9943.85 samples/sec   Loss 6.8302   LearningRate 0.0431   Epoch: 6   Global Step: 114550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:06,953-Speed 9347.30 samples/sec   Loss 6.7051   LearningRate 0.0431   Epoch: 6   Global Step: 114560   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:09:08,049-Speed 9348.62 samples/sec   Loss 6.7488   LearningRate 0.0431   Epoch: 6   Global Step: 114570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:09,159-Speed 9230.35 samples/sec   Loss 6.7684   LearningRate 0.0431   Epoch: 6   Global Step: 114580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:10,225-Speed 9603.86 samples/sec   Loss 6.7623   LearningRate 0.0431   Epoch: 6   Global Step: 114590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:11,298-Speed 9552.64 samples/sec   Loss 6.7194   LearningRate 0.0431   Epoch: 6   Global Step: 114600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:12,425-Speed 9092.14 samples/sec   Loss 6.8962   LearningRate 0.0431   Epoch: 6   Global Step: 114610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:13,553-Speed 9080.20 samples/sec   Loss 6.7582   LearningRate 0.0431   Epoch: 6   Global Step: 114620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:14,671-Speed 9170.73 samples/sec   Loss 6.9024   LearningRate 0.0431   Epoch: 6   Global Step: 114630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:15,754-Speed 9464.83 samples/sec   Loss 6.8303   LearningRate 0.0431   Epoch: 6   Global Step: 114640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:09:16,840-Speed 9430.22 samples/sec   Loss 6.9189   LearningRate 0.0431   Epoch: 6   Global Step: 114650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:09:17,921-Speed 9481.82 samples/sec   Loss 6.8294   LearningRate 0.0431   Epoch: 6   Global Step: 114660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:09:18,956-Speed 9891.75 samples/sec   Loss 6.9234   LearningRate 0.0431   Epoch: 6   Global Step: 114670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:09:20,031-Speed 9537.42 samples/sec   Loss 6.8185   LearningRate 0.0431   Epoch: 6   Global Step: 114680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:09:21,087-Speed 9696.47 samples/sec   Loss 6.7062   LearningRate 0.0431   Epoch: 6   Global Step: 114690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:09:22,130-Speed 9821.46 samples/sec   Loss 6.8912   LearningRate 0.0431   Epoch: 6   Global Step: 114700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:09:23,273-Speed 8963.35 samples/sec   Loss 6.8557   LearningRate 0.0431   Epoch: 6   Global Step: 114710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:09:24,314-Speed 9853.62 samples/sec   Loss 6.8634   LearningRate 0.0431   Epoch: 6   Global Step: 114720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:09:25,378-Speed 9624.79 samples/sec   Loss 6.7731   LearningRate 0.0431   Epoch: 6   Global Step: 114730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:09:26,427-Speed 9763.99 samples/sec   Loss 6.7786   LearningRate 0.0431   Epoch: 6   Global Step: 114740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:27,528-Speed 9310.92 samples/sec   Loss 6.9691   LearningRate 0.0431   Epoch: 6   Global Step: 114750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:28,625-Speed 9338.92 samples/sec   Loss 6.7982   LearningRate 0.0431   Epoch: 6   Global Step: 114760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:29,716-Speed 9392.96 samples/sec   Loss 6.8003   LearningRate 0.0431   Epoch: 6   Global Step: 114770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:30,782-Speed 9612.06 samples/sec   Loss 6.8139   LearningRate 0.0431   Epoch: 6   Global Step: 114780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:31,866-Speed 9449.86 samples/sec   Loss 6.8156   LearningRate 0.0431   Epoch: 6   Global Step: 114790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:33,020-Speed 8882.27 samples/sec   Loss 6.8522   LearningRate 0.0430   Epoch: 6   Global Step: 114800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:34,117-Speed 9338.68 samples/sec   Loss 6.9242   LearningRate 0.0430   Epoch: 6   Global Step: 114810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:35,157-Speed 9854.45 samples/sec   Loss 6.8680   LearningRate 0.0430   Epoch: 6   Global Step: 114820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:36,249-Speed 9379.44 samples/sec   Loss 6.8603   LearningRate 0.0430   Epoch: 6   Global Step: 114830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:37,342-Speed 9376.88 samples/sec   Loss 6.7872   LearningRate 0.0430   Epoch: 6   Global Step: 114840   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:09:38,440-Speed 9332.66 samples/sec   Loss 6.8249   LearningRate 0.0430   Epoch: 6   Global Step: 114850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:39,526-Speed 9436.59 samples/sec   Loss 6.8222   LearningRate 0.0430   Epoch: 6   Global Step: 114860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:40,594-Speed 9588.76 samples/sec   Loss 6.7536   LearningRate 0.0430   Epoch: 6   Global Step: 114870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:41,674-Speed 9486.89 samples/sec   Loss 6.9170   LearningRate 0.0430   Epoch: 6   Global Step: 114880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:42,731-Speed 9697.27 samples/sec   Loss 6.9111   LearningRate 0.0430   Epoch: 6   Global Step: 114890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:43,836-Speed 9278.48 samples/sec   Loss 6.7876   LearningRate 0.0430   Epoch: 6   Global Step: 114900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:44,929-Speed 9374.27 samples/sec   Loss 6.8270   LearningRate 0.0430   Epoch: 6   Global Step: 114910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:45,978-Speed 9760.78 samples/sec   Loss 6.9490   LearningRate 0.0430   Epoch: 6   Global Step: 114920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:47,085-Speed 9258.88 samples/sec   Loss 6.7741   LearningRate 0.0430   Epoch: 6   Global Step: 114930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:48,160-Speed 9527.39 samples/sec   Loss 6.8884   LearningRate 0.0430   Epoch: 6   Global Step: 114940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:49,261-Speed 9309.31 samples/sec   Loss 6.7446   LearningRate 0.0430   Epoch: 6   Global Step: 114950   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:09:50,337-Speed 9519.62 samples/sec   Loss 6.8093   LearningRate 0.0430   Epoch: 6   Global Step: 114960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:51,380-Speed 9827.82 samples/sec   Loss 6.8647   LearningRate 0.0430   Epoch: 6   Global Step: 114970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:52,462-Speed 9468.14 samples/sec   Loss 6.8217   LearningRate 0.0430   Epoch: 6   Global Step: 114980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:53,550-Speed 9418.99 samples/sec   Loss 6.7459   LearningRate 0.0430   Epoch: 6   Global Step: 114990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:54,639-Speed 9415.48 samples/sec   Loss 6.9354   LearningRate 0.0430   Epoch: 6   Global Step: 115000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:55,751-Speed 9215.77 samples/sec   Loss 6.7554   LearningRate 0.0430   Epoch: 6   Global Step: 115010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:56,818-Speed 9596.40 samples/sec   Loss 6.8512   LearningRate 0.0430   Epoch: 6   Global Step: 115020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:57,916-Speed 9334.62 samples/sec   Loss 6.8720   LearningRate 0.0430   Epoch: 6   Global Step: 115030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:09:58,970-Speed 9724.04 samples/sec   Loss 6.9151   LearningRate 0.0430   Epoch: 6   Global Step: 115040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:00,066-Speed 9344.40 samples/sec   Loss 6.6992   LearningRate 0.0429   Epoch: 6   Global Step: 115050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:01,147-Speed 9479.80 samples/sec   Loss 6.8705   LearningRate 0.0429   Epoch: 6   Global Step: 115060   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:10:02,256-Speed 9244.73 samples/sec   Loss 6.8015   LearningRate 0.0429   Epoch: 6   Global Step: 115070   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:10:03,345-Speed 9404.73 samples/sec   Loss 6.8478   LearningRate 0.0429   Epoch: 6   Global Step: 115080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:04,415-Speed 9573.96 samples/sec   Loss 6.7789   LearningRate 0.0429   Epoch: 6   Global Step: 115090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:05,451-Speed 9888.22 samples/sec   Loss 6.8797   LearningRate 0.0429   Epoch: 6   Global Step: 115100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:06,545-Speed 9368.29 samples/sec   Loss 6.8074   LearningRate 0.0429   Epoch: 6   Global Step: 115110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:07,618-Speed 9546.58 samples/sec   Loss 6.9552   LearningRate 0.0429   Epoch: 6   Global Step: 115120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:08,704-Speed 9441.09 samples/sec   Loss 6.8958   LearningRate 0.0429   Epoch: 6   Global Step: 115130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:09,754-Speed 9756.17 samples/sec   Loss 6.8729   LearningRate 0.0429   Epoch: 6   Global Step: 115140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:10,841-Speed 9424.07 samples/sec   Loss 6.9175   LearningRate 0.0429   Epoch: 6   Global Step: 115150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:11,887-Speed 9802.14 samples/sec   Loss 6.7839   LearningRate 0.0429   Epoch: 6   Global Step: 115160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:12,947-Speed 9666.18 samples/sec   Loss 6.8077   LearningRate 0.0429   Epoch: 6   Global Step: 115170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:14,015-Speed 9594.99 samples/sec   Loss 6.8501   LearningRate 0.0429   Epoch: 6   Global Step: 115180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:15,085-Speed 9573.79 samples/sec   Loss 6.9267   LearningRate 0.0429   Epoch: 6   Global Step: 115190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:16,140-Speed 9707.46 samples/sec   Loss 6.8166   LearningRate 0.0429   Epoch: 6   Global Step: 115200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:17,233-Speed 9373.83 samples/sec   Loss 6.9076   LearningRate 0.0429   Epoch: 6   Global Step: 115210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:18,327-Speed 9371.06 samples/sec   Loss 6.7706   LearningRate 0.0429   Epoch: 6   Global Step: 115220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:19,439-Speed 9212.55 samples/sec   Loss 6.9051   LearningRate 0.0429   Epoch: 6   Global Step: 115230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:20,549-Speed 9226.65 samples/sec   Loss 6.8433   LearningRate 0.0429   Epoch: 6   Global Step: 115240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:21,643-Speed 9370.74 samples/sec   Loss 6.8427   LearningRate 0.0429   Epoch: 6   Global Step: 115250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:22,715-Speed 9556.92 samples/sec   Loss 6.7806   LearningRate 0.0429   Epoch: 6   Global Step: 115260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:23,806-Speed 9391.42 samples/sec   Loss 6.7312   LearningRate 0.0429   Epoch: 6   Global Step: 115270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:24,912-Speed 9265.75 samples/sec   Loss 6.9263   LearningRate 0.0429   Epoch: 6   Global Step: 115280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:26,013-Speed 9305.89 samples/sec   Loss 6.8553   LearningRate 0.0429   Epoch: 6   Global Step: 115290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:27,119-Speed 9263.17 samples/sec   Loss 6.8585   LearningRate 0.0429   Epoch: 6   Global Step: 115300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:28,216-Speed 9340.33 samples/sec   Loss 6.8225   LearningRate 0.0428   Epoch: 6   Global Step: 115310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:29,280-Speed 9637.27 samples/sec   Loss 6.7790   LearningRate 0.0428   Epoch: 6   Global Step: 115320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:30,359-Speed 9499.10 samples/sec   Loss 6.8635   LearningRate 0.0428   Epoch: 6   Global Step: 115330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:31,429-Speed 9570.36 samples/sec   Loss 6.7168   LearningRate 0.0428   Epoch: 6   Global Step: 115340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:32,497-Speed 9596.84 samples/sec   Loss 6.8169   LearningRate 0.0428   Epoch: 6   Global Step: 115350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:33,603-Speed 9259.87 samples/sec   Loss 6.7033   LearningRate 0.0428   Epoch: 6   Global Step: 115360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:34,701-Speed 9335.07 samples/sec   Loss 6.6931   LearningRate 0.0428   Epoch: 6   Global Step: 115370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:35,773-Speed 9557.24 samples/sec   Loss 6.7402   LearningRate 0.0428   Epoch: 6   Global Step: 115380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:36,810-Speed 9874.23 samples/sec   Loss 6.8411   LearningRate 0.0428   Epoch: 6   Global Step: 115390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:37,905-Speed 9355.92 samples/sec   Loss 6.8366   LearningRate 0.0428   Epoch: 6   Global Step: 115400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:38,982-Speed 9518.41 samples/sec   Loss 6.7936   LearningRate 0.0428   Epoch: 6   Global Step: 115410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:40,038-Speed 9698.47 samples/sec   Loss 6.7845   LearningRate 0.0428   Epoch: 6   Global Step: 115420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:41,115-Speed 9511.71 samples/sec   Loss 6.7936   LearningRate 0.0428   Epoch: 6   Global Step: 115430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:42,212-Speed 9346.10 samples/sec   Loss 6.9250   LearningRate 0.0428   Epoch: 6   Global Step: 115440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:43,305-Speed 9376.18 samples/sec   Loss 6.8850   LearningRate 0.0428   Epoch: 6   Global Step: 115450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:44,405-Speed 9313.06 samples/sec   Loss 6.9016   LearningRate 0.0428   Epoch: 6   Global Step: 115460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:45,498-Speed 9369.77 samples/sec   Loss 6.7718   LearningRate 0.0428   Epoch: 6   Global Step: 115470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:46,568-Speed 9574.49 samples/sec   Loss 6.7564   LearningRate 0.0428   Epoch: 6   Global Step: 115480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:47,659-Speed 9395.83 samples/sec   Loss 6.7364   LearningRate 0.0428   Epoch: 6   Global Step: 115490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:48,740-Speed 9480.53 samples/sec   Loss 6.7965   LearningRate 0.0428   Epoch: 6   Global Step: 115500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:49,786-Speed 9793.03 samples/sec   Loss 6.7757   LearningRate 0.0428   Epoch: 6   Global Step: 115510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:10:50,863-Speed 9517.87 samples/sec   Loss 6.8439   LearningRate 0.0428   Epoch: 6   Global Step: 115520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:51,931-Speed 9592.90 samples/sec   Loss 6.8330   LearningRate 0.0428   Epoch: 6   Global Step: 115530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:53,018-Speed 9419.23 samples/sec   Loss 6.7661   LearningRate 0.0428   Epoch: 6   Global Step: 115540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:54,118-Speed 9321.20 samples/sec   Loss 6.7741   LearningRate 0.0428   Epoch: 6   Global Step: 115550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:55,167-Speed 9768.52 samples/sec   Loss 6.7646   LearningRate 0.0427   Epoch: 6   Global Step: 115560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:56,278-Speed 9217.05 samples/sec   Loss 6.7555   LearningRate 0.0427   Epoch: 6   Global Step: 115570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:57,363-Speed 9447.03 samples/sec   Loss 6.7469   LearningRate 0.0427   Epoch: 6   Global Step: 115580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:58,452-Speed 9409.42 samples/sec   Loss 6.6978   LearningRate 0.0427   Epoch: 6   Global Step: 115590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:10:59,571-Speed 9149.82 samples/sec   Loss 6.6912   LearningRate 0.0427   Epoch: 6   Global Step: 115600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:00,687-Speed 9190.99 samples/sec   Loss 6.8460   LearningRate 0.0427   Epoch: 6   Global Step: 115610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:01,754-Speed 9600.05 samples/sec   Loss 6.9192   LearningRate 0.0427   Epoch: 6   Global Step: 115620   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:11:02,815-Speed 9651.20 samples/sec   Loss 6.7971   LearningRate 0.0427   Epoch: 6   Global Step: 115630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:03,889-Speed 9540.08 samples/sec   Loss 6.8499   LearningRate 0.0427   Epoch: 6   Global Step: 115640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:04,977-Speed 9419.23 samples/sec   Loss 6.8341   LearningRate 0.0427   Epoch: 6   Global Step: 115650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:06,035-Speed 9687.64 samples/sec   Loss 6.8897   LearningRate 0.0427   Epoch: 6   Global Step: 115660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:07,128-Speed 9370.88 samples/sec   Loss 6.6629   LearningRate 0.0427   Epoch: 6   Global Step: 115670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:08,214-Speed 9440.22 samples/sec   Loss 6.7278   LearningRate 0.0427   Epoch: 6   Global Step: 115680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:09,316-Speed 9295.48 samples/sec   Loss 6.8444   LearningRate 0.0427   Epoch: 6   Global Step: 115690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:10,374-Speed 9678.98 samples/sec   Loss 6.8390   LearningRate 0.0427   Epoch: 6   Global Step: 115700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:11,494-Speed 9154.89 samples/sec   Loss 6.8464   LearningRate 0.0427   Epoch: 6   Global Step: 115710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:12,577-Speed 9461.93 samples/sec   Loss 6.9165   LearningRate 0.0427   Epoch: 6   Global Step: 115720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:13,625-Speed 9782.60 samples/sec   Loss 6.7373   LearningRate 0.0427   Epoch: 6   Global Step: 115730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:14,691-Speed 9604.62 samples/sec   Loss 6.7264   LearningRate 0.0427   Epoch: 6   Global Step: 115740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:15,778-Speed 9429.06 samples/sec   Loss 6.8681   LearningRate 0.0427   Epoch: 6   Global Step: 115750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:16,847-Speed 9584.02 samples/sec   Loss 6.8914   LearningRate 0.0427   Epoch: 6   Global Step: 115760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:17,932-Speed 9442.05 samples/sec   Loss 6.8369   LearningRate 0.0427   Epoch: 6   Global Step: 115770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:19,014-Speed 9471.98 samples/sec   Loss 6.8252   LearningRate 0.0427   Epoch: 6   Global Step: 115780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:20,067-Speed 9732.03 samples/sec   Loss 6.9060   LearningRate 0.0427   Epoch: 6   Global Step: 115790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:21,131-Speed 9628.49 samples/sec   Loss 6.9062   LearningRate 0.0427   Epoch: 6   Global Step: 115800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:22,204-Speed 9551.14 samples/sec   Loss 6.8881   LearningRate 0.0427   Epoch: 6   Global Step: 115810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:23,310-Speed 9257.35 samples/sec   Loss 6.7932   LearningRate 0.0426   Epoch: 6   Global Step: 115820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:24,394-Speed 9459.75 samples/sec   Loss 6.7137   LearningRate 0.0426   Epoch: 6   Global Step: 115830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:25,512-Speed 9161.81 samples/sec   Loss 6.7871   LearningRate 0.0426   Epoch: 6   Global Step: 115840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:26,603-Speed 9398.20 samples/sec   Loss 6.7238   LearningRate 0.0426   Epoch: 6   Global Step: 115850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:27,692-Speed 9402.89 samples/sec   Loss 6.7144   LearningRate 0.0426   Epoch: 6   Global Step: 115860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:28,744-Speed 9743.83 samples/sec   Loss 6.8919   LearningRate 0.0426   Epoch: 6   Global Step: 115870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:29,799-Speed 9710.15 samples/sec   Loss 6.8565   LearningRate 0.0426   Epoch: 6   Global Step: 115880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:30,849-Speed 9759.98 samples/sec   Loss 6.7709   LearningRate 0.0426   Epoch: 6   Global Step: 115890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:31,941-Speed 9385.67 samples/sec   Loss 6.7633   LearningRate 0.0426   Epoch: 6   Global Step: 115900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:33,027-Speed 9432.21 samples/sec   Loss 6.7334   LearningRate 0.0426   Epoch: 6   Global Step: 115910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:34,114-Speed 9423.40 samples/sec   Loss 6.8907   LearningRate 0.0426   Epoch: 6   Global Step: 115920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:35,154-Speed 9853.62 samples/sec   Loss 6.8744   LearningRate 0.0426   Epoch: 6   Global Step: 115930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:36,224-Speed 9571.88 samples/sec   Loss 6.8606   LearningRate 0.0426   Epoch: 6   Global Step: 115940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:37,278-Speed 9721.75 samples/sec   Loss 6.7090   LearningRate 0.0426   Epoch: 6   Global Step: 115950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:38,357-Speed 9501.31 samples/sec   Loss 6.7716   LearningRate 0.0426   Epoch: 6   Global Step: 115960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:39,491-Speed 9034.88 samples/sec   Loss 6.8229   LearningRate 0.0426   Epoch: 6   Global Step: 115970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:11:40,569-Speed 9502.55 samples/sec   Loss 6.8684   LearningRate 0.0426   Epoch: 6   Global Step: 115980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:41,661-Speed 9380.31 samples/sec   Loss 6.8963   LearningRate 0.0426   Epoch: 6   Global Step: 115990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:11:42,780-Speed 9160.94 samples/sec   Loss 6.7564   LearningRate 0.0426   Epoch: 6   Global Step: 116000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:12:04,860-[lfw][116000]XNorm: 10.876187
Training: 2022-04-11 16:12:04,861-[lfw][116000]Accuracy-Flip: 0.99617+-0.00259
Training: 2022-04-11 16:12:04,861-[lfw][116000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:12:30,405-[cfp_fp][116000]XNorm: 9.313619
Training: 2022-04-11 16:12:30,406-[cfp_fp][116000]Accuracy-Flip: 0.95600+-0.00733
Training: 2022-04-11 16:12:30,406-[cfp_fp][116000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:12:52,542-[agedb_30][116000]XNorm: 10.510777
Training: 2022-04-11 16:12:52,542-[agedb_30][116000]Accuracy-Flip: 0.96383+-0.00943
Training: 2022-04-11 16:12:52,543-[agedb_30][116000]Accuracy-Highest: 0.96483
Training: 2022-04-11 16:12:53,663-Speed 144.46 samples/sec   Loss 6.8444   LearningRate 0.0426   Epoch: 6   Global Step: 116010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:12:54,755-Speed 9392.65 samples/sec   Loss 6.8382   LearningRate 0.0426   Epoch: 6   Global Step: 116020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:12:55,826-Speed 9562.53 samples/sec   Loss 6.8632   LearningRate 0.0426   Epoch: 6   Global Step: 116030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:12:56,869-Speed 9827.05 samples/sec   Loss 6.8440   LearningRate 0.0426   Epoch: 6   Global Step: 116040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:12:57,931-Speed 9645.01 samples/sec   Loss 6.8923   LearningRate 0.0426   Epoch: 6   Global Step: 116050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:12:59,028-Speed 9341.83 samples/sec   Loss 6.7904   LearningRate 0.0426   Epoch: 6   Global Step: 116060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:00,116-Speed 9416.75 samples/sec   Loss 6.8433   LearningRate 0.0425   Epoch: 6   Global Step: 116070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:01,196-Speed 9493.99 samples/sec   Loss 6.7977   LearningRate 0.0425   Epoch: 6   Global Step: 116080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:02,277-Speed 9473.71 samples/sec   Loss 6.7539   LearningRate 0.0425   Epoch: 6   Global Step: 116090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:03,363-Speed 9434.98 samples/sec   Loss 6.7922   LearningRate 0.0425   Epoch: 6   Global Step: 116100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:04,412-Speed 9762.45 samples/sec   Loss 6.8160   LearningRate 0.0425   Epoch: 6   Global Step: 116110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:05,462-Speed 9763.84 samples/sec   Loss 6.8039   LearningRate 0.0425   Epoch: 6   Global Step: 116120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:06,522-Speed 9665.60 samples/sec   Loss 6.7647   LearningRate 0.0425   Epoch: 6   Global Step: 116130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:07,632-Speed 9228.88 samples/sec   Loss 6.7579   LearningRate 0.0425   Epoch: 6   Global Step: 116140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:08,785-Speed 8883.24 samples/sec   Loss 6.8356   LearningRate 0.0425   Epoch: 6   Global Step: 116150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:09,895-Speed 9230.22 samples/sec   Loss 6.6386   LearningRate 0.0425   Epoch: 6   Global Step: 116160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:10,962-Speed 9602.10 samples/sec   Loss 6.8029   LearningRate 0.0425   Epoch: 6   Global Step: 116170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:12,059-Speed 9338.85 samples/sec   Loss 6.8606   LearningRate 0.0425   Epoch: 6   Global Step: 116180   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:13:13,122-Speed 9642.41 samples/sec   Loss 6.7721   LearningRate 0.0425   Epoch: 6   Global Step: 116190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:14,160-Speed 9871.88 samples/sec   Loss 6.8503   LearningRate 0.0425   Epoch: 6   Global Step: 116200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:15,212-Speed 9741.05 samples/sec   Loss 6.9138   LearningRate 0.0425   Epoch: 6   Global Step: 116210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:16,308-Speed 9347.64 samples/sec   Loss 6.7850   LearningRate 0.0425   Epoch: 6   Global Step: 116220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:17,377-Speed 9587.79 samples/sec   Loss 6.7372   LearningRate 0.0425   Epoch: 6   Global Step: 116230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:18,446-Speed 9577.35 samples/sec   Loss 6.7639   LearningRate 0.0425   Epoch: 6   Global Step: 116240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:19,566-Speed 9153.38 samples/sec   Loss 6.6945   LearningRate 0.0425   Epoch: 6   Global Step: 116250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:20,639-Speed 9545.99 samples/sec   Loss 6.7828   LearningRate 0.0425   Epoch: 6   Global Step: 116260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:21,693-Speed 9721.77 samples/sec   Loss 6.7243   LearningRate 0.0425   Epoch: 6   Global Step: 116270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:22,830-Speed 9009.17 samples/sec   Loss 6.8677   LearningRate 0.0425   Epoch: 6   Global Step: 116280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:23,901-Speed 9569.47 samples/sec   Loss 6.7213   LearningRate 0.0425   Epoch: 6   Global Step: 116290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:24,979-Speed 9509.19 samples/sec   Loss 6.8007   LearningRate 0.0425   Epoch: 6   Global Step: 116300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:26,016-Speed 9873.99 samples/sec   Loss 6.7794   LearningRate 0.0425   Epoch: 6   Global Step: 116310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:27,086-Speed 9581.62 samples/sec   Loss 6.7108   LearningRate 0.0425   Epoch: 6   Global Step: 116320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:28,155-Speed 9579.34 samples/sec   Loss 6.7069   LearningRate 0.0424   Epoch: 6   Global Step: 116330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:29,217-Speed 9654.87 samples/sec   Loss 6.7858   LearningRate 0.0424   Epoch: 6   Global Step: 116340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:30,294-Speed 9507.71 samples/sec   Loss 6.6951   LearningRate 0.0424   Epoch: 6   Global Step: 116350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:31,395-Speed 9316.02 samples/sec   Loss 6.8288   LearningRate 0.0424   Epoch: 6   Global Step: 116360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:32,475-Speed 9480.19 samples/sec   Loss 6.8647   LearningRate 0.0424   Epoch: 6   Global Step: 116370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:33,564-Speed 9411.25 samples/sec   Loss 6.8995   LearningRate 0.0424   Epoch: 6   Global Step: 116380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:34,621-Speed 9694.87 samples/sec   Loss 6.7678   LearningRate 0.0424   Epoch: 6   Global Step: 116390   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:13:35,720-Speed 9319.35 samples/sec   Loss 6.8917   LearningRate 0.0424   Epoch: 6   Global Step: 116400   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:13:36,813-Speed 9377.93 samples/sec   Loss 6.8212   LearningRate 0.0424   Epoch: 6   Global Step: 116410   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:13:37,875-Speed 9642.99 samples/sec   Loss 6.8312   LearningRate 0.0424   Epoch: 6   Global Step: 116420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:38,953-Speed 9505.83 samples/sec   Loss 6.7818   LearningRate 0.0424   Epoch: 6   Global Step: 116430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:40,065-Speed 9211.66 samples/sec   Loss 6.7728   LearningRate 0.0424   Epoch: 6   Global Step: 116440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:41,153-Speed 9419.99 samples/sec   Loss 6.8107   LearningRate 0.0424   Epoch: 6   Global Step: 116450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:42,222-Speed 9592.73 samples/sec   Loss 6.6950   LearningRate 0.0424   Epoch: 6   Global Step: 116460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:43,306-Speed 9451.26 samples/sec   Loss 6.8388   LearningRate 0.0424   Epoch: 6   Global Step: 116470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:44,392-Speed 9445.09 samples/sec   Loss 6.8057   LearningRate 0.0424   Epoch: 6   Global Step: 116480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:45,479-Speed 9419.77 samples/sec   Loss 6.7455   LearningRate 0.0424   Epoch: 6   Global Step: 116490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:46,526-Speed 9792.74 samples/sec   Loss 6.7653   LearningRate 0.0424   Epoch: 6   Global Step: 116500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:47,560-Speed 9902.88 samples/sec   Loss 6.8762   LearningRate 0.0424   Epoch: 6   Global Step: 116510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:48,604-Speed 9820.98 samples/sec   Loss 6.7240   LearningRate 0.0424   Epoch: 6   Global Step: 116520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:49,640-Speed 9892.86 samples/sec   Loss 6.7671   LearningRate 0.0424   Epoch: 6   Global Step: 116530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:50,695-Speed 9709.54 samples/sec   Loss 6.8118   LearningRate 0.0424   Epoch: 6   Global Step: 116540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:51,767-Speed 9553.80 samples/sec   Loss 6.8849   LearningRate 0.0424   Epoch: 6   Global Step: 116550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:52,875-Speed 9245.54 samples/sec   Loss 6.7546   LearningRate 0.0424   Epoch: 6   Global Step: 116560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:53,950-Speed 9530.65 samples/sec   Loss 6.7700   LearningRate 0.0424   Epoch: 6   Global Step: 116570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:55,036-Speed 9439.73 samples/sec   Loss 6.7335   LearningRate 0.0424   Epoch: 6   Global Step: 116580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:56,124-Speed 9416.81 samples/sec   Loss 6.8869   LearningRate 0.0423   Epoch: 6   Global Step: 116590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:57,183-Speed 9675.22 samples/sec   Loss 6.9086   LearningRate 0.0423   Epoch: 6   Global Step: 116600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:13:58,285-Speed 9302.22 samples/sec   Loss 6.7956   LearningRate 0.0423   Epoch: 6   Global Step: 116610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:13:59,373-Speed 9415.48 samples/sec   Loss 6.7880   LearningRate 0.0423   Epoch: 6   Global Step: 116620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:00,448-Speed 9529.61 samples/sec   Loss 6.8139   LearningRate 0.0423   Epoch: 6   Global Step: 116630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:01,541-Speed 9380.95 samples/sec   Loss 6.8098   LearningRate 0.0423   Epoch: 6   Global Step: 116640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:02,599-Speed 9684.85 samples/sec   Loss 6.8755   LearningRate 0.0423   Epoch: 6   Global Step: 116650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:03,657-Speed 9678.23 samples/sec   Loss 6.8018   LearningRate 0.0423   Epoch: 6   Global Step: 116660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:04,720-Speed 9638.28 samples/sec   Loss 6.7296   LearningRate 0.0423   Epoch: 6   Global Step: 116670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:05,789-Speed 9589.94 samples/sec   Loss 6.8653   LearningRate 0.0423   Epoch: 6   Global Step: 116680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:06,829-Speed 9852.52 samples/sec   Loss 6.8439   LearningRate 0.0423   Epoch: 6   Global Step: 116690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:07,918-Speed 9409.70 samples/sec   Loss 6.7978   LearningRate 0.0423   Epoch: 6   Global Step: 116700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:08,975-Speed 9694.38 samples/sec   Loss 6.7868   LearningRate 0.0423   Epoch: 6   Global Step: 116710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:10,070-Speed 9353.43 samples/sec   Loss 6.7396   LearningRate 0.0423   Epoch: 6   Global Step: 116720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:11,133-Speed 9640.03 samples/sec   Loss 6.7261   LearningRate 0.0423   Epoch: 6   Global Step: 116730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:12,215-Speed 9466.58 samples/sec   Loss 6.6966   LearningRate 0.0423   Epoch: 6   Global Step: 116740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:13,306-Speed 9396.24 samples/sec   Loss 6.8243   LearningRate 0.0423   Epoch: 6   Global Step: 116750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:14,351-Speed 9804.07 samples/sec   Loss 6.8759   LearningRate 0.0423   Epoch: 6   Global Step: 116760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:15,422-Speed 9571.70 samples/sec   Loss 6.8180   LearningRate 0.0423   Epoch: 6   Global Step: 116770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:16,537-Speed 9185.83 samples/sec   Loss 6.8224   LearningRate 0.0423   Epoch: 6   Global Step: 116780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:17,598-Speed 9652.07 samples/sec   Loss 6.8309   LearningRate 0.0423   Epoch: 6   Global Step: 116790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:18,683-Speed 9444.00 samples/sec   Loss 6.8324   LearningRate 0.0423   Epoch: 6   Global Step: 116800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:19,777-Speed 9368.25 samples/sec   Loss 6.8477   LearningRate 0.0423   Epoch: 6   Global Step: 116810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:20,825-Speed 9778.82 samples/sec   Loss 6.7086   LearningRate 0.0423   Epoch: 6   Global Step: 116820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:14:22,138-Speed 7799.20 samples/sec   Loss 6.8485   LearningRate 0.0423   Epoch: 6   Global Step: 116830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:14:59,435-Speed 274.57 samples/sec   Loss 6.4998   LearningRate 0.0422   Epoch: 7   Global Step: 116840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:15:00,528-Speed 9379.20 samples/sec   Loss 6.0574   LearningRate 0.0422   Epoch: 7   Global Step: 116850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:15:01,628-Speed 9312.03 samples/sec   Loss 6.0052   LearningRate 0.0422   Epoch: 7   Global Step: 116860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:15:03,219-Speed 6436.67 samples/sec   Loss 5.9777   LearningRate 0.0422   Epoch: 7   Global Step: 116870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:15:04,522-Speed 7864.47 samples/sec   Loss 6.0200   LearningRate 0.0422   Epoch: 7   Global Step: 116880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:15:05,813-Speed 7933.77 samples/sec   Loss 6.0079   LearningRate 0.0422   Epoch: 7   Global Step: 116890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:15:07,037-Speed 8377.69 samples/sec   Loss 5.9893   LearningRate 0.0422   Epoch: 7   Global Step: 116900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:15:08,135-Speed 9334.63 samples/sec   Loss 6.0173   LearningRate 0.0422   Epoch: 7   Global Step: 116910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:15:09,210-Speed 9524.75 samples/sec   Loss 6.0039   LearningRate 0.0422   Epoch: 7   Global Step: 116920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:15:10,283-Speed 9555.40 samples/sec   Loss 5.9710   LearningRate 0.0422   Epoch: 7   Global Step: 116930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:11,389-Speed 9259.58 samples/sec   Loss 6.0003   LearningRate 0.0422   Epoch: 7   Global Step: 116940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:12,511-Speed 9129.57 samples/sec   Loss 6.0345   LearningRate 0.0422   Epoch: 7   Global Step: 116950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:13,601-Speed 9401.16 samples/sec   Loss 6.0304   LearningRate 0.0422   Epoch: 7   Global Step: 116960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:14,693-Speed 9382.48 samples/sec   Loss 5.8750   LearningRate 0.0422   Epoch: 7   Global Step: 116970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:15,789-Speed 9349.16 samples/sec   Loss 5.9737   LearningRate 0.0422   Epoch: 7   Global Step: 116980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:16,911-Speed 9133.27 samples/sec   Loss 5.9389   LearningRate 0.0422   Epoch: 7   Global Step: 116990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:18,041-Speed 9065.81 samples/sec   Loss 5.9005   LearningRate 0.0422   Epoch: 7   Global Step: 117000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:19,138-Speed 9342.65 samples/sec   Loss 6.0902   LearningRate 0.0422   Epoch: 7   Global Step: 117010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:20,241-Speed 9293.94 samples/sec   Loss 6.1293   LearningRate 0.0422   Epoch: 7   Global Step: 117020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:21,400-Speed 8837.87 samples/sec   Loss 5.9853   LearningRate 0.0422   Epoch: 7   Global Step: 117030   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:15:22,519-Speed 9155.05 samples/sec   Loss 5.9693   LearningRate 0.0422   Epoch: 7   Global Step: 117040   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:15:23,602-Speed 9461.01 samples/sec   Loss 6.1138   LearningRate 0.0422   Epoch: 7   Global Step: 117050   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:15:24,709-Speed 9258.83 samples/sec   Loss 6.0351   LearningRate 0.0422   Epoch: 7   Global Step: 117060   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:15:25,802-Speed 9371.96 samples/sec   Loss 6.0651   LearningRate 0.0422   Epoch: 7   Global Step: 117070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:26,874-Speed 9554.07 samples/sec   Loss 6.0207   LearningRate 0.0422   Epoch: 7   Global Step: 117080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:27,940-Speed 9613.85 samples/sec   Loss 5.9316   LearningRate 0.0422   Epoch: 7   Global Step: 117090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:28,998-Speed 9693.05 samples/sec   Loss 6.0520   LearningRate 0.0421   Epoch: 7   Global Step: 117100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:30,098-Speed 9308.57 samples/sec   Loss 6.1167   LearningRate 0.0421   Epoch: 7   Global Step: 117110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:31,165-Speed 9606.55 samples/sec   Loss 6.0994   LearningRate 0.0421   Epoch: 7   Global Step: 117120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:32,253-Speed 9417.99 samples/sec   Loss 6.0190   LearningRate 0.0421   Epoch: 7   Global Step: 117130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:33,352-Speed 9319.39 samples/sec   Loss 6.0062   LearningRate 0.0421   Epoch: 7   Global Step: 117140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:34,656-Speed 7858.92 samples/sec   Loss 6.0158   LearningRate 0.0421   Epoch: 7   Global Step: 117150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:36,245-Speed 6448.11 samples/sec   Loss 6.0296   LearningRate 0.0421   Epoch: 7   Global Step: 117160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:37,347-Speed 9291.36 samples/sec   Loss 6.0479   LearningRate 0.0421   Epoch: 7   Global Step: 117170   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:15:38,459-Speed 9218.44 samples/sec   Loss 6.0981   LearningRate 0.0421   Epoch: 7   Global Step: 117180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:39,720-Speed 8121.76 samples/sec   Loss 6.0590   LearningRate 0.0421   Epoch: 7   Global Step: 117190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:41,002-Speed 7993.92 samples/sec   Loss 6.1045   LearningRate 0.0421   Epoch: 7   Global Step: 117200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:42,068-Speed 9615.37 samples/sec   Loss 6.0524   LearningRate 0.0421   Epoch: 7   Global Step: 117210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:43,398-Speed 7701.25 samples/sec   Loss 5.9512   LearningRate 0.0421   Epoch: 7   Global Step: 117220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:44,512-Speed 9198.61 samples/sec   Loss 6.0564   LearningRate 0.0421   Epoch: 7   Global Step: 117230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:45,631-Speed 9150.70 samples/sec   Loss 6.1120   LearningRate 0.0421   Epoch: 7   Global Step: 117240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:46,740-Speed 9245.43 samples/sec   Loss 6.0704   LearningRate 0.0421   Epoch: 7   Global Step: 117250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:47,877-Speed 9007.06 samples/sec   Loss 6.1339   LearningRate 0.0421   Epoch: 7   Global Step: 117260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:48,950-Speed 9553.94 samples/sec   Loss 6.0538   LearningRate 0.0421   Epoch: 7   Global Step: 117270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:50,045-Speed 9359.31 samples/sec   Loss 6.1674   LearningRate 0.0421   Epoch: 7   Global Step: 117280   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:15:51,127-Speed 9462.79 samples/sec   Loss 6.0090   LearningRate 0.0421   Epoch: 7   Global Step: 117290   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:15:52,207-Speed 9485.04 samples/sec   Loss 6.0198   LearningRate 0.0421   Epoch: 7   Global Step: 117300   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:15:53,265-Speed 9695.11 samples/sec   Loss 6.1574   LearningRate 0.0421   Epoch: 7   Global Step: 117310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:54,337-Speed 9554.22 samples/sec   Loss 6.0634   LearningRate 0.0421   Epoch: 7   Global Step: 117320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:55,423-Speed 9436.96 samples/sec   Loss 5.9678   LearningRate 0.0421   Epoch: 7   Global Step: 117330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:56,453-Speed 9948.47 samples/sec   Loss 6.0843   LearningRate 0.0421   Epoch: 7   Global Step: 117340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:57,535-Speed 9464.51 samples/sec   Loss 6.0835   LearningRate 0.0421   Epoch: 7   Global Step: 117350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:58,670-Speed 9032.59 samples/sec   Loss 6.1083   LearningRate 0.0420   Epoch: 7   Global Step: 117360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:15:59,742-Speed 9553.88 samples/sec   Loss 6.2062   LearningRate 0.0420   Epoch: 7   Global Step: 117370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:00,844-Speed 9296.74 samples/sec   Loss 6.1722   LearningRate 0.0420   Epoch: 7   Global Step: 117380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:01,932-Speed 9416.52 samples/sec   Loss 6.0758   LearningRate 0.0420   Epoch: 7   Global Step: 117390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:03,044-Speed 9209.52 samples/sec   Loss 6.1955   LearningRate 0.0420   Epoch: 7   Global Step: 117400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:04,086-Speed 9835.50 samples/sec   Loss 6.1082   LearningRate 0.0420   Epoch: 7   Global Step: 117410   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:16:05,152-Speed 9622.02 samples/sec   Loss 6.0335   LearningRate 0.0420   Epoch: 7   Global Step: 117420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:06,243-Speed 9388.92 samples/sec   Loss 6.0622   LearningRate 0.0420   Epoch: 7   Global Step: 117430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:07,331-Speed 9425.07 samples/sec   Loss 6.2117   LearningRate 0.0420   Epoch: 7   Global Step: 117440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:08,392-Speed 9655.90 samples/sec   Loss 6.0471   LearningRate 0.0420   Epoch: 7   Global Step: 117450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:09,488-Speed 9349.52 samples/sec   Loss 6.0847   LearningRate 0.0420   Epoch: 7   Global Step: 117460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:10,587-Speed 9319.73 samples/sec   Loss 6.0726   LearningRate 0.0420   Epoch: 7   Global Step: 117470   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:11,701-Speed 9196.43 samples/sec   Loss 6.0899   LearningRate 0.0420   Epoch: 7   Global Step: 117480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:12,812-Speed 9226.10 samples/sec   Loss 6.0847   LearningRate 0.0420   Epoch: 7   Global Step: 117490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:13,972-Speed 8832.48 samples/sec   Loss 6.1704   LearningRate 0.0420   Epoch: 7   Global Step: 117500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:15,002-Speed 9942.13 samples/sec   Loss 6.2094   LearningRate 0.0420   Epoch: 7   Global Step: 117510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:16,053-Speed 9752.03 samples/sec   Loss 6.1346   LearningRate 0.0420   Epoch: 7   Global Step: 117520   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:16:17,127-Speed 9540.14 samples/sec   Loss 6.1357   LearningRate 0.0420   Epoch: 7   Global Step: 117530   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:16:18,206-Speed 9492.30 samples/sec   Loss 6.1228   LearningRate 0.0420   Epoch: 7   Global Step: 117540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:19,284-Speed 9502.29 samples/sec   Loss 6.1520   LearningRate 0.0420   Epoch: 7   Global Step: 117550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:20,385-Speed 9309.93 samples/sec   Loss 6.1480   LearningRate 0.0420   Epoch: 7   Global Step: 117560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:21,453-Speed 9590.89 samples/sec   Loss 6.0449   LearningRate 0.0420   Epoch: 7   Global Step: 117570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:22,566-Speed 9208.43 samples/sec   Loss 6.0183   LearningRate 0.0420   Epoch: 7   Global Step: 117580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:23,729-Speed 8810.06 samples/sec   Loss 6.2122   LearningRate 0.0420   Epoch: 7   Global Step: 117590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:24,795-Speed 9617.08 samples/sec   Loss 6.2628   LearningRate 0.0420   Epoch: 7   Global Step: 117600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:25,916-Speed 9136.28 samples/sec   Loss 6.0810   LearningRate 0.0419   Epoch: 7   Global Step: 117610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:26,995-Speed 9496.48 samples/sec   Loss 6.1210   LearningRate 0.0419   Epoch: 7   Global Step: 117620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:28,101-Speed 9269.08 samples/sec   Loss 6.1782   LearningRate 0.0419   Epoch: 7   Global Step: 117630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:29,168-Speed 9597.25 samples/sec   Loss 6.1806   LearningRate 0.0419   Epoch: 7   Global Step: 117640   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:30,237-Speed 9585.48 samples/sec   Loss 6.1702   LearningRate 0.0419   Epoch: 7   Global Step: 117650   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:31,281-Speed 9810.08 samples/sec   Loss 6.2138   LearningRate 0.0419   Epoch: 7   Global Step: 117660   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:32,363-Speed 9469.47 samples/sec   Loss 6.2389   LearningRate 0.0419   Epoch: 7   Global Step: 117670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:33,432-Speed 9591.57 samples/sec   Loss 6.1978   LearningRate 0.0419   Epoch: 7   Global Step: 117680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:34,516-Speed 9451.30 samples/sec   Loss 6.1223   LearningRate 0.0419   Epoch: 7   Global Step: 117690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:35,648-Speed 9051.49 samples/sec   Loss 6.2410   LearningRate 0.0419   Epoch: 7   Global Step: 117700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:36,753-Speed 9272.69 samples/sec   Loss 6.2309   LearningRate 0.0419   Epoch: 7   Global Step: 117710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:37,833-Speed 9487.27 samples/sec   Loss 6.1285   LearningRate 0.0419   Epoch: 7   Global Step: 117720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:38,899-Speed 9607.03 samples/sec   Loss 6.2645   LearningRate 0.0419   Epoch: 7   Global Step: 117730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:39,975-Speed 9527.97 samples/sec   Loss 6.1480   LearningRate 0.0419   Epoch: 7   Global Step: 117740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:41,072-Speed 9335.92 samples/sec   Loss 6.2410   LearningRate 0.0419   Epoch: 7   Global Step: 117750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:42,151-Speed 9501.81 samples/sec   Loss 6.2561   LearningRate 0.0419   Epoch: 7   Global Step: 117760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:43,290-Speed 8993.46 samples/sec   Loss 6.0772   LearningRate 0.0419   Epoch: 7   Global Step: 117770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:44,345-Speed 9713.15 samples/sec   Loss 6.1787   LearningRate 0.0419   Epoch: 7   Global Step: 117780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:45,460-Speed 9189.92 samples/sec   Loss 6.0504   LearningRate 0.0419   Epoch: 7   Global Step: 117790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:46,552-Speed 9384.16 samples/sec   Loss 6.2766   LearningRate 0.0419   Epoch: 7   Global Step: 117800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:47,684-Speed 9052.43 samples/sec   Loss 6.1958   LearningRate 0.0419   Epoch: 7   Global Step: 117810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:48,756-Speed 9557.33 samples/sec   Loss 6.2349   LearningRate 0.0419   Epoch: 7   Global Step: 117820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:16:49,835-Speed 9495.75 samples/sec   Loss 6.2629   LearningRate 0.0419   Epoch: 7   Global Step: 117830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:50,891-Speed 9700.54 samples/sec   Loss 6.3197   LearningRate 0.0419   Epoch: 7   Global Step: 117840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:51,967-Speed 9520.00 samples/sec   Loss 6.2808   LearningRate 0.0419   Epoch: 7   Global Step: 117850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:53,051-Speed 9450.42 samples/sec   Loss 6.1966   LearningRate 0.0419   Epoch: 7   Global Step: 117860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:54,133-Speed 9478.07 samples/sec   Loss 6.2641   LearningRate 0.0418   Epoch: 7   Global Step: 117870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:55,220-Speed 9422.11 samples/sec   Loss 6.1671   LearningRate 0.0418   Epoch: 7   Global Step: 117880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:56,308-Speed 9420.30 samples/sec   Loss 6.1902   LearningRate 0.0418   Epoch: 7   Global Step: 117890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:57,392-Speed 9447.43 samples/sec   Loss 6.1230   LearningRate 0.0418   Epoch: 7   Global Step: 117900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:58,486-Speed 9371.21 samples/sec   Loss 6.3059   LearningRate 0.0418   Epoch: 7   Global Step: 117910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:16:59,544-Speed 9675.96 samples/sec   Loss 6.2125   LearningRate 0.0418   Epoch: 7   Global Step: 117920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:17:00,647-Speed 9290.19 samples/sec   Loss 6.2824   LearningRate 0.0418   Epoch: 7   Global Step: 117930   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:17:01,709-Speed 9646.58 samples/sec   Loss 6.3246   LearningRate 0.0418   Epoch: 7   Global Step: 117940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:17:02,808-Speed 9322.73 samples/sec   Loss 6.2580   LearningRate 0.0418   Epoch: 7   Global Step: 117950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:17:03,927-Speed 9156.92 samples/sec   Loss 6.1510   LearningRate 0.0418   Epoch: 7   Global Step: 117960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:17:05,010-Speed 9470.12 samples/sec   Loss 6.1389   LearningRate 0.0418   Epoch: 7   Global Step: 117970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:17:06,105-Speed 9351.75 samples/sec   Loss 6.1614   LearningRate 0.0418   Epoch: 7   Global Step: 117980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:17:07,160-Speed 9711.79 samples/sec   Loss 6.2688   LearningRate 0.0418   Epoch: 7   Global Step: 117990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:17:08,240-Speed 9492.75 samples/sec   Loss 6.1193   LearningRate 0.0418   Epoch: 7   Global Step: 118000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:17:30,314-[lfw][118000]XNorm: 10.867755
Training: 2022-04-11 16:17:30,314-[lfw][118000]Accuracy-Flip: 0.99467+-0.00356
Training: 2022-04-11 16:17:30,314-[lfw][118000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:17:55,799-[cfp_fp][118000]XNorm: 9.272217
Training: 2022-04-11 16:17:55,800-[cfp_fp][118000]Accuracy-Flip: 0.95943+-0.01125
Training: 2022-04-11 16:17:55,800-[cfp_fp][118000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:18:17,819-[agedb_30][118000]XNorm: 10.515524
Training: 2022-04-11 16:18:17,819-[agedb_30][118000]Accuracy-Flip: 0.96467+-0.00945
Training: 2022-04-11 16:18:17,820-[agedb_30][118000]Accuracy-Highest: 0.96483
Training: 2022-04-11 16:18:18,920-Speed 144.88 samples/sec   Loss 6.1758   LearningRate 0.0418   Epoch: 7   Global Step: 118010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:20,018-Speed 9332.65 samples/sec   Loss 6.2896   LearningRate 0.0418   Epoch: 7   Global Step: 118020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:21,128-Speed 9233.46 samples/sec   Loss 6.2217   LearningRate 0.0418   Epoch: 7   Global Step: 118030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:22,241-Speed 9201.50 samples/sec   Loss 6.1584   LearningRate 0.0418   Epoch: 7   Global Step: 118040   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:18:23,359-Speed 9169.36 samples/sec   Loss 6.2597   LearningRate 0.0418   Epoch: 7   Global Step: 118050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:24,441-Speed 9472.26 samples/sec   Loss 6.2237   LearningRate 0.0418   Epoch: 7   Global Step: 118060   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:25,546-Speed 9270.18 samples/sec   Loss 6.2117   LearningRate 0.0418   Epoch: 7   Global Step: 118070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:26,663-Speed 9178.44 samples/sec   Loss 6.3328   LearningRate 0.0418   Epoch: 7   Global Step: 118080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:27,726-Speed 9635.85 samples/sec   Loss 6.2365   LearningRate 0.0418   Epoch: 7   Global Step: 118090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:28,814-Speed 9415.95 samples/sec   Loss 6.2805   LearningRate 0.0418   Epoch: 7   Global Step: 118100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:29,916-Speed 9292.41 samples/sec   Loss 6.1732   LearningRate 0.0418   Epoch: 7   Global Step: 118110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:30,998-Speed 9473.80 samples/sec   Loss 6.1801   LearningRate 0.0418   Epoch: 7   Global Step: 118120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:32,080-Speed 9466.97 samples/sec   Loss 6.3021   LearningRate 0.0417   Epoch: 7   Global Step: 118130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:33,167-Speed 9426.84 samples/sec   Loss 6.1693   LearningRate 0.0417   Epoch: 7   Global Step: 118140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:34,234-Speed 9604.47 samples/sec   Loss 6.3366   LearningRate 0.0417   Epoch: 7   Global Step: 118150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:35,337-Speed 9282.87 samples/sec   Loss 6.1924   LearningRate 0.0417   Epoch: 7   Global Step: 118160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:36,433-Speed 9354.46 samples/sec   Loss 6.1564   LearningRate 0.0417   Epoch: 7   Global Step: 118170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:37,539-Speed 9260.54 samples/sec   Loss 6.3019   LearningRate 0.0417   Epoch: 7   Global Step: 118180   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:38,675-Speed 9020.46 samples/sec   Loss 6.2072   LearningRate 0.0417   Epoch: 7   Global Step: 118190   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:39,770-Speed 9361.41 samples/sec   Loss 6.2916   LearningRate 0.0417   Epoch: 7   Global Step: 118200   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:40,826-Speed 9698.79 samples/sec   Loss 6.2068   LearningRate 0.0417   Epoch: 7   Global Step: 118210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:41,920-Speed 9362.82 samples/sec   Loss 6.2365   LearningRate 0.0417   Epoch: 7   Global Step: 118220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:43,024-Speed 9285.64 samples/sec   Loss 6.2982   LearningRate 0.0417   Epoch: 7   Global Step: 118230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:44,105-Speed 9479.33 samples/sec   Loss 6.2435   LearningRate 0.0417   Epoch: 7   Global Step: 118240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:45,141-Speed 9881.60 samples/sec   Loss 6.2418   LearningRate 0.0417   Epoch: 7   Global Step: 118250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:46,194-Speed 9728.90 samples/sec   Loss 6.2794   LearningRate 0.0417   Epoch: 7   Global Step: 118260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:18:47,271-Speed 9518.08 samples/sec   Loss 6.1991   LearningRate 0.0417   Epoch: 7   Global Step: 118270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:18:48,370-Speed 9318.87 samples/sec   Loss 6.2509   LearningRate 0.0417   Epoch: 7   Global Step: 118280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:18:49,454-Speed 9456.41 samples/sec   Loss 6.2790   LearningRate 0.0417   Epoch: 7   Global Step: 118290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:18:50,600-Speed 8943.48 samples/sec   Loss 6.2595   LearningRate 0.0417   Epoch: 7   Global Step: 118300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:18:51,662-Speed 9644.46 samples/sec   Loss 6.3019   LearningRate 0.0417   Epoch: 7   Global Step: 118310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:18:52,756-Speed 9374.01 samples/sec   Loss 6.3041   LearningRate 0.0417   Epoch: 7   Global Step: 118320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:18:53,835-Speed 9493.03 samples/sec   Loss 6.2803   LearningRate 0.0417   Epoch: 7   Global Step: 118330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:18:54,893-Speed 9684.23 samples/sec   Loss 6.3169   LearningRate 0.0417   Epoch: 7   Global Step: 118340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:18:55,952-Speed 9675.01 samples/sec   Loss 6.3493   LearningRate 0.0417   Epoch: 7   Global Step: 118350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:18:57,056-Speed 9279.63 samples/sec   Loss 6.3046   LearningRate 0.0417   Epoch: 7   Global Step: 118360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:58,136-Speed 9481.09 samples/sec   Loss 6.2710   LearningRate 0.0417   Epoch: 7   Global Step: 118370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:18:59,213-Speed 9523.88 samples/sec   Loss 6.3336   LearningRate 0.0417   Epoch: 7   Global Step: 118380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:00,298-Speed 9439.97 samples/sec   Loss 6.1978   LearningRate 0.0416   Epoch: 7   Global Step: 118390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:01,363-Speed 9617.59 samples/sec   Loss 6.2322   LearningRate 0.0416   Epoch: 7   Global Step: 118400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:02,458-Speed 9358.28 samples/sec   Loss 6.2345   LearningRate 0.0416   Epoch: 7   Global Step: 118410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:03,540-Speed 9471.97 samples/sec   Loss 6.1961   LearningRate 0.0416   Epoch: 7   Global Step: 118420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:04,630-Speed 9399.47 samples/sec   Loss 6.2567   LearningRate 0.0416   Epoch: 7   Global Step: 118430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:05,701-Speed 9570.83 samples/sec   Loss 6.3305   LearningRate 0.0416   Epoch: 7   Global Step: 118440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:06,782-Speed 9477.27 samples/sec   Loss 6.2247   LearningRate 0.0416   Epoch: 7   Global Step: 118450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:07,849-Speed 9602.58 samples/sec   Loss 6.3128   LearningRate 0.0416   Epoch: 7   Global Step: 118460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:08,948-Speed 9318.87 samples/sec   Loss 6.3514   LearningRate 0.0416   Epoch: 7   Global Step: 118470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:10,020-Speed 9554.18 samples/sec   Loss 6.2831   LearningRate 0.0416   Epoch: 7   Global Step: 118480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:11,117-Speed 9346.07 samples/sec   Loss 6.2953   LearningRate 0.0416   Epoch: 7   Global Step: 118490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:12,172-Speed 9715.35 samples/sec   Loss 6.1637   LearningRate 0.0416   Epoch: 7   Global Step: 118500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:13,265-Speed 9373.66 samples/sec   Loss 6.3437   LearningRate 0.0416   Epoch: 7   Global Step: 118510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:14,356-Speed 9390.06 samples/sec   Loss 6.2903   LearningRate 0.0416   Epoch: 7   Global Step: 118520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:15,408-Speed 9732.96 samples/sec   Loss 6.4232   LearningRate 0.0416   Epoch: 7   Global Step: 118530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:16,517-Speed 9243.63 samples/sec   Loss 6.1923   LearningRate 0.0416   Epoch: 7   Global Step: 118540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:17,625-Speed 9248.69 samples/sec   Loss 6.2657   LearningRate 0.0416   Epoch: 7   Global Step: 118550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:18,706-Speed 9482.62 samples/sec   Loss 6.2999   LearningRate 0.0416   Epoch: 7   Global Step: 118560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:19,808-Speed 9298.91 samples/sec   Loss 6.3689   LearningRate 0.0416   Epoch: 7   Global Step: 118570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:20,916-Speed 9241.34 samples/sec   Loss 6.3536   LearningRate 0.0416   Epoch: 7   Global Step: 118580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:22,007-Speed 9390.00 samples/sec   Loss 6.3394   LearningRate 0.0416   Epoch: 7   Global Step: 118590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:23,125-Speed 9173.02 samples/sec   Loss 6.2553   LearningRate 0.0416   Epoch: 7   Global Step: 118600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:24,226-Speed 9300.01 samples/sec   Loss 6.3796   LearningRate 0.0416   Epoch: 7   Global Step: 118610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:25,534-Speed 7834.47 samples/sec   Loss 6.2800   LearningRate 0.0416   Epoch: 7   Global Step: 118620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:26,608-Speed 9538.79 samples/sec   Loss 6.3566   LearningRate 0.0416   Epoch: 7   Global Step: 118630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:27,735-Speed 9092.42 samples/sec   Loss 6.2891   LearningRate 0.0416   Epoch: 7   Global Step: 118640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:28,809-Speed 9535.03 samples/sec   Loss 6.1808   LearningRate 0.0415   Epoch: 7   Global Step: 118650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:29,915-Speed 9270.57 samples/sec   Loss 6.3717   LearningRate 0.0415   Epoch: 7   Global Step: 118660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:31,007-Speed 9385.80 samples/sec   Loss 6.3180   LearningRate 0.0415   Epoch: 7   Global Step: 118670   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:32,106-Speed 9322.96 samples/sec   Loss 6.2653   LearningRate 0.0415   Epoch: 7   Global Step: 118680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:33,271-Speed 8788.78 samples/sec   Loss 6.1264   LearningRate 0.0415   Epoch: 7   Global Step: 118690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:34,351-Speed 9485.34 samples/sec   Loss 6.2678   LearningRate 0.0415   Epoch: 7   Global Step: 118700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:35,406-Speed 9713.14 samples/sec   Loss 6.2450   LearningRate 0.0415   Epoch: 7   Global Step: 118710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:36,469-Speed 9642.89 samples/sec   Loss 6.3308   LearningRate 0.0415   Epoch: 7   Global Step: 118720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:37,561-Speed 9383.08 samples/sec   Loss 6.3239   LearningRate 0.0415   Epoch: 7   Global Step: 118730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:38,650-Speed 9411.99 samples/sec   Loss 6.2832   LearningRate 0.0415   Epoch: 7   Global Step: 118740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:39,730-Speed 9480.54 samples/sec   Loss 6.2924   LearningRate 0.0415   Epoch: 7   Global Step: 118750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:40,809-Speed 9493.38 samples/sec   Loss 6.3266   LearningRate 0.0415   Epoch: 7   Global Step: 118760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:41,873-Speed 9639.56 samples/sec   Loss 6.3209   LearningRate 0.0415   Epoch: 7   Global Step: 118770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:43,002-Speed 9071.86 samples/sec   Loss 6.2522   LearningRate 0.0415   Epoch: 7   Global Step: 118780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:44,133-Speed 9063.03 samples/sec   Loss 6.2767   LearningRate 0.0415   Epoch: 7   Global Step: 118790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:45,180-Speed 9776.65 samples/sec   Loss 6.3638   LearningRate 0.0415   Epoch: 7   Global Step: 118800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:46,230-Speed 9764.75 samples/sec   Loss 6.3532   LearningRate 0.0415   Epoch: 7   Global Step: 118810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:47,332-Speed 9294.61 samples/sec   Loss 6.3443   LearningRate 0.0415   Epoch: 7   Global Step: 118820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:48,439-Speed 9253.53 samples/sec   Loss 6.3058   LearningRate 0.0415   Epoch: 7   Global Step: 118830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:49,476-Speed 9878.53 samples/sec   Loss 6.3100   LearningRate 0.0415   Epoch: 7   Global Step: 118840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:50,538-Speed 9651.91 samples/sec   Loss 6.2967   LearningRate 0.0415   Epoch: 7   Global Step: 118850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:51,643-Speed 9269.14 samples/sec   Loss 6.3695   LearningRate 0.0415   Epoch: 7   Global Step: 118860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:52,680-Speed 9889.19 samples/sec   Loss 6.4235   LearningRate 0.0415   Epoch: 7   Global Step: 118870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:53,780-Speed 9307.02 samples/sec   Loss 6.3275   LearningRate 0.0415   Epoch: 7   Global Step: 118880   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:19:54,866-Speed 9441.09 samples/sec   Loss 6.4756   LearningRate 0.0415   Epoch: 7   Global Step: 118890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:55,978-Speed 9211.68 samples/sec   Loss 6.3391   LearningRate 0.0415   Epoch: 7   Global Step: 118900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:57,101-Speed 9125.57 samples/sec   Loss 6.3334   LearningRate 0.0414   Epoch: 7   Global Step: 118910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:58,196-Speed 9357.36 samples/sec   Loss 6.2926   LearningRate 0.0414   Epoch: 7   Global Step: 118920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:19:59,269-Speed 9552.54 samples/sec   Loss 6.3426   LearningRate 0.0414   Epoch: 7   Global Step: 118930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:00,327-Speed 9684.62 samples/sec   Loss 6.3141   LearningRate 0.0414   Epoch: 7   Global Step: 118940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:01,403-Speed 9521.39 samples/sec   Loss 6.3503   LearningRate 0.0414   Epoch: 7   Global Step: 118950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:02,505-Speed 9298.11 samples/sec   Loss 6.2646   LearningRate 0.0414   Epoch: 7   Global Step: 118960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:03,666-Speed 8825.39 samples/sec   Loss 6.4473   LearningRate 0.0414   Epoch: 7   Global Step: 118970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:04,756-Speed 9399.69 samples/sec   Loss 6.3260   LearningRate 0.0414   Epoch: 7   Global Step: 118980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:05,801-Speed 9809.37 samples/sec   Loss 6.4195   LearningRate 0.0414   Epoch: 7   Global Step: 118990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:06,852-Speed 9746.17 samples/sec   Loss 6.3685   LearningRate 0.0414   Epoch: 7   Global Step: 119000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:07,910-Speed 9681.33 samples/sec   Loss 6.4051   LearningRate 0.0414   Epoch: 7   Global Step: 119010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:08,990-Speed 9483.07 samples/sec   Loss 6.3078   LearningRate 0.0414   Epoch: 7   Global Step: 119020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:10,081-Speed 9398.88 samples/sec   Loss 6.3842   LearningRate 0.0414   Epoch: 7   Global Step: 119030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:11,181-Speed 9310.32 samples/sec   Loss 6.2783   LearningRate 0.0414   Epoch: 7   Global Step: 119040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:12,252-Speed 9571.90 samples/sec   Loss 6.2897   LearningRate 0.0414   Epoch: 7   Global Step: 119050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:13,342-Speed 9394.10 samples/sec   Loss 6.3117   LearningRate 0.0414   Epoch: 7   Global Step: 119060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:14,452-Speed 9236.53 samples/sec   Loss 6.4500   LearningRate 0.0414   Epoch: 7   Global Step: 119070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:15,565-Speed 9204.81 samples/sec   Loss 6.3612   LearningRate 0.0414   Epoch: 7   Global Step: 119080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:16,655-Speed 9399.38 samples/sec   Loss 6.4287   LearningRate 0.0414   Epoch: 7   Global Step: 119090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:17,719-Speed 9635.69 samples/sec   Loss 6.3519   LearningRate 0.0414   Epoch: 7   Global Step: 119100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:18,784-Speed 9620.47 samples/sec   Loss 6.3469   LearningRate 0.0414   Epoch: 7   Global Step: 119110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:19,851-Speed 9600.94 samples/sec   Loss 6.3530   LearningRate 0.0414   Epoch: 7   Global Step: 119120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:20,939-Speed 9414.47 samples/sec   Loss 6.4710   LearningRate 0.0414   Epoch: 7   Global Step: 119130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:21,994-Speed 9714.01 samples/sec   Loss 6.3286   LearningRate 0.0414   Epoch: 7   Global Step: 119140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:23,061-Speed 9605.63 samples/sec   Loss 6.2033   LearningRate 0.0414   Epoch: 7   Global Step: 119150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:24,139-Speed 9498.38 samples/sec   Loss 6.3312   LearningRate 0.0414   Epoch: 7   Global Step: 119160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:25,224-Speed 9447.08 samples/sec   Loss 6.4449   LearningRate 0.0413   Epoch: 7   Global Step: 119170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:26,314-Speed 9400.27 samples/sec   Loss 6.3910   LearningRate 0.0413   Epoch: 7   Global Step: 119180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:27,408-Speed 9365.57 samples/sec   Loss 6.3421   LearningRate 0.0413   Epoch: 7   Global Step: 119190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:28,492-Speed 9454.58 samples/sec   Loss 6.4174   LearningRate 0.0413   Epoch: 7   Global Step: 119200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:29,585-Speed 9377.64 samples/sec   Loss 6.3898   LearningRate 0.0413   Epoch: 7   Global Step: 119210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:30,681-Speed 9343.60 samples/sec   Loss 6.3982   LearningRate 0.0413   Epoch: 7   Global Step: 119220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:31,816-Speed 9025.93 samples/sec   Loss 6.3995   LearningRate 0.0413   Epoch: 7   Global Step: 119230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:32,882-Speed 9616.03 samples/sec   Loss 6.3355   LearningRate 0.0413   Epoch: 7   Global Step: 119240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:34,021-Speed 8994.61 samples/sec   Loss 6.4833   LearningRate 0.0413   Epoch: 7   Global Step: 119250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:35,100-Speed 9494.95 samples/sec   Loss 6.4593   LearningRate 0.0413   Epoch: 7   Global Step: 119260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:36,181-Speed 9476.05 samples/sec   Loss 6.4218   LearningRate 0.0413   Epoch: 7   Global Step: 119270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:37,263-Speed 9467.36 samples/sec   Loss 6.3669   LearningRate 0.0413   Epoch: 7   Global Step: 119280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:38,384-Speed 9140.38 samples/sec   Loss 6.3699   LearningRate 0.0413   Epoch: 7   Global Step: 119290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:39,462-Speed 9503.18 samples/sec   Loss 6.4273   LearningRate 0.0413   Epoch: 7   Global Step: 119300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:40,543-Speed 9484.58 samples/sec   Loss 6.3982   LearningRate 0.0413   Epoch: 7   Global Step: 119310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:41,619-Speed 9524.47 samples/sec   Loss 6.3496   LearningRate 0.0413   Epoch: 7   Global Step: 119320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:42,730-Speed 9228.15 samples/sec   Loss 6.3914   LearningRate 0.0413   Epoch: 7   Global Step: 119330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:43,881-Speed 8901.80 samples/sec   Loss 6.3524   LearningRate 0.0413   Epoch: 7   Global Step: 119340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:44,962-Speed 9478.66 samples/sec   Loss 6.4801   LearningRate 0.0413   Epoch: 7   Global Step: 119350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:46,059-Speed 9345.34 samples/sec   Loss 6.4342   LearningRate 0.0413   Epoch: 7   Global Step: 119360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:47,154-Speed 9351.61 samples/sec   Loss 6.3955   LearningRate 0.0413   Epoch: 7   Global Step: 119370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:48,225-Speed 9569.56 samples/sec   Loss 6.3281   LearningRate 0.0413   Epoch: 7   Global Step: 119380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:49,294-Speed 9583.10 samples/sec   Loss 6.4164   LearningRate 0.0413   Epoch: 7   Global Step: 119390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:50,413-Speed 9154.56 samples/sec   Loss 6.3501   LearningRate 0.0413   Epoch: 7   Global Step: 119400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:51,467-Speed 9720.25 samples/sec   Loss 6.3960   LearningRate 0.0413   Epoch: 7   Global Step: 119410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:52,491-Speed 10012.86 samples/sec   Loss 6.4565   LearningRate 0.0413   Epoch: 7   Global Step: 119420   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:53,570-Speed 9495.43 samples/sec   Loss 6.3911   LearningRate 0.0412   Epoch: 7   Global Step: 119430   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:54,628-Speed 9681.48 samples/sec   Loss 6.4006   LearningRate 0.0412   Epoch: 7   Global Step: 119440   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:55,688-Speed 9669.80 samples/sec   Loss 6.4736   LearningRate 0.0412   Epoch: 7   Global Step: 119450   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:56,796-Speed 9246.41 samples/sec   Loss 6.4231   LearningRate 0.0412   Epoch: 7   Global Step: 119460   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:20:57,922-Speed 9097.62 samples/sec   Loss 6.4709   LearningRate 0.0412   Epoch: 7   Global Step: 119470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:20:58,997-Speed 9539.87 samples/sec   Loss 6.2846   LearningRate 0.0412   Epoch: 7   Global Step: 119480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:00,057-Speed 9657.69 samples/sec   Loss 6.4137   LearningRate 0.0412   Epoch: 7   Global Step: 119490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:01,162-Speed 9277.25 samples/sec   Loss 6.3384   LearningRate 0.0412   Epoch: 7   Global Step: 119500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:02,254-Speed 9381.67 samples/sec   Loss 6.4558   LearningRate 0.0412   Epoch: 7   Global Step: 119510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:03,358-Speed 9285.83 samples/sec   Loss 6.5145   LearningRate 0.0412   Epoch: 7   Global Step: 119520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:04,444-Speed 9435.46 samples/sec   Loss 6.3780   LearningRate 0.0412   Epoch: 7   Global Step: 119530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:05,524-Speed 9482.36 samples/sec   Loss 6.4499   LearningRate 0.0412   Epoch: 7   Global Step: 119540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:06,584-Speed 9668.77 samples/sec   Loss 6.3871   LearningRate 0.0412   Epoch: 7   Global Step: 119550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:07,710-Speed 9097.60 samples/sec   Loss 6.4062   LearningRate 0.0412   Epoch: 7   Global Step: 119560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:08,799-Speed 9412.21 samples/sec   Loss 6.4444   LearningRate 0.0412   Epoch: 7   Global Step: 119570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:09,881-Speed 9468.23 samples/sec   Loss 6.3494   LearningRate 0.0412   Epoch: 7   Global Step: 119580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:10,960-Speed 9494.37 samples/sec   Loss 6.3295   LearningRate 0.0412   Epoch: 7   Global Step: 119590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:12,084-Speed 9114.79 samples/sec   Loss 6.3648   LearningRate 0.0412   Epoch: 7   Global Step: 119600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:13,158-Speed 9541.74 samples/sec   Loss 6.3352   LearningRate 0.0412   Epoch: 7   Global Step: 119610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:14,223-Speed 9625.88 samples/sec   Loss 6.3811   LearningRate 0.0412   Epoch: 7   Global Step: 119620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:15,266-Speed 9816.30 samples/sec   Loss 6.3414   LearningRate 0.0412   Epoch: 7   Global Step: 119630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:16,379-Speed 9207.32 samples/sec   Loss 6.4653   LearningRate 0.0412   Epoch: 7   Global Step: 119640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:17,488-Speed 9241.90 samples/sec   Loss 6.4089   LearningRate 0.0412   Epoch: 7   Global Step: 119650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:18,558-Speed 9575.56 samples/sec   Loss 6.4292   LearningRate 0.0412   Epoch: 7   Global Step: 119660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:19,687-Speed 9070.97 samples/sec   Loss 6.4609   LearningRate 0.0412   Epoch: 7   Global Step: 119670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:20,762-Speed 9533.91 samples/sec   Loss 6.4745   LearningRate 0.0412   Epoch: 7   Global Step: 119680   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:21,821-Speed 9679.99 samples/sec   Loss 6.4397   LearningRate 0.0411   Epoch: 7   Global Step: 119690   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:22,919-Speed 9332.73 samples/sec   Loss 6.4778   LearningRate 0.0411   Epoch: 7   Global Step: 119700   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:23,954-Speed 9901.63 samples/sec   Loss 6.4313   LearningRate 0.0411   Epoch: 7   Global Step: 119710   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:25,059-Speed 9270.57 samples/sec   Loss 6.3751   LearningRate 0.0411   Epoch: 7   Global Step: 119720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:26,138-Speed 9492.93 samples/sec   Loss 6.3569   LearningRate 0.0411   Epoch: 7   Global Step: 119730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:27,251-Speed 9206.35 samples/sec   Loss 6.3865   LearningRate 0.0411   Epoch: 7   Global Step: 119740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:28,337-Speed 9430.43 samples/sec   Loss 6.3472   LearningRate 0.0411   Epoch: 7   Global Step: 119750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:29,422-Speed 9447.94 samples/sec   Loss 6.4496   LearningRate 0.0411   Epoch: 7   Global Step: 119760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:30,511-Speed 9412.42 samples/sec   Loss 6.4348   LearningRate 0.0411   Epoch: 7   Global Step: 119770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:31,618-Speed 9252.09 samples/sec   Loss 6.4456   LearningRate 0.0411   Epoch: 7   Global Step: 119780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:32,664-Speed 9800.75 samples/sec   Loss 6.3677   LearningRate 0.0411   Epoch: 7   Global Step: 119790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:33,737-Speed 9546.98 samples/sec   Loss 6.4241   LearningRate 0.0411   Epoch: 7   Global Step: 119800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:34,799-Speed 9645.74 samples/sec   Loss 6.4700   LearningRate 0.0411   Epoch: 7   Global Step: 119810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:35,880-Speed 9473.99 samples/sec   Loss 6.4245   LearningRate 0.0411   Epoch: 7   Global Step: 119820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:37,014-Speed 9038.33 samples/sec   Loss 6.5128   LearningRate 0.0411   Epoch: 7   Global Step: 119830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:38,134-Speed 9144.45 samples/sec   Loss 6.4824   LearningRate 0.0411   Epoch: 7   Global Step: 119840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:39,217-Speed 9461.46 samples/sec   Loss 6.3298   LearningRate 0.0411   Epoch: 7   Global Step: 119850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:40,283-Speed 9615.99 samples/sec   Loss 6.3507   LearningRate 0.0411   Epoch: 7   Global Step: 119860   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:41,358-Speed 9529.83 samples/sec   Loss 6.4710   LearningRate 0.0411   Epoch: 7   Global Step: 119870   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:42,437-Speed 9504.19 samples/sec   Loss 6.5262   LearningRate 0.0411   Epoch: 7   Global Step: 119880   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:21:43,514-Speed 9513.02 samples/sec   Loss 6.3972   LearningRate 0.0411   Epoch: 7   Global Step: 119890   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:21:44,559-Speed 9803.30 samples/sec   Loss 6.4485   LearningRate 0.0411   Epoch: 7   Global Step: 119900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:45,653-Speed 9361.30 samples/sec   Loss 6.3015   LearningRate 0.0411   Epoch: 7   Global Step: 119910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:46,744-Speed 9389.30 samples/sec   Loss 6.4031   LearningRate 0.0411   Epoch: 7   Global Step: 119920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:47,840-Speed 9349.17 samples/sec   Loss 6.4960   LearningRate 0.0411   Epoch: 7   Global Step: 119930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:48,929-Speed 9414.17 samples/sec   Loss 6.4151   LearningRate 0.0411   Epoch: 7   Global Step: 119940   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:50,016-Speed 9426.64 samples/sec   Loss 6.5397   LearningRate 0.0410   Epoch: 7   Global Step: 119950   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:51,071-Speed 9710.80 samples/sec   Loss 6.4463   LearningRate 0.0410   Epoch: 7   Global Step: 119960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:52,169-Speed 9329.30 samples/sec   Loss 6.3540   LearningRate 0.0410   Epoch: 7   Global Step: 119970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:53,298-Speed 9081.23 samples/sec   Loss 6.4611   LearningRate 0.0410   Epoch: 7   Global Step: 119980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:21:54,387-Speed 9402.06 samples/sec   Loss 6.4183   LearningRate 0.0410   Epoch: 7   Global Step: 119990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:21:55,454-Speed 9601.28 samples/sec   Loss 6.4138   LearningRate 0.0410   Epoch: 7   Global Step: 120000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:22:17,577-[lfw][120000]XNorm: 11.040144
Training: 2022-04-11 16:22:17,578-[lfw][120000]Accuracy-Flip: 0.99650+-0.00293
Training: 2022-04-11 16:22:17,578-[lfw][120000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:22:43,050-[cfp_fp][120000]XNorm: 9.430222
Training: 2022-04-11 16:22:43,051-[cfp_fp][120000]Accuracy-Flip: 0.95329+-0.01314
Training: 2022-04-11 16:22:43,051-[cfp_fp][120000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:23:04,878-[agedb_30][120000]XNorm: 10.759171
Training: 2022-04-11 16:23:04,879-[agedb_30][120000]Accuracy-Flip: 0.96283+-0.01038
Training: 2022-04-11 16:23:04,880-[agedb_30][120000]Accuracy-Highest: 0.96483
Training: 2022-04-11 16:23:05,985-Speed 145.19 samples/sec   Loss 6.5329   LearningRate 0.0410   Epoch: 7   Global Step: 120010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:07,072-Speed 9418.65 samples/sec   Loss 6.4126   LearningRate 0.0410   Epoch: 7   Global Step: 120020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:08,122-Speed 9763.41 samples/sec   Loss 6.4684   LearningRate 0.0410   Epoch: 7   Global Step: 120030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:09,182-Speed 9666.10 samples/sec   Loss 6.3350   LearningRate 0.0410   Epoch: 7   Global Step: 120040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:10,279-Speed 9342.69 samples/sec   Loss 6.4944   LearningRate 0.0410   Epoch: 7   Global Step: 120050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:11,370-Speed 9382.55 samples/sec   Loss 6.4769   LearningRate 0.0410   Epoch: 7   Global Step: 120060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:12,452-Speed 9468.68 samples/sec   Loss 6.4146   LearningRate 0.0410   Epoch: 7   Global Step: 120070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:13,600-Speed 8929.55 samples/sec   Loss 6.2905   LearningRate 0.0410   Epoch: 7   Global Step: 120080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:14,670-Speed 9573.63 samples/sec   Loss 6.4457   LearningRate 0.0410   Epoch: 7   Global Step: 120090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:15,726-Speed 9698.98 samples/sec   Loss 6.3863   LearningRate 0.0410   Epoch: 7   Global Step: 120100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:16,776-Speed 9763.97 samples/sec   Loss 6.3918   LearningRate 0.0410   Epoch: 7   Global Step: 120110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:17,884-Speed 9250.39 samples/sec   Loss 6.4814   LearningRate 0.0410   Epoch: 7   Global Step: 120120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:18,976-Speed 9378.76 samples/sec   Loss 6.4358   LearningRate 0.0410   Epoch: 7   Global Step: 120130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:20,060-Speed 9455.07 samples/sec   Loss 6.4599   LearningRate 0.0410   Epoch: 7   Global Step: 120140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:21,113-Speed 9726.74 samples/sec   Loss 6.4492   LearningRate 0.0410   Epoch: 7   Global Step: 120150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:22,188-Speed 9531.14 samples/sec   Loss 6.4460   LearningRate 0.0410   Epoch: 7   Global Step: 120160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:23,307-Speed 9163.09 samples/sec   Loss 6.4412   LearningRate 0.0410   Epoch: 7   Global Step: 120170   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:24,403-Speed 9350.99 samples/sec   Loss 6.4193   LearningRate 0.0410   Epoch: 7   Global Step: 120180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:25,469-Speed 9616.47 samples/sec   Loss 6.3498   LearningRate 0.0410   Epoch: 7   Global Step: 120190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:26,557-Speed 9417.29 samples/sec   Loss 6.4644   LearningRate 0.0410   Epoch: 7   Global Step: 120200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:27,642-Speed 9435.69 samples/sec   Loss 6.3959   LearningRate 0.0409   Epoch: 7   Global Step: 120210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:28,742-Speed 9320.27 samples/sec   Loss 6.4833   LearningRate 0.0409   Epoch: 7   Global Step: 120220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:29,796-Speed 9719.67 samples/sec   Loss 6.3904   LearningRate 0.0409   Epoch: 7   Global Step: 120230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:30,847-Speed 9746.09 samples/sec   Loss 6.4881   LearningRate 0.0409   Epoch: 7   Global Step: 120240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:31,935-Speed 9413.65 samples/sec   Loss 6.4551   LearningRate 0.0409   Epoch: 7   Global Step: 120250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:33,034-Speed 9323.83 samples/sec   Loss 6.3560   LearningRate 0.0409   Epoch: 7   Global Step: 120260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:34,110-Speed 9520.82 samples/sec   Loss 6.3806   LearningRate 0.0409   Epoch: 7   Global Step: 120270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:35,189-Speed 9494.52 samples/sec   Loss 6.4757   LearningRate 0.0409   Epoch: 7   Global Step: 120280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:36,292-Speed 9293.16 samples/sec   Loss 6.3807   LearningRate 0.0409   Epoch: 7   Global Step: 120290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:37,358-Speed 9609.30 samples/sec   Loss 6.3786   LearningRate 0.0409   Epoch: 7   Global Step: 120300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:38,456-Speed 9336.89 samples/sec   Loss 6.3815   LearningRate 0.0409   Epoch: 7   Global Step: 120310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:39,538-Speed 9463.41 samples/sec   Loss 6.4980   LearningRate 0.0409   Epoch: 7   Global Step: 120320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:40,644-Speed 9266.72 samples/sec   Loss 6.4125   LearningRate 0.0409   Epoch: 7   Global Step: 120330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:41,742-Speed 9331.69 samples/sec   Loss 6.5060   LearningRate 0.0409   Epoch: 7   Global Step: 120340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:42,835-Speed 9380.13 samples/sec   Loss 6.5995   LearningRate 0.0409   Epoch: 7   Global Step: 120350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:43,941-Speed 9265.20 samples/sec   Loss 6.6226   LearningRate 0.0409   Epoch: 7   Global Step: 120360   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:44,973-Speed 9927.13 samples/sec   Loss 6.4240   LearningRate 0.0409   Epoch: 7   Global Step: 120370   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:46,028-Speed 9710.80 samples/sec   Loss 6.5369   LearningRate 0.0409   Epoch: 7   Global Step: 120380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:47,090-Speed 9649.76 samples/sec   Loss 6.5783   LearningRate 0.0409   Epoch: 7   Global Step: 120390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:48,216-Speed 9099.42 samples/sec   Loss 6.4293   LearningRate 0.0409   Epoch: 7   Global Step: 120400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:49,281-Speed 9619.62 samples/sec   Loss 6.4631   LearningRate 0.0409   Epoch: 7   Global Step: 120410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:50,376-Speed 9358.45 samples/sec   Loss 6.5838   LearningRate 0.0409   Epoch: 7   Global Step: 120420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:51,438-Speed 9648.91 samples/sec   Loss 6.3614   LearningRate 0.0409   Epoch: 7   Global Step: 120430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:52,510-Speed 9558.99 samples/sec   Loss 6.5414   LearningRate 0.0409   Epoch: 7   Global Step: 120440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:53,596-Speed 9427.51 samples/sec   Loss 6.4709   LearningRate 0.0409   Epoch: 7   Global Step: 120450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:54,651-Speed 9725.85 samples/sec   Loss 6.4357   LearningRate 0.0409   Epoch: 7   Global Step: 120460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:55,729-Speed 9503.99 samples/sec   Loss 6.5114   LearningRate 0.0408   Epoch: 7   Global Step: 120470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:23:56,854-Speed 9111.36 samples/sec   Loss 6.4435   LearningRate 0.0408   Epoch: 7   Global Step: 120480   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:57,984-Speed 9062.32 samples/sec   Loss 6.4887   LearningRate 0.0408   Epoch: 7   Global Step: 120490   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:23:59,040-Speed 9704.29 samples/sec   Loss 6.5130   LearningRate 0.0408   Epoch: 7   Global Step: 120500   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:00,126-Speed 9435.46 samples/sec   Loss 6.3846   LearningRate 0.0408   Epoch: 7   Global Step: 120510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:01,236-Speed 9237.64 samples/sec   Loss 6.3904   LearningRate 0.0408   Epoch: 7   Global Step: 120520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:02,311-Speed 9527.26 samples/sec   Loss 6.3811   LearningRate 0.0408   Epoch: 7   Global Step: 120530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:03,407-Speed 9349.70 samples/sec   Loss 6.5282   LearningRate 0.0408   Epoch: 7   Global Step: 120540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:04,493-Speed 9431.41 samples/sec   Loss 6.5416   LearningRate 0.0408   Epoch: 7   Global Step: 120550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:05,579-Speed 9437.82 samples/sec   Loss 6.5517   LearningRate 0.0408   Epoch: 7   Global Step: 120560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:06,644-Speed 9621.98 samples/sec   Loss 6.5429   LearningRate 0.0408   Epoch: 7   Global Step: 120570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:07,706-Speed 9651.80 samples/sec   Loss 6.4328   LearningRate 0.0408   Epoch: 7   Global Step: 120580   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:24:08,791-Speed 9446.13 samples/sec   Loss 6.4605   LearningRate 0.0408   Epoch: 7   Global Step: 120590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:09,856-Speed 9613.32 samples/sec   Loss 6.4903   LearningRate 0.0408   Epoch: 7   Global Step: 120600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:10,974-Speed 9170.55 samples/sec   Loss 6.4515   LearningRate 0.0408   Epoch: 7   Global Step: 120610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:12,049-Speed 9532.02 samples/sec   Loss 6.4747   LearningRate 0.0408   Epoch: 7   Global Step: 120620   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:13,262-Speed 8443.99 samples/sec   Loss 6.5174   LearningRate 0.0408   Epoch: 7   Global Step: 120630   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:14,323-Speed 9659.08 samples/sec   Loss 6.5681   LearningRate 0.0408   Epoch: 7   Global Step: 120640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:15,456-Speed 9037.75 samples/sec   Loss 6.5375   LearningRate 0.0408   Epoch: 7   Global Step: 120650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:16,575-Speed 9155.08 samples/sec   Loss 6.4409   LearningRate 0.0408   Epoch: 7   Global Step: 120660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:17,693-Speed 9178.90 samples/sec   Loss 6.4354   LearningRate 0.0408   Epoch: 7   Global Step: 120670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:18,808-Speed 9190.69 samples/sec   Loss 6.5075   LearningRate 0.0408   Epoch: 7   Global Step: 120680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:19,909-Speed 9302.88 samples/sec   Loss 6.4860   LearningRate 0.0408   Epoch: 7   Global Step: 120690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:20,997-Speed 9421.77 samples/sec   Loss 6.4224   LearningRate 0.0408   Epoch: 7   Global Step: 120700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:22,073-Speed 9521.60 samples/sec   Loss 6.3321   LearningRate 0.0408   Epoch: 7   Global Step: 120710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:23,149-Speed 9516.10 samples/sec   Loss 6.4761   LearningRate 0.0408   Epoch: 7   Global Step: 120720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:24,227-Speed 9506.57 samples/sec   Loss 6.5198   LearningRate 0.0407   Epoch: 7   Global Step: 120730   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:25,333-Speed 9265.42 samples/sec   Loss 6.4075   LearningRate 0.0407   Epoch: 7   Global Step: 120740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:26,455-Speed 9129.86 samples/sec   Loss 6.4240   LearningRate 0.0407   Epoch: 7   Global Step: 120750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:27,544-Speed 9410.54 samples/sec   Loss 6.4888   LearningRate 0.0407   Epoch: 7   Global Step: 120760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:28,640-Speed 9349.77 samples/sec   Loss 6.6073   LearningRate 0.0407   Epoch: 7   Global Step: 120770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:29,774-Speed 9032.11 samples/sec   Loss 6.5501   LearningRate 0.0407   Epoch: 7   Global Step: 120780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:30,899-Speed 9111.80 samples/sec   Loss 6.4453   LearningRate 0.0407   Epoch: 7   Global Step: 120790   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:31,984-Speed 9438.02 samples/sec   Loss 6.4164   LearningRate 0.0407   Epoch: 7   Global Step: 120800   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:33,065-Speed 9478.21 samples/sec   Loss 6.4529   LearningRate 0.0407   Epoch: 7   Global Step: 120810   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:34,148-Speed 9458.20 samples/sec   Loss 6.4259   LearningRate 0.0407   Epoch: 7   Global Step: 120820   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:35,234-Speed 9435.61 samples/sec   Loss 6.4271   LearningRate 0.0407   Epoch: 7   Global Step: 120830   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:36,325-Speed 9400.59 samples/sec   Loss 6.4736   LearningRate 0.0407   Epoch: 7   Global Step: 120840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:37,368-Speed 9820.07 samples/sec   Loss 6.5029   LearningRate 0.0407   Epoch: 7   Global Step: 120850   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:38,423-Speed 9715.25 samples/sec   Loss 6.5033   LearningRate 0.0407   Epoch: 7   Global Step: 120860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:39,517-Speed 9367.06 samples/sec   Loss 6.4450   LearningRate 0.0407   Epoch: 7   Global Step: 120870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:40,650-Speed 9039.18 samples/sec   Loss 6.4152   LearningRate 0.0407   Epoch: 7   Global Step: 120880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:41,737-Speed 9430.40 samples/sec   Loss 6.4181   LearningRate 0.0407   Epoch: 7   Global Step: 120890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:42,843-Speed 9256.52 samples/sec   Loss 6.6182   LearningRate 0.0407   Epoch: 7   Global Step: 120900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:43,955-Speed 9213.72 samples/sec   Loss 6.4560   LearningRate 0.0407   Epoch: 7   Global Step: 120910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:45,071-Speed 9180.64 samples/sec   Loss 6.3953   LearningRate 0.0407   Epoch: 7   Global Step: 120920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:46,124-Speed 9729.94 samples/sec   Loss 6.5601   LearningRate 0.0407   Epoch: 7   Global Step: 120930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:47,193-Speed 9587.87 samples/sec   Loss 6.4537   LearningRate 0.0407   Epoch: 7   Global Step: 120940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:48,283-Speed 9400.13 samples/sec   Loss 6.3464   LearningRate 0.0407   Epoch: 7   Global Step: 120950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:24:49,385-Speed 9302.45 samples/sec   Loss 6.3972   LearningRate 0.0407   Epoch: 7   Global Step: 120960   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:50,448-Speed 9632.99 samples/sec   Loss 6.4448   LearningRate 0.0407   Epoch: 7   Global Step: 120970   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:51,486-Speed 9873.72 samples/sec   Loss 6.4627   LearningRate 0.0407   Epoch: 7   Global Step: 120980   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:52,567-Speed 9477.44 samples/sec   Loss 6.3426   LearningRate 0.0406   Epoch: 7   Global Step: 120990   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:53,652-Speed 9439.03 samples/sec   Loss 6.4352   LearningRate 0.0406   Epoch: 7   Global Step: 121000   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:54,747-Speed 9362.89 samples/sec   Loss 6.4918   LearningRate 0.0406   Epoch: 7   Global Step: 121010   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:55,834-Speed 9430.21 samples/sec   Loss 6.4170   LearningRate 0.0406   Epoch: 7   Global Step: 121020   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:56,969-Speed 9030.74 samples/sec   Loss 6.4241   LearningRate 0.0406   Epoch: 7   Global Step: 121030   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:58,043-Speed 9535.90 samples/sec   Loss 6.5042   LearningRate 0.0406   Epoch: 7   Global Step: 121040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:24:59,197-Speed 8876.11 samples/sec   Loss 6.5622   LearningRate 0.0406   Epoch: 7   Global Step: 121050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:00,281-Speed 9455.96 samples/sec   Loss 6.5223   LearningRate 0.0406   Epoch: 7   Global Step: 121060   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:25:01,353-Speed 9551.48 samples/sec   Loss 6.4021   LearningRate 0.0406   Epoch: 7   Global Step: 121070   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:02,438-Speed 9444.58 samples/sec   Loss 6.5123   LearningRate 0.0406   Epoch: 7   Global Step: 121080   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:03,501-Speed 9638.71 samples/sec   Loss 6.4460   LearningRate 0.0406   Epoch: 7   Global Step: 121090   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:04,564-Speed 9638.28 samples/sec   Loss 6.5339   LearningRate 0.0406   Epoch: 7   Global Step: 121100   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:05,676-Speed 9214.47 samples/sec   Loss 6.5030   LearningRate 0.0406   Epoch: 7   Global Step: 121110   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:06,752-Speed 9528.22 samples/sec   Loss 6.5477   LearningRate 0.0406   Epoch: 7   Global Step: 121120   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:07,870-Speed 9163.76 samples/sec   Loss 6.4486   LearningRate 0.0406   Epoch: 7   Global Step: 121130   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:08,974-Speed 9281.96 samples/sec   Loss 6.4347   LearningRate 0.0406   Epoch: 7   Global Step: 121140   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:10,101-Speed 9088.76 samples/sec   Loss 6.5506   LearningRate 0.0406   Epoch: 7   Global Step: 121150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:11,214-Speed 9203.32 samples/sec   Loss 6.5166   LearningRate 0.0406   Epoch: 7   Global Step: 121160   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:12,281-Speed 9604.36 samples/sec   Loss 6.4564   LearningRate 0.0406   Epoch: 7   Global Step: 121170   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:25:13,381-Speed 9310.47 samples/sec   Loss 6.5135   LearningRate 0.0406   Epoch: 7   Global Step: 121180   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:25:14,455-Speed 9545.07 samples/sec   Loss 6.4984   LearningRate 0.0406   Epoch: 7   Global Step: 121190   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:25:15,611-Speed 8862.57 samples/sec   Loss 6.5316   LearningRate 0.0406   Epoch: 7   Global Step: 121200   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:25:16,718-Speed 9254.55 samples/sec   Loss 6.4868   LearningRate 0.0406   Epoch: 7   Global Step: 121210   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:17,791-Speed 9555.42 samples/sec   Loss 6.4603   LearningRate 0.0406   Epoch: 7   Global Step: 121220   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:18,868-Speed 9515.84 samples/sec   Loss 6.5713   LearningRate 0.0406   Epoch: 7   Global Step: 121230   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:19,979-Speed 9219.86 samples/sec   Loss 6.4506   LearningRate 0.0406   Epoch: 7   Global Step: 121240   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:21,062-Speed 9460.70 samples/sec   Loss 6.5452   LearningRate 0.0405   Epoch: 7   Global Step: 121250   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:22,160-Speed 9327.63 samples/sec   Loss 6.5122   LearningRate 0.0405   Epoch: 7   Global Step: 121260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:23,287-Speed 9094.82 samples/sec   Loss 6.5032   LearningRate 0.0405   Epoch: 7   Global Step: 121270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:24,336-Speed 9766.55 samples/sec   Loss 6.4831   LearningRate 0.0405   Epoch: 7   Global Step: 121280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:25,405-Speed 9583.62 samples/sec   Loss 6.3698   LearningRate 0.0405   Epoch: 7   Global Step: 121290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:26,469-Speed 9634.80 samples/sec   Loss 6.4331   LearningRate 0.0405   Epoch: 7   Global Step: 121300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:27,598-Speed 9071.84 samples/sec   Loss 6.4766   LearningRate 0.0405   Epoch: 7   Global Step: 121310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:28,720-Speed 9130.22 samples/sec   Loss 6.5756   LearningRate 0.0405   Epoch: 7   Global Step: 121320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:29,829-Speed 9238.21 samples/sec   Loss 6.4370   LearningRate 0.0405   Epoch: 7   Global Step: 121330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:30,893-Speed 9632.38 samples/sec   Loss 6.5788   LearningRate 0.0405   Epoch: 7   Global Step: 121340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:32,010-Speed 9170.15 samples/sec   Loss 6.5206   LearningRate 0.0405   Epoch: 7   Global Step: 121350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:33,107-Speed 9340.05 samples/sec   Loss 6.5682   LearningRate 0.0405   Epoch: 7   Global Step: 121360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:34,189-Speed 9474.38 samples/sec   Loss 6.5275   LearningRate 0.0405   Epoch: 7   Global Step: 121370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:35,270-Speed 9474.44 samples/sec   Loss 6.5572   LearningRate 0.0405   Epoch: 7   Global Step: 121380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:36,332-Speed 9648.64 samples/sec   Loss 6.5387   LearningRate 0.0405   Epoch: 7   Global Step: 121390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:37,443-Speed 9222.26 samples/sec   Loss 6.4652   LearningRate 0.0405   Epoch: 7   Global Step: 121400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:38,519-Speed 9521.48 samples/sec   Loss 6.5129   LearningRate 0.0405   Epoch: 7   Global Step: 121410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:39,592-Speed 9553.17 samples/sec   Loss 6.5054   LearningRate 0.0405   Epoch: 7   Global Step: 121420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:40,686-Speed 9365.18 samples/sec   Loss 6.4465   LearningRate 0.0405   Epoch: 7   Global Step: 121430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:41,764-Speed 9501.89 samples/sec   Loss 6.4223   LearningRate 0.0405   Epoch: 7   Global Step: 121440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:42,851-Speed 9425.18 samples/sec   Loss 6.5047   LearningRate 0.0405   Epoch: 7   Global Step: 121450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:43,936-Speed 9448.07 samples/sec   Loss 6.5423   LearningRate 0.0405   Epoch: 7   Global Step: 121460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:45,011-Speed 9525.29 samples/sec   Loss 6.4946   LearningRate 0.0405   Epoch: 7   Global Step: 121470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:46,083-Speed 9556.43 samples/sec   Loss 6.4584   LearningRate 0.0405   Epoch: 7   Global Step: 121480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:47,134-Speed 9753.25 samples/sec   Loss 6.4587   LearningRate 0.0405   Epoch: 7   Global Step: 121490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:48,201-Speed 9600.11 samples/sec   Loss 6.4454   LearningRate 0.0405   Epoch: 7   Global Step: 121500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:25:49,334-Speed 9048.68 samples/sec   Loss 6.5335   LearningRate 0.0404   Epoch: 7   Global Step: 121510   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:50,434-Speed 9309.44 samples/sec   Loss 6.5404   LearningRate 0.0404   Epoch: 7   Global Step: 121520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:51,545-Speed 9227.13 samples/sec   Loss 6.3990   LearningRate 0.0404   Epoch: 7   Global Step: 121530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:52,655-Speed 9226.15 samples/sec   Loss 6.5067   LearningRate 0.0404   Epoch: 7   Global Step: 121540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:53,776-Speed 9145.65 samples/sec   Loss 6.5649   LearningRate 0.0404   Epoch: 7   Global Step: 121550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:54,894-Speed 9163.17 samples/sec   Loss 6.5555   LearningRate 0.0404   Epoch: 7   Global Step: 121560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:56,061-Speed 8781.37 samples/sec   Loss 6.4425   LearningRate 0.0404   Epoch: 7   Global Step: 121570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:57,126-Speed 9625.46 samples/sec   Loss 6.4474   LearningRate 0.0404   Epoch: 7   Global Step: 121580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:58,225-Speed 9318.17 samples/sec   Loss 6.4515   LearningRate 0.0404   Epoch: 7   Global Step: 121590   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:25:59,313-Speed 9418.44 samples/sec   Loss 6.5039   LearningRate 0.0404   Epoch: 7   Global Step: 121600   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:00,366-Speed 9732.58 samples/sec   Loss 6.6260   LearningRate 0.0404   Epoch: 7   Global Step: 121610   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:01,440-Speed 9539.95 samples/sec   Loss 6.4964   LearningRate 0.0404   Epoch: 7   Global Step: 121620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:02,508-Speed 9588.85 samples/sec   Loss 6.4841   LearningRate 0.0404   Epoch: 7   Global Step: 121630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:03,585-Speed 9512.80 samples/sec   Loss 6.5179   LearningRate 0.0404   Epoch: 7   Global Step: 121640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:04,658-Speed 9548.82 samples/sec   Loss 6.4607   LearningRate 0.0404   Epoch: 7   Global Step: 121650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:05,729-Speed 9562.26 samples/sec   Loss 6.5231   LearningRate 0.0404   Epoch: 7   Global Step: 121660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:06,819-Speed 9405.18 samples/sec   Loss 6.5528   LearningRate 0.0404   Epoch: 7   Global Step: 121670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:07,894-Speed 9532.48 samples/sec   Loss 6.4923   LearningRate 0.0404   Epoch: 7   Global Step: 121680   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:08,959-Speed 9625.39 samples/sec   Loss 6.4591   LearningRate 0.0404   Epoch: 7   Global Step: 121690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:10,055-Speed 9343.39 samples/sec   Loss 6.5286   LearningRate 0.0404   Epoch: 7   Global Step: 121700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:11,142-Speed 9431.63 samples/sec   Loss 6.4956   LearningRate 0.0404   Epoch: 7   Global Step: 121710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:12,218-Speed 9519.81 samples/sec   Loss 6.4267   LearningRate 0.0404   Epoch: 7   Global Step: 121720   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:13,305-Speed 9425.12 samples/sec   Loss 6.3210   LearningRate 0.0404   Epoch: 7   Global Step: 121730   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:14,363-Speed 9683.21 samples/sec   Loss 6.4732   LearningRate 0.0404   Epoch: 7   Global Step: 121740   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:15,437-Speed 9543.96 samples/sec   Loss 6.5774   LearningRate 0.0404   Epoch: 7   Global Step: 121750   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:16,523-Speed 9433.12 samples/sec   Loss 6.5065   LearningRate 0.0404   Epoch: 7   Global Step: 121760   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:17,603-Speed 9487.71 samples/sec   Loss 6.5532   LearningRate 0.0404   Epoch: 7   Global Step: 121770   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:18,655-Speed 9744.13 samples/sec   Loss 6.4588   LearningRate 0.0403   Epoch: 7   Global Step: 121780   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:19,730-Speed 9529.44 samples/sec   Loss 6.5669   LearningRate 0.0403   Epoch: 7   Global Step: 121790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:20,796-Speed 9611.78 samples/sec   Loss 6.5569   LearningRate 0.0403   Epoch: 7   Global Step: 121800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:21,939-Speed 8959.51 samples/sec   Loss 6.5057   LearningRate 0.0403   Epoch: 7   Global Step: 121810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:22,999-Speed 9669.28 samples/sec   Loss 6.5498   LearningRate 0.0403   Epoch: 7   Global Step: 121820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:24,083-Speed 9460.23 samples/sec   Loss 6.4819   LearningRate 0.0403   Epoch: 7   Global Step: 121830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:25,172-Speed 9403.42 samples/sec   Loss 6.5351   LearningRate 0.0403   Epoch: 7   Global Step: 121840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:26,275-Speed 9292.89 samples/sec   Loss 6.4625   LearningRate 0.0403   Epoch: 7   Global Step: 121850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:27,355-Speed 9485.24 samples/sec   Loss 6.5379   LearningRate 0.0403   Epoch: 7   Global Step: 121860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:28,494-Speed 8989.39 samples/sec   Loss 6.5777   LearningRate 0.0403   Epoch: 7   Global Step: 121870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:29,605-Speed 9224.36 samples/sec   Loss 6.6498   LearningRate 0.0403   Epoch: 7   Global Step: 121880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:30,692-Speed 9433.64 samples/sec   Loss 6.5504   LearningRate 0.0403   Epoch: 7   Global Step: 121890   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:31,762-Speed 9573.77 samples/sec   Loss 6.5686   LearningRate 0.0403   Epoch: 7   Global Step: 121900   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:32,838-Speed 9519.09 samples/sec   Loss 6.5285   LearningRate 0.0403   Epoch: 7   Global Step: 121910   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:33,956-Speed 9162.93 samples/sec   Loss 6.5477   LearningRate 0.0403   Epoch: 7   Global Step: 121920   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:35,071-Speed 9190.76 samples/sec   Loss 6.5085   LearningRate 0.0403   Epoch: 7   Global Step: 121930   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:26:36,159-Speed 9418.39 samples/sec   Loss 6.4652   LearningRate 0.0403   Epoch: 7   Global Step: 121940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:37,219-Speed 9667.34 samples/sec   Loss 6.5217   LearningRate 0.0403   Epoch: 7   Global Step: 121950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:38,316-Speed 9341.13 samples/sec   Loss 6.5353   LearningRate 0.0403   Epoch: 7   Global Step: 121960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:39,415-Speed 9322.62 samples/sec   Loss 6.5033   LearningRate 0.0403   Epoch: 7   Global Step: 121970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:40,474-Speed 9677.23 samples/sec   Loss 6.4854   LearningRate 0.0403   Epoch: 7   Global Step: 121980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:41,554-Speed 9482.59 samples/sec   Loss 6.5213   LearningRate 0.0403   Epoch: 7   Global Step: 121990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:26:42,642-Speed 9415.50 samples/sec   Loss 6.5202   LearningRate 0.0403   Epoch: 7   Global Step: 122000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:27:04,539-[lfw][122000]XNorm: 10.660572
Training: 2022-04-11 16:27:04,539-[lfw][122000]Accuracy-Flip: 0.99550+-0.00269
Training: 2022-04-11 16:27:04,540-[lfw][122000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:27:29,860-[cfp_fp][122000]XNorm: 9.040363
Training: 2022-04-11 16:27:29,860-[cfp_fp][122000]Accuracy-Flip: 0.95871+-0.00918
Training: 2022-04-11 16:27:29,861-[cfp_fp][122000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:27:51,765-[agedb_30][122000]XNorm: 10.287548
Training: 2022-04-11 16:27:51,766-[agedb_30][122000]Accuracy-Flip: 0.96333+-0.00940
Training: 2022-04-11 16:27:51,766-[agedb_30][122000]Accuracy-Highest: 0.96483
Training: 2022-04-11 16:27:52,885-Speed 145.78 samples/sec   Loss 6.5032   LearningRate 0.0403   Epoch: 7   Global Step: 122010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:27:53,994-Speed 9237.17 samples/sec   Loss 6.5450   LearningRate 0.0403   Epoch: 7   Global Step: 122020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:27:55,065-Speed 9568.28 samples/sec   Loss 6.5814   LearningRate 0.0403   Epoch: 7   Global Step: 122030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:27:56,151-Speed 9430.48 samples/sec   Loss 6.6019   LearningRate 0.0402   Epoch: 7   Global Step: 122040   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:27:57,263-Speed 9213.22 samples/sec   Loss 6.5080   LearningRate 0.0402   Epoch: 7   Global Step: 122050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:27:58,336-Speed 9548.13 samples/sec   Loss 6.4860   LearningRate 0.0402   Epoch: 7   Global Step: 122060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:27:59,396-Speed 9665.04 samples/sec   Loss 6.5643   LearningRate 0.0402   Epoch: 7   Global Step: 122070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:00,495-Speed 9324.08 samples/sec   Loss 6.5595   LearningRate 0.0402   Epoch: 7   Global Step: 122080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:01,571-Speed 9527.52 samples/sec   Loss 6.4822   LearningRate 0.0402   Epoch: 7   Global Step: 122090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:02,624-Speed 9721.85 samples/sec   Loss 6.4852   LearningRate 0.0402   Epoch: 7   Global Step: 122100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:03,726-Speed 9302.35 samples/sec   Loss 6.5438   LearningRate 0.0402   Epoch: 7   Global Step: 122110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:04,770-Speed 9816.77 samples/sec   Loss 6.4528   LearningRate 0.0402   Epoch: 7   Global Step: 122120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:05,885-Speed 9185.48 samples/sec   Loss 6.5698   LearningRate 0.0402   Epoch: 7   Global Step: 122130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:06,989-Speed 9282.68 samples/sec   Loss 6.5847   LearningRate 0.0402   Epoch: 7   Global Step: 122140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:08,068-Speed 9490.01 samples/sec   Loss 6.4855   LearningRate 0.0402   Epoch: 7   Global Step: 122150   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:09,144-Speed 9527.33 samples/sec   Loss 6.5479   LearningRate 0.0402   Epoch: 7   Global Step: 122160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:10,203-Speed 9672.99 samples/sec   Loss 6.5129   LearningRate 0.0402   Epoch: 7   Global Step: 122170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:11,314-Speed 9227.26 samples/sec   Loss 6.4850   LearningRate 0.0402   Epoch: 7   Global Step: 122180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:12,433-Speed 9152.60 samples/sec   Loss 6.4530   LearningRate 0.0402   Epoch: 7   Global Step: 122190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:13,540-Speed 9253.22 samples/sec   Loss 6.5545   LearningRate 0.0402   Epoch: 7   Global Step: 122200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:14,636-Speed 9347.17 samples/sec   Loss 6.5366   LearningRate 0.0402   Epoch: 7   Global Step: 122210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:15,743-Speed 9261.47 samples/sec   Loss 6.5828   LearningRate 0.0402   Epoch: 7   Global Step: 122220   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:16,851-Speed 9245.26 samples/sec   Loss 6.5155   LearningRate 0.0402   Epoch: 7   Global Step: 122230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:17,937-Speed 9435.11 samples/sec   Loss 6.5881   LearningRate 0.0402   Epoch: 7   Global Step: 122240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:19,006-Speed 9589.04 samples/sec   Loss 6.6002   LearningRate 0.0402   Epoch: 7   Global Step: 122250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:20,058-Speed 9733.87 samples/sec   Loss 6.5550   LearningRate 0.0402   Epoch: 7   Global Step: 122260   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:21,138-Speed 9491.12 samples/sec   Loss 6.5695   LearningRate 0.0402   Epoch: 7   Global Step: 122270   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:22,254-Speed 9176.80 samples/sec   Loss 6.4598   LearningRate 0.0402   Epoch: 7   Global Step: 122280   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:23,328-Speed 9543.54 samples/sec   Loss 6.5424   LearningRate 0.0402   Epoch: 7   Global Step: 122290   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:24,399-Speed 9570.17 samples/sec   Loss 6.6100   LearningRate 0.0401   Epoch: 7   Global Step: 122300   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:25,464-Speed 9617.25 samples/sec   Loss 6.5091   LearningRate 0.0401   Epoch: 7   Global Step: 122310   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:26,539-Speed 9528.72 samples/sec   Loss 6.5990   LearningRate 0.0401   Epoch: 7   Global Step: 122320   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:27,606-Speed 9601.99 samples/sec   Loss 6.5473   LearningRate 0.0401   Epoch: 7   Global Step: 122330   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:28,690-Speed 9461.30 samples/sec   Loss 6.6365   LearningRate 0.0401   Epoch: 7   Global Step: 122340   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:29,753-Speed 9634.51 samples/sec   Loss 6.6173   LearningRate 0.0401   Epoch: 7   Global Step: 122350   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:30,818-Speed 9622.23 samples/sec   Loss 6.6267   LearningRate 0.0401   Epoch: 7   Global Step: 122360   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:28:31,920-Speed 9296.94 samples/sec   Loss 6.6180   LearningRate 0.0401   Epoch: 7   Global Step: 122370   Fp16 Grad Scale: 262144   Required: 9 hours
Training: 2022-04-11 16:28:33,026-Speed 9257.70 samples/sec   Loss 6.4570   LearningRate 0.0401   Epoch: 7   Global Step: 122380   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:34,106-Speed 9487.97 samples/sec   Loss 6.4763   LearningRate 0.0401   Epoch: 7   Global Step: 122390   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:35,186-Speed 9488.97 samples/sec   Loss 6.5750   LearningRate 0.0401   Epoch: 7   Global Step: 122400   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:36,266-Speed 9487.62 samples/sec   Loss 6.5296   LearningRate 0.0401   Epoch: 7   Global Step: 122410   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:37,351-Speed 9442.24 samples/sec   Loss 6.5111   LearningRate 0.0401   Epoch: 7   Global Step: 122420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:38,480-Speed 9070.40 samples/sec   Loss 6.6951   LearningRate 0.0401   Epoch: 7   Global Step: 122430   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:39,576-Speed 9354.09 samples/sec   Loss 6.4271   LearningRate 0.0401   Epoch: 7   Global Step: 122440   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:40,661-Speed 9441.48 samples/sec   Loss 6.4723   LearningRate 0.0401   Epoch: 7   Global Step: 122450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:41,743-Speed 9472.62 samples/sec   Loss 6.4929   LearningRate 0.0401   Epoch: 7   Global Step: 122460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:42,776-Speed 9923.34 samples/sec   Loss 6.5448   LearningRate 0.0401   Epoch: 7   Global Step: 122470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:43,920-Speed 8950.82 samples/sec   Loss 6.5456   LearningRate 0.0401   Epoch: 7   Global Step: 122480   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:44,975-Speed 9719.19 samples/sec   Loss 6.5302   LearningRate 0.0401   Epoch: 7   Global Step: 122490   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:46,101-Speed 9096.63 samples/sec   Loss 6.5380   LearningRate 0.0401   Epoch: 7   Global Step: 122500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:47,161-Speed 9663.78 samples/sec   Loss 6.5016   LearningRate 0.0401   Epoch: 7   Global Step: 122510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:48,242-Speed 9483.53 samples/sec   Loss 6.4963   LearningRate 0.0401   Epoch: 7   Global Step: 122520   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:49,319-Speed 9509.48 samples/sec   Loss 6.6372   LearningRate 0.0401   Epoch: 7   Global Step: 122530   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:50,421-Speed 9299.01 samples/sec   Loss 6.4977   LearningRate 0.0401   Epoch: 7   Global Step: 122540   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:51,525-Speed 9279.08 samples/sec   Loss 6.6260   LearningRate 0.0401   Epoch: 7   Global Step: 122550   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:52,635-Speed 9233.26 samples/sec   Loss 6.5639   LearningRate 0.0401   Epoch: 7   Global Step: 122560   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:53,747-Speed 9210.68 samples/sec   Loss 6.5227   LearningRate 0.0400   Epoch: 7   Global Step: 122570   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:54,838-Speed 9390.69 samples/sec   Loss 6.5132   LearningRate 0.0400   Epoch: 7   Global Step: 122580   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-11 16:28:55,942-Speed 9283.75 samples/sec   Loss 6.5838   LearningRate 0.0400   Epoch: 7   Global Step: 122590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:57,072-Speed 9063.21 samples/sec   Loss 6.4262   LearningRate 0.0400   Epoch: 7   Global Step: 122600   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:58,131-Speed 9675.39 samples/sec   Loss 6.5860   LearningRate 0.0400   Epoch: 7   Global Step: 122610   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:28:59,239-Speed 9251.09 samples/sec   Loss 6.6457   LearningRate 0.0400   Epoch: 7   Global Step: 122620   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:29:00,289-Speed 9754.83 samples/sec   Loss 6.4529   LearningRate 0.0400   Epoch: 7   Global Step: 122630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:29:01,362-Speed 9554.19 samples/sec   Loss 6.5223   LearningRate 0.0400   Epoch: 7   Global Step: 122640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-11 16:29:02,459-Speed 9334.51 samples/sec   Loss 6.5923   LearningRate 0.0400   Epoch: 7   Global Step: 122650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:03,518-Speed 9676.95 samples/sec   Loss 6.4616   LearningRate 0.0400   Epoch: 7   Global Step: 122660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:04,644-Speed 9103.55 samples/sec   Loss 6.6116   LearningRate 0.0400   Epoch: 7   Global Step: 122670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:05,743-Speed 9319.55 samples/sec   Loss 6.5536   LearningRate 0.0400   Epoch: 7   Global Step: 122680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:06,782-Speed 9860.89 samples/sec   Loss 6.5345   LearningRate 0.0400   Epoch: 7   Global Step: 122690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:07,868-Speed 9430.34 samples/sec   Loss 6.4828   LearningRate 0.0400   Epoch: 7   Global Step: 122700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:08,944-Speed 9528.91 samples/sec   Loss 6.5307   LearningRate 0.0400   Epoch: 7   Global Step: 122710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:10,005-Speed 9657.47 samples/sec   Loss 6.4534   LearningRate 0.0400   Epoch: 7   Global Step: 122720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:11,072-Speed 9604.21 samples/sec   Loss 6.6295   LearningRate 0.0400   Epoch: 7   Global Step: 122730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:12,125-Speed 9731.48 samples/sec   Loss 6.5016   LearningRate 0.0400   Epoch: 7   Global Step: 122740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:13,227-Speed 9290.91 samples/sec   Loss 6.5776   LearningRate 0.0400   Epoch: 7   Global Step: 122750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:14,323-Speed 9352.91 samples/sec   Loss 6.4926   LearningRate 0.0400   Epoch: 7   Global Step: 122760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:15,385-Speed 9647.85 samples/sec   Loss 6.4592   LearningRate 0.0400   Epoch: 7   Global Step: 122770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:16,473-Speed 9418.03 samples/sec   Loss 6.4189   LearningRate 0.0400   Epoch: 7   Global Step: 122780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:17,555-Speed 9468.98 samples/sec   Loss 6.5401   LearningRate 0.0400   Epoch: 7   Global Step: 122790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:18,706-Speed 8904.33 samples/sec   Loss 6.6226   LearningRate 0.0400   Epoch: 7   Global Step: 122800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:19,787-Speed 9478.65 samples/sec   Loss 6.3900   LearningRate 0.0400   Epoch: 7   Global Step: 122810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:20,870-Speed 9465.62 samples/sec   Loss 6.4805   LearningRate 0.0400   Epoch: 7   Global Step: 122820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:21,952-Speed 9466.67 samples/sec   Loss 6.5416   LearningRate 0.0399   Epoch: 7   Global Step: 122830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:23,030-Speed 9507.83 samples/sec   Loss 6.4673   LearningRate 0.0399   Epoch: 7   Global Step: 122840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:24,117-Speed 9426.55 samples/sec   Loss 6.6004   LearningRate 0.0399   Epoch: 7   Global Step: 122850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:25,197-Speed 9487.39 samples/sec   Loss 6.4715   LearningRate 0.0399   Epoch: 7   Global Step: 122860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:26,283-Speed 9427.75 samples/sec   Loss 6.5834   LearningRate 0.0399   Epoch: 7   Global Step: 122870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:27,355-Speed 9556.65 samples/sec   Loss 6.5584   LearningRate 0.0399   Epoch: 7   Global Step: 122880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:28,436-Speed 9486.43 samples/sec   Loss 6.5602   LearningRate 0.0399   Epoch: 7   Global Step: 122890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:29,497-Speed 9657.37 samples/sec   Loss 6.4873   LearningRate 0.0399   Epoch: 7   Global Step: 122900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:30,582-Speed 9442.03 samples/sec   Loss 6.5058   LearningRate 0.0399   Epoch: 7   Global Step: 122910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:31,661-Speed 9491.95 samples/sec   Loss 6.3931   LearningRate 0.0399   Epoch: 7   Global Step: 122920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:32,767-Speed 9268.55 samples/sec   Loss 6.5044   LearningRate 0.0399   Epoch: 7   Global Step: 122930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:33,884-Speed 9171.63 samples/sec   Loss 6.5028   LearningRate 0.0399   Epoch: 7   Global Step: 122940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:35,010-Speed 9098.72 samples/sec   Loss 6.5719   LearningRate 0.0399   Epoch: 7   Global Step: 122950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:36,134-Speed 9113.92 samples/sec   Loss 6.4882   LearningRate 0.0399   Epoch: 7   Global Step: 122960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:37,215-Speed 9475.58 samples/sec   Loss 6.4887   LearningRate 0.0399   Epoch: 7   Global Step: 122970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:38,318-Speed 9289.13 samples/sec   Loss 6.5579   LearningRate 0.0399   Epoch: 7   Global Step: 122980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:39,398-Speed 9497.07 samples/sec   Loss 6.4763   LearningRate 0.0399   Epoch: 7   Global Step: 122990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:40,452-Speed 9722.19 samples/sec   Loss 6.6094   LearningRate 0.0399   Epoch: 7   Global Step: 123000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:41,535-Speed 9461.89 samples/sec   Loss 6.5263   LearningRate 0.0399   Epoch: 7   Global Step: 123010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:42,671-Speed 9014.89 samples/sec   Loss 6.5015   LearningRate 0.0399   Epoch: 7   Global Step: 123020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:43,754-Speed 9463.84 samples/sec   Loss 6.5830   LearningRate 0.0399   Epoch: 7   Global Step: 123030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:44,834-Speed 9508.91 samples/sec   Loss 6.5656   LearningRate 0.0399   Epoch: 7   Global Step: 123040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:45,951-Speed 9169.65 samples/sec   Loss 6.5477   LearningRate 0.0399   Epoch: 7   Global Step: 123050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:47,049-Speed 9330.32 samples/sec   Loss 6.5126   LearningRate 0.0399   Epoch: 7   Global Step: 123060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:48,136-Speed 9424.10 samples/sec   Loss 6.5870   LearningRate 0.0399   Epoch: 7   Global Step: 123070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:49,215-Speed 9495.72 samples/sec   Loss 6.5656   LearningRate 0.0399   Epoch: 7   Global Step: 123080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:29:50,249-Speed 9912.04 samples/sec   Loss 6.5867   LearningRate 0.0398   Epoch: 7   Global Step: 123090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:51,363-Speed 9195.64 samples/sec   Loss 6.5758   LearningRate 0.0398   Epoch: 7   Global Step: 123100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:52,427-Speed 9631.77 samples/sec   Loss 6.5964   LearningRate 0.0398   Epoch: 7   Global Step: 123110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:53,509-Speed 9470.13 samples/sec   Loss 6.5065   LearningRate 0.0398   Epoch: 7   Global Step: 123120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:54,624-Speed 9185.15 samples/sec   Loss 6.5387   LearningRate 0.0398   Epoch: 7   Global Step: 123130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:55,726-Speed 9298.71 samples/sec   Loss 6.5574   LearningRate 0.0398   Epoch: 7   Global Step: 123140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:56,836-Speed 9233.63 samples/sec   Loss 6.6142   LearningRate 0.0398   Epoch: 7   Global Step: 123150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:57,903-Speed 9600.12 samples/sec   Loss 6.5257   LearningRate 0.0398   Epoch: 7   Global Step: 123160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:29:58,990-Speed 9426.45 samples/sec   Loss 6.5607   LearningRate 0.0398   Epoch: 7   Global Step: 123170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:00,085-Speed 9362.62 samples/sec   Loss 6.5663   LearningRate 0.0398   Epoch: 7   Global Step: 123180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:01,170-Speed 9443.49 samples/sec   Loss 6.4482   LearningRate 0.0398   Epoch: 7   Global Step: 123190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:02,282-Speed 9209.18 samples/sec   Loss 6.5827   LearningRate 0.0398   Epoch: 7   Global Step: 123200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:03,412-Speed 9078.01 samples/sec   Loss 6.5302   LearningRate 0.0398   Epoch: 7   Global Step: 123210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:04,499-Speed 9427.42 samples/sec   Loss 6.5050   LearningRate 0.0398   Epoch: 7   Global Step: 123220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:05,569-Speed 9575.27 samples/sec   Loss 6.6010   LearningRate 0.0398   Epoch: 7   Global Step: 123230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:06,663-Speed 9363.30 samples/sec   Loss 6.5421   LearningRate 0.0398   Epoch: 7   Global Step: 123240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:07,751-Speed 9416.66 samples/sec   Loss 6.5820   LearningRate 0.0398   Epoch: 7   Global Step: 123250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:08,820-Speed 9586.32 samples/sec   Loss 6.4791   LearningRate 0.0398   Epoch: 7   Global Step: 123260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:09,909-Speed 9412.53 samples/sec   Loss 6.5704   LearningRate 0.0398   Epoch: 7   Global Step: 123270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:10,993-Speed 9451.90 samples/sec   Loss 6.5039   LearningRate 0.0398   Epoch: 7   Global Step: 123280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:12,078-Speed 9437.88 samples/sec   Loss 6.4779   LearningRate 0.0398   Epoch: 7   Global Step: 123290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:13,191-Speed 9204.58 samples/sec   Loss 6.4704   LearningRate 0.0398   Epoch: 7   Global Step: 123300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:14,229-Speed 9874.48 samples/sec   Loss 6.6832   LearningRate 0.0398   Epoch: 7   Global Step: 123310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:15,363-Speed 9035.94 samples/sec   Loss 6.4841   LearningRate 0.0398   Epoch: 7   Global Step: 123320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:16,442-Speed 9503.43 samples/sec   Loss 6.5351   LearningRate 0.0398   Epoch: 7   Global Step: 123330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:17,524-Speed 9463.34 samples/sec   Loss 6.6150   LearningRate 0.0398   Epoch: 7   Global Step: 123340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:18,635-Speed 9221.96 samples/sec   Loss 6.4803   LearningRate 0.0398   Epoch: 7   Global Step: 123350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:19,717-Speed 9470.54 samples/sec   Loss 6.5616   LearningRate 0.0397   Epoch: 7   Global Step: 123360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:20,770-Speed 9735.63 samples/sec   Loss 6.5084   LearningRate 0.0397   Epoch: 7   Global Step: 123370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:21,834-Speed 9624.95 samples/sec   Loss 6.5767   LearningRate 0.0397   Epoch: 7   Global Step: 123380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:22,889-Speed 9713.12 samples/sec   Loss 6.5320   LearningRate 0.0397   Epoch: 7   Global Step: 123390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:24,030-Speed 8979.00 samples/sec   Loss 6.6159   LearningRate 0.0397   Epoch: 7   Global Step: 123400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:25,151-Speed 9140.59 samples/sec   Loss 6.5872   LearningRate 0.0397   Epoch: 7   Global Step: 123410   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:30:26,248-Speed 9342.03 samples/sec   Loss 6.6162   LearningRate 0.0397   Epoch: 7   Global Step: 123420   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:30:27,340-Speed 9386.69 samples/sec   Loss 6.4835   LearningRate 0.0397   Epoch: 7   Global Step: 123430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:28,508-Speed 8767.80 samples/sec   Loss 6.5174   LearningRate 0.0397   Epoch: 7   Global Step: 123440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:29,658-Speed 8910.66 samples/sec   Loss 6.6035   LearningRate 0.0397   Epoch: 7   Global Step: 123450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:30,751-Speed 9371.91 samples/sec   Loss 6.5728   LearningRate 0.0397   Epoch: 7   Global Step: 123460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:31,856-Speed 9275.16 samples/sec   Loss 6.5401   LearningRate 0.0397   Epoch: 7   Global Step: 123470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:32,904-Speed 9784.38 samples/sec   Loss 6.5317   LearningRate 0.0397   Epoch: 7   Global Step: 123480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:33,983-Speed 9495.20 samples/sec   Loss 6.5481   LearningRate 0.0397   Epoch: 7   Global Step: 123490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:35,043-Speed 9670.46 samples/sec   Loss 6.6438   LearningRate 0.0397   Epoch: 7   Global Step: 123500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:36,101-Speed 9682.42 samples/sec   Loss 6.5759   LearningRate 0.0397   Epoch: 7   Global Step: 123510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:37,196-Speed 9355.02 samples/sec   Loss 6.4699   LearningRate 0.0397   Epoch: 7   Global Step: 123520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:38,292-Speed 9350.71 samples/sec   Loss 6.6104   LearningRate 0.0397   Epoch: 7   Global Step: 123530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:39,385-Speed 9375.31 samples/sec   Loss 6.4934   LearningRate 0.0397   Epoch: 7   Global Step: 123540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:40,469-Speed 9443.39 samples/sec   Loss 6.4957   LearningRate 0.0397   Epoch: 7   Global Step: 123550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:41,532-Speed 9646.45 samples/sec   Loss 6.5400   LearningRate 0.0397   Epoch: 7   Global Step: 123560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:42,570-Speed 9869.09 samples/sec   Loss 6.5231   LearningRate 0.0397   Epoch: 7   Global Step: 123570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:43,665-Speed 9355.18 samples/sec   Loss 6.5308   LearningRate 0.0397   Epoch: 7   Global Step: 123580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:44,751-Speed 9439.61 samples/sec   Loss 6.5830   LearningRate 0.0397   Epoch: 7   Global Step: 123590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:45,852-Speed 9301.70 samples/sec   Loss 6.6039   LearningRate 0.0397   Epoch: 7   Global Step: 123600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:46,967-Speed 9195.62 samples/sec   Loss 6.5239   LearningRate 0.0397   Epoch: 7   Global Step: 123610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:48,057-Speed 9393.77 samples/sec   Loss 6.5316   LearningRate 0.0396   Epoch: 7   Global Step: 123620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:49,141-Speed 9452.24 samples/sec   Loss 6.5664   LearningRate 0.0396   Epoch: 7   Global Step: 123630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:50,259-Speed 9167.31 samples/sec   Loss 6.5837   LearningRate 0.0396   Epoch: 7   Global Step: 123640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:51,340-Speed 9477.41 samples/sec   Loss 6.5389   LearningRate 0.0396   Epoch: 7   Global Step: 123650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:52,397-Speed 9696.30 samples/sec   Loss 6.5575   LearningRate 0.0396   Epoch: 7   Global Step: 123660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:53,503-Speed 9273.58 samples/sec   Loss 6.6578   LearningRate 0.0396   Epoch: 7   Global Step: 123670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:54,585-Speed 9467.35 samples/sec   Loss 6.6548   LearningRate 0.0396   Epoch: 7   Global Step: 123680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:55,679-Speed 9367.18 samples/sec   Loss 6.4896   LearningRate 0.0396   Epoch: 7   Global Step: 123690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:56,727-Speed 9776.20 samples/sec   Loss 6.6520   LearningRate 0.0396   Epoch: 7   Global Step: 123700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:30:57,814-Speed 9422.27 samples/sec   Loss 6.5898   LearningRate 0.0396   Epoch: 7   Global Step: 123710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:58,868-Speed 9719.82 samples/sec   Loss 6.4809   LearningRate 0.0396   Epoch: 7   Global Step: 123720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:30:59,937-Speed 9586.94 samples/sec   Loss 6.5838   LearningRate 0.0396   Epoch: 7   Global Step: 123730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:01,057-Speed 9147.43 samples/sec   Loss 6.5948   LearningRate 0.0396   Epoch: 7   Global Step: 123740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:02,182-Speed 9109.32 samples/sec   Loss 6.5581   LearningRate 0.0396   Epoch: 7   Global Step: 123750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:03,334-Speed 8897.00 samples/sec   Loss 6.6211   LearningRate 0.0396   Epoch: 7   Global Step: 123760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:04,407-Speed 9549.01 samples/sec   Loss 6.6005   LearningRate 0.0396   Epoch: 7   Global Step: 123770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:05,501-Speed 9365.06 samples/sec   Loss 6.4855   LearningRate 0.0396   Epoch: 7   Global Step: 123780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:06,579-Speed 9504.81 samples/sec   Loss 6.4023   LearningRate 0.0396   Epoch: 7   Global Step: 123790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:07,678-Speed 9324.25 samples/sec   Loss 6.6141   LearningRate 0.0396   Epoch: 7   Global Step: 123800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:08,778-Speed 9316.06 samples/sec   Loss 6.5406   LearningRate 0.0396   Epoch: 7   Global Step: 123810   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:31:09,892-Speed 9192.25 samples/sec   Loss 6.5048   LearningRate 0.0396   Epoch: 7   Global Step: 123820   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:31:11,016-Speed 9118.37 samples/sec   Loss 6.6199   LearningRate 0.0396   Epoch: 7   Global Step: 123830   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:31:12,106-Speed 9407.19 samples/sec   Loss 6.6005   LearningRate 0.0396   Epoch: 7   Global Step: 123840   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:31:13,185-Speed 9494.32 samples/sec   Loss 6.5720   LearningRate 0.0396   Epoch: 7   Global Step: 123850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:14,296-Speed 9224.23 samples/sec   Loss 6.5770   LearningRate 0.0396   Epoch: 7   Global Step: 123860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:15,351-Speed 9709.60 samples/sec   Loss 6.5134   LearningRate 0.0396   Epoch: 7   Global Step: 123870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:31:16,406-Speed 9709.72 samples/sec   Loss 6.4864   LearningRate 0.0396   Epoch: 7   Global Step: 123880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:31:17,474-Speed 9595.69 samples/sec   Loss 6.6650   LearningRate 0.0395   Epoch: 7   Global Step: 123890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:31:18,541-Speed 9604.50 samples/sec   Loss 6.5742   LearningRate 0.0395   Epoch: 7   Global Step: 123900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:31:19,599-Speed 9684.28 samples/sec   Loss 6.4964   LearningRate 0.0395   Epoch: 7   Global Step: 123910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:31:20,671-Speed 9560.89 samples/sec   Loss 6.5707   LearningRate 0.0395   Epoch: 7   Global Step: 123920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:31:21,708-Speed 9884.14 samples/sec   Loss 6.4750   LearningRate 0.0395   Epoch: 7   Global Step: 123930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:31:22,809-Speed 9299.52 samples/sec   Loss 6.5796   LearningRate 0.0395   Epoch: 7   Global Step: 123940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:31:23,964-Speed 8873.81 samples/sec   Loss 6.5186   LearningRate 0.0395   Epoch: 7   Global Step: 123950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:31:25,108-Speed 8953.24 samples/sec   Loss 6.5969   LearningRate 0.0395   Epoch: 7   Global Step: 123960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:31:26,197-Speed 9406.35 samples/sec   Loss 6.6614   LearningRate 0.0395   Epoch: 7   Global Step: 123970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:27,295-Speed 9332.86 samples/sec   Loss 6.5136   LearningRate 0.0395   Epoch: 7   Global Step: 123980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:28,388-Speed 9375.22 samples/sec   Loss 6.6116   LearningRate 0.0395   Epoch: 7   Global Step: 123990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:29,493-Speed 9271.47 samples/sec   Loss 6.5459   LearningRate 0.0395   Epoch: 7   Global Step: 124000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:31:51,514-[lfw][124000]XNorm: 10.820141
Training: 2022-04-11 16:31:51,514-[lfw][124000]Accuracy-Flip: 0.99550+-0.00289
Training: 2022-04-11 16:31:51,515-[lfw][124000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:32:17,010-[cfp_fp][124000]XNorm: 9.170380
Training: 2022-04-11 16:32:17,011-[cfp_fp][124000]Accuracy-Flip: 0.95914+-0.00862
Training: 2022-04-11 16:32:17,011-[cfp_fp][124000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:32:38,990-[agedb_30][124000]XNorm: 10.493963
Training: 2022-04-11 16:32:38,991-[agedb_30][124000]Accuracy-Flip: 0.95933+-0.01001
Training: 2022-04-11 16:32:38,991-[agedb_30][124000]Accuracy-Highest: 0.96483
Training: 2022-04-11 16:32:40,078-Speed 145.07 samples/sec   Loss 6.6009   LearningRate 0.0395   Epoch: 7   Global Step: 124010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:32:41,132-Speed 9713.95 samples/sec   Loss 6.5636   LearningRate 0.0395   Epoch: 7   Global Step: 124020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:32:42,212-Speed 9494.48 samples/sec   Loss 6.5537   LearningRate 0.0395   Epoch: 7   Global Step: 124030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:32:43,288-Speed 9519.85 samples/sec   Loss 6.4991   LearningRate 0.0395   Epoch: 7   Global Step: 124040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:32:44,353-Speed 9621.07 samples/sec   Loss 6.5307   LearningRate 0.0395   Epoch: 7   Global Step: 124050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:32:45,432-Speed 9494.00 samples/sec   Loss 6.5816   LearningRate 0.0395   Epoch: 7   Global Step: 124060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:32:46,501-Speed 9583.99 samples/sec   Loss 6.5139   LearningRate 0.0395   Epoch: 7   Global Step: 124070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:32:47,625-Speed 9115.32 samples/sec   Loss 6.5305   LearningRate 0.0395   Epoch: 7   Global Step: 124080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:32:48,657-Speed 9927.45 samples/sec   Loss 6.4776   LearningRate 0.0395   Epoch: 7   Global Step: 124090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:32:49,729-Speed 9555.41 samples/sec   Loss 6.4927   LearningRate 0.0395   Epoch: 7   Global Step: 124100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:32:50,809-Speed 9488.40 samples/sec   Loss 6.6139   LearningRate 0.0395   Epoch: 7   Global Step: 124110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:32:51,854-Speed 9809.49 samples/sec   Loss 6.5315   LearningRate 0.0395   Epoch: 7   Global Step: 124120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:32:52,968-Speed 9195.98 samples/sec   Loss 6.5496   LearningRate 0.0395   Epoch: 7   Global Step: 124130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:32:54,032-Speed 9633.69 samples/sec   Loss 6.5662   LearningRate 0.0395   Epoch: 7   Global Step: 124140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:32:55,120-Speed 9415.47 samples/sec   Loss 6.6383   LearningRate 0.0395   Epoch: 7   Global Step: 124150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:32:56,176-Speed 9703.70 samples/sec   Loss 6.5116   LearningRate 0.0394   Epoch: 7   Global Step: 124160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:32:57,285-Speed 9239.48 samples/sec   Loss 6.6071   LearningRate 0.0394   Epoch: 7   Global Step: 124170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:32:58,391-Speed 9266.98 samples/sec   Loss 6.5237   LearningRate 0.0394   Epoch: 7   Global Step: 124180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:32:59,460-Speed 9581.30 samples/sec   Loss 6.5224   LearningRate 0.0394   Epoch: 7   Global Step: 124190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:00,557-Speed 9344.31 samples/sec   Loss 6.6482   LearningRate 0.0394   Epoch: 7   Global Step: 124200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:01,637-Speed 9484.78 samples/sec   Loss 6.5194   LearningRate 0.0394   Epoch: 7   Global Step: 124210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:02,695-Speed 9686.38 samples/sec   Loss 6.4590   LearningRate 0.0394   Epoch: 7   Global Step: 124220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:03,775-Speed 9480.57 samples/sec   Loss 6.5067   LearningRate 0.0394   Epoch: 7   Global Step: 124230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:04,853-Speed 9503.39 samples/sec   Loss 6.6301   LearningRate 0.0394   Epoch: 7   Global Step: 124240   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:33:05,943-Speed 9403.83 samples/sec   Loss 6.5470   LearningRate 0.0394   Epoch: 7   Global Step: 124250   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:33:07,020-Speed 9513.75 samples/sec   Loss 6.7602   LearningRate 0.0394   Epoch: 7   Global Step: 124260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:08,097-Speed 9515.54 samples/sec   Loss 6.5127   LearningRate 0.0394   Epoch: 7   Global Step: 124270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:09,237-Speed 8981.44 samples/sec   Loss 6.5138   LearningRate 0.0394   Epoch: 7   Global Step: 124280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:10,337-Speed 9316.77 samples/sec   Loss 6.5934   LearningRate 0.0394   Epoch: 7   Global Step: 124290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:11,430-Speed 9377.97 samples/sec   Loss 6.6200   LearningRate 0.0394   Epoch: 7   Global Step: 124300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:12,510-Speed 9490.97 samples/sec   Loss 6.5972   LearningRate 0.0394   Epoch: 7   Global Step: 124310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:13,636-Speed 9103.48 samples/sec   Loss 6.5980   LearningRate 0.0394   Epoch: 7   Global Step: 124320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:14,768-Speed 9049.05 samples/sec   Loss 6.6231   LearningRate 0.0394   Epoch: 7   Global Step: 124330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:15,852-Speed 9449.79 samples/sec   Loss 6.5771   LearningRate 0.0394   Epoch: 7   Global Step: 124340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:16,923-Speed 9567.90 samples/sec   Loss 6.5247   LearningRate 0.0394   Epoch: 7   Global Step: 124350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:18,010-Speed 9421.84 samples/sec   Loss 6.6658   LearningRate 0.0394   Epoch: 7   Global Step: 124360   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:33:19,081-Speed 9572.64 samples/sec   Loss 6.6137   LearningRate 0.0394   Epoch: 7   Global Step: 124370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:20,181-Speed 9314.60 samples/sec   Loss 6.5268   LearningRate 0.0394   Epoch: 7   Global Step: 124380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:21,289-Speed 9249.54 samples/sec   Loss 6.6028   LearningRate 0.0394   Epoch: 7   Global Step: 124390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:22,404-Speed 9188.47 samples/sec   Loss 6.5982   LearningRate 0.0394   Epoch: 7   Global Step: 124400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:23,493-Speed 9411.73 samples/sec   Loss 6.4950   LearningRate 0.0394   Epoch: 7   Global Step: 124410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:24,568-Speed 9526.76 samples/sec   Loss 6.6406   LearningRate 0.0393   Epoch: 7   Global Step: 124420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:25,669-Speed 9308.97 samples/sec   Loss 6.7051   LearningRate 0.0393   Epoch: 7   Global Step: 124430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:26,736-Speed 9597.03 samples/sec   Loss 6.5559   LearningRate 0.0393   Epoch: 7   Global Step: 124440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:27,814-Speed 9505.82 samples/sec   Loss 6.5310   LearningRate 0.0393   Epoch: 7   Global Step: 124450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:28,903-Speed 9411.06 samples/sec   Loss 6.4822   LearningRate 0.0393   Epoch: 7   Global Step: 124460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:29,977-Speed 9540.36 samples/sec   Loss 6.4906   LearningRate 0.0393   Epoch: 7   Global Step: 124470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:31,077-Speed 9313.04 samples/sec   Loss 6.4436   LearningRate 0.0393   Epoch: 7   Global Step: 124480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:32,158-Speed 9479.61 samples/sec   Loss 6.6148   LearningRate 0.0393   Epoch: 7   Global Step: 124490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:33,231-Speed 9552.12 samples/sec   Loss 6.6023   LearningRate 0.0393   Epoch: 7   Global Step: 124500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:34,297-Speed 9607.55 samples/sec   Loss 6.4952   LearningRate 0.0393   Epoch: 7   Global Step: 124510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:35,370-Speed 9549.26 samples/sec   Loss 6.5437   LearningRate 0.0393   Epoch: 7   Global Step: 124520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:36,484-Speed 9200.82 samples/sec   Loss 6.5996   LearningRate 0.0393   Epoch: 7   Global Step: 124530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:37,558-Speed 9537.39 samples/sec   Loss 6.5387   LearningRate 0.0393   Epoch: 7   Global Step: 124540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:38,648-Speed 9399.15 samples/sec   Loss 6.5461   LearningRate 0.0393   Epoch: 7   Global Step: 124550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:39,747-Speed 9320.13 samples/sec   Loss 6.5120   LearningRate 0.0393   Epoch: 7   Global Step: 124560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:40,867-Speed 9154.07 samples/sec   Loss 6.4425   LearningRate 0.0393   Epoch: 7   Global Step: 124570   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:33:41,925-Speed 9680.91 samples/sec   Loss 6.5314   LearningRate 0.0393   Epoch: 7   Global Step: 124580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:42,988-Speed 9638.11 samples/sec   Loss 6.5194   LearningRate 0.0393   Epoch: 7   Global Step: 124590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:44,062-Speed 9544.20 samples/sec   Loss 6.5000   LearningRate 0.0393   Epoch: 7   Global Step: 124600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:45,125-Speed 9636.46 samples/sec   Loss 6.5223   LearningRate 0.0393   Epoch: 7   Global Step: 124610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:46,216-Speed 9392.37 samples/sec   Loss 6.6032   LearningRate 0.0393   Epoch: 7   Global Step: 124620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:47,294-Speed 9504.29 samples/sec   Loss 6.6352   LearningRate 0.0393   Epoch: 7   Global Step: 124630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:48,386-Speed 9381.06 samples/sec   Loss 6.6288   LearningRate 0.0393   Epoch: 7   Global Step: 124640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:49,438-Speed 9738.84 samples/sec   Loss 6.6694   LearningRate 0.0393   Epoch: 7   Global Step: 124650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:50,555-Speed 9173.12 samples/sec   Loss 6.5895   LearningRate 0.0393   Epoch: 7   Global Step: 124660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:51,637-Speed 9475.85 samples/sec   Loss 6.6432   LearningRate 0.0393   Epoch: 7   Global Step: 124670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:52,702-Speed 9627.27 samples/sec   Loss 6.4711   LearningRate 0.0393   Epoch: 7   Global Step: 124680   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:33:53,776-Speed 9535.47 samples/sec   Loss 6.6175   LearningRate 0.0392   Epoch: 7   Global Step: 124690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:54,878-Speed 9296.55 samples/sec   Loss 6.5278   LearningRate 0.0392   Epoch: 7   Global Step: 124700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:56,000-Speed 9133.56 samples/sec   Loss 6.5529   LearningRate 0.0392   Epoch: 7   Global Step: 124710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:57,096-Speed 9343.52 samples/sec   Loss 6.5333   LearningRate 0.0392   Epoch: 7   Global Step: 124720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:58,171-Speed 9536.93 samples/sec   Loss 6.6011   LearningRate 0.0392   Epoch: 7   Global Step: 124730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:33:59,271-Speed 9310.65 samples/sec   Loss 6.4881   LearningRate 0.0392   Epoch: 7   Global Step: 124740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:00,371-Speed 9321.01 samples/sec   Loss 6.5687   LearningRate 0.0392   Epoch: 7   Global Step: 124750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:01,447-Speed 9521.86 samples/sec   Loss 6.5200   LearningRate 0.0392   Epoch: 7   Global Step: 124760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:02,526-Speed 9495.87 samples/sec   Loss 6.5579   LearningRate 0.0392   Epoch: 7   Global Step: 124770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:03,608-Speed 9466.07 samples/sec   Loss 6.5286   LearningRate 0.0392   Epoch: 7   Global Step: 124780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:04,680-Speed 9560.13 samples/sec   Loss 6.5343   LearningRate 0.0392   Epoch: 7   Global Step: 124790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:05,788-Speed 9241.12 samples/sec   Loss 6.4823   LearningRate 0.0392   Epoch: 7   Global Step: 124800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:06,879-Speed 9397.40 samples/sec   Loss 6.5367   LearningRate 0.0392   Epoch: 7   Global Step: 124810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:07,967-Speed 9418.74 samples/sec   Loss 6.6107   LearningRate 0.0392   Epoch: 7   Global Step: 124820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:09,049-Speed 9463.06 samples/sec   Loss 6.5624   LearningRate 0.0392   Epoch: 7   Global Step: 124830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:10,150-Speed 9315.44 samples/sec   Loss 6.4395   LearningRate 0.0392   Epoch: 7   Global Step: 124840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:11,232-Speed 9474.23 samples/sec   Loss 6.5455   LearningRate 0.0392   Epoch: 7   Global Step: 124850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:12,315-Speed 9458.11 samples/sec   Loss 6.5194   LearningRate 0.0392   Epoch: 7   Global Step: 124860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:13,446-Speed 9055.87 samples/sec   Loss 6.5187   LearningRate 0.0392   Epoch: 7   Global Step: 124870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:14,517-Speed 9563.19 samples/sec   Loss 6.6047   LearningRate 0.0392   Epoch: 7   Global Step: 124880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:15,602-Speed 9447.41 samples/sec   Loss 6.5919   LearningRate 0.0392   Epoch: 7   Global Step: 124890   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:34:16,676-Speed 9535.26 samples/sec   Loss 6.5544   LearningRate 0.0392   Epoch: 7   Global Step: 124900   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:34:17,814-Speed 9001.08 samples/sec   Loss 6.5144   LearningRate 0.0392   Epoch: 7   Global Step: 124910   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:34:18,881-Speed 9607.59 samples/sec   Loss 6.6163   LearningRate 0.0392   Epoch: 7   Global Step: 124920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:19,978-Speed 9337.98 samples/sec   Loss 6.3647   LearningRate 0.0392   Epoch: 7   Global Step: 124930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:21,108-Speed 9063.71 samples/sec   Loss 6.5384   LearningRate 0.0392   Epoch: 7   Global Step: 124940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:22,167-Speed 9677.80 samples/sec   Loss 6.4831   LearningRate 0.0391   Epoch: 7   Global Step: 124950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:23,241-Speed 9544.64 samples/sec   Loss 6.6297   LearningRate 0.0391   Epoch: 7   Global Step: 124960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:24,381-Speed 8985.16 samples/sec   Loss 6.5645   LearningRate 0.0391   Epoch: 7   Global Step: 124970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:25,509-Speed 9082.68 samples/sec   Loss 6.5325   LearningRate 0.0391   Epoch: 7   Global Step: 124980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:26,643-Speed 9041.17 samples/sec   Loss 6.5537   LearningRate 0.0391   Epoch: 7   Global Step: 124990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:27,703-Speed 9662.93 samples/sec   Loss 6.5004   LearningRate 0.0391   Epoch: 7   Global Step: 125000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:34:28,761-Speed 9687.61 samples/sec   Loss 6.6107   LearningRate 0.0391   Epoch: 7   Global Step: 125010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:34:29,869-Speed 9249.52 samples/sec   Loss 6.5866   LearningRate 0.0391   Epoch: 7   Global Step: 125020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:34:30,968-Speed 9321.52 samples/sec   Loss 6.5830   LearningRate 0.0391   Epoch: 7   Global Step: 125030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:34:32,024-Speed 9705.59 samples/sec   Loss 6.6806   LearningRate 0.0391   Epoch: 7   Global Step: 125040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:34:33,118-Speed 9368.37 samples/sec   Loss 6.5646   LearningRate 0.0391   Epoch: 7   Global Step: 125050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:34:34,245-Speed 9084.01 samples/sec   Loss 6.5468   LearningRate 0.0391   Epoch: 7   Global Step: 125060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:34:35,335-Speed 9400.41 samples/sec   Loss 6.6214   LearningRate 0.0391   Epoch: 7   Global Step: 125070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:34:36,407-Speed 9557.21 samples/sec   Loss 6.5318   LearningRate 0.0391   Epoch: 7   Global Step: 125080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:34:37,524-Speed 9175.62 samples/sec   Loss 6.5285   LearningRate 0.0391   Epoch: 7   Global Step: 125090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:34:38,569-Speed 9807.75 samples/sec   Loss 6.6080   LearningRate 0.0391   Epoch: 7   Global Step: 125100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:39,665-Speed 9344.18 samples/sec   Loss 6.4962   LearningRate 0.0391   Epoch: 7   Global Step: 125110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:40,771-Speed 9265.72 samples/sec   Loss 6.6127   LearningRate 0.0391   Epoch: 7   Global Step: 125120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:41,918-Speed 8933.64 samples/sec   Loss 6.5786   LearningRate 0.0391   Epoch: 7   Global Step: 125130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:43,043-Speed 9110.42 samples/sec   Loss 6.7184   LearningRate 0.0391   Epoch: 7   Global Step: 125140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:44,135-Speed 9380.47 samples/sec   Loss 6.4987   LearningRate 0.0391   Epoch: 7   Global Step: 125150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:45,233-Speed 9326.69 samples/sec   Loss 6.5854   LearningRate 0.0391   Epoch: 7   Global Step: 125160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:46,350-Speed 9178.66 samples/sec   Loss 6.4990   LearningRate 0.0391   Epoch: 7   Global Step: 125170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:47,433-Speed 9463.69 samples/sec   Loss 6.5086   LearningRate 0.0391   Epoch: 7   Global Step: 125180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:48,514-Speed 9471.79 samples/sec   Loss 6.5048   LearningRate 0.0391   Epoch: 7   Global Step: 125190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:49,576-Speed 9657.05 samples/sec   Loss 6.5351   LearningRate 0.0391   Epoch: 7   Global Step: 125200   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:34:50,685-Speed 9234.44 samples/sec   Loss 6.5311   LearningRate 0.0391   Epoch: 7   Global Step: 125210   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:34:51,775-Speed 9401.65 samples/sec   Loss 6.4606   LearningRate 0.0390   Epoch: 7   Global Step: 125220   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:34:52,820-Speed 9804.18 samples/sec   Loss 6.5504   LearningRate 0.0390   Epoch: 7   Global Step: 125230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:53,861-Speed 9844.34 samples/sec   Loss 6.6547   LearningRate 0.0390   Epoch: 7   Global Step: 125240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:54,920-Speed 9674.55 samples/sec   Loss 6.6019   LearningRate 0.0390   Epoch: 7   Global Step: 125250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:55,988-Speed 9597.85 samples/sec   Loss 6.5719   LearningRate 0.0390   Epoch: 7   Global Step: 125260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:57,102-Speed 9188.84 samples/sec   Loss 6.5722   LearningRate 0.0390   Epoch: 7   Global Step: 125270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:58,216-Speed 9206.65 samples/sec   Loss 6.5892   LearningRate 0.0390   Epoch: 7   Global Step: 125280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:34:59,312-Speed 9344.39 samples/sec   Loss 6.6124   LearningRate 0.0390   Epoch: 7   Global Step: 125290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:00,395-Speed 9457.80 samples/sec   Loss 6.7126   LearningRate 0.0390   Epoch: 7   Global Step: 125300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:01,499-Speed 9284.35 samples/sec   Loss 6.6108   LearningRate 0.0390   Epoch: 7   Global Step: 125310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:02,582-Speed 9459.02 samples/sec   Loss 6.4038   LearningRate 0.0390   Epoch: 7   Global Step: 125320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:03,700-Speed 9166.10 samples/sec   Loss 6.6030   LearningRate 0.0390   Epoch: 7   Global Step: 125330   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:35:04,758-Speed 9685.05 samples/sec   Loss 6.5034   LearningRate 0.0390   Epoch: 7   Global Step: 125340   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:35:05,816-Speed 9678.47 samples/sec   Loss 6.6737   LearningRate 0.0390   Epoch: 7   Global Step: 125350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:06,926-Speed 9233.11 samples/sec   Loss 6.6006   LearningRate 0.0390   Epoch: 7   Global Step: 125360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:08,007-Speed 9480.23 samples/sec   Loss 6.5319   LearningRate 0.0390   Epoch: 7   Global Step: 125370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:09,147-Speed 8987.00 samples/sec   Loss 6.4585   LearningRate 0.0390   Epoch: 7   Global Step: 125380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:10,244-Speed 9344.66 samples/sec   Loss 6.5074   LearningRate 0.0390   Epoch: 7   Global Step: 125390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:11,295-Speed 9750.98 samples/sec   Loss 6.4375   LearningRate 0.0390   Epoch: 7   Global Step: 125400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:12,437-Speed 8965.01 samples/sec   Loss 6.4880   LearningRate 0.0390   Epoch: 7   Global Step: 125410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:13,543-Speed 9267.40 samples/sec   Loss 6.5794   LearningRate 0.0390   Epoch: 7   Global Step: 125420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:14,598-Speed 9708.46 samples/sec   Loss 6.6542   LearningRate 0.0390   Epoch: 7   Global Step: 125430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:15,695-Speed 9344.21 samples/sec   Loss 6.4847   LearningRate 0.0390   Epoch: 7   Global Step: 125440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:16,807-Speed 9212.17 samples/sec   Loss 6.4493   LearningRate 0.0390   Epoch: 7   Global Step: 125450   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:35:17,904-Speed 9338.57 samples/sec   Loss 6.5130   LearningRate 0.0390   Epoch: 7   Global Step: 125460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:19,021-Speed 9171.14 samples/sec   Loss 6.5321   LearningRate 0.0390   Epoch: 7   Global Step: 125470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:20,071-Speed 9754.10 samples/sec   Loss 6.4493   LearningRate 0.0390   Epoch: 7   Global Step: 125480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:21,140-Speed 9587.30 samples/sec   Loss 6.5234   LearningRate 0.0389   Epoch: 7   Global Step: 125490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:22,283-Speed 8967.88 samples/sec   Loss 6.6182   LearningRate 0.0389   Epoch: 7   Global Step: 125500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:23,403-Speed 9150.24 samples/sec   Loss 6.5614   LearningRate 0.0389   Epoch: 7   Global Step: 125510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:24,496-Speed 9376.99 samples/sec   Loss 6.5064   LearningRate 0.0389   Epoch: 7   Global Step: 125520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:25,590-Speed 9367.26 samples/sec   Loss 6.5198   LearningRate 0.0389   Epoch: 7   Global Step: 125530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:26,670-Speed 9486.32 samples/sec   Loss 6.6562   LearningRate 0.0389   Epoch: 7   Global Step: 125540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:27,756-Speed 9437.76 samples/sec   Loss 6.5103   LearningRate 0.0389   Epoch: 7   Global Step: 125550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:28,852-Speed 9352.79 samples/sec   Loss 6.4393   LearningRate 0.0389   Epoch: 7   Global Step: 125560   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:35:29,949-Speed 9337.53 samples/sec   Loss 6.5297   LearningRate 0.0389   Epoch: 7   Global Step: 125570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:31,040-Speed 9390.56 samples/sec   Loss 6.5807   LearningRate 0.0389   Epoch: 7   Global Step: 125580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:32,155-Speed 9185.75 samples/sec   Loss 6.6465   LearningRate 0.0389   Epoch: 7   Global Step: 125590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:33,314-Speed 8845.52 samples/sec   Loss 6.4647   LearningRate 0.0389   Epoch: 7   Global Step: 125600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:34,390-Speed 9521.95 samples/sec   Loss 6.4707   LearningRate 0.0389   Epoch: 7   Global Step: 125610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:35,468-Speed 9499.71 samples/sec   Loss 6.5615   LearningRate 0.0389   Epoch: 7   Global Step: 125620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:36,571-Speed 9285.44 samples/sec   Loss 6.5995   LearningRate 0.0389   Epoch: 7   Global Step: 125630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:37,659-Speed 9423.27 samples/sec   Loss 6.6776   LearningRate 0.0389   Epoch: 7   Global Step: 125640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:38,742-Speed 9459.71 samples/sec   Loss 6.5597   LearningRate 0.0389   Epoch: 7   Global Step: 125650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:39,824-Speed 9473.94 samples/sec   Loss 6.4246   LearningRate 0.0389   Epoch: 7   Global Step: 125660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:40,902-Speed 9501.80 samples/sec   Loss 6.6743   LearningRate 0.0389   Epoch: 7   Global Step: 125670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:42,032-Speed 9069.97 samples/sec   Loss 6.4467   LearningRate 0.0389   Epoch: 7   Global Step: 125680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:43,133-Speed 9308.67 samples/sec   Loss 6.5354   LearningRate 0.0389   Epoch: 7   Global Step: 125690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:44,211-Speed 9502.23 samples/sec   Loss 6.6062   LearningRate 0.0389   Epoch: 7   Global Step: 125700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:45,263-Speed 9744.63 samples/sec   Loss 6.5300   LearningRate 0.0389   Epoch: 7   Global Step: 125710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:46,296-Speed 9919.06 samples/sec   Loss 6.7759   LearningRate 0.0389   Epoch: 7   Global Step: 125720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:47,358-Speed 9646.71 samples/sec   Loss 6.4963   LearningRate 0.0389   Epoch: 7   Global Step: 125730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:48,463-Speed 9269.98 samples/sec   Loss 6.4224   LearningRate 0.0389   Epoch: 7   Global Step: 125740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:49,571-Speed 9248.33 samples/sec   Loss 6.4809   LearningRate 0.0389   Epoch: 7   Global Step: 125750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:50,625-Speed 9721.77 samples/sec   Loss 6.5049   LearningRate 0.0388   Epoch: 7   Global Step: 125760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:51,691-Speed 9614.68 samples/sec   Loss 6.6453   LearningRate 0.0388   Epoch: 7   Global Step: 125770   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:35:52,759-Speed 9593.51 samples/sec   Loss 6.4979   LearningRate 0.0388   Epoch: 7   Global Step: 125780   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:35:53,807-Speed 9775.16 samples/sec   Loss 6.5111   LearningRate 0.0388   Epoch: 7   Global Step: 125790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:54,886-Speed 9493.46 samples/sec   Loss 6.4911   LearningRate 0.0388   Epoch: 7   Global Step: 125800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:55,977-Speed 9387.70 samples/sec   Loss 6.6027   LearningRate 0.0388   Epoch: 7   Global Step: 125810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:57,072-Speed 9358.93 samples/sec   Loss 6.5122   LearningRate 0.0388   Epoch: 7   Global Step: 125820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:58,160-Speed 9419.49 samples/sec   Loss 6.5557   LearningRate 0.0388   Epoch: 7   Global Step: 125830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:35:59,240-Speed 9484.92 samples/sec   Loss 6.4024   LearningRate 0.0388   Epoch: 7   Global Step: 125840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:00,338-Speed 9334.25 samples/sec   Loss 6.5083   LearningRate 0.0388   Epoch: 7   Global Step: 125850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:01,403-Speed 9622.93 samples/sec   Loss 6.6061   LearningRate 0.0388   Epoch: 7   Global Step: 125860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:02,495-Speed 9383.41 samples/sec   Loss 6.6169   LearningRate 0.0388   Epoch: 7   Global Step: 125870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:03,602-Speed 9253.70 samples/sec   Loss 6.5603   LearningRate 0.0388   Epoch: 7   Global Step: 125880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:04,627-Speed 9992.93 samples/sec   Loss 6.6449   LearningRate 0.0388   Epoch: 7   Global Step: 125890   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:36:05,691-Speed 9631.81 samples/sec   Loss 6.5228   LearningRate 0.0388   Epoch: 7   Global Step: 125900   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:36:06,769-Speed 9506.58 samples/sec   Loss 6.6197   LearningRate 0.0388   Epoch: 7   Global Step: 125910   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:36:07,847-Speed 9505.43 samples/sec   Loss 6.6807   LearningRate 0.0388   Epoch: 7   Global Step: 125920   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:36:08,898-Speed 9742.76 samples/sec   Loss 6.5277   LearningRate 0.0388   Epoch: 7   Global Step: 125930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:09,957-Speed 9673.37 samples/sec   Loss 6.5453   LearningRate 0.0388   Epoch: 7   Global Step: 125940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:11,057-Speed 9317.70 samples/sec   Loss 6.6372   LearningRate 0.0388   Epoch: 7   Global Step: 125950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:12,155-Speed 9331.93 samples/sec   Loss 6.5138   LearningRate 0.0388   Epoch: 7   Global Step: 125960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:13,239-Speed 9451.63 samples/sec   Loss 6.6669   LearningRate 0.0388   Epoch: 7   Global Step: 125970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:14,307-Speed 9591.91 samples/sec   Loss 6.6445   LearningRate 0.0388   Epoch: 7   Global Step: 125980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:15,376-Speed 9585.54 samples/sec   Loss 6.6620   LearningRate 0.0388   Epoch: 7   Global Step: 125990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:16,467-Speed 9388.78 samples/sec   Loss 6.5726   LearningRate 0.0388   Epoch: 7   Global Step: 126000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:36:38,383-[lfw][126000]XNorm: 10.732953
Training: 2022-04-11 16:36:38,384-[lfw][126000]Accuracy-Flip: 0.99650+-0.00252
Training: 2022-04-11 16:36:38,384-[lfw][126000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:37:03,668-[cfp_fp][126000]XNorm: 9.168110
Training: 2022-04-11 16:37:03,669-[cfp_fp][126000]Accuracy-Flip: 0.95843+-0.01266
Training: 2022-04-11 16:37:03,669-[cfp_fp][126000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:37:25,454-[agedb_30][126000]XNorm: 10.388509
Training: 2022-04-11 16:37:25,455-[agedb_30][126000]Accuracy-Flip: 0.96400+-0.00898
Training: 2022-04-11 16:37:25,456-[agedb_30][126000]Accuracy-Highest: 0.96483
Training: 2022-04-11 16:37:26,524-Speed 146.17 samples/sec   Loss 6.5337   LearningRate 0.0388   Epoch: 7   Global Step: 126010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:27,613-Speed 9406.28 samples/sec   Loss 6.5580   LearningRate 0.0387   Epoch: 7   Global Step: 126020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:28,742-Speed 9076.05 samples/sec   Loss 6.6052   LearningRate 0.0387   Epoch: 7   Global Step: 126030   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:37:29,842-Speed 9312.99 samples/sec   Loss 6.5852   LearningRate 0.0387   Epoch: 7   Global Step: 126040   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:37:30,927-Speed 9442.94 samples/sec   Loss 6.4840   LearningRate 0.0387   Epoch: 7   Global Step: 126050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:32,017-Speed 9404.79 samples/sec   Loss 6.5589   LearningRate 0.0387   Epoch: 7   Global Step: 126060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:33,124-Speed 9257.99 samples/sec   Loss 6.5510   LearningRate 0.0387   Epoch: 7   Global Step: 126070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:34,199-Speed 9526.75 samples/sec   Loss 6.4284   LearningRate 0.0387   Epoch: 7   Global Step: 126080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:35,252-Speed 9735.86 samples/sec   Loss 6.5783   LearningRate 0.0387   Epoch: 7   Global Step: 126090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:36,377-Speed 9101.50 samples/sec   Loss 6.6423   LearningRate 0.0387   Epoch: 7   Global Step: 126100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:37,479-Speed 9301.60 samples/sec   Loss 6.4940   LearningRate 0.0387   Epoch: 7   Global Step: 126110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:38,567-Speed 9415.30 samples/sec   Loss 6.5086   LearningRate 0.0387   Epoch: 7   Global Step: 126120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:39,696-Speed 9072.23 samples/sec   Loss 6.5221   LearningRate 0.0387   Epoch: 7   Global Step: 126130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:40,770-Speed 9545.72 samples/sec   Loss 6.6035   LearningRate 0.0387   Epoch: 7   Global Step: 126140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:41,852-Speed 9469.84 samples/sec   Loss 6.5230   LearningRate 0.0387   Epoch: 7   Global Step: 126150   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:37:42,956-Speed 9278.62 samples/sec   Loss 6.5246   LearningRate 0.0387   Epoch: 7   Global Step: 126160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:44,055-Speed 9328.35 samples/sec   Loss 6.6063   LearningRate 0.0387   Epoch: 7   Global Step: 126170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:45,147-Speed 9380.30 samples/sec   Loss 6.5584   LearningRate 0.0387   Epoch: 7   Global Step: 126180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:46,213-Speed 9605.45 samples/sec   Loss 6.5901   LearningRate 0.0387   Epoch: 7   Global Step: 126190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:47,304-Speed 9398.52 samples/sec   Loss 6.6187   LearningRate 0.0387   Epoch: 7   Global Step: 126200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:48,406-Speed 9296.43 samples/sec   Loss 6.6171   LearningRate 0.0387   Epoch: 7   Global Step: 126210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:49,490-Speed 9463.21 samples/sec   Loss 6.6118   LearningRate 0.0387   Epoch: 7   Global Step: 126220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:50,585-Speed 9351.19 samples/sec   Loss 6.6456   LearningRate 0.0387   Epoch: 7   Global Step: 126230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:51,663-Speed 9505.49 samples/sec   Loss 6.5609   LearningRate 0.0387   Epoch: 7   Global Step: 126240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:52,800-Speed 9013.70 samples/sec   Loss 6.5161   LearningRate 0.0387   Epoch: 7   Global Step: 126250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:53,912-Speed 9215.30 samples/sec   Loss 6.5704   LearningRate 0.0387   Epoch: 7   Global Step: 126260   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:37:54,962-Speed 9760.39 samples/sec   Loss 6.6548   LearningRate 0.0387   Epoch: 7   Global Step: 126270   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:37:56,048-Speed 9428.69 samples/sec   Loss 6.5088   LearningRate 0.0387   Epoch: 7   Global Step: 126280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:57,143-Speed 9357.96 samples/sec   Loss 6.5906   LearningRate 0.0386   Epoch: 7   Global Step: 126290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:58,260-Speed 9175.29 samples/sec   Loss 6.6026   LearningRate 0.0386   Epoch: 7   Global Step: 126300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:37:59,328-Speed 9591.32 samples/sec   Loss 6.6255   LearningRate 0.0386   Epoch: 7   Global Step: 126310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:00,460-Speed 9054.63 samples/sec   Loss 6.4440   LearningRate 0.0386   Epoch: 7   Global Step: 126320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:01,537-Speed 9515.67 samples/sec   Loss 6.5597   LearningRate 0.0386   Epoch: 7   Global Step: 126330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:02,672-Speed 9025.69 samples/sec   Loss 6.5668   LearningRate 0.0386   Epoch: 7   Global Step: 126340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:03,741-Speed 9582.00 samples/sec   Loss 6.5885   LearningRate 0.0386   Epoch: 7   Global Step: 126350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:04,830-Speed 9410.55 samples/sec   Loss 6.6444   LearningRate 0.0386   Epoch: 7   Global Step: 126360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:05,910-Speed 9483.03 samples/sec   Loss 6.4897   LearningRate 0.0386   Epoch: 7   Global Step: 126370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:06,984-Speed 9546.49 samples/sec   Loss 6.5225   LearningRate 0.0386   Epoch: 7   Global Step: 126380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:08,106-Speed 9130.88 samples/sec   Loss 6.5753   LearningRate 0.0386   Epoch: 7   Global Step: 126390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:09,176-Speed 9574.23 samples/sec   Loss 6.5978   LearningRate 0.0386   Epoch: 7   Global Step: 126400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:10,261-Speed 9442.98 samples/sec   Loss 6.5088   LearningRate 0.0386   Epoch: 7   Global Step: 126410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:11,373-Speed 9214.03 samples/sec   Loss 6.4960   LearningRate 0.0386   Epoch: 7   Global Step: 126420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:12,441-Speed 9593.10 samples/sec   Loss 6.5631   LearningRate 0.0386   Epoch: 7   Global Step: 126430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:13,498-Speed 9702.75 samples/sec   Loss 6.5903   LearningRate 0.0386   Epoch: 7   Global Step: 126440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:14,539-Speed 9840.33 samples/sec   Loss 6.4755   LearningRate 0.0386   Epoch: 7   Global Step: 126450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:15,616-Speed 9514.20 samples/sec   Loss 6.5461   LearningRate 0.0386   Epoch: 7   Global Step: 126460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:16,688-Speed 9554.33 samples/sec   Loss 6.6268   LearningRate 0.0386   Epoch: 7   Global Step: 126470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:17,769-Speed 9483.88 samples/sec   Loss 6.5964   LearningRate 0.0386   Epoch: 7   Global Step: 126480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:18,813-Speed 9806.88 samples/sec   Loss 6.4693   LearningRate 0.0386   Epoch: 7   Global Step: 126490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:19,924-Speed 9227.60 samples/sec   Loss 6.6512   LearningRate 0.0386   Epoch: 7   Global Step: 126500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:20,984-Speed 9662.99 samples/sec   Loss 6.4699   LearningRate 0.0386   Epoch: 7   Global Step: 126510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:22,063-Speed 9498.98 samples/sec   Loss 6.5174   LearningRate 0.0386   Epoch: 7   Global Step: 126520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:23,214-Speed 8898.17 samples/sec   Loss 6.4909   LearningRate 0.0386   Epoch: 7   Global Step: 126530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:24,314-Speed 9317.01 samples/sec   Loss 6.6554   LearningRate 0.0386   Epoch: 7   Global Step: 126540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:25,442-Speed 9081.94 samples/sec   Loss 6.4402   LearningRate 0.0386   Epoch: 7   Global Step: 126550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:26,519-Speed 9516.12 samples/sec   Loss 6.4767   LearningRate 0.0385   Epoch: 7   Global Step: 126560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:27,617-Speed 9328.09 samples/sec   Loss 6.6195   LearningRate 0.0385   Epoch: 7   Global Step: 126570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:28,708-Speed 9391.78 samples/sec   Loss 6.6249   LearningRate 0.0385   Epoch: 7   Global Step: 126580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:29,806-Speed 9331.19 samples/sec   Loss 6.5809   LearningRate 0.0385   Epoch: 7   Global Step: 126590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:30,912-Speed 9263.17 samples/sec   Loss 6.5340   LearningRate 0.0385   Epoch: 7   Global Step: 126600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:32,008-Speed 9354.33 samples/sec   Loss 6.5455   LearningRate 0.0385   Epoch: 7   Global Step: 126610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:33,066-Speed 9679.59 samples/sec   Loss 6.5812   LearningRate 0.0385   Epoch: 7   Global Step: 126620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:34,174-Speed 9250.10 samples/sec   Loss 6.5760   LearningRate 0.0385   Epoch: 7   Global Step: 126630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:35,288-Speed 9199.57 samples/sec   Loss 6.5185   LearningRate 0.0385   Epoch: 7   Global Step: 126640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:36,351-Speed 9638.52 samples/sec   Loss 6.6439   LearningRate 0.0385   Epoch: 7   Global Step: 126650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:37,448-Speed 9335.28 samples/sec   Loss 6.4652   LearningRate 0.0385   Epoch: 7   Global Step: 126660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:38,542-Speed 9367.54 samples/sec   Loss 6.5467   LearningRate 0.0385   Epoch: 7   Global Step: 126670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:39,639-Speed 9338.56 samples/sec   Loss 6.5281   LearningRate 0.0385   Epoch: 7   Global Step: 126680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:40,730-Speed 9394.76 samples/sec   Loss 6.5561   LearningRate 0.0385   Epoch: 7   Global Step: 126690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:41,828-Speed 9324.50 samples/sec   Loss 6.6436   LearningRate 0.0385   Epoch: 7   Global Step: 126700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:42,933-Speed 9275.95 samples/sec   Loss 6.6199   LearningRate 0.0385   Epoch: 7   Global Step: 126710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:44,021-Speed 9426.52 samples/sec   Loss 6.5642   LearningRate 0.0385   Epoch: 7   Global Step: 126720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:45,098-Speed 9508.66 samples/sec   Loss 6.4961   LearningRate 0.0385   Epoch: 7   Global Step: 126730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:46,164-Speed 9609.79 samples/sec   Loss 6.5896   LearningRate 0.0385   Epoch: 7   Global Step: 126740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:47,240-Speed 9528.25 samples/sec   Loss 6.6047   LearningRate 0.0385   Epoch: 7   Global Step: 126750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:38:48,341-Speed 9304.86 samples/sec   Loss 6.5367   LearningRate 0.0385   Epoch: 7   Global Step: 126760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:49,448-Speed 9254.12 samples/sec   Loss 6.6772   LearningRate 0.0385   Epoch: 7   Global Step: 126770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:50,531-Speed 9466.88 samples/sec   Loss 6.6050   LearningRate 0.0385   Epoch: 7   Global Step: 126780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:51,626-Speed 9350.10 samples/sec   Loss 6.4954   LearningRate 0.0385   Epoch: 7   Global Step: 126790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:52,695-Speed 9583.49 samples/sec   Loss 6.5120   LearningRate 0.0385   Epoch: 7   Global Step: 126800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:53,754-Speed 9678.80 samples/sec   Loss 6.5168   LearningRate 0.0385   Epoch: 7   Global Step: 126810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:54,832-Speed 9501.93 samples/sec   Loss 6.6696   LearningRate 0.0385   Epoch: 7   Global Step: 126820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:55,940-Speed 9251.00 samples/sec   Loss 6.5317   LearningRate 0.0384   Epoch: 7   Global Step: 126830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:57,025-Speed 9442.78 samples/sec   Loss 6.5763   LearningRate 0.0384   Epoch: 7   Global Step: 126840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:58,083-Speed 9682.01 samples/sec   Loss 6.7260   LearningRate 0.0384   Epoch: 7   Global Step: 126850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:38:59,128-Speed 9805.12 samples/sec   Loss 6.5581   LearningRate 0.0384   Epoch: 7   Global Step: 126860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:00,217-Speed 9408.09 samples/sec   Loss 6.5786   LearningRate 0.0384   Epoch: 7   Global Step: 126870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:01,305-Speed 9417.78 samples/sec   Loss 6.5063   LearningRate 0.0384   Epoch: 7   Global Step: 126880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:02,391-Speed 9435.73 samples/sec   Loss 6.5371   LearningRate 0.0384   Epoch: 7   Global Step: 126890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:03,449-Speed 9687.82 samples/sec   Loss 6.4711   LearningRate 0.0384   Epoch: 7   Global Step: 126900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:04,487-Speed 9867.40 samples/sec   Loss 6.5689   LearningRate 0.0384   Epoch: 7   Global Step: 126910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:05,575-Speed 9420.27 samples/sec   Loss 6.5863   LearningRate 0.0384   Epoch: 7   Global Step: 126920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:06,671-Speed 9350.28 samples/sec   Loss 6.4743   LearningRate 0.0384   Epoch: 7   Global Step: 126930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:07,729-Speed 9683.29 samples/sec   Loss 6.5272   LearningRate 0.0384   Epoch: 7   Global Step: 126940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:08,785-Speed 9702.90 samples/sec   Loss 6.4617   LearningRate 0.0384   Epoch: 7   Global Step: 126950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:09,893-Speed 9244.00 samples/sec   Loss 6.6686   LearningRate 0.0384   Epoch: 7   Global Step: 126960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:10,981-Speed 9413.97 samples/sec   Loss 6.4577   LearningRate 0.0384   Epoch: 7   Global Step: 126970   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:12,065-Speed 9451.61 samples/sec   Loss 6.5034   LearningRate 0.0384   Epoch: 7   Global Step: 126980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:13,157-Speed 9387.39 samples/sec   Loss 6.4695   LearningRate 0.0384   Epoch: 7   Global Step: 126990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:14,202-Speed 9807.31 samples/sec   Loss 6.4985   LearningRate 0.0384   Epoch: 7   Global Step: 127000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:15,265-Speed 9635.63 samples/sec   Loss 6.5410   LearningRate 0.0384   Epoch: 7   Global Step: 127010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:16,359-Speed 9364.21 samples/sec   Loss 6.4978   LearningRate 0.0384   Epoch: 7   Global Step: 127020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:17,478-Speed 9158.37 samples/sec   Loss 6.4687   LearningRate 0.0384   Epoch: 7   Global Step: 127030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:18,594-Speed 9183.16 samples/sec   Loss 6.5348   LearningRate 0.0384   Epoch: 7   Global Step: 127040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:19,673-Speed 9492.58 samples/sec   Loss 6.5750   LearningRate 0.0384   Epoch: 7   Global Step: 127050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:20,762-Speed 9413.60 samples/sec   Loss 6.6047   LearningRate 0.0384   Epoch: 7   Global Step: 127060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:21,884-Speed 9135.02 samples/sec   Loss 6.5329   LearningRate 0.0384   Epoch: 7   Global Step: 127070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:22,975-Speed 9397.49 samples/sec   Loss 6.5885   LearningRate 0.0384   Epoch: 7   Global Step: 127080   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:39:24,066-Speed 9389.92 samples/sec   Loss 6.6379   LearningRate 0.0384   Epoch: 7   Global Step: 127090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:25,194-Speed 9076.64 samples/sec   Loss 6.6059   LearningRate 0.0383   Epoch: 7   Global Step: 127100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:26,275-Speed 9481.97 samples/sec   Loss 6.6119   LearningRate 0.0383   Epoch: 7   Global Step: 127110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:27,339-Speed 9624.54 samples/sec   Loss 6.5373   LearningRate 0.0383   Epoch: 7   Global Step: 127120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:28,442-Speed 9293.64 samples/sec   Loss 6.6250   LearningRate 0.0383   Epoch: 7   Global Step: 127130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:29,599-Speed 8850.36 samples/sec   Loss 6.4072   LearningRate 0.0383   Epoch: 7   Global Step: 127140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:30,695-Speed 9354.94 samples/sec   Loss 6.6018   LearningRate 0.0383   Epoch: 7   Global Step: 127150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:31,764-Speed 9583.45 samples/sec   Loss 6.6265   LearningRate 0.0383   Epoch: 7   Global Step: 127160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:32,872-Speed 9244.73 samples/sec   Loss 6.5980   LearningRate 0.0383   Epoch: 7   Global Step: 127170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:33,944-Speed 9565.66 samples/sec   Loss 6.6010   LearningRate 0.0383   Epoch: 7   Global Step: 127180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:35,036-Speed 9376.42 samples/sec   Loss 6.6201   LearningRate 0.0383   Epoch: 7   Global Step: 127190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:36,119-Speed 9464.05 samples/sec   Loss 6.5793   LearningRate 0.0383   Epoch: 7   Global Step: 127200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:37,166-Speed 9784.50 samples/sec   Loss 6.4922   LearningRate 0.0383   Epoch: 7   Global Step: 127210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:38,259-Speed 9376.27 samples/sec   Loss 6.6388   LearningRate 0.0383   Epoch: 7   Global Step: 127220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:39,375-Speed 9187.69 samples/sec   Loss 6.5018   LearningRate 0.0383   Epoch: 7   Global Step: 127230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:40,466-Speed 9391.89 samples/sec   Loss 6.5627   LearningRate 0.0383   Epoch: 7   Global Step: 127240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:41,516-Speed 9761.42 samples/sec   Loss 6.5227   LearningRate 0.0383   Epoch: 7   Global Step: 127250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:42,636-Speed 9147.12 samples/sec   Loss 6.6119   LearningRate 0.0383   Epoch: 7   Global Step: 127260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:43,734-Speed 9332.13 samples/sec   Loss 6.5652   LearningRate 0.0383   Epoch: 7   Global Step: 127270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:44,848-Speed 9199.70 samples/sec   Loss 6.5799   LearningRate 0.0383   Epoch: 7   Global Step: 127280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:45,969-Speed 9140.30 samples/sec   Loss 6.4276   LearningRate 0.0383   Epoch: 7   Global Step: 127290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:47,069-Speed 9317.72 samples/sec   Loss 6.4400   LearningRate 0.0383   Epoch: 7   Global Step: 127300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:39:48,167-Speed 9328.21 samples/sec   Loss 6.6043   LearningRate 0.0383   Epoch: 7   Global Step: 127310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:49,285-Speed 9164.74 samples/sec   Loss 6.6853   LearningRate 0.0383   Epoch: 7   Global Step: 127320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:50,392-Speed 9254.91 samples/sec   Loss 6.6610   LearningRate 0.0383   Epoch: 7   Global Step: 127330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:51,484-Speed 9387.02 samples/sec   Loss 6.5686   LearningRate 0.0383   Epoch: 7   Global Step: 127340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:52,555-Speed 9568.64 samples/sec   Loss 6.5345   LearningRate 0.0383   Epoch: 7   Global Step: 127350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:53,646-Speed 9388.85 samples/sec   Loss 6.6421   LearningRate 0.0383   Epoch: 7   Global Step: 127360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:54,740-Speed 9358.90 samples/sec   Loss 6.4401   LearningRate 0.0382   Epoch: 7   Global Step: 127370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:55,843-Speed 9290.73 samples/sec   Loss 6.5805   LearningRate 0.0382   Epoch: 7   Global Step: 127380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:56,945-Speed 9300.21 samples/sec   Loss 6.5383   LearningRate 0.0382   Epoch: 7   Global Step: 127390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:58,013-Speed 9591.81 samples/sec   Loss 6.5166   LearningRate 0.0382   Epoch: 7   Global Step: 127400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:39:59,077-Speed 9631.33 samples/sec   Loss 6.5596   LearningRate 0.0382   Epoch: 7   Global Step: 127410   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:40:00,182-Speed 9277.24 samples/sec   Loss 6.4060   LearningRate 0.0382   Epoch: 7   Global Step: 127420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:01,302-Speed 9143.48 samples/sec   Loss 6.6195   LearningRate 0.0382   Epoch: 7   Global Step: 127430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:02,381-Speed 9496.79 samples/sec   Loss 6.5447   LearningRate 0.0382   Epoch: 7   Global Step: 127440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:03,507-Speed 9100.79 samples/sec   Loss 6.4184   LearningRate 0.0382   Epoch: 7   Global Step: 127450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:04,602-Speed 9355.28 samples/sec   Loss 6.6283   LearningRate 0.0382   Epoch: 7   Global Step: 127460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:05,672-Speed 9579.99 samples/sec   Loss 6.5672   LearningRate 0.0382   Epoch: 7   Global Step: 127470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:06,736-Speed 9626.11 samples/sec   Loss 6.6759   LearningRate 0.0382   Epoch: 7   Global Step: 127480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:07,822-Speed 9433.82 samples/sec   Loss 6.5251   LearningRate 0.0382   Epoch: 7   Global Step: 127490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:08,901-Speed 9497.79 samples/sec   Loss 6.5475   LearningRate 0.0382   Epoch: 7   Global Step: 127500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:10,020-Speed 9153.19 samples/sec   Loss 6.5647   LearningRate 0.0382   Epoch: 7   Global Step: 127510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:11,095-Speed 9536.15 samples/sec   Loss 6.5183   LearningRate 0.0382   Epoch: 7   Global Step: 127520   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:40:12,207-Speed 9214.36 samples/sec   Loss 6.5446   LearningRate 0.0382   Epoch: 7   Global Step: 127530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:13,348-Speed 8979.56 samples/sec   Loss 6.5349   LearningRate 0.0382   Epoch: 7   Global Step: 127540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:14,478-Speed 9066.70 samples/sec   Loss 6.3772   LearningRate 0.0382   Epoch: 7   Global Step: 127550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:15,577-Speed 9327.95 samples/sec   Loss 6.5811   LearningRate 0.0382   Epoch: 7   Global Step: 127560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:16,645-Speed 9589.95 samples/sec   Loss 6.5909   LearningRate 0.0382   Epoch: 7   Global Step: 127570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:17,739-Speed 9361.78 samples/sec   Loss 6.5326   LearningRate 0.0382   Epoch: 7   Global Step: 127580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:18,825-Speed 9436.78 samples/sec   Loss 6.4215   LearningRate 0.0382   Epoch: 7   Global Step: 127590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:19,881-Speed 9704.49 samples/sec   Loss 6.5883   LearningRate 0.0382   Epoch: 7   Global Step: 127600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:20,962-Speed 9486.73 samples/sec   Loss 6.6007   LearningRate 0.0382   Epoch: 7   Global Step: 127610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:22,064-Speed 9291.03 samples/sec   Loss 6.6223   LearningRate 0.0382   Epoch: 7   Global Step: 127620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:23,137-Speed 9546.61 samples/sec   Loss 6.6658   LearningRate 0.0382   Epoch: 7   Global Step: 127630   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:40:24,199-Speed 9654.86 samples/sec   Loss 6.5726   LearningRate 0.0381   Epoch: 7   Global Step: 127640   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:40:25,259-Speed 9658.04 samples/sec   Loss 6.5825   LearningRate 0.0381   Epoch: 7   Global Step: 127650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:26,322-Speed 9646.66 samples/sec   Loss 6.6643   LearningRate 0.0381   Epoch: 7   Global Step: 127660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:27,469-Speed 8931.20 samples/sec   Loss 6.6651   LearningRate 0.0381   Epoch: 7   Global Step: 127670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:28,595-Speed 9092.75 samples/sec   Loss 6.5553   LearningRate 0.0381   Epoch: 7   Global Step: 127680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:29,698-Speed 9290.47 samples/sec   Loss 6.5436   LearningRate 0.0381   Epoch: 7   Global Step: 127690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:30,772-Speed 9542.06 samples/sec   Loss 6.4742   LearningRate 0.0381   Epoch: 7   Global Step: 127700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:31,880-Speed 9249.16 samples/sec   Loss 6.5990   LearningRate 0.0381   Epoch: 7   Global Step: 127710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:33,026-Speed 8942.36 samples/sec   Loss 6.5453   LearningRate 0.0381   Epoch: 7   Global Step: 127720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:34,134-Speed 9245.01 samples/sec   Loss 6.5433   LearningRate 0.0381   Epoch: 7   Global Step: 127730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:35,220-Speed 9438.80 samples/sec   Loss 6.5960   LearningRate 0.0381   Epoch: 7   Global Step: 127740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:36,303-Speed 9462.03 samples/sec   Loss 6.5235   LearningRate 0.0381   Epoch: 7   Global Step: 127750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:37,353-Speed 9759.91 samples/sec   Loss 6.5687   LearningRate 0.0381   Epoch: 7   Global Step: 127760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:38,433-Speed 9486.51 samples/sec   Loss 6.6054   LearningRate 0.0381   Epoch: 7   Global Step: 127770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:39,497-Speed 9629.71 samples/sec   Loss 6.5453   LearningRate 0.0381   Epoch: 7   Global Step: 127780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:40,612-Speed 9189.15 samples/sec   Loss 6.5687   LearningRate 0.0381   Epoch: 7   Global Step: 127790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:41,716-Speed 9276.33 samples/sec   Loss 6.5126   LearningRate 0.0381   Epoch: 7   Global Step: 127800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:42,820-Speed 9283.87 samples/sec   Loss 6.6287   LearningRate 0.0381   Epoch: 7   Global Step: 127810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:43,958-Speed 9005.04 samples/sec   Loss 6.4932   LearningRate 0.0381   Epoch: 7   Global Step: 127820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:45,045-Speed 9427.63 samples/sec   Loss 6.5589   LearningRate 0.0381   Epoch: 7   Global Step: 127830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:46,106-Speed 9653.61 samples/sec   Loss 6.5218   LearningRate 0.0381   Epoch: 7   Global Step: 127840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:47,194-Speed 9417.44 samples/sec   Loss 6.6078   LearningRate 0.0381   Epoch: 7   Global Step: 127850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:40:48,309-Speed 9188.88 samples/sec   Loss 6.6190   LearningRate 0.0381   Epoch: 7   Global Step: 127860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:40:49,393-Speed 9454.43 samples/sec   Loss 6.6143   LearningRate 0.0381   Epoch: 7   Global Step: 127870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:40:50,479-Speed 9437.74 samples/sec   Loss 6.5487   LearningRate 0.0381   Epoch: 7   Global Step: 127880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:40:51,556-Speed 9507.99 samples/sec   Loss 6.5401   LearningRate 0.0381   Epoch: 7   Global Step: 127890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:40:52,638-Speed 9473.76 samples/sec   Loss 6.6286   LearningRate 0.0381   Epoch: 7   Global Step: 127900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:40:53,701-Speed 9635.57 samples/sec   Loss 6.5259   LearningRate 0.0380   Epoch: 7   Global Step: 127910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:40:54,850-Speed 8919.94 samples/sec   Loss 6.5456   LearningRate 0.0380   Epoch: 7   Global Step: 127920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:40:55,930-Speed 9496.37 samples/sec   Loss 6.5259   LearningRate 0.0380   Epoch: 7   Global Step: 127930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:40:57,037-Speed 9254.13 samples/sec   Loss 6.6119   LearningRate 0.0380   Epoch: 7   Global Step: 127940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:40:58,148-Speed 9224.36 samples/sec   Loss 6.5838   LearningRate 0.0380   Epoch: 7   Global Step: 127950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:40:59,251-Speed 9285.96 samples/sec   Loss 6.5434   LearningRate 0.0380   Epoch: 7   Global Step: 127960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:41:00,321-Speed 9577.18 samples/sec   Loss 6.3960   LearningRate 0.0380   Epoch: 7   Global Step: 127970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:41:01,385-Speed 9629.84 samples/sec   Loss 6.4520   LearningRate 0.0380   Epoch: 7   Global Step: 127980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:41:02,471-Speed 9435.86 samples/sec   Loss 6.5674   LearningRate 0.0380   Epoch: 7   Global Step: 127990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:41:03,539-Speed 9591.07 samples/sec   Loss 6.5755   LearningRate 0.0380   Epoch: 7   Global Step: 128000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:41:25,452-[lfw][128000]XNorm: 11.146215
Training: 2022-04-11 16:41:25,453-[lfw][128000]Accuracy-Flip: 0.99533+-0.00332
Training: 2022-04-11 16:41:25,453-[lfw][128000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:41:50,773-[cfp_fp][128000]XNorm: 9.536976
Training: 2022-04-11 16:41:50,774-[cfp_fp][128000]Accuracy-Flip: 0.95943+-0.00821
Training: 2022-04-11 16:41:50,774-[cfp_fp][128000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:42:12,621-[agedb_30][128000]XNorm: 10.868312
Training: 2022-04-11 16:42:12,622-[agedb_30][128000]Accuracy-Flip: 0.96283+-0.01135
Training: 2022-04-11 16:42:12,622-[agedb_30][128000]Accuracy-Highest: 0.96483
Training: 2022-04-11 16:42:13,708-Speed 145.94 samples/sec   Loss 6.4688   LearningRate 0.0380   Epoch: 7   Global Step: 128010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:14,848-Speed 8985.73 samples/sec   Loss 6.5609   LearningRate 0.0380   Epoch: 7   Global Step: 128020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:15,955-Speed 9256.13 samples/sec   Loss 6.5735   LearningRate 0.0380   Epoch: 7   Global Step: 128030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:17,074-Speed 9157.16 samples/sec   Loss 6.5321   LearningRate 0.0380   Epoch: 7   Global Step: 128040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:18,196-Speed 9131.93 samples/sec   Loss 6.4494   LearningRate 0.0380   Epoch: 7   Global Step: 128050   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:42:19,260-Speed 9633.70 samples/sec   Loss 6.4519   LearningRate 0.0380   Epoch: 7   Global Step: 128060   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:42:20,332-Speed 9549.52 samples/sec   Loss 6.4460   LearningRate 0.0380   Epoch: 7   Global Step: 128070   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:42:21,420-Speed 9422.14 samples/sec   Loss 6.4918   LearningRate 0.0380   Epoch: 7   Global Step: 128080   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:42:22,496-Speed 9516.17 samples/sec   Loss 6.4911   LearningRate 0.0380   Epoch: 7   Global Step: 128090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:23,578-Speed 9469.73 samples/sec   Loss 6.6149   LearningRate 0.0380   Epoch: 7   Global Step: 128100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:24,644-Speed 9614.65 samples/sec   Loss 6.4362   LearningRate 0.0380   Epoch: 7   Global Step: 128110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:25,703-Speed 9681.55 samples/sec   Loss 6.6211   LearningRate 0.0380   Epoch: 7   Global Step: 128120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:26,793-Speed 9400.63 samples/sec   Loss 6.6172   LearningRate 0.0380   Epoch: 7   Global Step: 128130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:27,866-Speed 9550.28 samples/sec   Loss 6.4228   LearningRate 0.0380   Epoch: 7   Global Step: 128140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:28,961-Speed 9355.92 samples/sec   Loss 6.5710   LearningRate 0.0380   Epoch: 7   Global Step: 128150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:30,060-Speed 9320.02 samples/sec   Loss 6.4513   LearningRate 0.0380   Epoch: 7   Global Step: 128160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:31,122-Speed 9643.89 samples/sec   Loss 6.6564   LearningRate 0.0380   Epoch: 7   Global Step: 128170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:32,163-Speed 9846.53 samples/sec   Loss 6.5387   LearningRate 0.0379   Epoch: 7   Global Step: 128180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:33,209-Speed 9796.29 samples/sec   Loss 6.4929   LearningRate 0.0379   Epoch: 7   Global Step: 128190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:34,288-Speed 9498.69 samples/sec   Loss 6.4575   LearningRate 0.0379   Epoch: 7   Global Step: 128200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:35,359-Speed 9561.71 samples/sec   Loss 6.6700   LearningRate 0.0379   Epoch: 7   Global Step: 128210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:36,476-Speed 9175.30 samples/sec   Loss 6.5853   LearningRate 0.0379   Epoch: 7   Global Step: 128220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:37,542-Speed 9613.76 samples/sec   Loss 6.7208   LearningRate 0.0379   Epoch: 7   Global Step: 128230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:38,617-Speed 9525.11 samples/sec   Loss 6.5006   LearningRate 0.0379   Epoch: 7   Global Step: 128240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:39,734-Speed 9176.98 samples/sec   Loss 6.6093   LearningRate 0.0379   Epoch: 7   Global Step: 128250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:40,776-Speed 9830.92 samples/sec   Loss 6.6498   LearningRate 0.0379   Epoch: 7   Global Step: 128260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:41,845-Speed 9585.01 samples/sec   Loss 6.5488   LearningRate 0.0379   Epoch: 7   Global Step: 128270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:42,913-Speed 9589.66 samples/sec   Loss 6.4865   LearningRate 0.0379   Epoch: 7   Global Step: 128280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:44,012-Speed 9326.16 samples/sec   Loss 6.6229   LearningRate 0.0379   Epoch: 7   Global Step: 128290   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:42:45,055-Speed 9835.34 samples/sec   Loss 6.4660   LearningRate 0.0379   Epoch: 7   Global Step: 128300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:46,117-Speed 9642.69 samples/sec   Loss 6.4108   LearningRate 0.0379   Epoch: 7   Global Step: 128310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:47,226-Speed 9237.04 samples/sec   Loss 6.5003   LearningRate 0.0379   Epoch: 7   Global Step: 128320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:48,322-Speed 9353.36 samples/sec   Loss 6.5895   LearningRate 0.0379   Epoch: 7   Global Step: 128330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:49,412-Speed 9400.95 samples/sec   Loss 6.6066   LearningRate 0.0379   Epoch: 7   Global Step: 128340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:50,475-Speed 9636.77 samples/sec   Loss 6.5899   LearningRate 0.0379   Epoch: 7   Global Step: 128350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:51,591-Speed 9180.40 samples/sec   Loss 6.6697   LearningRate 0.0379   Epoch: 7   Global Step: 128360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:52,709-Speed 9164.93 samples/sec   Loss 6.6609   LearningRate 0.0379   Epoch: 7   Global Step: 128370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:53,837-Speed 9080.18 samples/sec   Loss 6.4847   LearningRate 0.0379   Epoch: 7   Global Step: 128380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:54,909-Speed 9557.89 samples/sec   Loss 6.6180   LearningRate 0.0379   Epoch: 7   Global Step: 128390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:42:56,007-Speed 9339.49 samples/sec   Loss 6.4945   LearningRate 0.0379   Epoch: 7   Global Step: 128400   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:42:57,090-Speed 9456.64 samples/sec   Loss 6.5313   LearningRate 0.0379   Epoch: 7   Global Step: 128410   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:42:58,204-Speed 9197.83 samples/sec   Loss 6.5728   LearningRate 0.0379   Epoch: 7   Global Step: 128420   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:42:59,302-Speed 9338.70 samples/sec   Loss 6.5559   LearningRate 0.0379   Epoch: 7   Global Step: 128430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:00,385-Speed 9455.91 samples/sec   Loss 6.5284   LearningRate 0.0379   Epoch: 7   Global Step: 128440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:01,450-Speed 9620.41 samples/sec   Loss 6.5222   LearningRate 0.0378   Epoch: 7   Global Step: 128450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:02,568-Speed 9167.39 samples/sec   Loss 6.5177   LearningRate 0.0378   Epoch: 7   Global Step: 128460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:03,666-Speed 9330.46 samples/sec   Loss 6.5406   LearningRate 0.0378   Epoch: 7   Global Step: 128470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:04,741-Speed 9533.28 samples/sec   Loss 6.5391   LearningRate 0.0378   Epoch: 7   Global Step: 128480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:05,818-Speed 9513.63 samples/sec   Loss 6.5752   LearningRate 0.0378   Epoch: 7   Global Step: 128490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:06,914-Speed 9348.90 samples/sec   Loss 6.5578   LearningRate 0.0378   Epoch: 7   Global Step: 128500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:08,006-Speed 9384.12 samples/sec   Loss 6.5790   LearningRate 0.0378   Epoch: 7   Global Step: 128510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:09,131-Speed 9108.71 samples/sec   Loss 6.5280   LearningRate 0.0378   Epoch: 7   Global Step: 128520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:10,237-Speed 9259.11 samples/sec   Loss 6.4984   LearningRate 0.0378   Epoch: 7   Global Step: 128530   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:43:11,329-Speed 9381.60 samples/sec   Loss 6.4185   LearningRate 0.0378   Epoch: 7   Global Step: 128540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:12,411-Speed 9471.37 samples/sec   Loss 6.5159   LearningRate 0.0378   Epoch: 7   Global Step: 128550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:13,513-Speed 9299.58 samples/sec   Loss 6.5487   LearningRate 0.0378   Epoch: 7   Global Step: 128560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:14,615-Speed 9302.07 samples/sec   Loss 6.5560   LearningRate 0.0378   Epoch: 7   Global Step: 128570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:15,704-Speed 9403.75 samples/sec   Loss 6.6194   LearningRate 0.0378   Epoch: 7   Global Step: 128580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:16,819-Speed 9205.76 samples/sec   Loss 6.5618   LearningRate 0.0378   Epoch: 7   Global Step: 128590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:17,926-Speed 9254.81 samples/sec   Loss 6.5699   LearningRate 0.0378   Epoch: 7   Global Step: 128600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:19,048-Speed 9125.45 samples/sec   Loss 6.4296   LearningRate 0.0378   Epoch: 7   Global Step: 128610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:20,121-Speed 9550.38 samples/sec   Loss 6.5360   LearningRate 0.0378   Epoch: 7   Global Step: 128620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:21,226-Speed 9272.12 samples/sec   Loss 6.5889   LearningRate 0.0378   Epoch: 7   Global Step: 128630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:22,314-Speed 9422.03 samples/sec   Loss 6.6109   LearningRate 0.0378   Epoch: 7   Global Step: 128640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:23,394-Speed 9484.19 samples/sec   Loss 6.4650   LearningRate 0.0378   Epoch: 7   Global Step: 128650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:24,472-Speed 9508.02 samples/sec   Loss 6.6388   LearningRate 0.0378   Epoch: 7   Global Step: 128660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:25,583-Speed 9227.36 samples/sec   Loss 6.5263   LearningRate 0.0378   Epoch: 7   Global Step: 128670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:26,656-Speed 9545.86 samples/sec   Loss 6.5011   LearningRate 0.0378   Epoch: 7   Global Step: 128680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:27,743-Speed 9427.04 samples/sec   Loss 6.5797   LearningRate 0.0378   Epoch: 7   Global Step: 128690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:28,869-Speed 9094.72 samples/sec   Loss 6.4950   LearningRate 0.0378   Epoch: 7   Global Step: 128700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:29,942-Speed 9549.99 samples/sec   Loss 6.6290   LearningRate 0.0378   Epoch: 7   Global Step: 128710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:30,979-Speed 9885.24 samples/sec   Loss 6.5998   LearningRate 0.0377   Epoch: 7   Global Step: 128720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:32,101-Speed 9132.26 samples/sec   Loss 6.5151   LearningRate 0.0377   Epoch: 7   Global Step: 128730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:33,267-Speed 8784.20 samples/sec   Loss 6.5531   LearningRate 0.0377   Epoch: 7   Global Step: 128740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:34,358-Speed 9395.65 samples/sec   Loss 6.5175   LearningRate 0.0377   Epoch: 7   Global Step: 128750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:35,467-Speed 9232.92 samples/sec   Loss 6.5211   LearningRate 0.0377   Epoch: 7   Global Step: 128760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:36,548-Speed 9483.61 samples/sec   Loss 6.4313   LearningRate 0.0377   Epoch: 7   Global Step: 128770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:37,638-Speed 9397.54 samples/sec   Loss 6.5452   LearningRate 0.0377   Epoch: 7   Global Step: 128780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:38,700-Speed 9645.11 samples/sec   Loss 6.5128   LearningRate 0.0377   Epoch: 7   Global Step: 128790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:39,766-Speed 9619.56 samples/sec   Loss 6.5834   LearningRate 0.0377   Epoch: 7   Global Step: 128800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:40,829-Speed 9634.27 samples/sec   Loss 6.5815   LearningRate 0.0377   Epoch: 7   Global Step: 128810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:41,945-Speed 9186.35 samples/sec   Loss 6.5879   LearningRate 0.0377   Epoch: 7   Global Step: 128820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:43,021-Speed 9516.94 samples/sec   Loss 6.4800   LearningRate 0.0377   Epoch: 7   Global Step: 128830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:44,082-Speed 9664.36 samples/sec   Loss 6.4124   LearningRate 0.0377   Epoch: 7   Global Step: 128840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:45,168-Speed 9435.42 samples/sec   Loss 6.5071   LearningRate 0.0377   Epoch: 7   Global Step: 128850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:46,299-Speed 9057.91 samples/sec   Loss 6.5293   LearningRate 0.0377   Epoch: 7   Global Step: 128860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:47,388-Speed 9408.51 samples/sec   Loss 6.5527   LearningRate 0.0377   Epoch: 7   Global Step: 128870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:48,462-Speed 9536.36 samples/sec   Loss 6.5666   LearningRate 0.0377   Epoch: 7   Global Step: 128880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:49,545-Speed 9463.67 samples/sec   Loss 6.5585   LearningRate 0.0377   Epoch: 7   Global Step: 128890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:50,618-Speed 9551.62 samples/sec   Loss 6.6703   LearningRate 0.0377   Epoch: 7   Global Step: 128900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:43:51,671-Speed 9723.36 samples/sec   Loss 6.6838   LearningRate 0.0377   Epoch: 7   Global Step: 128910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:52,747-Speed 9526.67 samples/sec   Loss 6.5731   LearningRate 0.0377   Epoch: 7   Global Step: 128920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:53,841-Speed 9363.57 samples/sec   Loss 6.5416   LearningRate 0.0377   Epoch: 7   Global Step: 128930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:54,948-Speed 9257.32 samples/sec   Loss 6.5391   LearningRate 0.0377   Epoch: 7   Global Step: 128940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:56,012-Speed 9627.40 samples/sec   Loss 6.4641   LearningRate 0.0377   Epoch: 7   Global Step: 128950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:57,098-Speed 9434.06 samples/sec   Loss 6.5541   LearningRate 0.0377   Epoch: 7   Global Step: 128960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:58,187-Speed 9413.23 samples/sec   Loss 6.4619   LearningRate 0.0377   Epoch: 7   Global Step: 128970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:43:59,270-Speed 9464.06 samples/sec   Loss 6.6102   LearningRate 0.0377   Epoch: 7   Global Step: 128980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:00,356-Speed 9437.36 samples/sec   Loss 6.4287   LearningRate 0.0376   Epoch: 7   Global Step: 128990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:01,454-Speed 9327.20 samples/sec   Loss 6.5574   LearningRate 0.0376   Epoch: 7   Global Step: 129000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:02,580-Speed 9104.59 samples/sec   Loss 6.5266   LearningRate 0.0376   Epoch: 7   Global Step: 129010   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:44:03,646-Speed 9612.94 samples/sec   Loss 6.5666   LearningRate 0.0376   Epoch: 7   Global Step: 129020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:04,757-Speed 9215.11 samples/sec   Loss 6.5505   LearningRate 0.0376   Epoch: 7   Global Step: 129030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:05,844-Speed 9425.69 samples/sec   Loss 6.4687   LearningRate 0.0376   Epoch: 7   Global Step: 129040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:06,929-Speed 9448.27 samples/sec   Loss 6.5865   LearningRate 0.0376   Epoch: 7   Global Step: 129050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:08,033-Speed 9274.12 samples/sec   Loss 6.6293   LearningRate 0.0376   Epoch: 7   Global Step: 129060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:09,121-Speed 9422.69 samples/sec   Loss 6.4515   LearningRate 0.0376   Epoch: 7   Global Step: 129070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:10,236-Speed 9187.50 samples/sec   Loss 6.6672   LearningRate 0.0376   Epoch: 7   Global Step: 129080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:11,321-Speed 9444.93 samples/sec   Loss 6.4253   LearningRate 0.0376   Epoch: 7   Global Step: 129090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:12,371-Speed 9756.65 samples/sec   Loss 6.5175   LearningRate 0.0376   Epoch: 7   Global Step: 129100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:13,478-Speed 9250.38 samples/sec   Loss 6.4987   LearningRate 0.0376   Epoch: 7   Global Step: 129110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:14,563-Speed 9447.46 samples/sec   Loss 6.5476   LearningRate 0.0376   Epoch: 7   Global Step: 129120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:15,701-Speed 9003.20 samples/sec   Loss 6.5652   LearningRate 0.0376   Epoch: 7   Global Step: 129130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:16,810-Speed 9240.63 samples/sec   Loss 6.6157   LearningRate 0.0376   Epoch: 7   Global Step: 129140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:17,895-Speed 9444.74 samples/sec   Loss 6.5842   LearningRate 0.0376   Epoch: 7   Global Step: 129150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:18,997-Speed 9297.07 samples/sec   Loss 6.4870   LearningRate 0.0376   Epoch: 7   Global Step: 129160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:20,077-Speed 9488.06 samples/sec   Loss 6.5237   LearningRate 0.0376   Epoch: 7   Global Step: 129170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:21,202-Speed 9105.99 samples/sec   Loss 6.4587   LearningRate 0.0376   Epoch: 7   Global Step: 129180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:22,322-Speed 9147.00 samples/sec   Loss 6.5385   LearningRate 0.0376   Epoch: 7   Global Step: 129190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:23,359-Speed 9880.27 samples/sec   Loss 6.5510   LearningRate 0.0376   Epoch: 7   Global Step: 129200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:24,450-Speed 9396.96 samples/sec   Loss 6.6234   LearningRate 0.0376   Epoch: 7   Global Step: 129210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:25,511-Speed 9656.38 samples/sec   Loss 6.5953   LearningRate 0.0376   Epoch: 7   Global Step: 129220   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:44:26,591-Speed 9490.91 samples/sec   Loss 6.6117   LearningRate 0.0376   Epoch: 7   Global Step: 129230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:27,651-Speed 9660.98 samples/sec   Loss 6.5196   LearningRate 0.0376   Epoch: 7   Global Step: 129240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:28,708-Speed 9694.83 samples/sec   Loss 6.6541   LearningRate 0.0376   Epoch: 7   Global Step: 129250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:44:29,762-Speed 9717.35 samples/sec   Loss 6.5744   LearningRate 0.0376   Epoch: 7   Global Step: 129260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:44:30,850-Speed 9418.31 samples/sec   Loss 6.4496   LearningRate 0.0375   Epoch: 7   Global Step: 129270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:44:31,938-Speed 9424.19 samples/sec   Loss 6.4989   LearningRate 0.0375   Epoch: 7   Global Step: 129280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:44:33,003-Speed 9616.77 samples/sec   Loss 6.5423   LearningRate 0.0375   Epoch: 7   Global Step: 129290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:44:34,050-Speed 9788.66 samples/sec   Loss 6.5584   LearningRate 0.0375   Epoch: 7   Global Step: 129300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:44:35,103-Speed 9729.26 samples/sec   Loss 6.5238   LearningRate 0.0375   Epoch: 7   Global Step: 129310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:44:36,201-Speed 9336.18 samples/sec   Loss 6.5210   LearningRate 0.0375   Epoch: 7   Global Step: 129320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:44:37,249-Speed 9768.93 samples/sec   Loss 6.4542   LearningRate 0.0375   Epoch: 7   Global Step: 129330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:44:38,412-Speed 8809.53 samples/sec   Loss 6.6745   LearningRate 0.0375   Epoch: 7   Global Step: 129340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:44:39,554-Speed 8977.03 samples/sec   Loss 6.4720   LearningRate 0.0375   Epoch: 7   Global Step: 129350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:40,683-Speed 9073.86 samples/sec   Loss 6.4770   LearningRate 0.0375   Epoch: 7   Global Step: 129360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:41,789-Speed 9262.60 samples/sec   Loss 6.6085   LearningRate 0.0375   Epoch: 7   Global Step: 129370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:42,909-Speed 9145.38 samples/sec   Loss 6.6198   LearningRate 0.0375   Epoch: 7   Global Step: 129380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:43,990-Speed 9480.82 samples/sec   Loss 6.4653   LearningRate 0.0375   Epoch: 7   Global Step: 129390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:45,068-Speed 9507.32 samples/sec   Loss 6.5410   LearningRate 0.0375   Epoch: 7   Global Step: 129400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:46,139-Speed 9562.95 samples/sec   Loss 6.4703   LearningRate 0.0375   Epoch: 7   Global Step: 129410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:47,254-Speed 9190.61 samples/sec   Loss 6.5365   LearningRate 0.0375   Epoch: 7   Global Step: 129420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:48,360-Speed 9261.75 samples/sec   Loss 6.6024   LearningRate 0.0375   Epoch: 7   Global Step: 129430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:49,440-Speed 9488.90 samples/sec   Loss 6.5589   LearningRate 0.0375   Epoch: 7   Global Step: 129440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:50,546-Speed 9267.79 samples/sec   Loss 6.5413   LearningRate 0.0375   Epoch: 7   Global Step: 129450   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:44:51,604-Speed 9681.31 samples/sec   Loss 6.5910   LearningRate 0.0375   Epoch: 7   Global Step: 129460   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:44:52,706-Speed 9291.73 samples/sec   Loss 6.4964   LearningRate 0.0375   Epoch: 7   Global Step: 129470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:53,755-Speed 9767.52 samples/sec   Loss 6.4864   LearningRate 0.0375   Epoch: 7   Global Step: 129480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:54,808-Speed 9734.28 samples/sec   Loss 6.5541   LearningRate 0.0375   Epoch: 7   Global Step: 129490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:55,891-Speed 9465.89 samples/sec   Loss 6.5502   LearningRate 0.0375   Epoch: 7   Global Step: 129500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:56,974-Speed 9457.42 samples/sec   Loss 6.5527   LearningRate 0.0375   Epoch: 7   Global Step: 129510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:58,074-Speed 9315.40 samples/sec   Loss 6.4597   LearningRate 0.0375   Epoch: 7   Global Step: 129520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:44:59,210-Speed 9021.73 samples/sec   Loss 6.5180   LearningRate 0.0375   Epoch: 7   Global Step: 129530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:00,348-Speed 9006.47 samples/sec   Loss 6.4811   LearningRate 0.0374   Epoch: 7   Global Step: 129540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:01,433-Speed 9440.31 samples/sec   Loss 6.6235   LearningRate 0.0374   Epoch: 7   Global Step: 129550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:02,539-Speed 9268.79 samples/sec   Loss 6.4931   LearningRate 0.0374   Epoch: 7   Global Step: 129560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:03,618-Speed 9491.19 samples/sec   Loss 6.4625   LearningRate 0.0374   Epoch: 7   Global Step: 129570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:04,707-Speed 9412.17 samples/sec   Loss 6.5964   LearningRate 0.0374   Epoch: 7   Global Step: 129580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:05,797-Speed 9394.62 samples/sec   Loss 6.5430   LearningRate 0.0374   Epoch: 7   Global Step: 129590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:06,859-Speed 9649.44 samples/sec   Loss 6.5654   LearningRate 0.0374   Epoch: 7   Global Step: 129600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:07,941-Speed 9465.34 samples/sec   Loss 6.4472   LearningRate 0.0374   Epoch: 7   Global Step: 129610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:09,041-Speed 9315.62 samples/sec   Loss 6.4495   LearningRate 0.0374   Epoch: 7   Global Step: 129620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:10,114-Speed 9554.25 samples/sec   Loss 6.4493   LearningRate 0.0374   Epoch: 7   Global Step: 129630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:11,177-Speed 9636.57 samples/sec   Loss 6.6304   LearningRate 0.0374   Epoch: 7   Global Step: 129640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:12,275-Speed 9325.72 samples/sec   Loss 6.5525   LearningRate 0.0374   Epoch: 7   Global Step: 129650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:13,335-Speed 9671.14 samples/sec   Loss 6.5364   LearningRate 0.0374   Epoch: 7   Global Step: 129660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:14,449-Speed 9199.38 samples/sec   Loss 6.5325   LearningRate 0.0374   Epoch: 7   Global Step: 129670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:15,491-Speed 9836.59 samples/sec   Loss 6.4775   LearningRate 0.0374   Epoch: 7   Global Step: 129680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:16,548-Speed 9689.20 samples/sec   Loss 6.5381   LearningRate 0.0374   Epoch: 7   Global Step: 129690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:17,692-Speed 8956.33 samples/sec   Loss 6.5779   LearningRate 0.0374   Epoch: 7   Global Step: 129700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:18,753-Speed 9661.91 samples/sec   Loss 6.5257   LearningRate 0.0374   Epoch: 7   Global Step: 129710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:19,893-Speed 8984.14 samples/sec   Loss 6.6120   LearningRate 0.0374   Epoch: 7   Global Step: 129720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:21,007-Speed 9201.84 samples/sec   Loss 6.5802   LearningRate 0.0374   Epoch: 7   Global Step: 129730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:22,048-Speed 9834.61 samples/sec   Loss 6.5113   LearningRate 0.0374   Epoch: 7   Global Step: 129740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:23,122-Speed 9547.29 samples/sec   Loss 6.5482   LearningRate 0.0374   Epoch: 7   Global Step: 129750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:24,195-Speed 9546.25 samples/sec   Loss 6.5119   LearningRate 0.0374   Epoch: 7   Global Step: 129760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:25,237-Speed 9833.36 samples/sec   Loss 6.5500   LearningRate 0.0374   Epoch: 7   Global Step: 129770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:26,301-Speed 9633.78 samples/sec   Loss 6.5401   LearningRate 0.0374   Epoch: 7   Global Step: 129780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:45:27,395-Speed 9366.64 samples/sec   Loss 6.5446   LearningRate 0.0374   Epoch: 7   Global Step: 129790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:45:28,484-Speed 9406.81 samples/sec   Loss 6.5663   LearningRate 0.0374   Epoch: 7   Global Step: 129800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:45:29,564-Speed 9481.00 samples/sec   Loss 6.5282   LearningRate 0.0373   Epoch: 7   Global Step: 129810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:45:30,620-Speed 9705.96 samples/sec   Loss 6.4871   LearningRate 0.0373   Epoch: 7   Global Step: 129820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:45:31,670-Speed 9766.01 samples/sec   Loss 6.5009   LearningRate 0.0373   Epoch: 7   Global Step: 129830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:45:32,754-Speed 9451.90 samples/sec   Loss 6.5005   LearningRate 0.0373   Epoch: 7   Global Step: 129840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:45:33,824-Speed 9576.80 samples/sec   Loss 6.5481   LearningRate 0.0373   Epoch: 7   Global Step: 129850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:45:34,918-Speed 9365.22 samples/sec   Loss 6.5100   LearningRate 0.0373   Epoch: 7   Global Step: 129860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:45:35,987-Speed 9585.54 samples/sec   Loss 6.5543   LearningRate 0.0373   Epoch: 7   Global Step: 129870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:45:37,087-Speed 9311.83 samples/sec   Loss 6.5313   LearningRate 0.0373   Epoch: 7   Global Step: 129880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:38,164-Speed 9516.31 samples/sec   Loss 6.5771   LearningRate 0.0373   Epoch: 7   Global Step: 129890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:39,261-Speed 9337.65 samples/sec   Loss 6.5560   LearningRate 0.0373   Epoch: 7   Global Step: 129900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:40,375-Speed 9200.48 samples/sec   Loss 6.5505   LearningRate 0.0373   Epoch: 7   Global Step: 129910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:41,483-Speed 9247.74 samples/sec   Loss 6.5249   LearningRate 0.0373   Epoch: 7   Global Step: 129920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:42,600-Speed 9171.12 samples/sec   Loss 6.5019   LearningRate 0.0373   Epoch: 7   Global Step: 129930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:43,700-Speed 9316.46 samples/sec   Loss 6.4648   LearningRate 0.0373   Epoch: 7   Global Step: 129940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:44,785-Speed 9445.20 samples/sec   Loss 6.4929   LearningRate 0.0373   Epoch: 7   Global Step: 129950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:45,864-Speed 9498.57 samples/sec   Loss 6.5669   LearningRate 0.0373   Epoch: 7   Global Step: 129960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:46,968-Speed 9277.73 samples/sec   Loss 6.5193   LearningRate 0.0373   Epoch: 7   Global Step: 129970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:45:48,100-Speed 9055.75 samples/sec   Loss 6.3604   LearningRate 0.0373   Epoch: 7   Global Step: 129980   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:45:49,164-Speed 9629.13 samples/sec   Loss 6.4980   LearningRate 0.0373   Epoch: 7   Global Step: 129990   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:45:50,236-Speed 9552.75 samples/sec   Loss 6.4821   LearningRate 0.0373   Epoch: 7   Global Step: 130000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:46:12,255-[lfw][130000]XNorm: 10.771086
Training: 2022-04-11 16:46:12,255-[lfw][130000]Accuracy-Flip: 0.99567+-0.00249
Training: 2022-04-11 16:46:12,256-[lfw][130000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:46:37,666-[cfp_fp][130000]XNorm: 9.138050
Training: 2022-04-11 16:46:37,667-[cfp_fp][130000]Accuracy-Flip: 0.95757+-0.01034
Training: 2022-04-11 16:46:37,667-[cfp_fp][130000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:46:59,637-[agedb_30][130000]XNorm: 10.391237
Training: 2022-04-11 16:46:59,638-[agedb_30][130000]Accuracy-Flip: 0.96650+-0.01045
Training: 2022-04-11 16:46:59,638-[agedb_30][130000]Accuracy-Highest: 0.96650
Training: 2022-04-11 16:47:00,716-Speed 145.29 samples/sec   Loss 6.4578   LearningRate 0.0373   Epoch: 7   Global Step: 130010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:01,794-Speed 9511.42 samples/sec   Loss 6.4471   LearningRate 0.0373   Epoch: 7   Global Step: 130020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:02,883-Speed 9409.16 samples/sec   Loss 6.5216   LearningRate 0.0373   Epoch: 7   Global Step: 130030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:03,964-Speed 9473.23 samples/sec   Loss 6.5258   LearningRate 0.0373   Epoch: 7   Global Step: 130040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:05,035-Speed 9566.57 samples/sec   Loss 6.4306   LearningRate 0.0373   Epoch: 7   Global Step: 130050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:06,112-Speed 9513.85 samples/sec   Loss 6.5677   LearningRate 0.0373   Epoch: 7   Global Step: 130060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:07,179-Speed 9605.22 samples/sec   Loss 6.6483   LearningRate 0.0373   Epoch: 7   Global Step: 130070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:08,270-Speed 9386.64 samples/sec   Loss 6.5282   LearningRate 0.0373   Epoch: 7   Global Step: 130080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:09,370-Speed 9319.66 samples/sec   Loss 6.5044   LearningRate 0.0372   Epoch: 7   Global Step: 130090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:10,461-Speed 9388.30 samples/sec   Loss 6.3950   LearningRate 0.0372   Epoch: 7   Global Step: 130100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:11,536-Speed 9525.76 samples/sec   Loss 6.5637   LearningRate 0.0372   Epoch: 7   Global Step: 130110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:12,613-Speed 9519.76 samples/sec   Loss 6.4238   LearningRate 0.0372   Epoch: 7   Global Step: 130120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:13,694-Speed 9479.31 samples/sec   Loss 6.5249   LearningRate 0.0372   Epoch: 7   Global Step: 130130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:14,789-Speed 9354.86 samples/sec   Loss 6.4981   LearningRate 0.0372   Epoch: 7   Global Step: 130140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:15,901-Speed 9213.98 samples/sec   Loss 6.4364   LearningRate 0.0372   Epoch: 7   Global Step: 130150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:16,982-Speed 9482.91 samples/sec   Loss 6.5644   LearningRate 0.0372   Epoch: 7   Global Step: 130160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:18,021-Speed 9861.98 samples/sec   Loss 6.4935   LearningRate 0.0372   Epoch: 7   Global Step: 130170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:19,122-Speed 9312.08 samples/sec   Loss 6.4528   LearningRate 0.0372   Epoch: 7   Global Step: 130180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:20,185-Speed 9638.19 samples/sec   Loss 6.4757   LearningRate 0.0372   Epoch: 7   Global Step: 130190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:21,265-Speed 9484.62 samples/sec   Loss 6.5143   LearningRate 0.0372   Epoch: 7   Global Step: 130200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:22,356-Speed 9391.92 samples/sec   Loss 6.4858   LearningRate 0.0372   Epoch: 7   Global Step: 130210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:23,459-Speed 9282.47 samples/sec   Loss 6.4837   LearningRate 0.0372   Epoch: 7   Global Step: 130220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:24,531-Speed 9562.89 samples/sec   Loss 6.5107   LearningRate 0.0372   Epoch: 7   Global Step: 130230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:25,580-Speed 9767.35 samples/sec   Loss 6.5438   LearningRate 0.0372   Epoch: 7   Global Step: 130240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:26,639-Speed 9670.76 samples/sec   Loss 6.4934   LearningRate 0.0372   Epoch: 7   Global Step: 130250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:27,738-Speed 9327.26 samples/sec   Loss 6.4897   LearningRate 0.0372   Epoch: 7   Global Step: 130260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:28,845-Speed 9253.66 samples/sec   Loss 6.4632   LearningRate 0.0372   Epoch: 7   Global Step: 130270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:29,947-Speed 9297.53 samples/sec   Loss 6.5203   LearningRate 0.0372   Epoch: 7   Global Step: 130280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:31,033-Speed 9432.42 samples/sec   Loss 6.5245   LearningRate 0.0372   Epoch: 7   Global Step: 130290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:32,166-Speed 9042.62 samples/sec   Loss 6.4299   LearningRate 0.0372   Epoch: 7   Global Step: 130300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:33,267-Speed 9304.57 samples/sec   Loss 6.5645   LearningRate 0.0372   Epoch: 7   Global Step: 130310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:34,392-Speed 9106.92 samples/sec   Loss 6.4744   LearningRate 0.0372   Epoch: 7   Global Step: 130320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:35,497-Speed 9280.19 samples/sec   Loss 6.5019   LearningRate 0.0372   Epoch: 7   Global Step: 130330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:36,596-Speed 9317.97 samples/sec   Loss 6.5588   LearningRate 0.0372   Epoch: 7   Global Step: 130340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:37,715-Speed 9160.71 samples/sec   Loss 6.4478   LearningRate 0.0372   Epoch: 7   Global Step: 130350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:38,774-Speed 9676.37 samples/sec   Loss 6.4816   LearningRate 0.0371   Epoch: 7   Global Step: 130360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:39,845-Speed 9559.54 samples/sec   Loss 6.4750   LearningRate 0.0371   Epoch: 7   Global Step: 130370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:40,988-Speed 8963.24 samples/sec   Loss 6.4985   LearningRate 0.0371   Epoch: 7   Global Step: 130380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:42,094-Speed 9265.34 samples/sec   Loss 6.4082   LearningRate 0.0371   Epoch: 7   Global Step: 130390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:43,176-Speed 9469.13 samples/sec   Loss 6.4311   LearningRate 0.0371   Epoch: 7   Global Step: 130400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:44,226-Speed 9762.50 samples/sec   Loss 6.4662   LearningRate 0.0371   Epoch: 7   Global Step: 130410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:45,293-Speed 9601.53 samples/sec   Loss 6.5452   LearningRate 0.0371   Epoch: 7   Global Step: 130420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:46,369-Speed 9519.00 samples/sec   Loss 6.4445   LearningRate 0.0371   Epoch: 7   Global Step: 130430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:47,445-Speed 9527.50 samples/sec   Loss 6.4566   LearningRate 0.0371   Epoch: 7   Global Step: 130440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:48,524-Speed 9499.55 samples/sec   Loss 6.5420   LearningRate 0.0371   Epoch: 7   Global Step: 130450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:49,581-Speed 9695.51 samples/sec   Loss 6.6589   LearningRate 0.0371   Epoch: 7   Global Step: 130460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:50,665-Speed 9451.63 samples/sec   Loss 6.5395   LearningRate 0.0371   Epoch: 7   Global Step: 130470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:47:51,794-Speed 9072.48 samples/sec   Loss 6.5318   LearningRate 0.0371   Epoch: 7   Global Step: 130480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:52,907-Speed 9207.72 samples/sec   Loss 6.6530   LearningRate 0.0371   Epoch: 7   Global Step: 130490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:53,980-Speed 9544.27 samples/sec   Loss 6.5199   LearningRate 0.0371   Epoch: 7   Global Step: 130500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:55,059-Speed 9500.44 samples/sec   Loss 6.4442   LearningRate 0.0371   Epoch: 7   Global Step: 130510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:56,172-Speed 9206.98 samples/sec   Loss 6.4187   LearningRate 0.0371   Epoch: 7   Global Step: 130520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:57,289-Speed 9168.95 samples/sec   Loss 6.5594   LearningRate 0.0371   Epoch: 7   Global Step: 130530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:58,380-Speed 9389.62 samples/sec   Loss 6.5222   LearningRate 0.0371   Epoch: 7   Global Step: 130540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:47:59,472-Speed 9382.42 samples/sec   Loss 6.4827   LearningRate 0.0371   Epoch: 7   Global Step: 130550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:00,554-Speed 9477.26 samples/sec   Loss 6.6069   LearningRate 0.0371   Epoch: 7   Global Step: 130560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:01,619-Speed 9613.64 samples/sec   Loss 6.5832   LearningRate 0.0371   Epoch: 7   Global Step: 130570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:02,724-Speed 9275.60 samples/sec   Loss 6.5849   LearningRate 0.0371   Epoch: 7   Global Step: 130580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:03,807-Speed 9461.17 samples/sec   Loss 6.5683   LearningRate 0.0371   Epoch: 7   Global Step: 130590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:04,890-Speed 9454.86 samples/sec   Loss 6.5088   LearningRate 0.0371   Epoch: 7   Global Step: 130600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:05,994-Speed 9282.31 samples/sec   Loss 6.5544   LearningRate 0.0371   Epoch: 7   Global Step: 130610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:07,091-Speed 9338.78 samples/sec   Loss 6.4839   LearningRate 0.0371   Epoch: 7   Global Step: 130620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:08,189-Speed 9335.56 samples/sec   Loss 6.5887   LearningRate 0.0370   Epoch: 7   Global Step: 130630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:09,230-Speed 9838.32 samples/sec   Loss 6.5134   LearningRate 0.0370   Epoch: 7   Global Step: 130640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:10,349-Speed 9157.56 samples/sec   Loss 6.4616   LearningRate 0.0370   Epoch: 7   Global Step: 130650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:11,480-Speed 9059.83 samples/sec   Loss 6.4673   LearningRate 0.0370   Epoch: 7   Global Step: 130660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:12,605-Speed 9107.91 samples/sec   Loss 6.5015   LearningRate 0.0370   Epoch: 7   Global Step: 130670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:13,654-Speed 9774.85 samples/sec   Loss 6.4736   LearningRate 0.0370   Epoch: 7   Global Step: 130680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:14,704-Speed 9754.84 samples/sec   Loss 6.4796   LearningRate 0.0370   Epoch: 7   Global Step: 130690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:15,739-Speed 9900.60 samples/sec   Loss 6.6195   LearningRate 0.0370   Epoch: 7   Global Step: 130700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:16,804-Speed 9617.17 samples/sec   Loss 6.5168   LearningRate 0.0370   Epoch: 7   Global Step: 130710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:17,913-Speed 9241.23 samples/sec   Loss 6.4463   LearningRate 0.0370   Epoch: 7   Global Step: 130720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:19,015-Speed 9294.20 samples/sec   Loss 6.5025   LearningRate 0.0370   Epoch: 7   Global Step: 130730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:20,105-Speed 9402.00 samples/sec   Loss 6.4439   LearningRate 0.0370   Epoch: 7   Global Step: 130740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:21,154-Speed 9765.58 samples/sec   Loss 6.5227   LearningRate 0.0370   Epoch: 7   Global Step: 130750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:22,210-Speed 9703.78 samples/sec   Loss 6.5925   LearningRate 0.0370   Epoch: 7   Global Step: 130760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:23,301-Speed 9389.52 samples/sec   Loss 6.4778   LearningRate 0.0370   Epoch: 7   Global Step: 130770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:24,424-Speed 9128.36 samples/sec   Loss 6.5338   LearningRate 0.0370   Epoch: 7   Global Step: 130780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:25,502-Speed 9504.31 samples/sec   Loss 6.5401   LearningRate 0.0370   Epoch: 7   Global Step: 130790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:48:26,566-Speed 9629.58 samples/sec   Loss 6.4924   LearningRate 0.0370   Epoch: 7   Global Step: 130800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:27,640-Speed 9535.59 samples/sec   Loss 6.4834   LearningRate 0.0370   Epoch: 7   Global Step: 130810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:28,743-Speed 9289.91 samples/sec   Loss 6.5086   LearningRate 0.0370   Epoch: 7   Global Step: 130820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:29,805-Speed 9645.39 samples/sec   Loss 6.4952   LearningRate 0.0370   Epoch: 7   Global Step: 130830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:30,920-Speed 9188.81 samples/sec   Loss 6.5412   LearningRate 0.0370   Epoch: 7   Global Step: 130840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:32,026-Speed 9268.89 samples/sec   Loss 6.5825   LearningRate 0.0370   Epoch: 7   Global Step: 130850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:33,085-Speed 9677.69 samples/sec   Loss 6.4214   LearningRate 0.0370   Epoch: 7   Global Step: 130860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:34,146-Speed 9657.65 samples/sec   Loss 6.4045   LearningRate 0.0370   Epoch: 7   Global Step: 130870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:35,234-Speed 9418.15 samples/sec   Loss 6.4970   LearningRate 0.0370   Epoch: 7   Global Step: 130880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:36,321-Speed 9422.84 samples/sec   Loss 6.4713   LearningRate 0.0370   Epoch: 7   Global Step: 130890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:37,442-Speed 9136.29 samples/sec   Loss 6.5511   LearningRate 0.0370   Epoch: 7   Global Step: 130900   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:48:38,542-Speed 9321.75 samples/sec   Loss 6.3697   LearningRate 0.0369   Epoch: 7   Global Step: 130910   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:48:39,608-Speed 9606.72 samples/sec   Loss 6.4592   LearningRate 0.0369   Epoch: 7   Global Step: 130920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:40,734-Speed 9096.01 samples/sec   Loss 6.5055   LearningRate 0.0369   Epoch: 7   Global Step: 130930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:41,802-Speed 9597.07 samples/sec   Loss 6.5201   LearningRate 0.0369   Epoch: 7   Global Step: 130940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:42,907-Speed 9274.46 samples/sec   Loss 6.6090   LearningRate 0.0369   Epoch: 7   Global Step: 130950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:44,015-Speed 9247.51 samples/sec   Loss 6.5592   LearningRate 0.0369   Epoch: 7   Global Step: 130960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:45,055-Speed 9853.97 samples/sec   Loss 6.4878   LearningRate 0.0369   Epoch: 7   Global Step: 130970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:46,152-Speed 9340.27 samples/sec   Loss 6.5064   LearningRate 0.0369   Epoch: 7   Global Step: 130980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:47,264-Speed 9215.05 samples/sec   Loss 6.4502   LearningRate 0.0369   Epoch: 7   Global Step: 130990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:48,379-Speed 9183.75 samples/sec   Loss 6.5819   LearningRate 0.0369   Epoch: 7   Global Step: 131000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:49,478-Speed 9331.54 samples/sec   Loss 6.4379   LearningRate 0.0369   Epoch: 7   Global Step: 131010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:50,551-Speed 9551.63 samples/sec   Loss 6.5051   LearningRate 0.0369   Epoch: 7   Global Step: 131020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:51,582-Speed 9931.55 samples/sec   Loss 6.4379   LearningRate 0.0369   Epoch: 7   Global Step: 131030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:52,686-Speed 9287.05 samples/sec   Loss 6.4670   LearningRate 0.0369   Epoch: 7   Global Step: 131040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:53,848-Speed 8810.10 samples/sec   Loss 6.5883   LearningRate 0.0369   Epoch: 7   Global Step: 131050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:54,957-Speed 9245.18 samples/sec   Loss 6.5650   LearningRate 0.0369   Epoch: 7   Global Step: 131060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:56,028-Speed 9566.15 samples/sec   Loss 6.4676   LearningRate 0.0369   Epoch: 7   Global Step: 131070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:57,117-Speed 9404.42 samples/sec   Loss 6.5272   LearningRate 0.0369   Epoch: 7   Global Step: 131080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:58,204-Speed 9421.67 samples/sec   Loss 6.4633   LearningRate 0.0369   Epoch: 7   Global Step: 131090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:48:59,223-Speed 10055.15 samples/sec   Loss 6.4341   LearningRate 0.0369   Epoch: 7   Global Step: 131100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:00,295-Speed 9557.59 samples/sec   Loss 6.4785   LearningRate 0.0369   Epoch: 7   Global Step: 131110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:01,391-Speed 9355.21 samples/sec   Loss 6.5595   LearningRate 0.0369   Epoch: 7   Global Step: 131120   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:02,493-Speed 9299.30 samples/sec   Loss 6.4921   LearningRate 0.0369   Epoch: 7   Global Step: 131130   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:03,621-Speed 9083.51 samples/sec   Loss 6.4707   LearningRate 0.0369   Epoch: 7   Global Step: 131140   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:04,722-Speed 9301.59 samples/sec   Loss 6.4000   LearningRate 0.0369   Epoch: 7   Global Step: 131150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:05,796-Speed 9544.98 samples/sec   Loss 6.4855   LearningRate 0.0369   Epoch: 7   Global Step: 131160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:06,861-Speed 9619.35 samples/sec   Loss 6.4144   LearningRate 0.0369   Epoch: 7   Global Step: 131170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:07,934-Speed 9544.35 samples/sec   Loss 6.5535   LearningRate 0.0368   Epoch: 7   Global Step: 131180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:09,041-Speed 9260.40 samples/sec   Loss 6.4473   LearningRate 0.0368   Epoch: 7   Global Step: 131190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:10,103-Speed 9646.52 samples/sec   Loss 6.5743   LearningRate 0.0368   Epoch: 7   Global Step: 131200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:11,149-Speed 9795.51 samples/sec   Loss 6.5284   LearningRate 0.0368   Epoch: 7   Global Step: 131210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:12,232-Speed 9469.77 samples/sec   Loss 6.3428   LearningRate 0.0368   Epoch: 7   Global Step: 131220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:13,312-Speed 9482.60 samples/sec   Loss 6.4942   LearningRate 0.0368   Epoch: 7   Global Step: 131230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:14,403-Speed 9391.50 samples/sec   Loss 6.4951   LearningRate 0.0368   Epoch: 7   Global Step: 131240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:15,474-Speed 9562.70 samples/sec   Loss 6.4550   LearningRate 0.0368   Epoch: 7   Global Step: 131250   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:16,599-Speed 9110.01 samples/sec   Loss 6.6039   LearningRate 0.0368   Epoch: 7   Global Step: 131260   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:17,676-Speed 9514.88 samples/sec   Loss 6.4404   LearningRate 0.0368   Epoch: 7   Global Step: 131270   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:18,715-Speed 9868.16 samples/sec   Loss 6.6926   LearningRate 0.0368   Epoch: 7   Global Step: 131280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:19,778-Speed 9633.56 samples/sec   Loss 6.5009   LearningRate 0.0368   Epoch: 7   Global Step: 131290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:20,814-Speed 9891.63 samples/sec   Loss 6.4813   LearningRate 0.0368   Epoch: 7   Global Step: 131300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:21,879-Speed 9617.71 samples/sec   Loss 6.4661   LearningRate 0.0368   Epoch: 7   Global Step: 131310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:22,960-Speed 9479.92 samples/sec   Loss 6.6163   LearningRate 0.0368   Epoch: 7   Global Step: 131320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:24,067-Speed 9256.52 samples/sec   Loss 6.4557   LearningRate 0.0368   Epoch: 7   Global Step: 131330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:25,128-Speed 9652.85 samples/sec   Loss 6.5170   LearningRate 0.0368   Epoch: 7   Global Step: 131340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:26,190-Speed 9648.53 samples/sec   Loss 6.4693   LearningRate 0.0368   Epoch: 7   Global Step: 131350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:27,260-Speed 9581.97 samples/sec   Loss 6.4953   LearningRate 0.0368   Epoch: 7   Global Step: 131360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:28,309-Speed 9764.01 samples/sec   Loss 6.5607   LearningRate 0.0368   Epoch: 7   Global Step: 131370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:29,378-Speed 9586.58 samples/sec   Loss 6.4913   LearningRate 0.0368   Epoch: 7   Global Step: 131380   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:30,481-Speed 9285.44 samples/sec   Loss 6.4274   LearningRate 0.0368   Epoch: 7   Global Step: 131390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:31,565-Speed 9458.33 samples/sec   Loss 6.4500   LearningRate 0.0368   Epoch: 7   Global Step: 131400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:32,620-Speed 9711.05 samples/sec   Loss 6.5706   LearningRate 0.0368   Epoch: 7   Global Step: 131410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:33,683-Speed 9637.56 samples/sec   Loss 6.4645   LearningRate 0.0368   Epoch: 7   Global Step: 131420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:34,758-Speed 9532.75 samples/sec   Loss 6.5271   LearningRate 0.0368   Epoch: 7   Global Step: 131430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:35,801-Speed 9823.88 samples/sec   Loss 6.4503   LearningRate 0.0368   Epoch: 7   Global Step: 131440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:36,877-Speed 9521.90 samples/sec   Loss 6.5414   LearningRate 0.0368   Epoch: 7   Global Step: 131450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:37,917-Speed 9857.68 samples/sec   Loss 6.5349   LearningRate 0.0367   Epoch: 7   Global Step: 131460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:38,985-Speed 9594.33 samples/sec   Loss 6.4698   LearningRate 0.0367   Epoch: 7   Global Step: 131470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:40,050-Speed 9616.10 samples/sec   Loss 6.5620   LearningRate 0.0367   Epoch: 7   Global Step: 131480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:41,122-Speed 9557.25 samples/sec   Loss 6.4711   LearningRate 0.0367   Epoch: 7   Global Step: 131490   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:42,198-Speed 9526.63 samples/sec   Loss 6.4723   LearningRate 0.0367   Epoch: 7   Global Step: 131500   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:43,267-Speed 9582.77 samples/sec   Loss 6.6408   LearningRate 0.0367   Epoch: 7   Global Step: 131510   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:44,340-Speed 9547.50 samples/sec   Loss 6.5260   LearningRate 0.0367   Epoch: 7   Global Step: 131520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:45,448-Speed 9248.35 samples/sec   Loss 6.4569   LearningRate 0.0367   Epoch: 7   Global Step: 131530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:46,512-Speed 9629.54 samples/sec   Loss 6.5644   LearningRate 0.0367   Epoch: 7   Global Step: 131540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:47,607-Speed 9358.76 samples/sec   Loss 6.5073   LearningRate 0.0367   Epoch: 7   Global Step: 131550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:48,648-Speed 9850.27 samples/sec   Loss 6.6139   LearningRate 0.0367   Epoch: 7   Global Step: 131560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:49,696-Speed 9776.40 samples/sec   Loss 6.5525   LearningRate 0.0367   Epoch: 7   Global Step: 131570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:50,779-Speed 9459.31 samples/sec   Loss 6.6343   LearningRate 0.0367   Epoch: 7   Global Step: 131580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:51,894-Speed 9183.97 samples/sec   Loss 6.5965   LearningRate 0.0367   Epoch: 7   Global Step: 131590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:53,007-Speed 9209.12 samples/sec   Loss 6.4692   LearningRate 0.0367   Epoch: 7   Global Step: 131600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:54,044-Speed 9879.97 samples/sec   Loss 6.5980   LearningRate 0.0367   Epoch: 7   Global Step: 131610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:55,101-Speed 9689.97 samples/sec   Loss 6.4929   LearningRate 0.0367   Epoch: 7   Global Step: 131620   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:49:56,165-Speed 9635.19 samples/sec   Loss 6.5745   LearningRate 0.0367   Epoch: 7   Global Step: 131630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:57,269-Speed 9276.90 samples/sec   Loss 6.5518   LearningRate 0.0367   Epoch: 7   Global Step: 131640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:58,398-Speed 9077.80 samples/sec   Loss 6.4364   LearningRate 0.0367   Epoch: 7   Global Step: 131650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:49:59,469-Speed 9568.58 samples/sec   Loss 6.4299   LearningRate 0.0367   Epoch: 7   Global Step: 131660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:00,554-Speed 9435.74 samples/sec   Loss 6.6481   LearningRate 0.0367   Epoch: 7   Global Step: 131670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:01,644-Speed 9411.01 samples/sec   Loss 6.5415   LearningRate 0.0367   Epoch: 7   Global Step: 131680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:02,727-Speed 9456.16 samples/sec   Loss 6.5175   LearningRate 0.0367   Epoch: 7   Global Step: 131690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:03,829-Speed 9294.14 samples/sec   Loss 6.5379   LearningRate 0.0367   Epoch: 7   Global Step: 131700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:04,896-Speed 9611.54 samples/sec   Loss 6.4916   LearningRate 0.0367   Epoch: 7   Global Step: 131710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:05,970-Speed 9534.54 samples/sec   Loss 6.4562   LearningRate 0.0367   Epoch: 7   Global Step: 131720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:07,045-Speed 9542.08 samples/sec   Loss 6.4687   LearningRate 0.0366   Epoch: 7   Global Step: 131730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:08,149-Speed 9288.22 samples/sec   Loss 6.5682   LearningRate 0.0366   Epoch: 7   Global Step: 131740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:09,261-Speed 9217.22 samples/sec   Loss 6.4926   LearningRate 0.0366   Epoch: 7   Global Step: 131750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:10,312-Speed 9746.47 samples/sec   Loss 6.5471   LearningRate 0.0366   Epoch: 7   Global Step: 131760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:11,424-Speed 9215.98 samples/sec   Loss 6.5516   LearningRate 0.0366   Epoch: 7   Global Step: 131770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:12,533-Speed 9239.40 samples/sec   Loss 6.4935   LearningRate 0.0366   Epoch: 7   Global Step: 131780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:13,621-Speed 9415.94 samples/sec   Loss 6.4157   LearningRate 0.0366   Epoch: 7   Global Step: 131790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:14,696-Speed 9532.47 samples/sec   Loss 6.5149   LearningRate 0.0366   Epoch: 7   Global Step: 131800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:15,806-Speed 9231.24 samples/sec   Loss 6.5134   LearningRate 0.0366   Epoch: 7   Global Step: 131810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:16,900-Speed 9369.06 samples/sec   Loss 6.3897   LearningRate 0.0366   Epoch: 7   Global Step: 131820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:17,981-Speed 9477.06 samples/sec   Loss 6.6106   LearningRate 0.0366   Epoch: 7   Global Step: 131830   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:50:19,041-Speed 9664.75 samples/sec   Loss 6.4825   LearningRate 0.0366   Epoch: 7   Global Step: 131840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:20,195-Speed 8881.95 samples/sec   Loss 6.4438   LearningRate 0.0366   Epoch: 7   Global Step: 131850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:21,327-Speed 9055.00 samples/sec   Loss 6.5283   LearningRate 0.0366   Epoch: 7   Global Step: 131860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:22,428-Speed 9307.29 samples/sec   Loss 6.4369   LearningRate 0.0366   Epoch: 7   Global Step: 131870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:23,540-Speed 9209.10 samples/sec   Loss 6.4267   LearningRate 0.0366   Epoch: 7   Global Step: 131880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:24,625-Speed 9451.20 samples/sec   Loss 6.4510   LearningRate 0.0366   Epoch: 7   Global Step: 131890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:25,696-Speed 9561.31 samples/sec   Loss 6.5304   LearningRate 0.0366   Epoch: 7   Global Step: 131900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:26,767-Speed 9571.09 samples/sec   Loss 6.4148   LearningRate 0.0366   Epoch: 7   Global Step: 131910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:27,844-Speed 9515.63 samples/sec   Loss 6.4914   LearningRate 0.0366   Epoch: 7   Global Step: 131920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:28,975-Speed 9058.13 samples/sec   Loss 6.5550   LearningRate 0.0366   Epoch: 7   Global Step: 131930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:30,064-Speed 9404.52 samples/sec   Loss 6.6100   LearningRate 0.0366   Epoch: 7   Global Step: 131940   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:50:31,158-Speed 9371.46 samples/sec   Loss 6.6019   LearningRate 0.0366   Epoch: 7   Global Step: 131950   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:50:32,237-Speed 9490.22 samples/sec   Loss 6.5015   LearningRate 0.0366   Epoch: 7   Global Step: 131960   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:50:33,286-Speed 9767.83 samples/sec   Loss 6.4916   LearningRate 0.0366   Epoch: 7   Global Step: 131970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:34,363-Speed 9516.00 samples/sec   Loss 6.4740   LearningRate 0.0366   Epoch: 7   Global Step: 131980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:35,422-Speed 9675.93 samples/sec   Loss 6.4956   LearningRate 0.0366   Epoch: 7   Global Step: 131990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:36,507-Speed 9439.91 samples/sec   Loss 6.6133   LearningRate 0.0366   Epoch: 7   Global Step: 132000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:50:58,523-[lfw][132000]XNorm: 10.575938
Training: 2022-04-11 16:50:58,524-[lfw][132000]Accuracy-Flip: 0.99583+-0.00300
Training: 2022-04-11 16:50:58,525-[lfw][132000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:51:23,976-[cfp_fp][132000]XNorm: 8.998566
Training: 2022-04-11 16:51:23,977-[cfp_fp][132000]Accuracy-Flip: 0.95543+-0.00941
Training: 2022-04-11 16:51:23,978-[cfp_fp][132000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:51:45,924-[agedb_30][132000]XNorm: 10.248483
Training: 2022-04-11 16:51:45,925-[agedb_30][132000]Accuracy-Flip: 0.96183+-0.00831
Training: 2022-04-11 16:51:45,925-[agedb_30][132000]Accuracy-Highest: 0.96650
Training: 2022-04-11 16:51:47,028-Speed 145.21 samples/sec   Loss 6.4539   LearningRate 0.0365   Epoch: 7   Global Step: 132010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:48,068-Speed 9856.05 samples/sec   Loss 6.5066   LearningRate 0.0365   Epoch: 7   Global Step: 132020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:49,158-Speed 9398.07 samples/sec   Loss 6.5052   LearningRate 0.0365   Epoch: 7   Global Step: 132030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:50,219-Speed 9655.30 samples/sec   Loss 6.4070   LearningRate 0.0365   Epoch: 7   Global Step: 132040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:51,253-Speed 9905.71 samples/sec   Loss 6.6050   LearningRate 0.0365   Epoch: 7   Global Step: 132050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:52,353-Speed 9321.09 samples/sec   Loss 6.6488   LearningRate 0.0365   Epoch: 7   Global Step: 132060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:53,439-Speed 9434.64 samples/sec   Loss 6.4935   LearningRate 0.0365   Epoch: 7   Global Step: 132070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:54,528-Speed 9408.14 samples/sec   Loss 6.5583   LearningRate 0.0365   Epoch: 7   Global Step: 132080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:55,586-Speed 9680.82 samples/sec   Loss 6.5410   LearningRate 0.0365   Epoch: 7   Global Step: 132090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:56,657-Speed 9569.55 samples/sec   Loss 6.5651   LearningRate 0.0365   Epoch: 7   Global Step: 132100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:57,765-Speed 9243.70 samples/sec   Loss 6.5613   LearningRate 0.0365   Epoch: 7   Global Step: 132110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:58,868-Speed 9288.41 samples/sec   Loss 6.4462   LearningRate 0.0365   Epoch: 7   Global Step: 132120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:51:59,935-Speed 9604.00 samples/sec   Loss 6.5425   LearningRate 0.0365   Epoch: 7   Global Step: 132130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:00,993-Speed 9684.65 samples/sec   Loss 6.4869   LearningRate 0.0365   Epoch: 7   Global Step: 132140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:02,040-Speed 9786.65 samples/sec   Loss 6.4499   LearningRate 0.0365   Epoch: 7   Global Step: 132150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:03,139-Speed 9321.52 samples/sec   Loss 6.4806   LearningRate 0.0365   Epoch: 7   Global Step: 132160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:04,198-Speed 9676.71 samples/sec   Loss 6.5735   LearningRate 0.0365   Epoch: 7   Global Step: 132170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:05,255-Speed 9695.30 samples/sec   Loss 6.4820   LearningRate 0.0365   Epoch: 7   Global Step: 132180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:06,354-Speed 9321.79 samples/sec   Loss 6.4804   LearningRate 0.0365   Epoch: 7   Global Step: 132190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:07,417-Speed 9641.21 samples/sec   Loss 6.5069   LearningRate 0.0365   Epoch: 7   Global Step: 132200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:08,486-Speed 9577.75 samples/sec   Loss 6.4848   LearningRate 0.0365   Epoch: 7   Global Step: 132210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:09,549-Speed 9642.26 samples/sec   Loss 6.4286   LearningRate 0.0365   Epoch: 7   Global Step: 132220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:10,601-Speed 9740.82 samples/sec   Loss 6.3883   LearningRate 0.0365   Epoch: 7   Global Step: 132230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:11,700-Speed 9322.61 samples/sec   Loss 6.4032   LearningRate 0.0365   Epoch: 7   Global Step: 132240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:12,800-Speed 9316.20 samples/sec   Loss 6.4609   LearningRate 0.0365   Epoch: 7   Global Step: 132250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:13,920-Speed 9152.60 samples/sec   Loss 6.5269   LearningRate 0.0365   Epoch: 7   Global Step: 132260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:14,998-Speed 9504.64 samples/sec   Loss 6.4495   LearningRate 0.0365   Epoch: 7   Global Step: 132270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:16,104-Speed 9263.68 samples/sec   Loss 6.4550   LearningRate 0.0365   Epoch: 7   Global Step: 132280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:17,246-Speed 8972.62 samples/sec   Loss 6.4731   LearningRate 0.0364   Epoch: 7   Global Step: 132290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:18,299-Speed 9724.94 samples/sec   Loss 6.5788   LearningRate 0.0364   Epoch: 7   Global Step: 132300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:19,389-Speed 9403.26 samples/sec   Loss 6.5154   LearningRate 0.0364   Epoch: 7   Global Step: 132310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:20,497-Speed 9239.18 samples/sec   Loss 6.5118   LearningRate 0.0364   Epoch: 7   Global Step: 132320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:21,600-Speed 9293.00 samples/sec   Loss 6.4667   LearningRate 0.0364   Epoch: 7   Global Step: 132330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:22,651-Speed 9748.00 samples/sec   Loss 6.5810   LearningRate 0.0364   Epoch: 7   Global Step: 132340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:23,703-Speed 9747.61 samples/sec   Loss 6.4817   LearningRate 0.0364   Epoch: 7   Global Step: 132350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:24,757-Speed 9716.03 samples/sec   Loss 6.6032   LearningRate 0.0364   Epoch: 7   Global Step: 132360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:25,880-Speed 9127.36 samples/sec   Loss 6.4676   LearningRate 0.0364   Epoch: 7   Global Step: 132370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:27,016-Speed 9018.84 samples/sec   Loss 6.4885   LearningRate 0.0364   Epoch: 7   Global Step: 132380   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:52:28,059-Speed 9820.75 samples/sec   Loss 6.3501   LearningRate 0.0364   Epoch: 7   Global Step: 132390   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:52:29,156-Speed 9336.04 samples/sec   Loss 6.6254   LearningRate 0.0364   Epoch: 7   Global Step: 132400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:30,247-Speed 9393.83 samples/sec   Loss 6.4425   LearningRate 0.0364   Epoch: 7   Global Step: 132410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:31,340-Speed 9373.89 samples/sec   Loss 6.4658   LearningRate 0.0364   Epoch: 7   Global Step: 132420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:32,415-Speed 9536.92 samples/sec   Loss 6.4809   LearningRate 0.0364   Epoch: 7   Global Step: 132430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:33,496-Speed 9482.08 samples/sec   Loss 6.5527   LearningRate 0.0364   Epoch: 7   Global Step: 132440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:34,577-Speed 9471.66 samples/sec   Loss 6.5342   LearningRate 0.0364   Epoch: 7   Global Step: 132450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:35,679-Speed 9297.75 samples/sec   Loss 6.4971   LearningRate 0.0364   Epoch: 7   Global Step: 132460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:36,741-Speed 9648.16 samples/sec   Loss 6.4442   LearningRate 0.0364   Epoch: 7   Global Step: 132470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:37,830-Speed 9409.42 samples/sec   Loss 6.4906   LearningRate 0.0364   Epoch: 7   Global Step: 132480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:38,956-Speed 9097.13 samples/sec   Loss 6.4640   LearningRate 0.0364   Epoch: 7   Global Step: 132490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:40,075-Speed 9157.56 samples/sec   Loss 6.3881   LearningRate 0.0364   Epoch: 7   Global Step: 132500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:41,188-Speed 9212.10 samples/sec   Loss 6.4950   LearningRate 0.0364   Epoch: 7   Global Step: 132510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:52:42,293-Speed 9266.00 samples/sec   Loss 6.4477   LearningRate 0.0364   Epoch: 7   Global Step: 132520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:43,382-Speed 9410.34 samples/sec   Loss 6.3972   LearningRate 0.0364   Epoch: 7   Global Step: 132530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:44,464-Speed 9469.40 samples/sec   Loss 6.4432   LearningRate 0.0364   Epoch: 7   Global Step: 132540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:45,561-Speed 9339.67 samples/sec   Loss 6.6135   LearningRate 0.0364   Epoch: 7   Global Step: 132550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:46,587-Speed 9982.63 samples/sec   Loss 6.6114   LearningRate 0.0363   Epoch: 7   Global Step: 132560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:47,664-Speed 9513.08 samples/sec   Loss 6.5319   LearningRate 0.0363   Epoch: 7   Global Step: 132570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:48,753-Speed 9415.40 samples/sec   Loss 6.5196   LearningRate 0.0363   Epoch: 7   Global Step: 132580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:49,861-Speed 9244.61 samples/sec   Loss 6.5108   LearningRate 0.0363   Epoch: 7   Global Step: 132590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:50,931-Speed 9576.98 samples/sec   Loss 6.3852   LearningRate 0.0363   Epoch: 7   Global Step: 132600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:52,009-Speed 9504.09 samples/sec   Loss 6.6009   LearningRate 0.0363   Epoch: 7   Global Step: 132610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:53,073-Speed 9639.11 samples/sec   Loss 6.4641   LearningRate 0.0363   Epoch: 7   Global Step: 132620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:54,151-Speed 9499.14 samples/sec   Loss 6.4785   LearningRate 0.0363   Epoch: 7   Global Step: 132630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:55,263-Speed 9214.22 samples/sec   Loss 6.4218   LearningRate 0.0363   Epoch: 7   Global Step: 132640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:56,336-Speed 9549.27 samples/sec   Loss 6.5184   LearningRate 0.0363   Epoch: 7   Global Step: 132650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:57,400-Speed 9626.84 samples/sec   Loss 6.4767   LearningRate 0.0363   Epoch: 7   Global Step: 132660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:58,469-Speed 9586.33 samples/sec   Loss 6.4584   LearningRate 0.0363   Epoch: 7   Global Step: 132670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:52:59,574-Speed 9271.19 samples/sec   Loss 6.4589   LearningRate 0.0363   Epoch: 7   Global Step: 132680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:00,650-Speed 9519.54 samples/sec   Loss 6.4793   LearningRate 0.0363   Epoch: 7   Global Step: 132690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:01,738-Speed 9418.27 samples/sec   Loss 6.4657   LearningRate 0.0363   Epoch: 7   Global Step: 132700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:02,838-Speed 9327.04 samples/sec   Loss 6.4290   LearningRate 0.0363   Epoch: 7   Global Step: 132710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:03,900-Speed 9646.63 samples/sec   Loss 6.5150   LearningRate 0.0363   Epoch: 7   Global Step: 132720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:04,999-Speed 9324.30 samples/sec   Loss 6.5214   LearningRate 0.0363   Epoch: 7   Global Step: 132730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:06,100-Speed 9305.54 samples/sec   Loss 6.4750   LearningRate 0.0363   Epoch: 7   Global Step: 132740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:07,191-Speed 9388.84 samples/sec   Loss 6.5550   LearningRate 0.0363   Epoch: 7   Global Step: 132750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:08,243-Speed 9738.14 samples/sec   Loss 6.3608   LearningRate 0.0363   Epoch: 7   Global Step: 132760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:09,337-Speed 9372.92 samples/sec   Loss 6.5011   LearningRate 0.0363   Epoch: 7   Global Step: 132770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:10,419-Speed 9465.08 samples/sec   Loss 6.4885   LearningRate 0.0363   Epoch: 7   Global Step: 132780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:11,482-Speed 9645.66 samples/sec   Loss 6.4438   LearningRate 0.0363   Epoch: 7   Global Step: 132790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:12,509-Speed 9972.18 samples/sec   Loss 6.4120   LearningRate 0.0363   Epoch: 7   Global Step: 132800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:13,580-Speed 9565.85 samples/sec   Loss 6.4315   LearningRate 0.0363   Epoch: 7   Global Step: 132810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:14,643-Speed 9638.40 samples/sec   Loss 6.4809   LearningRate 0.0363   Epoch: 7   Global Step: 132820   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:53:15,720-Speed 9515.11 samples/sec   Loss 6.5243   LearningRate 0.0363   Epoch: 7   Global Step: 132830   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:53:16,810-Speed 9400.09 samples/sec   Loss 6.3800   LearningRate 0.0362   Epoch: 7   Global Step: 132840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:17,893-Speed 9458.49 samples/sec   Loss 6.5385   LearningRate 0.0362   Epoch: 7   Global Step: 132850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:18,966-Speed 9548.15 samples/sec   Loss 6.5118   LearningRate 0.0362   Epoch: 7   Global Step: 132860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:20,033-Speed 9602.10 samples/sec   Loss 6.4457   LearningRate 0.0362   Epoch: 7   Global Step: 132870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:21,101-Speed 9592.53 samples/sec   Loss 6.3567   LearningRate 0.0362   Epoch: 7   Global Step: 132880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:22,127-Speed 9986.79 samples/sec   Loss 6.3412   LearningRate 0.0362   Epoch: 7   Global Step: 132890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:23,223-Speed 9355.62 samples/sec   Loss 6.5353   LearningRate 0.0362   Epoch: 7   Global Step: 132900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:24,282-Speed 9671.61 samples/sec   Loss 6.4885   LearningRate 0.0362   Epoch: 7   Global Step: 132910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:25,408-Speed 9097.90 samples/sec   Loss 6.4238   LearningRate 0.0362   Epoch: 7   Global Step: 132920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:26,526-Speed 9162.94 samples/sec   Loss 6.6124   LearningRate 0.0362   Epoch: 7   Global Step: 132930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:27,585-Speed 9677.03 samples/sec   Loss 6.5818   LearningRate 0.0362   Epoch: 7   Global Step: 132940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:28,631-Speed 9804.77 samples/sec   Loss 6.4265   LearningRate 0.0362   Epoch: 7   Global Step: 132950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:29,668-Speed 9873.24 samples/sec   Loss 6.5030   LearningRate 0.0362   Epoch: 7   Global Step: 132960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:30,776-Speed 9250.42 samples/sec   Loss 6.5019   LearningRate 0.0362   Epoch: 7   Global Step: 132970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:31,872-Speed 9347.07 samples/sec   Loss 6.4470   LearningRate 0.0362   Epoch: 7   Global Step: 132980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:32,987-Speed 9195.12 samples/sec   Loss 6.5377   LearningRate 0.0362   Epoch: 7   Global Step: 132990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:34,062-Speed 9532.01 samples/sec   Loss 6.4978   LearningRate 0.0362   Epoch: 7   Global Step: 133000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:35,154-Speed 9384.10 samples/sec   Loss 6.4639   LearningRate 0.0362   Epoch: 7   Global Step: 133010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:36,216-Speed 9644.33 samples/sec   Loss 6.3958   LearningRate 0.0362   Epoch: 7   Global Step: 133020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:37,269-Speed 9732.54 samples/sec   Loss 6.4452   LearningRate 0.0362   Epoch: 7   Global Step: 133030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:38,349-Speed 9479.86 samples/sec   Loss 6.4304   LearningRate 0.0362   Epoch: 7   Global Step: 133040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:39,463-Speed 9207.65 samples/sec   Loss 6.5502   LearningRate 0.0362   Epoch: 7   Global Step: 133050   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:53:40,565-Speed 9295.52 samples/sec   Loss 6.4411   LearningRate 0.0362   Epoch: 7   Global Step: 133060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:41,631-Speed 9612.46 samples/sec   Loss 6.5786   LearningRate 0.0362   Epoch: 7   Global Step: 133070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:42,730-Speed 9320.54 samples/sec   Loss 6.3627   LearningRate 0.0362   Epoch: 7   Global Step: 133080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:43,843-Speed 9204.69 samples/sec   Loss 6.5501   LearningRate 0.0362   Epoch: 7   Global Step: 133090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:44,973-Speed 9071.75 samples/sec   Loss 6.4435   LearningRate 0.0362   Epoch: 7   Global Step: 133100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:46,028-Speed 9711.66 samples/sec   Loss 6.5504   LearningRate 0.0362   Epoch: 7   Global Step: 133110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:47,115-Speed 9427.01 samples/sec   Loss 6.5521   LearningRate 0.0361   Epoch: 7   Global Step: 133120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:53:48,197-Speed 9469.79 samples/sec   Loss 6.5289   LearningRate 0.0361   Epoch: 7   Global Step: 133130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:49,288-Speed 9387.35 samples/sec   Loss 6.6217   LearningRate 0.0361   Epoch: 7   Global Step: 133140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:50,363-Speed 9534.27 samples/sec   Loss 6.5010   LearningRate 0.0361   Epoch: 7   Global Step: 133150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:51,444-Speed 9480.29 samples/sec   Loss 6.4419   LearningRate 0.0361   Epoch: 7   Global Step: 133160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:52,496-Speed 9735.02 samples/sec   Loss 6.3918   LearningRate 0.0361   Epoch: 7   Global Step: 133170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:53,557-Speed 9659.29 samples/sec   Loss 6.4472   LearningRate 0.0361   Epoch: 7   Global Step: 133180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:54,640-Speed 9463.06 samples/sec   Loss 6.5241   LearningRate 0.0361   Epoch: 7   Global Step: 133190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:55,707-Speed 9597.43 samples/sec   Loss 6.4735   LearningRate 0.0361   Epoch: 7   Global Step: 133200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:56,861-Speed 8881.31 samples/sec   Loss 6.3962   LearningRate 0.0361   Epoch: 7   Global Step: 133210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:57,932-Speed 9568.23 samples/sec   Loss 6.4627   LearningRate 0.0361   Epoch: 7   Global Step: 133220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:53:59,048-Speed 9173.77 samples/sec   Loss 6.4928   LearningRate 0.0361   Epoch: 7   Global Step: 133230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:00,127-Speed 9496.01 samples/sec   Loss 6.4976   LearningRate 0.0361   Epoch: 7   Global Step: 133240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:01,208-Speed 9479.30 samples/sec   Loss 6.5578   LearningRate 0.0361   Epoch: 7   Global Step: 133250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:02,344-Speed 9017.50 samples/sec   Loss 6.4871   LearningRate 0.0361   Epoch: 7   Global Step: 133260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:03,482-Speed 9008.03 samples/sec   Loss 6.3623   LearningRate 0.0361   Epoch: 7   Global Step: 133270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:04,553-Speed 9571.41 samples/sec   Loss 6.4899   LearningRate 0.0361   Epoch: 7   Global Step: 133280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:05,623-Speed 9580.78 samples/sec   Loss 6.4630   LearningRate 0.0361   Epoch: 7   Global Step: 133290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:06,745-Speed 9126.77 samples/sec   Loss 6.3669   LearningRate 0.0361   Epoch: 7   Global Step: 133300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:07,808-Speed 9638.71 samples/sec   Loss 6.4764   LearningRate 0.0361   Epoch: 7   Global Step: 133310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:08,908-Speed 9316.27 samples/sec   Loss 6.4850   LearningRate 0.0361   Epoch: 7   Global Step: 133320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:09,979-Speed 9574.70 samples/sec   Loss 6.4760   LearningRate 0.0361   Epoch: 7   Global Step: 133330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:11,051-Speed 9558.29 samples/sec   Loss 6.4347   LearningRate 0.0361   Epoch: 7   Global Step: 133340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:12,103-Speed 9740.19 samples/sec   Loss 6.4753   LearningRate 0.0361   Epoch: 7   Global Step: 133350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:13,193-Speed 9400.22 samples/sec   Loss 6.3191   LearningRate 0.0361   Epoch: 7   Global Step: 133360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:14,279-Speed 9430.36 samples/sec   Loss 6.4986   LearningRate 0.0361   Epoch: 7   Global Step: 133370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:15,357-Speed 9500.21 samples/sec   Loss 6.5219   LearningRate 0.0361   Epoch: 7   Global Step: 133380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:16,450-Speed 9375.64 samples/sec   Loss 6.5047   LearningRate 0.0360   Epoch: 7   Global Step: 133390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:17,532-Speed 9466.09 samples/sec   Loss 6.4182   LearningRate 0.0360   Epoch: 7   Global Step: 133400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:18,643-Speed 9227.87 samples/sec   Loss 6.4758   LearningRate 0.0360   Epoch: 7   Global Step: 133410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:19,728-Speed 9441.25 samples/sec   Loss 6.6120   LearningRate 0.0360   Epoch: 7   Global Step: 133420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:20,830-Speed 9300.91 samples/sec   Loss 6.4803   LearningRate 0.0360   Epoch: 7   Global Step: 133430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:21,924-Speed 9367.34 samples/sec   Loss 6.3399   LearningRate 0.0360   Epoch: 7   Global Step: 133440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:23,020-Speed 9347.78 samples/sec   Loss 6.4492   LearningRate 0.0360   Epoch: 7   Global Step: 133450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:24,094-Speed 9548.79 samples/sec   Loss 6.4832   LearningRate 0.0360   Epoch: 7   Global Step: 133460   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:54:25,146-Speed 9738.57 samples/sec   Loss 6.5695   LearningRate 0.0360   Epoch: 7   Global Step: 133470   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:54:26,206-Speed 9667.17 samples/sec   Loss 6.4099   LearningRate 0.0360   Epoch: 7   Global Step: 133480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:54:27,229-Speed 10008.80 samples/sec   Loss 6.4003   LearningRate 0.0360   Epoch: 7   Global Step: 133490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:28,293-Speed 9630.75 samples/sec   Loss 6.4133   LearningRate 0.0360   Epoch: 7   Global Step: 133500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:29,416-Speed 9126.79 samples/sec   Loss 6.5299   LearningRate 0.0360   Epoch: 7   Global Step: 133510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:30,804-Speed 7377.91 samples/sec   Loss 6.4412   LearningRate 0.0360   Epoch: 7   Global Step: 133520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:54:59,977-Speed 351.04 samples/sec   Loss 6.2787   LearningRate 0.0360   Epoch: 8   Global Step: 133530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:01,216-Speed 8274.77 samples/sec   Loss 5.7452   LearningRate 0.0360   Epoch: 8   Global Step: 133540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:02,926-Speed 5990.68 samples/sec   Loss 5.7061   LearningRate 0.0360   Epoch: 8   Global Step: 133550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:04,744-Speed 5634.74 samples/sec   Loss 5.6522   LearningRate 0.0360   Epoch: 8   Global Step: 133560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:06,347-Speed 6392.35 samples/sec   Loss 5.6688   LearningRate 0.0360   Epoch: 8   Global Step: 133570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:07,516-Speed 8768.47 samples/sec   Loss 5.6419   LearningRate 0.0360   Epoch: 8   Global Step: 133580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:08,771-Speed 8160.24 samples/sec   Loss 5.5655   LearningRate 0.0360   Epoch: 8   Global Step: 133590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:09,843-Speed 9559.07 samples/sec   Loss 5.7168   LearningRate 0.0360   Epoch: 8   Global Step: 133600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:10,917-Speed 9540.75 samples/sec   Loss 5.6078   LearningRate 0.0360   Epoch: 8   Global Step: 133610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:12,026-Speed 9243.98 samples/sec   Loss 5.7123   LearningRate 0.0360   Epoch: 8   Global Step: 133620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:13,404-Speed 7434.64 samples/sec   Loss 5.7700   LearningRate 0.0360   Epoch: 8   Global Step: 133630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:14,522-Speed 9158.21 samples/sec   Loss 5.6549   LearningRate 0.0360   Epoch: 8   Global Step: 133640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:15,645-Speed 9126.16 samples/sec   Loss 5.6491   LearningRate 0.0360   Epoch: 8   Global Step: 133650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:16,738-Speed 9378.88 samples/sec   Loss 5.8074   LearningRate 0.0360   Epoch: 8   Global Step: 133660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:17,816-Speed 9504.89 samples/sec   Loss 5.6989   LearningRate 0.0359   Epoch: 8   Global Step: 133670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:18,930-Speed 9192.66 samples/sec   Loss 5.6363   LearningRate 0.0359   Epoch: 8   Global Step: 133680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:20,031-Speed 9309.71 samples/sec   Loss 5.7916   LearningRate 0.0359   Epoch: 8   Global Step: 133690   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:55:21,096-Speed 9623.10 samples/sec   Loss 5.6810   LearningRate 0.0359   Epoch: 8   Global Step: 133700   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:55:22,152-Speed 9706.40 samples/sec   Loss 5.7197   LearningRate 0.0359   Epoch: 8   Global Step: 133710   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:55:23,199-Speed 9782.95 samples/sec   Loss 5.7753   LearningRate 0.0359   Epoch: 8   Global Step: 133720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:24,255-Speed 9699.98 samples/sec   Loss 5.6736   LearningRate 0.0359   Epoch: 8   Global Step: 133730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:25,333-Speed 9505.52 samples/sec   Loss 5.7397   LearningRate 0.0359   Epoch: 8   Global Step: 133740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:26,415-Speed 9473.56 samples/sec   Loss 5.7265   LearningRate 0.0359   Epoch: 8   Global Step: 133750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:27,482-Speed 9598.68 samples/sec   Loss 5.5625   LearningRate 0.0359   Epoch: 8   Global Step: 133760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:28,540-Speed 9690.00 samples/sec   Loss 5.6823   LearningRate 0.0359   Epoch: 8   Global Step: 133770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:29,896-Speed 7555.72 samples/sec   Loss 5.7231   LearningRate 0.0359   Epoch: 8   Global Step: 133780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:31,004-Speed 9249.06 samples/sec   Loss 5.7348   LearningRate 0.0359   Epoch: 8   Global Step: 133790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:32,117-Speed 9198.82 samples/sec   Loss 5.7691   LearningRate 0.0359   Epoch: 8   Global Step: 133800   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:33,228-Speed 9222.84 samples/sec   Loss 5.6956   LearningRate 0.0359   Epoch: 8   Global Step: 133810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:34,300-Speed 9554.64 samples/sec   Loss 5.7851   LearningRate 0.0359   Epoch: 8   Global Step: 133820   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:35,370-Speed 9581.12 samples/sec   Loss 5.6825   LearningRate 0.0359   Epoch: 8   Global Step: 133830   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:36,410-Speed 9849.76 samples/sec   Loss 5.6766   LearningRate 0.0359   Epoch: 8   Global Step: 133840   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:37,482-Speed 9561.78 samples/sec   Loss 5.7435   LearningRate 0.0359   Epoch: 8   Global Step: 133850   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:38,559-Speed 9510.69 samples/sec   Loss 5.6844   LearningRate 0.0359   Epoch: 8   Global Step: 133860   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:55:39,616-Speed 9698.76 samples/sec   Loss 5.6794   LearningRate 0.0359   Epoch: 8   Global Step: 133870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:40,685-Speed 9580.04 samples/sec   Loss 5.7011   LearningRate 0.0359   Epoch: 8   Global Step: 133880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:41,758-Speed 9552.32 samples/sec   Loss 5.7557   LearningRate 0.0359   Epoch: 8   Global Step: 133890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:42,876-Speed 9168.45 samples/sec   Loss 5.7714   LearningRate 0.0359   Epoch: 8   Global Step: 133900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:43,973-Speed 9341.17 samples/sec   Loss 5.6567   LearningRate 0.0359   Epoch: 8   Global Step: 133910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:45,081-Speed 9245.92 samples/sec   Loss 5.7114   LearningRate 0.0359   Epoch: 8   Global Step: 133920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:46,121-Speed 9855.46 samples/sec   Loss 5.7356   LearningRate 0.0359   Epoch: 8   Global Step: 133930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:47,228-Speed 9254.27 samples/sec   Loss 5.7527   LearningRate 0.0359   Epoch: 8   Global Step: 133940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:48,265-Speed 9873.91 samples/sec   Loss 5.8555   LearningRate 0.0358   Epoch: 8   Global Step: 133950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:49,378-Speed 9209.53 samples/sec   Loss 5.7181   LearningRate 0.0358   Epoch: 8   Global Step: 133960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:55:50,660-Speed 7991.70 samples/sec   Loss 5.7000   LearningRate 0.0358   Epoch: 8   Global Step: 133970   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:55:51,776-Speed 9181.91 samples/sec   Loss 5.6996   LearningRate 0.0358   Epoch: 8   Global Step: 133980   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:55:52,847-Speed 9560.41 samples/sec   Loss 5.7628   LearningRate 0.0358   Epoch: 8   Global Step: 133990   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:55:54,109-Speed 8118.10 samples/sec   Loss 5.8491   LearningRate 0.0358   Epoch: 8   Global Step: 134000   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:56:16,118-[lfw][134000]XNorm: 11.110978
Training: 2022-04-11 16:56:16,118-[lfw][134000]Accuracy-Flip: 0.97767+-0.00611
Training: 2022-04-11 16:56:16,119-[lfw][134000]Accuracy-Highest: 0.99683
Training: 2022-04-11 16:56:41,439-[cfp_fp][134000]XNorm: 9.446907
Training: 2022-04-11 16:56:41,440-[cfp_fp][134000]Accuracy-Flip: 0.89029+-0.01258
Training: 2022-04-11 16:56:41,440-[cfp_fp][134000]Accuracy-Highest: 0.96157
Training: 2022-04-11 16:57:03,287-[agedb_30][134000]XNorm: 10.635493
Training: 2022-04-11 16:57:03,288-[agedb_30][134000]Accuracy-Flip: 0.91617+-0.01520
Training: 2022-04-11 16:57:03,288-[agedb_30][134000]Accuracy-Highest: 0.96650
Training: 2022-04-11 16:57:04,737-Speed 144.99 samples/sec   Loss 5.8002   LearningRate 0.0358   Epoch: 8   Global Step: 134010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:06,043-Speed 7850.46 samples/sec   Loss 5.7898   LearningRate 0.0358   Epoch: 8   Global Step: 134020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:07,308-Speed 8100.06 samples/sec   Loss 5.6676   LearningRate 0.0358   Epoch: 8   Global Step: 134030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:08,582-Speed 8037.89 samples/sec   Loss 5.6885   LearningRate 0.0358   Epoch: 8   Global Step: 134040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:09,662-Speed 9488.64 samples/sec   Loss 5.7868   LearningRate 0.0358   Epoch: 8   Global Step: 134050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:10,757-Speed 9355.82 samples/sec   Loss 5.6469   LearningRate 0.0358   Epoch: 8   Global Step: 134060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:11,984-Speed 8352.30 samples/sec   Loss 5.8915   LearningRate 0.0358   Epoch: 8   Global Step: 134070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:13,132-Speed 8920.45 samples/sec   Loss 5.7855   LearningRate 0.0358   Epoch: 8   Global Step: 134080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:14,233-Speed 9308.10 samples/sec   Loss 5.7690   LearningRate 0.0358   Epoch: 8   Global Step: 134090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:15,312-Speed 9504.56 samples/sec   Loss 5.7725   LearningRate 0.0358   Epoch: 8   Global Step: 134100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:16,367-Speed 9710.60 samples/sec   Loss 5.8072   LearningRate 0.0358   Epoch: 8   Global Step: 134110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:17,492-Speed 9105.23 samples/sec   Loss 5.8168   LearningRate 0.0358   Epoch: 8   Global Step: 134120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:18,591-Speed 9322.10 samples/sec   Loss 5.7034   LearningRate 0.0358   Epoch: 8   Global Step: 134130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:19,656-Speed 9617.98 samples/sec   Loss 5.8428   LearningRate 0.0358   Epoch: 8   Global Step: 134140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:20,708-Speed 9743.31 samples/sec   Loss 5.8577   LearningRate 0.0358   Epoch: 8   Global Step: 134150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:21,765-Speed 9687.90 samples/sec   Loss 5.6755   LearningRate 0.0358   Epoch: 8   Global Step: 134160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:22,800-Speed 9901.44 samples/sec   Loss 5.8761   LearningRate 0.0358   Epoch: 8   Global Step: 134170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:23,854-Speed 9722.13 samples/sec   Loss 5.8198   LearningRate 0.0358   Epoch: 8   Global Step: 134180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:24,920-Speed 9610.72 samples/sec   Loss 5.8963   LearningRate 0.0358   Epoch: 8   Global Step: 134190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:25,982-Speed 9643.72 samples/sec   Loss 5.7302   LearningRate 0.0358   Epoch: 8   Global Step: 134200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:27,038-Speed 9710.27 samples/sec   Loss 5.8144   LearningRate 0.0358   Epoch: 8   Global Step: 134210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:28,119-Speed 9477.53 samples/sec   Loss 5.7742   LearningRate 0.0358   Epoch: 8   Global Step: 134220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:29,201-Speed 9466.02 samples/sec   Loss 5.8323   LearningRate 0.0357   Epoch: 8   Global Step: 134230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:30,252-Speed 9747.23 samples/sec   Loss 5.7941   LearningRate 0.0357   Epoch: 8   Global Step: 134240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:31,286-Speed 9917.20 samples/sec   Loss 5.8263   LearningRate 0.0357   Epoch: 8   Global Step: 134250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:32,364-Speed 9501.87 samples/sec   Loss 5.7887   LearningRate 0.0357   Epoch: 8   Global Step: 134260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:33,462-Speed 9328.85 samples/sec   Loss 5.7350   LearningRate 0.0357   Epoch: 8   Global Step: 134270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:34,535-Speed 9553.61 samples/sec   Loss 5.8527   LearningRate 0.0357   Epoch: 8   Global Step: 134280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:35,647-Speed 9216.66 samples/sec   Loss 5.8963   LearningRate 0.0357   Epoch: 8   Global Step: 134290   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:57:36,736-Speed 9407.85 samples/sec   Loss 5.8666   LearningRate 0.0357   Epoch: 8   Global Step: 134300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:37,846-Speed 9231.75 samples/sec   Loss 5.7828   LearningRate 0.0357   Epoch: 8   Global Step: 134310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:38,949-Speed 9285.88 samples/sec   Loss 5.8622   LearningRate 0.0357   Epoch: 8   Global Step: 134320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:40,003-Speed 9727.20 samples/sec   Loss 5.9011   LearningRate 0.0357   Epoch: 8   Global Step: 134330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:41,060-Speed 9693.28 samples/sec   Loss 5.8523   LearningRate 0.0357   Epoch: 8   Global Step: 134340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:42,152-Speed 9376.14 samples/sec   Loss 5.8762   LearningRate 0.0357   Epoch: 8   Global Step: 134350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:43,207-Speed 9718.73 samples/sec   Loss 5.7977   LearningRate 0.0357   Epoch: 8   Global Step: 134360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:44,284-Speed 9509.75 samples/sec   Loss 5.7271   LearningRate 0.0357   Epoch: 8   Global Step: 134370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:45,350-Speed 9616.12 samples/sec   Loss 5.7249   LearningRate 0.0357   Epoch: 8   Global Step: 134380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:46,412-Speed 9647.54 samples/sec   Loss 5.7863   LearningRate 0.0357   Epoch: 8   Global Step: 134390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:47,512-Speed 9315.07 samples/sec   Loss 5.8311   LearningRate 0.0357   Epoch: 8   Global Step: 134400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:48,636-Speed 9116.93 samples/sec   Loss 5.8355   LearningRate 0.0357   Epoch: 8   Global Step: 134410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:49,705-Speed 9577.63 samples/sec   Loss 5.8568   LearningRate 0.0357   Epoch: 8   Global Step: 134420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:57:50,784-Speed 9500.12 samples/sec   Loss 5.8022   LearningRate 0.0357   Epoch: 8   Global Step: 134430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:51,857-Speed 9551.79 samples/sec   Loss 5.7601   LearningRate 0.0357   Epoch: 8   Global Step: 134440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:52,934-Speed 9515.47 samples/sec   Loss 5.8642   LearningRate 0.0357   Epoch: 8   Global Step: 134450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:54,044-Speed 9226.50 samples/sec   Loss 5.7894   LearningRate 0.0357   Epoch: 8   Global Step: 134460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:55,138-Speed 9367.79 samples/sec   Loss 5.8305   LearningRate 0.0357   Epoch: 8   Global Step: 134470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:56,176-Speed 9870.01 samples/sec   Loss 5.8220   LearningRate 0.0357   Epoch: 8   Global Step: 134480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:57,251-Speed 9530.99 samples/sec   Loss 5.9715   LearningRate 0.0357   Epoch: 8   Global Step: 134490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:58,336-Speed 9448.37 samples/sec   Loss 5.8855   LearningRate 0.0357   Epoch: 8   Global Step: 134500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:57:59,389-Speed 9726.69 samples/sec   Loss 5.9273   LearningRate 0.0356   Epoch: 8   Global Step: 134510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:00,483-Speed 9368.89 samples/sec   Loss 5.7903   LearningRate 0.0356   Epoch: 8   Global Step: 134520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:01,565-Speed 9464.43 samples/sec   Loss 5.8805   LearningRate 0.0356   Epoch: 8   Global Step: 134530   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:58:02,616-Speed 9748.68 samples/sec   Loss 5.9151   LearningRate 0.0356   Epoch: 8   Global Step: 134540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:03,729-Speed 9202.39 samples/sec   Loss 5.9196   LearningRate 0.0356   Epoch: 8   Global Step: 134550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:04,844-Speed 9195.70 samples/sec   Loss 5.9684   LearningRate 0.0356   Epoch: 8   Global Step: 134560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:05,902-Speed 9682.11 samples/sec   Loss 5.8429   LearningRate 0.0356   Epoch: 8   Global Step: 134570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:06,982-Speed 9481.63 samples/sec   Loss 5.9154   LearningRate 0.0356   Epoch: 8   Global Step: 134580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:08,050-Speed 9596.57 samples/sec   Loss 5.8728   LearningRate 0.0356   Epoch: 8   Global Step: 134590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:09,089-Speed 9864.46 samples/sec   Loss 5.8353   LearningRate 0.0356   Epoch: 8   Global Step: 134600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:10,134-Speed 9802.10 samples/sec   Loss 5.8411   LearningRate 0.0356   Epoch: 8   Global Step: 134610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:11,203-Speed 9584.91 samples/sec   Loss 5.8441   LearningRate 0.0356   Epoch: 8   Global Step: 134620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:12,277-Speed 9542.75 samples/sec   Loss 5.8547   LearningRate 0.0356   Epoch: 8   Global Step: 134630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:13,392-Speed 9194.93 samples/sec   Loss 5.9465   LearningRate 0.0356   Epoch: 8   Global Step: 134640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:14,534-Speed 8967.34 samples/sec   Loss 5.8800   LearningRate 0.0356   Epoch: 8   Global Step: 134650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:15,645-Speed 9228.86 samples/sec   Loss 5.8303   LearningRate 0.0356   Epoch: 8   Global Step: 134660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:16,694-Speed 9768.17 samples/sec   Loss 5.9455   LearningRate 0.0356   Epoch: 8   Global Step: 134670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:17,801-Speed 9252.87 samples/sec   Loss 5.8267   LearningRate 0.0356   Epoch: 8   Global Step: 134680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:18,867-Speed 9612.12 samples/sec   Loss 5.9111   LearningRate 0.0356   Epoch: 8   Global Step: 134690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:19,940-Speed 9552.33 samples/sec   Loss 5.9315   LearningRate 0.0356   Epoch: 8   Global Step: 134700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:21,059-Speed 9153.94 samples/sec   Loss 5.9259   LearningRate 0.0356   Epoch: 8   Global Step: 134710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:22,139-Speed 9484.71 samples/sec   Loss 5.9406   LearningRate 0.0356   Epoch: 8   Global Step: 134720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:23,244-Speed 9274.13 samples/sec   Loss 5.7995   LearningRate 0.0356   Epoch: 8   Global Step: 134730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:24,348-Speed 9279.60 samples/sec   Loss 5.8351   LearningRate 0.0356   Epoch: 8   Global Step: 134740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:25,453-Speed 9272.24 samples/sec   Loss 5.8596   LearningRate 0.0356   Epoch: 8   Global Step: 134750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:26,557-Speed 9281.84 samples/sec   Loss 5.9263   LearningRate 0.0356   Epoch: 8   Global Step: 134760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:27,671-Speed 9200.37 samples/sec   Loss 5.8765   LearningRate 0.0356   Epoch: 8   Global Step: 134770   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:58:28,758-Speed 9428.18 samples/sec   Loss 5.9792   LearningRate 0.0356   Epoch: 8   Global Step: 134780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:29,829-Speed 9565.82 samples/sec   Loss 5.9993   LearningRate 0.0355   Epoch: 8   Global Step: 134790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:30,899-Speed 9578.05 samples/sec   Loss 5.8664   LearningRate 0.0355   Epoch: 8   Global Step: 134800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:31,963-Speed 9633.51 samples/sec   Loss 5.9494   LearningRate 0.0355   Epoch: 8   Global Step: 134810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:33,037-Speed 9540.06 samples/sec   Loss 5.9110   LearningRate 0.0355   Epoch: 8   Global Step: 134820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:34,144-Speed 9254.92 samples/sec   Loss 5.9474   LearningRate 0.0355   Epoch: 8   Global Step: 134830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:35,277-Speed 9038.62 samples/sec   Loss 5.9001   LearningRate 0.0355   Epoch: 8   Global Step: 134840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:36,355-Speed 9505.46 samples/sec   Loss 6.0242   LearningRate 0.0355   Epoch: 8   Global Step: 134850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:37,490-Speed 9022.88 samples/sec   Loss 5.8527   LearningRate 0.0355   Epoch: 8   Global Step: 134860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:38,532-Speed 9836.83 samples/sec   Loss 5.8996   LearningRate 0.0355   Epoch: 8   Global Step: 134870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:39,646-Speed 9199.88 samples/sec   Loss 5.9326   LearningRate 0.0355   Epoch: 8   Global Step: 134880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:40,726-Speed 9489.80 samples/sec   Loss 5.7742   LearningRate 0.0355   Epoch: 8   Global Step: 134890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:41,828-Speed 9299.07 samples/sec   Loss 5.9753   LearningRate 0.0355   Epoch: 8   Global Step: 134900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:42,924-Speed 9341.32 samples/sec   Loss 5.8356   LearningRate 0.0355   Epoch: 8   Global Step: 134910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:44,018-Speed 9370.65 samples/sec   Loss 5.7935   LearningRate 0.0355   Epoch: 8   Global Step: 134920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:45,125-Speed 9259.93 samples/sec   Loss 5.9840   LearningRate 0.0355   Epoch: 8   Global Step: 134930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:46,209-Speed 9446.73 samples/sec   Loss 5.8380   LearningRate 0.0355   Epoch: 8   Global Step: 134940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:47,295-Speed 9437.19 samples/sec   Loss 5.9611   LearningRate 0.0355   Epoch: 8   Global Step: 134950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:48,400-Speed 9275.91 samples/sec   Loss 6.0487   LearningRate 0.0355   Epoch: 8   Global Step: 134960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:49,482-Speed 9464.38 samples/sec   Loss 5.7926   LearningRate 0.0355   Epoch: 8   Global Step: 134970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:50,522-Speed 9861.12 samples/sec   Loss 5.9113   LearningRate 0.0355   Epoch: 8   Global Step: 134980   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:58:51,613-Speed 9388.63 samples/sec   Loss 5.9651   LearningRate 0.0355   Epoch: 8   Global Step: 134990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:52,661-Speed 9779.83 samples/sec   Loss 5.8617   LearningRate 0.0355   Epoch: 8   Global Step: 135000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:53,735-Speed 9536.65 samples/sec   Loss 5.9869   LearningRate 0.0355   Epoch: 8   Global Step: 135010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:58:54,754-Speed 10053.86 samples/sec   Loss 5.8662   LearningRate 0.0355   Epoch: 8   Global Step: 135020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:55,851-Speed 9342.06 samples/sec   Loss 5.8968   LearningRate 0.0355   Epoch: 8   Global Step: 135030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:56,948-Speed 9339.32 samples/sec   Loss 5.8256   LearningRate 0.0355   Epoch: 8   Global Step: 135040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:58,012-Speed 9632.30 samples/sec   Loss 5.9482   LearningRate 0.0355   Epoch: 8   Global Step: 135050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:58:59,099-Speed 9427.66 samples/sec   Loss 5.8771   LearningRate 0.0355   Epoch: 8   Global Step: 135060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:00,165-Speed 9607.03 samples/sec   Loss 5.9194   LearningRate 0.0354   Epoch: 8   Global Step: 135070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:01,252-Speed 9424.94 samples/sec   Loss 5.9826   LearningRate 0.0354   Epoch: 8   Global Step: 135080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:02,322-Speed 9577.88 samples/sec   Loss 5.9552   LearningRate 0.0354   Epoch: 8   Global Step: 135090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:03,371-Speed 9772.73 samples/sec   Loss 5.8697   LearningRate 0.0354   Epoch: 8   Global Step: 135100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:04,475-Speed 9271.99 samples/sec   Loss 5.9296   LearningRate 0.0354   Epoch: 8   Global Step: 135110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:05,529-Speed 9722.06 samples/sec   Loss 5.9353   LearningRate 0.0354   Epoch: 8   Global Step: 135120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:06,627-Speed 9337.53 samples/sec   Loss 6.0282   LearningRate 0.0354   Epoch: 8   Global Step: 135130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:07,683-Speed 9695.39 samples/sec   Loss 5.9071   LearningRate 0.0354   Epoch: 8   Global Step: 135140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:08,771-Speed 9424.27 samples/sec   Loss 5.9299   LearningRate 0.0354   Epoch: 8   Global Step: 135150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:09,811-Speed 9854.48 samples/sec   Loss 5.9350   LearningRate 0.0354   Epoch: 8   Global Step: 135160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:10,852-Speed 9840.82 samples/sec   Loss 6.0223   LearningRate 0.0354   Epoch: 8   Global Step: 135170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:11,926-Speed 9541.31 samples/sec   Loss 5.9283   LearningRate 0.0354   Epoch: 8   Global Step: 135180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:12,977-Speed 9753.48 samples/sec   Loss 6.0663   LearningRate 0.0354   Epoch: 8   Global Step: 135190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:14,068-Speed 9388.69 samples/sec   Loss 6.0477   LearningRate 0.0354   Epoch: 8   Global Step: 135200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:15,139-Speed 9570.21 samples/sec   Loss 5.9908   LearningRate 0.0354   Epoch: 8   Global Step: 135210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:16,219-Speed 9486.59 samples/sec   Loss 5.9698   LearningRate 0.0354   Epoch: 8   Global Step: 135220   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:59:17,280-Speed 9656.99 samples/sec   Loss 6.0046   LearningRate 0.0354   Epoch: 8   Global Step: 135230   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:59:18,320-Speed 9855.25 samples/sec   Loss 5.9475   LearningRate 0.0354   Epoch: 8   Global Step: 135240   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:59:19,422-Speed 9294.64 samples/sec   Loss 6.0660   LearningRate 0.0354   Epoch: 8   Global Step: 135250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:20,476-Speed 9718.63 samples/sec   Loss 5.9995   LearningRate 0.0354   Epoch: 8   Global Step: 135260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:21,547-Speed 9563.97 samples/sec   Loss 6.0346   LearningRate 0.0354   Epoch: 8   Global Step: 135270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:22,600-Speed 9737.20 samples/sec   Loss 6.0529   LearningRate 0.0354   Epoch: 8   Global Step: 135280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:23,695-Speed 9350.37 samples/sec   Loss 6.0807   LearningRate 0.0354   Epoch: 8   Global Step: 135290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:24,776-Speed 9480.59 samples/sec   Loss 6.0070   LearningRate 0.0354   Epoch: 8   Global Step: 135300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:25,875-Speed 9326.05 samples/sec   Loss 6.0005   LearningRate 0.0354   Epoch: 8   Global Step: 135310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:26,963-Speed 9413.23 samples/sec   Loss 5.9215   LearningRate 0.0354   Epoch: 8   Global Step: 135320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:28,034-Speed 9571.52 samples/sec   Loss 6.0157   LearningRate 0.0354   Epoch: 8   Global Step: 135330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:29,115-Speed 9478.38 samples/sec   Loss 5.9353   LearningRate 0.0354   Epoch: 8   Global Step: 135340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:30,194-Speed 9500.92 samples/sec   Loss 5.9995   LearningRate 0.0353   Epoch: 8   Global Step: 135350   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:59:31,268-Speed 9534.43 samples/sec   Loss 6.0342   LearningRate 0.0353   Epoch: 8   Global Step: 135360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:32,307-Speed 9867.98 samples/sec   Loss 5.9963   LearningRate 0.0353   Epoch: 8   Global Step: 135370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:33,425-Speed 9162.18 samples/sec   Loss 6.0308   LearningRate 0.0353   Epoch: 8   Global Step: 135380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:34,509-Speed 9449.39 samples/sec   Loss 5.9757   LearningRate 0.0353   Epoch: 8   Global Step: 135390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:35,615-Speed 9264.41 samples/sec   Loss 5.8845   LearningRate 0.0353   Epoch: 8   Global Step: 135400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:36,712-Speed 9334.50 samples/sec   Loss 5.9945   LearningRate 0.0353   Epoch: 8   Global Step: 135410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:37,815-Speed 9296.11 samples/sec   Loss 5.9632   LearningRate 0.0353   Epoch: 8   Global Step: 135420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:38,898-Speed 9464.49 samples/sec   Loss 5.9811   LearningRate 0.0353   Epoch: 8   Global Step: 135430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:40,007-Speed 9233.58 samples/sec   Loss 5.9686   LearningRate 0.0353   Epoch: 8   Global Step: 135440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:41,108-Speed 9307.95 samples/sec   Loss 6.0352   LearningRate 0.0353   Epoch: 8   Global Step: 135450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 16:59:42,215-Speed 9250.96 samples/sec   Loss 6.1260   LearningRate 0.0353   Epoch: 8   Global Step: 135460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:43,277-Speed 9651.94 samples/sec   Loss 5.9333   LearningRate 0.0353   Epoch: 8   Global Step: 135470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:44,378-Speed 9301.07 samples/sec   Loss 6.0558   LearningRate 0.0353   Epoch: 8   Global Step: 135480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:45,515-Speed 9019.26 samples/sec   Loss 6.0569   LearningRate 0.0353   Epoch: 8   Global Step: 135490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:46,628-Speed 9201.39 samples/sec   Loss 5.8791   LearningRate 0.0353   Epoch: 8   Global Step: 135500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:47,720-Speed 9386.03 samples/sec   Loss 6.0216   LearningRate 0.0353   Epoch: 8   Global Step: 135510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:48,752-Speed 9933.64 samples/sec   Loss 5.8813   LearningRate 0.0353   Epoch: 8   Global Step: 135520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:49,820-Speed 9605.70 samples/sec   Loss 5.9588   LearningRate 0.0353   Epoch: 8   Global Step: 135530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:50,905-Speed 9443.28 samples/sec   Loss 6.0664   LearningRate 0.0353   Epoch: 8   Global Step: 135540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:51,975-Speed 9574.77 samples/sec   Loss 5.9390   LearningRate 0.0353   Epoch: 8   Global Step: 135550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:53,080-Speed 9274.74 samples/sec   Loss 5.8819   LearningRate 0.0353   Epoch: 8   Global Step: 135560   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 16:59:54,145-Speed 9619.34 samples/sec   Loss 6.0380   LearningRate 0.0353   Epoch: 8   Global Step: 135570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:55,226-Speed 9477.15 samples/sec   Loss 6.0944   LearningRate 0.0353   Epoch: 8   Global Step: 135580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:56,273-Speed 9782.87 samples/sec   Loss 5.9649   LearningRate 0.0353   Epoch: 8   Global Step: 135590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:57,294-Speed 10032.63 samples/sec   Loss 6.0295   LearningRate 0.0353   Epoch: 8   Global Step: 135600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:58,384-Speed 9409.06 samples/sec   Loss 6.0503   LearningRate 0.0353   Epoch: 8   Global Step: 135610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 16:59:59,453-Speed 9581.71 samples/sec   Loss 5.9706   LearningRate 0.0353   Epoch: 8   Global Step: 135620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:00,521-Speed 9595.28 samples/sec   Loss 6.0926   LearningRate 0.0352   Epoch: 8   Global Step: 135630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:01,640-Speed 9150.90 samples/sec   Loss 6.1090   LearningRate 0.0352   Epoch: 8   Global Step: 135640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:02,736-Speed 9356.06 samples/sec   Loss 6.0254   LearningRate 0.0352   Epoch: 8   Global Step: 135650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:03,798-Speed 9646.80 samples/sec   Loss 6.0218   LearningRate 0.0352   Epoch: 8   Global Step: 135660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:04,836-Speed 9864.36 samples/sec   Loss 6.1289   LearningRate 0.0352   Epoch: 8   Global Step: 135670   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:00:05,917-Speed 9475.31 samples/sec   Loss 6.0067   LearningRate 0.0352   Epoch: 8   Global Step: 135680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:07,025-Speed 9249.67 samples/sec   Loss 6.0767   LearningRate 0.0352   Epoch: 8   Global Step: 135690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:08,085-Speed 9671.96 samples/sec   Loss 6.0675   LearningRate 0.0352   Epoch: 8   Global Step: 135700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:09,150-Speed 9620.36 samples/sec   Loss 5.9034   LearningRate 0.0352   Epoch: 8   Global Step: 135710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:10,208-Speed 9690.47 samples/sec   Loss 6.0252   LearningRate 0.0352   Epoch: 8   Global Step: 135720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:11,255-Speed 9781.41 samples/sec   Loss 6.0451   LearningRate 0.0352   Epoch: 8   Global Step: 135730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:12,307-Speed 9740.35 samples/sec   Loss 6.0922   LearningRate 0.0352   Epoch: 8   Global Step: 135740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:13,425-Speed 9165.28 samples/sec   Loss 6.0157   LearningRate 0.0352   Epoch: 8   Global Step: 135750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:14,489-Speed 9627.99 samples/sec   Loss 6.0042   LearningRate 0.0352   Epoch: 8   Global Step: 135760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:15,539-Speed 9759.20 samples/sec   Loss 5.8953   LearningRate 0.0352   Epoch: 8   Global Step: 135770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:16,588-Speed 9771.79 samples/sec   Loss 5.9689   LearningRate 0.0352   Epoch: 8   Global Step: 135780   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:00:17,656-Speed 9593.85 samples/sec   Loss 6.0752   LearningRate 0.0352   Epoch: 8   Global Step: 135790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:18,783-Speed 9090.34 samples/sec   Loss 5.9759   LearningRate 0.0352   Epoch: 8   Global Step: 135800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:19,858-Speed 9528.00 samples/sec   Loss 6.0970   LearningRate 0.0352   Epoch: 8   Global Step: 135810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:20,928-Speed 9583.30 samples/sec   Loss 6.0516   LearningRate 0.0352   Epoch: 8   Global Step: 135820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:22,049-Speed 9132.48 samples/sec   Loss 5.9436   LearningRate 0.0352   Epoch: 8   Global Step: 135830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:23,127-Speed 9504.38 samples/sec   Loss 6.0113   LearningRate 0.0352   Epoch: 8   Global Step: 135840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:24,236-Speed 9243.50 samples/sec   Loss 5.9836   LearningRate 0.0352   Epoch: 8   Global Step: 135850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:25,366-Speed 9070.05 samples/sec   Loss 5.9767   LearningRate 0.0352   Epoch: 8   Global Step: 135860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:26,449-Speed 9455.35 samples/sec   Loss 5.9905   LearningRate 0.0352   Epoch: 8   Global Step: 135870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:27,538-Speed 9408.64 samples/sec   Loss 6.0319   LearningRate 0.0352   Epoch: 8   Global Step: 135880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:28,611-Speed 9560.50 samples/sec   Loss 6.0254   LearningRate 0.0352   Epoch: 8   Global Step: 135890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:29,693-Speed 9464.55 samples/sec   Loss 6.0724   LearningRate 0.0352   Epoch: 8   Global Step: 135900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:30,758-Speed 9622.60 samples/sec   Loss 6.2302   LearningRate 0.0351   Epoch: 8   Global Step: 135910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:31,868-Speed 9234.20 samples/sec   Loss 6.0990   LearningRate 0.0351   Epoch: 8   Global Step: 135920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:32,932-Speed 9622.20 samples/sec   Loss 6.0747   LearningRate 0.0351   Epoch: 8   Global Step: 135930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:34,008-Speed 9524.67 samples/sec   Loss 6.0519   LearningRate 0.0351   Epoch: 8   Global Step: 135940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:35,132-Speed 9115.04 samples/sec   Loss 6.0617   LearningRate 0.0351   Epoch: 8   Global Step: 135950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:36,187-Speed 9722.02 samples/sec   Loss 6.0732   LearningRate 0.0351   Epoch: 8   Global Step: 135960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:37,254-Speed 9600.49 samples/sec   Loss 6.0185   LearningRate 0.0351   Epoch: 8   Global Step: 135970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:38,364-Speed 9229.97 samples/sec   Loss 6.0292   LearningRate 0.0351   Epoch: 8   Global Step: 135980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:39,428-Speed 9632.50 samples/sec   Loss 6.0128   LearningRate 0.0351   Epoch: 8   Global Step: 135990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:00:40,503-Speed 9531.73 samples/sec   Loss 6.0701   LearningRate 0.0351   Epoch: 8   Global Step: 136000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:01:02,246-[lfw][136000]XNorm: 10.414027
Training: 2022-04-11 17:01:02,247-[lfw][136000]Accuracy-Flip: 0.99617+-0.00248
Training: 2022-04-11 17:01:02,247-[lfw][136000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:01:27,384-[cfp_fp][136000]XNorm: 8.860287
Training: 2022-04-11 17:01:27,385-[cfp_fp][136000]Accuracy-Flip: 0.95943+-0.01254
Training: 2022-04-11 17:01:27,385-[cfp_fp][136000]Accuracy-Highest: 0.96157
Training: 2022-04-11 17:01:49,069-[agedb_30][136000]XNorm: 10.062916
Training: 2022-04-11 17:01:49,069-[agedb_30][136000]Accuracy-Flip: 0.96500+-0.01003
Training: 2022-04-11 17:01:49,070-[agedb_30][136000]Accuracy-Highest: 0.96650
Training: 2022-04-11 17:01:50,158-Speed 147.01 samples/sec   Loss 6.0386   LearningRate 0.0351   Epoch: 8   Global Step: 136010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:01:51,261-Speed 9284.61 samples/sec   Loss 6.1490   LearningRate 0.0351   Epoch: 8   Global Step: 136020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:01:52,381-Speed 9143.64 samples/sec   Loss 5.9747   LearningRate 0.0351   Epoch: 8   Global Step: 136030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:01:53,496-Speed 9191.76 samples/sec   Loss 5.9745   LearningRate 0.0351   Epoch: 8   Global Step: 136040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:01:54,633-Speed 9016.52 samples/sec   Loss 6.0460   LearningRate 0.0351   Epoch: 8   Global Step: 136050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:01:55,709-Speed 9520.63 samples/sec   Loss 5.9804   LearningRate 0.0351   Epoch: 8   Global Step: 136060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:01:56,785-Speed 9518.48 samples/sec   Loss 6.1235   LearningRate 0.0351   Epoch: 8   Global Step: 136070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:01:57,862-Speed 9514.82 samples/sec   Loss 6.0584   LearningRate 0.0351   Epoch: 8   Global Step: 136080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:01:58,903-Speed 9843.99 samples/sec   Loss 6.0276   LearningRate 0.0351   Epoch: 8   Global Step: 136090   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:02:00,011-Speed 9252.57 samples/sec   Loss 5.9950   LearningRate 0.0351   Epoch: 8   Global Step: 136100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:01,114-Speed 9288.13 samples/sec   Loss 6.0086   LearningRate 0.0351   Epoch: 8   Global Step: 136110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:02,200-Speed 9433.67 samples/sec   Loss 6.0446   LearningRate 0.0351   Epoch: 8   Global Step: 136120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:03,267-Speed 9605.07 samples/sec   Loss 6.1094   LearningRate 0.0351   Epoch: 8   Global Step: 136130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:04,313-Speed 9790.13 samples/sec   Loss 6.1079   LearningRate 0.0351   Epoch: 8   Global Step: 136140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:05,453-Speed 8986.46 samples/sec   Loss 6.0443   LearningRate 0.0351   Epoch: 8   Global Step: 136150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:06,536-Speed 9462.83 samples/sec   Loss 5.9349   LearningRate 0.0351   Epoch: 8   Global Step: 136160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:07,608-Speed 9559.75 samples/sec   Loss 5.9949   LearningRate 0.0351   Epoch: 8   Global Step: 136170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:08,707-Speed 9328.25 samples/sec   Loss 6.0808   LearningRate 0.0351   Epoch: 8   Global Step: 136180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:09,776-Speed 9580.10 samples/sec   Loss 6.0008   LearningRate 0.0350   Epoch: 8   Global Step: 136190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:10,852-Speed 9521.86 samples/sec   Loss 6.0632   LearningRate 0.0350   Epoch: 8   Global Step: 136200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:11,942-Speed 9398.90 samples/sec   Loss 6.0261   LearningRate 0.0350   Epoch: 8   Global Step: 136210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:13,040-Speed 9336.25 samples/sec   Loss 6.0275   LearningRate 0.0350   Epoch: 8   Global Step: 136220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:14,130-Speed 9400.96 samples/sec   Loss 6.1430   LearningRate 0.0350   Epoch: 8   Global Step: 136230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:15,175-Speed 9803.51 samples/sec   Loss 6.0355   LearningRate 0.0350   Epoch: 8   Global Step: 136240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:16,240-Speed 9621.58 samples/sec   Loss 6.0701   LearningRate 0.0350   Epoch: 8   Global Step: 136250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:17,316-Speed 9522.35 samples/sec   Loss 6.1121   LearningRate 0.0350   Epoch: 8   Global Step: 136260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:18,423-Speed 9259.24 samples/sec   Loss 6.0742   LearningRate 0.0350   Epoch: 8   Global Step: 136270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:19,488-Speed 9616.82 samples/sec   Loss 5.9903   LearningRate 0.0350   Epoch: 8   Global Step: 136280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:20,564-Speed 9520.25 samples/sec   Loss 6.1075   LearningRate 0.0350   Epoch: 8   Global Step: 136290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:21,623-Speed 9676.44 samples/sec   Loss 6.0239   LearningRate 0.0350   Epoch: 8   Global Step: 136300   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:02:22,712-Speed 9405.82 samples/sec   Loss 6.1007   LearningRate 0.0350   Epoch: 8   Global Step: 136310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:23,795-Speed 9467.20 samples/sec   Loss 6.0143   LearningRate 0.0350   Epoch: 8   Global Step: 136320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:24,882-Speed 9426.28 samples/sec   Loss 6.0426   LearningRate 0.0350   Epoch: 8   Global Step: 136330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:26,001-Speed 9153.18 samples/sec   Loss 5.8701   LearningRate 0.0350   Epoch: 8   Global Step: 136340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:27,097-Speed 9344.84 samples/sec   Loss 6.0150   LearningRate 0.0350   Epoch: 8   Global Step: 136350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:28,134-Speed 9887.88 samples/sec   Loss 5.9417   LearningRate 0.0350   Epoch: 8   Global Step: 136360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:29,181-Speed 9784.71 samples/sec   Loss 5.9999   LearningRate 0.0350   Epoch: 8   Global Step: 136370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:30,236-Speed 9706.41 samples/sec   Loss 5.9955   LearningRate 0.0350   Epoch: 8   Global Step: 136380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:31,319-Speed 9465.15 samples/sec   Loss 6.0884   LearningRate 0.0350   Epoch: 8   Global Step: 136390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:32,429-Speed 9235.63 samples/sec   Loss 6.0623   LearningRate 0.0350   Epoch: 8   Global Step: 136400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:33,535-Speed 9264.14 samples/sec   Loss 6.1963   LearningRate 0.0350   Epoch: 8   Global Step: 136410   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:02:34,612-Speed 9509.29 samples/sec   Loss 6.1062   LearningRate 0.0350   Epoch: 8   Global Step: 136420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:35,649-Speed 9884.63 samples/sec   Loss 6.0293   LearningRate 0.0350   Epoch: 8   Global Step: 136430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:36,739-Speed 9395.95 samples/sec   Loss 6.0123   LearningRate 0.0350   Epoch: 8   Global Step: 136440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:37,852-Speed 9204.42 samples/sec   Loss 6.0029   LearningRate 0.0350   Epoch: 8   Global Step: 136450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:38,941-Speed 9408.88 samples/sec   Loss 6.0101   LearningRate 0.0350   Epoch: 8   Global Step: 136460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:40,036-Speed 9357.42 samples/sec   Loss 5.9740   LearningRate 0.0350   Epoch: 8   Global Step: 136470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:41,091-Speed 9714.64 samples/sec   Loss 6.0571   LearningRate 0.0349   Epoch: 8   Global Step: 136480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:42,210-Speed 9158.16 samples/sec   Loss 6.1399   LearningRate 0.0349   Epoch: 8   Global Step: 136490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:43,326-Speed 9176.91 samples/sec   Loss 6.0723   LearningRate 0.0349   Epoch: 8   Global Step: 136500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:44,393-Speed 9605.60 samples/sec   Loss 6.0268   LearningRate 0.0349   Epoch: 8   Global Step: 136510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:45,456-Speed 9639.78 samples/sec   Loss 6.0567   LearningRate 0.0349   Epoch: 8   Global Step: 136520   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:02:46,537-Speed 9482.71 samples/sec   Loss 6.1823   LearningRate 0.0349   Epoch: 8   Global Step: 136530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:47,605-Speed 9591.37 samples/sec   Loss 6.0007   LearningRate 0.0349   Epoch: 8   Global Step: 136540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:02:48,677-Speed 9557.07 samples/sec   Loss 6.0384   LearningRate 0.0349   Epoch: 8   Global Step: 136550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:49,761-Speed 9449.73 samples/sec   Loss 6.0989   LearningRate 0.0349   Epoch: 8   Global Step: 136560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:50,861-Speed 9324.02 samples/sec   Loss 6.0172   LearningRate 0.0349   Epoch: 8   Global Step: 136570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:51,953-Speed 9378.55 samples/sec   Loss 6.2316   LearningRate 0.0349   Epoch: 8   Global Step: 136580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:53,051-Speed 9334.11 samples/sec   Loss 6.0674   LearningRate 0.0349   Epoch: 8   Global Step: 136590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:54,115-Speed 9625.59 samples/sec   Loss 6.0130   LearningRate 0.0349   Epoch: 8   Global Step: 136600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:55,191-Speed 9521.40 samples/sec   Loss 6.0960   LearningRate 0.0349   Epoch: 8   Global Step: 136610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:56,243-Speed 9738.15 samples/sec   Loss 6.1652   LearningRate 0.0349   Epoch: 8   Global Step: 136620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:57,326-Speed 9461.21 samples/sec   Loss 6.0737   LearningRate 0.0349   Epoch: 8   Global Step: 136630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:58,380-Speed 9720.36 samples/sec   Loss 6.1429   LearningRate 0.0349   Epoch: 8   Global Step: 136640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:02:59,420-Speed 9849.85 samples/sec   Loss 6.0902   LearningRate 0.0349   Epoch: 8   Global Step: 136650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:00,482-Speed 9658.34 samples/sec   Loss 6.0067   LearningRate 0.0349   Epoch: 8   Global Step: 136660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:01,521-Speed 9855.11 samples/sec   Loss 6.1647   LearningRate 0.0349   Epoch: 8   Global Step: 136670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:02,593-Speed 9558.41 samples/sec   Loss 6.1673   LearningRate 0.0349   Epoch: 8   Global Step: 136680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:03,670-Speed 9523.46 samples/sec   Loss 6.2104   LearningRate 0.0349   Epoch: 8   Global Step: 136690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:04,761-Speed 9388.50 samples/sec   Loss 6.1220   LearningRate 0.0349   Epoch: 8   Global Step: 136700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:05,830-Speed 9587.34 samples/sec   Loss 6.1172   LearningRate 0.0349   Epoch: 8   Global Step: 136710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:06,863-Speed 9914.42 samples/sec   Loss 6.0295   LearningRate 0.0349   Epoch: 8   Global Step: 136720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:07,926-Speed 9643.70 samples/sec   Loss 6.0673   LearningRate 0.0349   Epoch: 8   Global Step: 136730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:08,981-Speed 9704.08 samples/sec   Loss 6.0468   LearningRate 0.0349   Epoch: 8   Global Step: 136740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:10,030-Speed 9776.48 samples/sec   Loss 5.9918   LearningRate 0.0349   Epoch: 8   Global Step: 136750   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:03:11,104-Speed 9540.93 samples/sec   Loss 5.9507   LearningRate 0.0348   Epoch: 8   Global Step: 136760   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:03:12,190-Speed 9430.20 samples/sec   Loss 6.1072   LearningRate 0.0348   Epoch: 8   Global Step: 136770   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:03:13,280-Speed 9403.84 samples/sec   Loss 6.1652   LearningRate 0.0348   Epoch: 8   Global Step: 136780   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:03:14,310-Speed 9951.32 samples/sec   Loss 6.0792   LearningRate 0.0348   Epoch: 8   Global Step: 136790   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:03:15,386-Speed 9519.41 samples/sec   Loss 6.0376   LearningRate 0.0348   Epoch: 8   Global Step: 136800   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:03:16,463-Speed 9515.15 samples/sec   Loss 6.1390   LearningRate 0.0348   Epoch: 8   Global Step: 136810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:17,565-Speed 9298.12 samples/sec   Loss 6.1306   LearningRate 0.0348   Epoch: 8   Global Step: 136820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:18,642-Speed 9510.15 samples/sec   Loss 6.0538   LearningRate 0.0348   Epoch: 8   Global Step: 136830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:19,725-Speed 9463.04 samples/sec   Loss 6.1440   LearningRate 0.0348   Epoch: 8   Global Step: 136840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:20,772-Speed 9798.39 samples/sec   Loss 6.0760   LearningRate 0.0348   Epoch: 8   Global Step: 136850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:21,846-Speed 9534.31 samples/sec   Loss 6.0837   LearningRate 0.0348   Epoch: 8   Global Step: 136860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:22,949-Speed 9293.54 samples/sec   Loss 6.0647   LearningRate 0.0348   Epoch: 8   Global Step: 136870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:24,037-Speed 9415.72 samples/sec   Loss 6.1517   LearningRate 0.0348   Epoch: 8   Global Step: 136880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:25,114-Speed 9512.66 samples/sec   Loss 6.0851   LearningRate 0.0348   Epoch: 8   Global Step: 136890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:26,194-Speed 9484.77 samples/sec   Loss 6.1061   LearningRate 0.0348   Epoch: 8   Global Step: 136900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:27,278-Speed 9451.38 samples/sec   Loss 6.1557   LearningRate 0.0348   Epoch: 8   Global Step: 136910   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:03:28,344-Speed 9617.32 samples/sec   Loss 6.2953   LearningRate 0.0348   Epoch: 8   Global Step: 136920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:29,422-Speed 9507.62 samples/sec   Loss 6.1098   LearningRate 0.0348   Epoch: 8   Global Step: 136930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:30,473-Speed 9746.29 samples/sec   Loss 6.1224   LearningRate 0.0348   Epoch: 8   Global Step: 136940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:31,562-Speed 9407.88 samples/sec   Loss 6.1180   LearningRate 0.0348   Epoch: 8   Global Step: 136950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:32,648-Speed 9436.85 samples/sec   Loss 6.1730   LearningRate 0.0348   Epoch: 8   Global Step: 136960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:33,765-Speed 9172.15 samples/sec   Loss 6.2489   LearningRate 0.0348   Epoch: 8   Global Step: 136970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:34,815-Speed 9756.43 samples/sec   Loss 6.0189   LearningRate 0.0348   Epoch: 8   Global Step: 136980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:35,862-Speed 9791.63 samples/sec   Loss 6.0130   LearningRate 0.0348   Epoch: 8   Global Step: 136990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:36,975-Speed 9199.54 samples/sec   Loss 6.1665   LearningRate 0.0348   Epoch: 8   Global Step: 137000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:38,035-Speed 9667.26 samples/sec   Loss 6.2678   LearningRate 0.0348   Epoch: 8   Global Step: 137010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:39,173-Speed 9006.99 samples/sec   Loss 6.2697   LearningRate 0.0348   Epoch: 8   Global Step: 137020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:40,287-Speed 9194.24 samples/sec   Loss 6.1570   LearningRate 0.0348   Epoch: 8   Global Step: 137030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:41,373-Speed 9432.65 samples/sec   Loss 6.0436   LearningRate 0.0347   Epoch: 8   Global Step: 137040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:42,482-Speed 9242.94 samples/sec   Loss 6.1671   LearningRate 0.0347   Epoch: 8   Global Step: 137050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:43,595-Speed 9207.21 samples/sec   Loss 6.0274   LearningRate 0.0347   Epoch: 8   Global Step: 137060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:44,674-Speed 9492.16 samples/sec   Loss 6.1665   LearningRate 0.0347   Epoch: 8   Global Step: 137070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:45,745-Speed 9572.34 samples/sec   Loss 6.0877   LearningRate 0.0347   Epoch: 8   Global Step: 137080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:46,852-Speed 9257.11 samples/sec   Loss 6.1468   LearningRate 0.0347   Epoch: 8   Global Step: 137090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:47,936-Speed 9451.40 samples/sec   Loss 6.1129   LearningRate 0.0347   Epoch: 8   Global Step: 137100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:49,028-Speed 9381.30 samples/sec   Loss 6.1071   LearningRate 0.0347   Epoch: 8   Global Step: 137110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:50,143-Speed 9191.39 samples/sec   Loss 6.1348   LearningRate 0.0347   Epoch: 8   Global Step: 137120   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:03:51,208-Speed 9627.04 samples/sec   Loss 6.2486   LearningRate 0.0347   Epoch: 8   Global Step: 137130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:52,298-Speed 9394.47 samples/sec   Loss 6.1408   LearningRate 0.0347   Epoch: 8   Global Step: 137140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:53,427-Speed 9076.73 samples/sec   Loss 6.2426   LearningRate 0.0347   Epoch: 8   Global Step: 137150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:54,507-Speed 9486.11 samples/sec   Loss 6.2051   LearningRate 0.0347   Epoch: 8   Global Step: 137160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:55,585-Speed 9503.51 samples/sec   Loss 6.1374   LearningRate 0.0347   Epoch: 8   Global Step: 137170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:56,667-Speed 9470.19 samples/sec   Loss 6.1847   LearningRate 0.0347   Epoch: 8   Global Step: 137180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:57,724-Speed 9691.71 samples/sec   Loss 6.0369   LearningRate 0.0347   Epoch: 8   Global Step: 137190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:58,828-Speed 9278.56 samples/sec   Loss 6.1205   LearningRate 0.0347   Epoch: 8   Global Step: 137200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:03:59,910-Speed 9473.19 samples/sec   Loss 6.1171   LearningRate 0.0347   Epoch: 8   Global Step: 137210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:00,973-Speed 9651.54 samples/sec   Loss 6.0906   LearningRate 0.0347   Epoch: 8   Global Step: 137220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:02,044-Speed 9559.90 samples/sec   Loss 6.2198   LearningRate 0.0347   Epoch: 8   Global Step: 137230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:03,136-Speed 9383.36 samples/sec   Loss 6.1489   LearningRate 0.0347   Epoch: 8   Global Step: 137240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:04,250-Speed 9202.52 samples/sec   Loss 6.1611   LearningRate 0.0347   Epoch: 8   Global Step: 137250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:05,339-Speed 9407.99 samples/sec   Loss 6.0573   LearningRate 0.0347   Epoch: 8   Global Step: 137260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:06,454-Speed 9197.68 samples/sec   Loss 6.1067   LearningRate 0.0347   Epoch: 8   Global Step: 137270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:07,504-Speed 9760.59 samples/sec   Loss 6.1505   LearningRate 0.0347   Epoch: 8   Global Step: 137280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:08,569-Speed 9620.06 samples/sec   Loss 6.2302   LearningRate 0.0347   Epoch: 8   Global Step: 137290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:09,662-Speed 9367.97 samples/sec   Loss 6.1274   LearningRate 0.0347   Epoch: 8   Global Step: 137300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:10,741-Speed 9497.86 samples/sec   Loss 6.1225   LearningRate 0.0347   Epoch: 8   Global Step: 137310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:11,811-Speed 9580.15 samples/sec   Loss 6.0935   LearningRate 0.0346   Epoch: 8   Global Step: 137320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:12,876-Speed 9619.55 samples/sec   Loss 6.1916   LearningRate 0.0346   Epoch: 8   Global Step: 137330   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:04:13,947-Speed 9572.09 samples/sec   Loss 6.1238   LearningRate 0.0346   Epoch: 8   Global Step: 137340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:15,010-Speed 9630.95 samples/sec   Loss 6.2440   LearningRate 0.0346   Epoch: 8   Global Step: 137350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:16,092-Speed 9472.49 samples/sec   Loss 6.2002   LearningRate 0.0346   Epoch: 8   Global Step: 137360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:17,130-Speed 9865.09 samples/sec   Loss 6.0921   LearningRate 0.0346   Epoch: 8   Global Step: 137370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:18,225-Speed 9357.93 samples/sec   Loss 6.1238   LearningRate 0.0346   Epoch: 8   Global Step: 137380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:19,320-Speed 9357.71 samples/sec   Loss 6.1432   LearningRate 0.0346   Epoch: 8   Global Step: 137390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:20,381-Speed 9665.58 samples/sec   Loss 6.2252   LearningRate 0.0346   Epoch: 8   Global Step: 137400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:21,462-Speed 9476.22 samples/sec   Loss 6.1349   LearningRate 0.0346   Epoch: 8   Global Step: 137410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:22,551-Speed 9406.10 samples/sec   Loss 6.2413   LearningRate 0.0346   Epoch: 8   Global Step: 137420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:23,636-Speed 9444.05 samples/sec   Loss 6.0554   LearningRate 0.0346   Epoch: 8   Global Step: 137430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:24,737-Speed 9303.44 samples/sec   Loss 6.1056   LearningRate 0.0346   Epoch: 8   Global Step: 137440   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:04:25,803-Speed 9613.96 samples/sec   Loss 6.0868   LearningRate 0.0346   Epoch: 8   Global Step: 137450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:26,867-Speed 9635.76 samples/sec   Loss 6.1102   LearningRate 0.0346   Epoch: 8   Global Step: 137460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:27,943-Speed 9520.14 samples/sec   Loss 6.0660   LearningRate 0.0346   Epoch: 8   Global Step: 137470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:29,006-Speed 9635.88 samples/sec   Loss 6.1637   LearningRate 0.0346   Epoch: 8   Global Step: 137480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:30,076-Speed 9573.31 samples/sec   Loss 6.1247   LearningRate 0.0346   Epoch: 8   Global Step: 137490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:31,198-Speed 9131.72 samples/sec   Loss 6.1767   LearningRate 0.0346   Epoch: 8   Global Step: 137500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:32,292-Speed 9369.45 samples/sec   Loss 6.1754   LearningRate 0.0346   Epoch: 8   Global Step: 137510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:33,392-Speed 9318.94 samples/sec   Loss 6.1641   LearningRate 0.0346   Epoch: 8   Global Step: 137520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:34,437-Speed 9801.88 samples/sec   Loss 6.1231   LearningRate 0.0346   Epoch: 8   Global Step: 137530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:35,554-Speed 9175.02 samples/sec   Loss 6.1489   LearningRate 0.0346   Epoch: 8   Global Step: 137540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:36,668-Speed 9196.22 samples/sec   Loss 6.1792   LearningRate 0.0346   Epoch: 8   Global Step: 137550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:37,721-Speed 9724.61 samples/sec   Loss 6.2035   LearningRate 0.0346   Epoch: 8   Global Step: 137560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:38,847-Speed 9099.96 samples/sec   Loss 6.1318   LearningRate 0.0346   Epoch: 8   Global Step: 137570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:39,949-Speed 9298.13 samples/sec   Loss 6.0980   LearningRate 0.0346   Epoch: 8   Global Step: 137580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:41,022-Speed 9548.91 samples/sec   Loss 6.0537   LearningRate 0.0346   Epoch: 8   Global Step: 137590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:42,112-Speed 9395.96 samples/sec   Loss 6.1828   LearningRate 0.0346   Epoch: 8   Global Step: 137600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:43,191-Speed 9505.99 samples/sec   Loss 6.0999   LearningRate 0.0345   Epoch: 8   Global Step: 137610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:44,288-Speed 9342.58 samples/sec   Loss 6.0641   LearningRate 0.0345   Epoch: 8   Global Step: 137620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:45,370-Speed 9474.90 samples/sec   Loss 6.1854   LearningRate 0.0345   Epoch: 8   Global Step: 137630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:46,438-Speed 9587.75 samples/sec   Loss 6.1390   LearningRate 0.0345   Epoch: 8   Global Step: 137640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:47,507-Speed 9583.11 samples/sec   Loss 6.0102   LearningRate 0.0345   Epoch: 8   Global Step: 137650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:04:48,593-Speed 9440.19 samples/sec   Loss 6.0516   LearningRate 0.0345   Epoch: 8   Global Step: 137660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:04:49,692-Speed 9318.40 samples/sec   Loss 6.1228   LearningRate 0.0345   Epoch: 8   Global Step: 137670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:04:50,762-Speed 9576.81 samples/sec   Loss 6.1245   LearningRate 0.0345   Epoch: 8   Global Step: 137680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:04:51,907-Speed 8947.81 samples/sec   Loss 6.2951   LearningRate 0.0345   Epoch: 8   Global Step: 137690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:04:53,012-Speed 9273.70 samples/sec   Loss 6.1327   LearningRate 0.0345   Epoch: 8   Global Step: 137700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:04:54,070-Speed 9688.86 samples/sec   Loss 6.1382   LearningRate 0.0345   Epoch: 8   Global Step: 137710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:04:55,128-Speed 9678.81 samples/sec   Loss 6.1741   LearningRate 0.0345   Epoch: 8   Global Step: 137720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:04:56,246-Speed 9167.93 samples/sec   Loss 6.0621   LearningRate 0.0345   Epoch: 8   Global Step: 137730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:04:57,315-Speed 9589.88 samples/sec   Loss 6.0887   LearningRate 0.0345   Epoch: 8   Global Step: 137740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:04:58,376-Speed 9649.95 samples/sec   Loss 6.1582   LearningRate 0.0345   Epoch: 8   Global Step: 137750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:04:59,423-Speed 9784.43 samples/sec   Loss 6.1629   LearningRate 0.0345   Epoch: 8   Global Step: 137760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:00,499-Speed 9528.67 samples/sec   Loss 6.2214   LearningRate 0.0345   Epoch: 8   Global Step: 137770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:01,564-Speed 9616.78 samples/sec   Loss 6.0574   LearningRate 0.0345   Epoch: 8   Global Step: 137780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:02,656-Speed 9389.76 samples/sec   Loss 6.0621   LearningRate 0.0345   Epoch: 8   Global Step: 137790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:03,735-Speed 9499.62 samples/sec   Loss 6.0644   LearningRate 0.0345   Epoch: 8   Global Step: 137800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:04,799-Speed 9630.36 samples/sec   Loss 6.1773   LearningRate 0.0345   Epoch: 8   Global Step: 137810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:05,900-Speed 9307.26 samples/sec   Loss 6.1374   LearningRate 0.0345   Epoch: 8   Global Step: 137820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:06,982-Speed 9463.34 samples/sec   Loss 6.1524   LearningRate 0.0345   Epoch: 8   Global Step: 137830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:08,075-Speed 9373.29 samples/sec   Loss 6.2338   LearningRate 0.0345   Epoch: 8   Global Step: 137840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:09,149-Speed 9541.53 samples/sec   Loss 6.1893   LearningRate 0.0345   Epoch: 8   Global Step: 137850   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:05:10,218-Speed 9581.57 samples/sec   Loss 6.2406   LearningRate 0.0345   Epoch: 8   Global Step: 137860   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:05:11,313-Speed 9363.59 samples/sec   Loss 6.1753   LearningRate 0.0345   Epoch: 8   Global Step: 137870   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:05:12,400-Speed 9418.99 samples/sec   Loss 6.2255   LearningRate 0.0345   Epoch: 8   Global Step: 137880   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:05:13,511-Speed 9228.35 samples/sec   Loss 6.1982   LearningRate 0.0344   Epoch: 8   Global Step: 137890   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:05:14,618-Speed 9260.58 samples/sec   Loss 6.0685   LearningRate 0.0344   Epoch: 8   Global Step: 137900   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:05:15,717-Speed 9317.74 samples/sec   Loss 6.2737   LearningRate 0.0344   Epoch: 8   Global Step: 137910   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:05:16,788-Speed 9570.46 samples/sec   Loss 6.1220   LearningRate 0.0344   Epoch: 8   Global Step: 137920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:05:17,900-Speed 9215.28 samples/sec   Loss 6.1493   LearningRate 0.0344   Epoch: 8   Global Step: 137930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:05:18,972-Speed 9551.91 samples/sec   Loss 6.1104   LearningRate 0.0344   Epoch: 8   Global Step: 137940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:05:20,108-Speed 9023.22 samples/sec   Loss 6.2067   LearningRate 0.0344   Epoch: 8   Global Step: 137950   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:05:21,186-Speed 9501.42 samples/sec   Loss 6.0943   LearningRate 0.0344   Epoch: 8   Global Step: 137960   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:05:22,245-Speed 9681.38 samples/sec   Loss 6.1824   LearningRate 0.0344   Epoch: 8   Global Step: 137970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:23,364-Speed 9158.20 samples/sec   Loss 6.2593   LearningRate 0.0344   Epoch: 8   Global Step: 137980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:24,411-Speed 9790.98 samples/sec   Loss 6.1724   LearningRate 0.0344   Epoch: 8   Global Step: 137990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:25,474-Speed 9637.76 samples/sec   Loss 6.1578   LearningRate 0.0344   Epoch: 8   Global Step: 138000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:05:47,336-[lfw][138000]XNorm: 10.263365
Training: 2022-04-11 17:05:47,337-[lfw][138000]Accuracy-Flip: 0.99667+-0.00289
Training: 2022-04-11 17:05:47,337-[lfw][138000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:06:12,608-[cfp_fp][138000]XNorm: 8.801229
Training: 2022-04-11 17:06:12,609-[cfp_fp][138000]Accuracy-Flip: 0.96500+-0.00918
Training: 2022-04-11 17:06:12,609-[cfp_fp][138000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:06:34,369-[agedb_30][138000]XNorm: 9.990886
Training: 2022-04-11 17:06:34,370-[agedb_30][138000]Accuracy-Flip: 0.96300+-0.00991
Training: 2022-04-11 17:06:34,370-[agedb_30][138000]Accuracy-Highest: 0.96650
Training: 2022-04-11 17:06:35,460-Speed 146.32 samples/sec   Loss 6.2353   LearningRate 0.0344   Epoch: 8   Global Step: 138010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:36,519-Speed 9677.12 samples/sec   Loss 6.1906   LearningRate 0.0344   Epoch: 8   Global Step: 138020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:37,621-Speed 9294.15 samples/sec   Loss 6.2120   LearningRate 0.0344   Epoch: 8   Global Step: 138030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:38,717-Speed 9351.88 samples/sec   Loss 6.2495   LearningRate 0.0344   Epoch: 8   Global Step: 138040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:39,797-Speed 9478.98 samples/sec   Loss 6.1530   LearningRate 0.0344   Epoch: 8   Global Step: 138050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:40,866-Speed 9584.57 samples/sec   Loss 6.2038   LearningRate 0.0344   Epoch: 8   Global Step: 138060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:41,934-Speed 9600.64 samples/sec   Loss 6.1287   LearningRate 0.0344   Epoch: 8   Global Step: 138070   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:06:43,011-Speed 9515.73 samples/sec   Loss 6.1864   LearningRate 0.0344   Epoch: 8   Global Step: 138080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:44,104-Speed 9370.54 samples/sec   Loss 6.1638   LearningRate 0.0344   Epoch: 8   Global Step: 138090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:45,219-Speed 9194.42 samples/sec   Loss 6.0576   LearningRate 0.0344   Epoch: 8   Global Step: 138100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:46,283-Speed 9629.33 samples/sec   Loss 6.1313   LearningRate 0.0344   Epoch: 8   Global Step: 138110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:47,389-Speed 9258.49 samples/sec   Loss 6.1637   LearningRate 0.0344   Epoch: 8   Global Step: 138120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:48,471-Speed 9470.14 samples/sec   Loss 6.1746   LearningRate 0.0344   Epoch: 8   Global Step: 138130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:49,547-Speed 9523.63 samples/sec   Loss 6.1852   LearningRate 0.0344   Epoch: 8   Global Step: 138140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:50,595-Speed 9779.24 samples/sec   Loss 6.1675   LearningRate 0.0344   Epoch: 8   Global Step: 138150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:51,666-Speed 9571.85 samples/sec   Loss 6.1623   LearningRate 0.0344   Epoch: 8   Global Step: 138160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:52,721-Speed 9712.84 samples/sec   Loss 6.1640   LearningRate 0.0344   Epoch: 8   Global Step: 138170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:53,810-Speed 9403.74 samples/sec   Loss 6.1121   LearningRate 0.0343   Epoch: 8   Global Step: 138180   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:06:54,927-Speed 9173.28 samples/sec   Loss 6.3069   LearningRate 0.0343   Epoch: 8   Global Step: 138190   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:06:56,013-Speed 9435.74 samples/sec   Loss 6.1738   LearningRate 0.0343   Epoch: 8   Global Step: 138200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:06:57,071-Speed 9681.30 samples/sec   Loss 6.1594   LearningRate 0.0343   Epoch: 8   Global Step: 138210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:06:58,148-Speed 9511.97 samples/sec   Loss 6.2248   LearningRate 0.0343   Epoch: 8   Global Step: 138220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:06:59,180-Speed 9928.24 samples/sec   Loss 6.2151   LearningRate 0.0343   Epoch: 8   Global Step: 138230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:00,222-Speed 9834.83 samples/sec   Loss 6.2022   LearningRate 0.0343   Epoch: 8   Global Step: 138240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:01,305-Speed 9460.22 samples/sec   Loss 6.2117   LearningRate 0.0343   Epoch: 8   Global Step: 138250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:02,428-Speed 9122.31 samples/sec   Loss 6.0512   LearningRate 0.0343   Epoch: 8   Global Step: 138260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:03,513-Speed 9444.58 samples/sec   Loss 6.2511   LearningRate 0.0343   Epoch: 8   Global Step: 138270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:04,555-Speed 9838.12 samples/sec   Loss 6.2176   LearningRate 0.0343   Epoch: 8   Global Step: 138280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:05,588-Speed 9915.50 samples/sec   Loss 6.1855   LearningRate 0.0343   Epoch: 8   Global Step: 138290   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:06,681-Speed 9378.63 samples/sec   Loss 6.1293   LearningRate 0.0343   Epoch: 8   Global Step: 138300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:07,760-Speed 9490.72 samples/sec   Loss 6.2899   LearningRate 0.0343   Epoch: 8   Global Step: 138310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:08,835-Speed 9537.85 samples/sec   Loss 6.2051   LearningRate 0.0343   Epoch: 8   Global Step: 138320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:09,918-Speed 9456.92 samples/sec   Loss 6.2834   LearningRate 0.0343   Epoch: 8   Global Step: 138330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:10,955-Speed 9874.34 samples/sec   Loss 6.1991   LearningRate 0.0343   Epoch: 8   Global Step: 138340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:11,995-Speed 9853.15 samples/sec   Loss 6.3085   LearningRate 0.0343   Epoch: 8   Global Step: 138350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:13,101-Speed 9263.15 samples/sec   Loss 6.2154   LearningRate 0.0343   Epoch: 8   Global Step: 138360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:14,172-Speed 9572.63 samples/sec   Loss 6.1063   LearningRate 0.0343   Epoch: 8   Global Step: 138370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:15,214-Speed 9835.46 samples/sec   Loss 6.2188   LearningRate 0.0343   Epoch: 8   Global Step: 138380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:16,325-Speed 9223.73 samples/sec   Loss 6.1807   LearningRate 0.0343   Epoch: 8   Global Step: 138390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:17,378-Speed 9735.23 samples/sec   Loss 6.2232   LearningRate 0.0343   Epoch: 8   Global Step: 138400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:18,469-Speed 9385.70 samples/sec   Loss 6.2086   LearningRate 0.0343   Epoch: 8   Global Step: 138410   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:07:19,546-Speed 9512.56 samples/sec   Loss 6.0744   LearningRate 0.0343   Epoch: 8   Global Step: 138420   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:07:20,620-Speed 9546.44 samples/sec   Loss 6.1668   LearningRate 0.0343   Epoch: 8   Global Step: 138430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:21,684-Speed 9624.35 samples/sec   Loss 6.0953   LearningRate 0.0343   Epoch: 8   Global Step: 138440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:22,778-Speed 9365.86 samples/sec   Loss 6.1973   LearningRate 0.0343   Epoch: 8   Global Step: 138450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:23,846-Speed 9599.44 samples/sec   Loss 6.1367   LearningRate 0.0342   Epoch: 8   Global Step: 138460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:24,919-Speed 9543.08 samples/sec   Loss 6.1668   LearningRate 0.0342   Epoch: 8   Global Step: 138470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:25,968-Speed 9772.69 samples/sec   Loss 6.0890   LearningRate 0.0342   Epoch: 8   Global Step: 138480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:27,033-Speed 9618.72 samples/sec   Loss 6.0990   LearningRate 0.0342   Epoch: 8   Global Step: 138490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:28,084-Speed 9747.98 samples/sec   Loss 6.2237   LearningRate 0.0342   Epoch: 8   Global Step: 138500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:29,129-Speed 9798.46 samples/sec   Loss 6.2839   LearningRate 0.0342   Epoch: 8   Global Step: 138510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:30,168-Speed 9868.10 samples/sec   Loss 6.1219   LearningRate 0.0342   Epoch: 8   Global Step: 138520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:31,212-Speed 9808.59 samples/sec   Loss 6.2424   LearningRate 0.0342   Epoch: 8   Global Step: 138530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:07:32,246-Speed 9909.96 samples/sec   Loss 6.2547   LearningRate 0.0342   Epoch: 8   Global Step: 138540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:33,354-Speed 9250.68 samples/sec   Loss 6.1690   LearningRate 0.0342   Epoch: 8   Global Step: 138550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:34,421-Speed 9606.46 samples/sec   Loss 6.1582   LearningRate 0.0342   Epoch: 8   Global Step: 138560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:35,518-Speed 9339.40 samples/sec   Loss 6.1535   LearningRate 0.0342   Epoch: 8   Global Step: 138570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:36,600-Speed 9473.73 samples/sec   Loss 6.2033   LearningRate 0.0342   Epoch: 8   Global Step: 138580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:37,678-Speed 9499.06 samples/sec   Loss 6.1442   LearningRate 0.0342   Epoch: 8   Global Step: 138590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:38,761-Speed 9460.24 samples/sec   Loss 6.2392   LearningRate 0.0342   Epoch: 8   Global Step: 138600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:39,845-Speed 9456.39 samples/sec   Loss 6.1282   LearningRate 0.0342   Epoch: 8   Global Step: 138610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:40,921-Speed 9517.22 samples/sec   Loss 6.0966   LearningRate 0.0342   Epoch: 8   Global Step: 138620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:41,976-Speed 9710.00 samples/sec   Loss 6.1249   LearningRate 0.0342   Epoch: 8   Global Step: 138630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:43,029-Speed 9734.50 samples/sec   Loss 6.1913   LearningRate 0.0342   Epoch: 8   Global Step: 138640   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:07:44,127-Speed 9332.10 samples/sec   Loss 6.1347   LearningRate 0.0342   Epoch: 8   Global Step: 138650   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:07:45,181-Speed 9718.51 samples/sec   Loss 6.1767   LearningRate 0.0342   Epoch: 8   Global Step: 138660   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:07:46,294-Speed 9211.17 samples/sec   Loss 6.1651   LearningRate 0.0342   Epoch: 8   Global Step: 138670   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:07:47,392-Speed 9329.47 samples/sec   Loss 6.1930   LearningRate 0.0342   Epoch: 8   Global Step: 138680   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:07:48,473-Speed 9474.06 samples/sec   Loss 6.2555   LearningRate 0.0342   Epoch: 8   Global Step: 138690   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:07:49,535-Speed 9646.07 samples/sec   Loss 6.2747   LearningRate 0.0342   Epoch: 8   Global Step: 138700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:50,662-Speed 9090.33 samples/sec   Loss 6.2422   LearningRate 0.0342   Epoch: 8   Global Step: 138710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:51,812-Speed 8921.17 samples/sec   Loss 6.2510   LearningRate 0.0342   Epoch: 8   Global Step: 138720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:52,892-Speed 9485.48 samples/sec   Loss 6.2415   LearningRate 0.0342   Epoch: 8   Global Step: 138730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:53,965-Speed 9546.92 samples/sec   Loss 6.1404   LearningRate 0.0342   Epoch: 8   Global Step: 138740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:55,048-Speed 9463.54 samples/sec   Loss 6.1829   LearningRate 0.0341   Epoch: 8   Global Step: 138750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:56,142-Speed 9369.75 samples/sec   Loss 6.1579   LearningRate 0.0341   Epoch: 8   Global Step: 138760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:57,219-Speed 9512.60 samples/sec   Loss 6.1928   LearningRate 0.0341   Epoch: 8   Global Step: 138770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:58,249-Speed 9939.21 samples/sec   Loss 6.1676   LearningRate 0.0341   Epoch: 8   Global Step: 138780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:07:59,323-Speed 9543.49 samples/sec   Loss 6.2378   LearningRate 0.0341   Epoch: 8   Global Step: 138790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:00,413-Speed 9402.26 samples/sec   Loss 6.2540   LearningRate 0.0341   Epoch: 8   Global Step: 138800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:01,475-Speed 9648.20 samples/sec   Loss 6.0985   LearningRate 0.0341   Epoch: 8   Global Step: 138810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:02,569-Speed 9362.41 samples/sec   Loss 6.2355   LearningRate 0.0341   Epoch: 8   Global Step: 138820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:03,634-Speed 9618.40 samples/sec   Loss 6.1589   LearningRate 0.0341   Epoch: 8   Global Step: 138830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:04,705-Speed 9574.27 samples/sec   Loss 6.2079   LearningRate 0.0341   Epoch: 8   Global Step: 138840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:05,786-Speed 9474.76 samples/sec   Loss 6.2619   LearningRate 0.0341   Epoch: 8   Global Step: 138850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:06,828-Speed 9831.69 samples/sec   Loss 6.1966   LearningRate 0.0341   Epoch: 8   Global Step: 138860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:07,884-Speed 9701.93 samples/sec   Loss 6.2543   LearningRate 0.0341   Epoch: 8   Global Step: 138870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:08,955-Speed 9565.82 samples/sec   Loss 6.1783   LearningRate 0.0341   Epoch: 8   Global Step: 138880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:10,014-Speed 9674.24 samples/sec   Loss 6.2047   LearningRate 0.0341   Epoch: 8   Global Step: 138890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:11,093-Speed 9500.78 samples/sec   Loss 6.0948   LearningRate 0.0341   Epoch: 8   Global Step: 138900   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:08:12,172-Speed 9497.32 samples/sec   Loss 6.1670   LearningRate 0.0341   Epoch: 8   Global Step: 138910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:13,239-Speed 9599.66 samples/sec   Loss 6.0335   LearningRate 0.0341   Epoch: 8   Global Step: 138920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:14,324-Speed 9448.58 samples/sec   Loss 6.0985   LearningRate 0.0341   Epoch: 8   Global Step: 138930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:15,448-Speed 9115.63 samples/sec   Loss 6.2339   LearningRate 0.0341   Epoch: 8   Global Step: 138940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:16,530-Speed 9463.87 samples/sec   Loss 6.0687   LearningRate 0.0341   Epoch: 8   Global Step: 138950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:17,594-Speed 9636.36 samples/sec   Loss 6.1497   LearningRate 0.0341   Epoch: 8   Global Step: 138960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:18,673-Speed 9494.13 samples/sec   Loss 6.1637   LearningRate 0.0341   Epoch: 8   Global Step: 138970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:19,741-Speed 9595.39 samples/sec   Loss 6.2015   LearningRate 0.0341   Epoch: 8   Global Step: 138980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:20,794-Speed 9735.71 samples/sec   Loss 6.1234   LearningRate 0.0341   Epoch: 8   Global Step: 138990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:21,884-Speed 9395.04 samples/sec   Loss 6.1963   LearningRate 0.0341   Epoch: 8   Global Step: 139000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:22,982-Speed 9327.63 samples/sec   Loss 6.1591   LearningRate 0.0341   Epoch: 8   Global Step: 139010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:24,056-Speed 9540.21 samples/sec   Loss 6.1944   LearningRate 0.0341   Epoch: 8   Global Step: 139020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:25,171-Speed 9190.55 samples/sec   Loss 6.1203   LearningRate 0.0340   Epoch: 8   Global Step: 139030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:26,230-Speed 9674.03 samples/sec   Loss 6.1824   LearningRate 0.0340   Epoch: 8   Global Step: 139040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:27,324-Speed 9372.17 samples/sec   Loss 6.2679   LearningRate 0.0340   Epoch: 8   Global Step: 139050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:28,378-Speed 9716.48 samples/sec   Loss 6.1434   LearningRate 0.0340   Epoch: 8   Global Step: 139060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:29,416-Speed 9868.64 samples/sec   Loss 6.0650   LearningRate 0.0340   Epoch: 8   Global Step: 139070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:30,507-Speed 9398.22 samples/sec   Loss 6.1593   LearningRate 0.0340   Epoch: 8   Global Step: 139080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:31,618-Speed 9227.08 samples/sec   Loss 6.1727   LearningRate 0.0340   Epoch: 8   Global Step: 139090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:32,738-Speed 9141.48 samples/sec   Loss 6.1711   LearningRate 0.0340   Epoch: 8   Global Step: 139100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:33,838-Speed 9323.84 samples/sec   Loss 6.1402   LearningRate 0.0340   Epoch: 8   Global Step: 139110   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:08:34,973-Speed 9023.50 samples/sec   Loss 6.2175   LearningRate 0.0340   Epoch: 8   Global Step: 139120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:36,014-Speed 9838.76 samples/sec   Loss 6.1082   LearningRate 0.0340   Epoch: 8   Global Step: 139130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:08:37,066-Speed 9746.97 samples/sec   Loss 6.2341   LearningRate 0.0340   Epoch: 8   Global Step: 139140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:08:38,093-Speed 9974.38 samples/sec   Loss 6.1988   LearningRate 0.0340   Epoch: 8   Global Step: 139150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:08:39,137-Speed 9811.30 samples/sec   Loss 6.0432   LearningRate 0.0340   Epoch: 8   Global Step: 139160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:08:40,222-Speed 9442.14 samples/sec   Loss 6.1684   LearningRate 0.0340   Epoch: 8   Global Step: 139170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:08:41,335-Speed 9206.73 samples/sec   Loss 6.1350   LearningRate 0.0340   Epoch: 8   Global Step: 139180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:08:42,392-Speed 9691.12 samples/sec   Loss 6.2078   LearningRate 0.0340   Epoch: 8   Global Step: 139190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:08:43,477-Speed 9443.98 samples/sec   Loss 6.1724   LearningRate 0.0340   Epoch: 8   Global Step: 139200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:08:44,560-Speed 9468.08 samples/sec   Loss 6.1135   LearningRate 0.0340   Epoch: 8   Global Step: 139210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:08:45,622-Speed 9643.55 samples/sec   Loss 6.1988   LearningRate 0.0340   Epoch: 8   Global Step: 139220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:08:46,704-Speed 9468.37 samples/sec   Loss 6.1732   LearningRate 0.0340   Epoch: 8   Global Step: 139230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:47,811-Speed 9256.21 samples/sec   Loss 6.2660   LearningRate 0.0340   Epoch: 8   Global Step: 139240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:48,906-Speed 9361.10 samples/sec   Loss 6.1587   LearningRate 0.0340   Epoch: 8   Global Step: 139250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:49,939-Speed 9914.57 samples/sec   Loss 6.1726   LearningRate 0.0340   Epoch: 8   Global Step: 139260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:50,984-Speed 9813.80 samples/sec   Loss 6.3172   LearningRate 0.0340   Epoch: 8   Global Step: 139270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:52,063-Speed 9499.46 samples/sec   Loss 6.2095   LearningRate 0.0340   Epoch: 8   Global Step: 139280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:53,155-Speed 9375.80 samples/sec   Loss 6.2048   LearningRate 0.0340   Epoch: 8   Global Step: 139290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:54,212-Speed 9693.54 samples/sec   Loss 6.2638   LearningRate 0.0340   Epoch: 8   Global Step: 139300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:55,261-Speed 9769.28 samples/sec   Loss 6.1829   LearningRate 0.0340   Epoch: 8   Global Step: 139310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:56,341-Speed 9489.79 samples/sec   Loss 6.1227   LearningRate 0.0339   Epoch: 8   Global Step: 139320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:57,429-Speed 9414.13 samples/sec   Loss 6.1510   LearningRate 0.0339   Epoch: 8   Global Step: 139330   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:08:58,474-Speed 9801.62 samples/sec   Loss 6.1410   LearningRate 0.0339   Epoch: 8   Global Step: 139340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:08:59,568-Speed 9365.48 samples/sec   Loss 6.2211   LearningRate 0.0339   Epoch: 8   Global Step: 139350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:00,636-Speed 9593.25 samples/sec   Loss 6.1620   LearningRate 0.0339   Epoch: 8   Global Step: 139360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:01,773-Speed 9013.82 samples/sec   Loss 6.1493   LearningRate 0.0339   Epoch: 8   Global Step: 139370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:02,859-Speed 9430.68 samples/sec   Loss 6.1397   LearningRate 0.0339   Epoch: 8   Global Step: 139380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:03,934-Speed 9541.55 samples/sec   Loss 6.0860   LearningRate 0.0339   Epoch: 8   Global Step: 139390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:04,989-Speed 9708.00 samples/sec   Loss 6.2170   LearningRate 0.0339   Epoch: 8   Global Step: 139400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:06,056-Speed 9600.55 samples/sec   Loss 6.1786   LearningRate 0.0339   Epoch: 8   Global Step: 139410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:07,144-Speed 9417.66 samples/sec   Loss 6.0847   LearningRate 0.0339   Epoch: 8   Global Step: 139420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:08,217-Speed 9553.30 samples/sec   Loss 6.1826   LearningRate 0.0339   Epoch: 8   Global Step: 139430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:09,305-Speed 9417.40 samples/sec   Loss 6.1513   LearningRate 0.0339   Epoch: 8   Global Step: 139440   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:09:10,344-Speed 9866.88 samples/sec   Loss 6.1951   LearningRate 0.0339   Epoch: 8   Global Step: 139450   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:09:11,401-Speed 9692.12 samples/sec   Loss 6.2595   LearningRate 0.0339   Epoch: 8   Global Step: 139460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:12,448-Speed 9778.58 samples/sec   Loss 6.3053   LearningRate 0.0339   Epoch: 8   Global Step: 139470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:13,514-Speed 9615.94 samples/sec   Loss 6.2992   LearningRate 0.0339   Epoch: 8   Global Step: 139480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:14,563-Speed 9773.29 samples/sec   Loss 6.2133   LearningRate 0.0339   Epoch: 8   Global Step: 139490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:15,632-Speed 9585.84 samples/sec   Loss 6.2015   LearningRate 0.0339   Epoch: 8   Global Step: 139500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:16,727-Speed 9354.53 samples/sec   Loss 6.1913   LearningRate 0.0339   Epoch: 8   Global Step: 139510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:17,813-Speed 9433.38 samples/sec   Loss 6.2282   LearningRate 0.0339   Epoch: 8   Global Step: 139520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:18,851-Speed 9867.29 samples/sec   Loss 6.1450   LearningRate 0.0339   Epoch: 8   Global Step: 139530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:19,907-Speed 9704.52 samples/sec   Loss 6.2096   LearningRate 0.0339   Epoch: 8   Global Step: 139540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:20,959-Speed 9738.68 samples/sec   Loss 6.3402   LearningRate 0.0339   Epoch: 8   Global Step: 139550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:22,011-Speed 9739.56 samples/sec   Loss 6.1670   LearningRate 0.0339   Epoch: 8   Global Step: 139560   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:09:23,075-Speed 9636.37 samples/sec   Loss 6.3201   LearningRate 0.0339   Epoch: 8   Global Step: 139570   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:09:24,122-Speed 9777.85 samples/sec   Loss 6.3077   LearningRate 0.0339   Epoch: 8   Global Step: 139580   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:09:25,180-Speed 9687.20 samples/sec   Loss 6.2008   LearningRate 0.0339   Epoch: 8   Global Step: 139590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:26,238-Speed 9684.12 samples/sec   Loss 6.1829   LearningRate 0.0339   Epoch: 8   Global Step: 139600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:27,275-Speed 9880.95 samples/sec   Loss 6.0729   LearningRate 0.0338   Epoch: 8   Global Step: 139610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:28,357-Speed 9473.20 samples/sec   Loss 6.1640   LearningRate 0.0338   Epoch: 8   Global Step: 139620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:29,454-Speed 9342.48 samples/sec   Loss 6.2228   LearningRate 0.0338   Epoch: 8   Global Step: 139630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:30,522-Speed 9587.14 samples/sec   Loss 6.1758   LearningRate 0.0338   Epoch: 8   Global Step: 139640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:31,579-Speed 9696.22 samples/sec   Loss 6.2286   LearningRate 0.0338   Epoch: 8   Global Step: 139650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:32,664-Speed 9446.48 samples/sec   Loss 6.1019   LearningRate 0.0338   Epoch: 8   Global Step: 139660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:33,748-Speed 9455.68 samples/sec   Loss 6.1938   LearningRate 0.0338   Epoch: 8   Global Step: 139670   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:09:34,857-Speed 9238.01 samples/sec   Loss 6.1694   LearningRate 0.0338   Epoch: 8   Global Step: 139680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:09:35,949-Speed 9383.78 samples/sec   Loss 6.2248   LearningRate 0.0338   Epoch: 8   Global Step: 139690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:09:37,025-Speed 9523.68 samples/sec   Loss 6.2243   LearningRate 0.0338   Epoch: 8   Global Step: 139700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:09:38,128-Speed 9285.55 samples/sec   Loss 6.1946   LearningRate 0.0338   Epoch: 8   Global Step: 139710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:09:39,174-Speed 9796.97 samples/sec   Loss 6.2297   LearningRate 0.0338   Epoch: 8   Global Step: 139720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:09:40,312-Speed 9001.48 samples/sec   Loss 6.1893   LearningRate 0.0338   Epoch: 8   Global Step: 139730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:09:41,387-Speed 9535.33 samples/sec   Loss 6.1897   LearningRate 0.0338   Epoch: 8   Global Step: 139740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:09:42,440-Speed 9728.80 samples/sec   Loss 6.2345   LearningRate 0.0338   Epoch: 8   Global Step: 139750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:09:43,486-Speed 9792.62 samples/sec   Loss 6.1892   LearningRate 0.0338   Epoch: 8   Global Step: 139760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:09:44,624-Speed 9009.31 samples/sec   Loss 6.2017   LearningRate 0.0338   Epoch: 8   Global Step: 139770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:45,701-Speed 9510.90 samples/sec   Loss 6.2608   LearningRate 0.0338   Epoch: 8   Global Step: 139780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:46,780-Speed 9500.57 samples/sec   Loss 6.2129   LearningRate 0.0338   Epoch: 8   Global Step: 139790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:47,851-Speed 9561.73 samples/sec   Loss 6.2710   LearningRate 0.0338   Epoch: 8   Global Step: 139800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:48,956-Speed 9279.19 samples/sec   Loss 6.2329   LearningRate 0.0338   Epoch: 8   Global Step: 139810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:50,014-Speed 9685.67 samples/sec   Loss 6.1790   LearningRate 0.0338   Epoch: 8   Global Step: 139820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:51,082-Speed 9594.47 samples/sec   Loss 6.1835   LearningRate 0.0338   Epoch: 8   Global Step: 139830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:52,135-Speed 9727.00 samples/sec   Loss 6.1882   LearningRate 0.0338   Epoch: 8   Global Step: 139840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:53,191-Speed 9701.62 samples/sec   Loss 6.1760   LearningRate 0.0338   Epoch: 8   Global Step: 139850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:54,249-Speed 9684.00 samples/sec   Loss 6.2529   LearningRate 0.0338   Epoch: 8   Global Step: 139860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:55,312-Speed 9643.04 samples/sec   Loss 6.1690   LearningRate 0.0338   Epoch: 8   Global Step: 139870   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:09:56,358-Speed 9796.66 samples/sec   Loss 6.2088   LearningRate 0.0338   Epoch: 8   Global Step: 139880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:57,433-Speed 9529.86 samples/sec   Loss 6.2452   LearningRate 0.0337   Epoch: 8   Global Step: 139890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:58,536-Speed 9284.08 samples/sec   Loss 6.1787   LearningRate 0.0337   Epoch: 8   Global Step: 139900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:09:59,594-Speed 9683.26 samples/sec   Loss 6.2066   LearningRate 0.0337   Epoch: 8   Global Step: 139910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:10:00,681-Speed 9431.00 samples/sec   Loss 6.1712   LearningRate 0.0337   Epoch: 8   Global Step: 139920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:10:01,789-Speed 9244.28 samples/sec   Loss 6.1290   LearningRate 0.0337   Epoch: 8   Global Step: 139930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:10:02,850-Speed 9660.01 samples/sec   Loss 6.2574   LearningRate 0.0337   Epoch: 8   Global Step: 139940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:10:03,902-Speed 9742.31 samples/sec   Loss 6.1915   LearningRate 0.0337   Epoch: 8   Global Step: 139950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:10:04,951-Speed 9772.13 samples/sec   Loss 6.1653   LearningRate 0.0337   Epoch: 8   Global Step: 139960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:10:06,055-Speed 9277.49 samples/sec   Loss 6.3222   LearningRate 0.0337   Epoch: 8   Global Step: 139970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:10:07,159-Speed 9286.07 samples/sec   Loss 6.2665   LearningRate 0.0337   Epoch: 8   Global Step: 139980   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:10:08,267-Speed 9248.49 samples/sec   Loss 6.1683   LearningRate 0.0337   Epoch: 8   Global Step: 139990   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:10:09,380-Speed 9200.82 samples/sec   Loss 6.2481   LearningRate 0.0337   Epoch: 8   Global Step: 140000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:10:31,424-[lfw][140000]XNorm: 10.335960
Training: 2022-04-11 17:10:31,425-[lfw][140000]Accuracy-Flip: 0.99600+-0.00281
Training: 2022-04-11 17:10:31,426-[lfw][140000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:10:56,887-[cfp_fp][140000]XNorm: 8.793890
Training: 2022-04-11 17:10:56,888-[cfp_fp][140000]Accuracy-Flip: 0.96229+-0.01035
Training: 2022-04-11 17:10:56,888-[cfp_fp][140000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:11:18,756-[agedb_30][140000]XNorm: 10.007913
Training: 2022-04-11 17:11:18,756-[agedb_30][140000]Accuracy-Flip: 0.96300+-0.00859
Training: 2022-04-11 17:11:18,756-[agedb_30][140000]Accuracy-Highest: 0.96650
Training: 2022-04-11 17:11:19,830-Speed 145.35 samples/sec   Loss 6.2682   LearningRate 0.0337   Epoch: 8   Global Step: 140010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:20,915-Speed 9450.39 samples/sec   Loss 6.2826   LearningRate 0.0337   Epoch: 8   Global Step: 140020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:21,963-Speed 9774.82 samples/sec   Loss 6.1669   LearningRate 0.0337   Epoch: 8   Global Step: 140030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:23,042-Speed 9491.19 samples/sec   Loss 6.2485   LearningRate 0.0337   Epoch: 8   Global Step: 140040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:24,102-Speed 9665.78 samples/sec   Loss 6.2438   LearningRate 0.0337   Epoch: 8   Global Step: 140050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:25,166-Speed 9634.45 samples/sec   Loss 6.2239   LearningRate 0.0337   Epoch: 8   Global Step: 140060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:26,242-Speed 9519.34 samples/sec   Loss 6.1736   LearningRate 0.0337   Epoch: 8   Global Step: 140070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:27,367-Speed 9109.85 samples/sec   Loss 6.2732   LearningRate 0.0337   Epoch: 8   Global Step: 140080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:28,436-Speed 9586.26 samples/sec   Loss 6.2312   LearningRate 0.0337   Epoch: 8   Global Step: 140090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:29,493-Speed 9687.06 samples/sec   Loss 6.2207   LearningRate 0.0337   Epoch: 8   Global Step: 140100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:30,519-Speed 9992.67 samples/sec   Loss 6.2847   LearningRate 0.0337   Epoch: 8   Global Step: 140110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:31,622-Speed 9293.00 samples/sec   Loss 6.1792   LearningRate 0.0337   Epoch: 8   Global Step: 140120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:32,690-Speed 9591.56 samples/sec   Loss 6.1653   LearningRate 0.0337   Epoch: 8   Global Step: 140130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:33,740-Speed 9763.75 samples/sec   Loss 6.2642   LearningRate 0.0337   Epoch: 8   Global Step: 140140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:34,848-Speed 9247.99 samples/sec   Loss 6.2459   LearningRate 0.0337   Epoch: 8   Global Step: 140150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:35,944-Speed 9342.94 samples/sec   Loss 6.2307   LearningRate 0.0337   Epoch: 8   Global Step: 140160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:37,024-Speed 9487.54 samples/sec   Loss 6.3432   LearningRate 0.0337   Epoch: 8   Global Step: 140170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:38,095-Speed 9566.96 samples/sec   Loss 6.2460   LearningRate 0.0336   Epoch: 8   Global Step: 140180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:39,186-Speed 9393.40 samples/sec   Loss 6.1958   LearningRate 0.0336   Epoch: 8   Global Step: 140190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:40,242-Speed 9705.24 samples/sec   Loss 6.2711   LearningRate 0.0336   Epoch: 8   Global Step: 140200   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:11:41,296-Speed 9722.94 samples/sec   Loss 6.2054   LearningRate 0.0336   Epoch: 8   Global Step: 140210   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:11:42,360-Speed 9626.88 samples/sec   Loss 6.2848   LearningRate 0.0336   Epoch: 8   Global Step: 140220   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:11:43,440-Speed 9489.59 samples/sec   Loss 6.3395   LearningRate 0.0336   Epoch: 8   Global Step: 140230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:44,553-Speed 9205.31 samples/sec   Loss 6.2770   LearningRate 0.0336   Epoch: 8   Global Step: 140240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:45,655-Speed 9289.64 samples/sec   Loss 6.3425   LearningRate 0.0336   Epoch: 8   Global Step: 140250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:46,751-Speed 9348.10 samples/sec   Loss 6.2680   LearningRate 0.0336   Epoch: 8   Global Step: 140260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:47,834-Speed 9463.88 samples/sec   Loss 6.1844   LearningRate 0.0336   Epoch: 8   Global Step: 140270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:48,927-Speed 9373.22 samples/sec   Loss 6.2395   LearningRate 0.0336   Epoch: 8   Global Step: 140280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:50,011-Speed 9449.27 samples/sec   Loss 6.2736   LearningRate 0.0336   Epoch: 8   Global Step: 140290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:51,080-Speed 9589.25 samples/sec   Loss 6.2732   LearningRate 0.0336   Epoch: 8   Global Step: 140300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:52,183-Speed 9293.12 samples/sec   Loss 6.2472   LearningRate 0.0336   Epoch: 8   Global Step: 140310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:53,241-Speed 9685.17 samples/sec   Loss 6.3303   LearningRate 0.0336   Epoch: 8   Global Step: 140320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:54,317-Speed 9519.55 samples/sec   Loss 6.3293   LearningRate 0.0336   Epoch: 8   Global Step: 140330   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:11:55,422-Speed 9278.16 samples/sec   Loss 6.1394   LearningRate 0.0336   Epoch: 8   Global Step: 140340   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:11:56,449-Speed 9974.50 samples/sec   Loss 6.2563   LearningRate 0.0336   Epoch: 8   Global Step: 140350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:57,519-Speed 9570.40 samples/sec   Loss 6.2889   LearningRate 0.0336   Epoch: 8   Global Step: 140360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:58,583-Speed 9634.06 samples/sec   Loss 6.1808   LearningRate 0.0336   Epoch: 8   Global Step: 140370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:11:59,631-Speed 9773.33 samples/sec   Loss 6.3816   LearningRate 0.0336   Epoch: 8   Global Step: 140380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:00,712-Speed 9484.03 samples/sec   Loss 6.2400   LearningRate 0.0336   Epoch: 8   Global Step: 140390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:01,804-Speed 9384.41 samples/sec   Loss 6.1150   LearningRate 0.0336   Epoch: 8   Global Step: 140400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:02,912-Speed 9247.24 samples/sec   Loss 6.2695   LearningRate 0.0336   Epoch: 8   Global Step: 140410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:04,006-Speed 9367.19 samples/sec   Loss 6.2910   LearningRate 0.0336   Epoch: 8   Global Step: 140420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:05,098-Speed 9384.81 samples/sec   Loss 6.2508   LearningRate 0.0336   Epoch: 8   Global Step: 140430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:06,197-Speed 9317.88 samples/sec   Loss 6.3551   LearningRate 0.0336   Epoch: 8   Global Step: 140440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:07,301-Speed 9281.59 samples/sec   Loss 6.3454   LearningRate 0.0336   Epoch: 8   Global Step: 140450   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:12:08,365-Speed 9635.85 samples/sec   Loss 6.2113   LearningRate 0.0336   Epoch: 8   Global Step: 140460   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:12:09,469-Speed 9282.09 samples/sec   Loss 6.2544   LearningRate 0.0335   Epoch: 8   Global Step: 140470   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:12:10,545-Speed 9523.42 samples/sec   Loss 6.3370   LearningRate 0.0335   Epoch: 8   Global Step: 140480   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:12:11,635-Speed 9400.01 samples/sec   Loss 6.1713   LearningRate 0.0335   Epoch: 8   Global Step: 140490   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:12:12,737-Speed 9296.15 samples/sec   Loss 6.2896   LearningRate 0.0335   Epoch: 8   Global Step: 140500   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:12:13,832-Speed 9353.29 samples/sec   Loss 6.2797   LearningRate 0.0335   Epoch: 8   Global Step: 140510   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:12:14,902-Speed 9584.65 samples/sec   Loss 6.1125   LearningRate 0.0335   Epoch: 8   Global Step: 140520   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:12:15,984-Speed 9469.81 samples/sec   Loss 6.4132   LearningRate 0.0335   Epoch: 8   Global Step: 140530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:17,066-Speed 9465.03 samples/sec   Loss 6.3524   LearningRate 0.0335   Epoch: 8   Global Step: 140540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:18,127-Speed 9661.26 samples/sec   Loss 6.2707   LearningRate 0.0335   Epoch: 8   Global Step: 140550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:19,230-Speed 9286.18 samples/sec   Loss 6.1678   LearningRate 0.0335   Epoch: 8   Global Step: 140560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:20,289-Speed 9676.08 samples/sec   Loss 6.2362   LearningRate 0.0335   Epoch: 8   Global Step: 140570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:21,389-Speed 9318.75 samples/sec   Loss 6.2683   LearningRate 0.0335   Epoch: 8   Global Step: 140580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:22,513-Speed 9111.72 samples/sec   Loss 6.3042   LearningRate 0.0335   Epoch: 8   Global Step: 140590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:23,572-Speed 9676.52 samples/sec   Loss 6.2088   LearningRate 0.0335   Epoch: 8   Global Step: 140600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:24,686-Speed 9197.40 samples/sec   Loss 6.2547   LearningRate 0.0335   Epoch: 8   Global Step: 140610   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:25,734-Speed 9782.34 samples/sec   Loss 6.3267   LearningRate 0.0335   Epoch: 8   Global Step: 140620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:26,835-Speed 9306.01 samples/sec   Loss 6.2326   LearningRate 0.0335   Epoch: 8   Global Step: 140630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:27,942-Speed 9256.07 samples/sec   Loss 6.3520   LearningRate 0.0335   Epoch: 8   Global Step: 140640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:29,055-Speed 9198.37 samples/sec   Loss 6.2126   LearningRate 0.0335   Epoch: 8   Global Step: 140650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:30,152-Speed 9347.78 samples/sec   Loss 6.1800   LearningRate 0.0335   Epoch: 8   Global Step: 140660   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:31,220-Speed 9600.54 samples/sec   Loss 6.1831   LearningRate 0.0335   Epoch: 8   Global Step: 140670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:32,290-Speed 9567.76 samples/sec   Loss 6.3022   LearningRate 0.0335   Epoch: 8   Global Step: 140680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:33,360-Speed 9578.92 samples/sec   Loss 6.1232   LearningRate 0.0335   Epoch: 8   Global Step: 140690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:34,424-Speed 9635.16 samples/sec   Loss 6.1819   LearningRate 0.0335   Epoch: 8   Global Step: 140700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:35,545-Speed 9138.98 samples/sec   Loss 6.2315   LearningRate 0.0335   Epoch: 8   Global Step: 140710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:36,635-Speed 9398.96 samples/sec   Loss 6.1675   LearningRate 0.0335   Epoch: 8   Global Step: 140720   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:37,736-Speed 9308.75 samples/sec   Loss 6.2201   LearningRate 0.0335   Epoch: 8   Global Step: 140730   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:38,850-Speed 9195.41 samples/sec   Loss 6.3293   LearningRate 0.0335   Epoch: 8   Global Step: 140740   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:39,950-Speed 9313.42 samples/sec   Loss 6.2777   LearningRate 0.0335   Epoch: 8   Global Step: 140750   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:41,028-Speed 9503.47 samples/sec   Loss 6.3572   LearningRate 0.0334   Epoch: 8   Global Step: 140760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:42,093-Speed 9623.64 samples/sec   Loss 6.1437   LearningRate 0.0334   Epoch: 8   Global Step: 140770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:43,182-Speed 9406.80 samples/sec   Loss 6.2443   LearningRate 0.0334   Epoch: 8   Global Step: 140780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:44,307-Speed 9111.90 samples/sec   Loss 6.3096   LearningRate 0.0334   Epoch: 8   Global Step: 140790   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:12:45,434-Speed 9089.47 samples/sec   Loss 6.2478   LearningRate 0.0334   Epoch: 8   Global Step: 140800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:46,558-Speed 9109.45 samples/sec   Loss 6.1385   LearningRate 0.0334   Epoch: 8   Global Step: 140810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:47,666-Speed 9250.20 samples/sec   Loss 6.2155   LearningRate 0.0334   Epoch: 8   Global Step: 140820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:48,733-Speed 9603.39 samples/sec   Loss 6.2028   LearningRate 0.0334   Epoch: 8   Global Step: 140830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:49,830-Speed 9347.64 samples/sec   Loss 6.3354   LearningRate 0.0334   Epoch: 8   Global Step: 140840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:50,875-Speed 9808.23 samples/sec   Loss 6.2486   LearningRate 0.0334   Epoch: 8   Global Step: 140850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:51,961-Speed 9432.54 samples/sec   Loss 6.1988   LearningRate 0.0334   Epoch: 8   Global Step: 140860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:53,043-Speed 9471.79 samples/sec   Loss 6.2743   LearningRate 0.0334   Epoch: 8   Global Step: 140870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:54,124-Speed 9477.06 samples/sec   Loss 6.3081   LearningRate 0.0334   Epoch: 8   Global Step: 140880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:55,220-Speed 9340.64 samples/sec   Loss 6.2391   LearningRate 0.0334   Epoch: 8   Global Step: 140890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:56,277-Speed 9700.18 samples/sec   Loss 6.1768   LearningRate 0.0334   Epoch: 8   Global Step: 140900   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:12:57,337-Speed 9658.70 samples/sec   Loss 6.1306   LearningRate 0.0334   Epoch: 8   Global Step: 140910   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:12:58,385-Speed 9776.22 samples/sec   Loss 6.1185   LearningRate 0.0334   Epoch: 8   Global Step: 140920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:12:59,490-Speed 9281.11 samples/sec   Loss 6.1924   LearningRate 0.0334   Epoch: 8   Global Step: 140930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:00,570-Speed 9482.42 samples/sec   Loss 6.2852   LearningRate 0.0334   Epoch: 8   Global Step: 140940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:01,652-Speed 9472.19 samples/sec   Loss 6.1857   LearningRate 0.0334   Epoch: 8   Global Step: 140950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:02,779-Speed 9097.20 samples/sec   Loss 6.1701   LearningRate 0.0334   Epoch: 8   Global Step: 140960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:03,884-Speed 9268.25 samples/sec   Loss 6.1814   LearningRate 0.0334   Epoch: 8   Global Step: 140970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:04,975-Speed 9389.97 samples/sec   Loss 6.2486   LearningRate 0.0334   Epoch: 8   Global Step: 140980   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:06,017-Speed 9838.24 samples/sec   Loss 6.2500   LearningRate 0.0334   Epoch: 8   Global Step: 140990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:07,112-Speed 9355.02 samples/sec   Loss 6.2707   LearningRate 0.0334   Epoch: 8   Global Step: 141000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:08,207-Speed 9356.78 samples/sec   Loss 6.1966   LearningRate 0.0334   Epoch: 8   Global Step: 141010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:09,278-Speed 9569.98 samples/sec   Loss 6.2595   LearningRate 0.0334   Epoch: 8   Global Step: 141020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:10,373-Speed 9359.55 samples/sec   Loss 6.2366   LearningRate 0.0334   Epoch: 8   Global Step: 141030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:11,464-Speed 9386.59 samples/sec   Loss 6.3040   LearningRate 0.0334   Epoch: 8   Global Step: 141040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:12,532-Speed 9600.69 samples/sec   Loss 6.3104   LearningRate 0.0333   Epoch: 8   Global Step: 141050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:13,623-Speed 9389.89 samples/sec   Loss 6.2124   LearningRate 0.0333   Epoch: 8   Global Step: 141060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:14,698-Speed 9524.82 samples/sec   Loss 6.1347   LearningRate 0.0333   Epoch: 8   Global Step: 141070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:15,791-Speed 9375.49 samples/sec   Loss 6.1638   LearningRate 0.0333   Epoch: 8   Global Step: 141080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:16,875-Speed 9450.62 samples/sec   Loss 6.1742   LearningRate 0.0333   Epoch: 8   Global Step: 141090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:17,962-Speed 9433.98 samples/sec   Loss 6.1620   LearningRate 0.0333   Epoch: 8   Global Step: 141100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:19,077-Speed 9184.92 samples/sec   Loss 6.1892   LearningRate 0.0333   Epoch: 8   Global Step: 141110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:20,157-Speed 9486.79 samples/sec   Loss 6.2280   LearningRate 0.0333   Epoch: 8   Global Step: 141120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:21,237-Speed 9489.21 samples/sec   Loss 6.2494   LearningRate 0.0333   Epoch: 8   Global Step: 141130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:22,288-Speed 9752.29 samples/sec   Loss 6.2648   LearningRate 0.0333   Epoch: 8   Global Step: 141140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:23,373-Speed 9444.29 samples/sec   Loss 6.2869   LearningRate 0.0333   Epoch: 8   Global Step: 141150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:24,452-Speed 9492.81 samples/sec   Loss 6.3179   LearningRate 0.0333   Epoch: 8   Global Step: 141160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:25,540-Speed 9416.78 samples/sec   Loss 6.3205   LearningRate 0.0333   Epoch: 8   Global Step: 141170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:26,561-Speed 10035.14 samples/sec   Loss 6.1963   LearningRate 0.0333   Epoch: 8   Global Step: 141180   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:13:27,643-Speed 9476.36 samples/sec   Loss 6.2865   LearningRate 0.0333   Epoch: 8   Global Step: 141190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:28,736-Speed 9370.64 samples/sec   Loss 6.2185   LearningRate 0.0333   Epoch: 8   Global Step: 141200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:29,820-Speed 9448.25 samples/sec   Loss 6.2300   LearningRate 0.0333   Epoch: 8   Global Step: 141210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:30,901-Speed 9476.20 samples/sec   Loss 6.1885   LearningRate 0.0333   Epoch: 8   Global Step: 141220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:32,013-Speed 9218.80 samples/sec   Loss 6.1430   LearningRate 0.0333   Epoch: 8   Global Step: 141230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:33,084-Speed 9573.01 samples/sec   Loss 6.1936   LearningRate 0.0333   Epoch: 8   Global Step: 141240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:34,228-Speed 8952.98 samples/sec   Loss 6.2997   LearningRate 0.0333   Epoch: 8   Global Step: 141250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:35,358-Speed 9068.75 samples/sec   Loss 6.2577   LearningRate 0.0333   Epoch: 8   Global Step: 141260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:36,450-Speed 9385.30 samples/sec   Loss 6.2571   LearningRate 0.0333   Epoch: 8   Global Step: 141270   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:37,522-Speed 9550.62 samples/sec   Loss 6.2020   LearningRate 0.0333   Epoch: 8   Global Step: 141280   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:38,621-Speed 9340.93 samples/sec   Loss 6.2281   LearningRate 0.0333   Epoch: 8   Global Step: 141290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:39,681-Speed 9667.89 samples/sec   Loss 6.2448   LearningRate 0.0333   Epoch: 8   Global Step: 141300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:40,715-Speed 9909.68 samples/sec   Loss 6.2503   LearningRate 0.0333   Epoch: 8   Global Step: 141310   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:41,792-Speed 9514.97 samples/sec   Loss 6.2097   LearningRate 0.0333   Epoch: 8   Global Step: 141320   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:42,933-Speed 8973.03 samples/sec   Loss 6.1372   LearningRate 0.0332   Epoch: 8   Global Step: 141330   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:43,982-Speed 9772.29 samples/sec   Loss 6.3722   LearningRate 0.0332   Epoch: 8   Global Step: 141340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:45,076-Speed 9366.31 samples/sec   Loss 6.2051   LearningRate 0.0332   Epoch: 8   Global Step: 141350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:46,144-Speed 9589.98 samples/sec   Loss 6.2156   LearningRate 0.0332   Epoch: 8   Global Step: 141360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:47,218-Speed 9540.74 samples/sec   Loss 6.2208   LearningRate 0.0332   Epoch: 8   Global Step: 141370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:48,319-Speed 9302.85 samples/sec   Loss 6.3031   LearningRate 0.0332   Epoch: 8   Global Step: 141380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:49,344-Speed 9999.93 samples/sec   Loss 6.1786   LearningRate 0.0332   Epoch: 8   Global Step: 141390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:50,407-Speed 9636.79 samples/sec   Loss 6.1795   LearningRate 0.0332   Epoch: 8   Global Step: 141400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:13:51,475-Speed 9597.72 samples/sec   Loss 6.2127   LearningRate 0.0332   Epoch: 8   Global Step: 141410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:52,566-Speed 9395.20 samples/sec   Loss 6.1241   LearningRate 0.0332   Epoch: 8   Global Step: 141420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:53,692-Speed 9098.96 samples/sec   Loss 6.2406   LearningRate 0.0332   Epoch: 8   Global Step: 141430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:54,800-Speed 9243.56 samples/sec   Loss 6.2053   LearningRate 0.0332   Epoch: 8   Global Step: 141440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:55,854-Speed 9728.39 samples/sec   Loss 6.2237   LearningRate 0.0332   Epoch: 8   Global Step: 141450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:56,943-Speed 9410.18 samples/sec   Loss 6.0713   LearningRate 0.0332   Epoch: 8   Global Step: 141460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:58,029-Speed 9428.21 samples/sec   Loss 6.2314   LearningRate 0.0332   Epoch: 8   Global Step: 141470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:13:59,105-Speed 9526.52 samples/sec   Loss 6.1995   LearningRate 0.0332   Epoch: 8   Global Step: 141480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:00,181-Speed 9521.11 samples/sec   Loss 6.2338   LearningRate 0.0332   Epoch: 8   Global Step: 141490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:01,248-Speed 9605.41 samples/sec   Loss 6.1946   LearningRate 0.0332   Epoch: 8   Global Step: 141500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:02,271-Speed 10017.41 samples/sec   Loss 6.2076   LearningRate 0.0332   Epoch: 8   Global Step: 141510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:14:03,353-Speed 9468.87 samples/sec   Loss 6.2005   LearningRate 0.0332   Epoch: 8   Global Step: 141520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:14:04,466-Speed 9204.97 samples/sec   Loss 6.3037   LearningRate 0.0332   Epoch: 8   Global Step: 141530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:14:05,614-Speed 8923.51 samples/sec   Loss 6.2294   LearningRate 0.0332   Epoch: 8   Global Step: 141540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:14:06,724-Speed 9232.44 samples/sec   Loss 6.3281   LearningRate 0.0332   Epoch: 8   Global Step: 141550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:14:07,802-Speed 9512.81 samples/sec   Loss 6.2769   LearningRate 0.0332   Epoch: 8   Global Step: 141560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:14:08,888-Speed 9427.58 samples/sec   Loss 6.3589   LearningRate 0.0332   Epoch: 8   Global Step: 141570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:14:09,975-Speed 9425.25 samples/sec   Loss 6.2042   LearningRate 0.0332   Epoch: 8   Global Step: 141580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:14:11,056-Speed 9481.37 samples/sec   Loss 6.4421   LearningRate 0.0332   Epoch: 8   Global Step: 141590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:14:12,102-Speed 9794.39 samples/sec   Loss 6.2664   LearningRate 0.0332   Epoch: 8   Global Step: 141600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:14:13,194-Speed 9383.75 samples/sec   Loss 6.1666   LearningRate 0.0332   Epoch: 8   Global Step: 141610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:14,296-Speed 9294.04 samples/sec   Loss 6.3232   LearningRate 0.0331   Epoch: 8   Global Step: 141620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:15,376-Speed 9493.26 samples/sec   Loss 6.0951   LearningRate 0.0331   Epoch: 8   Global Step: 141630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:16,508-Speed 9049.28 samples/sec   Loss 6.3430   LearningRate 0.0331   Epoch: 8   Global Step: 141640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:17,637-Speed 9074.13 samples/sec   Loss 6.1047   LearningRate 0.0331   Epoch: 8   Global Step: 141650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:18,685-Speed 9776.47 samples/sec   Loss 6.1565   LearningRate 0.0331   Epoch: 8   Global Step: 141660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:19,784-Speed 9327.25 samples/sec   Loss 6.2603   LearningRate 0.0331   Epoch: 8   Global Step: 141670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:20,865-Speed 9479.05 samples/sec   Loss 6.1842   LearningRate 0.0331   Epoch: 8   Global Step: 141680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:21,906-Speed 9839.53 samples/sec   Loss 6.2701   LearningRate 0.0331   Epoch: 8   Global Step: 141690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:22,959-Speed 9730.89 samples/sec   Loss 6.1429   LearningRate 0.0331   Epoch: 8   Global Step: 141700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:23,998-Speed 9858.77 samples/sec   Loss 6.2359   LearningRate 0.0331   Epoch: 8   Global Step: 141710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:25,070-Speed 9561.83 samples/sec   Loss 6.1513   LearningRate 0.0331   Epoch: 8   Global Step: 141720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:26,198-Speed 9079.12 samples/sec   Loss 6.0568   LearningRate 0.0331   Epoch: 8   Global Step: 141730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:27,280-Speed 9473.02 samples/sec   Loss 6.2731   LearningRate 0.0331   Epoch: 8   Global Step: 141740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:28,373-Speed 9371.44 samples/sec   Loss 6.3766   LearningRate 0.0331   Epoch: 8   Global Step: 141750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:29,453-Speed 9488.90 samples/sec   Loss 6.1716   LearningRate 0.0331   Epoch: 8   Global Step: 141760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:30,534-Speed 9473.90 samples/sec   Loss 6.1965   LearningRate 0.0331   Epoch: 8   Global Step: 141770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:31,632-Speed 9341.98 samples/sec   Loss 6.2803   LearningRate 0.0331   Epoch: 8   Global Step: 141780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:32,687-Speed 9709.63 samples/sec   Loss 6.2294   LearningRate 0.0331   Epoch: 8   Global Step: 141790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:33,781-Speed 9361.90 samples/sec   Loss 6.2681   LearningRate 0.0331   Epoch: 8   Global Step: 141800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:34,848-Speed 9604.61 samples/sec   Loss 6.3122   LearningRate 0.0331   Epoch: 8   Global Step: 141810   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:14:35,904-Speed 9704.78 samples/sec   Loss 6.2339   LearningRate 0.0331   Epoch: 8   Global Step: 141820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:36,991-Speed 9421.52 samples/sec   Loss 6.2431   LearningRate 0.0331   Epoch: 8   Global Step: 141830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:38,128-Speed 9014.65 samples/sec   Loss 6.2840   LearningRate 0.0331   Epoch: 8   Global Step: 141840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:39,204-Speed 9526.96 samples/sec   Loss 6.1848   LearningRate 0.0331   Epoch: 8   Global Step: 141850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:40,289-Speed 9443.76 samples/sec   Loss 6.2853   LearningRate 0.0331   Epoch: 8   Global Step: 141860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:41,380-Speed 9388.07 samples/sec   Loss 6.1804   LearningRate 0.0331   Epoch: 8   Global Step: 141870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:42,504-Speed 9114.46 samples/sec   Loss 6.1434   LearningRate 0.0331   Epoch: 8   Global Step: 141880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:43,616-Speed 9217.22 samples/sec   Loss 6.3039   LearningRate 0.0331   Epoch: 8   Global Step: 141890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:44,680-Speed 9625.68 samples/sec   Loss 6.2770   LearningRate 0.0331   Epoch: 8   Global Step: 141900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:45,737-Speed 9694.66 samples/sec   Loss 6.3574   LearningRate 0.0330   Epoch: 8   Global Step: 141910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:46,787-Speed 9755.17 samples/sec   Loss 6.1992   LearningRate 0.0330   Epoch: 8   Global Step: 141920   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:14:47,847-Speed 9669.92 samples/sec   Loss 6.2963   LearningRate 0.0330   Epoch: 8   Global Step: 141930   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:14:48,925-Speed 9499.32 samples/sec   Loss 6.1981   LearningRate 0.0330   Epoch: 8   Global Step: 141940   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:14:49,979-Speed 9722.55 samples/sec   Loss 6.2547   LearningRate 0.0330   Epoch: 8   Global Step: 141950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:51,092-Speed 9221.47 samples/sec   Loss 6.2564   LearningRate 0.0330   Epoch: 8   Global Step: 141960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:52,144-Speed 9731.96 samples/sec   Loss 6.0840   LearningRate 0.0330   Epoch: 8   Global Step: 141970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:53,251-Speed 9259.90 samples/sec   Loss 6.2489   LearningRate 0.0330   Epoch: 8   Global Step: 141980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:54,349-Speed 9337.15 samples/sec   Loss 6.2699   LearningRate 0.0330   Epoch: 8   Global Step: 141990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:14:55,429-Speed 9487.03 samples/sec   Loss 6.3012   LearningRate 0.0330   Epoch: 8   Global Step: 142000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:15:17,792-[lfw][142000]XNorm: 10.203931
Training: 2022-04-11 17:15:17,793-[lfw][142000]Accuracy-Flip: 0.99550+-0.00248
Training: 2022-04-11 17:15:17,793-[lfw][142000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:15:43,444-[cfp_fp][142000]XNorm: 8.713773
Training: 2022-04-11 17:15:43,445-[cfp_fp][142000]Accuracy-Flip: 0.95700+-0.00839
Training: 2022-04-11 17:15:43,445-[cfp_fp][142000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:16:05,505-[agedb_30][142000]XNorm: 9.926271
Training: 2022-04-11 17:16:05,562-[agedb_30][142000]Accuracy-Flip: 0.96017+-0.01076
Training: 2022-04-11 17:16:05,562-[agedb_30][142000]Accuracy-Highest: 0.96650
Training: 2022-04-11 17:16:06,685-Speed 143.71 samples/sec   Loss 6.2521   LearningRate 0.0330   Epoch: 8   Global Step: 142010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:07,814-Speed 9078.24 samples/sec   Loss 6.2125   LearningRate 0.0330   Epoch: 8   Global Step: 142020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:08,927-Speed 9202.04 samples/sec   Loss 6.2445   LearningRate 0.0330   Epoch: 8   Global Step: 142030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:16:09,993-Speed 9612.28 samples/sec   Loss 6.2976   LearningRate 0.0330   Epoch: 8   Global Step: 142040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:16:11,076-Speed 9456.44 samples/sec   Loss 6.2459   LearningRate 0.0330   Epoch: 8   Global Step: 142050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:16:12,125-Speed 9767.28 samples/sec   Loss 6.2432   LearningRate 0.0330   Epoch: 8   Global Step: 142060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:16:13,227-Speed 9302.60 samples/sec   Loss 6.3893   LearningRate 0.0330   Epoch: 8   Global Step: 142070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:16:14,319-Speed 9385.17 samples/sec   Loss 6.1955   LearningRate 0.0330   Epoch: 8   Global Step: 142080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:16:15,433-Speed 9197.20 samples/sec   Loss 6.2162   LearningRate 0.0330   Epoch: 8   Global Step: 142090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:16:16,526-Speed 9390.84 samples/sec   Loss 6.3551   LearningRate 0.0330   Epoch: 8   Global Step: 142100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:16:17,592-Speed 9606.17 samples/sec   Loss 6.3009   LearningRate 0.0330   Epoch: 8   Global Step: 142110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:16:18,695-Speed 9293.52 samples/sec   Loss 6.3242   LearningRate 0.0330   Epoch: 8   Global Step: 142120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:16:19,766-Speed 9564.91 samples/sec   Loss 6.2606   LearningRate 0.0330   Epoch: 8   Global Step: 142130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:20,849-Speed 9460.39 samples/sec   Loss 6.2031   LearningRate 0.0330   Epoch: 8   Global Step: 142140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:21,914-Speed 9626.84 samples/sec   Loss 6.1945   LearningRate 0.0330   Epoch: 8   Global Step: 142150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:22,972-Speed 9686.51 samples/sec   Loss 6.2228   LearningRate 0.0330   Epoch: 8   Global Step: 142160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:24,046-Speed 9537.13 samples/sec   Loss 6.2691   LearningRate 0.0330   Epoch: 8   Global Step: 142170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:25,142-Speed 9348.78 samples/sec   Loss 6.2074   LearningRate 0.0330   Epoch: 8   Global Step: 142180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:26,232-Speed 9398.46 samples/sec   Loss 6.2541   LearningRate 0.0330   Epoch: 8   Global Step: 142190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:27,318-Speed 9437.48 samples/sec   Loss 6.2324   LearningRate 0.0330   Epoch: 8   Global Step: 142200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:28,435-Speed 9167.65 samples/sec   Loss 6.2226   LearningRate 0.0329   Epoch: 8   Global Step: 142210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:29,529-Speed 9369.68 samples/sec   Loss 6.2300   LearningRate 0.0329   Epoch: 8   Global Step: 142220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:30,626-Speed 9345.16 samples/sec   Loss 6.1721   LearningRate 0.0329   Epoch: 8   Global Step: 142230   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:16:31,716-Speed 9394.92 samples/sec   Loss 6.2712   LearningRate 0.0329   Epoch: 8   Global Step: 142240   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:16:32,807-Speed 9389.26 samples/sec   Loss 6.2214   LearningRate 0.0329   Epoch: 8   Global Step: 142250   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:16:33,854-Speed 9792.35 samples/sec   Loss 6.2435   LearningRate 0.0329   Epoch: 8   Global Step: 142260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:34,925-Speed 9563.09 samples/sec   Loss 6.1375   LearningRate 0.0329   Epoch: 8   Global Step: 142270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:36,016-Speed 9392.05 samples/sec   Loss 6.2855   LearningRate 0.0329   Epoch: 8   Global Step: 142280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:37,114-Speed 9332.13 samples/sec   Loss 6.2449   LearningRate 0.0329   Epoch: 8   Global Step: 142290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:38,231-Speed 9173.43 samples/sec   Loss 6.2579   LearningRate 0.0329   Epoch: 8   Global Step: 142300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:39,365-Speed 9038.40 samples/sec   Loss 6.1604   LearningRate 0.0329   Epoch: 8   Global Step: 142310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:40,466-Speed 9302.41 samples/sec   Loss 6.2176   LearningRate 0.0329   Epoch: 8   Global Step: 142320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:41,539-Speed 9553.91 samples/sec   Loss 6.2096   LearningRate 0.0329   Epoch: 8   Global Step: 142330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:42,652-Speed 9211.83 samples/sec   Loss 6.2204   LearningRate 0.0329   Epoch: 8   Global Step: 142340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:43,718-Speed 9608.32 samples/sec   Loss 6.2757   LearningRate 0.0329   Epoch: 8   Global Step: 142350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:44,760-Speed 9834.48 samples/sec   Loss 6.2559   LearningRate 0.0329   Epoch: 8   Global Step: 142360   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:16:45,845-Speed 9440.59 samples/sec   Loss 6.0951   LearningRate 0.0329   Epoch: 8   Global Step: 142370   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:16:46,945-Speed 9315.35 samples/sec   Loss 6.1813   LearningRate 0.0329   Epoch: 8   Global Step: 142380   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:16:48,016-Speed 9570.65 samples/sec   Loss 6.1757   LearningRate 0.0329   Epoch: 8   Global Step: 142390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:49,117-Speed 9307.72 samples/sec   Loss 6.2160   LearningRate 0.0329   Epoch: 8   Global Step: 142400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:50,186-Speed 9581.52 samples/sec   Loss 6.1525   LearningRate 0.0329   Epoch: 8   Global Step: 142410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:51,258-Speed 9560.20 samples/sec   Loss 6.3239   LearningRate 0.0329   Epoch: 8   Global Step: 142420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:52,312-Speed 9717.72 samples/sec   Loss 6.1660   LearningRate 0.0329   Epoch: 8   Global Step: 142430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:53,406-Speed 9371.76 samples/sec   Loss 6.2856   LearningRate 0.0329   Epoch: 8   Global Step: 142440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:54,500-Speed 9358.83 samples/sec   Loss 6.2294   LearningRate 0.0329   Epoch: 8   Global Step: 142450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:55,600-Speed 9321.64 samples/sec   Loss 6.2947   LearningRate 0.0329   Epoch: 8   Global Step: 142460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:56,685-Speed 9442.30 samples/sec   Loss 6.2206   LearningRate 0.0329   Epoch: 8   Global Step: 142470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:57,741-Speed 9699.58 samples/sec   Loss 6.2531   LearningRate 0.0329   Epoch: 8   Global Step: 142480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:58,810-Speed 9593.48 samples/sec   Loss 6.2851   LearningRate 0.0329   Epoch: 8   Global Step: 142490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:16:59,888-Speed 9507.83 samples/sec   Loss 6.2810   LearningRate 0.0328   Epoch: 8   Global Step: 142500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:00,970-Speed 9470.77 samples/sec   Loss 6.2293   LearningRate 0.0328   Epoch: 8   Global Step: 142510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:02,049-Speed 9501.04 samples/sec   Loss 6.1958   LearningRate 0.0328   Epoch: 8   Global Step: 142520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:03,092-Speed 9823.26 samples/sec   Loss 6.1991   LearningRate 0.0328   Epoch: 8   Global Step: 142530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:04,144-Speed 9733.46 samples/sec   Loss 6.2493   LearningRate 0.0328   Epoch: 8   Global Step: 142540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:05,231-Speed 9432.73 samples/sec   Loss 6.2761   LearningRate 0.0328   Epoch: 8   Global Step: 142550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:06,271-Speed 9846.68 samples/sec   Loss 6.1682   LearningRate 0.0328   Epoch: 8   Global Step: 142560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:07,360-Speed 9412.12 samples/sec   Loss 6.2326   LearningRate 0.0328   Epoch: 8   Global Step: 142570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:08,435-Speed 9530.69 samples/sec   Loss 6.2244   LearningRate 0.0328   Epoch: 8   Global Step: 142580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:09,550-Speed 9190.31 samples/sec   Loss 6.2211   LearningRate 0.0328   Epoch: 8   Global Step: 142590   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:17:10,636-Speed 9431.64 samples/sec   Loss 6.2414   LearningRate 0.0328   Epoch: 8   Global Step: 142600   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:17:11,704-Speed 9592.10 samples/sec   Loss 6.2200   LearningRate 0.0328   Epoch: 8   Global Step: 142610   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:17:12,782-Speed 9505.69 samples/sec   Loss 6.1282   LearningRate 0.0328   Epoch: 8   Global Step: 142620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:13,843-Speed 9656.99 samples/sec   Loss 6.2288   LearningRate 0.0328   Epoch: 8   Global Step: 142630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:14,900-Speed 9690.67 samples/sec   Loss 6.3227   LearningRate 0.0328   Epoch: 8   Global Step: 142640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:15,968-Speed 9596.80 samples/sec   Loss 6.2098   LearningRate 0.0328   Epoch: 8   Global Step: 142650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:17,065-Speed 9338.03 samples/sec   Loss 6.2765   LearningRate 0.0328   Epoch: 8   Global Step: 142660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:18,181-Speed 9183.57 samples/sec   Loss 6.2747   LearningRate 0.0328   Epoch: 8   Global Step: 142670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:19,319-Speed 9375.05 samples/sec   Loss 6.2616   LearningRate 0.0328   Epoch: 8   Global Step: 142680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:20,357-Speed 9874.11 samples/sec   Loss 6.2327   LearningRate 0.0328   Epoch: 8   Global Step: 142690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:21,524-Speed 8783.58 samples/sec   Loss 6.2562   LearningRate 0.0328   Epoch: 8   Global Step: 142700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:22,716-Speed 9851.92 samples/sec   Loss 6.1769   LearningRate 0.0328   Epoch: 8   Global Step: 142710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:23,797-Speed 9478.26 samples/sec   Loss 6.3092   LearningRate 0.0328   Epoch: 8   Global Step: 142720   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:17:24,874-Speed 9509.97 samples/sec   Loss 6.1495   LearningRate 0.0328   Epoch: 8   Global Step: 142730   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:17:25,943-Speed 9590.62 samples/sec   Loss 6.1711   LearningRate 0.0328   Epoch: 8   Global Step: 142740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:27,031-Speed 9416.34 samples/sec   Loss 6.2648   LearningRate 0.0328   Epoch: 8   Global Step: 142750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:28,107-Speed 9516.40 samples/sec   Loss 6.1904   LearningRate 0.0328   Epoch: 8   Global Step: 142760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:29,172-Speed 9645.67 samples/sec   Loss 6.1821   LearningRate 0.0328   Epoch: 8   Global Step: 142770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:30,249-Speed 9512.32 samples/sec   Loss 6.2938   LearningRate 0.0328   Epoch: 8   Global Step: 142780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:31,392-Speed 8959.24 samples/sec   Loss 6.1568   LearningRate 0.0327   Epoch: 8   Global Step: 142790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:32,514-Speed 9134.04 samples/sec   Loss 6.1640   LearningRate 0.0327   Epoch: 8   Global Step: 142800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:33,580-Speed 9606.81 samples/sec   Loss 6.1909   LearningRate 0.0327   Epoch: 8   Global Step: 142810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:34,667-Speed 9431.16 samples/sec   Loss 6.3279   LearningRate 0.0327   Epoch: 8   Global Step: 142820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:35,707-Speed 9845.98 samples/sec   Loss 6.2023   LearningRate 0.0327   Epoch: 8   Global Step: 142830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:36,784-Speed 9514.21 samples/sec   Loss 6.2818   LearningRate 0.0327   Epoch: 8   Global Step: 142840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:37,881-Speed 9335.63 samples/sec   Loss 6.3060   LearningRate 0.0327   Epoch: 8   Global Step: 142850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:39,007-Speed 9106.10 samples/sec   Loss 6.2321   LearningRate 0.0327   Epoch: 8   Global Step: 142860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:40,102-Speed 9365.74 samples/sec   Loss 6.2477   LearningRate 0.0327   Epoch: 8   Global Step: 142870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:41,167-Speed 9613.33 samples/sec   Loss 6.2040   LearningRate 0.0327   Epoch: 8   Global Step: 142880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:42,275-Speed 9256.75 samples/sec   Loss 6.2951   LearningRate 0.0327   Epoch: 8   Global Step: 142890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:43,402-Speed 9090.63 samples/sec   Loss 6.2451   LearningRate 0.0327   Epoch: 8   Global Step: 142900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:44,502-Speed 9312.36 samples/sec   Loss 6.2632   LearningRate 0.0327   Epoch: 8   Global Step: 142910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:45,595-Speed 9374.85 samples/sec   Loss 6.2109   LearningRate 0.0327   Epoch: 8   Global Step: 142920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:46,694-Speed 9323.89 samples/sec   Loss 6.2735   LearningRate 0.0327   Epoch: 8   Global Step: 142930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:47,792-Speed 9333.96 samples/sec   Loss 6.2998   LearningRate 0.0327   Epoch: 8   Global Step: 142940   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:17:48,871-Speed 9488.82 samples/sec   Loss 6.2789   LearningRate 0.0327   Epoch: 8   Global Step: 142950   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:17:49,977-Speed 9265.04 samples/sec   Loss 6.2818   LearningRate 0.0327   Epoch: 8   Global Step: 142960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:51,065-Speed 9423.16 samples/sec   Loss 6.1552   LearningRate 0.0327   Epoch: 8   Global Step: 142970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:52,113-Speed 9782.40 samples/sec   Loss 6.2180   LearningRate 0.0327   Epoch: 8   Global Step: 142980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:53,188-Speed 9522.23 samples/sec   Loss 6.1537   LearningRate 0.0327   Epoch: 8   Global Step: 142990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:54,288-Speed 9316.05 samples/sec   Loss 6.1645   LearningRate 0.0327   Epoch: 8   Global Step: 143000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:55,384-Speed 9345.49 samples/sec   Loss 6.1936   LearningRate 0.0327   Epoch: 8   Global Step: 143010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:56,458-Speed 9549.98 samples/sec   Loss 6.2570   LearningRate 0.0327   Epoch: 8   Global Step: 143020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:57,520-Speed 9646.01 samples/sec   Loss 6.2889   LearningRate 0.0327   Epoch: 8   Global Step: 143030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:58,581-Speed 9657.76 samples/sec   Loss 6.1239   LearningRate 0.0327   Epoch: 8   Global Step: 143040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:17:59,646-Speed 9621.56 samples/sec   Loss 6.2491   LearningRate 0.0327   Epoch: 8   Global Step: 143050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:00,703-Speed 9695.21 samples/sec   Loss 6.2031   LearningRate 0.0327   Epoch: 8   Global Step: 143060   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:01,758-Speed 9711.58 samples/sec   Loss 6.1369   LearningRate 0.0327   Epoch: 8   Global Step: 143070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:02,802-Speed 9814.42 samples/sec   Loss 6.2129   LearningRate 0.0326   Epoch: 8   Global Step: 143080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:03,844-Speed 9834.62 samples/sec   Loss 6.3342   LearningRate 0.0326   Epoch: 8   Global Step: 143090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:04,924-Speed 9493.38 samples/sec   Loss 6.3851   LearningRate 0.0326   Epoch: 8   Global Step: 143100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:05,981-Speed 9684.78 samples/sec   Loss 6.1414   LearningRate 0.0326   Epoch: 8   Global Step: 143110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:07,087-Speed 9270.37 samples/sec   Loss 6.2778   LearningRate 0.0326   Epoch: 8   Global Step: 143120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:08,216-Speed 9070.24 samples/sec   Loss 6.3126   LearningRate 0.0326   Epoch: 8   Global Step: 143130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:09,311-Speed 9360.10 samples/sec   Loss 6.2359   LearningRate 0.0326   Epoch: 8   Global Step: 143140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:10,391-Speed 9484.41 samples/sec   Loss 6.1346   LearningRate 0.0326   Epoch: 8   Global Step: 143150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:11,441-Speed 9762.90 samples/sec   Loss 6.1756   LearningRate 0.0326   Epoch: 8   Global Step: 143160   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:18:12,497-Speed 9700.75 samples/sec   Loss 6.2401   LearningRate 0.0326   Epoch: 8   Global Step: 143170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:13,573-Speed 9518.84 samples/sec   Loss 6.2044   LearningRate 0.0326   Epoch: 8   Global Step: 143180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:14,644-Speed 9569.90 samples/sec   Loss 6.2151   LearningRate 0.0326   Epoch: 8   Global Step: 143190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:15,749-Speed 9274.10 samples/sec   Loss 6.3294   LearningRate 0.0326   Epoch: 8   Global Step: 143200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:16,864-Speed 9195.10 samples/sec   Loss 6.3524   LearningRate 0.0326   Epoch: 8   Global Step: 143210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:17,921-Speed 9698.30 samples/sec   Loss 6.3304   LearningRate 0.0326   Epoch: 8   Global Step: 143220   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:18,961-Speed 9855.96 samples/sec   Loss 6.2984   LearningRate 0.0326   Epoch: 8   Global Step: 143230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:20,069-Speed 9246.47 samples/sec   Loss 6.2623   LearningRate 0.0326   Epoch: 8   Global Step: 143240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:21,130-Speed 9656.34 samples/sec   Loss 6.3131   LearningRate 0.0326   Epoch: 8   Global Step: 143250   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:22,228-Speed 9329.14 samples/sec   Loss 6.2306   LearningRate 0.0326   Epoch: 8   Global Step: 143260   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:23,339-Speed 9223.89 samples/sec   Loss 6.1824   LearningRate 0.0326   Epoch: 8   Global Step: 143270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:24,470-Speed 9053.17 samples/sec   Loss 6.2631   LearningRate 0.0326   Epoch: 8   Global Step: 143280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:25,541-Speed 9568.45 samples/sec   Loss 6.1838   LearningRate 0.0326   Epoch: 8   Global Step: 143290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:26,620-Speed 9498.19 samples/sec   Loss 6.3309   LearningRate 0.0326   Epoch: 8   Global Step: 143300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:27,728-Speed 9250.13 samples/sec   Loss 6.2807   LearningRate 0.0326   Epoch: 8   Global Step: 143310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:28,817-Speed 9401.94 samples/sec   Loss 6.1661   LearningRate 0.0326   Epoch: 8   Global Step: 143320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:29,845-Speed 9974.41 samples/sec   Loss 6.2537   LearningRate 0.0326   Epoch: 8   Global Step: 143330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:30,901-Speed 9703.16 samples/sec   Loss 6.3380   LearningRate 0.0326   Epoch: 8   Global Step: 143340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:31,957-Speed 9694.04 samples/sec   Loss 6.1979   LearningRate 0.0326   Epoch: 8   Global Step: 143350   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:33,039-Speed 9469.99 samples/sec   Loss 6.2214   LearningRate 0.0326   Epoch: 8   Global Step: 143360   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:34,101-Speed 9657.61 samples/sec   Loss 6.2294   LearningRate 0.0325   Epoch: 8   Global Step: 143370   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:35,159-Speed 9687.46 samples/sec   Loss 6.2500   LearningRate 0.0325   Epoch: 8   Global Step: 143380   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:36,228-Speed 9582.02 samples/sec   Loss 6.1965   LearningRate 0.0325   Epoch: 8   Global Step: 143390   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:37,301-Speed 9554.92 samples/sec   Loss 6.2363   LearningRate 0.0325   Epoch: 8   Global Step: 143400   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:38,334-Speed 9919.97 samples/sec   Loss 6.1952   LearningRate 0.0325   Epoch: 8   Global Step: 143410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:39,361-Speed 9971.64 samples/sec   Loss 6.1859   LearningRate 0.0325   Epoch: 8   Global Step: 143420   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:40,453-Speed 9381.29 samples/sec   Loss 6.1917   LearningRate 0.0325   Epoch: 8   Global Step: 143430   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:18:41,546-Speed 9371.53 samples/sec   Loss 6.3974   LearningRate 0.0325   Epoch: 8   Global Step: 143440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:42,585-Speed 9868.27 samples/sec   Loss 6.2646   LearningRate 0.0325   Epoch: 8   Global Step: 143450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:43,660-Speed 9525.81 samples/sec   Loss 6.3012   LearningRate 0.0325   Epoch: 8   Global Step: 143460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:44,703-Speed 9826.51 samples/sec   Loss 6.1367   LearningRate 0.0325   Epoch: 8   Global Step: 143470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:45,812-Speed 9241.26 samples/sec   Loss 6.2288   LearningRate 0.0325   Epoch: 8   Global Step: 143480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:46,912-Speed 9317.61 samples/sec   Loss 6.2516   LearningRate 0.0325   Epoch: 8   Global Step: 143490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:47,951-Speed 9855.10 samples/sec   Loss 6.1820   LearningRate 0.0325   Epoch: 8   Global Step: 143500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:49,021-Speed 9578.63 samples/sec   Loss 6.2788   LearningRate 0.0325   Epoch: 8   Global Step: 143510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:50,068-Speed 9787.50 samples/sec   Loss 6.2282   LearningRate 0.0325   Epoch: 8   Global Step: 143520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:51,119-Speed 9749.94 samples/sec   Loss 6.2265   LearningRate 0.0325   Epoch: 8   Global Step: 143530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:52,187-Speed 9590.23 samples/sec   Loss 6.3487   LearningRate 0.0325   Epoch: 8   Global Step: 143540   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:18:53,261-Speed 9540.13 samples/sec   Loss 6.2757   LearningRate 0.0325   Epoch: 8   Global Step: 143550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:54,368-Speed 9261.69 samples/sec   Loss 6.1997   LearningRate 0.0325   Epoch: 8   Global Step: 143560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:55,453-Speed 9440.49 samples/sec   Loss 6.2870   LearningRate 0.0325   Epoch: 8   Global Step: 143570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:56,557-Speed 9277.82 samples/sec   Loss 6.2815   LearningRate 0.0325   Epoch: 8   Global Step: 143580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:57,646-Speed 9410.57 samples/sec   Loss 6.2080   LearningRate 0.0325   Epoch: 8   Global Step: 143590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:58,714-Speed 9600.41 samples/sec   Loss 6.2662   LearningRate 0.0325   Epoch: 8   Global Step: 143600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:18:59,798-Speed 9452.46 samples/sec   Loss 6.2067   LearningRate 0.0325   Epoch: 8   Global Step: 143610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:00,888-Speed 9393.18 samples/sec   Loss 6.3189   LearningRate 0.0325   Epoch: 8   Global Step: 143620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:01,968-Speed 9492.29 samples/sec   Loss 6.2190   LearningRate 0.0325   Epoch: 8   Global Step: 143630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:03,079-Speed 9220.64 samples/sec   Loss 6.2286   LearningRate 0.0325   Epoch: 8   Global Step: 143640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:04,138-Speed 9681.60 samples/sec   Loss 6.2676   LearningRate 0.0325   Epoch: 8   Global Step: 143650   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:05,158-Speed 10042.88 samples/sec   Loss 6.2618   LearningRate 0.0324   Epoch: 8   Global Step: 143660   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:06,226-Speed 9594.31 samples/sec   Loss 6.2516   LearningRate 0.0324   Epoch: 8   Global Step: 143670   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:07,286-Speed 9659.25 samples/sec   Loss 6.2950   LearningRate 0.0324   Epoch: 8   Global Step: 143680   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:08,381-Speed 9357.86 samples/sec   Loss 6.2559   LearningRate 0.0324   Epoch: 8   Global Step: 143690   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:09,423-Speed 9833.21 samples/sec   Loss 6.3123   LearningRate 0.0324   Epoch: 8   Global Step: 143700   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:10,472-Speed 9772.17 samples/sec   Loss 6.2791   LearningRate 0.0324   Epoch: 8   Global Step: 143710   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:11,554-Speed 9464.08 samples/sec   Loss 6.2831   LearningRate 0.0324   Epoch: 8   Global Step: 143720   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:12,601-Speed 9790.64 samples/sec   Loss 6.2373   LearningRate 0.0324   Epoch: 8   Global Step: 143730   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:13,662-Speed 9656.83 samples/sec   Loss 6.2607   LearningRate 0.0324   Epoch: 8   Global Step: 143740   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:14,709-Speed 9784.02 samples/sec   Loss 6.2734   LearningRate 0.0324   Epoch: 8   Global Step: 143750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:15,793-Speed 9451.46 samples/sec   Loss 6.3649   LearningRate 0.0324   Epoch: 8   Global Step: 143760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:16,894-Speed 9309.93 samples/sec   Loss 6.2950   LearningRate 0.0324   Epoch: 8   Global Step: 143770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:17,948-Speed 9726.12 samples/sec   Loss 6.1738   LearningRate 0.0324   Epoch: 8   Global Step: 143780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:19,009-Speed 9650.28 samples/sec   Loss 6.2775   LearningRate 0.0324   Epoch: 8   Global Step: 143790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:20,105-Speed 9347.30 samples/sec   Loss 6.1722   LearningRate 0.0324   Epoch: 8   Global Step: 143800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:21,142-Speed 9883.96 samples/sec   Loss 6.2020   LearningRate 0.0324   Epoch: 8   Global Step: 143810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:22,217-Speed 9529.24 samples/sec   Loss 6.1329   LearningRate 0.0324   Epoch: 8   Global Step: 143820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:23,285-Speed 9596.31 samples/sec   Loss 6.2327   LearningRate 0.0324   Epoch: 8   Global Step: 143830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:24,378-Speed 9378.29 samples/sec   Loss 6.1937   LearningRate 0.0324   Epoch: 8   Global Step: 143840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:25,451-Speed 9546.41 samples/sec   Loss 6.1189   LearningRate 0.0324   Epoch: 8   Global Step: 143850   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:19:26,518-Speed 9599.86 samples/sec   Loss 6.2433   LearningRate 0.0324   Epoch: 8   Global Step: 143860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:27,624-Speed 9267.27 samples/sec   Loss 6.2061   LearningRate 0.0324   Epoch: 8   Global Step: 143870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:28,713-Speed 9411.32 samples/sec   Loss 6.2265   LearningRate 0.0324   Epoch: 8   Global Step: 143880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:29,785-Speed 9553.27 samples/sec   Loss 6.1611   LearningRate 0.0324   Epoch: 8   Global Step: 143890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:30,864-Speed 9501.95 samples/sec   Loss 6.2178   LearningRate 0.0324   Epoch: 8   Global Step: 143900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:31,913-Speed 9764.52 samples/sec   Loss 6.1456   LearningRate 0.0324   Epoch: 8   Global Step: 143910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:32,974-Speed 9656.05 samples/sec   Loss 6.1864   LearningRate 0.0324   Epoch: 8   Global Step: 143920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:34,050-Speed 9528.91 samples/sec   Loss 6.2505   LearningRate 0.0324   Epoch: 8   Global Step: 143930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:35,098-Speed 9773.89 samples/sec   Loss 6.1750   LearningRate 0.0324   Epoch: 8   Global Step: 143940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:36,141-Speed 9828.10 samples/sec   Loss 6.2448   LearningRate 0.0324   Epoch: 8   Global Step: 143950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:37,216-Speed 9526.15 samples/sec   Loss 6.2507   LearningRate 0.0323   Epoch: 8   Global Step: 143960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:38,289-Speed 9545.88 samples/sec   Loss 6.1799   LearningRate 0.0323   Epoch: 8   Global Step: 143970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:39,395-Speed 9266.24 samples/sec   Loss 6.2208   LearningRate 0.0323   Epoch: 8   Global Step: 143980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:40,477-Speed 9473.00 samples/sec   Loss 6.2500   LearningRate 0.0323   Epoch: 8   Global Step: 143990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:19:41,552-Speed 9527.91 samples/sec   Loss 6.2702   LearningRate 0.0323   Epoch: 8   Global Step: 144000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:20:03,667-[lfw][144000]XNorm: 10.260583
Training: 2022-04-11 17:20:03,668-[lfw][144000]Accuracy-Flip: 0.99600+-0.00281
Training: 2022-04-11 17:20:03,668-[lfw][144000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:20:29,212-[cfp_fp][144000]XNorm: 8.743006
Training: 2022-04-11 17:20:29,213-[cfp_fp][144000]Accuracy-Flip: 0.95929+-0.00957
Training: 2022-04-11 17:20:29,213-[cfp_fp][144000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:20:51,185-[agedb_30][144000]XNorm: 9.975129
Training: 2022-04-11 17:20:51,185-[agedb_30][144000]Accuracy-Flip: 0.96517+-0.01061
Training: 2022-04-11 17:20:51,186-[agedb_30][144000]Accuracy-Highest: 0.96650
Training: 2022-04-11 17:20:52,244-Speed 144.85 samples/sec   Loss 6.2565   LearningRate 0.0323   Epoch: 8   Global Step: 144010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:20:53,305-Speed 9664.77 samples/sec   Loss 6.2612   LearningRate 0.0323   Epoch: 8   Global Step: 144020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:20:54,349-Speed 9814.89 samples/sec   Loss 6.2012   LearningRate 0.0323   Epoch: 8   Global Step: 144030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:20:55,476-Speed 9092.34 samples/sec   Loss 6.2585   LearningRate 0.0323   Epoch: 8   Global Step: 144040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:20:56,558-Speed 9462.44 samples/sec   Loss 6.1358   LearningRate 0.0323   Epoch: 8   Global Step: 144050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:20:57,633-Speed 9534.99 samples/sec   Loss 6.1911   LearningRate 0.0323   Epoch: 8   Global Step: 144060   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:20:58,734-Speed 9300.71 samples/sec   Loss 6.2470   LearningRate 0.0323   Epoch: 8   Global Step: 144070   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:20:59,780-Speed 9795.96 samples/sec   Loss 6.3082   LearningRate 0.0323   Epoch: 8   Global Step: 144080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:00,841-Speed 9662.61 samples/sec   Loss 6.1587   LearningRate 0.0323   Epoch: 8   Global Step: 144090   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:01,896-Speed 9709.61 samples/sec   Loss 6.1122   LearningRate 0.0323   Epoch: 8   Global Step: 144100   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:02,946-Speed 9759.20 samples/sec   Loss 6.3351   LearningRate 0.0323   Epoch: 8   Global Step: 144110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:04,016-Speed 9578.00 samples/sec   Loss 6.1769   LearningRate 0.0323   Epoch: 8   Global Step: 144120   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:05,083-Speed 9599.62 samples/sec   Loss 6.1823   LearningRate 0.0323   Epoch: 8   Global Step: 144130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:06,165-Speed 9477.08 samples/sec   Loss 6.3643   LearningRate 0.0323   Epoch: 8   Global Step: 144140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:07,243-Speed 9504.49 samples/sec   Loss 6.3187   LearningRate 0.0323   Epoch: 8   Global Step: 144150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:08,317-Speed 9531.31 samples/sec   Loss 6.2555   LearningRate 0.0323   Epoch: 8   Global Step: 144160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:09,381-Speed 9632.90 samples/sec   Loss 6.2109   LearningRate 0.0323   Epoch: 8   Global Step: 144170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:10,514-Speed 9051.16 samples/sec   Loss 6.2943   LearningRate 0.0323   Epoch: 8   Global Step: 144180   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:21:11,608-Speed 9360.50 samples/sec   Loss 6.3558   LearningRate 0.0323   Epoch: 8   Global Step: 144190   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:21:12,680-Speed 9564.86 samples/sec   Loss 6.2556   LearningRate 0.0323   Epoch: 8   Global Step: 144200   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:21:13,790-Speed 9225.74 samples/sec   Loss 6.2534   LearningRate 0.0323   Epoch: 8   Global Step: 144210   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:21:14,859-Speed 9589.05 samples/sec   Loss 6.1214   LearningRate 0.0323   Epoch: 8   Global Step: 144220   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:21:15,978-Speed 9155.28 samples/sec   Loss 6.1480   LearningRate 0.0323   Epoch: 8   Global Step: 144230   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:21:17,056-Speed 9500.77 samples/sec   Loss 6.2125   LearningRate 0.0323   Epoch: 8   Global Step: 144240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:18,162-Speed 9260.11 samples/sec   Loss 6.2335   LearningRate 0.0322   Epoch: 8   Global Step: 144250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:19,254-Speed 9389.99 samples/sec   Loss 6.2786   LearningRate 0.0322   Epoch: 8   Global Step: 144260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:20,374-Speed 9149.84 samples/sec   Loss 6.2976   LearningRate 0.0322   Epoch: 8   Global Step: 144270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:21,440-Speed 9607.42 samples/sec   Loss 6.1702   LearningRate 0.0322   Epoch: 8   Global Step: 144280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:22,507-Speed 9607.50 samples/sec   Loss 6.1547   LearningRate 0.0322   Epoch: 8   Global Step: 144290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:23,591-Speed 9451.33 samples/sec   Loss 6.1791   LearningRate 0.0322   Epoch: 8   Global Step: 144300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:24,685-Speed 9369.66 samples/sec   Loss 6.2539   LearningRate 0.0322   Epoch: 8   Global Step: 144310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:25,781-Speed 9345.61 samples/sec   Loss 6.3106   LearningRate 0.0322   Epoch: 8   Global Step: 144320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:26,878-Speed 9338.23 samples/sec   Loss 6.2544   LearningRate 0.0322   Epoch: 8   Global Step: 144330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:27,993-Speed 9188.67 samples/sec   Loss 6.3441   LearningRate 0.0322   Epoch: 8   Global Step: 144340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:29,061-Speed 9588.86 samples/sec   Loss 6.2328   LearningRate 0.0322   Epoch: 8   Global Step: 144350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:30,148-Speed 9424.42 samples/sec   Loss 6.2642   LearningRate 0.0322   Epoch: 8   Global Step: 144360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:31,206-Speed 9686.99 samples/sec   Loss 6.2077   LearningRate 0.0322   Epoch: 8   Global Step: 144370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:32,289-Speed 9462.61 samples/sec   Loss 6.2781   LearningRate 0.0322   Epoch: 8   Global Step: 144380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:33,408-Speed 9154.93 samples/sec   Loss 6.2507   LearningRate 0.0322   Epoch: 8   Global Step: 144390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:34,463-Speed 9718.59 samples/sec   Loss 6.2963   LearningRate 0.0322   Epoch: 8   Global Step: 144400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:35,553-Speed 9397.04 samples/sec   Loss 6.2991   LearningRate 0.0322   Epoch: 8   Global Step: 144410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:36,620-Speed 9599.45 samples/sec   Loss 6.2095   LearningRate 0.0322   Epoch: 8   Global Step: 144420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:37,697-Speed 9513.48 samples/sec   Loss 6.1979   LearningRate 0.0322   Epoch: 8   Global Step: 144430   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:38,797-Speed 9319.28 samples/sec   Loss 6.2582   LearningRate 0.0322   Epoch: 8   Global Step: 144440   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:21:39,858-Speed 9658.61 samples/sec   Loss 6.1645   LearningRate 0.0322   Epoch: 8   Global Step: 144450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:40,948-Speed 9399.38 samples/sec   Loss 6.2045   LearningRate 0.0322   Epoch: 8   Global Step: 144460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:42,038-Speed 9402.20 samples/sec   Loss 6.2854   LearningRate 0.0322   Epoch: 8   Global Step: 144470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:43,110-Speed 9563.32 samples/sec   Loss 6.2839   LearningRate 0.0322   Epoch: 8   Global Step: 144480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:44,222-Speed 9208.62 samples/sec   Loss 6.2386   LearningRate 0.0322   Epoch: 8   Global Step: 144490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:45,294-Speed 9556.09 samples/sec   Loss 6.2900   LearningRate 0.0322   Epoch: 8   Global Step: 144500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:46,388-Speed 9365.79 samples/sec   Loss 6.3215   LearningRate 0.0322   Epoch: 8   Global Step: 144510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:47,485-Speed 9342.23 samples/sec   Loss 6.1237   LearningRate 0.0322   Epoch: 8   Global Step: 144520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:48,578-Speed 9374.40 samples/sec   Loss 6.2090   LearningRate 0.0322   Epoch: 8   Global Step: 144530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:49,632-Speed 9718.76 samples/sec   Loss 6.2492   LearningRate 0.0322   Epoch: 8   Global Step: 144540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:50,685-Speed 9735.84 samples/sec   Loss 6.3475   LearningRate 0.0321   Epoch: 8   Global Step: 144550   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:51,779-Speed 9358.27 samples/sec   Loss 6.1973   LearningRate 0.0321   Epoch: 8   Global Step: 144560   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:52,853-Speed 9541.51 samples/sec   Loss 6.2573   LearningRate 0.0321   Epoch: 8   Global Step: 144570   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:53,947-Speed 9373.93 samples/sec   Loss 6.2470   LearningRate 0.0321   Epoch: 8   Global Step: 144580   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:55,016-Speed 9576.96 samples/sec   Loss 6.1895   LearningRate 0.0321   Epoch: 8   Global Step: 144590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:56,120-Speed 9281.23 samples/sec   Loss 6.1667   LearningRate 0.0321   Epoch: 8   Global Step: 144600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:57,206-Speed 9432.38 samples/sec   Loss 6.2180   LearningRate 0.0321   Epoch: 8   Global Step: 144610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:58,307-Speed 9310.94 samples/sec   Loss 6.1785   LearningRate 0.0321   Epoch: 8   Global Step: 144620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:21:59,393-Speed 9435.55 samples/sec   Loss 6.2345   LearningRate 0.0321   Epoch: 8   Global Step: 144630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:00,466-Speed 9556.67 samples/sec   Loss 6.2989   LearningRate 0.0321   Epoch: 8   Global Step: 144640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:01,575-Speed 9232.28 samples/sec   Loss 6.3180   LearningRate 0.0321   Epoch: 8   Global Step: 144650   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:22:02,669-Speed 9368.54 samples/sec   Loss 6.1660   LearningRate 0.0321   Epoch: 8   Global Step: 144660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:03,761-Speed 9385.51 samples/sec   Loss 6.2749   LearningRate 0.0321   Epoch: 8   Global Step: 144670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:04,865-Speed 9278.38 samples/sec   Loss 6.1892   LearningRate 0.0321   Epoch: 8   Global Step: 144680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:05,930-Speed 9629.23 samples/sec   Loss 6.2649   LearningRate 0.0321   Epoch: 8   Global Step: 144690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:07,006-Speed 9520.26 samples/sec   Loss 6.2644   LearningRate 0.0321   Epoch: 8   Global Step: 144700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:08,066-Speed 9663.64 samples/sec   Loss 6.1916   LearningRate 0.0321   Epoch: 8   Global Step: 144710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:09,190-Speed 9115.54 samples/sec   Loss 6.1447   LearningRate 0.0321   Epoch: 8   Global Step: 144720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:10,384-Speed 9820.37 samples/sec   Loss 6.2223   LearningRate 0.0321   Epoch: 8   Global Step: 144730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:11,460-Speed 9521.36 samples/sec   Loss 6.1941   LearningRate 0.0321   Epoch: 8   Global Step: 144740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:12,534-Speed 9537.89 samples/sec   Loss 6.2897   LearningRate 0.0321   Epoch: 8   Global Step: 144750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:13,598-Speed 9629.40 samples/sec   Loss 6.2923   LearningRate 0.0321   Epoch: 8   Global Step: 144760   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:22:14,694-Speed 9348.50 samples/sec   Loss 6.2082   LearningRate 0.0321   Epoch: 8   Global Step: 144770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:15,738-Speed 9818.62 samples/sec   Loss 6.2806   LearningRate 0.0321   Epoch: 8   Global Step: 144780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:16,803-Speed 9620.95 samples/sec   Loss 6.2497   LearningRate 0.0321   Epoch: 8   Global Step: 144790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:17,859-Speed 9700.43 samples/sec   Loss 6.1638   LearningRate 0.0321   Epoch: 8   Global Step: 144800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:18,948-Speed 9413.82 samples/sec   Loss 6.2501   LearningRate 0.0321   Epoch: 8   Global Step: 144810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:20,050-Speed 9294.91 samples/sec   Loss 6.1645   LearningRate 0.0321   Epoch: 8   Global Step: 144820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:21,154-Speed 9278.37 samples/sec   Loss 6.1694   LearningRate 0.0321   Epoch: 8   Global Step: 144830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:22,231-Speed 9521.40 samples/sec   Loss 6.2272   LearningRate 0.0320   Epoch: 8   Global Step: 144840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:23,324-Speed 9372.30 samples/sec   Loss 6.2993   LearningRate 0.0320   Epoch: 8   Global Step: 144850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:24,415-Speed 9390.38 samples/sec   Loss 6.2309   LearningRate 0.0320   Epoch: 8   Global Step: 144860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:25,478-Speed 9643.40 samples/sec   Loss 6.2774   LearningRate 0.0320   Epoch: 8   Global Step: 144870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:26,563-Speed 9440.06 samples/sec   Loss 6.2266   LearningRate 0.0320   Epoch: 8   Global Step: 144880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:27,670-Speed 9255.29 samples/sec   Loss 6.3276   LearningRate 0.0320   Epoch: 8   Global Step: 144890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:28,731-Speed 9659.80 samples/sec   Loss 6.2670   LearningRate 0.0320   Epoch: 8   Global Step: 144900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:29,820-Speed 9409.61 samples/sec   Loss 6.3036   LearningRate 0.0320   Epoch: 8   Global Step: 144910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:30,934-Speed 9199.07 samples/sec   Loss 6.2628   LearningRate 0.0320   Epoch: 8   Global Step: 144920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:32,019-Speed 9441.61 samples/sec   Loss 6.1363   LearningRate 0.0320   Epoch: 8   Global Step: 144930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:33,097-Speed 9508.60 samples/sec   Loss 6.2867   LearningRate 0.0320   Epoch: 8   Global Step: 144940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:34,233-Speed 9019.58 samples/sec   Loss 6.2660   LearningRate 0.0320   Epoch: 8   Global Step: 144950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:35,318-Speed 9449.95 samples/sec   Loss 6.2517   LearningRate 0.0320   Epoch: 8   Global Step: 144960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:36,401-Speed 9459.84 samples/sec   Loss 6.2518   LearningRate 0.0320   Epoch: 8   Global Step: 144970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:37,509-Speed 9245.73 samples/sec   Loss 6.2015   LearningRate 0.0320   Epoch: 8   Global Step: 144980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:38,602-Speed 9377.93 samples/sec   Loss 6.1984   LearningRate 0.0320   Epoch: 8   Global Step: 144990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:39,677-Speed 9531.43 samples/sec   Loss 6.1883   LearningRate 0.0320   Epoch: 8   Global Step: 145000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:40,750-Speed 9549.40 samples/sec   Loss 6.1666   LearningRate 0.0320   Epoch: 8   Global Step: 145010   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:41,793-Speed 9829.55 samples/sec   Loss 6.1972   LearningRate 0.0320   Epoch: 8   Global Step: 145020   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:42,881-Speed 9410.71 samples/sec   Loss 6.2188   LearningRate 0.0320   Epoch: 8   Global Step: 145030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:43,938-Speed 9699.09 samples/sec   Loss 6.2294   LearningRate 0.0320   Epoch: 8   Global Step: 145040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:45,023-Speed 9440.36 samples/sec   Loss 6.2728   LearningRate 0.0320   Epoch: 8   Global Step: 145050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:46,114-Speed 9388.28 samples/sec   Loss 6.2114   LearningRate 0.0320   Epoch: 8   Global Step: 145060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:47,271-Speed 8861.41 samples/sec   Loss 6.1703   LearningRate 0.0320   Epoch: 8   Global Step: 145070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:48,370-Speed 9321.82 samples/sec   Loss 6.1500   LearningRate 0.0320   Epoch: 8   Global Step: 145080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:49,466-Speed 9357.28 samples/sec   Loss 6.2433   LearningRate 0.0320   Epoch: 8   Global Step: 145090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:50,574-Speed 9248.39 samples/sec   Loss 6.2644   LearningRate 0.0320   Epoch: 8   Global Step: 145100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:51,647-Speed 9544.70 samples/sec   Loss 6.3326   LearningRate 0.0320   Epoch: 8   Global Step: 145110   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:22:52,722-Speed 9535.22 samples/sec   Loss 6.3969   LearningRate 0.0320   Epoch: 8   Global Step: 145120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:53,810-Speed 9418.45 samples/sec   Loss 6.3117   LearningRate 0.0320   Epoch: 8   Global Step: 145130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:54,903-Speed 9375.21 samples/sec   Loss 6.1992   LearningRate 0.0319   Epoch: 8   Global Step: 145140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:55,980-Speed 9521.30 samples/sec   Loss 6.1721   LearningRate 0.0319   Epoch: 8   Global Step: 145150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:57,056-Speed 9521.56 samples/sec   Loss 6.2500   LearningRate 0.0319   Epoch: 8   Global Step: 145160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:58,152-Speed 9346.00 samples/sec   Loss 6.1932   LearningRate 0.0319   Epoch: 8   Global Step: 145170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:22:59,225-Speed 9548.54 samples/sec   Loss 6.1775   LearningRate 0.0319   Epoch: 8   Global Step: 145180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:00,270-Speed 9801.37 samples/sec   Loss 6.1585   LearningRate 0.0319   Epoch: 8   Global Step: 145190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:01,390-Speed 9145.90 samples/sec   Loss 6.2451   LearningRate 0.0319   Epoch: 8   Global Step: 145200   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:02,471-Speed 9483.78 samples/sec   Loss 6.2057   LearningRate 0.0319   Epoch: 8   Global Step: 145210   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:03,581-Speed 9234.72 samples/sec   Loss 6.2751   LearningRate 0.0319   Epoch: 8   Global Step: 145220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:04,662-Speed 9480.37 samples/sec   Loss 6.2714   LearningRate 0.0319   Epoch: 8   Global Step: 145230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:05,763-Speed 9304.27 samples/sec   Loss 6.2974   LearningRate 0.0319   Epoch: 8   Global Step: 145240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:06,851-Speed 9413.61 samples/sec   Loss 6.3141   LearningRate 0.0319   Epoch: 8   Global Step: 145250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:07,957-Speed 9270.79 samples/sec   Loss 6.1645   LearningRate 0.0319   Epoch: 8   Global Step: 145260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:09,057-Speed 9312.30 samples/sec   Loss 6.2021   LearningRate 0.0319   Epoch: 8   Global Step: 145270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:10,140-Speed 9462.51 samples/sec   Loss 6.2153   LearningRate 0.0319   Epoch: 8   Global Step: 145280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:11,239-Speed 9321.91 samples/sec   Loss 6.2191   LearningRate 0.0319   Epoch: 8   Global Step: 145290   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:12,344-Speed 9272.77 samples/sec   Loss 6.1880   LearningRate 0.0319   Epoch: 8   Global Step: 145300   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:13,501-Speed 8858.73 samples/sec   Loss 6.2308   LearningRate 0.0319   Epoch: 8   Global Step: 145310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:14,549-Speed 9773.04 samples/sec   Loss 6.1481   LearningRate 0.0319   Epoch: 8   Global Step: 145320   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:23:15,608-Speed 9682.44 samples/sec   Loss 6.2717   LearningRate 0.0319   Epoch: 8   Global Step: 145330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:16,668-Speed 9665.97 samples/sec   Loss 6.2366   LearningRate 0.0319   Epoch: 8   Global Step: 145340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:17,769-Speed 9300.18 samples/sec   Loss 6.1625   LearningRate 0.0319   Epoch: 8   Global Step: 145350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:18,844-Speed 9530.26 samples/sec   Loss 6.1877   LearningRate 0.0319   Epoch: 8   Global Step: 145360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:19,915-Speed 9566.62 samples/sec   Loss 6.2514   LearningRate 0.0319   Epoch: 8   Global Step: 145370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:20,977-Speed 9653.12 samples/sec   Loss 6.2602   LearningRate 0.0319   Epoch: 8   Global Step: 145380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:22,041-Speed 9632.14 samples/sec   Loss 6.1522   LearningRate 0.0319   Epoch: 8   Global Step: 145390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:23,125-Speed 9469.93 samples/sec   Loss 6.1623   LearningRate 0.0319   Epoch: 8   Global Step: 145400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:24,186-Speed 9658.53 samples/sec   Loss 6.2341   LearningRate 0.0319   Epoch: 8   Global Step: 145410   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:25,262-Speed 9527.04 samples/sec   Loss 6.1304   LearningRate 0.0319   Epoch: 8   Global Step: 145420   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:26,337-Speed 9524.64 samples/sec   Loss 6.2352   LearningRate 0.0318   Epoch: 8   Global Step: 145430   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:23:27,430-Speed 9376.46 samples/sec   Loss 6.2620   LearningRate 0.0318   Epoch: 8   Global Step: 145440   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:28,494-Speed 9625.20 samples/sec   Loss 6.3057   LearningRate 0.0318   Epoch: 8   Global Step: 145450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:29,566-Speed 9558.54 samples/sec   Loss 6.1153   LearningRate 0.0318   Epoch: 8   Global Step: 145460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:30,647-Speed 9479.53 samples/sec   Loss 6.1845   LearningRate 0.0318   Epoch: 8   Global Step: 145470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:31,707-Speed 9671.44 samples/sec   Loss 6.1607   LearningRate 0.0318   Epoch: 8   Global Step: 145480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:32,769-Speed 9646.19 samples/sec   Loss 6.2154   LearningRate 0.0318   Epoch: 8   Global Step: 145490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:33,849-Speed 9495.96 samples/sec   Loss 6.2186   LearningRate 0.0318   Epoch: 8   Global Step: 145500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:34,906-Speed 9696.58 samples/sec   Loss 6.2580   LearningRate 0.0318   Epoch: 8   Global Step: 145510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:36,007-Speed 9307.09 samples/sec   Loss 6.1972   LearningRate 0.0318   Epoch: 8   Global Step: 145520   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:37,130-Speed 9116.93 samples/sec   Loss 6.2518   LearningRate 0.0318   Epoch: 8   Global Step: 145530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:38,271-Speed 8982.58 samples/sec   Loss 6.2559   LearningRate 0.0318   Epoch: 8   Global Step: 145540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:39,368-Speed 9337.83 samples/sec   Loss 6.2414   LearningRate 0.0318   Epoch: 8   Global Step: 145550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:40,422-Speed 9721.15 samples/sec   Loss 6.2656   LearningRate 0.0318   Epoch: 8   Global Step: 145560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:41,506-Speed 9460.31 samples/sec   Loss 6.1907   LearningRate 0.0318   Epoch: 8   Global Step: 145570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:42,618-Speed 9212.52 samples/sec   Loss 6.2470   LearningRate 0.0318   Epoch: 8   Global Step: 145580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:43,717-Speed 9325.52 samples/sec   Loss 6.2964   LearningRate 0.0318   Epoch: 8   Global Step: 145590   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:44,806-Speed 9407.88 samples/sec   Loss 6.1682   LearningRate 0.0318   Epoch: 8   Global Step: 145600   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:23:45,845-Speed 9853.93 samples/sec   Loss 6.1501   LearningRate 0.0318   Epoch: 8   Global Step: 145610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:46,930-Speed 9448.94 samples/sec   Loss 6.3069   LearningRate 0.0318   Epoch: 8   Global Step: 145620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:48,003-Speed 9549.46 samples/sec   Loss 6.1700   LearningRate 0.0318   Epoch: 8   Global Step: 145630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:49,133-Speed 9065.73 samples/sec   Loss 6.3285   LearningRate 0.0318   Epoch: 8   Global Step: 145640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:50,243-Speed 9228.59 samples/sec   Loss 6.2477   LearningRate 0.0318   Epoch: 8   Global Step: 145650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:51,352-Speed 9236.94 samples/sec   Loss 6.3014   LearningRate 0.0318   Epoch: 8   Global Step: 145660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:52,449-Speed 9345.64 samples/sec   Loss 6.3323   LearningRate 0.0318   Epoch: 8   Global Step: 145670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:53,532-Speed 9461.79 samples/sec   Loss 6.1538   LearningRate 0.0318   Epoch: 8   Global Step: 145680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:54,591-Speed 9674.05 samples/sec   Loss 6.2186   LearningRate 0.0318   Epoch: 8   Global Step: 145690   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:55,690-Speed 9322.47 samples/sec   Loss 6.2435   LearningRate 0.0318   Epoch: 8   Global Step: 145700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:56,831-Speed 8980.08 samples/sec   Loss 6.1795   LearningRate 0.0318   Epoch: 8   Global Step: 145710   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:23:57,882-Speed 9746.50 samples/sec   Loss 6.1658   LearningRate 0.0318   Epoch: 8   Global Step: 145720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:23:58,964-Speed 9471.12 samples/sec   Loss 6.1630   LearningRate 0.0317   Epoch: 8   Global Step: 145730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:00,066-Speed 9295.30 samples/sec   Loss 6.1286   LearningRate 0.0317   Epoch: 8   Global Step: 145740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:01,127-Speed 9660.73 samples/sec   Loss 6.1650   LearningRate 0.0317   Epoch: 8   Global Step: 145750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:02,201-Speed 9537.99 samples/sec   Loss 6.2190   LearningRate 0.0317   Epoch: 8   Global Step: 145760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:03,293-Speed 9377.55 samples/sec   Loss 6.2177   LearningRate 0.0317   Epoch: 8   Global Step: 145770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:04,360-Speed 9609.58 samples/sec   Loss 6.2067   LearningRate 0.0317   Epoch: 8   Global Step: 145780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:05,438-Speed 9505.72 samples/sec   Loss 6.2470   LearningRate 0.0317   Epoch: 8   Global Step: 145790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:06,546-Speed 9249.66 samples/sec   Loss 6.1837   LearningRate 0.0317   Epoch: 8   Global Step: 145800   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:07,634-Speed 9409.53 samples/sec   Loss 6.1293   LearningRate 0.0317   Epoch: 8   Global Step: 145810   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:08,715-Speed 9480.66 samples/sec   Loss 6.2213   LearningRate 0.0317   Epoch: 8   Global Step: 145820   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:09,809-Speed 9368.31 samples/sec   Loss 6.3019   LearningRate 0.0317   Epoch: 8   Global Step: 145830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:10,870-Speed 9669.72 samples/sec   Loss 6.2080   LearningRate 0.0317   Epoch: 8   Global Step: 145840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:11,953-Speed 9457.92 samples/sec   Loss 6.1650   LearningRate 0.0317   Epoch: 8   Global Step: 145850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:13,028-Speed 9533.23 samples/sec   Loss 6.1301   LearningRate 0.0317   Epoch: 8   Global Step: 145860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:14,086-Speed 9678.57 samples/sec   Loss 6.2801   LearningRate 0.0317   Epoch: 8   Global Step: 145870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:15,141-Speed 9710.54 samples/sec   Loss 6.1592   LearningRate 0.0317   Epoch: 8   Global Step: 145880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:16,211-Speed 9574.10 samples/sec   Loss 6.2382   LearningRate 0.0317   Epoch: 8   Global Step: 145890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:17,272-Speed 9661.10 samples/sec   Loss 6.3815   LearningRate 0.0317   Epoch: 8   Global Step: 145900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:18,317-Speed 9802.05 samples/sec   Loss 6.2140   LearningRate 0.0317   Epoch: 8   Global Step: 145910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:19,383-Speed 9612.50 samples/sec   Loss 6.3280   LearningRate 0.0317   Epoch: 8   Global Step: 145920   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:24:20,469-Speed 9430.36 samples/sec   Loss 6.1708   LearningRate 0.0317   Epoch: 8   Global Step: 145930   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:24:21,542-Speed 9549.12 samples/sec   Loss 6.2563   LearningRate 0.0317   Epoch: 8   Global Step: 145940   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:24:22,590-Speed 9778.45 samples/sec   Loss 6.3161   LearningRate 0.0317   Epoch: 8   Global Step: 145950   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:24:23,690-Speed 9319.50 samples/sec   Loss 6.2587   LearningRate 0.0317   Epoch: 8   Global Step: 145960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:24,793-Speed 9292.64 samples/sec   Loss 6.2833   LearningRate 0.0317   Epoch: 8   Global Step: 145970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:25,838-Speed 9802.91 samples/sec   Loss 6.2638   LearningRate 0.0317   Epoch: 8   Global Step: 145980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:26,918-Speed 9488.96 samples/sec   Loss 6.1933   LearningRate 0.0317   Epoch: 8   Global Step: 145990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:27,977-Speed 9670.61 samples/sec   Loss 6.2157   LearningRate 0.0317   Epoch: 8   Global Step: 146000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:24:49,740-[lfw][146000]XNorm: 10.177552
Training: 2022-04-11 17:24:49,741-[lfw][146000]Accuracy-Flip: 0.99617+-0.00269
Training: 2022-04-11 17:24:49,742-[lfw][146000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:25:14,940-[cfp_fp][146000]XNorm: 8.666866
Training: 2022-04-11 17:25:14,940-[cfp_fp][146000]Accuracy-Flip: 0.95743+-0.00777
Training: 2022-04-11 17:25:14,941-[cfp_fp][146000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:25:36,671-[agedb_30][146000]XNorm: 9.805984
Training: 2022-04-11 17:25:36,672-[agedb_30][146000]Accuracy-Flip: 0.96783+-0.00913
Training: 2022-04-11 17:25:36,672-[agedb_30][146000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:25:37,747-Speed 146.77 samples/sec   Loss 6.3326   LearningRate 0.0317   Epoch: 8   Global Step: 146010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:25:38,788-Speed 9849.75 samples/sec   Loss 6.2823   LearningRate 0.0316   Epoch: 8   Global Step: 146020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:25:39,895-Speed 9249.01 samples/sec   Loss 6.2275   LearningRate 0.0316   Epoch: 8   Global Step: 146030   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:25:40,995-Speed 9320.44 samples/sec   Loss 6.4050   LearningRate 0.0316   Epoch: 8   Global Step: 146040   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:25:42,030-Speed 9892.81 samples/sec   Loss 6.2818   LearningRate 0.0316   Epoch: 8   Global Step: 146050   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:25:43,156-Speed 9103.56 samples/sec   Loss 6.2794   LearningRate 0.0316   Epoch: 8   Global Step: 146060   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:25:44,230-Speed 9534.58 samples/sec   Loss 6.1632   LearningRate 0.0316   Epoch: 8   Global Step: 146070   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:25:45,289-Speed 9677.53 samples/sec   Loss 6.2980   LearningRate 0.0316   Epoch: 8   Global Step: 146080   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:25:46,372-Speed 9465.36 samples/sec   Loss 6.2030   LearningRate 0.0316   Epoch: 8   Global Step: 146090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:25:47,445-Speed 9541.31 samples/sec   Loss 6.1731   LearningRate 0.0316   Epoch: 8   Global Step: 146100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:25:48,577-Speed 9052.44 samples/sec   Loss 6.2612   LearningRate 0.0316   Epoch: 8   Global Step: 146110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:25:49,674-Speed 9345.95 samples/sec   Loss 6.2326   LearningRate 0.0316   Epoch: 8   Global Step: 146120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:25:50,781-Speed 9255.60 samples/sec   Loss 6.1902   LearningRate 0.0316   Epoch: 8   Global Step: 146130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:25:51,851-Speed 9567.93 samples/sec   Loss 6.2735   LearningRate 0.0316   Epoch: 8   Global Step: 146140   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:25:52,952-Speed 9307.96 samples/sec   Loss 6.1767   LearningRate 0.0316   Epoch: 8   Global Step: 146150   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:25:54,095-Speed 8967.41 samples/sec   Loss 6.2588   LearningRate 0.0316   Epoch: 8   Global Step: 146160   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:25:55,207-Speed 9209.59 samples/sec   Loss 6.1859   LearningRate 0.0316   Epoch: 8   Global Step: 146170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:25:56,308-Speed 9309.27 samples/sec   Loss 6.3946   LearningRate 0.0316   Epoch: 8   Global Step: 146180   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:25:57,384-Speed 9522.05 samples/sec   Loss 6.2578   LearningRate 0.0316   Epoch: 8   Global Step: 146190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:25:58,481-Speed 9338.81 samples/sec   Loss 6.2224   LearningRate 0.0316   Epoch: 8   Global Step: 146200   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:25:59,563-Speed 9473.92 samples/sec   Loss 6.2577   LearningRate 0.0316   Epoch: 8   Global Step: 146210   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:00,662-Speed 9319.26 samples/sec   Loss 6.2985   LearningRate 0.0316   Epoch: 8   Global Step: 146220   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:01,743-Speed 9482.29 samples/sec   Loss 6.2377   LearningRate 0.0316   Epoch: 8   Global Step: 146230   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:02,799-Speed 9699.28 samples/sec   Loss 6.1839   LearningRate 0.0316   Epoch: 8   Global Step: 146240   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:03,851-Speed 9740.97 samples/sec   Loss 6.2476   LearningRate 0.0316   Epoch: 8   Global Step: 146250   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:04,953-Speed 9297.43 samples/sec   Loss 6.3055   LearningRate 0.0316   Epoch: 8   Global Step: 146260   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:06,045-Speed 9381.66 samples/sec   Loss 6.2798   LearningRate 0.0316   Epoch: 8   Global Step: 146270   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:07,108-Speed 9636.65 samples/sec   Loss 6.3085   LearningRate 0.0316   Epoch: 8   Global Step: 146280   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:08,179-Speed 9571.90 samples/sec   Loss 6.2009   LearningRate 0.0316   Epoch: 8   Global Step: 146290   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:09,249-Speed 9609.92 samples/sec   Loss 6.3047   LearningRate 0.0316   Epoch: 8   Global Step: 146300   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:10,319-Speed 9572.15 samples/sec   Loss 6.2955   LearningRate 0.0316   Epoch: 8   Global Step: 146310   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:11,363-Speed 9818.92 samples/sec   Loss 6.2895   LearningRate 0.0315   Epoch: 8   Global Step: 146320   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:12,443-Speed 9486.60 samples/sec   Loss 6.1751   LearningRate 0.0315   Epoch: 8   Global Step: 146330   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:13,567-Speed 9110.39 samples/sec   Loss 6.0903   LearningRate 0.0315   Epoch: 8   Global Step: 146340   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:14,685-Speed 9167.47 samples/sec   Loss 6.0840   LearningRate 0.0315   Epoch: 8   Global Step: 146350   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:15,767-Speed 9472.78 samples/sec   Loss 6.2279   LearningRate 0.0315   Epoch: 8   Global Step: 146360   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:16,814-Speed 9785.11 samples/sec   Loss 6.2337   LearningRate 0.0315   Epoch: 8   Global Step: 146370   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:17,911-Speed 9339.99 samples/sec   Loss 6.0151   LearningRate 0.0315   Epoch: 8   Global Step: 146380   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:18,956-Speed 9808.89 samples/sec   Loss 6.2659   LearningRate 0.0315   Epoch: 8   Global Step: 146390   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:20,034-Speed 9507.24 samples/sec   Loss 6.2715   LearningRate 0.0315   Epoch: 8   Global Step: 146400   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:21,079-Speed 9802.37 samples/sec   Loss 6.3527   LearningRate 0.0315   Epoch: 8   Global Step: 146410   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:22,135-Speed 9698.99 samples/sec   Loss 6.2935   LearningRate 0.0315   Epoch: 8   Global Step: 146420   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:23,271-Speed 9017.91 samples/sec   Loss 6.2814   LearningRate 0.0315   Epoch: 8   Global Step: 146430   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:24,333-Speed 9652.06 samples/sec   Loss 6.2045   LearningRate 0.0315   Epoch: 8   Global Step: 146440   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:25,411-Speed 9504.46 samples/sec   Loss 6.2597   LearningRate 0.0315   Epoch: 8   Global Step: 146450   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:26,516-Speed 9278.74 samples/sec   Loss 6.2188   LearningRate 0.0315   Epoch: 8   Global Step: 146460   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:27,656-Speed 8986.49 samples/sec   Loss 6.2549   LearningRate 0.0315   Epoch: 8   Global Step: 146470   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:28,722-Speed 9611.61 samples/sec   Loss 6.1611   LearningRate 0.0315   Epoch: 8   Global Step: 146480   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:29,775-Speed 9727.00 samples/sec   Loss 6.2497   LearningRate 0.0315   Epoch: 8   Global Step: 146490   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:30,881-Speed 9265.09 samples/sec   Loss 6.2453   LearningRate 0.0315   Epoch: 8   Global Step: 146500   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:31,964-Speed 9460.76 samples/sec   Loss 6.2589   LearningRate 0.0315   Epoch: 8   Global Step: 146510   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:33,082-Speed 9165.41 samples/sec   Loss 6.2153   LearningRate 0.0315   Epoch: 8   Global Step: 146520   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:34,198-Speed 9178.62 samples/sec   Loss 6.1563   LearningRate 0.0315   Epoch: 8   Global Step: 146530   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:35,244-Speed 9798.86 samples/sec   Loss 6.2609   LearningRate 0.0315   Epoch: 8   Global Step: 146540   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:36,285-Speed 9840.06 samples/sec   Loss 6.2656   LearningRate 0.0315   Epoch: 8   Global Step: 146550   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:37,340-Speed 9712.29 samples/sec   Loss 6.2555   LearningRate 0.0315   Epoch: 8   Global Step: 146560   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:38,433-Speed 9373.12 samples/sec   Loss 6.2905   LearningRate 0.0315   Epoch: 8   Global Step: 146570   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:39,523-Speed 9405.11 samples/sec   Loss 6.2037   LearningRate 0.0315   Epoch: 8   Global Step: 146580   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:40,602-Speed 9491.26 samples/sec   Loss 6.3187   LearningRate 0.0315   Epoch: 8   Global Step: 146590   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:41,669-Speed 9605.48 samples/sec   Loss 6.1369   LearningRate 0.0315   Epoch: 8   Global Step: 146600   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:42,803-Speed 9030.11 samples/sec   Loss 6.1719   LearningRate 0.0315   Epoch: 8   Global Step: 146610   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:43,842-Speed 9865.65 samples/sec   Loss 6.1395   LearningRate 0.0314   Epoch: 8   Global Step: 146620   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:44,946-Speed 9277.54 samples/sec   Loss 6.1652   LearningRate 0.0314   Epoch: 8   Global Step: 146630   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:45,983-Speed 9878.59 samples/sec   Loss 6.2678   LearningRate 0.0314   Epoch: 8   Global Step: 146640   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:47,035-Speed 9746.31 samples/sec   Loss 6.3035   LearningRate 0.0314   Epoch: 8   Global Step: 146650   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:48,119-Speed 9446.39 samples/sec   Loss 6.2851   LearningRate 0.0314   Epoch: 8   Global Step: 146660   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:49,201-Speed 9471.15 samples/sec   Loss 6.2417   LearningRate 0.0314   Epoch: 8   Global Step: 146670   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:50,267-Speed 9616.28 samples/sec   Loss 6.3555   LearningRate 0.0314   Epoch: 8   Global Step: 146680   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:51,333-Speed 9613.07 samples/sec   Loss 6.1504   LearningRate 0.0314   Epoch: 8   Global Step: 146690   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:26:52,381-Speed 9776.24 samples/sec   Loss 6.1952   LearningRate 0.0314   Epoch: 8   Global Step: 146700   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:53,481-Speed 9314.07 samples/sec   Loss 6.2931   LearningRate 0.0314   Epoch: 8   Global Step: 146710   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:54,559-Speed 9505.13 samples/sec   Loss 6.2208   LearningRate 0.0314   Epoch: 8   Global Step: 146720   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:55,629-Speed 9580.45 samples/sec   Loss 6.3204   LearningRate 0.0314   Epoch: 8   Global Step: 146730   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:56,729-Speed 9314.51 samples/sec   Loss 6.2094   LearningRate 0.0314   Epoch: 8   Global Step: 146740   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:57,802-Speed 9544.38 samples/sec   Loss 6.1554   LearningRate 0.0314   Epoch: 8   Global Step: 146750   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:58,863-Speed 9658.22 samples/sec   Loss 6.1547   LearningRate 0.0314   Epoch: 8   Global Step: 146760   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:26:59,945-Speed 9469.53 samples/sec   Loss 6.2165   LearningRate 0.0314   Epoch: 8   Global Step: 146770   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:01,005-Speed 9665.39 samples/sec   Loss 6.1764   LearningRate 0.0314   Epoch: 8   Global Step: 146780   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:02,047-Speed 9831.55 samples/sec   Loss 6.1967   LearningRate 0.0314   Epoch: 8   Global Step: 146790   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:03,127-Speed 9485.76 samples/sec   Loss 6.1941   LearningRate 0.0314   Epoch: 8   Global Step: 146800   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:27:04,220-Speed 9374.32 samples/sec   Loss 6.2290   LearningRate 0.0314   Epoch: 8   Global Step: 146810   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:27:05,285-Speed 9619.42 samples/sec   Loss 6.1451   LearningRate 0.0314   Epoch: 8   Global Step: 146820   Fp16 Grad Scale: 262144   Required: 8 hours
Training: 2022-04-11 17:27:06,363-Speed 9507.59 samples/sec   Loss 6.2875   LearningRate 0.0314   Epoch: 8   Global Step: 146830   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:07,489-Speed 9101.11 samples/sec   Loss 6.1441   LearningRate 0.0314   Epoch: 8   Global Step: 146840   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:08,614-Speed 9120.29 samples/sec   Loss 6.0730   LearningRate 0.0314   Epoch: 8   Global Step: 146850   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:09,707-Speed 9374.12 samples/sec   Loss 6.1787   LearningRate 0.0314   Epoch: 8   Global Step: 146860   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:10,817-Speed 9230.52 samples/sec   Loss 6.1640   LearningRate 0.0314   Epoch: 8   Global Step: 146870   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:11,880-Speed 9647.49 samples/sec   Loss 6.3178   LearningRate 0.0314   Epoch: 8   Global Step: 146880   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:12,979-Speed 9321.95 samples/sec   Loss 6.1887   LearningRate 0.0314   Epoch: 8   Global Step: 146890   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:14,076-Speed 9336.62 samples/sec   Loss 6.2109   LearningRate 0.0314   Epoch: 8   Global Step: 146900   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:15,151-Speed 9527.64 samples/sec   Loss 6.2541   LearningRate 0.0314   Epoch: 8   Global Step: 146910   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:16,271-Speed 9716.64 samples/sec   Loss 6.2130   LearningRate 0.0313   Epoch: 8   Global Step: 146920   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:17,356-Speed 9435.97 samples/sec   Loss 6.2557   LearningRate 0.0313   Epoch: 8   Global Step: 146930   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:18,422-Speed 9612.86 samples/sec   Loss 6.3504   LearningRate 0.0313   Epoch: 8   Global Step: 146940   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:19,540-Speed 9178.04 samples/sec   Loss 6.1107   LearningRate 0.0313   Epoch: 8   Global Step: 146950   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:20,606-Speed 9614.81 samples/sec   Loss 6.1484   LearningRate 0.0313   Epoch: 8   Global Step: 146960   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:21,669-Speed 9637.97 samples/sec   Loss 6.1087   LearningRate 0.0313   Epoch: 8   Global Step: 146970   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:22,729-Speed 9661.56 samples/sec   Loss 6.2602   LearningRate 0.0313   Epoch: 8   Global Step: 146980   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:23,763-Speed 9910.14 samples/sec   Loss 6.3461   LearningRate 0.0313   Epoch: 8   Global Step: 146990   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:24,841-Speed 9508.40 samples/sec   Loss 6.1490   LearningRate 0.0313   Epoch: 8   Global Step: 147000   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:25,952-Speed 9228.54 samples/sec   Loss 6.2462   LearningRate 0.0313   Epoch: 8   Global Step: 147010   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:27,017-Speed 9613.70 samples/sec   Loss 6.1557   LearningRate 0.0313   Epoch: 8   Global Step: 147020   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:28,113-Speed 9349.35 samples/sec   Loss 6.1529   LearningRate 0.0313   Epoch: 8   Global Step: 147030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:27:29,183-Speed 9582.46 samples/sec   Loss 6.2260   LearningRate 0.0313   Epoch: 8   Global Step: 147040   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:27:30,267-Speed 9454.53 samples/sec   Loss 6.2123   LearningRate 0.0313   Epoch: 8   Global Step: 147050   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:27:31,369-Speed 9295.35 samples/sec   Loss 6.2231   LearningRate 0.0313   Epoch: 8   Global Step: 147060   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:27:32,462-Speed 9371.29 samples/sec   Loss 6.2399   LearningRate 0.0313   Epoch: 8   Global Step: 147070   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:27:33,562-Speed 9312.54 samples/sec   Loss 6.1367   LearningRate 0.0313   Epoch: 8   Global Step: 147080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:27:34,661-Speed 9324.89 samples/sec   Loss 6.1916   LearningRate 0.0313   Epoch: 8   Global Step: 147090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:27:35,771-Speed 9234.03 samples/sec   Loss 6.2225   LearningRate 0.0313   Epoch: 8   Global Step: 147100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:27:36,896-Speed 9104.20 samples/sec   Loss 6.2182   LearningRate 0.0313   Epoch: 8   Global Step: 147110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:27:37,986-Speed 9403.19 samples/sec   Loss 6.1888   LearningRate 0.0313   Epoch: 8   Global Step: 147120   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-11 17:27:39,088-Speed 9302.76 samples/sec   Loss 6.2818   LearningRate 0.0313   Epoch: 8   Global Step: 147130   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:40,140-Speed 9738.71 samples/sec   Loss 6.1445   LearningRate 0.0313   Epoch: 8   Global Step: 147140   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:41,205-Speed 9619.96 samples/sec   Loss 6.2218   LearningRate 0.0313   Epoch: 8   Global Step: 147150   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:42,289-Speed 9446.84 samples/sec   Loss 6.2048   LearningRate 0.0313   Epoch: 8   Global Step: 147160   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:43,365-Speed 9522.36 samples/sec   Loss 6.1811   LearningRate 0.0313   Epoch: 8   Global Step: 147170   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:44,429-Speed 9630.27 samples/sec   Loss 6.2715   LearningRate 0.0313   Epoch: 8   Global Step: 147180   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:45,466-Speed 9886.13 samples/sec   Loss 6.3139   LearningRate 0.0313   Epoch: 8   Global Step: 147190   Fp16 Grad Scale: 131072   Required: 8 hours
Training: 2022-04-11 17:27:46,578-Speed 9215.04 samples/sec   Loss 6.1366   LearningRate 0.0313   Epoch: 8   Global Step: 147200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:27:47,680-Speed 9296.08 samples/sec   Loss 6.2086   LearningRate 0.0312   Epoch: 8   Global Step: 147210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:27:48,756-Speed 9524.20 samples/sec   Loss 6.1454   LearningRate 0.0312   Epoch: 8   Global Step: 147220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:27:49,835-Speed 9499.93 samples/sec   Loss 6.2221   LearningRate 0.0312   Epoch: 8   Global Step: 147230   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:27:50,914-Speed 9495.23 samples/sec   Loss 6.2224   LearningRate 0.0312   Epoch: 8   Global Step: 147240   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:27:52,005-Speed 9388.70 samples/sec   Loss 6.3498   LearningRate 0.0312   Epoch: 8   Global Step: 147250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:27:53,050-Speed 9810.08 samples/sec   Loss 6.2556   LearningRate 0.0312   Epoch: 8   Global Step: 147260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:27:54,149-Speed 9318.34 samples/sec   Loss 6.2512   LearningRate 0.0312   Epoch: 8   Global Step: 147270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:27:55,219-Speed 9575.04 samples/sec   Loss 6.3095   LearningRate 0.0312   Epoch: 8   Global Step: 147280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:27:56,286-Speed 9606.97 samples/sec   Loss 6.2231   LearningRate 0.0312   Epoch: 8   Global Step: 147290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:27:57,335-Speed 9764.96 samples/sec   Loss 6.2220   LearningRate 0.0312   Epoch: 8   Global Step: 147300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:27:58,394-Speed 9680.07 samples/sec   Loss 6.0466   LearningRate 0.0312   Epoch: 8   Global Step: 147310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:27:59,449-Speed 9707.32 samples/sec   Loss 6.3122   LearningRate 0.0312   Epoch: 8   Global Step: 147320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:00,557-Speed 9246.71 samples/sec   Loss 6.2345   LearningRate 0.0312   Epoch: 8   Global Step: 147330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:01,650-Speed 9373.65 samples/sec   Loss 6.2874   LearningRate 0.0312   Epoch: 8   Global Step: 147340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:02,701-Speed 9752.44 samples/sec   Loss 6.1969   LearningRate 0.0312   Epoch: 8   Global Step: 147350   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:28:03,754-Speed 9728.09 samples/sec   Loss 6.1747   LearningRate 0.0312   Epoch: 8   Global Step: 147360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:04,811-Speed 9691.45 samples/sec   Loss 6.1822   LearningRate 0.0312   Epoch: 8   Global Step: 147370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:05,876-Speed 9616.99 samples/sec   Loss 6.1925   LearningRate 0.0312   Epoch: 8   Global Step: 147380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:06,983-Speed 9260.04 samples/sec   Loss 6.2659   LearningRate 0.0312   Epoch: 8   Global Step: 147390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:08,044-Speed 9651.83 samples/sec   Loss 6.1023   LearningRate 0.0312   Epoch: 8   Global Step: 147400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:09,137-Speed 9397.81 samples/sec   Loss 6.2599   LearningRate 0.0312   Epoch: 8   Global Step: 147410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:10,208-Speed 9568.27 samples/sec   Loss 6.2479   LearningRate 0.0312   Epoch: 8   Global Step: 147420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:11,281-Speed 9550.87 samples/sec   Loss 6.2469   LearningRate 0.0312   Epoch: 8   Global Step: 147430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:12,354-Speed 9549.03 samples/sec   Loss 6.1809   LearningRate 0.0312   Epoch: 8   Global Step: 147440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:13,397-Speed 9822.26 samples/sec   Loss 6.2778   LearningRate 0.0312   Epoch: 8   Global Step: 147450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:14,445-Speed 9777.59 samples/sec   Loss 6.1836   LearningRate 0.0312   Epoch: 8   Global Step: 147460   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:28:15,491-Speed 9790.19 samples/sec   Loss 6.1731   LearningRate 0.0312   Epoch: 8   Global Step: 147470   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:28:16,564-Speed 9547.15 samples/sec   Loss 6.2293   LearningRate 0.0312   Epoch: 8   Global Step: 147480   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:28:17,692-Speed 9082.01 samples/sec   Loss 6.2090   LearningRate 0.0312   Epoch: 8   Global Step: 147490   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:28:18,744-Speed 9744.30 samples/sec   Loss 6.1984   LearningRate 0.0312   Epoch: 8   Global Step: 147500   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:28:19,812-Speed 9600.30 samples/sec   Loss 6.0853   LearningRate 0.0311   Epoch: 8   Global Step: 147510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:20,890-Speed 9505.46 samples/sec   Loss 6.2977   LearningRate 0.0311   Epoch: 8   Global Step: 147520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:21,977-Speed 9425.06 samples/sec   Loss 6.1580   LearningRate 0.0311   Epoch: 8   Global Step: 147530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:23,057-Speed 9482.14 samples/sec   Loss 6.3664   LearningRate 0.0311   Epoch: 8   Global Step: 147540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:24,116-Speed 9682.24 samples/sec   Loss 6.1815   LearningRate 0.0311   Epoch: 8   Global Step: 147550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:25,154-Speed 9870.30 samples/sec   Loss 6.2916   LearningRate 0.0311   Epoch: 8   Global Step: 147560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:26,238-Speed 9450.02 samples/sec   Loss 6.1872   LearningRate 0.0311   Epoch: 8   Global Step: 147570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:27,325-Speed 9427.39 samples/sec   Loss 6.2609   LearningRate 0.0311   Epoch: 8   Global Step: 147580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:28,440-Speed 9195.66 samples/sec   Loss 6.1191   LearningRate 0.0311   Epoch: 8   Global Step: 147590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:29,551-Speed 9215.17 samples/sec   Loss 6.3122   LearningRate 0.0311   Epoch: 8   Global Step: 147600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:30,642-Speed 9397.82 samples/sec   Loss 6.2807   LearningRate 0.0311   Epoch: 8   Global Step: 147610   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:28:31,723-Speed 9478.85 samples/sec   Loss 6.2403   LearningRate 0.0311   Epoch: 8   Global Step: 147620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:32,810-Speed 9424.27 samples/sec   Loss 6.2041   LearningRate 0.0311   Epoch: 8   Global Step: 147630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:33,902-Speed 9379.05 samples/sec   Loss 6.2060   LearningRate 0.0311   Epoch: 8   Global Step: 147640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:34,930-Speed 9965.05 samples/sec   Loss 6.2790   LearningRate 0.0311   Epoch: 8   Global Step: 147650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:36,052-Speed 9142.83 samples/sec   Loss 6.2022   LearningRate 0.0311   Epoch: 8   Global Step: 147660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:37,127-Speed 9532.48 samples/sec   Loss 6.0980   LearningRate 0.0311   Epoch: 8   Global Step: 147670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:38,198-Speed 9568.03 samples/sec   Loss 6.2224   LearningRate 0.0311   Epoch: 8   Global Step: 147680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:39,246-Speed 9774.63 samples/sec   Loss 6.2505   LearningRate 0.0311   Epoch: 8   Global Step: 147690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:40,297-Speed 9750.92 samples/sec   Loss 6.1896   LearningRate 0.0311   Epoch: 8   Global Step: 147700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:41,384-Speed 9427.32 samples/sec   Loss 6.1058   LearningRate 0.0311   Epoch: 8   Global Step: 147710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:42,478-Speed 9366.30 samples/sec   Loss 6.2192   LearningRate 0.0311   Epoch: 8   Global Step: 147720   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:28:43,533-Speed 9707.17 samples/sec   Loss 6.1859   LearningRate 0.0311   Epoch: 8   Global Step: 147730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:44,615-Speed 9466.23 samples/sec   Loss 6.1918   LearningRate 0.0311   Epoch: 8   Global Step: 147740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:45,668-Speed 9733.63 samples/sec   Loss 6.1762   LearningRate 0.0311   Epoch: 8   Global Step: 147750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:46,748-Speed 9488.68 samples/sec   Loss 6.1666   LearningRate 0.0311   Epoch: 8   Global Step: 147760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:47,810-Speed 9651.80 samples/sec   Loss 6.2437   LearningRate 0.0311   Epoch: 8   Global Step: 147770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:48,851-Speed 9839.69 samples/sec   Loss 6.2290   LearningRate 0.0311   Epoch: 8   Global Step: 147780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:49,896-Speed 9809.36 samples/sec   Loss 6.1985   LearningRate 0.0311   Epoch: 8   Global Step: 147790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:50,993-Speed 9341.29 samples/sec   Loss 6.1873   LearningRate 0.0311   Epoch: 8   Global Step: 147800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:52,093-Speed 9315.47 samples/sec   Loss 6.2165   LearningRate 0.0310   Epoch: 8   Global Step: 147810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:53,129-Speed 9891.44 samples/sec   Loss 6.1027   LearningRate 0.0310   Epoch: 8   Global Step: 147820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:54,254-Speed 9103.84 samples/sec   Loss 6.1426   LearningRate 0.0310   Epoch: 8   Global Step: 147830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:55,333-Speed 9499.00 samples/sec   Loss 6.2243   LearningRate 0.0310   Epoch: 8   Global Step: 147840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:56,386-Speed 9734.67 samples/sec   Loss 6.1715   LearningRate 0.0310   Epoch: 8   Global Step: 147850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:57,484-Speed 9331.88 samples/sec   Loss 6.3195   LearningRate 0.0310   Epoch: 8   Global Step: 147860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:58,570-Speed 9434.04 samples/sec   Loss 6.2205   LearningRate 0.0310   Epoch: 8   Global Step: 147870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:28:59,627-Speed 9689.65 samples/sec   Loss 6.2265   LearningRate 0.0310   Epoch: 8   Global Step: 147880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:00,720-Speed 9372.42 samples/sec   Loss 6.1076   LearningRate 0.0310   Epoch: 8   Global Step: 147890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:01,807-Speed 9427.82 samples/sec   Loss 6.2018   LearningRate 0.0310   Epoch: 8   Global Step: 147900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:02,902-Speed 9351.05 samples/sec   Loss 6.1916   LearningRate 0.0310   Epoch: 8   Global Step: 147910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:03,952-Speed 9764.13 samples/sec   Loss 6.2331   LearningRate 0.0310   Epoch: 8   Global Step: 147920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:05,026-Speed 9540.93 samples/sec   Loss 6.1635   LearningRate 0.0310   Epoch: 8   Global Step: 147930   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:29:06,143-Speed 9171.67 samples/sec   Loss 6.1590   LearningRate 0.0310   Epoch: 8   Global Step: 147940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:07,239-Speed 9349.16 samples/sec   Loss 6.1397   LearningRate 0.0310   Epoch: 8   Global Step: 147950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:08,348-Speed 9240.66 samples/sec   Loss 6.1536   LearningRate 0.0310   Epoch: 8   Global Step: 147960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:09,455-Speed 9357.48 samples/sec   Loss 6.1375   LearningRate 0.0310   Epoch: 8   Global Step: 147970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:10,518-Speed 9636.88 samples/sec   Loss 6.1738   LearningRate 0.0310   Epoch: 8   Global Step: 147980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:11,585-Speed 9605.06 samples/sec   Loss 6.1776   LearningRate 0.0310   Epoch: 8   Global Step: 147990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:12,661-Speed 9520.85 samples/sec   Loss 6.2050   LearningRate 0.0310   Epoch: 8   Global Step: 148000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:29:34,752-[lfw][148000]XNorm: 10.113518
Training: 2022-04-11 17:29:34,753-[lfw][148000]Accuracy-Flip: 0.99583+-0.00227
Training: 2022-04-11 17:29:34,753-[lfw][148000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:30:00,126-[cfp_fp][148000]XNorm: 8.712494
Training: 2022-04-11 17:30:00,127-[cfp_fp][148000]Accuracy-Flip: 0.95929+-0.00897
Training: 2022-04-11 17:30:00,128-[cfp_fp][148000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:30:22,529-[agedb_30][148000]XNorm: 9.862267
Training: 2022-04-11 17:30:22,530-[agedb_30][148000]Accuracy-Flip: 0.96183+-0.01047
Training: 2022-04-11 17:30:22,530-[agedb_30][148000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:30:23,624-Speed 144.30 samples/sec   Loss 6.1677   LearningRate 0.0310   Epoch: 8   Global Step: 148010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:24,713-Speed 9408.31 samples/sec   Loss 6.1801   LearningRate 0.0310   Epoch: 8   Global Step: 148020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:25,791-Speed 9500.37 samples/sec   Loss 6.1171   LearningRate 0.0310   Epoch: 8   Global Step: 148030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:26,852-Speed 9658.57 samples/sec   Loss 6.2886   LearningRate 0.0310   Epoch: 8   Global Step: 148040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:27,910-Speed 9682.63 samples/sec   Loss 6.1401   LearningRate 0.0310   Epoch: 8   Global Step: 148050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:29,009-Speed 9326.36 samples/sec   Loss 6.2257   LearningRate 0.0310   Epoch: 8   Global Step: 148060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:30,081-Speed 9565.19 samples/sec   Loss 6.0874   LearningRate 0.0310   Epoch: 8   Global Step: 148070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:31,184-Speed 9286.76 samples/sec   Loss 6.1955   LearningRate 0.0310   Epoch: 8   Global Step: 148080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:32,268-Speed 9455.39 samples/sec   Loss 6.1507   LearningRate 0.0310   Epoch: 8   Global Step: 148090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:33,322-Speed 9719.79 samples/sec   Loss 6.3237   LearningRate 0.0310   Epoch: 8   Global Step: 148100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:34,422-Speed 9311.71 samples/sec   Loss 6.1996   LearningRate 0.0309   Epoch: 8   Global Step: 148110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:35,512-Speed 9400.41 samples/sec   Loss 6.0735   LearningRate 0.0309   Epoch: 8   Global Step: 148120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:36,641-Speed 9075.77 samples/sec   Loss 6.2147   LearningRate 0.0309   Epoch: 8   Global Step: 148130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:37,762-Speed 9143.34 samples/sec   Loss 6.2280   LearningRate 0.0309   Epoch: 8   Global Step: 148140   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:30:38,840-Speed 9498.74 samples/sec   Loss 6.1845   LearningRate 0.0309   Epoch: 8   Global Step: 148150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:39,940-Speed 9318.90 samples/sec   Loss 6.1245   LearningRate 0.0309   Epoch: 8   Global Step: 148160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:41,062-Speed 9130.29 samples/sec   Loss 6.2884   LearningRate 0.0309   Epoch: 8   Global Step: 148170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:42,153-Speed 9390.89 samples/sec   Loss 6.2208   LearningRate 0.0309   Epoch: 8   Global Step: 148180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:43,219-Speed 9611.69 samples/sec   Loss 6.2357   LearningRate 0.0309   Epoch: 8   Global Step: 148190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:44,319-Speed 9314.22 samples/sec   Loss 6.2558   LearningRate 0.0309   Epoch: 8   Global Step: 148200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:45,359-Speed 9851.62 samples/sec   Loss 6.1167   LearningRate 0.0309   Epoch: 8   Global Step: 148210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:46,425-Speed 9611.44 samples/sec   Loss 6.2365   LearningRate 0.0309   Epoch: 8   Global Step: 148220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:47,510-Speed 9448.20 samples/sec   Loss 6.2021   LearningRate 0.0309   Epoch: 8   Global Step: 148230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:48,623-Speed 9205.44 samples/sec   Loss 6.2999   LearningRate 0.0309   Epoch: 8   Global Step: 148240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:49,723-Speed 9311.89 samples/sec   Loss 6.2573   LearningRate 0.0309   Epoch: 8   Global Step: 148250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:50,789-Speed 9617.97 samples/sec   Loss 6.2229   LearningRate 0.0309   Epoch: 8   Global Step: 148260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:51,884-Speed 9352.29 samples/sec   Loss 6.2897   LearningRate 0.0309   Epoch: 8   Global Step: 148270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:52,994-Speed 9231.90 samples/sec   Loss 6.1450   LearningRate 0.0309   Epoch: 8   Global Step: 148280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:54,070-Speed 9521.80 samples/sec   Loss 6.2711   LearningRate 0.0309   Epoch: 8   Global Step: 148290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:55,172-Speed 9303.71 samples/sec   Loss 6.1143   LearningRate 0.0309   Epoch: 8   Global Step: 148300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:56,273-Speed 9314.52 samples/sec   Loss 6.2343   LearningRate 0.0309   Epoch: 8   Global Step: 148310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:57,331-Speed 9682.99 samples/sec   Loss 6.2189   LearningRate 0.0309   Epoch: 8   Global Step: 148320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:58,408-Speed 9511.38 samples/sec   Loss 6.2051   LearningRate 0.0309   Epoch: 8   Global Step: 148330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:30:59,461-Speed 9729.05 samples/sec   Loss 6.1658   LearningRate 0.0309   Epoch: 8   Global Step: 148340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:00,526-Speed 9620.42 samples/sec   Loss 6.1485   LearningRate 0.0309   Epoch: 8   Global Step: 148350   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:01,669-Speed 8961.76 samples/sec   Loss 6.2367   LearningRate 0.0309   Epoch: 8   Global Step: 148360   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:02,776-Speed 9257.52 samples/sec   Loss 6.1060   LearningRate 0.0309   Epoch: 8   Global Step: 148370   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:03,871-Speed 9360.17 samples/sec   Loss 6.2011   LearningRate 0.0309   Epoch: 8   Global Step: 148380   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:04,951-Speed 9484.24 samples/sec   Loss 6.2670   LearningRate 0.0309   Epoch: 8   Global Step: 148390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:06,030-Speed 9494.80 samples/sec   Loss 6.1460   LearningRate 0.0309   Epoch: 8   Global Step: 148400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:07,073-Speed 9825.89 samples/sec   Loss 6.2103   LearningRate 0.0308   Epoch: 8   Global Step: 148410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:08,153-Speed 9486.13 samples/sec   Loss 6.1965   LearningRate 0.0308   Epoch: 8   Global Step: 148420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:09,228-Speed 9531.84 samples/sec   Loss 6.1464   LearningRate 0.0308   Epoch: 8   Global Step: 148430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:10,348-Speed 9147.38 samples/sec   Loss 6.2057   LearningRate 0.0308   Epoch: 8   Global Step: 148440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:11,460-Speed 9221.62 samples/sec   Loss 6.1952   LearningRate 0.0308   Epoch: 8   Global Step: 148450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:12,529-Speed 9581.66 samples/sec   Loss 6.1432   LearningRate 0.0308   Epoch: 8   Global Step: 148460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:13,599-Speed 9575.95 samples/sec   Loss 6.3050   LearningRate 0.0308   Epoch: 8   Global Step: 148470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:14,680-Speed 9479.60 samples/sec   Loss 6.1410   LearningRate 0.0308   Epoch: 8   Global Step: 148480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:15,780-Speed 9312.64 samples/sec   Loss 6.2378   LearningRate 0.0308   Epoch: 8   Global Step: 148490   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:16,865-Speed 9438.94 samples/sec   Loss 6.1755   LearningRate 0.0308   Epoch: 8   Global Step: 148500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:17,949-Speed 9452.58 samples/sec   Loss 6.1412   LearningRate 0.0308   Epoch: 8   Global Step: 148510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:19,052-Speed 9290.02 samples/sec   Loss 6.2874   LearningRate 0.0308   Epoch: 8   Global Step: 148520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:20,133-Speed 9486.77 samples/sec   Loss 6.1946   LearningRate 0.0308   Epoch: 8   Global Step: 148530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:21,234-Speed 9304.36 samples/sec   Loss 6.2829   LearningRate 0.0308   Epoch: 8   Global Step: 148540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:22,289-Speed 9712.92 samples/sec   Loss 6.2225   LearningRate 0.0308   Epoch: 8   Global Step: 148550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:23,341-Speed 9733.76 samples/sec   Loss 6.2174   LearningRate 0.0308   Epoch: 8   Global Step: 148560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:24,403-Speed 9649.60 samples/sec   Loss 6.0440   LearningRate 0.0308   Epoch: 8   Global Step: 148570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:25,487-Speed 9456.28 samples/sec   Loss 6.2040   LearningRate 0.0308   Epoch: 8   Global Step: 148580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:26,556-Speed 9585.96 samples/sec   Loss 6.3123   LearningRate 0.0308   Epoch: 8   Global Step: 148590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:27,632-Speed 9521.78 samples/sec   Loss 6.1584   LearningRate 0.0308   Epoch: 8   Global Step: 148600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:28,702-Speed 9575.71 samples/sec   Loss 6.1460   LearningRate 0.0308   Epoch: 8   Global Step: 148610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:29,825-Speed 9117.56 samples/sec   Loss 6.2272   LearningRate 0.0308   Epoch: 8   Global Step: 148620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:30,919-Speed 9371.91 samples/sec   Loss 6.1940   LearningRate 0.0308   Epoch: 8   Global Step: 148630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:31,944-Speed 9994.95 samples/sec   Loss 6.2810   LearningRate 0.0308   Epoch: 8   Global Step: 148640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:33,016-Speed 9554.80 samples/sec   Loss 6.1025   LearningRate 0.0308   Epoch: 8   Global Step: 148650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:34,109-Speed 9376.45 samples/sec   Loss 6.2837   LearningRate 0.0308   Epoch: 8   Global Step: 148660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:35,164-Speed 9709.72 samples/sec   Loss 6.2748   LearningRate 0.0308   Epoch: 8   Global Step: 148670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:36,241-Speed 9515.86 samples/sec   Loss 6.2175   LearningRate 0.0308   Epoch: 8   Global Step: 148680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:37,327-Speed 9431.61 samples/sec   Loss 6.0783   LearningRate 0.0308   Epoch: 8   Global Step: 148690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:38,405-Speed 9510.54 samples/sec   Loss 6.2192   LearningRate 0.0308   Epoch: 8   Global Step: 148700   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:39,445-Speed 9850.24 samples/sec   Loss 6.1946   LearningRate 0.0307   Epoch: 8   Global Step: 148710   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:40,497-Speed 9742.31 samples/sec   Loss 6.0784   LearningRate 0.0307   Epoch: 8   Global Step: 148720   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:41,561-Speed 9630.09 samples/sec   Loss 6.1378   LearningRate 0.0307   Epoch: 8   Global Step: 148730   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:42,650-Speed 9409.25 samples/sec   Loss 6.1450   LearningRate 0.0307   Epoch: 8   Global Step: 148740   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:43,719-Speed 9580.88 samples/sec   Loss 6.1587   LearningRate 0.0307   Epoch: 8   Global Step: 148750   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:44,801-Speed 9469.79 samples/sec   Loss 6.2232   LearningRate 0.0307   Epoch: 8   Global Step: 148760   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:45,894-Speed 9374.33 samples/sec   Loss 6.1242   LearningRate 0.0307   Epoch: 8   Global Step: 148770   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:31:46,969-Speed 9529.46 samples/sec   Loss 6.1795   LearningRate 0.0307   Epoch: 8   Global Step: 148780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:48,022-Speed 9730.89 samples/sec   Loss 6.1678   LearningRate 0.0307   Epoch: 8   Global Step: 148790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:49,100-Speed 9506.34 samples/sec   Loss 6.1028   LearningRate 0.0307   Epoch: 8   Global Step: 148800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:50,145-Speed 9807.62 samples/sec   Loss 6.1546   LearningRate 0.0307   Epoch: 8   Global Step: 148810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:51,231-Speed 9438.31 samples/sec   Loss 6.2294   LearningRate 0.0307   Epoch: 8   Global Step: 148820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:52,316-Speed 9443.12 samples/sec   Loss 6.2648   LearningRate 0.0307   Epoch: 8   Global Step: 148830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:53,376-Speed 9664.88 samples/sec   Loss 6.2801   LearningRate 0.0307   Epoch: 8   Global Step: 148840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:31:54,432-Speed 9704.12 samples/sec   Loss 6.3053   LearningRate 0.0307   Epoch: 8   Global Step: 148850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:31:55,515-Speed 9458.77 samples/sec   Loss 6.1582   LearningRate 0.0307   Epoch: 8   Global Step: 148860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:31:56,590-Speed 9532.44 samples/sec   Loss 6.2090   LearningRate 0.0307   Epoch: 8   Global Step: 148870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:31:57,687-Speed 9345.98 samples/sec   Loss 6.1770   LearningRate 0.0307   Epoch: 8   Global Step: 148880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:31:58,757-Speed 9572.37 samples/sec   Loss 6.1811   LearningRate 0.0307   Epoch: 8   Global Step: 148890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:31:59,822-Speed 9618.43 samples/sec   Loss 6.2152   LearningRate 0.0307   Epoch: 8   Global Step: 148900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:32:00,906-Speed 9453.31 samples/sec   Loss 6.1629   LearningRate 0.0307   Epoch: 8   Global Step: 148910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:32:01,995-Speed 9409.54 samples/sec   Loss 6.1894   LearningRate 0.0307   Epoch: 8   Global Step: 148920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:32:03,082-Speed 9424.76 samples/sec   Loss 6.2207   LearningRate 0.0307   Epoch: 8   Global Step: 148930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:32:04,136-Speed 9722.09 samples/sec   Loss 6.1907   LearningRate 0.0307   Epoch: 8   Global Step: 148940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:32:05,222-Speed 9437.53 samples/sec   Loss 6.1487   LearningRate 0.0307   Epoch: 8   Global Step: 148950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:06,339-Speed 9171.15 samples/sec   Loss 6.0736   LearningRate 0.0307   Epoch: 8   Global Step: 148960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:07,405-Speed 9608.25 samples/sec   Loss 6.1965   LearningRate 0.0307   Epoch: 8   Global Step: 148970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:08,470-Speed 9628.97 samples/sec   Loss 6.1947   LearningRate 0.0307   Epoch: 8   Global Step: 148980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:09,497-Speed 9977.91 samples/sec   Loss 6.1434   LearningRate 0.0307   Epoch: 8   Global Step: 148990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:10,581-Speed 9451.16 samples/sec   Loss 6.1587   LearningRate 0.0307   Epoch: 8   Global Step: 149000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:11,640-Speed 9671.91 samples/sec   Loss 6.1661   LearningRate 0.0306   Epoch: 8   Global Step: 149010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:12,688-Speed 9781.23 samples/sec   Loss 6.1007   LearningRate 0.0306   Epoch: 8   Global Step: 149020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:13,760-Speed 9558.84 samples/sec   Loss 6.1519   LearningRate 0.0306   Epoch: 8   Global Step: 149030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:14,849-Speed 9405.25 samples/sec   Loss 6.2062   LearningRate 0.0306   Epoch: 8   Global Step: 149040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:15,911-Speed 9645.77 samples/sec   Loss 6.2199   LearningRate 0.0306   Epoch: 8   Global Step: 149050   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:32:16,980-Speed 9590.45 samples/sec   Loss 6.1953   LearningRate 0.0306   Epoch: 8   Global Step: 149060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:18,099-Speed 9149.07 samples/sec   Loss 6.2265   LearningRate 0.0306   Epoch: 8   Global Step: 149070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:19,229-Speed 9073.79 samples/sec   Loss 6.2178   LearningRate 0.0306   Epoch: 8   Global Step: 149080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:20,343-Speed 9198.82 samples/sec   Loss 6.1574   LearningRate 0.0306   Epoch: 8   Global Step: 149090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:21,398-Speed 9712.01 samples/sec   Loss 6.3089   LearningRate 0.0306   Epoch: 8   Global Step: 149100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:22,493-Speed 9351.33 samples/sec   Loss 6.2730   LearningRate 0.0306   Epoch: 8   Global Step: 149110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:23,589-Speed 9349.41 samples/sec   Loss 6.2059   LearningRate 0.0306   Epoch: 8   Global Step: 149120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:24,685-Speed 9349.88 samples/sec   Loss 6.2411   LearningRate 0.0306   Epoch: 8   Global Step: 149130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:25,772-Speed 9433.93 samples/sec   Loss 6.2622   LearningRate 0.0306   Epoch: 8   Global Step: 149140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:26,839-Speed 9607.36 samples/sec   Loss 6.2171   LearningRate 0.0306   Epoch: 8   Global Step: 149150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:27,930-Speed 9387.15 samples/sec   Loss 6.2385   LearningRate 0.0306   Epoch: 8   Global Step: 149160   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:32:28,992-Speed 9650.56 samples/sec   Loss 6.1848   LearningRate 0.0306   Epoch: 8   Global Step: 149170   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:32:30,031-Speed 9854.95 samples/sec   Loss 6.2246   LearningRate 0.0306   Epoch: 8   Global Step: 149180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:31,146-Speed 9190.07 samples/sec   Loss 6.1964   LearningRate 0.0306   Epoch: 8   Global Step: 149190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:32,218-Speed 9562.53 samples/sec   Loss 6.1316   LearningRate 0.0306   Epoch: 8   Global Step: 149200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:33,276-Speed 9677.71 samples/sec   Loss 6.1904   LearningRate 0.0306   Epoch: 8   Global Step: 149210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:34,352-Speed 9521.83 samples/sec   Loss 6.1977   LearningRate 0.0306   Epoch: 8   Global Step: 149220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:35,415-Speed 9644.87 samples/sec   Loss 6.1790   LearningRate 0.0306   Epoch: 8   Global Step: 149230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:36,492-Speed 9509.69 samples/sec   Loss 6.1435   LearningRate 0.0306   Epoch: 8   Global Step: 149240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:37,618-Speed 9097.49 samples/sec   Loss 6.0379   LearningRate 0.0306   Epoch: 8   Global Step: 149250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:38,704-Speed 9437.29 samples/sec   Loss 6.0291   LearningRate 0.0306   Epoch: 8   Global Step: 149260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:39,786-Speed 9472.57 samples/sec   Loss 6.1694   LearningRate 0.0306   Epoch: 8   Global Step: 149270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:40,836-Speed 9760.07 samples/sec   Loss 6.1868   LearningRate 0.0306   Epoch: 8   Global Step: 149280   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:32:41,919-Speed 9458.53 samples/sec   Loss 6.1767   LearningRate 0.0306   Epoch: 8   Global Step: 149290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:42,994-Speed 9529.99 samples/sec   Loss 6.1993   LearningRate 0.0306   Epoch: 8   Global Step: 149300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:44,089-Speed 9361.10 samples/sec   Loss 6.2555   LearningRate 0.0306   Epoch: 8   Global Step: 149310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:45,169-Speed 9483.45 samples/sec   Loss 6.1359   LearningRate 0.0305   Epoch: 8   Global Step: 149320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:46,226-Speed 9699.50 samples/sec   Loss 6.1465   LearningRate 0.0305   Epoch: 8   Global Step: 149330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:47,326-Speed 9308.60 samples/sec   Loss 6.2795   LearningRate 0.0305   Epoch: 8   Global Step: 149340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:48,372-Speed 9796.60 samples/sec   Loss 6.1838   LearningRate 0.0305   Epoch: 8   Global Step: 149350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:49,505-Speed 9045.06 samples/sec   Loss 6.2521   LearningRate 0.0305   Epoch: 8   Global Step: 149360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:50,591-Speed 9442.94 samples/sec   Loss 6.1199   LearningRate 0.0305   Epoch: 8   Global Step: 149370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:51,655-Speed 9626.36 samples/sec   Loss 6.2149   LearningRate 0.0305   Epoch: 8   Global Step: 149380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:52,720-Speed 9620.64 samples/sec   Loss 6.2190   LearningRate 0.0305   Epoch: 8   Global Step: 149390   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:32:53,784-Speed 9629.24 samples/sec   Loss 6.1341   LearningRate 0.0305   Epoch: 8   Global Step: 149400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:54,890-Speed 9266.05 samples/sec   Loss 6.1742   LearningRate 0.0305   Epoch: 8   Global Step: 149410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:55,989-Speed 9324.07 samples/sec   Loss 6.2115   LearningRate 0.0305   Epoch: 8   Global Step: 149420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:57,089-Speed 9315.02 samples/sec   Loss 6.2214   LearningRate 0.0305   Epoch: 8   Global Step: 149430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:58,176-Speed 9424.45 samples/sec   Loss 6.2227   LearningRate 0.0305   Epoch: 8   Global Step: 149440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:32:59,247-Speed 9569.20 samples/sec   Loss 6.0894   LearningRate 0.0305   Epoch: 8   Global Step: 149450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:00,308-Speed 9658.83 samples/sec   Loss 6.1408   LearningRate 0.0305   Epoch: 8   Global Step: 149460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:01,401-Speed 9374.58 samples/sec   Loss 6.1001   LearningRate 0.0305   Epoch: 8   Global Step: 149470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:02,505-Speed 9276.79 samples/sec   Loss 6.1323   LearningRate 0.0305   Epoch: 8   Global Step: 149480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:03,580-Speed 9534.82 samples/sec   Loss 6.2222   LearningRate 0.0305   Epoch: 8   Global Step: 149490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:04,642-Speed 9647.79 samples/sec   Loss 6.0966   LearningRate 0.0305   Epoch: 8   Global Step: 149500   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:33:05,717-Speed 9529.96 samples/sec   Loss 6.2740   LearningRate 0.0305   Epoch: 8   Global Step: 149510   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:33:06,789-Speed 9556.55 samples/sec   Loss 6.1911   LearningRate 0.0305   Epoch: 8   Global Step: 149520   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:33:07,913-Speed 9117.54 samples/sec   Loss 6.1953   LearningRate 0.0305   Epoch: 8   Global Step: 149530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:09,023-Speed 9231.89 samples/sec   Loss 6.1070   LearningRate 0.0305   Epoch: 8   Global Step: 149540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:10,100-Speed 9512.88 samples/sec   Loss 6.1212   LearningRate 0.0305   Epoch: 8   Global Step: 149550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:11,185-Speed 9451.17 samples/sec   Loss 6.0806   LearningRate 0.0305   Epoch: 8   Global Step: 149560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:12,241-Speed 9708.16 samples/sec   Loss 6.0598   LearningRate 0.0305   Epoch: 8   Global Step: 149570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:13,346-Speed 9271.28 samples/sec   Loss 6.1965   LearningRate 0.0305   Epoch: 8   Global Step: 149580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:14,407-Speed 9650.41 samples/sec   Loss 6.2009   LearningRate 0.0305   Epoch: 8   Global Step: 149590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:15,506-Speed 9327.13 samples/sec   Loss 6.1198   LearningRate 0.0305   Epoch: 8   Global Step: 149600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:16,559-Speed 9730.12 samples/sec   Loss 6.1454   LearningRate 0.0305   Epoch: 8   Global Step: 149610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:17,649-Speed 9395.45 samples/sec   Loss 6.1138   LearningRate 0.0304   Epoch: 8   Global Step: 149620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:18,711-Speed 9649.60 samples/sec   Loss 6.2959   LearningRate 0.0304   Epoch: 8   Global Step: 149630   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:33:19,778-Speed 9599.12 samples/sec   Loss 6.2280   LearningRate 0.0304   Epoch: 8   Global Step: 149640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:20,868-Speed 9405.09 samples/sec   Loss 6.0109   LearningRate 0.0304   Epoch: 8   Global Step: 149650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:21,927-Speed 9676.44 samples/sec   Loss 6.1492   LearningRate 0.0304   Epoch: 8   Global Step: 149660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:22,996-Speed 9591.22 samples/sec   Loss 6.1957   LearningRate 0.0304   Epoch: 8   Global Step: 149670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:24,082-Speed 9437.60 samples/sec   Loss 6.2589   LearningRate 0.0304   Epoch: 8   Global Step: 149680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:25,170-Speed 9413.72 samples/sec   Loss 6.1531   LearningRate 0.0304   Epoch: 8   Global Step: 149690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:26,279-Speed 9240.55 samples/sec   Loss 6.0835   LearningRate 0.0304   Epoch: 8   Global Step: 149700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:27,354-Speed 9533.74 samples/sec   Loss 6.1960   LearningRate 0.0304   Epoch: 8   Global Step: 149710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:28,455-Speed 9306.76 samples/sec   Loss 6.1891   LearningRate 0.0304   Epoch: 8   Global Step: 149720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:29,513-Speed 9687.39 samples/sec   Loss 6.2004   LearningRate 0.0304   Epoch: 8   Global Step: 149730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:30,563-Speed 9749.50 samples/sec   Loss 6.1775   LearningRate 0.0304   Epoch: 8   Global Step: 149740   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:33:31,640-Speed 9513.53 samples/sec   Loss 6.0919   LearningRate 0.0304   Epoch: 8   Global Step: 149750   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:33:32,748-Speed 9250.37 samples/sec   Loss 6.2012   LearningRate 0.0304   Epoch: 8   Global Step: 149760   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:33:33,824-Speed 9521.47 samples/sec   Loss 6.1405   LearningRate 0.0304   Epoch: 8   Global Step: 149770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:34,883-Speed 9676.99 samples/sec   Loss 6.0932   LearningRate 0.0304   Epoch: 8   Global Step: 149780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:35,969-Speed 9431.63 samples/sec   Loss 6.1064   LearningRate 0.0304   Epoch: 8   Global Step: 149790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:37,072-Speed 9288.09 samples/sec   Loss 6.2032   LearningRate 0.0304   Epoch: 8   Global Step: 149800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:38,150-Speed 9506.38 samples/sec   Loss 6.2483   LearningRate 0.0304   Epoch: 8   Global Step: 149810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:39,215-Speed 9620.15 samples/sec   Loss 6.2060   LearningRate 0.0304   Epoch: 8   Global Step: 149820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:40,307-Speed 9386.03 samples/sec   Loss 6.2217   LearningRate 0.0304   Epoch: 8   Global Step: 149830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:41,378-Speed 9569.83 samples/sec   Loss 6.3418   LearningRate 0.0304   Epoch: 8   Global Step: 149840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:42,434-Speed 9701.46 samples/sec   Loss 6.2346   LearningRate 0.0304   Epoch: 8   Global Step: 149850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:43,515-Speed 9481.96 samples/sec   Loss 6.2115   LearningRate 0.0304   Epoch: 8   Global Step: 149860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:33:44,608-Speed 9372.04 samples/sec   Loss 6.2283   LearningRate 0.0304   Epoch: 8   Global Step: 149870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:33:45,677-Speed 9582.03 samples/sec   Loss 6.0609   LearningRate 0.0304   Epoch: 8   Global Step: 149880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:33:46,806-Speed 9074.30 samples/sec   Loss 6.1872   LearningRate 0.0304   Epoch: 8   Global Step: 149890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:33:47,888-Speed 9469.25 samples/sec   Loss 6.1767   LearningRate 0.0304   Epoch: 8   Global Step: 149900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:33:48,989-Speed 9304.66 samples/sec   Loss 6.2221   LearningRate 0.0304   Epoch: 8   Global Step: 149910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:33:50,063-Speed 9549.34 samples/sec   Loss 6.0724   LearningRate 0.0303   Epoch: 8   Global Step: 149920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:33:51,144-Speed 9476.33 samples/sec   Loss 6.1622   LearningRate 0.0303   Epoch: 8   Global Step: 149930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:33:52,213-Speed 9586.67 samples/sec   Loss 6.1694   LearningRate 0.0303   Epoch: 8   Global Step: 149940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:33:53,274-Speed 9654.41 samples/sec   Loss 6.2272   LearningRate 0.0303   Epoch: 8   Global Step: 149950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:33:54,382-Speed 9246.10 samples/sec   Loss 6.0969   LearningRate 0.0303   Epoch: 8   Global Step: 149960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:55,457-Speed 9531.58 samples/sec   Loss 6.1849   LearningRate 0.0303   Epoch: 8   Global Step: 149970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:56,514-Speed 9690.01 samples/sec   Loss 6.1566   LearningRate 0.0303   Epoch: 8   Global Step: 149980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:57,587-Speed 9556.13 samples/sec   Loss 6.1493   LearningRate 0.0303   Epoch: 8   Global Step: 149990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:33:58,723-Speed 9015.02 samples/sec   Loss 6.1378   LearningRate 0.0303   Epoch: 8   Global Step: 150000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:34:20,662-[lfw][150000]XNorm: 10.094884
Training: 2022-04-11 17:34:20,663-[lfw][150000]Accuracy-Flip: 0.99650+-0.00283
Training: 2022-04-11 17:34:20,663-[lfw][150000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:34:46,006-[cfp_fp][150000]XNorm: 8.577838
Training: 2022-04-11 17:34:46,007-[cfp_fp][150000]Accuracy-Flip: 0.95886+-0.00966
Training: 2022-04-11 17:34:46,007-[cfp_fp][150000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:35:07,852-[agedb_30][150000]XNorm: 9.737256
Training: 2022-04-11 17:35:07,853-[agedb_30][150000]Accuracy-Flip: 0.96567+-0.00676
Training: 2022-04-11 17:35:07,853-[agedb_30][150000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:35:08,985-Speed 145.74 samples/sec   Loss 6.1997   LearningRate 0.0303   Epoch: 8   Global Step: 150010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:10,022-Speed 9876.72 samples/sec   Loss 6.0972   LearningRate 0.0303   Epoch: 8   Global Step: 150020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:11,076-Speed 9719.98 samples/sec   Loss 6.2037   LearningRate 0.0303   Epoch: 8   Global Step: 150030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:12,156-Speed 9491.26 samples/sec   Loss 6.0296   LearningRate 0.0303   Epoch: 8   Global Step: 150040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:13,252-Speed 9344.10 samples/sec   Loss 6.2588   LearningRate 0.0303   Epoch: 8   Global Step: 150050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:14,317-Speed 9619.81 samples/sec   Loss 6.1438   LearningRate 0.0303   Epoch: 8   Global Step: 150060   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:35:15,399-Speed 9478.49 samples/sec   Loss 6.1572   LearningRate 0.0303   Epoch: 8   Global Step: 150070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:16,476-Speed 9511.15 samples/sec   Loss 6.2452   LearningRate 0.0303   Epoch: 8   Global Step: 150080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:17,587-Speed 9221.50 samples/sec   Loss 6.2671   LearningRate 0.0303   Epoch: 8   Global Step: 150090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:18,683-Speed 9345.40 samples/sec   Loss 6.2647   LearningRate 0.0303   Epoch: 8   Global Step: 150100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:19,760-Speed 9516.80 samples/sec   Loss 6.2585   LearningRate 0.0303   Epoch: 8   Global Step: 150110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:20,830-Speed 9570.28 samples/sec   Loss 6.1465   LearningRate 0.0303   Epoch: 8   Global Step: 150120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:21,904-Speed 9540.45 samples/sec   Loss 6.0186   LearningRate 0.0303   Epoch: 8   Global Step: 150130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:22,969-Speed 9625.98 samples/sec   Loss 6.0589   LearningRate 0.0303   Epoch: 8   Global Step: 150140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:24,024-Speed 9709.48 samples/sec   Loss 6.1551   LearningRate 0.0303   Epoch: 8   Global Step: 150150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:25,154-Speed 9063.59 samples/sec   Loss 6.1579   LearningRate 0.0303   Epoch: 8   Global Step: 150160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:26,243-Speed 9410.79 samples/sec   Loss 6.1888   LearningRate 0.0303   Epoch: 8   Global Step: 150170   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:35:27,307-Speed 9624.96 samples/sec   Loss 6.1795   LearningRate 0.0303   Epoch: 8   Global Step: 150180   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:35:28,400-Speed 9377.54 samples/sec   Loss 6.2236   LearningRate 0.0303   Epoch: 8   Global Step: 150190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:29,461-Speed 9662.02 samples/sec   Loss 6.1764   LearningRate 0.0303   Epoch: 8   Global Step: 150200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:35:30,779-Speed 7774.79 samples/sec   Loss 6.1648   LearningRate 0.0303   Epoch: 8   Global Step: 150210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:00,098-Speed 349.27 samples/sec   Loss 6.0630   LearningRate 0.0302   Epoch: 9   Global Step: 150220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:02,338-Speed 4575.05 samples/sec   Loss 5.4139   LearningRate 0.0302   Epoch: 9   Global Step: 150230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:04,245-Speed 5373.38 samples/sec   Loss 5.3798   LearningRate 0.0302   Epoch: 9   Global Step: 150240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:05,550-Speed 7847.79 samples/sec   Loss 5.4377   LearningRate 0.0302   Epoch: 9   Global Step: 150250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:06,635-Speed 9440.83 samples/sec   Loss 5.3727   LearningRate 0.0302   Epoch: 9   Global Step: 150260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:08,347-Speed 5986.79 samples/sec   Loss 5.3082   LearningRate 0.0302   Epoch: 9   Global Step: 150270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:09,617-Speed 8069.09 samples/sec   Loss 5.3834   LearningRate 0.0302   Epoch: 9   Global Step: 150280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:10,743-Speed 9093.83 samples/sec   Loss 5.4596   LearningRate 0.0302   Epoch: 9   Global Step: 150290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:11,874-Speed 9062.26 samples/sec   Loss 5.4681   LearningRate 0.0302   Epoch: 9   Global Step: 150300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:13,039-Speed 8794.37 samples/sec   Loss 5.4692   LearningRate 0.0302   Epoch: 9   Global Step: 150310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:14,125-Speed 9439.30 samples/sec   Loss 5.4152   LearningRate 0.0302   Epoch: 9   Global Step: 150320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:36:15,250-Speed 9102.72 samples/sec   Loss 5.3779   LearningRate 0.0302   Epoch: 9   Global Step: 150330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:36:16,354-Speed 9277.53 samples/sec   Loss 5.3618   LearningRate 0.0302   Epoch: 9   Global Step: 150340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:36:17,485-Speed 9062.45 samples/sec   Loss 5.3853   LearningRate 0.0302   Epoch: 9   Global Step: 150350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:36:18,559-Speed 9538.20 samples/sec   Loss 5.4133   LearningRate 0.0302   Epoch: 9   Global Step: 150360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:36:19,689-Speed 9068.36 samples/sec   Loss 5.3533   LearningRate 0.0302   Epoch: 9   Global Step: 150370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:36:20,801-Speed 9218.49 samples/sec   Loss 5.4325   LearningRate 0.0302   Epoch: 9   Global Step: 150380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:36:21,944-Speed 8961.73 samples/sec   Loss 5.3775   LearningRate 0.0302   Epoch: 9   Global Step: 150390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:36:23,047-Speed 9291.67 samples/sec   Loss 5.3122   LearningRate 0.0302   Epoch: 9   Global Step: 150400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:36:24,145-Speed 9334.17 samples/sec   Loss 5.3432   LearningRate 0.0302   Epoch: 9   Global Step: 150410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:36:25,297-Speed 8892.61 samples/sec   Loss 5.4207   LearningRate 0.0302   Epoch: 9   Global Step: 150420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:26,419-Speed 9132.14 samples/sec   Loss 5.4448   LearningRate 0.0302   Epoch: 9   Global Step: 150430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:27,547-Speed 9078.70 samples/sec   Loss 5.4171   LearningRate 0.0302   Epoch: 9   Global Step: 150440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:28,674-Speed 9088.58 samples/sec   Loss 5.4746   LearningRate 0.0302   Epoch: 9   Global Step: 150450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:29,832-Speed 8847.49 samples/sec   Loss 5.5185   LearningRate 0.0302   Epoch: 9   Global Step: 150460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:30,961-Speed 9075.52 samples/sec   Loss 5.4657   LearningRate 0.0302   Epoch: 9   Global Step: 150470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:32,086-Speed 9107.66 samples/sec   Loss 5.4316   LearningRate 0.0302   Epoch: 9   Global Step: 150480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:33,239-Speed 8885.66 samples/sec   Loss 5.2612   LearningRate 0.0302   Epoch: 9   Global Step: 150490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:34,350-Speed 9230.45 samples/sec   Loss 5.3448   LearningRate 0.0302   Epoch: 9   Global Step: 150500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:35,415-Speed 9627.19 samples/sec   Loss 5.4667   LearningRate 0.0302   Epoch: 9   Global Step: 150510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:36,539-Speed 9113.70 samples/sec   Loss 5.5063   LearningRate 0.0302   Epoch: 9   Global Step: 150520   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:36:37,626-Speed 9426.93 samples/sec   Loss 5.4556   LearningRate 0.0301   Epoch: 9   Global Step: 150530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:38,741-Speed 9189.16 samples/sec   Loss 5.4500   LearningRate 0.0301   Epoch: 9   Global Step: 150540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:39,791-Speed 9760.83 samples/sec   Loss 5.4070   LearningRate 0.0301   Epoch: 9   Global Step: 150550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:40,873-Speed 9468.75 samples/sec   Loss 5.4377   LearningRate 0.0301   Epoch: 9   Global Step: 150560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:41,959-Speed 9436.93 samples/sec   Loss 5.3901   LearningRate 0.0301   Epoch: 9   Global Step: 150570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:43,109-Speed 8908.59 samples/sec   Loss 5.3807   LearningRate 0.0301   Epoch: 9   Global Step: 150580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:44,232-Speed 9119.57 samples/sec   Loss 5.4172   LearningRate 0.0301   Epoch: 9   Global Step: 150590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:45,309-Speed 9513.78 samples/sec   Loss 5.3809   LearningRate 0.0301   Epoch: 9   Global Step: 150600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:46,374-Speed 9625.54 samples/sec   Loss 5.5130   LearningRate 0.0301   Epoch: 9   Global Step: 150610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:47,441-Speed 9604.82 samples/sec   Loss 5.5298   LearningRate 0.0301   Epoch: 9   Global Step: 150620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:48,501-Speed 9661.31 samples/sec   Loss 5.4709   LearningRate 0.0301   Epoch: 9   Global Step: 150630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:49,642-Speed 8978.00 samples/sec   Loss 5.5057   LearningRate 0.0301   Epoch: 9   Global Step: 150640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:50,738-Speed 9353.91 samples/sec   Loss 5.5233   LearningRate 0.0301   Epoch: 9   Global Step: 150650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:51,872-Speed 9036.79 samples/sec   Loss 5.4363   LearningRate 0.0301   Epoch: 9   Global Step: 150660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:52,981-Speed 9238.82 samples/sec   Loss 5.5406   LearningRate 0.0301   Epoch: 9   Global Step: 150670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:54,048-Speed 9598.99 samples/sec   Loss 5.4025   LearningRate 0.0301   Epoch: 9   Global Step: 150680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:55,108-Speed 9666.54 samples/sec   Loss 5.4426   LearningRate 0.0301   Epoch: 9   Global Step: 150690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:56,173-Speed 9620.70 samples/sec   Loss 5.4489   LearningRate 0.0301   Epoch: 9   Global Step: 150700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:57,285-Speed 9214.93 samples/sec   Loss 5.4858   LearningRate 0.0301   Epoch: 9   Global Step: 150710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:58,359-Speed 9536.13 samples/sec   Loss 5.4539   LearningRate 0.0301   Epoch: 9   Global Step: 150720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:36:59,589-Speed 8331.67 samples/sec   Loss 5.4516   LearningRate 0.0301   Epoch: 9   Global Step: 150730   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:37:00,656-Speed 9602.11 samples/sec   Loss 5.5425   LearningRate 0.0301   Epoch: 9   Global Step: 150740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:01,751-Speed 9358.72 samples/sec   Loss 5.5433   LearningRate 0.0301   Epoch: 9   Global Step: 150750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:02,832-Speed 9473.75 samples/sec   Loss 5.5326   LearningRate 0.0301   Epoch: 9   Global Step: 150760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:03,926-Speed 9368.10 samples/sec   Loss 5.4254   LearningRate 0.0301   Epoch: 9   Global Step: 150770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:05,035-Speed 9236.79 samples/sec   Loss 5.5012   LearningRate 0.0301   Epoch: 9   Global Step: 150780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:06,149-Speed 9198.31 samples/sec   Loss 5.5760   LearningRate 0.0301   Epoch: 9   Global Step: 150790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:07,311-Speed 8819.42 samples/sec   Loss 5.4466   LearningRate 0.0301   Epoch: 9   Global Step: 150800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:08,416-Speed 9271.33 samples/sec   Loss 5.4837   LearningRate 0.0301   Epoch: 9   Global Step: 150810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:09,515-Speed 9320.78 samples/sec   Loss 5.5947   LearningRate 0.0301   Epoch: 9   Global Step: 150820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:10,598-Speed 9460.35 samples/sec   Loss 5.5421   LearningRate 0.0300   Epoch: 9   Global Step: 150830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:11,686-Speed 9419.42 samples/sec   Loss 5.5425   LearningRate 0.0300   Epoch: 9   Global Step: 150840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:12,774-Speed 9421.79 samples/sec   Loss 5.4956   LearningRate 0.0300   Epoch: 9   Global Step: 150850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:13,857-Speed 9465.86 samples/sec   Loss 5.5874   LearningRate 0.0300   Epoch: 9   Global Step: 150860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:15,391-Speed 6677.02 samples/sec   Loss 5.5778   LearningRate 0.0300   Epoch: 9   Global Step: 150870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:16,652-Speed 8129.59 samples/sec   Loss 5.5778   LearningRate 0.0300   Epoch: 9   Global Step: 150880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:17,907-Speed 8159.66 samples/sec   Loss 5.5394   LearningRate 0.0300   Epoch: 9   Global Step: 150890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:18,966-Speed 9679.31 samples/sec   Loss 5.5212   LearningRate 0.0300   Epoch: 9   Global Step: 150900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:20,251-Speed 7971.75 samples/sec   Loss 5.6131   LearningRate 0.0300   Epoch: 9   Global Step: 150910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:21,349-Speed 9333.61 samples/sec   Loss 5.5344   LearningRate 0.0300   Epoch: 9   Global Step: 150920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:22,491-Speed 8968.17 samples/sec   Loss 5.4429   LearningRate 0.0300   Epoch: 9   Global Step: 150930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:23,793-Speed 7870.37 samples/sec   Loss 5.5605   LearningRate 0.0300   Epoch: 9   Global Step: 150940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:24,881-Speed 9413.19 samples/sec   Loss 5.4143   LearningRate 0.0300   Epoch: 9   Global Step: 150950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:25,952-Speed 9574.74 samples/sec   Loss 5.5971   LearningRate 0.0300   Epoch: 9   Global Step: 150960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:27,038-Speed 9433.08 samples/sec   Loss 5.5256   LearningRate 0.0300   Epoch: 9   Global Step: 150970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:28,118-Speed 9480.60 samples/sec   Loss 5.4241   LearningRate 0.0300   Epoch: 9   Global Step: 150980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:29,252-Speed 9036.55 samples/sec   Loss 5.5391   LearningRate 0.0300   Epoch: 9   Global Step: 150990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:30,345-Speed 9376.58 samples/sec   Loss 5.5402   LearningRate 0.0300   Epoch: 9   Global Step: 151000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:31,478-Speed 9042.87 samples/sec   Loss 5.5542   LearningRate 0.0300   Epoch: 9   Global Step: 151010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:32,557-Speed 9496.91 samples/sec   Loss 5.6175   LearningRate 0.0300   Epoch: 9   Global Step: 151020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:33,668-Speed 9226.41 samples/sec   Loss 5.4324   LearningRate 0.0300   Epoch: 9   Global Step: 151030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:34,777-Speed 9240.00 samples/sec   Loss 5.5813   LearningRate 0.0300   Epoch: 9   Global Step: 151040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:35,919-Speed 8971.67 samples/sec   Loss 5.5682   LearningRate 0.0300   Epoch: 9   Global Step: 151050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:37,019-Speed 9315.71 samples/sec   Loss 5.5581   LearningRate 0.0300   Epoch: 9   Global Step: 151060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:38,102-Speed 9461.60 samples/sec   Loss 5.5166   LearningRate 0.0300   Epoch: 9   Global Step: 151070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:39,217-Speed 9186.05 samples/sec   Loss 5.6788   LearningRate 0.0300   Epoch: 9   Global Step: 151080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:40,293-Speed 9521.02 samples/sec   Loss 5.5725   LearningRate 0.0300   Epoch: 9   Global Step: 151090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:41,355-Speed 9648.74 samples/sec   Loss 5.5432   LearningRate 0.0300   Epoch: 9   Global Step: 151100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:42,417-Speed 9652.17 samples/sec   Loss 5.5271   LearningRate 0.0300   Epoch: 9   Global Step: 151110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:43,529-Speed 9214.03 samples/sec   Loss 5.4796   LearningRate 0.0300   Epoch: 9   Global Step: 151120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:44,606-Speed 9516.01 samples/sec   Loss 5.5500   LearningRate 0.0300   Epoch: 9   Global Step: 151130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:45,745-Speed 8994.39 samples/sec   Loss 5.6272   LearningRate 0.0299   Epoch: 9   Global Step: 151140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:46,849-Speed 9279.46 samples/sec   Loss 5.5479   LearningRate 0.0299   Epoch: 9   Global Step: 151150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:47,919-Speed 9574.19 samples/sec   Loss 5.5992   LearningRate 0.0299   Epoch: 9   Global Step: 151160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:49,022-Speed 9286.03 samples/sec   Loss 5.5172   LearningRate 0.0299   Epoch: 9   Global Step: 151170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:50,107-Speed 9447.77 samples/sec   Loss 5.6389   LearningRate 0.0299   Epoch: 9   Global Step: 151180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:51,216-Speed 9244.31 samples/sec   Loss 5.5207   LearningRate 0.0299   Epoch: 9   Global Step: 151190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:52,312-Speed 9345.35 samples/sec   Loss 5.6363   LearningRate 0.0299   Epoch: 9   Global Step: 151200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:53,398-Speed 9438.80 samples/sec   Loss 5.5180   LearningRate 0.0299   Epoch: 9   Global Step: 151210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:54,554-Speed 8856.41 samples/sec   Loss 5.6633   LearningRate 0.0299   Epoch: 9   Global Step: 151220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:55,673-Speed 9156.47 samples/sec   Loss 5.6301   LearningRate 0.0299   Epoch: 9   Global Step: 151230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:37:56,777-Speed 9282.39 samples/sec   Loss 5.4974   LearningRate 0.0299   Epoch: 9   Global Step: 151240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:57,836-Speed 9676.96 samples/sec   Loss 5.5128   LearningRate 0.0299   Epoch: 9   Global Step: 151250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:37:58,936-Speed 9314.94 samples/sec   Loss 5.6660   LearningRate 0.0299   Epoch: 9   Global Step: 151260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:38:00,048-Speed 9218.81 samples/sec   Loss 5.4727   LearningRate 0.0299   Epoch: 9   Global Step: 151270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:38:01,152-Speed 9280.43 samples/sec   Loss 5.5228   LearningRate 0.0299   Epoch: 9   Global Step: 151280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:38:02,218-Speed 9603.90 samples/sec   Loss 5.6064   LearningRate 0.0299   Epoch: 9   Global Step: 151290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:38:03,380-Speed 8821.99 samples/sec   Loss 5.5925   LearningRate 0.0299   Epoch: 9   Global Step: 151300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:38:04,535-Speed 8865.00 samples/sec   Loss 5.5437   LearningRate 0.0299   Epoch: 9   Global Step: 151310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:38:05,649-Speed 9201.22 samples/sec   Loss 5.5735   LearningRate 0.0299   Epoch: 9   Global Step: 151320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:38:06,773-Speed 9116.70 samples/sec   Loss 5.6522   LearningRate 0.0299   Epoch: 9   Global Step: 151330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:38:07,862-Speed 9409.67 samples/sec   Loss 5.5525   LearningRate 0.0299   Epoch: 9   Global Step: 151340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:08,992-Speed 9069.17 samples/sec   Loss 5.5101   LearningRate 0.0299   Epoch: 9   Global Step: 151350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:10,075-Speed 9464.65 samples/sec   Loss 5.6729   LearningRate 0.0299   Epoch: 9   Global Step: 151360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:11,199-Speed 9110.44 samples/sec   Loss 5.6250   LearningRate 0.0299   Epoch: 9   Global Step: 151370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:12,305-Speed 9267.74 samples/sec   Loss 5.4947   LearningRate 0.0299   Epoch: 9   Global Step: 151380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:13,403-Speed 9333.33 samples/sec   Loss 5.5384   LearningRate 0.0299   Epoch: 9   Global Step: 151390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:14,515-Speed 9210.83 samples/sec   Loss 5.5828   LearningRate 0.0299   Epoch: 9   Global Step: 151400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:15,613-Speed 9334.45 samples/sec   Loss 5.5025   LearningRate 0.0299   Epoch: 9   Global Step: 151410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:16,720-Speed 9254.69 samples/sec   Loss 5.5566   LearningRate 0.0299   Epoch: 9   Global Step: 151420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:17,793-Speed 9546.90 samples/sec   Loss 5.5255   LearningRate 0.0299   Epoch: 9   Global Step: 151430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:18,888-Speed 9362.38 samples/sec   Loss 5.6088   LearningRate 0.0298   Epoch: 9   Global Step: 151440   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:38:20,032-Speed 8953.20 samples/sec   Loss 5.5319   LearningRate 0.0298   Epoch: 9   Global Step: 151450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:21,122-Speed 9403.54 samples/sec   Loss 5.5602   LearningRate 0.0298   Epoch: 9   Global Step: 151460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:22,270-Speed 8924.53 samples/sec   Loss 5.6007   LearningRate 0.0298   Epoch: 9   Global Step: 151470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:23,350-Speed 9487.22 samples/sec   Loss 5.5628   LearningRate 0.0298   Epoch: 9   Global Step: 151480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:24,489-Speed 8989.66 samples/sec   Loss 5.5723   LearningRate 0.0298   Epoch: 9   Global Step: 151490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:25,574-Speed 9451.38 samples/sec   Loss 5.6442   LearningRate 0.0298   Epoch: 9   Global Step: 151500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:26,684-Speed 9239.14 samples/sec   Loss 5.5662   LearningRate 0.0298   Epoch: 9   Global Step: 151510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:27,794-Speed 9228.64 samples/sec   Loss 5.4624   LearningRate 0.0298   Epoch: 9   Global Step: 151520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:28,930-Speed 9019.95 samples/sec   Loss 5.6658   LearningRate 0.0298   Epoch: 9   Global Step: 151530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:29,994-Speed 9635.34 samples/sec   Loss 5.6606   LearningRate 0.0298   Epoch: 9   Global Step: 151540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:31,090-Speed 9342.91 samples/sec   Loss 5.5510   LearningRate 0.0298   Epoch: 9   Global Step: 151550   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:38:32,172-Speed 9471.86 samples/sec   Loss 5.6174   LearningRate 0.0298   Epoch: 9   Global Step: 151560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:33,284-Speed 9209.17 samples/sec   Loss 5.6426   LearningRate 0.0298   Epoch: 9   Global Step: 151570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:34,380-Speed 9350.98 samples/sec   Loss 5.5438   LearningRate 0.0298   Epoch: 9   Global Step: 151580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:35,454-Speed 9540.61 samples/sec   Loss 5.5402   LearningRate 0.0298   Epoch: 9   Global Step: 151590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:36,541-Speed 9424.72 samples/sec   Loss 5.5722   LearningRate 0.0298   Epoch: 9   Global Step: 151600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:37,609-Speed 9592.70 samples/sec   Loss 5.6233   LearningRate 0.0298   Epoch: 9   Global Step: 151610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:38,696-Speed 9428.04 samples/sec   Loss 5.6403   LearningRate 0.0298   Epoch: 9   Global Step: 151620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:39,745-Speed 9767.72 samples/sec   Loss 5.6512   LearningRate 0.0298   Epoch: 9   Global Step: 151630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:40,796-Speed 9750.25 samples/sec   Loss 5.5991   LearningRate 0.0298   Epoch: 9   Global Step: 151640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:41,847-Speed 9745.29 samples/sec   Loss 5.6581   LearningRate 0.0298   Epoch: 9   Global Step: 151650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:42,974-Speed 9088.70 samples/sec   Loss 5.6164   LearningRate 0.0298   Epoch: 9   Global Step: 151660   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:38:44,116-Speed 8978.42 samples/sec   Loss 5.6443   LearningRate 0.0298   Epoch: 9   Global Step: 151670   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:38:45,191-Speed 9532.16 samples/sec   Loss 5.6351   LearningRate 0.0298   Epoch: 9   Global Step: 151680   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:38:46,299-Speed 9255.74 samples/sec   Loss 5.6403   LearningRate 0.0298   Epoch: 9   Global Step: 151690   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:38:47,398-Speed 9315.68 samples/sec   Loss 5.5469   LearningRate 0.0298   Epoch: 9   Global Step: 151700   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:38:48,498-Speed 9314.52 samples/sec   Loss 5.5902   LearningRate 0.0298   Epoch: 9   Global Step: 151710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:49,558-Speed 9674.75 samples/sec   Loss 5.5858   LearningRate 0.0298   Epoch: 9   Global Step: 151720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:50,697-Speed 8988.45 samples/sec   Loss 5.6268   LearningRate 0.0298   Epoch: 9   Global Step: 151730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:51,804-Speed 9257.52 samples/sec   Loss 5.5681   LearningRate 0.0298   Epoch: 9   Global Step: 151740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:52,891-Speed 9422.90 samples/sec   Loss 5.6623   LearningRate 0.0297   Epoch: 9   Global Step: 151750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:54,041-Speed 8909.77 samples/sec   Loss 5.6893   LearningRate 0.0297   Epoch: 9   Global Step: 151760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:55,113-Speed 9561.03 samples/sec   Loss 5.7100   LearningRate 0.0297   Epoch: 9   Global Step: 151770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:56,223-Speed 9236.62 samples/sec   Loss 5.5553   LearningRate 0.0297   Epoch: 9   Global Step: 151780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:57,334-Speed 9217.48 samples/sec   Loss 5.5864   LearningRate 0.0297   Epoch: 9   Global Step: 151790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:58,415-Speed 9480.64 samples/sec   Loss 5.6573   LearningRate 0.0297   Epoch: 9   Global Step: 151800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:38:59,503-Speed 9415.28 samples/sec   Loss 5.7141   LearningRate 0.0297   Epoch: 9   Global Step: 151810   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:39:00,560-Speed 9694.60 samples/sec   Loss 5.6704   LearningRate 0.0297   Epoch: 9   Global Step: 151820   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:39:01,633-Speed 9552.93 samples/sec   Loss 5.6138   LearningRate 0.0297   Epoch: 9   Global Step: 151830   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:39:02,757-Speed 9113.87 samples/sec   Loss 5.6284   LearningRate 0.0297   Epoch: 9   Global Step: 151840   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:39:03,896-Speed 8999.56 samples/sec   Loss 5.7053   LearningRate 0.0297   Epoch: 9   Global Step: 151850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:04,989-Speed 9377.62 samples/sec   Loss 5.5859   LearningRate 0.0297   Epoch: 9   Global Step: 151860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:06,079-Speed 9394.50 samples/sec   Loss 5.6435   LearningRate 0.0297   Epoch: 9   Global Step: 151870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:07,178-Speed 9325.86 samples/sec   Loss 5.5388   LearningRate 0.0297   Epoch: 9   Global Step: 151880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:08,263-Speed 9447.90 samples/sec   Loss 5.5950   LearningRate 0.0297   Epoch: 9   Global Step: 151890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:09,380-Speed 9168.08 samples/sec   Loss 5.6305   LearningRate 0.0297   Epoch: 9   Global Step: 151900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:10,441-Speed 9660.56 samples/sec   Loss 5.6578   LearningRate 0.0297   Epoch: 9   Global Step: 151910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:11,565-Speed 9111.93 samples/sec   Loss 5.6461   LearningRate 0.0297   Epoch: 9   Global Step: 151920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:12,628-Speed 9637.80 samples/sec   Loss 5.6973   LearningRate 0.0297   Epoch: 9   Global Step: 151930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:13,703-Speed 9536.78 samples/sec   Loss 5.6267   LearningRate 0.0297   Epoch: 9   Global Step: 151940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:14,773-Speed 9578.48 samples/sec   Loss 5.6743   LearningRate 0.0297   Epoch: 9   Global Step: 151950   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:39:15,851-Speed 9502.05 samples/sec   Loss 5.6548   LearningRate 0.0297   Epoch: 9   Global Step: 151960   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:39:16,929-Speed 9508.14 samples/sec   Loss 5.7430   LearningRate 0.0297   Epoch: 9   Global Step: 151970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:18,050-Speed 9133.68 samples/sec   Loss 5.7389   LearningRate 0.0297   Epoch: 9   Global Step: 151980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:19,104-Speed 9725.88 samples/sec   Loss 5.5751   LearningRate 0.0297   Epoch: 9   Global Step: 151990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:20,221-Speed 9171.19 samples/sec   Loss 5.5875   LearningRate 0.0297   Epoch: 9   Global Step: 152000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:39:42,250-[lfw][152000]XNorm: 10.094522
Training: 2022-04-11 17:39:42,251-[lfw][152000]Accuracy-Flip: 0.99617+-0.00236
Training: 2022-04-11 17:39:42,251-[lfw][152000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:40:07,643-[cfp_fp][152000]XNorm: 8.675968
Training: 2022-04-11 17:40:07,644-[cfp_fp][152000]Accuracy-Flip: 0.96314+-0.00912
Training: 2022-04-11 17:40:07,644-[cfp_fp][152000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:40:29,539-[agedb_30][152000]XNorm: 9.791307
Training: 2022-04-11 17:40:29,540-[agedb_30][152000]Accuracy-Flip: 0.96700+-0.00819
Training: 2022-04-11 17:40:29,540-[agedb_30][152000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:40:30,643-Speed 145.41 samples/sec   Loss 5.5529   LearningRate 0.0297   Epoch: 9   Global Step: 152010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:31,732-Speed 9407.80 samples/sec   Loss 5.7041   LearningRate 0.0297   Epoch: 9   Global Step: 152020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:32,813-Speed 9479.96 samples/sec   Loss 5.5241   LearningRate 0.0297   Epoch: 9   Global Step: 152030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:33,914-Speed 9298.90 samples/sec   Loss 5.6040   LearningRate 0.0297   Epoch: 9   Global Step: 152040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:34,990-Speed 9525.27 samples/sec   Loss 5.6037   LearningRate 0.0296   Epoch: 9   Global Step: 152050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:36,074-Speed 9450.67 samples/sec   Loss 5.7864   LearningRate 0.0296   Epoch: 9   Global Step: 152060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:37,192-Speed 9167.40 samples/sec   Loss 5.6711   LearningRate 0.0296   Epoch: 9   Global Step: 152070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:38,283-Speed 9397.43 samples/sec   Loss 5.7700   LearningRate 0.0296   Epoch: 9   Global Step: 152080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:39,372-Speed 9403.86 samples/sec   Loss 5.7427   LearningRate 0.0296   Epoch: 9   Global Step: 152090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:40,513-Speed 8982.47 samples/sec   Loss 5.6514   LearningRate 0.0296   Epoch: 9   Global Step: 152100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:41,588-Speed 9524.00 samples/sec   Loss 5.7275   LearningRate 0.0296   Epoch: 9   Global Step: 152110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:42,669-Speed 9479.51 samples/sec   Loss 5.6288   LearningRate 0.0296   Epoch: 9   Global Step: 152120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:43,771-Speed 9297.97 samples/sec   Loss 5.6721   LearningRate 0.0296   Epoch: 9   Global Step: 152130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:44,870-Speed 9319.79 samples/sec   Loss 5.6936   LearningRate 0.0296   Epoch: 9   Global Step: 152140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:45,953-Speed 9460.44 samples/sec   Loss 5.6776   LearningRate 0.0296   Epoch: 9   Global Step: 152150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:47,023-Speed 9581.77 samples/sec   Loss 5.8072   LearningRate 0.0296   Epoch: 9   Global Step: 152160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:48,110-Speed 9426.29 samples/sec   Loss 5.6729   LearningRate 0.0296   Epoch: 9   Global Step: 152170   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:40:49,187-Speed 9509.41 samples/sec   Loss 5.7403   LearningRate 0.0296   Epoch: 9   Global Step: 152180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:50,298-Speed 9227.82 samples/sec   Loss 5.6967   LearningRate 0.0296   Epoch: 9   Global Step: 152190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:51,404-Speed 9258.13 samples/sec   Loss 5.7227   LearningRate 0.0296   Epoch: 9   Global Step: 152200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:52,507-Speed 9299.93 samples/sec   Loss 5.7318   LearningRate 0.0296   Epoch: 9   Global Step: 152210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:53,622-Speed 9194.41 samples/sec   Loss 5.6977   LearningRate 0.0296   Epoch: 9   Global Step: 152220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:54,736-Speed 9192.44 samples/sec   Loss 5.6784   LearningRate 0.0296   Epoch: 9   Global Step: 152230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:55,826-Speed 9401.14 samples/sec   Loss 5.8298   LearningRate 0.0296   Epoch: 9   Global Step: 152240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:56,885-Speed 9681.89 samples/sec   Loss 5.6298   LearningRate 0.0296   Epoch: 9   Global Step: 152250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:57,990-Speed 9270.29 samples/sec   Loss 5.6406   LearningRate 0.0296   Epoch: 9   Global Step: 152260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:40:59,063-Speed 9543.98 samples/sec   Loss 5.7621   LearningRate 0.0296   Epoch: 9   Global Step: 152270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:00,178-Speed 9189.71 samples/sec   Loss 5.8145   LearningRate 0.0296   Epoch: 9   Global Step: 152280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:01,267-Speed 9417.19 samples/sec   Loss 5.6694   LearningRate 0.0296   Epoch: 9   Global Step: 152290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:02,388-Speed 9139.03 samples/sec   Loss 5.7120   LearningRate 0.0296   Epoch: 9   Global Step: 152300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:03,489-Speed 9300.99 samples/sec   Loss 5.7040   LearningRate 0.0296   Epoch: 9   Global Step: 152310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:04,575-Speed 9440.57 samples/sec   Loss 5.7143   LearningRate 0.0296   Epoch: 9   Global Step: 152320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:05,636-Speed 9651.45 samples/sec   Loss 5.7191   LearningRate 0.0296   Epoch: 9   Global Step: 152330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:06,709-Speed 9554.50 samples/sec   Loss 5.6301   LearningRate 0.0296   Epoch: 9   Global Step: 152340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:07,819-Speed 9228.18 samples/sec   Loss 5.5611   LearningRate 0.0296   Epoch: 9   Global Step: 152350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:08,923-Speed 9277.47 samples/sec   Loss 5.6450   LearningRate 0.0295   Epoch: 9   Global Step: 152360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:10,006-Speed 9459.59 samples/sec   Loss 5.7490   LearningRate 0.0295   Epoch: 9   Global Step: 152370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:11,096-Speed 9407.56 samples/sec   Loss 5.7318   LearningRate 0.0295   Epoch: 9   Global Step: 152380   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:41:12,163-Speed 9601.99 samples/sec   Loss 5.7816   LearningRate 0.0295   Epoch: 9   Global Step: 152390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:13,289-Speed 9099.58 samples/sec   Loss 5.7347   LearningRate 0.0295   Epoch: 9   Global Step: 152400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:14,360-Speed 9562.01 samples/sec   Loss 5.6910   LearningRate 0.0295   Epoch: 9   Global Step: 152410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:15,470-Speed 9230.93 samples/sec   Loss 5.6718   LearningRate 0.0295   Epoch: 9   Global Step: 152420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:16,552-Speed 9474.50 samples/sec   Loss 5.7799   LearningRate 0.0295   Epoch: 9   Global Step: 152430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:17,629-Speed 9512.64 samples/sec   Loss 5.7461   LearningRate 0.0295   Epoch: 9   Global Step: 152440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:18,717-Speed 9415.98 samples/sec   Loss 5.7086   LearningRate 0.0295   Epoch: 9   Global Step: 152450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:19,819-Speed 9298.98 samples/sec   Loss 5.6948   LearningRate 0.0295   Epoch: 9   Global Step: 152460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:20,886-Speed 9606.14 samples/sec   Loss 5.8271   LearningRate 0.0295   Epoch: 9   Global Step: 152470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:21,946-Speed 9663.46 samples/sec   Loss 5.7098   LearningRate 0.0295   Epoch: 9   Global Step: 152480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:23,052-Speed 9263.73 samples/sec   Loss 5.7136   LearningRate 0.0295   Epoch: 9   Global Step: 152490   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:41:24,143-Speed 9390.24 samples/sec   Loss 5.6789   LearningRate 0.0295   Epoch: 9   Global Step: 152500   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:41:25,239-Speed 9355.04 samples/sec   Loss 5.6373   LearningRate 0.0295   Epoch: 9   Global Step: 152510   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:41:26,308-Speed 9584.65 samples/sec   Loss 5.7515   LearningRate 0.0295   Epoch: 9   Global Step: 152520   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:41:27,409-Speed 9302.25 samples/sec   Loss 5.6613   LearningRate 0.0295   Epoch: 9   Global Step: 152530   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:41:28,513-Speed 9282.52 samples/sec   Loss 5.6690   LearningRate 0.0295   Epoch: 9   Global Step: 152540   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:41:29,639-Speed 9097.40 samples/sec   Loss 5.6555   LearningRate 0.0295   Epoch: 9   Global Step: 152550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:30,695-Speed 9710.32 samples/sec   Loss 5.6493   LearningRate 0.0295   Epoch: 9   Global Step: 152560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:31,769-Speed 9534.86 samples/sec   Loss 5.7608   LearningRate 0.0295   Epoch: 9   Global Step: 152570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:32,825-Speed 9707.44 samples/sec   Loss 5.6914   LearningRate 0.0295   Epoch: 9   Global Step: 152580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:33,928-Speed 9290.72 samples/sec   Loss 5.7064   LearningRate 0.0295   Epoch: 9   Global Step: 152590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:35,018-Speed 9396.81 samples/sec   Loss 5.7386   LearningRate 0.0295   Epoch: 9   Global Step: 152600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:36,095-Speed 9519.61 samples/sec   Loss 5.7110   LearningRate 0.0295   Epoch: 9   Global Step: 152610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:37,217-Speed 9130.80 samples/sec   Loss 5.7550   LearningRate 0.0295   Epoch: 9   Global Step: 152620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:38,317-Speed 9311.84 samples/sec   Loss 5.7292   LearningRate 0.0295   Epoch: 9   Global Step: 152630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:39,426-Speed 9235.25 samples/sec   Loss 5.7134   LearningRate 0.0295   Epoch: 9   Global Step: 152640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:40,495-Speed 9588.01 samples/sec   Loss 5.7040   LearningRate 0.0295   Epoch: 9   Global Step: 152650   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:41:41,571-Speed 9518.65 samples/sec   Loss 5.8050   LearningRate 0.0295   Epoch: 9   Global Step: 152660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:42,679-Speed 9249.01 samples/sec   Loss 5.7296   LearningRate 0.0294   Epoch: 9   Global Step: 152670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:43,758-Speed 9494.41 samples/sec   Loss 5.6368   LearningRate 0.0294   Epoch: 9   Global Step: 152680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:44,864-Speed 9261.89 samples/sec   Loss 5.7033   LearningRate 0.0294   Epoch: 9   Global Step: 152690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:45,929-Speed 9621.67 samples/sec   Loss 5.6934   LearningRate 0.0294   Epoch: 9   Global Step: 152700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:47,080-Speed 8906.86 samples/sec   Loss 5.7253   LearningRate 0.0294   Epoch: 9   Global Step: 152710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:48,184-Speed 9281.21 samples/sec   Loss 5.8396   LearningRate 0.0294   Epoch: 9   Global Step: 152720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:49,295-Speed 9226.51 samples/sec   Loss 5.7792   LearningRate 0.0294   Epoch: 9   Global Step: 152730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:50,379-Speed 9449.26 samples/sec   Loss 5.6408   LearningRate 0.0294   Epoch: 9   Global Step: 152740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:51,502-Speed 9124.26 samples/sec   Loss 5.7982   LearningRate 0.0294   Epoch: 9   Global Step: 152750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:52,572-Speed 9573.77 samples/sec   Loss 5.8437   LearningRate 0.0294   Epoch: 9   Global Step: 152760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:53,696-Speed 9120.68 samples/sec   Loss 5.8456   LearningRate 0.0294   Epoch: 9   Global Step: 152770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:54,818-Speed 9131.67 samples/sec   Loss 5.8131   LearningRate 0.0294   Epoch: 9   Global Step: 152780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:55,924-Speed 9262.95 samples/sec   Loss 5.7889   LearningRate 0.0294   Epoch: 9   Global Step: 152790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:57,026-Speed 9298.08 samples/sec   Loss 5.6899   LearningRate 0.0294   Epoch: 9   Global Step: 152800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:58,133-Speed 9256.81 samples/sec   Loss 5.7947   LearningRate 0.0294   Epoch: 9   Global Step: 152810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:41:59,222-Speed 9402.18 samples/sec   Loss 5.7655   LearningRate 0.0294   Epoch: 9   Global Step: 152820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:00,293-Speed 9570.84 samples/sec   Loss 5.6281   LearningRate 0.0294   Epoch: 9   Global Step: 152830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:01,427-Speed 9030.94 samples/sec   Loss 5.6861   LearningRate 0.0294   Epoch: 9   Global Step: 152840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:02,531-Speed 9286.86 samples/sec   Loss 5.7774   LearningRate 0.0294   Epoch: 9   Global Step: 152850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:03,654-Speed 9118.37 samples/sec   Loss 5.7355   LearningRate 0.0294   Epoch: 9   Global Step: 152860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:04,743-Speed 9414.02 samples/sec   Loss 5.6555   LearningRate 0.0294   Epoch: 9   Global Step: 152870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:05,908-Speed 8796.36 samples/sec   Loss 5.6971   LearningRate 0.0294   Epoch: 9   Global Step: 152880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:06,958-Speed 9749.83 samples/sec   Loss 5.7501   LearningRate 0.0294   Epoch: 9   Global Step: 152890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:08,065-Speed 9259.06 samples/sec   Loss 5.6979   LearningRate 0.0294   Epoch: 9   Global Step: 152900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:09,195-Speed 9073.02 samples/sec   Loss 5.6185   LearningRate 0.0294   Epoch: 9   Global Step: 152910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:10,255-Speed 9664.77 samples/sec   Loss 5.7257   LearningRate 0.0294   Epoch: 9   Global Step: 152920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:11,369-Speed 9198.17 samples/sec   Loss 5.7477   LearningRate 0.0294   Epoch: 9   Global Step: 152930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:12,419-Speed 9754.04 samples/sec   Loss 5.8802   LearningRate 0.0294   Epoch: 9   Global Step: 152940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:13,477-Speed 9681.86 samples/sec   Loss 5.6835   LearningRate 0.0294   Epoch: 9   Global Step: 152950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:14,576-Speed 9321.63 samples/sec   Loss 5.7687   LearningRate 0.0294   Epoch: 9   Global Step: 152960   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:42:15,630-Speed 9726.18 samples/sec   Loss 5.7161   LearningRate 0.0294   Epoch: 9   Global Step: 152970   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:42:16,759-Speed 9075.65 samples/sec   Loss 5.6706   LearningRate 0.0293   Epoch: 9   Global Step: 152980   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:42:17,820-Speed 9659.28 samples/sec   Loss 5.7834   LearningRate 0.0293   Epoch: 9   Global Step: 152990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:18,946-Speed 9098.42 samples/sec   Loss 5.7700   LearningRate 0.0293   Epoch: 9   Global Step: 153000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:20,039-Speed 9372.14 samples/sec   Loss 5.8325   LearningRate 0.0293   Epoch: 9   Global Step: 153010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:21,144-Speed 9279.85 samples/sec   Loss 5.7389   LearningRate 0.0293   Epoch: 9   Global Step: 153020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:22,225-Speed 9471.53 samples/sec   Loss 5.7688   LearningRate 0.0293   Epoch: 9   Global Step: 153030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:23,308-Speed 9466.57 samples/sec   Loss 5.8700   LearningRate 0.0293   Epoch: 9   Global Step: 153040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:24,391-Speed 9460.02 samples/sec   Loss 5.7792   LearningRate 0.0293   Epoch: 9   Global Step: 153050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:25,501-Speed 9229.52 samples/sec   Loss 5.6517   LearningRate 0.0293   Epoch: 9   Global Step: 153060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:26,637-Speed 9020.71 samples/sec   Loss 5.6214   LearningRate 0.0293   Epoch: 9   Global Step: 153070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:27,713-Speed 9520.35 samples/sec   Loss 5.8853   LearningRate 0.0293   Epoch: 9   Global Step: 153080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:28,759-Speed 9795.61 samples/sec   Loss 5.7285   LearningRate 0.0293   Epoch: 9   Global Step: 153090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:29,846-Speed 9428.27 samples/sec   Loss 5.7563   LearningRate 0.0293   Epoch: 9   Global Step: 153100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:30,958-Speed 9217.05 samples/sec   Loss 5.7082   LearningRate 0.0293   Epoch: 9   Global Step: 153110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:32,105-Speed 8930.83 samples/sec   Loss 5.7653   LearningRate 0.0293   Epoch: 9   Global Step: 153120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:33,205-Speed 9315.29 samples/sec   Loss 5.8095   LearningRate 0.0293   Epoch: 9   Global Step: 153130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:34,294-Speed 9402.61 samples/sec   Loss 5.8661   LearningRate 0.0293   Epoch: 9   Global Step: 153140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:35,404-Speed 9238.39 samples/sec   Loss 5.7940   LearningRate 0.0293   Epoch: 9   Global Step: 153150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:36,506-Speed 9299.90 samples/sec   Loss 5.7438   LearningRate 0.0293   Epoch: 9   Global Step: 153160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:37,632-Speed 9095.30 samples/sec   Loss 5.8178   LearningRate 0.0293   Epoch: 9   Global Step: 153170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:38,688-Speed 9705.93 samples/sec   Loss 5.7320   LearningRate 0.0293   Epoch: 9   Global Step: 153180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:39,795-Speed 9249.61 samples/sec   Loss 5.7870   LearningRate 0.0293   Epoch: 9   Global Step: 153190   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:42:40,851-Speed 9708.73 samples/sec   Loss 5.8124   LearningRate 0.0293   Epoch: 9   Global Step: 153200   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:42:41,955-Speed 9279.58 samples/sec   Loss 5.8177   LearningRate 0.0293   Epoch: 9   Global Step: 153210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:43,053-Speed 9329.80 samples/sec   Loss 5.8581   LearningRate 0.0293   Epoch: 9   Global Step: 153220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:44,176-Speed 9125.15 samples/sec   Loss 5.7827   LearningRate 0.0293   Epoch: 9   Global Step: 153230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:45,270-Speed 9371.14 samples/sec   Loss 5.8539   LearningRate 0.0293   Epoch: 9   Global Step: 153240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:46,346-Speed 9533.80 samples/sec   Loss 5.8950   LearningRate 0.0293   Epoch: 9   Global Step: 153250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:47,479-Speed 9038.60 samples/sec   Loss 5.8235   LearningRate 0.0293   Epoch: 9   Global Step: 153260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:48,585-Speed 9264.17 samples/sec   Loss 5.7067   LearningRate 0.0293   Epoch: 9   Global Step: 153270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:49,702-Speed 9175.68 samples/sec   Loss 5.7109   LearningRate 0.0292   Epoch: 9   Global Step: 153280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:50,782-Speed 9488.91 samples/sec   Loss 5.8280   LearningRate 0.0292   Epoch: 9   Global Step: 153290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:51,875-Speed 9374.86 samples/sec   Loss 5.6855   LearningRate 0.0292   Epoch: 9   Global Step: 153300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:52,947-Speed 9560.03 samples/sec   Loss 5.8239   LearningRate 0.0292   Epoch: 9   Global Step: 153310   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:42:54,027-Speed 9491.49 samples/sec   Loss 5.8094   LearningRate 0.0292   Epoch: 9   Global Step: 153320   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:42:55,183-Speed 8856.99 samples/sec   Loss 5.7346   LearningRate 0.0292   Epoch: 9   Global Step: 153330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:56,313-Speed 9067.79 samples/sec   Loss 5.6801   LearningRate 0.0292   Epoch: 9   Global Step: 153340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:57,389-Speed 9525.83 samples/sec   Loss 5.8148   LearningRate 0.0292   Epoch: 9   Global Step: 153350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:58,479-Speed 9404.56 samples/sec   Loss 5.7287   LearningRate 0.0292   Epoch: 9   Global Step: 153360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:42:59,545-Speed 9610.35 samples/sec   Loss 5.7760   LearningRate 0.0292   Epoch: 9   Global Step: 153370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:00,668-Speed 9124.14 samples/sec   Loss 5.8331   LearningRate 0.0292   Epoch: 9   Global Step: 153380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:01,772-Speed 9281.27 samples/sec   Loss 5.6935   LearningRate 0.0292   Epoch: 9   Global Step: 153390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:02,884-Speed 9208.90 samples/sec   Loss 5.7883   LearningRate 0.0292   Epoch: 9   Global Step: 153400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:04,027-Speed 8968.64 samples/sec   Loss 5.7200   LearningRate 0.0292   Epoch: 9   Global Step: 153410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:05,129-Speed 9295.54 samples/sec   Loss 5.7521   LearningRate 0.0292   Epoch: 9   Global Step: 153420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:06,227-Speed 9328.65 samples/sec   Loss 5.8764   LearningRate 0.0292   Epoch: 9   Global Step: 153430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:07,317-Speed 9400.68 samples/sec   Loss 5.8015   LearningRate 0.0292   Epoch: 9   Global Step: 153440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:08,393-Speed 9521.53 samples/sec   Loss 5.7722   LearningRate 0.0292   Epoch: 9   Global Step: 153450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:09,465-Speed 9557.21 samples/sec   Loss 5.8070   LearningRate 0.0292   Epoch: 9   Global Step: 153460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:10,567-Speed 9307.55 samples/sec   Loss 5.9128   LearningRate 0.0292   Epoch: 9   Global Step: 153470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:11,659-Speed 9374.85 samples/sec   Loss 5.8072   LearningRate 0.0292   Epoch: 9   Global Step: 153480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:12,728-Speed 9589.01 samples/sec   Loss 5.8706   LearningRate 0.0292   Epoch: 9   Global Step: 153490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:13,832-Speed 9280.90 samples/sec   Loss 5.6899   LearningRate 0.0292   Epoch: 9   Global Step: 153500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:14,976-Speed 8958.20 samples/sec   Loss 5.8704   LearningRate 0.0292   Epoch: 9   Global Step: 153510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:16,085-Speed 9243.67 samples/sec   Loss 5.7165   LearningRate 0.0292   Epoch: 9   Global Step: 153520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:17,193-Speed 9242.77 samples/sec   Loss 5.7495   LearningRate 0.0292   Epoch: 9   Global Step: 153530   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:43:18,334-Speed 8982.47 samples/sec   Loss 5.8223   LearningRate 0.0292   Epoch: 9   Global Step: 153540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:19,424-Speed 9395.70 samples/sec   Loss 5.8080   LearningRate 0.0292   Epoch: 9   Global Step: 153550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:20,542-Speed 9172.50 samples/sec   Loss 5.7626   LearningRate 0.0292   Epoch: 9   Global Step: 153560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:21,616-Speed 9533.41 samples/sec   Loss 5.7486   LearningRate 0.0292   Epoch: 9   Global Step: 153570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:22,730-Speed 9202.34 samples/sec   Loss 5.8458   LearningRate 0.0292   Epoch: 9   Global Step: 153580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:23,828-Speed 9336.68 samples/sec   Loss 5.8131   LearningRate 0.0291   Epoch: 9   Global Step: 153590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:24,943-Speed 9190.07 samples/sec   Loss 5.7726   LearningRate 0.0291   Epoch: 9   Global Step: 153600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:26,028-Speed 9444.99 samples/sec   Loss 5.8362   LearningRate 0.0291   Epoch: 9   Global Step: 153610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:27,113-Speed 9437.29 samples/sec   Loss 5.6675   LearningRate 0.0291   Epoch: 9   Global Step: 153620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:28,237-Speed 9115.03 samples/sec   Loss 5.9753   LearningRate 0.0291   Epoch: 9   Global Step: 153630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:29,309-Speed 9567.09 samples/sec   Loss 5.8984   LearningRate 0.0291   Epoch: 9   Global Step: 153640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:30,484-Speed 8717.17 samples/sec   Loss 5.8111   LearningRate 0.0291   Epoch: 9   Global Step: 153650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:31,584-Speed 9316.62 samples/sec   Loss 5.8269   LearningRate 0.0291   Epoch: 9   Global Step: 153660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:32,668-Speed 9452.79 samples/sec   Loss 5.7798   LearningRate 0.0291   Epoch: 9   Global Step: 153670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:33,747-Speed 9490.00 samples/sec   Loss 5.7966   LearningRate 0.0291   Epoch: 9   Global Step: 153680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:34,860-Speed 9208.16 samples/sec   Loss 5.7830   LearningRate 0.0291   Epoch: 9   Global Step: 153690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:35,980-Speed 9145.22 samples/sec   Loss 5.8357   LearningRate 0.0291   Epoch: 9   Global Step: 153700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:37,084-Speed 9281.02 samples/sec   Loss 5.8412   LearningRate 0.0291   Epoch: 9   Global Step: 153710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:38,160-Speed 9521.53 samples/sec   Loss 5.8080   LearningRate 0.0291   Epoch: 9   Global Step: 153720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:39,239-Speed 9493.85 samples/sec   Loss 5.8421   LearningRate 0.0291   Epoch: 9   Global Step: 153730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:40,308-Speed 9590.23 samples/sec   Loss 5.8152   LearningRate 0.0291   Epoch: 9   Global Step: 153740   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:43:41,362-Speed 9721.24 samples/sec   Loss 5.7722   LearningRate 0.0291   Epoch: 9   Global Step: 153750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:42,449-Speed 9428.45 samples/sec   Loss 5.6515   LearningRate 0.0291   Epoch: 9   Global Step: 153760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:43,535-Speed 9434.66 samples/sec   Loss 5.7290   LearningRate 0.0291   Epoch: 9   Global Step: 153770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:44,617-Speed 9469.85 samples/sec   Loss 5.7306   LearningRate 0.0291   Epoch: 9   Global Step: 153780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:45,686-Speed 9587.42 samples/sec   Loss 5.8424   LearningRate 0.0291   Epoch: 9   Global Step: 153790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:46,780-Speed 9359.74 samples/sec   Loss 5.8327   LearningRate 0.0291   Epoch: 9   Global Step: 153800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:47,885-Speed 9275.21 samples/sec   Loss 5.8919   LearningRate 0.0291   Epoch: 9   Global Step: 153810   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:48,956-Speed 9569.51 samples/sec   Loss 5.7970   LearningRate 0.0291   Epoch: 9   Global Step: 153820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:50,061-Speed 9266.30 samples/sec   Loss 5.7215   LearningRate 0.0291   Epoch: 9   Global Step: 153830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:51,188-Speed 9092.44 samples/sec   Loss 5.8240   LearningRate 0.0291   Epoch: 9   Global Step: 153840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:52,239-Speed 9748.09 samples/sec   Loss 5.8048   LearningRate 0.0291   Epoch: 9   Global Step: 153850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:43:53,323-Speed 9454.87 samples/sec   Loss 5.7826   LearningRate 0.0291   Epoch: 9   Global Step: 153860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:54,490-Speed 8780.35 samples/sec   Loss 5.7572   LearningRate 0.0291   Epoch: 9   Global Step: 153870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:55,596-Speed 9264.97 samples/sec   Loss 5.8051   LearningRate 0.0291   Epoch: 9   Global Step: 153880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:56,751-Speed 8871.96 samples/sec   Loss 5.8812   LearningRate 0.0291   Epoch: 9   Global Step: 153890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:57,855-Speed 9278.67 samples/sec   Loss 5.8653   LearningRate 0.0290   Epoch: 9   Global Step: 153900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:43:58,959-Speed 9281.83 samples/sec   Loss 5.8162   LearningRate 0.0290   Epoch: 9   Global Step: 153910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:44:00,081-Speed 9133.67 samples/sec   Loss 5.8967   LearningRate 0.0290   Epoch: 9   Global Step: 153920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:44:01,193-Speed 9213.23 samples/sec   Loss 5.7743   LearningRate 0.0290   Epoch: 9   Global Step: 153930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:44:02,295-Speed 9295.09 samples/sec   Loss 5.8543   LearningRate 0.0290   Epoch: 9   Global Step: 153940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:44:03,363-Speed 9596.50 samples/sec   Loss 5.8052   LearningRate 0.0290   Epoch: 9   Global Step: 153950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:44:04,427-Speed 9627.28 samples/sec   Loss 5.8302   LearningRate 0.0290   Epoch: 9   Global Step: 153960   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:44:05,508-Speed 9480.61 samples/sec   Loss 5.7312   LearningRate 0.0290   Epoch: 9   Global Step: 153970   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:44:06,658-Speed 8910.88 samples/sec   Loss 5.7303   LearningRate 0.0290   Epoch: 9   Global Step: 153980   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:44:07,754-Speed 9343.64 samples/sec   Loss 5.8524   LearningRate 0.0290   Epoch: 9   Global Step: 153990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:44:08,889-Speed 9031.07 samples/sec   Loss 5.7550   LearningRate 0.0290   Epoch: 9   Global Step: 154000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:44:31,219-[lfw][154000]XNorm: 9.903580
Training: 2022-04-11 17:44:31,220-[lfw][154000]Accuracy-Flip: 0.99600+-0.00327
Training: 2022-04-11 17:44:31,220-[lfw][154000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:44:56,692-[cfp_fp][154000]XNorm: 8.532174
Training: 2022-04-11 17:44:56,693-[cfp_fp][154000]Accuracy-Flip: 0.95914+-0.01094
Training: 2022-04-11 17:44:56,693-[cfp_fp][154000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:45:18,679-[agedb_30][154000]XNorm: 9.655456
Training: 2022-04-11 17:45:18,680-[agedb_30][154000]Accuracy-Flip: 0.96283+-0.00860
Training: 2022-04-11 17:45:18,680-[agedb_30][154000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:45:19,770-Speed 144.47 samples/sec   Loss 5.8909   LearningRate 0.0290   Epoch: 9   Global Step: 154010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:20,845-Speed 9533.03 samples/sec   Loss 5.8693   LearningRate 0.0290   Epoch: 9   Global Step: 154020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:21,907-Speed 9650.61 samples/sec   Loss 5.8160   LearningRate 0.0290   Epoch: 9   Global Step: 154030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:22,969-Speed 9654.35 samples/sec   Loss 5.8544   LearningRate 0.0290   Epoch: 9   Global Step: 154040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:24,058-Speed 9407.92 samples/sec   Loss 5.9875   LearningRate 0.0290   Epoch: 9   Global Step: 154050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:25,208-Speed 8910.61 samples/sec   Loss 5.8262   LearningRate 0.0290   Epoch: 9   Global Step: 154060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:26,284-Speed 9522.66 samples/sec   Loss 5.7821   LearningRate 0.0290   Epoch: 9   Global Step: 154070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:27,341-Speed 9697.58 samples/sec   Loss 5.7567   LearningRate 0.0290   Epoch: 9   Global Step: 154080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:28,434-Speed 9369.40 samples/sec   Loss 5.8289   LearningRate 0.0290   Epoch: 9   Global Step: 154090   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:45:29,485-Speed 9750.32 samples/sec   Loss 5.8356   LearningRate 0.0290   Epoch: 9   Global Step: 154100   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:45:30,583-Speed 9330.63 samples/sec   Loss 5.8900   LearningRate 0.0290   Epoch: 9   Global Step: 154110   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:45:31,664-Speed 9478.52 samples/sec   Loss 5.8658   LearningRate 0.0290   Epoch: 9   Global Step: 154120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:32,772-Speed 9245.54 samples/sec   Loss 5.8804   LearningRate 0.0290   Epoch: 9   Global Step: 154130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:33,876-Speed 9282.65 samples/sec   Loss 5.7719   LearningRate 0.0290   Epoch: 9   Global Step: 154140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:34,938-Speed 9647.89 samples/sec   Loss 5.7537   LearningRate 0.0290   Epoch: 9   Global Step: 154150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:36,031-Speed 9374.05 samples/sec   Loss 5.6986   LearningRate 0.0290   Epoch: 9   Global Step: 154160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:37,111-Speed 9490.37 samples/sec   Loss 5.7488   LearningRate 0.0290   Epoch: 9   Global Step: 154170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:38,193-Speed 9467.50 samples/sec   Loss 5.9000   LearningRate 0.0290   Epoch: 9   Global Step: 154180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:39,365-Speed 8743.19 samples/sec   Loss 5.8434   LearningRate 0.0290   Epoch: 9   Global Step: 154190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:40,446-Speed 9475.07 samples/sec   Loss 5.7774   LearningRate 0.0290   Epoch: 9   Global Step: 154200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:41,508-Speed 9645.85 samples/sec   Loss 5.7966   LearningRate 0.0289   Epoch: 9   Global Step: 154210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:42,605-Speed 9340.63 samples/sec   Loss 5.7885   LearningRate 0.0289   Epoch: 9   Global Step: 154220   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:45:43,700-Speed 9357.77 samples/sec   Loss 5.7708   LearningRate 0.0289   Epoch: 9   Global Step: 154230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:44,764-Speed 9627.81 samples/sec   Loss 5.8694   LearningRate 0.0289   Epoch: 9   Global Step: 154240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:45,904-Speed 8988.38 samples/sec   Loss 5.7989   LearningRate 0.0289   Epoch: 9   Global Step: 154250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:47,000-Speed 9346.75 samples/sec   Loss 5.7635   LearningRate 0.0289   Epoch: 9   Global Step: 154260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:48,094-Speed 9366.37 samples/sec   Loss 5.8929   LearningRate 0.0289   Epoch: 9   Global Step: 154270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:49,198-Speed 9282.70 samples/sec   Loss 5.7762   LearningRate 0.0289   Epoch: 9   Global Step: 154280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:50,311-Speed 9202.57 samples/sec   Loss 5.7885   LearningRate 0.0289   Epoch: 9   Global Step: 154290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:51,427-Speed 9187.60 samples/sec   Loss 5.9400   LearningRate 0.0289   Epoch: 9   Global Step: 154300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:52,543-Speed 9184.94 samples/sec   Loss 5.9152   LearningRate 0.0289   Epoch: 9   Global Step: 154310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:53,733-Speed 8610.74 samples/sec   Loss 5.8514   LearningRate 0.0289   Epoch: 9   Global Step: 154320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:54,798-Speed 9617.99 samples/sec   Loss 5.8584   LearningRate 0.0289   Epoch: 9   Global Step: 154330   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:45:55,887-Speed 9411.41 samples/sec   Loss 5.7739   LearningRate 0.0289   Epoch: 9   Global Step: 154340   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:45:56,940-Speed 9736.92 samples/sec   Loss 5.8396   LearningRate 0.0289   Epoch: 9   Global Step: 154350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:58,044-Speed 9274.87 samples/sec   Loss 5.8409   LearningRate 0.0289   Epoch: 9   Global Step: 154360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:45:59,129-Speed 9441.59 samples/sec   Loss 5.8517   LearningRate 0.0289   Epoch: 9   Global Step: 154370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:00,233-Speed 9283.54 samples/sec   Loss 5.8074   LearningRate 0.0289   Epoch: 9   Global Step: 154380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:01,302-Speed 9586.78 samples/sec   Loss 5.8234   LearningRate 0.0289   Epoch: 9   Global Step: 154390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:02,395-Speed 9377.90 samples/sec   Loss 5.7122   LearningRate 0.0289   Epoch: 9   Global Step: 154400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:03,503-Speed 9244.48 samples/sec   Loss 5.8810   LearningRate 0.0289   Epoch: 9   Global Step: 154410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:04,593-Speed 9402.65 samples/sec   Loss 5.7808   LearningRate 0.0289   Epoch: 9   Global Step: 154420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:05,713-Speed 9143.06 samples/sec   Loss 5.9039   LearningRate 0.0289   Epoch: 9   Global Step: 154430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:06,788-Speed 9535.11 samples/sec   Loss 5.8284   LearningRate 0.0289   Epoch: 9   Global Step: 154440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:07,890-Speed 9297.09 samples/sec   Loss 5.7450   LearningRate 0.0289   Epoch: 9   Global Step: 154450   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:46:09,004-Speed 9200.80 samples/sec   Loss 5.8703   LearningRate 0.0289   Epoch: 9   Global Step: 154460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:10,106-Speed 9297.76 samples/sec   Loss 5.8192   LearningRate 0.0289   Epoch: 9   Global Step: 154470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:11,212-Speed 9266.65 samples/sec   Loss 5.7919   LearningRate 0.0289   Epoch: 9   Global Step: 154480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:12,335-Speed 9122.11 samples/sec   Loss 5.7803   LearningRate 0.0289   Epoch: 9   Global Step: 154490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:13,476-Speed 8982.27 samples/sec   Loss 5.8540   LearningRate 0.0289   Epoch: 9   Global Step: 154500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:14,522-Speed 9791.57 samples/sec   Loss 5.8877   LearningRate 0.0289   Epoch: 9   Global Step: 154510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:15,586-Speed 9632.65 samples/sec   Loss 5.8408   LearningRate 0.0288   Epoch: 9   Global Step: 154520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:16,705-Speed 9152.76 samples/sec   Loss 5.7728   LearningRate 0.0288   Epoch: 9   Global Step: 154530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:17,781-Speed 9521.39 samples/sec   Loss 5.9508   LearningRate 0.0288   Epoch: 9   Global Step: 154540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:18,869-Speed 9417.71 samples/sec   Loss 5.9015   LearningRate 0.0288   Epoch: 9   Global Step: 154550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:19,948-Speed 9491.87 samples/sec   Loss 5.7942   LearningRate 0.0288   Epoch: 9   Global Step: 154560   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:46:21,007-Speed 9687.30 samples/sec   Loss 5.8584   LearningRate 0.0288   Epoch: 9   Global Step: 154570   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:46:22,062-Speed 9708.19 samples/sec   Loss 5.9983   LearningRate 0.0288   Epoch: 9   Global Step: 154580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:23,279-Speed 8420.15 samples/sec   Loss 5.8452   LearningRate 0.0288   Epoch: 9   Global Step: 154590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:24,421-Speed 8971.10 samples/sec   Loss 5.8536   LearningRate 0.0288   Epoch: 9   Global Step: 154600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:25,500-Speed 9491.70 samples/sec   Loss 5.7740   LearningRate 0.0288   Epoch: 9   Global Step: 154610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:26,614-Speed 9197.77 samples/sec   Loss 5.8237   LearningRate 0.0288   Epoch: 9   Global Step: 154620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:27,773-Speed 8843.48 samples/sec   Loss 5.9271   LearningRate 0.0288   Epoch: 9   Global Step: 154630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:28,867-Speed 9359.58 samples/sec   Loss 5.8733   LearningRate 0.0288   Epoch: 9   Global Step: 154640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:29,981-Speed 9201.59 samples/sec   Loss 5.8453   LearningRate 0.0288   Epoch: 9   Global Step: 154650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:31,059-Speed 9507.68 samples/sec   Loss 5.8554   LearningRate 0.0288   Epoch: 9   Global Step: 154660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:32,191-Speed 9051.42 samples/sec   Loss 5.9398   LearningRate 0.0288   Epoch: 9   Global Step: 154670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:33,310-Speed 9157.95 samples/sec   Loss 5.8239   LearningRate 0.0288   Epoch: 9   Global Step: 154680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:34,390-Speed 9484.81 samples/sec   Loss 5.8720   LearningRate 0.0288   Epoch: 9   Global Step: 154690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:35,468-Speed 9505.17 samples/sec   Loss 5.8301   LearningRate 0.0288   Epoch: 9   Global Step: 154700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:36,536-Speed 9591.72 samples/sec   Loss 5.8930   LearningRate 0.0288   Epoch: 9   Global Step: 154710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:37,669-Speed 9040.05 samples/sec   Loss 5.8752   LearningRate 0.0288   Epoch: 9   Global Step: 154720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:38,770-Speed 9315.67 samples/sec   Loss 5.7252   LearningRate 0.0288   Epoch: 9   Global Step: 154730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:39,901-Speed 9055.26 samples/sec   Loss 5.8349   LearningRate 0.0288   Epoch: 9   Global Step: 154740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:40,969-Speed 9596.97 samples/sec   Loss 5.8749   LearningRate 0.0288   Epoch: 9   Global Step: 154750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:42,051-Speed 9467.06 samples/sec   Loss 5.8610   LearningRate 0.0288   Epoch: 9   Global Step: 154760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:43,112-Speed 9657.65 samples/sec   Loss 5.7617   LearningRate 0.0288   Epoch: 9   Global Step: 154770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:44,200-Speed 9411.71 samples/sec   Loss 5.8614   LearningRate 0.0288   Epoch: 9   Global Step: 154780   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:46:45,340-Speed 8992.47 samples/sec   Loss 5.9583   LearningRate 0.0288   Epoch: 9   Global Step: 154790   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:46:46,421-Speed 9483.66 samples/sec   Loss 5.8446   LearningRate 0.0288   Epoch: 9   Global Step: 154800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:47,544-Speed 9117.26 samples/sec   Loss 5.8948   LearningRate 0.0288   Epoch: 9   Global Step: 154810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:48,685-Speed 8983.47 samples/sec   Loss 5.8499   LearningRate 0.0288   Epoch: 9   Global Step: 154820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:49,756-Speed 9564.15 samples/sec   Loss 5.8702   LearningRate 0.0287   Epoch: 9   Global Step: 154830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:50,861-Speed 9275.98 samples/sec   Loss 5.8659   LearningRate 0.0287   Epoch: 9   Global Step: 154840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:51,932-Speed 9565.51 samples/sec   Loss 5.9159   LearningRate 0.0287   Epoch: 9   Global Step: 154850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:53,062-Speed 9068.89 samples/sec   Loss 5.8120   LearningRate 0.0287   Epoch: 9   Global Step: 154860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:54,133-Speed 9563.84 samples/sec   Loss 5.9382   LearningRate 0.0287   Epoch: 9   Global Step: 154870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:55,230-Speed 9338.63 samples/sec   Loss 5.8851   LearningRate 0.0287   Epoch: 9   Global Step: 154880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:56,380-Speed 8915.75 samples/sec   Loss 5.8270   LearningRate 0.0287   Epoch: 9   Global Step: 154890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:57,478-Speed 9328.29 samples/sec   Loss 5.9388   LearningRate 0.0287   Epoch: 9   Global Step: 154900   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:46:58,587-Speed 9239.25 samples/sec   Loss 5.8340   LearningRate 0.0287   Epoch: 9   Global Step: 154910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:46:59,653-Speed 9609.25 samples/sec   Loss 5.8582   LearningRate 0.0287   Epoch: 9   Global Step: 154920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:00,771-Speed 9163.10 samples/sec   Loss 5.8008   LearningRate 0.0287   Epoch: 9   Global Step: 154930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:01,842-Speed 9571.01 samples/sec   Loss 5.8457   LearningRate 0.0287   Epoch: 9   Global Step: 154940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:02,933-Speed 9387.61 samples/sec   Loss 5.9683   LearningRate 0.0287   Epoch: 9   Global Step: 154950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:04,039-Speed 9270.27 samples/sec   Loss 5.8472   LearningRate 0.0287   Epoch: 9   Global Step: 154960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:05,122-Speed 9462.17 samples/sec   Loss 5.7208   LearningRate 0.0287   Epoch: 9   Global Step: 154970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:06,201-Speed 9493.04 samples/sec   Loss 5.8468   LearningRate 0.0287   Epoch: 9   Global Step: 154980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:07,260-Speed 9677.49 samples/sec   Loss 5.8621   LearningRate 0.0287   Epoch: 9   Global Step: 154990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:08,335-Speed 9525.58 samples/sec   Loss 5.9174   LearningRate 0.0287   Epoch: 9   Global Step: 155000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:09,460-Speed 9111.47 samples/sec   Loss 5.8916   LearningRate 0.0287   Epoch: 9   Global Step: 155010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:10,555-Speed 9360.77 samples/sec   Loss 5.9144   LearningRate 0.0287   Epoch: 9   Global Step: 155020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:11,627-Speed 9553.85 samples/sec   Loss 5.8649   LearningRate 0.0287   Epoch: 9   Global Step: 155030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:12,715-Speed 9422.00 samples/sec   Loss 5.8111   LearningRate 0.0287   Epoch: 9   Global Step: 155040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:13,808-Speed 9369.50 samples/sec   Loss 5.9330   LearningRate 0.0287   Epoch: 9   Global Step: 155050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:14,901-Speed 9379.26 samples/sec   Loss 5.8672   LearningRate 0.0287   Epoch: 9   Global Step: 155060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:16,011-Speed 9229.99 samples/sec   Loss 5.7692   LearningRate 0.0287   Epoch: 9   Global Step: 155070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:17,184-Speed 8733.35 samples/sec   Loss 5.8211   LearningRate 0.0287   Epoch: 9   Global Step: 155080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:18,309-Speed 9112.12 samples/sec   Loss 5.8605   LearningRate 0.0287   Epoch: 9   Global Step: 155090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:19,436-Speed 9089.70 samples/sec   Loss 5.9269   LearningRate 0.0287   Epoch: 9   Global Step: 155100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:20,525-Speed 9403.11 samples/sec   Loss 5.8299   LearningRate 0.0287   Epoch: 9   Global Step: 155110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:21,642-Speed 9174.03 samples/sec   Loss 5.8053   LearningRate 0.0287   Epoch: 9   Global Step: 155120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:22,785-Speed 8966.36 samples/sec   Loss 5.9617   LearningRate 0.0287   Epoch: 9   Global Step: 155130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:47:23,868-Speed 9463.96 samples/sec   Loss 6.0034   LearningRate 0.0287   Epoch: 9   Global Step: 155140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:24,917-Speed 9763.19 samples/sec   Loss 5.8621   LearningRate 0.0286   Epoch: 9   Global Step: 155150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:26,026-Speed 9245.51 samples/sec   Loss 5.8653   LearningRate 0.0286   Epoch: 9   Global Step: 155160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:27,171-Speed 8945.36 samples/sec   Loss 5.7687   LearningRate 0.0286   Epoch: 9   Global Step: 155170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:28,303-Speed 9050.74 samples/sec   Loss 5.9192   LearningRate 0.0286   Epoch: 9   Global Step: 155180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:29,374-Speed 9565.09 samples/sec   Loss 5.7760   LearningRate 0.0286   Epoch: 9   Global Step: 155190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:30,477-Speed 9291.18 samples/sec   Loss 5.8544   LearningRate 0.0286   Epoch: 9   Global Step: 155200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:31,564-Speed 9421.38 samples/sec   Loss 5.9489   LearningRate 0.0286   Epoch: 9   Global Step: 155210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:32,657-Speed 9382.47 samples/sec   Loss 5.8638   LearningRate 0.0286   Epoch: 9   Global Step: 155220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:33,829-Speed 8737.05 samples/sec   Loss 5.8497   LearningRate 0.0286   Epoch: 9   Global Step: 155230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:34,942-Speed 9210.41 samples/sec   Loss 5.8753   LearningRate 0.0286   Epoch: 9   Global Step: 155240   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:47:36,106-Speed 8796.82 samples/sec   Loss 5.7669   LearningRate 0.0286   Epoch: 9   Global Step: 155250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:37,244-Speed 9004.43 samples/sec   Loss 5.8812   LearningRate 0.0286   Epoch: 9   Global Step: 155260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:38,370-Speed 9103.43 samples/sec   Loss 5.9139   LearningRate 0.0286   Epoch: 9   Global Step: 155270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:39,492-Speed 9131.44 samples/sec   Loss 5.8717   LearningRate 0.0286   Epoch: 9   Global Step: 155280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:40,579-Speed 9429.74 samples/sec   Loss 5.8872   LearningRate 0.0286   Epoch: 9   Global Step: 155290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:41,657-Speed 9507.43 samples/sec   Loss 5.8622   LearningRate 0.0286   Epoch: 9   Global Step: 155300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:42,768-Speed 9221.91 samples/sec   Loss 5.9048   LearningRate 0.0286   Epoch: 9   Global Step: 155310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:43,862-Speed 9369.20 samples/sec   Loss 5.8271   LearningRate 0.0286   Epoch: 9   Global Step: 155320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:44,996-Speed 9031.46 samples/sec   Loss 5.8436   LearningRate 0.0286   Epoch: 9   Global Step: 155330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:46,128-Speed 9052.83 samples/sec   Loss 5.9260   LearningRate 0.0286   Epoch: 9   Global Step: 155340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:47,213-Speed 9447.66 samples/sec   Loss 5.8838   LearningRate 0.0286   Epoch: 9   Global Step: 155350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:48,354-Speed 8978.03 samples/sec   Loss 5.9160   LearningRate 0.0286   Epoch: 9   Global Step: 155360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:49,543-Speed 8615.64 samples/sec   Loss 5.9714   LearningRate 0.0286   Epoch: 9   Global Step: 155370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:50,631-Speed 9415.29 samples/sec   Loss 5.8367   LearningRate 0.0286   Epoch: 9   Global Step: 155380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:51,847-Speed 8425.70 samples/sec   Loss 5.9285   LearningRate 0.0286   Epoch: 9   Global Step: 155390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:52,940-Speed 9375.54 samples/sec   Loss 5.6923   LearningRate 0.0286   Epoch: 9   Global Step: 155400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:54,078-Speed 9003.23 samples/sec   Loss 6.0165   LearningRate 0.0286   Epoch: 9   Global Step: 155410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:55,152-Speed 9538.34 samples/sec   Loss 5.9419   LearningRate 0.0286   Epoch: 9   Global Step: 155420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:56,268-Speed 9182.56 samples/sec   Loss 5.9526   LearningRate 0.0286   Epoch: 9   Global Step: 155430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:57,369-Speed 9304.45 samples/sec   Loss 5.8318   LearningRate 0.0286   Epoch: 9   Global Step: 155440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:47:58,502-Speed 9044.11 samples/sec   Loss 5.8544   LearningRate 0.0286   Epoch: 9   Global Step: 155450   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:47:59,597-Speed 9356.40 samples/sec   Loss 5.8897   LearningRate 0.0285   Epoch: 9   Global Step: 155460   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:48:00,701-Speed 9281.75 samples/sec   Loss 5.8490   LearningRate 0.0285   Epoch: 9   Global Step: 155470   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:48:01,852-Speed 8904.04 samples/sec   Loss 5.9104   LearningRate 0.0285   Epoch: 9   Global Step: 155480   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:48:02,935-Speed 9461.98 samples/sec   Loss 5.9612   LearningRate 0.0285   Epoch: 9   Global Step: 155490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:04,052-Speed 9175.89 samples/sec   Loss 5.9001   LearningRate 0.0285   Epoch: 9   Global Step: 155500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:05,146-Speed 9365.49 samples/sec   Loss 5.8905   LearningRate 0.0285   Epoch: 9   Global Step: 155510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:06,261-Speed 9188.17 samples/sec   Loss 5.8645   LearningRate 0.0285   Epoch: 9   Global Step: 155520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:07,359-Speed 9333.12 samples/sec   Loss 5.8072   LearningRate 0.0285   Epoch: 9   Global Step: 155530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:08,474-Speed 9190.18 samples/sec   Loss 5.9049   LearningRate 0.0285   Epoch: 9   Global Step: 155540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:09,544-Speed 9573.66 samples/sec   Loss 5.7435   LearningRate 0.0285   Epoch: 9   Global Step: 155550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:10,605-Speed 9660.12 samples/sec   Loss 5.7573   LearningRate 0.0285   Epoch: 9   Global Step: 155560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:11,701-Speed 9344.99 samples/sec   Loss 5.8354   LearningRate 0.0285   Epoch: 9   Global Step: 155570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:12,845-Speed 8955.99 samples/sec   Loss 5.9483   LearningRate 0.0285   Epoch: 9   Global Step: 155580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:13,923-Speed 9506.70 samples/sec   Loss 5.8531   LearningRate 0.0285   Epoch: 9   Global Step: 155590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:14,984-Speed 9656.28 samples/sec   Loss 5.9112   LearningRate 0.0285   Epoch: 9   Global Step: 155600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:16,067-Speed 9458.98 samples/sec   Loss 5.8524   LearningRate 0.0285   Epoch: 9   Global Step: 155610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:17,163-Speed 9350.25 samples/sec   Loss 5.9198   LearningRate 0.0285   Epoch: 9   Global Step: 155620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:18,271-Speed 9249.90 samples/sec   Loss 5.8092   LearningRate 0.0285   Epoch: 9   Global Step: 155630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:19,378-Speed 9248.72 samples/sec   Loss 5.9226   LearningRate 0.0285   Epoch: 9   Global Step: 155640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:20,448-Speed 9582.02 samples/sec   Loss 5.9072   LearningRate 0.0285   Epoch: 9   Global Step: 155650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:21,507-Speed 9673.43 samples/sec   Loss 6.0097   LearningRate 0.0285   Epoch: 9   Global Step: 155660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:22,602-Speed 9369.35 samples/sec   Loss 5.8969   LearningRate 0.0285   Epoch: 9   Global Step: 155670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:23,700-Speed 9324.60 samples/sec   Loss 5.8610   LearningRate 0.0285   Epoch: 9   Global Step: 155680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:24,800-Speed 9314.43 samples/sec   Loss 5.9073   LearningRate 0.0285   Epoch: 9   Global Step: 155690   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:48:25,883-Speed 9462.48 samples/sec   Loss 5.8262   LearningRate 0.0285   Epoch: 9   Global Step: 155700   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:48:26,959-Speed 9526.20 samples/sec   Loss 5.7722   LearningRate 0.0285   Epoch: 9   Global Step: 155710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:28,086-Speed 9089.72 samples/sec   Loss 5.8442   LearningRate 0.0285   Epoch: 9   Global Step: 155720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:29,166-Speed 9489.04 samples/sec   Loss 5.8532   LearningRate 0.0285   Epoch: 9   Global Step: 155730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:30,298-Speed 9045.05 samples/sec   Loss 5.9258   LearningRate 0.0285   Epoch: 9   Global Step: 155740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:31,376-Speed 9506.89 samples/sec   Loss 5.8747   LearningRate 0.0285   Epoch: 9   Global Step: 155750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:32,504-Speed 9087.82 samples/sec   Loss 5.9433   LearningRate 0.0285   Epoch: 9   Global Step: 155760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:33,586-Speed 9468.07 samples/sec   Loss 5.8261   LearningRate 0.0284   Epoch: 9   Global Step: 155770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:34,699-Speed 9201.43 samples/sec   Loss 5.9039   LearningRate 0.0284   Epoch: 9   Global Step: 155780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:35,763-Speed 9632.91 samples/sec   Loss 5.8010   LearningRate 0.0284   Epoch: 9   Global Step: 155790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:36,825-Speed 9649.80 samples/sec   Loss 5.9953   LearningRate 0.0284   Epoch: 9   Global Step: 155800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:37,867-Speed 9834.20 samples/sec   Loss 5.8612   LearningRate 0.0284   Epoch: 9   Global Step: 155810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:38,962-Speed 9364.22 samples/sec   Loss 5.9017   LearningRate 0.0284   Epoch: 9   Global Step: 155820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:40,070-Speed 9242.07 samples/sec   Loss 6.0365   LearningRate 0.0284   Epoch: 9   Global Step: 155830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:41,144-Speed 9544.36 samples/sec   Loss 5.8519   LearningRate 0.0284   Epoch: 9   Global Step: 155840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:42,240-Speed 9350.20 samples/sec   Loss 5.8726   LearningRate 0.0284   Epoch: 9   Global Step: 155850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:43,342-Speed 9295.80 samples/sec   Loss 5.9037   LearningRate 0.0284   Epoch: 9   Global Step: 155860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:44,393-Speed 9745.51 samples/sec   Loss 5.7417   LearningRate 0.0284   Epoch: 9   Global Step: 155870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:45,493-Speed 9313.76 samples/sec   Loss 5.8127   LearningRate 0.0284   Epoch: 9   Global Step: 155880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:46,594-Speed 9303.33 samples/sec   Loss 5.8331   LearningRate 0.0284   Epoch: 9   Global Step: 155890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:47,705-Speed 9227.75 samples/sec   Loss 6.0111   LearningRate 0.0284   Epoch: 9   Global Step: 155900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:48,808-Speed 9287.42 samples/sec   Loss 5.9193   LearningRate 0.0284   Epoch: 9   Global Step: 155910   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:48:49,909-Speed 9304.75 samples/sec   Loss 5.9428   LearningRate 0.0284   Epoch: 9   Global Step: 155920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:50,993-Speed 9452.39 samples/sec   Loss 5.8710   LearningRate 0.0284   Epoch: 9   Global Step: 155930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:52,157-Speed 8805.26 samples/sec   Loss 5.8271   LearningRate 0.0284   Epoch: 9   Global Step: 155940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:53,288-Speed 9057.38 samples/sec   Loss 5.8870   LearningRate 0.0284   Epoch: 9   Global Step: 155950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:54,372-Speed 9453.51 samples/sec   Loss 5.8946   LearningRate 0.0284   Epoch: 9   Global Step: 155960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:55,527-Speed 8865.34 samples/sec   Loss 5.9954   LearningRate 0.0284   Epoch: 9   Global Step: 155970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:56,677-Speed 8915.64 samples/sec   Loss 5.7752   LearningRate 0.0284   Epoch: 9   Global Step: 155980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:57,850-Speed 8730.03 samples/sec   Loss 5.8288   LearningRate 0.0284   Epoch: 9   Global Step: 155990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:48:58,940-Speed 9405.13 samples/sec   Loss 5.8820   LearningRate 0.0284   Epoch: 9   Global Step: 156000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:49:20,817-[lfw][156000]XNorm: 9.713624
Training: 2022-04-11 17:49:22,968-[lfw][156000]Accuracy-Flip: 0.99667+-0.00279
Training: 2022-04-11 17:49:22,969-[lfw][156000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:49:48,296-[cfp_fp][156000]XNorm: 8.345846
Training: 2022-04-11 17:49:48,296-[cfp_fp][156000]Accuracy-Flip: 0.95943+-0.01094
Training: 2022-04-11 17:49:48,297-[cfp_fp][156000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:50:10,002-[agedb_30][156000]XNorm: 9.458214
Training: 2022-04-11 17:50:10,003-[agedb_30][156000]Accuracy-Flip: 0.96667+-0.00969
Training: 2022-04-11 17:50:10,003-[agedb_30][156000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:50:11,085-Speed 141.94 samples/sec   Loss 5.9465   LearningRate 0.0284   Epoch: 9   Global Step: 156010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:12,220-Speed 9021.00 samples/sec   Loss 5.8702   LearningRate 0.0284   Epoch: 9   Global Step: 156020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:13,357-Speed 9016.21 samples/sec   Loss 5.8714   LearningRate 0.0284   Epoch: 9   Global Step: 156030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:14,439-Speed 9466.36 samples/sec   Loss 5.8165   LearningRate 0.0284   Epoch: 9   Global Step: 156040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:15,532-Speed 9375.60 samples/sec   Loss 5.9611   LearningRate 0.0284   Epoch: 9   Global Step: 156050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:16,644-Speed 9212.75 samples/sec   Loss 5.8173   LearningRate 0.0284   Epoch: 9   Global Step: 156060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:17,721-Speed 9513.55 samples/sec   Loss 5.9116   LearningRate 0.0284   Epoch: 9   Global Step: 156070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:18,822-Speed 9307.27 samples/sec   Loss 5.9637   LearningRate 0.0283   Epoch: 9   Global Step: 156080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:19,928-Speed 9258.39 samples/sec   Loss 5.8948   LearningRate 0.0283   Epoch: 9   Global Step: 156090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:20,983-Speed 9715.11 samples/sec   Loss 5.9115   LearningRate 0.0283   Epoch: 9   Global Step: 156100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:22,053-Speed 9576.58 samples/sec   Loss 5.9016   LearningRate 0.0283   Epoch: 9   Global Step: 156110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:23,143-Speed 9396.15 samples/sec   Loss 5.8822   LearningRate 0.0283   Epoch: 9   Global Step: 156120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:24,253-Speed 9233.33 samples/sec   Loss 5.9721   LearningRate 0.0283   Epoch: 9   Global Step: 156130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:25,326-Speed 9551.94 samples/sec   Loss 5.8867   LearningRate 0.0283   Epoch: 9   Global Step: 156140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:26,448-Speed 9131.78 samples/sec   Loss 5.9092   LearningRate 0.0283   Epoch: 9   Global Step: 156150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:28,349-Speed 5388.15 samples/sec   Loss 5.8540   LearningRate 0.0283   Epoch: 9   Global Step: 156160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:29,451-Speed 9296.57 samples/sec   Loss 5.9026   LearningRate 0.0283   Epoch: 9   Global Step: 156170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:30,534-Speed 9460.94 samples/sec   Loss 5.9065   LearningRate 0.0283   Epoch: 9   Global Step: 156180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:31,618-Speed 9457.22 samples/sec   Loss 5.8891   LearningRate 0.0283   Epoch: 9   Global Step: 156190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:32,726-Speed 9246.73 samples/sec   Loss 5.8493   LearningRate 0.0283   Epoch: 9   Global Step: 156200   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:50:33,839-Speed 9207.23 samples/sec   Loss 5.9235   LearningRate 0.0283   Epoch: 9   Global Step: 156210   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:50:34,913-Speed 9541.35 samples/sec   Loss 6.0010   LearningRate 0.0283   Epoch: 9   Global Step: 156220   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:50:35,973-Speed 9665.09 samples/sec   Loss 5.9126   LearningRate 0.0283   Epoch: 9   Global Step: 156230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:37,074-Speed 9307.82 samples/sec   Loss 5.8124   LearningRate 0.0283   Epoch: 9   Global Step: 156240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:38,174-Speed 9307.74 samples/sec   Loss 5.9093   LearningRate 0.0283   Epoch: 9   Global Step: 156250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:39,232-Speed 9687.60 samples/sec   Loss 5.8943   LearningRate 0.0283   Epoch: 9   Global Step: 156260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:40,325-Speed 9374.84 samples/sec   Loss 5.8247   LearningRate 0.0283   Epoch: 9   Global Step: 156270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:41,429-Speed 9280.22 samples/sec   Loss 5.9039   LearningRate 0.0283   Epoch: 9   Global Step: 156280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:42,566-Speed 9012.85 samples/sec   Loss 5.8606   LearningRate 0.0283   Epoch: 9   Global Step: 156290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:43,682-Speed 9183.00 samples/sec   Loss 5.8785   LearningRate 0.0283   Epoch: 9   Global Step: 156300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:44,771-Speed 9404.78 samples/sec   Loss 5.9240   LearningRate 0.0283   Epoch: 9   Global Step: 156310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:45,850-Speed 9494.32 samples/sec   Loss 5.8905   LearningRate 0.0283   Epoch: 9   Global Step: 156320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:50:46,955-Speed 9271.78 samples/sec   Loss 5.9716   LearningRate 0.0283   Epoch: 9   Global Step: 156330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:48,082-Speed 9088.46 samples/sec   Loss 5.8732   LearningRate 0.0283   Epoch: 9   Global Step: 156340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:49,202-Speed 9154.87 samples/sec   Loss 5.9663   LearningRate 0.0283   Epoch: 9   Global Step: 156350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:50,297-Speed 9356.57 samples/sec   Loss 5.9125   LearningRate 0.0283   Epoch: 9   Global Step: 156360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:51,436-Speed 8989.05 samples/sec   Loss 5.8029   LearningRate 0.0283   Epoch: 9   Global Step: 156370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:52,513-Speed 9523.61 samples/sec   Loss 5.9286   LearningRate 0.0283   Epoch: 9   Global Step: 156380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:53,685-Speed 8734.98 samples/sec   Loss 5.9391   LearningRate 0.0283   Epoch: 9   Global Step: 156390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:54,801-Speed 9181.52 samples/sec   Loss 5.9697   LearningRate 0.0282   Epoch: 9   Global Step: 156400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:55,875-Speed 9539.54 samples/sec   Loss 5.7671   LearningRate 0.0282   Epoch: 9   Global Step: 156410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:56,955-Speed 9488.63 samples/sec   Loss 5.9357   LearningRate 0.0282   Epoch: 9   Global Step: 156420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:50:58,048-Speed 9375.56 samples/sec   Loss 5.8279   LearningRate 0.0282   Epoch: 9   Global Step: 156430   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:50:59,133-Speed 9450.09 samples/sec   Loss 5.7900   LearningRate 0.0282   Epoch: 9   Global Step: 156440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:00,272-Speed 8991.61 samples/sec   Loss 5.9051   LearningRate 0.0282   Epoch: 9   Global Step: 156450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:01,353-Speed 9480.97 samples/sec   Loss 5.8438   LearningRate 0.0282   Epoch: 9   Global Step: 156460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:02,503-Speed 8908.76 samples/sec   Loss 5.8411   LearningRate 0.0282   Epoch: 9   Global Step: 156470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:03,603-Speed 9313.90 samples/sec   Loss 5.7655   LearningRate 0.0282   Epoch: 9   Global Step: 156480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:04,682-Speed 9499.41 samples/sec   Loss 5.8603   LearningRate 0.0282   Epoch: 9   Global Step: 156490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:05,785-Speed 9292.23 samples/sec   Loss 5.9192   LearningRate 0.0282   Epoch: 9   Global Step: 156500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:06,876-Speed 9386.53 samples/sec   Loss 5.8654   LearningRate 0.0282   Epoch: 9   Global Step: 156510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:07,987-Speed 9219.67 samples/sec   Loss 5.9038   LearningRate 0.0282   Epoch: 9   Global Step: 156520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:09,045-Speed 9687.30 samples/sec   Loss 5.9080   LearningRate 0.0282   Epoch: 9   Global Step: 156530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:10,176-Speed 9061.20 samples/sec   Loss 5.9538   LearningRate 0.0282   Epoch: 9   Global Step: 156540   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:51:11,265-Speed 9412.22 samples/sec   Loss 5.9350   LearningRate 0.0282   Epoch: 9   Global Step: 156550   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:51:12,391-Speed 9096.97 samples/sec   Loss 5.8453   LearningRate 0.0282   Epoch: 9   Global Step: 156560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:13,490-Speed 9319.03 samples/sec   Loss 5.9571   LearningRate 0.0282   Epoch: 9   Global Step: 156570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:14,546-Speed 9708.89 samples/sec   Loss 5.8314   LearningRate 0.0282   Epoch: 9   Global Step: 156580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:15,649-Speed 9292.32 samples/sec   Loss 5.9765   LearningRate 0.0282   Epoch: 9   Global Step: 156590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:16,768-Speed 9154.04 samples/sec   Loss 5.9113   LearningRate 0.0282   Epoch: 9   Global Step: 156600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:17,898-Speed 9067.00 samples/sec   Loss 5.8634   LearningRate 0.0282   Epoch: 9   Global Step: 156610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:51:19,024-Speed 9101.90 samples/sec   Loss 5.8954   LearningRate 0.0282   Epoch: 9   Global Step: 156620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:51:20,171-Speed 8931.28 samples/sec   Loss 5.9169   LearningRate 0.0282   Epoch: 9   Global Step: 156630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:51:21,254-Speed 9460.78 samples/sec   Loss 5.8991   LearningRate 0.0282   Epoch: 9   Global Step: 156640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:51:22,328-Speed 9537.89 samples/sec   Loss 5.8092   LearningRate 0.0282   Epoch: 9   Global Step: 156650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:51:23,440-Speed 9220.21 samples/sec   Loss 5.9533   LearningRate 0.0282   Epoch: 9   Global Step: 156660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:51:24,563-Speed 9125.46 samples/sec   Loss 6.0474   LearningRate 0.0282   Epoch: 9   Global Step: 156670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:51:25,721-Speed 8844.86 samples/sec   Loss 5.7505   LearningRate 0.0282   Epoch: 9   Global Step: 156680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:51:26,811-Speed 9400.08 samples/sec   Loss 5.8844   LearningRate 0.0282   Epoch: 9   Global Step: 156690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:51:27,944-Speed 9038.50 samples/sec   Loss 6.0465   LearningRate 0.0282   Epoch: 9   Global Step: 156700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:51:29,074-Speed 9067.24 samples/sec   Loss 5.9100   LearningRate 0.0281   Epoch: 9   Global Step: 156710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:30,135-Speed 9657.36 samples/sec   Loss 5.8981   LearningRate 0.0281   Epoch: 9   Global Step: 156720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:31,197-Speed 9647.18 samples/sec   Loss 5.8971   LearningRate 0.0281   Epoch: 9   Global Step: 156730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:32,274-Speed 9521.15 samples/sec   Loss 5.8976   LearningRate 0.0281   Epoch: 9   Global Step: 156740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:33,417-Speed 8966.33 samples/sec   Loss 5.9003   LearningRate 0.0281   Epoch: 9   Global Step: 156750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:34,538-Speed 9135.63 samples/sec   Loss 5.8149   LearningRate 0.0281   Epoch: 9   Global Step: 156760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:35,636-Speed 9332.76 samples/sec   Loss 5.9598   LearningRate 0.0281   Epoch: 9   Global Step: 156770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:36,728-Speed 9387.62 samples/sec   Loss 5.9075   LearningRate 0.0281   Epoch: 9   Global Step: 156780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:37,815-Speed 9425.64 samples/sec   Loss 5.8698   LearningRate 0.0281   Epoch: 9   Global Step: 156790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:38,920-Speed 9268.30 samples/sec   Loss 5.9658   LearningRate 0.0281   Epoch: 9   Global Step: 156800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:40,059-Speed 8993.95 samples/sec   Loss 6.0604   LearningRate 0.0281   Epoch: 9   Global Step: 156810   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:51:41,143-Speed 9460.67 samples/sec   Loss 5.8889   LearningRate 0.0281   Epoch: 9   Global Step: 156820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:42,315-Speed 8743.51 samples/sec   Loss 6.0112   LearningRate 0.0281   Epoch: 9   Global Step: 156830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:43,435-Speed 9144.61 samples/sec   Loss 5.8655   LearningRate 0.0281   Epoch: 9   Global Step: 156840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:44,578-Speed 8962.30 samples/sec   Loss 5.7895   LearningRate 0.0281   Epoch: 9   Global Step: 156850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:45,628-Speed 9760.07 samples/sec   Loss 5.8863   LearningRate 0.0281   Epoch: 9   Global Step: 156860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:46,720-Speed 9387.42 samples/sec   Loss 5.9172   LearningRate 0.0281   Epoch: 9   Global Step: 156870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:47,802-Speed 9469.55 samples/sec   Loss 5.9983   LearningRate 0.0281   Epoch: 9   Global Step: 156880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:48,894-Speed 9385.21 samples/sec   Loss 5.9281   LearningRate 0.0281   Epoch: 9   Global Step: 156890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:49,988-Speed 9357.27 samples/sec   Loss 5.9388   LearningRate 0.0281   Epoch: 9   Global Step: 156900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:51,062-Speed 9549.90 samples/sec   Loss 5.9442   LearningRate 0.0281   Epoch: 9   Global Step: 156910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:52,168-Speed 9261.10 samples/sec   Loss 5.8548   LearningRate 0.0281   Epoch: 9   Global Step: 156920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:53,253-Speed 9441.92 samples/sec   Loss 5.8229   LearningRate 0.0281   Epoch: 9   Global Step: 156930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:54,347-Speed 9366.23 samples/sec   Loss 5.9431   LearningRate 0.0281   Epoch: 9   Global Step: 156940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:55,468-Speed 9142.54 samples/sec   Loss 5.8217   LearningRate 0.0281   Epoch: 9   Global Step: 156950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:56,599-Speed 9059.17 samples/sec   Loss 5.8440   LearningRate 0.0281   Epoch: 9   Global Step: 156960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:57,724-Speed 9104.61 samples/sec   Loss 5.9347   LearningRate 0.0281   Epoch: 9   Global Step: 156970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:58,856-Speed 9053.66 samples/sec   Loss 5.7996   LearningRate 0.0281   Epoch: 9   Global Step: 156980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:51:59,983-Speed 9093.73 samples/sec   Loss 5.8664   LearningRate 0.0281   Epoch: 9   Global Step: 156990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:01,141-Speed 8844.24 samples/sec   Loss 6.0549   LearningRate 0.0281   Epoch: 9   Global Step: 157000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:02,258-Speed 9172.80 samples/sec   Loss 5.9178   LearningRate 0.0281   Epoch: 9   Global Step: 157010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:03,361-Speed 9291.01 samples/sec   Loss 5.8184   LearningRate 0.0281   Epoch: 9   Global Step: 157020   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:52:04,433-Speed 9564.75 samples/sec   Loss 5.8238   LearningRate 0.0280   Epoch: 9   Global Step: 157030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:05,528-Speed 9350.32 samples/sec   Loss 5.9026   LearningRate 0.0280   Epoch: 9   Global Step: 157040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:06,601-Speed 9555.13 samples/sec   Loss 5.8211   LearningRate 0.0280   Epoch: 9   Global Step: 157050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:07,683-Speed 9461.62 samples/sec   Loss 5.9199   LearningRate 0.0280   Epoch: 9   Global Step: 157060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:52:08,811-Speed 9088.49 samples/sec   Loss 5.9123   LearningRate 0.0280   Epoch: 9   Global Step: 157070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:52:09,895-Speed 9453.09 samples/sec   Loss 5.8972   LearningRate 0.0280   Epoch: 9   Global Step: 157080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:52:11,012-Speed 9166.14 samples/sec   Loss 5.9579   LearningRate 0.0280   Epoch: 9   Global Step: 157090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:52:12,111-Speed 9328.40 samples/sec   Loss 6.0105   LearningRate 0.0280   Epoch: 9   Global Step: 157100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:52:13,203-Speed 9381.50 samples/sec   Loss 5.9831   LearningRate 0.0280   Epoch: 9   Global Step: 157110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:52:14,308-Speed 9274.65 samples/sec   Loss 6.0215   LearningRate 0.0280   Epoch: 9   Global Step: 157120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:52:15,430-Speed 9129.06 samples/sec   Loss 5.9081   LearningRate 0.0280   Epoch: 9   Global Step: 157130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:52:16,544-Speed 9199.62 samples/sec   Loss 5.8884   LearningRate 0.0280   Epoch: 9   Global Step: 157140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:52:17,721-Speed 8706.55 samples/sec   Loss 6.0490   LearningRate 0.0280   Epoch: 9   Global Step: 157150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:52:18,795-Speed 9537.61 samples/sec   Loss 5.9570   LearningRate 0.0280   Epoch: 9   Global Step: 157160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:19,883-Speed 9411.57 samples/sec   Loss 5.8771   LearningRate 0.0280   Epoch: 9   Global Step: 157170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:20,994-Speed 9223.04 samples/sec   Loss 5.9229   LearningRate 0.0280   Epoch: 9   Global Step: 157180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:22,102-Speed 9256.32 samples/sec   Loss 6.0146   LearningRate 0.0280   Epoch: 9   Global Step: 157190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:23,190-Speed 9416.86 samples/sec   Loss 5.8578   LearningRate 0.0280   Epoch: 9   Global Step: 157200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:24,293-Speed 9291.56 samples/sec   Loss 5.8549   LearningRate 0.0280   Epoch: 9   Global Step: 157210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:25,396-Speed 9293.51 samples/sec   Loss 5.7404   LearningRate 0.0280   Epoch: 9   Global Step: 157220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:26,512-Speed 9177.55 samples/sec   Loss 5.9113   LearningRate 0.0280   Epoch: 9   Global Step: 157230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:27,617-Speed 9271.59 samples/sec   Loss 5.9845   LearningRate 0.0280   Epoch: 9   Global Step: 157240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:28,745-Speed 9083.37 samples/sec   Loss 5.9556   LearningRate 0.0280   Epoch: 9   Global Step: 157250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:29,827-Speed 9478.27 samples/sec   Loss 5.8488   LearningRate 0.0280   Epoch: 9   Global Step: 157260   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:52:30,946-Speed 9153.16 samples/sec   Loss 5.8995   LearningRate 0.0280   Epoch: 9   Global Step: 157270   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:52:32,064-Speed 9166.67 samples/sec   Loss 5.9873   LearningRate 0.0280   Epoch: 9   Global Step: 157280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:33,143-Speed 9488.30 samples/sec   Loss 5.9442   LearningRate 0.0280   Epoch: 9   Global Step: 157290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:34,241-Speed 9334.57 samples/sec   Loss 5.9183   LearningRate 0.0280   Epoch: 9   Global Step: 157300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:35,337-Speed 9355.84 samples/sec   Loss 5.9564   LearningRate 0.0280   Epoch: 9   Global Step: 157310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:36,461-Speed 9114.10 samples/sec   Loss 5.8904   LearningRate 0.0280   Epoch: 9   Global Step: 157320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:37,583-Speed 9132.29 samples/sec   Loss 5.8867   LearningRate 0.0280   Epoch: 9   Global Step: 157330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:38,711-Speed 9079.88 samples/sec   Loss 5.9181   LearningRate 0.0279   Epoch: 9   Global Step: 157340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:39,811-Speed 9317.26 samples/sec   Loss 5.8933   LearningRate 0.0279   Epoch: 9   Global Step: 157350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:40,932-Speed 9136.57 samples/sec   Loss 5.8917   LearningRate 0.0279   Epoch: 9   Global Step: 157360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:42,041-Speed 9249.51 samples/sec   Loss 5.8715   LearningRate 0.0279   Epoch: 9   Global Step: 157370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:43,069-Speed 9958.77 samples/sec   Loss 5.9122   LearningRate 0.0279   Epoch: 9   Global Step: 157380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:44,145-Speed 9525.50 samples/sec   Loss 5.9069   LearningRate 0.0279   Epoch: 9   Global Step: 157390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:45,229-Speed 9455.50 samples/sec   Loss 5.8179   LearningRate 0.0279   Epoch: 9   Global Step: 157400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:46,311-Speed 9467.88 samples/sec   Loss 5.9097   LearningRate 0.0279   Epoch: 9   Global Step: 157410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:47,402-Speed 9390.08 samples/sec   Loss 5.9228   LearningRate 0.0279   Epoch: 9   Global Step: 157420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:48,527-Speed 9103.83 samples/sec   Loss 5.9143   LearningRate 0.0279   Epoch: 9   Global Step: 157430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:49,638-Speed 9225.10 samples/sec   Loss 5.8767   LearningRate 0.0279   Epoch: 9   Global Step: 157440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:50,721-Speed 9457.63 samples/sec   Loss 5.9320   LearningRate 0.0279   Epoch: 9   Global Step: 157450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:51,836-Speed 9193.67 samples/sec   Loss 6.0009   LearningRate 0.0279   Epoch: 9   Global Step: 157460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:52,945-Speed 9243.04 samples/sec   Loss 5.8988   LearningRate 0.0279   Epoch: 9   Global Step: 157470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:54,057-Speed 9214.10 samples/sec   Loss 5.9625   LearningRate 0.0279   Epoch: 9   Global Step: 157480   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:52:55,201-Speed 8956.10 samples/sec   Loss 5.8525   LearningRate 0.0279   Epoch: 9   Global Step: 157490   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:52:56,281-Speed 9486.18 samples/sec   Loss 5.9136   LearningRate 0.0279   Epoch: 9   Global Step: 157500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:57,340-Speed 9674.64 samples/sec   Loss 5.8458   LearningRate 0.0279   Epoch: 9   Global Step: 157510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:58,454-Speed 9192.73 samples/sec   Loss 5.8647   LearningRate 0.0279   Epoch: 9   Global Step: 157520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:52:59,536-Speed 9475.42 samples/sec   Loss 5.8551   LearningRate 0.0279   Epoch: 9   Global Step: 157530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:00,648-Speed 9217.31 samples/sec   Loss 5.9117   LearningRate 0.0279   Epoch: 9   Global Step: 157540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:01,724-Speed 9519.39 samples/sec   Loss 5.9545   LearningRate 0.0279   Epoch: 9   Global Step: 157550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:02,814-Speed 9404.78 samples/sec   Loss 6.0121   LearningRate 0.0279   Epoch: 9   Global Step: 157560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:03,958-Speed 8955.74 samples/sec   Loss 5.9317   LearningRate 0.0279   Epoch: 9   Global Step: 157570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:05,023-Speed 9618.90 samples/sec   Loss 5.8907   LearningRate 0.0279   Epoch: 9   Global Step: 157580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:06,116-Speed 9372.01 samples/sec   Loss 6.0768   LearningRate 0.0279   Epoch: 9   Global Step: 157590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:07,222-Speed 9265.20 samples/sec   Loss 5.9078   LearningRate 0.0279   Epoch: 9   Global Step: 157600   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:53:08,340-Speed 9163.98 samples/sec   Loss 6.0379   LearningRate 0.0279   Epoch: 9   Global Step: 157610   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:53:09,435-Speed 9359.14 samples/sec   Loss 5.9205   LearningRate 0.0279   Epoch: 9   Global Step: 157620   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:53:10,533-Speed 9323.96 samples/sec   Loss 5.9642   LearningRate 0.0279   Epoch: 9   Global Step: 157630   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:53:11,642-Speed 9244.88 samples/sec   Loss 5.9349   LearningRate 0.0279   Epoch: 9   Global Step: 157640   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:53:12,760-Speed 9166.11 samples/sec   Loss 5.8554   LearningRate 0.0279   Epoch: 9   Global Step: 157650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:13,857-Speed 9337.54 samples/sec   Loss 5.8610   LearningRate 0.0278   Epoch: 9   Global Step: 157660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:14,941-Speed 9452.77 samples/sec   Loss 5.8989   LearningRate 0.0278   Epoch: 9   Global Step: 157670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:16,014-Speed 9546.65 samples/sec   Loss 5.9488   LearningRate 0.0278   Epoch: 9   Global Step: 157680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:17,137-Speed 9127.32 samples/sec   Loss 5.8894   LearningRate 0.0278   Epoch: 9   Global Step: 157690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:18,224-Speed 9429.40 samples/sec   Loss 5.9570   LearningRate 0.0278   Epoch: 9   Global Step: 157700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:19,291-Speed 9596.56 samples/sec   Loss 5.9251   LearningRate 0.0278   Epoch: 9   Global Step: 157710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:20,386-Speed 9359.35 samples/sec   Loss 5.9496   LearningRate 0.0278   Epoch: 9   Global Step: 157720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:21,486-Speed 9316.71 samples/sec   Loss 5.9784   LearningRate 0.0278   Epoch: 9   Global Step: 157730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:22,618-Speed 9053.06 samples/sec   Loss 5.9676   LearningRate 0.0278   Epoch: 9   Global Step: 157740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:23,695-Speed 9512.56 samples/sec   Loss 5.8894   LearningRate 0.0278   Epoch: 9   Global Step: 157750   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:53:24,808-Speed 9208.00 samples/sec   Loss 5.9902   LearningRate 0.0278   Epoch: 9   Global Step: 157760   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:53:25,909-Speed 9302.84 samples/sec   Loss 5.8570   LearningRate 0.0278   Epoch: 9   Global Step: 157770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:26,976-Speed 9603.87 samples/sec   Loss 6.0142   LearningRate 0.0278   Epoch: 9   Global Step: 157780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:28,071-Speed 9359.19 samples/sec   Loss 5.9498   LearningRate 0.0278   Epoch: 9   Global Step: 157790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:29,197-Speed 9097.70 samples/sec   Loss 6.0132   LearningRate 0.0278   Epoch: 9   Global Step: 157800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:30,313-Speed 9179.26 samples/sec   Loss 5.8921   LearningRate 0.0278   Epoch: 9   Global Step: 157810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:31,436-Speed 9124.23 samples/sec   Loss 5.9053   LearningRate 0.0278   Epoch: 9   Global Step: 157820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:32,521-Speed 9446.92 samples/sec   Loss 5.9147   LearningRate 0.0278   Epoch: 9   Global Step: 157830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:33,602-Speed 9479.23 samples/sec   Loss 5.9756   LearningRate 0.0278   Epoch: 9   Global Step: 157840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:34,691-Speed 9401.26 samples/sec   Loss 5.8806   LearningRate 0.0278   Epoch: 9   Global Step: 157850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:35,777-Speed 9438.01 samples/sec   Loss 6.0452   LearningRate 0.0278   Epoch: 9   Global Step: 157860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:36,867-Speed 9397.69 samples/sec   Loss 5.9240   LearningRate 0.0278   Epoch: 9   Global Step: 157870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:37,956-Speed 9418.13 samples/sec   Loss 5.9440   LearningRate 0.0278   Epoch: 9   Global Step: 157880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:39,042-Speed 9429.85 samples/sec   Loss 5.9796   LearningRate 0.0278   Epoch: 9   Global Step: 157890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:40,145-Speed 9296.12 samples/sec   Loss 5.8964   LearningRate 0.0278   Epoch: 9   Global Step: 157900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:41,256-Speed 9221.33 samples/sec   Loss 5.8510   LearningRate 0.0278   Epoch: 9   Global Step: 157910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:42,344-Speed 9416.76 samples/sec   Loss 6.0100   LearningRate 0.0278   Epoch: 9   Global Step: 157920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:43,419-Speed 9532.85 samples/sec   Loss 5.9421   LearningRate 0.0278   Epoch: 9   Global Step: 157930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:44,501-Speed 9469.92 samples/sec   Loss 5.8777   LearningRate 0.0278   Epoch: 9   Global Step: 157940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:45,581-Speed 9484.75 samples/sec   Loss 5.9997   LearningRate 0.0278   Epoch: 9   Global Step: 157950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:46,667-Speed 9433.13 samples/sec   Loss 5.8389   LearningRate 0.0278   Epoch: 9   Global Step: 157960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:47,754-Speed 9423.50 samples/sec   Loss 5.7857   LearningRate 0.0277   Epoch: 9   Global Step: 157970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:53:48,856-Speed 9298.60 samples/sec   Loss 5.9654   LearningRate 0.0277   Epoch: 9   Global Step: 157980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:49,940-Speed 9452.86 samples/sec   Loss 5.8832   LearningRate 0.0277   Epoch: 9   Global Step: 157990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:53:51,006-Speed 9614.19 samples/sec   Loss 5.9040   LearningRate 0.0277   Epoch: 9   Global Step: 158000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:54:13,014-[lfw][158000]XNorm: 9.898848
Training: 2022-04-11 17:54:13,015-[lfw][158000]Accuracy-Flip: 0.99683+-0.00320
Training: 2022-04-11 17:54:13,015-[lfw][158000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:54:38,441-[cfp_fp][158000]XNorm: 8.480223
Training: 2022-04-11 17:54:38,442-[cfp_fp][158000]Accuracy-Flip: 0.96086+-0.00819
Training: 2022-04-11 17:54:38,442-[cfp_fp][158000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:55:00,351-[agedb_30][158000]XNorm: 9.613047
Training: 2022-04-11 17:55:00,352-[agedb_30][158000]Accuracy-Flip: 0.96667+-0.00937
Training: 2022-04-11 17:55:00,353-[agedb_30][158000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:55:01,474-Speed 145.31 samples/sec   Loss 5.9447   LearningRate 0.0277   Epoch: 9   Global Step: 158010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:02,576-Speed 9303.58 samples/sec   Loss 5.8995   LearningRate 0.0277   Epoch: 9   Global Step: 158020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:03,733-Speed 8851.23 samples/sec   Loss 5.9123   LearningRate 0.0277   Epoch: 9   Global Step: 158030   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:04,819-Speed 9440.63 samples/sec   Loss 5.9192   LearningRate 0.0277   Epoch: 9   Global Step: 158040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:05,911-Speed 9376.98 samples/sec   Loss 5.9141   LearningRate 0.0277   Epoch: 9   Global Step: 158050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:06,988-Speed 9512.15 samples/sec   Loss 5.9469   LearningRate 0.0277   Epoch: 9   Global Step: 158060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:08,071-Speed 9472.02 samples/sec   Loss 5.8988   LearningRate 0.0277   Epoch: 9   Global Step: 158070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:09,170-Speed 9317.67 samples/sec   Loss 5.8958   LearningRate 0.0277   Epoch: 9   Global Step: 158080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:10,255-Speed 9448.45 samples/sec   Loss 5.8716   LearningRate 0.0277   Epoch: 9   Global Step: 158090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:11,364-Speed 9239.22 samples/sec   Loss 5.8386   LearningRate 0.0277   Epoch: 9   Global Step: 158100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:12,449-Speed 9441.24 samples/sec   Loss 5.9266   LearningRate 0.0277   Epoch: 9   Global Step: 158110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:13,560-Speed 9223.97 samples/sec   Loss 5.9345   LearningRate 0.0277   Epoch: 9   Global Step: 158120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:14,656-Speed 9349.56 samples/sec   Loss 5.8590   LearningRate 0.0277   Epoch: 9   Global Step: 158130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:15,746-Speed 9398.64 samples/sec   Loss 5.9098   LearningRate 0.0277   Epoch: 9   Global Step: 158140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:16,855-Speed 9239.18 samples/sec   Loss 5.8890   LearningRate 0.0277   Epoch: 9   Global Step: 158150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:17,954-Speed 9321.32 samples/sec   Loss 5.9714   LearningRate 0.0277   Epoch: 9   Global Step: 158160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:19,050-Speed 9352.06 samples/sec   Loss 5.9765   LearningRate 0.0277   Epoch: 9   Global Step: 158170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:20,139-Speed 9404.67 samples/sec   Loss 5.9547   LearningRate 0.0277   Epoch: 9   Global Step: 158180   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:55:21,241-Speed 9302.52 samples/sec   Loss 6.0376   LearningRate 0.0277   Epoch: 9   Global Step: 158190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:22,296-Speed 9708.38 samples/sec   Loss 5.9317   LearningRate 0.0277   Epoch: 9   Global Step: 158200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:23,432-Speed 9020.42 samples/sec   Loss 5.9226   LearningRate 0.0277   Epoch: 9   Global Step: 158210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:24,521-Speed 9407.60 samples/sec   Loss 5.8582   LearningRate 0.0277   Epoch: 9   Global Step: 158220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:25,603-Speed 9471.83 samples/sec   Loss 5.8937   LearningRate 0.0277   Epoch: 9   Global Step: 158230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:26,692-Speed 9413.15 samples/sec   Loss 5.9833   LearningRate 0.0277   Epoch: 9   Global Step: 158240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:27,844-Speed 8893.23 samples/sec   Loss 5.8818   LearningRate 0.0277   Epoch: 9   Global Step: 158250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:28,941-Speed 9339.60 samples/sec   Loss 5.9243   LearningRate 0.0277   Epoch: 9   Global Step: 158260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:30,095-Speed 8879.01 samples/sec   Loss 6.0617   LearningRate 0.0277   Epoch: 9   Global Step: 158270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:31,219-Speed 9121.40 samples/sec   Loss 5.8729   LearningRate 0.0277   Epoch: 9   Global Step: 158280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:32,385-Speed 8781.98 samples/sec   Loss 5.8859   LearningRate 0.0276   Epoch: 9   Global Step: 158290   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:55:33,458-Speed 9553.89 samples/sec   Loss 5.9368   LearningRate 0.0276   Epoch: 9   Global Step: 158300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:34,576-Speed 9160.38 samples/sec   Loss 5.8827   LearningRate 0.0276   Epoch: 9   Global Step: 158310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:35,640-Speed 9637.96 samples/sec   Loss 5.9557   LearningRate 0.0276   Epoch: 9   Global Step: 158320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:36,732-Speed 9380.54 samples/sec   Loss 5.9703   LearningRate 0.0276   Epoch: 9   Global Step: 158330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:37,838-Speed 9263.84 samples/sec   Loss 5.9940   LearningRate 0.0276   Epoch: 9   Global Step: 158340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:38,971-Speed 9037.70 samples/sec   Loss 5.9224   LearningRate 0.0276   Epoch: 9   Global Step: 158350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:40,118-Speed 8938.98 samples/sec   Loss 5.9229   LearningRate 0.0276   Epoch: 9   Global Step: 158360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:41,217-Speed 9317.09 samples/sec   Loss 5.9126   LearningRate 0.0276   Epoch: 9   Global Step: 158370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:42,331-Speed 9195.51 samples/sec   Loss 5.8487   LearningRate 0.0276   Epoch: 9   Global Step: 158380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:43,488-Speed 8865.33 samples/sec   Loss 5.7950   LearningRate 0.0276   Epoch: 9   Global Step: 158390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:44,602-Speed 9194.62 samples/sec   Loss 6.0188   LearningRate 0.0276   Epoch: 9   Global Step: 158400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:45,721-Speed 9153.60 samples/sec   Loss 5.9555   LearningRate 0.0276   Epoch: 9   Global Step: 158410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:55:46,830-Speed 9243.04 samples/sec   Loss 5.9145   LearningRate 0.0276   Epoch: 9   Global Step: 158420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:48,011-Speed 8675.62 samples/sec   Loss 5.9344   LearningRate 0.0276   Epoch: 9   Global Step: 158430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:49,105-Speed 9368.89 samples/sec   Loss 5.9602   LearningRate 0.0276   Epoch: 9   Global Step: 158440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:50,206-Speed 9301.23 samples/sec   Loss 5.9890   LearningRate 0.0276   Epoch: 9   Global Step: 158450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:51,326-Speed 9150.65 samples/sec   Loss 5.9595   LearningRate 0.0276   Epoch: 9   Global Step: 158460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:52,432-Speed 9263.74 samples/sec   Loss 6.0480   LearningRate 0.0276   Epoch: 9   Global Step: 158470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:53,517-Speed 9444.46 samples/sec   Loss 6.0005   LearningRate 0.0276   Epoch: 9   Global Step: 158480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:54,629-Speed 9212.31 samples/sec   Loss 6.0151   LearningRate 0.0276   Epoch: 9   Global Step: 158490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:55,770-Speed 8976.87 samples/sec   Loss 5.9863   LearningRate 0.0276   Epoch: 9   Global Step: 158500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:56,892-Speed 9137.28 samples/sec   Loss 5.9941   LearningRate 0.0276   Epoch: 9   Global Step: 158510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:55:57,986-Speed 9364.01 samples/sec   Loss 5.9450   LearningRate 0.0276   Epoch: 9   Global Step: 158520   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:55:59,058-Speed 9564.31 samples/sec   Loss 5.9072   LearningRate 0.0276   Epoch: 9   Global Step: 158530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:00,138-Speed 9483.13 samples/sec   Loss 5.9586   LearningRate 0.0276   Epoch: 9   Global Step: 158540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:01,248-Speed 9230.08 samples/sec   Loss 5.8859   LearningRate 0.0276   Epoch: 9   Global Step: 158550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:02,338-Speed 9407.30 samples/sec   Loss 5.9219   LearningRate 0.0276   Epoch: 9   Global Step: 158560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:03,448-Speed 9228.54 samples/sec   Loss 5.8735   LearningRate 0.0276   Epoch: 9   Global Step: 158570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:04,543-Speed 9365.50 samples/sec   Loss 5.9284   LearningRate 0.0276   Epoch: 9   Global Step: 158580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:05,663-Speed 9147.70 samples/sec   Loss 5.8539   LearningRate 0.0276   Epoch: 9   Global Step: 158590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:06,759-Speed 9344.48 samples/sec   Loss 5.8671   LearningRate 0.0276   Epoch: 9   Global Step: 158600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:09,467-Speed 3781.80 samples/sec   Loss 5.8816   LearningRate 0.0275   Epoch: 9   Global Step: 158610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:10,576-Speed 9241.08 samples/sec   Loss 5.8839   LearningRate 0.0275   Epoch: 9   Global Step: 158620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:11,673-Speed 9336.15 samples/sec   Loss 5.8836   LearningRate 0.0275   Epoch: 9   Global Step: 158630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:12,783-Speed 9234.11 samples/sec   Loss 5.8373   LearningRate 0.0275   Epoch: 9   Global Step: 158640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:13,847-Speed 9632.66 samples/sec   Loss 5.9551   LearningRate 0.0275   Epoch: 9   Global Step: 158650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:14,928-Speed 9476.17 samples/sec   Loss 5.9395   LearningRate 0.0275   Epoch: 9   Global Step: 158660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:16,066-Speed 8998.22 samples/sec   Loss 6.0298   LearningRate 0.0275   Epoch: 9   Global Step: 158670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:17,173-Speed 9264.08 samples/sec   Loss 5.9296   LearningRate 0.0275   Epoch: 9   Global Step: 158680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:18,262-Speed 9407.28 samples/sec   Loss 5.9019   LearningRate 0.0275   Epoch: 9   Global Step: 158690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:19,370-Speed 9247.23 samples/sec   Loss 5.9387   LearningRate 0.0275   Epoch: 9   Global Step: 158700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:20,493-Speed 9125.99 samples/sec   Loss 5.9965   LearningRate 0.0275   Epoch: 9   Global Step: 158710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:21,593-Speed 9318.90 samples/sec   Loss 5.8303   LearningRate 0.0275   Epoch: 9   Global Step: 158720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:22,674-Speed 9472.62 samples/sec   Loss 5.9641   LearningRate 0.0275   Epoch: 9   Global Step: 158730   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:56:23,808-Speed 9039.25 samples/sec   Loss 5.9081   LearningRate 0.0275   Epoch: 9   Global Step: 158740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:24,936-Speed 9076.09 samples/sec   Loss 5.9461   LearningRate 0.0275   Epoch: 9   Global Step: 158750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:26,033-Speed 9338.87 samples/sec   Loss 5.9845   LearningRate 0.0275   Epoch: 9   Global Step: 158760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:27,108-Speed 9532.30 samples/sec   Loss 5.9633   LearningRate 0.0275   Epoch: 9   Global Step: 158770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:28,196-Speed 9424.80 samples/sec   Loss 5.9800   LearningRate 0.0275   Epoch: 9   Global Step: 158780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:29,315-Speed 9156.69 samples/sec   Loss 5.8678   LearningRate 0.0275   Epoch: 9   Global Step: 158790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:30,438-Speed 9125.47 samples/sec   Loss 5.9326   LearningRate 0.0275   Epoch: 9   Global Step: 158800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:31,525-Speed 9423.58 samples/sec   Loss 5.8642   LearningRate 0.0275   Epoch: 9   Global Step: 158810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:32,628-Speed 9289.17 samples/sec   Loss 6.0334   LearningRate 0.0275   Epoch: 9   Global Step: 158820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:33,710-Speed 9469.35 samples/sec   Loss 5.9380   LearningRate 0.0275   Epoch: 9   Global Step: 158830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:34,782-Speed 9561.53 samples/sec   Loss 5.9628   LearningRate 0.0275   Epoch: 9   Global Step: 158840   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:56:35,851-Speed 9584.65 samples/sec   Loss 5.8071   LearningRate 0.0275   Epoch: 9   Global Step: 158850   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:56:36,949-Speed 9328.51 samples/sec   Loss 6.0611   LearningRate 0.0275   Epoch: 9   Global Step: 158860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:38,043-Speed 9367.52 samples/sec   Loss 5.9411   LearningRate 0.0275   Epoch: 9   Global Step: 158870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:39,127-Speed 9452.77 samples/sec   Loss 5.9687   LearningRate 0.0275   Epoch: 9   Global Step: 158880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:56:40,245-Speed 9171.25 samples/sec   Loss 5.8347   LearningRate 0.0275   Epoch: 9   Global Step: 158890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:56:41,410-Speed 8790.33 samples/sec   Loss 6.0019   LearningRate 0.0275   Epoch: 9   Global Step: 158900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:56:42,514-Speed 9278.70 samples/sec   Loss 5.9118   LearningRate 0.0275   Epoch: 9   Global Step: 158910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:56:43,636-Speed 9133.50 samples/sec   Loss 5.9392   LearningRate 0.0275   Epoch: 9   Global Step: 158920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:56:44,733-Speed 9336.99 samples/sec   Loss 5.8272   LearningRate 0.0274   Epoch: 9   Global Step: 158930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:56:45,792-Speed 9678.53 samples/sec   Loss 5.8309   LearningRate 0.0274   Epoch: 9   Global Step: 158940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:56:46,906-Speed 9200.77 samples/sec   Loss 6.0329   LearningRate 0.0274   Epoch: 9   Global Step: 158950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:56:47,971-Speed 9617.40 samples/sec   Loss 5.8824   LearningRate 0.0274   Epoch: 9   Global Step: 158960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:56:49,095-Speed 9116.56 samples/sec   Loss 5.9708   LearningRate 0.0274   Epoch: 9   Global Step: 158970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:56:50,220-Speed 9107.07 samples/sec   Loss 5.9777   LearningRate 0.0274   Epoch: 9   Global Step: 158980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:51,316-Speed 9349.52 samples/sec   Loss 5.9076   LearningRate 0.0274   Epoch: 9   Global Step: 158990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:52,423-Speed 9256.58 samples/sec   Loss 5.9045   LearningRate 0.0274   Epoch: 9   Global Step: 159000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:53,534-Speed 9221.21 samples/sec   Loss 5.9043   LearningRate 0.0274   Epoch: 9   Global Step: 159010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:54,610-Speed 9515.97 samples/sec   Loss 5.9495   LearningRate 0.0274   Epoch: 9   Global Step: 159020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:55,657-Speed 9791.07 samples/sec   Loss 5.8283   LearningRate 0.0274   Epoch: 9   Global Step: 159030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:56,756-Speed 9320.59 samples/sec   Loss 5.9642   LearningRate 0.0274   Epoch: 9   Global Step: 159040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:57,859-Speed 9296.07 samples/sec   Loss 5.8024   LearningRate 0.0274   Epoch: 9   Global Step: 159050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:56:58,961-Speed 9308.93 samples/sec   Loss 5.9492   LearningRate 0.0274   Epoch: 9   Global Step: 159060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:00,032-Speed 9564.23 samples/sec   Loss 5.9025   LearningRate 0.0274   Epoch: 9   Global Step: 159070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:01,135-Speed 9286.56 samples/sec   Loss 5.8888   LearningRate 0.0274   Epoch: 9   Global Step: 159080   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:57:02,245-Speed 9227.76 samples/sec   Loss 5.9502   LearningRate 0.0274   Epoch: 9   Global Step: 159090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:03,371-Speed 9107.68 samples/sec   Loss 5.9699   LearningRate 0.0274   Epoch: 9   Global Step: 159100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:04,511-Speed 8987.70 samples/sec   Loss 5.9227   LearningRate 0.0274   Epoch: 9   Global Step: 159110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:05,618-Speed 9253.99 samples/sec   Loss 5.8507   LearningRate 0.0274   Epoch: 9   Global Step: 159120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:06,731-Speed 9208.05 samples/sec   Loss 6.0030   LearningRate 0.0274   Epoch: 9   Global Step: 159130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:07,821-Speed 9396.89 samples/sec   Loss 5.8850   LearningRate 0.0274   Epoch: 9   Global Step: 159140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:08,889-Speed 9597.57 samples/sec   Loss 5.9234   LearningRate 0.0274   Epoch: 9   Global Step: 159150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:09,980-Speed 9391.70 samples/sec   Loss 5.8530   LearningRate 0.0274   Epoch: 9   Global Step: 159160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:11,066-Speed 9434.98 samples/sec   Loss 5.9157   LearningRate 0.0274   Epoch: 9   Global Step: 159170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:12,237-Speed 8746.94 samples/sec   Loss 5.8919   LearningRate 0.0274   Epoch: 9   Global Step: 159180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:13,375-Speed 9009.01 samples/sec   Loss 5.9414   LearningRate 0.0274   Epoch: 9   Global Step: 159190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:14,506-Speed 9056.48 samples/sec   Loss 5.8085   LearningRate 0.0274   Epoch: 9   Global Step: 159200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:15,655-Speed 8921.36 samples/sec   Loss 5.9389   LearningRate 0.0274   Epoch: 9   Global Step: 159210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:16,762-Speed 9258.88 samples/sec   Loss 5.9328   LearningRate 0.0274   Epoch: 9   Global Step: 159220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:17,826-Speed 9638.90 samples/sec   Loss 6.0833   LearningRate 0.0274   Epoch: 9   Global Step: 159230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:18,938-Speed 9208.99 samples/sec   Loss 5.8935   LearningRate 0.0274   Epoch: 9   Global Step: 159240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:20,081-Speed 8963.99 samples/sec   Loss 5.9416   LearningRate 0.0273   Epoch: 9   Global Step: 159250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:21,201-Speed 9149.65 samples/sec   Loss 5.9913   LearningRate 0.0273   Epoch: 9   Global Step: 159260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:22,300-Speed 9326.11 samples/sec   Loss 5.9865   LearningRate 0.0273   Epoch: 9   Global Step: 159270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:23,379-Speed 9497.35 samples/sec   Loss 5.9678   LearningRate 0.0273   Epoch: 9   Global Step: 159280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:24,501-Speed 9133.77 samples/sec   Loss 5.8669   LearningRate 0.0273   Epoch: 9   Global Step: 159290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:25,606-Speed 9271.90 samples/sec   Loss 5.9620   LearningRate 0.0273   Epoch: 9   Global Step: 159300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:26,776-Speed 8749.60 samples/sec   Loss 6.0761   LearningRate 0.0273   Epoch: 9   Global Step: 159310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:27,862-Speed 9441.75 samples/sec   Loss 5.9898   LearningRate 0.0273   Epoch: 9   Global Step: 159320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 17:57:28,952-Speed 9395.07 samples/sec   Loss 6.0254   LearningRate 0.0273   Epoch: 9   Global Step: 159330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:30,081-Speed 9073.67 samples/sec   Loss 5.8649   LearningRate 0.0273   Epoch: 9   Global Step: 159340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:31,186-Speed 9278.68 samples/sec   Loss 5.9819   LearningRate 0.0273   Epoch: 9   Global Step: 159350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:33,313-Speed 4815.28 samples/sec   Loss 5.9221   LearningRate 0.0273   Epoch: 9   Global Step: 159360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:34,394-Speed 9483.67 samples/sec   Loss 5.8872   LearningRate 0.0273   Epoch: 9   Global Step: 159370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:35,480-Speed 9433.83 samples/sec   Loss 5.9778   LearningRate 0.0273   Epoch: 9   Global Step: 159380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:37,618-Speed 4791.21 samples/sec   Loss 5.8963   LearningRate 0.0273   Epoch: 9   Global Step: 159390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:38,693-Speed 9529.67 samples/sec   Loss 5.8703   LearningRate 0.0273   Epoch: 9   Global Step: 159400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:39,791-Speed 9335.31 samples/sec   Loss 5.8877   LearningRate 0.0273   Epoch: 9   Global Step: 159410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:40,863-Speed 9558.27 samples/sec   Loss 5.9371   LearningRate 0.0273   Epoch: 9   Global Step: 159420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:41,980-Speed 9170.78 samples/sec   Loss 5.9302   LearningRate 0.0273   Epoch: 9   Global Step: 159430   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:57:43,054-Speed 9545.90 samples/sec   Loss 5.9594   LearningRate 0.0273   Epoch: 9   Global Step: 159440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:44,204-Speed 8907.07 samples/sec   Loss 5.8202   LearningRate 0.0273   Epoch: 9   Global Step: 159450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:45,275-Speed 9565.25 samples/sec   Loss 5.8808   LearningRate 0.0273   Epoch: 9   Global Step: 159460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:46,373-Speed 9331.57 samples/sec   Loss 5.8861   LearningRate 0.0273   Epoch: 9   Global Step: 159470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:47,471-Speed 9328.48 samples/sec   Loss 6.0327   LearningRate 0.0273   Epoch: 9   Global Step: 159480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:48,604-Speed 9047.93 samples/sec   Loss 5.8787   LearningRate 0.0273   Epoch: 9   Global Step: 159490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:49,694-Speed 9397.87 samples/sec   Loss 5.9619   LearningRate 0.0273   Epoch: 9   Global Step: 159500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:50,777-Speed 9463.08 samples/sec   Loss 5.9455   LearningRate 0.0273   Epoch: 9   Global Step: 159510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:51,888-Speed 9221.89 samples/sec   Loss 5.8150   LearningRate 0.0273   Epoch: 9   Global Step: 159520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:52,981-Speed 9377.94 samples/sec   Loss 5.9696   LearningRate 0.0273   Epoch: 9   Global Step: 159530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:54,147-Speed 8786.31 samples/sec   Loss 5.9708   LearningRate 0.0273   Epoch: 9   Global Step: 159540   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:57:55,236-Speed 9404.43 samples/sec   Loss 5.9078   LearningRate 0.0273   Epoch: 9   Global Step: 159550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:56,340-Speed 9284.17 samples/sec   Loss 5.9727   LearningRate 0.0273   Epoch: 9   Global Step: 159560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:57,480-Speed 8983.33 samples/sec   Loss 5.8741   LearningRate 0.0272   Epoch: 9   Global Step: 159570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:58,557-Speed 9515.51 samples/sec   Loss 5.9512   LearningRate 0.0272   Epoch: 9   Global Step: 159580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:57:59,628-Speed 9563.94 samples/sec   Loss 6.0808   LearningRate 0.0272   Epoch: 9   Global Step: 159590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:00,719-Speed 9391.10 samples/sec   Loss 5.9403   LearningRate 0.0272   Epoch: 9   Global Step: 159600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:01,856-Speed 9009.51 samples/sec   Loss 5.9612   LearningRate 0.0272   Epoch: 9   Global Step: 159610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:02,952-Speed 9353.16 samples/sec   Loss 5.8508   LearningRate 0.0272   Epoch: 9   Global Step: 159620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:04,047-Speed 9361.22 samples/sec   Loss 5.9589   LearningRate 0.0272   Epoch: 9   Global Step: 159630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:05,125-Speed 9504.95 samples/sec   Loss 5.9104   LearningRate 0.0272   Epoch: 9   Global Step: 159640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:06,225-Speed 9314.54 samples/sec   Loss 6.0066   LearningRate 0.0272   Epoch: 9   Global Step: 159650   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:58:07,360-Speed 9026.20 samples/sec   Loss 5.9488   LearningRate 0.0272   Epoch: 9   Global Step: 159660   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:58:08,412-Speed 9735.99 samples/sec   Loss 5.8730   LearningRate 0.0272   Epoch: 9   Global Step: 159670   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:58:09,496-Speed 9452.57 samples/sec   Loss 6.0368   LearningRate 0.0272   Epoch: 9   Global Step: 159680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:10,565-Speed 9590.88 samples/sec   Loss 5.8799   LearningRate 0.0272   Epoch: 9   Global Step: 159690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:11,638-Speed 9550.53 samples/sec   Loss 5.8175   LearningRate 0.0272   Epoch: 9   Global Step: 159700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:12,716-Speed 9501.02 samples/sec   Loss 5.8941   LearningRate 0.0272   Epoch: 9   Global Step: 159710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:13,822-Speed 9266.54 samples/sec   Loss 6.0406   LearningRate 0.0272   Epoch: 9   Global Step: 159720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:14,897-Speed 9533.09 samples/sec   Loss 5.8889   LearningRate 0.0272   Epoch: 9   Global Step: 159730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:15,985-Speed 9415.29 samples/sec   Loss 5.8154   LearningRate 0.0272   Epoch: 9   Global Step: 159740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:17,071-Speed 9430.27 samples/sec   Loss 5.8810   LearningRate 0.0272   Epoch: 9   Global Step: 159750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:18,172-Speed 9304.62 samples/sec   Loss 6.0255   LearningRate 0.0272   Epoch: 9   Global Step: 159760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:19,262-Speed 9405.36 samples/sec   Loss 5.8867   LearningRate 0.0272   Epoch: 9   Global Step: 159770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:20,370-Speed 9242.01 samples/sec   Loss 5.9761   LearningRate 0.0272   Epoch: 9   Global Step: 159780   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:58:21,468-Speed 9329.93 samples/sec   Loss 5.9945   LearningRate 0.0272   Epoch: 9   Global Step: 159790   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:58:22,554-Speed 9433.90 samples/sec   Loss 5.8302   LearningRate 0.0272   Epoch: 9   Global Step: 159800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:23,639-Speed 9448.36 samples/sec   Loss 5.8534   LearningRate 0.0272   Epoch: 9   Global Step: 159810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:24,729-Speed 9406.18 samples/sec   Loss 5.9841   LearningRate 0.0272   Epoch: 9   Global Step: 159820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:25,840-Speed 9217.20 samples/sec   Loss 5.9189   LearningRate 0.0272   Epoch: 9   Global Step: 159830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:26,978-Speed 9007.43 samples/sec   Loss 5.8775   LearningRate 0.0272   Epoch: 9   Global Step: 159840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:28,094-Speed 9173.10 samples/sec   Loss 5.9585   LearningRate 0.0272   Epoch: 9   Global Step: 159850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:29,204-Speed 9235.48 samples/sec   Loss 5.8668   LearningRate 0.0272   Epoch: 9   Global Step: 159860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:30,309-Speed 9270.52 samples/sec   Loss 6.0936   LearningRate 0.0272   Epoch: 9   Global Step: 159870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:31,416-Speed 9255.64 samples/sec   Loss 5.9518   LearningRate 0.0272   Epoch: 9   Global Step: 159880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:32,576-Speed 8832.94 samples/sec   Loss 5.9730   LearningRate 0.0271   Epoch: 9   Global Step: 159890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:33,709-Speed 9048.67 samples/sec   Loss 5.9891   LearningRate 0.0271   Epoch: 9   Global Step: 159900   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:58:34,803-Speed 9371.16 samples/sec   Loss 5.9969   LearningRate 0.0271   Epoch: 9   Global Step: 159910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:35,903-Speed 9311.50 samples/sec   Loss 5.9711   LearningRate 0.0271   Epoch: 9   Global Step: 159920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:37,085-Speed 8667.74 samples/sec   Loss 5.8456   LearningRate 0.0271   Epoch: 9   Global Step: 159930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:38,164-Speed 9494.28 samples/sec   Loss 5.8630   LearningRate 0.0271   Epoch: 9   Global Step: 159940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:39,268-Speed 9279.19 samples/sec   Loss 5.9433   LearningRate 0.0271   Epoch: 9   Global Step: 159950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:40,398-Speed 9074.26 samples/sec   Loss 5.8462   LearningRate 0.0271   Epoch: 9   Global Step: 159960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:41,528-Speed 9061.64 samples/sec   Loss 6.0143   LearningRate 0.0271   Epoch: 9   Global Step: 159970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:42,656-Speed 9084.40 samples/sec   Loss 5.8203   LearningRate 0.0271   Epoch: 9   Global Step: 159980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:43,789-Speed 9042.41 samples/sec   Loss 5.8705   LearningRate 0.0271   Epoch: 9   Global Step: 159990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:58:44,860-Speed 9566.80 samples/sec   Loss 5.9423   LearningRate 0.0271   Epoch: 9   Global Step: 160000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:59:07,016-[lfw][160000]XNorm: 9.833157
Training: 2022-04-11 17:59:07,017-[lfw][160000]Accuracy-Flip: 0.99583+-0.00310
Training: 2022-04-11 17:59:07,017-[lfw][160000]Accuracy-Highest: 0.99683
Training: 2022-04-11 17:59:32,650-[cfp_fp][160000]XNorm: 8.339008
Training: 2022-04-11 17:59:32,652-[cfp_fp][160000]Accuracy-Flip: 0.96286+-0.01290
Training: 2022-04-11 17:59:32,652-[cfp_fp][160000]Accuracy-Highest: 0.96500
Training: 2022-04-11 17:59:54,660-[agedb_30][160000]XNorm: 9.510938
Training: 2022-04-11 17:59:54,661-[agedb_30][160000]Accuracy-Flip: 0.96583+-0.01086
Training: 2022-04-11 17:59:54,661-[agedb_30][160000]Accuracy-Highest: 0.96783
Training: 2022-04-11 17:59:55,726-Speed 144.50 samples/sec   Loss 5.9835   LearningRate 0.0271   Epoch: 9   Global Step: 160010   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 17:59:56,788-Speed 9648.65 samples/sec   Loss 5.9552   LearningRate 0.0271   Epoch: 9   Global Step: 160020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:59:57,865-Speed 9513.56 samples/sec   Loss 5.9343   LearningRate 0.0271   Epoch: 9   Global Step: 160030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 17:59:58,986-Speed 9145.35 samples/sec   Loss 5.8518   LearningRate 0.0271   Epoch: 9   Global Step: 160040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:00,104-Speed 9158.63 samples/sec   Loss 5.8979   LearningRate 0.0271   Epoch: 9   Global Step: 160050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:01,213-Speed 9239.72 samples/sec   Loss 5.8463   LearningRate 0.0271   Epoch: 9   Global Step: 160060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:02,319-Speed 9271.54 samples/sec   Loss 5.9491   LearningRate 0.0271   Epoch: 9   Global Step: 160070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:03,456-Speed 9010.75 samples/sec   Loss 6.0675   LearningRate 0.0271   Epoch: 9   Global Step: 160080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:04,538-Speed 9466.42 samples/sec   Loss 5.9846   LearningRate 0.0271   Epoch: 9   Global Step: 160090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:05,607-Speed 9585.60 samples/sec   Loss 5.9111   LearningRate 0.0271   Epoch: 9   Global Step: 160100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:06,737-Speed 9067.92 samples/sec   Loss 5.8427   LearningRate 0.0271   Epoch: 9   Global Step: 160110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:07,869-Speed 9054.11 samples/sec   Loss 5.9533   LearningRate 0.0271   Epoch: 9   Global Step: 160120   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:00:08,969-Speed 9311.84 samples/sec   Loss 5.8713   LearningRate 0.0271   Epoch: 9   Global Step: 160130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:12,003-Speed 3376.80 samples/sec   Loss 5.9451   LearningRate 0.0271   Epoch: 9   Global Step: 160140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:13,103-Speed 9314.81 samples/sec   Loss 5.9724   LearningRate 0.0271   Epoch: 9   Global Step: 160150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:14,251-Speed 8927.79 samples/sec   Loss 5.9078   LearningRate 0.0271   Epoch: 9   Global Step: 160160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:16,217-Speed 5210.51 samples/sec   Loss 5.8344   LearningRate 0.0271   Epoch: 9   Global Step: 160170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:17,345-Speed 9081.16 samples/sec   Loss 5.9749   LearningRate 0.0271   Epoch: 9   Global Step: 160180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:18,484-Speed 8992.84 samples/sec   Loss 5.8728   LearningRate 0.0271   Epoch: 9   Global Step: 160190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:20,446-Speed 5222.13 samples/sec   Loss 5.9465   LearningRate 0.0271   Epoch: 9   Global Step: 160200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:21,565-Speed 9157.80 samples/sec   Loss 5.7677   LearningRate 0.0270   Epoch: 9   Global Step: 160210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:22,661-Speed 9344.06 samples/sec   Loss 5.9722   LearningRate 0.0270   Epoch: 9   Global Step: 160220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:23,748-Speed 9426.45 samples/sec   Loss 5.9939   LearningRate 0.0270   Epoch: 9   Global Step: 160230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:24,820-Speed 9559.28 samples/sec   Loss 5.9906   LearningRate 0.0270   Epoch: 9   Global Step: 160240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:25,907-Speed 9423.30 samples/sec   Loss 5.9428   LearningRate 0.0270   Epoch: 9   Global Step: 160250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:27,013-Speed 9270.99 samples/sec   Loss 5.9757   LearningRate 0.0270   Epoch: 9   Global Step: 160260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:28,129-Speed 9173.02 samples/sec   Loss 5.9642   LearningRate 0.0270   Epoch: 9   Global Step: 160270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:29,205-Speed 9528.34 samples/sec   Loss 6.0267   LearningRate 0.0270   Epoch: 9   Global Step: 160280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:30,289-Speed 9452.16 samples/sec   Loss 5.8585   LearningRate 0.0270   Epoch: 9   Global Step: 160290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:31,387-Speed 9334.87 samples/sec   Loss 5.8245   LearningRate 0.0270   Epoch: 9   Global Step: 160300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:32,497-Speed 9225.22 samples/sec   Loss 5.9022   LearningRate 0.0270   Epoch: 9   Global Step: 160310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:33,601-Speed 9289.22 samples/sec   Loss 5.9509   LearningRate 0.0270   Epoch: 9   Global Step: 160320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:34,680-Speed 9492.39 samples/sec   Loss 5.9883   LearningRate 0.0270   Epoch: 9   Global Step: 160330   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:00:35,763-Speed 9457.65 samples/sec   Loss 5.9996   LearningRate 0.0270   Epoch: 9   Global Step: 160340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:36,837-Speed 9542.43 samples/sec   Loss 5.9563   LearningRate 0.0270   Epoch: 9   Global Step: 160350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:37,929-Speed 9385.65 samples/sec   Loss 5.9223   LearningRate 0.0270   Epoch: 9   Global Step: 160360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:39,063-Speed 9028.12 samples/sec   Loss 5.9930   LearningRate 0.0270   Epoch: 9   Global Step: 160370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:40,138-Speed 9533.92 samples/sec   Loss 5.8638   LearningRate 0.0270   Epoch: 9   Global Step: 160380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:41,237-Speed 9325.60 samples/sec   Loss 5.8926   LearningRate 0.0270   Epoch: 9   Global Step: 160390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:42,316-Speed 9491.27 samples/sec   Loss 5.9085   LearningRate 0.0270   Epoch: 9   Global Step: 160400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:43,444-Speed 9087.76 samples/sec   Loss 5.9380   LearningRate 0.0270   Epoch: 9   Global Step: 160410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:44,530-Speed 9436.84 samples/sec   Loss 5.9313   LearningRate 0.0270   Epoch: 9   Global Step: 160420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:45,676-Speed 8938.52 samples/sec   Loss 5.8535   LearningRate 0.0270   Epoch: 9   Global Step: 160430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:46,764-Speed 9420.26 samples/sec   Loss 6.0042   LearningRate 0.0270   Epoch: 9   Global Step: 160440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:47,836-Speed 9557.18 samples/sec   Loss 5.7801   LearningRate 0.0270   Epoch: 9   Global Step: 160450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:48,915-Speed 9499.98 samples/sec   Loss 5.9026   LearningRate 0.0270   Epoch: 9   Global Step: 160460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:49,973-Speed 9686.05 samples/sec   Loss 5.9380   LearningRate 0.0270   Epoch: 9   Global Step: 160470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:51,039-Speed 9609.85 samples/sec   Loss 5.8993   LearningRate 0.0270   Epoch: 9   Global Step: 160480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:52,099-Speed 9662.88 samples/sec   Loss 5.8236   LearningRate 0.0270   Epoch: 9   Global Step: 160490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:53,185-Speed 9434.98 samples/sec   Loss 5.8462   LearningRate 0.0270   Epoch: 9   Global Step: 160500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:54,276-Speed 9392.18 samples/sec   Loss 5.8931   LearningRate 0.0270   Epoch: 9   Global Step: 160510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:55,402-Speed 9102.82 samples/sec   Loss 5.8550   LearningRate 0.0270   Epoch: 9   Global Step: 160520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:56,477-Speed 9532.83 samples/sec   Loss 5.8510   LearningRate 0.0269   Epoch: 9   Global Step: 160530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:57,563-Speed 9433.43 samples/sec   Loss 6.0072   LearningRate 0.0269   Epoch: 9   Global Step: 160540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:58,637-Speed 9539.33 samples/sec   Loss 5.9535   LearningRate 0.0269   Epoch: 9   Global Step: 160550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:00:59,745-Speed 9249.62 samples/sec   Loss 5.9462   LearningRate 0.0269   Epoch: 9   Global Step: 160560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:00,831-Speed 9434.56 samples/sec   Loss 5.8900   LearningRate 0.0269   Epoch: 9   Global Step: 160570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:01,940-Speed 9237.13 samples/sec   Loss 5.8807   LearningRate 0.0269   Epoch: 9   Global Step: 160580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:02,979-Speed 9865.63 samples/sec   Loss 5.8935   LearningRate 0.0269   Epoch: 9   Global Step: 160590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:01:04,077-Speed 9323.64 samples/sec   Loss 5.7780   LearningRate 0.0269   Epoch: 9   Global Step: 160600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:01:05,170-Speed 9381.05 samples/sec   Loss 5.9942   LearningRate 0.0269   Epoch: 9   Global Step: 160610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:01:06,267-Speed 9337.32 samples/sec   Loss 5.8786   LearningRate 0.0269   Epoch: 9   Global Step: 160620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:01:07,360-Speed 9373.11 samples/sec   Loss 6.0266   LearningRate 0.0269   Epoch: 9   Global Step: 160630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:01:08,466-Speed 9268.82 samples/sec   Loss 5.9228   LearningRate 0.0269   Epoch: 9   Global Step: 160640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:01:09,539-Speed 9551.02 samples/sec   Loss 5.9943   LearningRate 0.0269   Epoch: 9   Global Step: 160650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:01:10,607-Speed 9595.69 samples/sec   Loss 5.9533   LearningRate 0.0269   Epoch: 9   Global Step: 160660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:01:11,660-Speed 9726.36 samples/sec   Loss 5.9569   LearningRate 0.0269   Epoch: 9   Global Step: 160670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:01:12,799-Speed 8994.35 samples/sec   Loss 5.9729   LearningRate 0.0269   Epoch: 9   Global Step: 160680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:01:13,928-Speed 9076.73 samples/sec   Loss 5.8819   LearningRate 0.0269   Epoch: 9   Global Step: 160690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:15,017-Speed 9410.99 samples/sec   Loss 5.8751   LearningRate 0.0269   Epoch: 9   Global Step: 160700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:16,100-Speed 9462.37 samples/sec   Loss 5.8964   LearningRate 0.0269   Epoch: 9   Global Step: 160710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:17,219-Speed 9155.44 samples/sec   Loss 5.9711   LearningRate 0.0269   Epoch: 9   Global Step: 160720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:18,313-Speed 9363.68 samples/sec   Loss 5.9461   LearningRate 0.0269   Epoch: 9   Global Step: 160730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:19,419-Speed 9267.29 samples/sec   Loss 5.9222   LearningRate 0.0269   Epoch: 9   Global Step: 160740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:20,538-Speed 9154.38 samples/sec   Loss 5.9885   LearningRate 0.0269   Epoch: 9   Global Step: 160750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:21,679-Speed 8975.96 samples/sec   Loss 6.0529   LearningRate 0.0269   Epoch: 9   Global Step: 160760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:22,783-Speed 9284.34 samples/sec   Loss 5.9215   LearningRate 0.0269   Epoch: 9   Global Step: 160770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:23,898-Speed 9189.52 samples/sec   Loss 5.8960   LearningRate 0.0269   Epoch: 9   Global Step: 160780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:24,984-Speed 9433.97 samples/sec   Loss 6.0418   LearningRate 0.0269   Epoch: 9   Global Step: 160790   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:01:26,048-Speed 9634.34 samples/sec   Loss 5.8927   LearningRate 0.0269   Epoch: 9   Global Step: 160800   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:01:27,133-Speed 9452.25 samples/sec   Loss 5.9010   LearningRate 0.0269   Epoch: 9   Global Step: 160810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:28,219-Speed 9435.27 samples/sec   Loss 5.8098   LearningRate 0.0269   Epoch: 9   Global Step: 160820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:29,320-Speed 9298.99 samples/sec   Loss 5.9276   LearningRate 0.0269   Epoch: 9   Global Step: 160830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:30,444-Speed 9117.89 samples/sec   Loss 5.7365   LearningRate 0.0269   Epoch: 9   Global Step: 160840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:31,543-Speed 9326.99 samples/sec   Loss 5.8567   LearningRate 0.0268   Epoch: 9   Global Step: 160850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:32,639-Speed 9346.65 samples/sec   Loss 5.9245   LearningRate 0.0268   Epoch: 9   Global Step: 160860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:33,772-Speed 9045.21 samples/sec   Loss 6.0074   LearningRate 0.0268   Epoch: 9   Global Step: 160870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:34,866-Speed 9365.33 samples/sec   Loss 5.8513   LearningRate 0.0268   Epoch: 9   Global Step: 160880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:35,968-Speed 9296.16 samples/sec   Loss 6.0039   LearningRate 0.0268   Epoch: 9   Global Step: 160890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:37,090-Speed 9129.78 samples/sec   Loss 5.9664   LearningRate 0.0268   Epoch: 9   Global Step: 160900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:38,202-Speed 9219.04 samples/sec   Loss 6.0558   LearningRate 0.0268   Epoch: 9   Global Step: 160910   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:01:39,259-Speed 9691.60 samples/sec   Loss 5.9460   LearningRate 0.0268   Epoch: 9   Global Step: 160920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:40,320-Speed 9660.55 samples/sec   Loss 5.8349   LearningRate 0.0268   Epoch: 9   Global Step: 160930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:41,440-Speed 9141.73 samples/sec   Loss 5.8948   LearningRate 0.0268   Epoch: 9   Global Step: 160940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:42,539-Speed 9322.65 samples/sec   Loss 5.9168   LearningRate 0.0268   Epoch: 9   Global Step: 160950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:43,664-Speed 9109.96 samples/sec   Loss 5.9156   LearningRate 0.0268   Epoch: 9   Global Step: 160960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:44,748-Speed 9455.90 samples/sec   Loss 6.0211   LearningRate 0.0268   Epoch: 9   Global Step: 160970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:45,865-Speed 9166.08 samples/sec   Loss 5.7559   LearningRate 0.0268   Epoch: 9   Global Step: 160980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:46,952-Speed 9434.80 samples/sec   Loss 5.8521   LearningRate 0.0268   Epoch: 9   Global Step: 160990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:48,044-Speed 9382.21 samples/sec   Loss 5.9291   LearningRate 0.0268   Epoch: 9   Global Step: 161000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:49,154-Speed 9234.22 samples/sec   Loss 5.7871   LearningRate 0.0268   Epoch: 9   Global Step: 161010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:50,238-Speed 9448.65 samples/sec   Loss 5.9291   LearningRate 0.0268   Epoch: 9   Global Step: 161020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:51,302-Speed 9634.40 samples/sec   Loss 5.9816   LearningRate 0.0268   Epoch: 9   Global Step: 161030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:52,393-Speed 9390.68 samples/sec   Loss 5.9825   LearningRate 0.0268   Epoch: 9   Global Step: 161040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:53,493-Speed 9314.13 samples/sec   Loss 5.9572   LearningRate 0.0268   Epoch: 9   Global Step: 161050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:54,581-Speed 9422.38 samples/sec   Loss 5.9017   LearningRate 0.0268   Epoch: 9   Global Step: 161060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:55,699-Speed 9164.10 samples/sec   Loss 5.8804   LearningRate 0.0268   Epoch: 9   Global Step: 161070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:56,823-Speed 9112.63 samples/sec   Loss 5.8317   LearningRate 0.0268   Epoch: 9   Global Step: 161080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:57,885-Speed 9653.14 samples/sec   Loss 5.9277   LearningRate 0.0268   Epoch: 9   Global Step: 161090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:01:58,949-Speed 9621.50 samples/sec   Loss 5.9194   LearningRate 0.0268   Epoch: 9   Global Step: 161100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:00,023-Speed 9543.80 samples/sec   Loss 5.9435   LearningRate 0.0268   Epoch: 9   Global Step: 161110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:01,118-Speed 9355.55 samples/sec   Loss 5.9317   LearningRate 0.0268   Epoch: 9   Global Step: 161120   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:02:02,222-Speed 9277.42 samples/sec   Loss 5.8796   LearningRate 0.0268   Epoch: 9   Global Step: 161130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:03,302-Speed 9486.21 samples/sec   Loss 5.9732   LearningRate 0.0268   Epoch: 9   Global Step: 161140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:04,392-Speed 9401.08 samples/sec   Loss 5.9863   LearningRate 0.0268   Epoch: 9   Global Step: 161150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:05,472-Speed 9486.37 samples/sec   Loss 5.9003   LearningRate 0.0268   Epoch: 9   Global Step: 161160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:06,593-Speed 9140.85 samples/sec   Loss 6.0191   LearningRate 0.0267   Epoch: 9   Global Step: 161170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:07,682-Speed 9415.19 samples/sec   Loss 6.0327   LearningRate 0.0267   Epoch: 9   Global Step: 161180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:08,768-Speed 9433.96 samples/sec   Loss 5.9055   LearningRate 0.0267   Epoch: 9   Global Step: 161190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:09,917-Speed 8920.66 samples/sec   Loss 5.9615   LearningRate 0.0267   Epoch: 9   Global Step: 161200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:11,022-Speed 9269.70 samples/sec   Loss 6.0171   LearningRate 0.0267   Epoch: 9   Global Step: 161210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:12,131-Speed 9237.05 samples/sec   Loss 5.8067   LearningRate 0.0267   Epoch: 9   Global Step: 161220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:13,233-Speed 9295.10 samples/sec   Loss 5.8888   LearningRate 0.0267   Epoch: 9   Global Step: 161230   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:02:14,351-Speed 9168.73 samples/sec   Loss 5.8565   LearningRate 0.0267   Epoch: 9   Global Step: 161240   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:02:15,416-Speed 9620.33 samples/sec   Loss 5.9103   LearningRate 0.0267   Epoch: 9   Global Step: 161250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:16,540-Speed 9112.68 samples/sec   Loss 5.8634   LearningRate 0.0267   Epoch: 9   Global Step: 161260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:02:17,635-Speed 9359.11 samples/sec   Loss 5.9083   LearningRate 0.0267   Epoch: 9   Global Step: 161270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:02:18,719-Speed 9457.59 samples/sec   Loss 5.9342   LearningRate 0.0267   Epoch: 9   Global Step: 161280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:02:19,846-Speed 9090.73 samples/sec   Loss 6.0251   LearningRate 0.0267   Epoch: 9   Global Step: 161290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:02:20,964-Speed 9165.21 samples/sec   Loss 5.9979   LearningRate 0.0267   Epoch: 9   Global Step: 161300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:02:22,018-Speed 9726.70 samples/sec   Loss 5.9487   LearningRate 0.0267   Epoch: 9   Global Step: 161310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:02:23,103-Speed 9439.66 samples/sec   Loss 5.9446   LearningRate 0.0267   Epoch: 9   Global Step: 161320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:02:24,218-Speed 9194.76 samples/sec   Loss 6.0021   LearningRate 0.0267   Epoch: 9   Global Step: 161330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:02:25,373-Speed 8876.27 samples/sec   Loss 5.9258   LearningRate 0.0267   Epoch: 9   Global Step: 161340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:02:26,505-Speed 9045.69 samples/sec   Loss 5.9330   LearningRate 0.0267   Epoch: 9   Global Step: 161350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:02:27,582-Speed 9514.17 samples/sec   Loss 5.8700   LearningRate 0.0267   Epoch: 9   Global Step: 161360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:28,664-Speed 9467.28 samples/sec   Loss 5.8542   LearningRate 0.0267   Epoch: 9   Global Step: 161370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:29,755-Speed 9395.45 samples/sec   Loss 5.9940   LearningRate 0.0267   Epoch: 9   Global Step: 161380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:30,832-Speed 9510.25 samples/sec   Loss 5.9422   LearningRate 0.0267   Epoch: 9   Global Step: 161390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:31,909-Speed 9516.04 samples/sec   Loss 5.9454   LearningRate 0.0267   Epoch: 9   Global Step: 161400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:33,026-Speed 9165.94 samples/sec   Loss 5.8209   LearningRate 0.0267   Epoch: 9   Global Step: 161410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:34,107-Speed 9479.37 samples/sec   Loss 5.9454   LearningRate 0.0267   Epoch: 9   Global Step: 161420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:35,199-Speed 9386.01 samples/sec   Loss 6.0315   LearningRate 0.0267   Epoch: 9   Global Step: 161430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:36,318-Speed 9150.95 samples/sec   Loss 6.0417   LearningRate 0.0267   Epoch: 9   Global Step: 161440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:37,423-Speed 9280.78 samples/sec   Loss 5.8641   LearningRate 0.0267   Epoch: 9   Global Step: 161450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:38,510-Speed 9426.11 samples/sec   Loss 5.9268   LearningRate 0.0267   Epoch: 9   Global Step: 161460   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:02:39,607-Speed 9335.14 samples/sec   Loss 5.9969   LearningRate 0.0267   Epoch: 9   Global Step: 161470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:40,698-Speed 9391.35 samples/sec   Loss 5.9171   LearningRate 0.0267   Epoch: 9   Global Step: 161480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:41,848-Speed 8906.04 samples/sec   Loss 5.9058   LearningRate 0.0266   Epoch: 9   Global Step: 161490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:42,939-Speed 9401.96 samples/sec   Loss 5.7764   LearningRate 0.0266   Epoch: 9   Global Step: 161500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:44,027-Speed 9419.27 samples/sec   Loss 5.9116   LearningRate 0.0266   Epoch: 9   Global Step: 161510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:45,140-Speed 9205.05 samples/sec   Loss 5.8765   LearningRate 0.0266   Epoch: 9   Global Step: 161520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:46,254-Speed 9192.19 samples/sec   Loss 5.9152   LearningRate 0.0266   Epoch: 9   Global Step: 161530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:47,312-Speed 9683.72 samples/sec   Loss 5.9773   LearningRate 0.0266   Epoch: 9   Global Step: 161540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:48,391-Speed 9500.59 samples/sec   Loss 5.9319   LearningRate 0.0266   Epoch: 9   Global Step: 161550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:49,512-Speed 9139.55 samples/sec   Loss 5.9956   LearningRate 0.0266   Epoch: 9   Global Step: 161560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:50,606-Speed 9368.76 samples/sec   Loss 5.8890   LearningRate 0.0266   Epoch: 9   Global Step: 161570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:51,732-Speed 9099.31 samples/sec   Loss 5.9669   LearningRate 0.0266   Epoch: 9   Global Step: 161580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:52,802-Speed 9573.27 samples/sec   Loss 5.9247   LearningRate 0.0266   Epoch: 9   Global Step: 161590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:53,884-Speed 9475.13 samples/sec   Loss 5.8971   LearningRate 0.0266   Epoch: 9   Global Step: 161600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:54,984-Speed 9311.84 samples/sec   Loss 5.9527   LearningRate 0.0266   Epoch: 9   Global Step: 161610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:56,081-Speed 9343.14 samples/sec   Loss 5.9453   LearningRate 0.0266   Epoch: 9   Global Step: 161620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:57,189-Speed 9249.32 samples/sec   Loss 5.9056   LearningRate 0.0266   Epoch: 9   Global Step: 161630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:58,297-Speed 9243.21 samples/sec   Loss 5.9731   LearningRate 0.0266   Epoch: 9   Global Step: 161640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:02:59,381-Speed 9450.85 samples/sec   Loss 5.8934   LearningRate 0.0266   Epoch: 9   Global Step: 161650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:00,476-Speed 9357.97 samples/sec   Loss 5.9449   LearningRate 0.0266   Epoch: 9   Global Step: 161660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:01,583-Speed 9254.76 samples/sec   Loss 5.8675   LearningRate 0.0266   Epoch: 9   Global Step: 161670   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:03:02,725-Speed 8975.66 samples/sec   Loss 5.8631   LearningRate 0.0266   Epoch: 9   Global Step: 161680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:03,793-Speed 9591.34 samples/sec   Loss 5.8774   LearningRate 0.0266   Epoch: 9   Global Step: 161690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:04,882-Speed 9410.95 samples/sec   Loss 5.8941   LearningRate 0.0266   Epoch: 9   Global Step: 161700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:05,984-Speed 9296.92 samples/sec   Loss 5.9351   LearningRate 0.0266   Epoch: 9   Global Step: 161710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:07,083-Speed 9323.93 samples/sec   Loss 5.9437   LearningRate 0.0266   Epoch: 9   Global Step: 161720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:08,179-Speed 9351.21 samples/sec   Loss 5.9278   LearningRate 0.0266   Epoch: 9   Global Step: 161730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:09,292-Speed 9207.26 samples/sec   Loss 5.9146   LearningRate 0.0266   Epoch: 9   Global Step: 161740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:10,404-Speed 9212.07 samples/sec   Loss 6.0268   LearningRate 0.0266   Epoch: 9   Global Step: 161750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:11,518-Speed 9194.21 samples/sec   Loss 5.9035   LearningRate 0.0266   Epoch: 9   Global Step: 161760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:12,627-Speed 9243.52 samples/sec   Loss 5.8779   LearningRate 0.0266   Epoch: 9   Global Step: 161770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:13,737-Speed 9229.61 samples/sec   Loss 5.8326   LearningRate 0.0266   Epoch: 9   Global Step: 161780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:14,813-Speed 9522.89 samples/sec   Loss 5.8743   LearningRate 0.0266   Epoch: 9   Global Step: 161790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:15,854-Speed 9833.15 samples/sec   Loss 5.8610   LearningRate 0.0266   Epoch: 9   Global Step: 161800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:16,924-Speed 9580.44 samples/sec   Loss 5.8944   LearningRate 0.0266   Epoch: 9   Global Step: 161810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:17,989-Speed 9620.54 samples/sec   Loss 6.0305   LearningRate 0.0265   Epoch: 9   Global Step: 161820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:19,119-Speed 9066.16 samples/sec   Loss 5.8668   LearningRate 0.0265   Epoch: 9   Global Step: 161830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:20,185-Speed 9612.26 samples/sec   Loss 6.0458   LearningRate 0.0265   Epoch: 9   Global Step: 161840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:21,277-Speed 9398.11 samples/sec   Loss 6.0179   LearningRate 0.0265   Epoch: 9   Global Step: 161850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:22,417-Speed 8992.02 samples/sec   Loss 5.9173   LearningRate 0.0265   Epoch: 9   Global Step: 161860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:23,529-Speed 9211.81 samples/sec   Loss 5.9105   LearningRate 0.0265   Epoch: 9   Global Step: 161870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:24,621-Speed 9383.44 samples/sec   Loss 5.9646   LearningRate 0.0265   Epoch: 9   Global Step: 161880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:25,725-Speed 9280.72 samples/sec   Loss 5.8873   LearningRate 0.0265   Epoch: 9   Global Step: 161890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:26,825-Speed 9314.54 samples/sec   Loss 5.8948   LearningRate 0.0265   Epoch: 9   Global Step: 161900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:27,919-Speed 9366.66 samples/sec   Loss 6.0278   LearningRate 0.0265   Epoch: 9   Global Step: 161910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:28,992-Speed 9547.73 samples/sec   Loss 5.9108   LearningRate 0.0265   Epoch: 9   Global Step: 161920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:30,090-Speed 9329.21 samples/sec   Loss 5.9426   LearningRate 0.0265   Epoch: 9   Global Step: 161930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:31,219-Speed 9074.07 samples/sec   Loss 5.9139   LearningRate 0.0265   Epoch: 9   Global Step: 161940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:32,326-Speed 9260.11 samples/sec   Loss 5.9332   LearningRate 0.0265   Epoch: 9   Global Step: 161950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:33,396-Speed 9570.35 samples/sec   Loss 6.0044   LearningRate 0.0265   Epoch: 9   Global Step: 161960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:34,494-Speed 9329.77 samples/sec   Loss 5.8501   LearningRate 0.0265   Epoch: 9   Global Step: 161970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:35,646-Speed 8898.02 samples/sec   Loss 5.9721   LearningRate 0.0265   Epoch: 9   Global Step: 161980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:36,739-Speed 9368.94 samples/sec   Loss 6.0583   LearningRate 0.0265   Epoch: 9   Global Step: 161990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:03:37,858-Speed 9160.14 samples/sec   Loss 5.9602   LearningRate 0.0265   Epoch: 9   Global Step: 162000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:03:59,866-[lfw][162000]XNorm: 9.783905
Training: 2022-04-11 18:03:59,868-[lfw][162000]Accuracy-Flip: 0.99600+-0.00260
Training: 2022-04-11 18:03:59,868-[lfw][162000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:04:25,707-[cfp_fp][162000]XNorm: 8.399252
Training: 2022-04-11 18:04:25,708-[cfp_fp][162000]Accuracy-Flip: 0.96300+-0.00993
Training: 2022-04-11 18:04:25,708-[cfp_fp][162000]Accuracy-Highest: 0.96500
Training: 2022-04-11 18:04:47,912-[agedb_30][162000]XNorm: 9.478708
Training: 2022-04-11 18:04:47,913-[agedb_30][162000]Accuracy-Flip: 0.96317+-0.00845
Training: 2022-04-11 18:04:47,913-[agedb_30][162000]Accuracy-Highest: 0.96783
Training: 2022-04-11 18:04:49,004-Speed 143.93 samples/sec   Loss 5.9176   LearningRate 0.0265   Epoch: 9   Global Step: 162010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:04:50,122-Speed 9163.93 samples/sec   Loss 5.8604   LearningRate 0.0265   Epoch: 9   Global Step: 162020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:04:51,237-Speed 9191.93 samples/sec   Loss 5.9517   LearningRate 0.0265   Epoch: 9   Global Step: 162030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:04:52,317-Speed 9484.87 samples/sec   Loss 5.9049   LearningRate 0.0265   Epoch: 9   Global Step: 162040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:04:53,404-Speed 9422.03 samples/sec   Loss 5.9357   LearningRate 0.0265   Epoch: 9   Global Step: 162050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:04:54,462-Speed 9683.66 samples/sec   Loss 5.8775   LearningRate 0.0265   Epoch: 9   Global Step: 162060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:04:55,517-Speed 9720.09 samples/sec   Loss 5.8768   LearningRate 0.0265   Epoch: 9   Global Step: 162070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:04:56,575-Speed 9683.48 samples/sec   Loss 6.0787   LearningRate 0.0265   Epoch: 9   Global Step: 162080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:04:57,660-Speed 9442.68 samples/sec   Loss 5.9439   LearningRate 0.0265   Epoch: 9   Global Step: 162090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:04:58,741-Speed 9474.03 samples/sec   Loss 5.8880   LearningRate 0.0265   Epoch: 9   Global Step: 162100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:04:59,860-Speed 9158.10 samples/sec   Loss 5.8809   LearningRate 0.0265   Epoch: 9   Global Step: 162110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:00,901-Speed 9843.74 samples/sec   Loss 5.9844   LearningRate 0.0265   Epoch: 9   Global Step: 162120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:01,973-Speed 9555.64 samples/sec   Loss 5.8558   LearningRate 0.0265   Epoch: 9   Global Step: 162130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:03,074-Speed 9308.95 samples/sec   Loss 5.8502   LearningRate 0.0264   Epoch: 9   Global Step: 162140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:04,150-Speed 9522.48 samples/sec   Loss 5.8693   LearningRate 0.0264   Epoch: 9   Global Step: 162150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:05,204-Speed 9724.43 samples/sec   Loss 5.8881   LearningRate 0.0264   Epoch: 9   Global Step: 162160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:06,247-Speed 9815.52 samples/sec   Loss 5.8690   LearningRate 0.0264   Epoch: 9   Global Step: 162170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:07,309-Speed 9653.55 samples/sec   Loss 5.9528   LearningRate 0.0264   Epoch: 9   Global Step: 162180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:08,380-Speed 9579.80 samples/sec   Loss 5.9010   LearningRate 0.0264   Epoch: 9   Global Step: 162190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:09,421-Speed 9845.29 samples/sec   Loss 5.8829   LearningRate 0.0264   Epoch: 9   Global Step: 162200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:10,503-Speed 9470.85 samples/sec   Loss 5.9771   LearningRate 0.0264   Epoch: 9   Global Step: 162210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:11,593-Speed 9397.76 samples/sec   Loss 5.9711   LearningRate 0.0264   Epoch: 9   Global Step: 162220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:12,700-Speed 9249.05 samples/sec   Loss 5.8918   LearningRate 0.0264   Epoch: 9   Global Step: 162230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:13,774-Speed 9540.99 samples/sec   Loss 5.9121   LearningRate 0.0264   Epoch: 9   Global Step: 162240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:14,852-Speed 9506.45 samples/sec   Loss 5.9584   LearningRate 0.0264   Epoch: 9   Global Step: 162250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:15,934-Speed 9473.13 samples/sec   Loss 5.9639   LearningRate 0.0264   Epoch: 9   Global Step: 162260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:17,005-Speed 9566.81 samples/sec   Loss 5.9334   LearningRate 0.0264   Epoch: 9   Global Step: 162270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:18,110-Speed 9271.58 samples/sec   Loss 5.8674   LearningRate 0.0264   Epoch: 9   Global Step: 162280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:19,193-Speed 9460.17 samples/sec   Loss 5.9621   LearningRate 0.0264   Epoch: 9   Global Step: 162290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:20,276-Speed 9464.51 samples/sec   Loss 5.7844   LearningRate 0.0264   Epoch: 9   Global Step: 162300   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:05:21,344-Speed 9586.83 samples/sec   Loss 5.8612   LearningRate 0.0264   Epoch: 9   Global Step: 162310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:22,418-Speed 9541.16 samples/sec   Loss 5.9553   LearningRate 0.0264   Epoch: 9   Global Step: 162320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:23,591-Speed 8733.63 samples/sec   Loss 6.0495   LearningRate 0.0264   Epoch: 9   Global Step: 162330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:24,668-Speed 9518.64 samples/sec   Loss 5.8874   LearningRate 0.0264   Epoch: 9   Global Step: 162340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:25,743-Speed 9535.45 samples/sec   Loss 5.8848   LearningRate 0.0264   Epoch: 9   Global Step: 162350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:26,817-Speed 9546.14 samples/sec   Loss 5.9468   LearningRate 0.0264   Epoch: 9   Global Step: 162360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:27,886-Speed 9582.93 samples/sec   Loss 5.9522   LearningRate 0.0264   Epoch: 9   Global Step: 162370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:28,990-Speed 9278.35 samples/sec   Loss 5.8791   LearningRate 0.0264   Epoch: 9   Global Step: 162380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:30,059-Speed 9589.57 samples/sec   Loss 5.9011   LearningRate 0.0264   Epoch: 9   Global Step: 162390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:31,200-Speed 8976.64 samples/sec   Loss 5.8676   LearningRate 0.0264   Epoch: 9   Global Step: 162400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:32,295-Speed 9361.25 samples/sec   Loss 5.8696   LearningRate 0.0264   Epoch: 9   Global Step: 162410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:33,371-Speed 9522.02 samples/sec   Loss 5.9419   LearningRate 0.0264   Epoch: 9   Global Step: 162420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:34,415-Speed 9814.73 samples/sec   Loss 5.9922   LearningRate 0.0264   Epoch: 9   Global Step: 162430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:35,473-Speed 9683.36 samples/sec   Loss 5.9360   LearningRate 0.0264   Epoch: 9   Global Step: 162440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:36,553-Speed 9484.42 samples/sec   Loss 5.8773   LearningRate 0.0264   Epoch: 9   Global Step: 162450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:37,641-Speed 9420.68 samples/sec   Loss 6.0161   LearningRate 0.0264   Epoch: 9   Global Step: 162460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:38,739-Speed 9336.74 samples/sec   Loss 5.9564   LearningRate 0.0263   Epoch: 9   Global Step: 162470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:39,824-Speed 9434.55 samples/sec   Loss 5.9338   LearningRate 0.0263   Epoch: 9   Global Step: 162480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:40,894-Speed 9580.52 samples/sec   Loss 5.9539   LearningRate 0.0263   Epoch: 9   Global Step: 162490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:41,947-Speed 9734.91 samples/sec   Loss 5.9718   LearningRate 0.0263   Epoch: 9   Global Step: 162500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:43,024-Speed 9513.19 samples/sec   Loss 5.8982   LearningRate 0.0263   Epoch: 9   Global Step: 162510   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:05:44,089-Speed 9616.53 samples/sec   Loss 5.9729   LearningRate 0.0263   Epoch: 9   Global Step: 162520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:45,188-Speed 9326.39 samples/sec   Loss 5.8871   LearningRate 0.0263   Epoch: 9   Global Step: 162530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:46,301-Speed 9207.73 samples/sec   Loss 6.0192   LearningRate 0.0263   Epoch: 9   Global Step: 162540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:47,407-Speed 9259.86 samples/sec   Loss 5.9141   LearningRate 0.0263   Epoch: 9   Global Step: 162550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:48,518-Speed 9222.78 samples/sec   Loss 5.9258   LearningRate 0.0263   Epoch: 9   Global Step: 162560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:49,590-Speed 9560.66 samples/sec   Loss 5.9707   LearningRate 0.0263   Epoch: 9   Global Step: 162570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:05:50,694-Speed 9286.03 samples/sec   Loss 5.8586   LearningRate 0.0263   Epoch: 9   Global Step: 162580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:51,792-Speed 9332.69 samples/sec   Loss 5.9077   LearningRate 0.0263   Epoch: 9   Global Step: 162590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:52,932-Speed 8983.76 samples/sec   Loss 5.8544   LearningRate 0.0263   Epoch: 9   Global Step: 162600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:54,004-Speed 9554.15 samples/sec   Loss 5.8187   LearningRate 0.0263   Epoch: 9   Global Step: 162610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:55,073-Speed 9584.58 samples/sec   Loss 5.9218   LearningRate 0.0263   Epoch: 9   Global Step: 162620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:56,162-Speed 9413.85 samples/sec   Loss 5.8967   LearningRate 0.0263   Epoch: 9   Global Step: 162630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:57,240-Speed 9508.33 samples/sec   Loss 5.9512   LearningRate 0.0263   Epoch: 9   Global Step: 162640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:58,364-Speed 9113.73 samples/sec   Loss 5.9546   LearningRate 0.0263   Epoch: 9   Global Step: 162650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:05:59,423-Speed 9673.54 samples/sec   Loss 6.0164   LearningRate 0.0263   Epoch: 9   Global Step: 162660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:00,509-Speed 9434.72 samples/sec   Loss 5.9393   LearningRate 0.0263   Epoch: 9   Global Step: 162670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:01,611-Speed 9299.35 samples/sec   Loss 6.0245   LearningRate 0.0263   Epoch: 9   Global Step: 162680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:02,714-Speed 9285.92 samples/sec   Loss 6.0291   LearningRate 0.0263   Epoch: 9   Global Step: 162690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:03,796-Speed 9469.15 samples/sec   Loss 5.8820   LearningRate 0.0263   Epoch: 9   Global Step: 162700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:04,868-Speed 9561.14 samples/sec   Loss 5.8992   LearningRate 0.0263   Epoch: 9   Global Step: 162710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:05,974-Speed 9263.25 samples/sec   Loss 6.0163   LearningRate 0.0263   Epoch: 9   Global Step: 162720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:07,106-Speed 9050.93 samples/sec   Loss 6.0188   LearningRate 0.0263   Epoch: 9   Global Step: 162730   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:08,187-Speed 9481.88 samples/sec   Loss 5.9890   LearningRate 0.0263   Epoch: 9   Global Step: 162740   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:09,281-Speed 9371.40 samples/sec   Loss 5.8147   LearningRate 0.0263   Epoch: 9   Global Step: 162750   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:10,346-Speed 9617.95 samples/sec   Loss 5.7538   LearningRate 0.0263   Epoch: 9   Global Step: 162760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:11,419-Speed 9545.72 samples/sec   Loss 5.8899   LearningRate 0.0263   Epoch: 9   Global Step: 162770   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:12,514-Speed 9354.42 samples/sec   Loss 5.8498   LearningRate 0.0263   Epoch: 9   Global Step: 162780   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:13,613-Speed 9326.99 samples/sec   Loss 5.8500   LearningRate 0.0262   Epoch: 9   Global Step: 162790   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:14,708-Speed 9354.58 samples/sec   Loss 5.8993   LearningRate 0.0262   Epoch: 9   Global Step: 162800   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:06:15,733-Speed 9994.26 samples/sec   Loss 5.9130   LearningRate 0.0262   Epoch: 9   Global Step: 162810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:16,815-Speed 9472.07 samples/sec   Loss 5.9504   LearningRate 0.0262   Epoch: 9   Global Step: 162820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:17,933-Speed 9163.11 samples/sec   Loss 5.9597   LearningRate 0.0262   Epoch: 9   Global Step: 162830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:18,993-Speed 9671.15 samples/sec   Loss 6.0280   LearningRate 0.0262   Epoch: 9   Global Step: 162840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:20,080-Speed 9424.36 samples/sec   Loss 5.9737   LearningRate 0.0262   Epoch: 9   Global Step: 162850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:21,158-Speed 9507.30 samples/sec   Loss 5.9341   LearningRate 0.0262   Epoch: 9   Global Step: 162860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:22,241-Speed 9461.81 samples/sec   Loss 5.8930   LearningRate 0.0262   Epoch: 9   Global Step: 162870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:23,310-Speed 9590.56 samples/sec   Loss 6.0423   LearningRate 0.0262   Epoch: 9   Global Step: 162880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:24,421-Speed 9218.96 samples/sec   Loss 5.9698   LearningRate 0.0262   Epoch: 9   Global Step: 162890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:25,548-Speed 9096.53 samples/sec   Loss 5.8867   LearningRate 0.0262   Epoch: 9   Global Step: 162900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:26,631-Speed 9455.33 samples/sec   Loss 5.8507   LearningRate 0.0262   Epoch: 9   Global Step: 162910   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:06:27,694-Speed 9637.59 samples/sec   Loss 6.0001   LearningRate 0.0262   Epoch: 9   Global Step: 162920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:28,735-Speed 9845.85 samples/sec   Loss 5.8958   LearningRate 0.0262   Epoch: 9   Global Step: 162930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:29,811-Speed 9524.81 samples/sec   Loss 5.9637   LearningRate 0.0262   Epoch: 9   Global Step: 162940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:30,875-Speed 9624.53 samples/sec   Loss 5.8754   LearningRate 0.0262   Epoch: 9   Global Step: 162950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:31,969-Speed 9372.10 samples/sec   Loss 5.9206   LearningRate 0.0262   Epoch: 9   Global Step: 162960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:33,063-Speed 9359.82 samples/sec   Loss 5.8911   LearningRate 0.0262   Epoch: 9   Global Step: 162970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:34,197-Speed 9038.49 samples/sec   Loss 5.9324   LearningRate 0.0262   Epoch: 9   Global Step: 162980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:35,301-Speed 9287.91 samples/sec   Loss 5.9319   LearningRate 0.0262   Epoch: 9   Global Step: 162990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:36,388-Speed 9419.34 samples/sec   Loss 5.8881   LearningRate 0.0262   Epoch: 9   Global Step: 163000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:37,477-Speed 9409.67 samples/sec   Loss 6.0355   LearningRate 0.0262   Epoch: 9   Global Step: 163010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:38,525-Speed 9780.46 samples/sec   Loss 5.8898   LearningRate 0.0262   Epoch: 9   Global Step: 163020   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:06:39,611-Speed 9440.18 samples/sec   Loss 5.8553   LearningRate 0.0262   Epoch: 9   Global Step: 163030   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:06:40,719-Speed 9246.91 samples/sec   Loss 5.8850   LearningRate 0.0262   Epoch: 9   Global Step: 163040   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:06:41,770-Speed 9743.02 samples/sec   Loss 5.9459   LearningRate 0.0262   Epoch: 9   Global Step: 163050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:42,837-Speed 9601.12 samples/sec   Loss 5.9146   LearningRate 0.0262   Epoch: 9   Global Step: 163060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:43,913-Speed 9526.51 samples/sec   Loss 5.8856   LearningRate 0.0262   Epoch: 9   Global Step: 163070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:44,976-Speed 9636.69 samples/sec   Loss 5.9233   LearningRate 0.0262   Epoch: 9   Global Step: 163080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:46,069-Speed 9372.30 samples/sec   Loss 5.9132   LearningRate 0.0262   Epoch: 9   Global Step: 163090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:47,146-Speed 9511.91 samples/sec   Loss 5.9218   LearningRate 0.0262   Epoch: 9   Global Step: 163100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:48,245-Speed 9320.95 samples/sec   Loss 5.8748   LearningRate 0.0262   Epoch: 9   Global Step: 163110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:49,304-Speed 9683.47 samples/sec   Loss 5.9837   LearningRate 0.0261   Epoch: 9   Global Step: 163120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:50,364-Speed 9670.95 samples/sec   Loss 5.9352   LearningRate 0.0261   Epoch: 9   Global Step: 163130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:51,469-Speed 9274.66 samples/sec   Loss 6.0269   LearningRate 0.0261   Epoch: 9   Global Step: 163140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:52,559-Speed 9392.84 samples/sec   Loss 5.9287   LearningRate 0.0261   Epoch: 9   Global Step: 163150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:53,629-Speed 9579.91 samples/sec   Loss 5.8570   LearningRate 0.0261   Epoch: 9   Global Step: 163160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:54,706-Speed 9515.49 samples/sec   Loss 5.9203   LearningRate 0.0261   Epoch: 9   Global Step: 163170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:55,796-Speed 9405.76 samples/sec   Loss 5.8926   LearningRate 0.0261   Epoch: 9   Global Step: 163180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:56,889-Speed 9370.60 samples/sec   Loss 5.9010   LearningRate 0.0261   Epoch: 9   Global Step: 163190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:57,971-Speed 9474.19 samples/sec   Loss 5.9776   LearningRate 0.0261   Epoch: 9   Global Step: 163200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:06:59,070-Speed 9322.79 samples/sec   Loss 5.9564   LearningRate 0.0261   Epoch: 9   Global Step: 163210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:00,175-Speed 9264.51 samples/sec   Loss 6.0440   LearningRate 0.0261   Epoch: 9   Global Step: 163220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:01,306-Speed 9067.65 samples/sec   Loss 5.9447   LearningRate 0.0261   Epoch: 9   Global Step: 163230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:02,398-Speed 9375.55 samples/sec   Loss 5.9015   LearningRate 0.0261   Epoch: 9   Global Step: 163240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:03,482-Speed 9456.99 samples/sec   Loss 5.9052   LearningRate 0.0261   Epoch: 9   Global Step: 163250   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:07:04,565-Speed 9458.09 samples/sec   Loss 5.8515   LearningRate 0.0261   Epoch: 9   Global Step: 163260   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:07:05,648-Speed 9460.27 samples/sec   Loss 5.8582   LearningRate 0.0261   Epoch: 9   Global Step: 163270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:06,744-Speed 9351.50 samples/sec   Loss 5.9623   LearningRate 0.0261   Epoch: 9   Global Step: 163280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:07,810-Speed 9618.07 samples/sec   Loss 5.9207   LearningRate 0.0261   Epoch: 9   Global Step: 163290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:08,871-Speed 9652.12 samples/sec   Loss 6.0042   LearningRate 0.0261   Epoch: 9   Global Step: 163300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:09,965-Speed 9364.50 samples/sec   Loss 5.9665   LearningRate 0.0261   Epoch: 9   Global Step: 163310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:11,096-Speed 9058.71 samples/sec   Loss 5.9465   LearningRate 0.0261   Epoch: 9   Global Step: 163320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:12,177-Speed 9478.09 samples/sec   Loss 5.8812   LearningRate 0.0261   Epoch: 9   Global Step: 163330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:13,225-Speed 9785.87 samples/sec   Loss 5.9646   LearningRate 0.0261   Epoch: 9   Global Step: 163340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:14,342-Speed 9168.13 samples/sec   Loss 5.9719   LearningRate 0.0261   Epoch: 9   Global Step: 163350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:15,427-Speed 9446.33 samples/sec   Loss 5.8737   LearningRate 0.0261   Epoch: 9   Global Step: 163360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:16,547-Speed 9144.71 samples/sec   Loss 5.9143   LearningRate 0.0261   Epoch: 9   Global Step: 163370   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:07:17,590-Speed 9823.46 samples/sec   Loss 5.8650   LearningRate 0.0261   Epoch: 9   Global Step: 163380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:18,738-Speed 8928.71 samples/sec   Loss 5.9015   LearningRate 0.0261   Epoch: 9   Global Step: 163390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:19,846-Speed 9241.08 samples/sec   Loss 5.9610   LearningRate 0.0261   Epoch: 9   Global Step: 163400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:20,923-Speed 9517.09 samples/sec   Loss 5.9090   LearningRate 0.0261   Epoch: 9   Global Step: 163410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:22,003-Speed 9483.35 samples/sec   Loss 5.8501   LearningRate 0.0261   Epoch: 9   Global Step: 163420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:23,109-Speed 9268.44 samples/sec   Loss 5.8965   LearningRate 0.0261   Epoch: 9   Global Step: 163430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:24,197-Speed 9421.00 samples/sec   Loss 5.9201   LearningRate 0.0261   Epoch: 9   Global Step: 163440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:25,248-Speed 9748.93 samples/sec   Loss 5.8711   LearningRate 0.0260   Epoch: 9   Global Step: 163450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:26,330-Speed 9467.54 samples/sec   Loss 5.9575   LearningRate 0.0260   Epoch: 9   Global Step: 163460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:27,427-Speed 9340.93 samples/sec   Loss 5.7976   LearningRate 0.0260   Epoch: 9   Global Step: 163470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:28,511-Speed 9444.47 samples/sec   Loss 5.9160   LearningRate 0.0260   Epoch: 9   Global Step: 163480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:29,606-Speed 9366.42 samples/sec   Loss 5.9756   LearningRate 0.0260   Epoch: 9   Global Step: 163490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:30,652-Speed 9798.01 samples/sec   Loss 5.8944   LearningRate 0.0260   Epoch: 9   Global Step: 163500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:31,783-Speed 9062.15 samples/sec   Loss 5.9275   LearningRate 0.0260   Epoch: 9   Global Step: 163510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:32,864-Speed 9476.23 samples/sec   Loss 5.9547   LearningRate 0.0260   Epoch: 9   Global Step: 163520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:33,947-Speed 9457.57 samples/sec   Loss 5.9631   LearningRate 0.0260   Epoch: 9   Global Step: 163530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:35,045-Speed 9333.58 samples/sec   Loss 5.9464   LearningRate 0.0260   Epoch: 9   Global Step: 163540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:36,108-Speed 9643.08 samples/sec   Loss 5.9746   LearningRate 0.0260   Epoch: 9   Global Step: 163550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:37,239-Speed 9056.55 samples/sec   Loss 5.9509   LearningRate 0.0260   Epoch: 9   Global Step: 163560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:38,317-Speed 9510.88 samples/sec   Loss 5.8642   LearningRate 0.0260   Epoch: 9   Global Step: 163570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:07:39,446-Speed 9073.07 samples/sec   Loss 5.7726   LearningRate 0.0260   Epoch: 9   Global Step: 163580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:07:40,530-Speed 9452.91 samples/sec   Loss 5.8908   LearningRate 0.0260   Epoch: 9   Global Step: 163590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:07:41,601-Speed 9562.41 samples/sec   Loss 5.9214   LearningRate 0.0260   Epoch: 9   Global Step: 163600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:07:42,745-Speed 8956.08 samples/sec   Loss 5.9346   LearningRate 0.0260   Epoch: 9   Global Step: 163610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:07:43,817-Speed 9566.42 samples/sec   Loss 5.9601   LearningRate 0.0260   Epoch: 9   Global Step: 163620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:07:44,909-Speed 9378.77 samples/sec   Loss 5.8462   LearningRate 0.0260   Epoch: 9   Global Step: 163630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:07:45,987-Speed 9510.72 samples/sec   Loss 5.8044   LearningRate 0.0260   Epoch: 9   Global Step: 163640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:07:47,103-Speed 9178.23 samples/sec   Loss 5.8167   LearningRate 0.0260   Epoch: 9   Global Step: 163650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:07:48,227-Speed 9119.52 samples/sec   Loss 5.9216   LearningRate 0.0260   Epoch: 9   Global Step: 163660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:07:49,339-Speed 9212.92 samples/sec   Loss 6.0230   LearningRate 0.0260   Epoch: 9   Global Step: 163670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:50,448-Speed 9237.38 samples/sec   Loss 5.9021   LearningRate 0.0260   Epoch: 9   Global Step: 163680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:51,539-Speed 9390.30 samples/sec   Loss 5.8825   LearningRate 0.0260   Epoch: 9   Global Step: 163690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:52,636-Speed 9338.80 samples/sec   Loss 5.9489   LearningRate 0.0260   Epoch: 9   Global Step: 163700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:53,717-Speed 9481.52 samples/sec   Loss 5.8688   LearningRate 0.0260   Epoch: 9   Global Step: 163710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:54,801-Speed 9450.66 samples/sec   Loss 5.7623   LearningRate 0.0260   Epoch: 9   Global Step: 163720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:55,896-Speed 9363.67 samples/sec   Loss 5.9372   LearningRate 0.0260   Epoch: 9   Global Step: 163730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:57,032-Speed 9019.14 samples/sec   Loss 5.9479   LearningRate 0.0260   Epoch: 9   Global Step: 163740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:58,103-Speed 9558.26 samples/sec   Loss 5.8438   LearningRate 0.0260   Epoch: 9   Global Step: 163750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:07:59,227-Speed 9118.43 samples/sec   Loss 5.9012   LearningRate 0.0260   Epoch: 9   Global Step: 163760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:00,355-Speed 9086.06 samples/sec   Loss 5.8360   LearningRate 0.0259   Epoch: 9   Global Step: 163770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:01,475-Speed 9151.41 samples/sec   Loss 5.8203   LearningRate 0.0259   Epoch: 9   Global Step: 163780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:02,578-Speed 9290.14 samples/sec   Loss 5.9140   LearningRate 0.0259   Epoch: 9   Global Step: 163790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:03,642-Speed 9634.92 samples/sec   Loss 5.9521   LearningRate 0.0259   Epoch: 9   Global Step: 163800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:04,746-Speed 9276.33 samples/sec   Loss 5.8654   LearningRate 0.0259   Epoch: 9   Global Step: 163810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:05,850-Speed 9281.27 samples/sec   Loss 5.8787   LearningRate 0.0259   Epoch: 9   Global Step: 163820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:06,932-Speed 9465.20 samples/sec   Loss 6.0258   LearningRate 0.0259   Epoch: 9   Global Step: 163830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:08,026-Speed 9368.97 samples/sec   Loss 5.9080   LearningRate 0.0259   Epoch: 9   Global Step: 163840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:09,083-Speed 9695.94 samples/sec   Loss 5.8091   LearningRate 0.0259   Epoch: 9   Global Step: 163850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:10,122-Speed 9858.37 samples/sec   Loss 5.9609   LearningRate 0.0259   Epoch: 9   Global Step: 163860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:11,183-Speed 9655.61 samples/sec   Loss 5.9418   LearningRate 0.0259   Epoch: 9   Global Step: 163870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:12,238-Speed 9708.98 samples/sec   Loss 5.9772   LearningRate 0.0259   Epoch: 9   Global Step: 163880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:13,302-Speed 9645.81 samples/sec   Loss 5.8880   LearningRate 0.0259   Epoch: 9   Global Step: 163890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:08:14,393-Speed 9395.03 samples/sec   Loss 5.8692   LearningRate 0.0259   Epoch: 9   Global Step: 163900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:08:15,482-Speed 9404.47 samples/sec   Loss 5.8398   LearningRate 0.0259   Epoch: 9   Global Step: 163910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:08:16,598-Speed 9186.54 samples/sec   Loss 5.9693   LearningRate 0.0259   Epoch: 9   Global Step: 163920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:08:17,699-Speed 9299.80 samples/sec   Loss 5.9155   LearningRate 0.0259   Epoch: 9   Global Step: 163930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:08:18,830-Speed 9060.61 samples/sec   Loss 5.9472   LearningRate 0.0259   Epoch: 9   Global Step: 163940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:08:19,922-Speed 9380.30 samples/sec   Loss 5.8606   LearningRate 0.0259   Epoch: 9   Global Step: 163950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:08:21,042-Speed 9152.74 samples/sec   Loss 5.9330   LearningRate 0.0259   Epoch: 9   Global Step: 163960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:08:22,100-Speed 9687.81 samples/sec   Loss 5.9384   LearningRate 0.0259   Epoch: 9   Global Step: 163970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:08:23,180-Speed 9481.06 samples/sec   Loss 5.9945   LearningRate 0.0259   Epoch: 9   Global Step: 163980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:08:24,303-Speed 9132.13 samples/sec   Loss 5.9950   LearningRate 0.0259   Epoch: 9   Global Step: 163990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:25,405-Speed 9297.39 samples/sec   Loss 5.9197   LearningRate 0.0259   Epoch: 9   Global Step: 164000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:08:47,286-[lfw][164000]XNorm: 9.706120
Training: 2022-04-11 18:08:47,287-[lfw][164000]Accuracy-Flip: 0.99633+-0.00245
Training: 2022-04-11 18:08:47,288-[lfw][164000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:09:12,527-[cfp_fp][164000]XNorm: 8.224196
Training: 2022-04-11 18:09:12,528-[cfp_fp][164000]Accuracy-Flip: 0.96329+-0.00915
Training: 2022-04-11 18:09:12,528-[cfp_fp][164000]Accuracy-Highest: 0.96500
Training: 2022-04-11 18:09:34,292-[agedb_30][164000]XNorm: 9.287480
Training: 2022-04-11 18:09:34,293-[agedb_30][164000]Accuracy-Flip: 0.96683+-0.00867
Training: 2022-04-11 18:09:34,293-[agedb_30][164000]Accuracy-Highest: 0.96783
Training: 2022-04-11 18:09:35,378-Speed 146.34 samples/sec   Loss 6.0574   LearningRate 0.0259   Epoch: 9   Global Step: 164010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:36,425-Speed 9783.55 samples/sec   Loss 5.9910   LearningRate 0.0259   Epoch: 9   Global Step: 164020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:37,488-Speed 9640.83 samples/sec   Loss 5.9975   LearningRate 0.0259   Epoch: 9   Global Step: 164030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:38,579-Speed 9389.67 samples/sec   Loss 5.8989   LearningRate 0.0259   Epoch: 9   Global Step: 164040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:39,677-Speed 9329.44 samples/sec   Loss 5.8852   LearningRate 0.0259   Epoch: 9   Global Step: 164050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:40,737-Speed 9670.97 samples/sec   Loss 5.9226   LearningRate 0.0259   Epoch: 9   Global Step: 164060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:41,804-Speed 9600.50 samples/sec   Loss 5.9054   LearningRate 0.0259   Epoch: 9   Global Step: 164070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:42,936-Speed 9052.53 samples/sec   Loss 5.7961   LearningRate 0.0259   Epoch: 9   Global Step: 164080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:44,015-Speed 9499.03 samples/sec   Loss 5.9436   LearningRate 0.0259   Epoch: 9   Global Step: 164090   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:09:45,065-Speed 9758.46 samples/sec   Loss 5.8503   LearningRate 0.0258   Epoch: 9   Global Step: 164100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:46,139-Speed 9539.47 samples/sec   Loss 5.8996   LearningRate 0.0258   Epoch: 9   Global Step: 164110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:47,239-Speed 9311.92 samples/sec   Loss 5.9592   LearningRate 0.0258   Epoch: 9   Global Step: 164120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:48,343-Speed 9283.64 samples/sec   Loss 5.8228   LearningRate 0.0258   Epoch: 9   Global Step: 164130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:49,425-Speed 9464.45 samples/sec   Loss 5.7555   LearningRate 0.0258   Epoch: 9   Global Step: 164140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:50,496-Speed 9567.99 samples/sec   Loss 5.9029   LearningRate 0.0258   Epoch: 9   Global Step: 164150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:51,594-Speed 9335.63 samples/sec   Loss 5.9848   LearningRate 0.0258   Epoch: 9   Global Step: 164160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:52,721-Speed 9084.09 samples/sec   Loss 6.0556   LearningRate 0.0258   Epoch: 9   Global Step: 164170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:53,808-Speed 9425.76 samples/sec   Loss 5.7831   LearningRate 0.0258   Epoch: 9   Global Step: 164180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:54,883-Speed 9536.16 samples/sec   Loss 6.0165   LearningRate 0.0258   Epoch: 9   Global Step: 164190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:56,006-Speed 9122.28 samples/sec   Loss 5.9326   LearningRate 0.0258   Epoch: 9   Global Step: 164200   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:09:57,126-Speed 9153.07 samples/sec   Loss 6.0523   LearningRate 0.0258   Epoch: 9   Global Step: 164210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:58,238-Speed 9209.49 samples/sec   Loss 5.8954   LearningRate 0.0258   Epoch: 9   Global Step: 164220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:09:59,342-Speed 9280.51 samples/sec   Loss 5.8282   LearningRate 0.0258   Epoch: 9   Global Step: 164230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:00,426-Speed 9454.68 samples/sec   Loss 5.8973   LearningRate 0.0258   Epoch: 9   Global Step: 164240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:01,513-Speed 9423.23 samples/sec   Loss 5.8881   LearningRate 0.0258   Epoch: 9   Global Step: 164250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:02,673-Speed 8834.64 samples/sec   Loss 5.9333   LearningRate 0.0258   Epoch: 9   Global Step: 164260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:03,804-Speed 9058.65 samples/sec   Loss 5.9191   LearningRate 0.0258   Epoch: 9   Global Step: 164270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:04,909-Speed 9271.21 samples/sec   Loss 5.8133   LearningRate 0.0258   Epoch: 9   Global Step: 164280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:06,069-Speed 8833.11 samples/sec   Loss 5.9556   LearningRate 0.0258   Epoch: 9   Global Step: 164290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:07,154-Speed 9436.11 samples/sec   Loss 5.9118   LearningRate 0.0258   Epoch: 9   Global Step: 164300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:08,255-Speed 9309.71 samples/sec   Loss 5.9785   LearningRate 0.0258   Epoch: 9   Global Step: 164310   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:10:09,368-Speed 9215.04 samples/sec   Loss 5.8567   LearningRate 0.0258   Epoch: 9   Global Step: 164320   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:10:10,443-Speed 9527.50 samples/sec   Loss 5.9648   LearningRate 0.0258   Epoch: 9   Global Step: 164330   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:10:11,519-Speed 9525.28 samples/sec   Loss 5.9445   LearningRate 0.0258   Epoch: 9   Global Step: 164340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:12,555-Speed 9884.25 samples/sec   Loss 5.9460   LearningRate 0.0258   Epoch: 9   Global Step: 164350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:13,652-Speed 9341.51 samples/sec   Loss 5.9566   LearningRate 0.0258   Epoch: 9   Global Step: 164360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:14,782-Speed 9064.25 samples/sec   Loss 5.8587   LearningRate 0.0258   Epoch: 9   Global Step: 164370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:15,863-Speed 9486.46 samples/sec   Loss 5.9266   LearningRate 0.0258   Epoch: 9   Global Step: 164380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:16,955-Speed 9378.49 samples/sec   Loss 5.8495   LearningRate 0.0258   Epoch: 9   Global Step: 164390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:18,068-Speed 9209.74 samples/sec   Loss 6.0284   LearningRate 0.0258   Epoch: 9   Global Step: 164400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:19,109-Speed 9834.63 samples/sec   Loss 6.0356   LearningRate 0.0258   Epoch: 9   Global Step: 164410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:20,224-Speed 9190.52 samples/sec   Loss 5.8939   LearningRate 0.0258   Epoch: 9   Global Step: 164420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:21,274-Speed 9759.26 samples/sec   Loss 5.8194   LearningRate 0.0257   Epoch: 9   Global Step: 164430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:22,402-Speed 9080.20 samples/sec   Loss 5.9434   LearningRate 0.0257   Epoch: 9   Global Step: 164440   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:10:23,461-Speed 9676.48 samples/sec   Loss 5.9892   LearningRate 0.0257   Epoch: 9   Global Step: 164450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:24,549-Speed 9420.24 samples/sec   Loss 5.8571   LearningRate 0.0257   Epoch: 9   Global Step: 164460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:25,665-Speed 9178.09 samples/sec   Loss 5.9742   LearningRate 0.0257   Epoch: 9   Global Step: 164470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:26,761-Speed 9351.02 samples/sec   Loss 5.8930   LearningRate 0.0257   Epoch: 9   Global Step: 164480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:27,875-Speed 9199.58 samples/sec   Loss 5.8738   LearningRate 0.0257   Epoch: 9   Global Step: 164490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:28,959-Speed 9450.17 samples/sec   Loss 5.9547   LearningRate 0.0257   Epoch: 9   Global Step: 164500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:30,047-Speed 9423.27 samples/sec   Loss 5.8944   LearningRate 0.0257   Epoch: 9   Global Step: 164510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:31,114-Speed 9603.41 samples/sec   Loss 5.8555   LearningRate 0.0257   Epoch: 9   Global Step: 164520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:32,186-Speed 9549.52 samples/sec   Loss 5.9001   LearningRate 0.0257   Epoch: 9   Global Step: 164530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:33,296-Speed 9232.17 samples/sec   Loss 5.9194   LearningRate 0.0257   Epoch: 9   Global Step: 164540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:34,413-Speed 9173.30 samples/sec   Loss 5.9073   LearningRate 0.0257   Epoch: 9   Global Step: 164550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:35,542-Speed 9072.47 samples/sec   Loss 5.9442   LearningRate 0.0257   Epoch: 9   Global Step: 164560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:36,645-Speed 9287.57 samples/sec   Loss 5.7810   LearningRate 0.0257   Epoch: 9   Global Step: 164570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:37,758-Speed 9208.60 samples/sec   Loss 6.0063   LearningRate 0.0257   Epoch: 9   Global Step: 164580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:38,889-Speed 9065.51 samples/sec   Loss 5.8971   LearningRate 0.0257   Epoch: 9   Global Step: 164590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:39,953-Speed 9627.78 samples/sec   Loss 5.9557   LearningRate 0.0257   Epoch: 9   Global Step: 164600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:41,071-Speed 9163.53 samples/sec   Loss 5.8787   LearningRate 0.0257   Epoch: 9   Global Step: 164610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:42,152-Speed 9481.53 samples/sec   Loss 5.8447   LearningRate 0.0257   Epoch: 9   Global Step: 164620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:43,277-Speed 9105.34 samples/sec   Loss 5.7997   LearningRate 0.0257   Epoch: 9   Global Step: 164630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:44,333-Speed 9704.20 samples/sec   Loss 5.9896   LearningRate 0.0257   Epoch: 9   Global Step: 164640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:45,415-Speed 9470.18 samples/sec   Loss 5.8611   LearningRate 0.0257   Epoch: 9   Global Step: 164650   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:10:46,474-Speed 9675.99 samples/sec   Loss 5.8943   LearningRate 0.0257   Epoch: 9   Global Step: 164660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:47,561-Speed 9426.70 samples/sec   Loss 6.0483   LearningRate 0.0257   Epoch: 9   Global Step: 164670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:48,626-Speed 9621.13 samples/sec   Loss 5.9170   LearningRate 0.0257   Epoch: 9   Global Step: 164680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:49,726-Speed 9317.74 samples/sec   Loss 5.8932   LearningRate 0.0257   Epoch: 9   Global Step: 164690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:50,821-Speed 9360.80 samples/sec   Loss 5.8988   LearningRate 0.0257   Epoch: 9   Global Step: 164700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:51,925-Speed 9278.37 samples/sec   Loss 5.9341   LearningRate 0.0257   Epoch: 9   Global Step: 164710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:52,968-Speed 9825.83 samples/sec   Loss 5.8776   LearningRate 0.0257   Epoch: 9   Global Step: 164720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:54,051-Speed 9459.57 samples/sec   Loss 5.8715   LearningRate 0.0257   Epoch: 9   Global Step: 164730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:55,087-Speed 9884.63 samples/sec   Loss 5.8421   LearningRate 0.0257   Epoch: 9   Global Step: 164740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:56,159-Speed 9559.20 samples/sec   Loss 5.9389   LearningRate 0.0257   Epoch: 9   Global Step: 164750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:57,272-Speed 9211.45 samples/sec   Loss 5.9079   LearningRate 0.0256   Epoch: 9   Global Step: 164760   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:10:58,358-Speed 9437.25 samples/sec   Loss 5.9814   LearningRate 0.0256   Epoch: 9   Global Step: 164770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:10:59,443-Speed 9437.29 samples/sec   Loss 5.9274   LearningRate 0.0256   Epoch: 9   Global Step: 164780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:00,558-Speed 9189.14 samples/sec   Loss 5.8610   LearningRate 0.0256   Epoch: 9   Global Step: 164790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:01,635-Speed 9516.31 samples/sec   Loss 5.9884   LearningRate 0.0256   Epoch: 9   Global Step: 164800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:02,714-Speed 9498.05 samples/sec   Loss 5.8615   LearningRate 0.0256   Epoch: 9   Global Step: 164810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:03,782-Speed 9593.14 samples/sec   Loss 5.9325   LearningRate 0.0256   Epoch: 9   Global Step: 164820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:04,853-Speed 9563.13 samples/sec   Loss 5.8847   LearningRate 0.0256   Epoch: 9   Global Step: 164830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:05,931-Speed 9509.74 samples/sec   Loss 5.8787   LearningRate 0.0256   Epoch: 9   Global Step: 164840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:06,990-Speed 9667.94 samples/sec   Loss 5.8908   LearningRate 0.0256   Epoch: 9   Global Step: 164850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:08,068-Speed 9511.84 samples/sec   Loss 5.8910   LearningRate 0.0256   Epoch: 9   Global Step: 164860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:09,125-Speed 9692.21 samples/sec   Loss 5.9251   LearningRate 0.0256   Epoch: 9   Global Step: 164870   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:11:10,229-Speed 9284.52 samples/sec   Loss 5.8429   LearningRate 0.0256   Epoch: 9   Global Step: 164880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:11,323-Speed 9360.42 samples/sec   Loss 5.7410   LearningRate 0.0256   Epoch: 9   Global Step: 164890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:12,434-Speed 9221.27 samples/sec   Loss 5.9411   LearningRate 0.0256   Epoch: 9   Global Step: 164900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:13,512-Speed 9506.28 samples/sec   Loss 5.8348   LearningRate 0.0256   Epoch: 9   Global Step: 164910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:14,621-Speed 9236.54 samples/sec   Loss 5.8630   LearningRate 0.0256   Epoch: 9   Global Step: 164920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:15,730-Speed 9240.14 samples/sec   Loss 5.8116   LearningRate 0.0256   Epoch: 9   Global Step: 164930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:16,816-Speed 9436.56 samples/sec   Loss 5.9330   LearningRate 0.0256   Epoch: 9   Global Step: 164940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:17,875-Speed 9675.38 samples/sec   Loss 5.9433   LearningRate 0.0256   Epoch: 9   Global Step: 164950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:18,988-Speed 9206.76 samples/sec   Loss 5.8092   LearningRate 0.0256   Epoch: 9   Global Step: 164960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:20,023-Speed 9898.61 samples/sec   Loss 5.8061   LearningRate 0.0256   Epoch: 9   Global Step: 164970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:21,064-Speed 9841.85 samples/sec   Loss 5.9395   LearningRate 0.0256   Epoch: 9   Global Step: 164980   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:11:22,190-Speed 9096.55 samples/sec   Loss 5.8621   LearningRate 0.0256   Epoch: 9   Global Step: 164990   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:11:23,281-Speed 9392.94 samples/sec   Loss 5.9264   LearningRate 0.0256   Epoch: 9   Global Step: 165000   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:11:24,319-Speed 9870.67 samples/sec   Loss 5.9146   LearningRate 0.0256   Epoch: 9   Global Step: 165010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:25,387-Speed 9599.83 samples/sec   Loss 5.8605   LearningRate 0.0256   Epoch: 9   Global Step: 165020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:26,488-Speed 9307.09 samples/sec   Loss 5.8688   LearningRate 0.0256   Epoch: 9   Global Step: 165030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:27,604-Speed 9178.37 samples/sec   Loss 6.0102   LearningRate 0.0256   Epoch: 9   Global Step: 165040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:28,706-Speed 9298.07 samples/sec   Loss 5.8560   LearningRate 0.0256   Epoch: 9   Global Step: 165050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:29,782-Speed 9521.89 samples/sec   Loss 5.9698   LearningRate 0.0256   Epoch: 9   Global Step: 165060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:30,864-Speed 9475.13 samples/sec   Loss 5.9270   LearningRate 0.0256   Epoch: 9   Global Step: 165070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:31,978-Speed 9197.39 samples/sec   Loss 5.9127   LearningRate 0.0256   Epoch: 9   Global Step: 165080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:33,085-Speed 9248.29 samples/sec   Loss 5.9316   LearningRate 0.0255   Epoch: 9   Global Step: 165090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:34,208-Speed 9128.35 samples/sec   Loss 6.0013   LearningRate 0.0255   Epoch: 9   Global Step: 165100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:35,307-Speed 9321.28 samples/sec   Loss 5.8655   LearningRate 0.0255   Epoch: 9   Global Step: 165110   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:11:36,431-Speed 9115.62 samples/sec   Loss 5.8646   LearningRate 0.0255   Epoch: 9   Global Step: 165120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:37,543-Speed 9216.11 samples/sec   Loss 5.9191   LearningRate 0.0255   Epoch: 9   Global Step: 165130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:38,663-Speed 9151.60 samples/sec   Loss 5.8667   LearningRate 0.0255   Epoch: 9   Global Step: 165140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:39,753-Speed 9397.03 samples/sec   Loss 5.8606   LearningRate 0.0255   Epoch: 9   Global Step: 165150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:40,843-Speed 9401.84 samples/sec   Loss 5.8218   LearningRate 0.0255   Epoch: 9   Global Step: 165160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:41,918-Speed 9533.68 samples/sec   Loss 5.9373   LearningRate 0.0255   Epoch: 9   Global Step: 165170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:43,022-Speed 9275.08 samples/sec   Loss 5.9657   LearningRate 0.0255   Epoch: 9   Global Step: 165180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:44,091-Speed 9590.07 samples/sec   Loss 5.8549   LearningRate 0.0255   Epoch: 9   Global Step: 165190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:45,183-Speed 9384.06 samples/sec   Loss 5.8981   LearningRate 0.0255   Epoch: 9   Global Step: 165200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:46,242-Speed 9673.59 samples/sec   Loss 5.9040   LearningRate 0.0255   Epoch: 9   Global Step: 165210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:47,348-Speed 9262.94 samples/sec   Loss 5.9662   LearningRate 0.0255   Epoch: 9   Global Step: 165220   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:11:48,430-Speed 9471.44 samples/sec   Loss 5.8930   LearningRate 0.0255   Epoch: 9   Global Step: 165230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:49,478-Speed 9775.08 samples/sec   Loss 5.8879   LearningRate 0.0255   Epoch: 9   Global Step: 165240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:50,572-Speed 9366.63 samples/sec   Loss 5.9527   LearningRate 0.0255   Epoch: 9   Global Step: 165250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:51,661-Speed 9405.49 samples/sec   Loss 5.8842   LearningRate 0.0255   Epoch: 9   Global Step: 165260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:52,782-Speed 9143.57 samples/sec   Loss 5.8845   LearningRate 0.0255   Epoch: 9   Global Step: 165270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:53,922-Speed 8989.56 samples/sec   Loss 5.9047   LearningRate 0.0255   Epoch: 9   Global Step: 165280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:55,034-Speed 9212.99 samples/sec   Loss 5.8638   LearningRate 0.0255   Epoch: 9   Global Step: 165290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:56,145-Speed 9224.62 samples/sec   Loss 5.9164   LearningRate 0.0255   Epoch: 9   Global Step: 165300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:57,259-Speed 9199.49 samples/sec   Loss 5.9272   LearningRate 0.0255   Epoch: 9   Global Step: 165310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:58,362-Speed 9288.29 samples/sec   Loss 5.8728   LearningRate 0.0255   Epoch: 9   Global Step: 165320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:11:59,461-Speed 9320.18 samples/sec   Loss 5.9023   LearningRate 0.0255   Epoch: 9   Global Step: 165330   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:12:00,563-Speed 9297.64 samples/sec   Loss 5.8933   LearningRate 0.0255   Epoch: 9   Global Step: 165340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:01,642-Speed 9498.28 samples/sec   Loss 5.9278   LearningRate 0.0255   Epoch: 9   Global Step: 165350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:02,711-Speed 9585.84 samples/sec   Loss 5.9825   LearningRate 0.0255   Epoch: 9   Global Step: 165360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:03,766-Speed 9710.79 samples/sec   Loss 5.9706   LearningRate 0.0255   Epoch: 9   Global Step: 165370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:04,848-Speed 9468.85 samples/sec   Loss 5.9181   LearningRate 0.0255   Epoch: 9   Global Step: 165380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:05,968-Speed 9151.51 samples/sec   Loss 5.8622   LearningRate 0.0255   Epoch: 9   Global Step: 165390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:07,071-Speed 9288.43 samples/sec   Loss 5.7887   LearningRate 0.0255   Epoch: 9   Global Step: 165400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:08,119-Speed 9770.33 samples/sec   Loss 5.9216   LearningRate 0.0255   Epoch: 9   Global Step: 165410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:09,256-Speed 9016.37 samples/sec   Loss 5.8504   LearningRate 0.0254   Epoch: 9   Global Step: 165420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:10,350-Speed 9366.67 samples/sec   Loss 6.0476   LearningRate 0.0254   Epoch: 9   Global Step: 165430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:11,458-Speed 9247.11 samples/sec   Loss 5.7943   LearningRate 0.0254   Epoch: 9   Global Step: 165440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:12,556-Speed 9339.37 samples/sec   Loss 5.8876   LearningRate 0.0254   Epoch: 9   Global Step: 165450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:13,655-Speed 9323.89 samples/sec   Loss 5.8076   LearningRate 0.0254   Epoch: 9   Global Step: 165460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:14,733-Speed 9498.90 samples/sec   Loss 5.9038   LearningRate 0.0254   Epoch: 9   Global Step: 165470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:15,805-Speed 9568.48 samples/sec   Loss 5.9081   LearningRate 0.0254   Epoch: 9   Global Step: 165480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:16,891-Speed 9427.40 samples/sec   Loss 5.9197   LearningRate 0.0254   Epoch: 9   Global Step: 165490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:17,988-Speed 9343.84 samples/sec   Loss 5.8729   LearningRate 0.0254   Epoch: 9   Global Step: 165500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:19,083-Speed 9354.23 samples/sec   Loss 5.8012   LearningRate 0.0254   Epoch: 9   Global Step: 165510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:20,212-Speed 9077.90 samples/sec   Loss 5.8234   LearningRate 0.0254   Epoch: 9   Global Step: 165520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:21,286-Speed 9535.36 samples/sec   Loss 5.8658   LearningRate 0.0254   Epoch: 9   Global Step: 165530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:22,343-Speed 9692.00 samples/sec   Loss 5.9717   LearningRate 0.0254   Epoch: 9   Global Step: 165540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:23,446-Speed 9291.59 samples/sec   Loss 5.9305   LearningRate 0.0254   Epoch: 9   Global Step: 165550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:24,586-Speed 8986.86 samples/sec   Loss 5.8857   LearningRate 0.0254   Epoch: 9   Global Step: 165560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:25,645-Speed 9670.36 samples/sec   Loss 5.8350   LearningRate 0.0254   Epoch: 9   Global Step: 165570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:26,762-Speed 9173.83 samples/sec   Loss 5.8769   LearningRate 0.0254   Epoch: 9   Global Step: 165580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:27,875-Speed 9213.90 samples/sec   Loss 5.7870   LearningRate 0.0254   Epoch: 9   Global Step: 165590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:28,943-Speed 9587.11 samples/sec   Loss 5.8762   LearningRate 0.0254   Epoch: 9   Global Step: 165600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:30,032-Speed 9409.46 samples/sec   Loss 5.9378   LearningRate 0.0254   Epoch: 9   Global Step: 165610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:31,102-Speed 9580.64 samples/sec   Loss 5.8914   LearningRate 0.0254   Epoch: 9   Global Step: 165620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:32,203-Speed 9305.53 samples/sec   Loss 5.9375   LearningRate 0.0254   Epoch: 9   Global Step: 165630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:12:33,300-Speed 9342.43 samples/sec   Loss 5.8770   LearningRate 0.0254   Epoch: 9   Global Step: 165640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:12:34,363-Speed 9634.02 samples/sec   Loss 5.8822   LearningRate 0.0254   Epoch: 9   Global Step: 165650   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:12:35,467-Speed 9281.87 samples/sec   Loss 5.8425   LearningRate 0.0254   Epoch: 9   Global Step: 165660   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:12:36,548-Speed 9475.38 samples/sec   Loss 5.9054   LearningRate 0.0254   Epoch: 9   Global Step: 165670   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:12:37,630-Speed 9475.05 samples/sec   Loss 5.8368   LearningRate 0.0254   Epoch: 9   Global Step: 165680   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:12:38,734-Speed 9284.10 samples/sec   Loss 5.9286   LearningRate 0.0254   Epoch: 9   Global Step: 165690   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:12:39,822-Speed 9412.77 samples/sec   Loss 5.8556   LearningRate 0.0254   Epoch: 9   Global Step: 165700   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:12:40,919-Speed 9340.26 samples/sec   Loss 5.9127   LearningRate 0.0254   Epoch: 9   Global Step: 165710   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:12:41,999-Speed 9490.59 samples/sec   Loss 5.8677   LearningRate 0.0254   Epoch: 9   Global Step: 165720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:12:43,156-Speed 8852.30 samples/sec   Loss 5.9347   LearningRate 0.0254   Epoch: 9   Global Step: 165730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:44,235-Speed 9492.15 samples/sec   Loss 5.8702   LearningRate 0.0254   Epoch: 9   Global Step: 165740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:45,331-Speed 9354.28 samples/sec   Loss 5.8393   LearningRate 0.0253   Epoch: 9   Global Step: 165750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:46,416-Speed 9438.93 samples/sec   Loss 5.8917   LearningRate 0.0253   Epoch: 9   Global Step: 165760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:47,504-Speed 9420.96 samples/sec   Loss 5.8945   LearningRate 0.0253   Epoch: 9   Global Step: 165770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:48,600-Speed 9350.70 samples/sec   Loss 5.8820   LearningRate 0.0253   Epoch: 9   Global Step: 165780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:49,664-Speed 9623.92 samples/sec   Loss 5.9001   LearningRate 0.0253   Epoch: 9   Global Step: 165790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:50,732-Speed 9594.65 samples/sec   Loss 5.9396   LearningRate 0.0253   Epoch: 9   Global Step: 165800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:51,817-Speed 9445.69 samples/sec   Loss 6.0209   LearningRate 0.0253   Epoch: 9   Global Step: 165810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:52,912-Speed 9358.43 samples/sec   Loss 5.8924   LearningRate 0.0253   Epoch: 9   Global Step: 165820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:54,038-Speed 9103.37 samples/sec   Loss 5.8916   LearningRate 0.0253   Epoch: 9   Global Step: 165830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:55,125-Speed 9421.45 samples/sec   Loss 5.8866   LearningRate 0.0253   Epoch: 9   Global Step: 165840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:56,192-Speed 9599.44 samples/sec   Loss 5.8961   LearningRate 0.0253   Epoch: 9   Global Step: 165850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:57,278-Speed 9443.07 samples/sec   Loss 5.9520   LearningRate 0.0253   Epoch: 9   Global Step: 165860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:58,338-Speed 9677.23 samples/sec   Loss 5.9311   LearningRate 0.0253   Epoch: 9   Global Step: 165870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:12:59,413-Speed 9526.81 samples/sec   Loss 5.8190   LearningRate 0.0253   Epoch: 9   Global Step: 165880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:00,500-Speed 9432.00 samples/sec   Loss 5.8917   LearningRate 0.0253   Epoch: 9   Global Step: 165890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:01,590-Speed 9392.13 samples/sec   Loss 5.8793   LearningRate 0.0253   Epoch: 9   Global Step: 165900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:02,701-Speed 9226.42 samples/sec   Loss 5.8633   LearningRate 0.0253   Epoch: 9   Global Step: 165910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:03,756-Speed 9715.57 samples/sec   Loss 5.8656   LearningRate 0.0253   Epoch: 9   Global Step: 165920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:04,891-Speed 9024.08 samples/sec   Loss 5.8728   LearningRate 0.0253   Epoch: 9   Global Step: 165930   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:13:05,954-Speed 9637.72 samples/sec   Loss 5.9037   LearningRate 0.0253   Epoch: 9   Global Step: 165940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:07,061-Speed 9261.03 samples/sec   Loss 5.9488   LearningRate 0.0253   Epoch: 9   Global Step: 165950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:08,181-Speed 9143.33 samples/sec   Loss 5.9126   LearningRate 0.0253   Epoch: 9   Global Step: 165960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:09,247-Speed 9621.53 samples/sec   Loss 5.7684   LearningRate 0.0253   Epoch: 9   Global Step: 165970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:10,371-Speed 9120.58 samples/sec   Loss 5.9243   LearningRate 0.0253   Epoch: 9   Global Step: 165980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:11,453-Speed 9463.59 samples/sec   Loss 5.8786   LearningRate 0.0253   Epoch: 9   Global Step: 165990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:12,552-Speed 9322.42 samples/sec   Loss 5.9531   LearningRate 0.0253   Epoch: 9   Global Step: 166000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:13:34,715-[lfw][166000]XNorm: 9.513423
Training: 2022-04-11 18:13:34,716-[lfw][166000]Accuracy-Flip: 0.99633+-0.00277
Training: 2022-04-11 18:13:34,716-[lfw][166000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:14:00,306-[cfp_fp][166000]XNorm: 8.136522
Training: 2022-04-11 18:14:00,307-[cfp_fp][166000]Accuracy-Flip: 0.95800+-0.01123
Training: 2022-04-11 18:14:00,307-[cfp_fp][166000]Accuracy-Highest: 0.96500
Training: 2022-04-11 18:14:22,199-[agedb_30][166000]XNorm: 9.208214
Training: 2022-04-11 18:14:22,200-[agedb_30][166000]Accuracy-Flip: 0.96917+-0.00935
Training: 2022-04-11 18:14:22,201-[agedb_30][166000]Accuracy-Highest: 0.96917
Training: 2022-04-11 18:14:23,266-Speed 144.81 samples/sec   Loss 5.8095   LearningRate 0.0253   Epoch: 9   Global Step: 166010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:24,370-Speed 9278.17 samples/sec   Loss 5.9448   LearningRate 0.0253   Epoch: 9   Global Step: 166020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:25,462-Speed 9393.04 samples/sec   Loss 5.8429   LearningRate 0.0253   Epoch: 9   Global Step: 166030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:26,552-Speed 9394.16 samples/sec   Loss 5.8523   LearningRate 0.0253   Epoch: 9   Global Step: 166040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:27,627-Speed 9530.39 samples/sec   Loss 5.9441   LearningRate 0.0253   Epoch: 9   Global Step: 166050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:28,741-Speed 9201.36 samples/sec   Loss 5.7388   LearningRate 0.0253   Epoch: 9   Global Step: 166060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:29,835-Speed 9372.90 samples/sec   Loss 5.8465   LearningRate 0.0253   Epoch: 9   Global Step: 166070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:30,951-Speed 9179.73 samples/sec   Loss 5.8922   LearningRate 0.0252   Epoch: 9   Global Step: 166080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:32,072-Speed 9139.48 samples/sec   Loss 5.7825   LearningRate 0.0252   Epoch: 9   Global Step: 166090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:33,183-Speed 9221.59 samples/sec   Loss 5.9767   LearningRate 0.0252   Epoch: 9   Global Step: 166100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:34,229-Speed 9797.49 samples/sec   Loss 5.8932   LearningRate 0.0252   Epoch: 9   Global Step: 166110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:35,311-Speed 9468.18 samples/sec   Loss 5.8979   LearningRate 0.0252   Epoch: 9   Global Step: 166120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:36,398-Speed 9428.23 samples/sec   Loss 5.7572   LearningRate 0.0252   Epoch: 9   Global Step: 166130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:37,532-Speed 9030.84 samples/sec   Loss 5.8867   LearningRate 0.0252   Epoch: 9   Global Step: 166140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:38,644-Speed 9213.46 samples/sec   Loss 5.7936   LearningRate 0.0252   Epoch: 9   Global Step: 166150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:39,749-Speed 9278.12 samples/sec   Loss 5.8863   LearningRate 0.0252   Epoch: 9   Global Step: 166160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:40,797-Speed 9779.92 samples/sec   Loss 5.9576   LearningRate 0.0252   Epoch: 9   Global Step: 166170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:14:41,896-Speed 9316.53 samples/sec   Loss 5.9263   LearningRate 0.0252   Epoch: 9   Global Step: 166180   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:14:42,987-Speed 9391.48 samples/sec   Loss 5.9036   LearningRate 0.0252   Epoch: 9   Global Step: 166190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:14:44,088-Speed 9308.98 samples/sec   Loss 5.8762   LearningRate 0.0252   Epoch: 9   Global Step: 166200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:14:45,202-Speed 9193.38 samples/sec   Loss 5.9392   LearningRate 0.0252   Epoch: 9   Global Step: 166210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:14:46,279-Speed 9514.42 samples/sec   Loss 5.8296   LearningRate 0.0252   Epoch: 9   Global Step: 166220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:14:47,370-Speed 9396.28 samples/sec   Loss 5.9175   LearningRate 0.0252   Epoch: 9   Global Step: 166230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:14:48,417-Speed 9786.54 samples/sec   Loss 5.8820   LearningRate 0.0252   Epoch: 9   Global Step: 166240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:14:49,520-Speed 9290.22 samples/sec   Loss 5.8808   LearningRate 0.0252   Epoch: 9   Global Step: 166250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:14:50,627-Speed 9260.05 samples/sec   Loss 5.7958   LearningRate 0.0252   Epoch: 9   Global Step: 166260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:14:51,717-Speed 9398.01 samples/sec   Loss 5.9093   LearningRate 0.0252   Epoch: 9   Global Step: 166270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:52,794-Speed 9514.85 samples/sec   Loss 5.9142   LearningRate 0.0252   Epoch: 9   Global Step: 166280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:53,901-Speed 9261.17 samples/sec   Loss 5.9629   LearningRate 0.0252   Epoch: 9   Global Step: 166290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:55,001-Speed 9313.09 samples/sec   Loss 5.7782   LearningRate 0.0252   Epoch: 9   Global Step: 166300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:56,094-Speed 9375.71 samples/sec   Loss 5.7942   LearningRate 0.0252   Epoch: 9   Global Step: 166310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:57,175-Speed 9479.16 samples/sec   Loss 5.9499   LearningRate 0.0252   Epoch: 9   Global Step: 166320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:58,290-Speed 9185.50 samples/sec   Loss 5.9464   LearningRate 0.0252   Epoch: 9   Global Step: 166330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:14:59,402-Speed 9213.97 samples/sec   Loss 5.8531   LearningRate 0.0252   Epoch: 9   Global Step: 166340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:00,506-Speed 9280.48 samples/sec   Loss 5.8510   LearningRate 0.0252   Epoch: 9   Global Step: 166350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:01,631-Speed 9112.54 samples/sec   Loss 5.9123   LearningRate 0.0252   Epoch: 9   Global Step: 166360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:02,764-Speed 9043.42 samples/sec   Loss 5.8198   LearningRate 0.0252   Epoch: 9   Global Step: 166370   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:15:03,829-Speed 9618.52 samples/sec   Loss 5.7588   LearningRate 0.0252   Epoch: 9   Global Step: 166380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:04,885-Speed 9705.77 samples/sec   Loss 5.9118   LearningRate 0.0252   Epoch: 9   Global Step: 166390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:05,982-Speed 9334.61 samples/sec   Loss 5.8590   LearningRate 0.0252   Epoch: 9   Global Step: 166400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:07,079-Speed 9339.79 samples/sec   Loss 5.9101   LearningRate 0.0252   Epoch: 9   Global Step: 166410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:08,178-Speed 9325.09 samples/sec   Loss 5.8672   LearningRate 0.0251   Epoch: 9   Global Step: 166420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:09,244-Speed 9615.87 samples/sec   Loss 6.0134   LearningRate 0.0251   Epoch: 9   Global Step: 166430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:10,330-Speed 9437.58 samples/sec   Loss 6.0142   LearningRate 0.0251   Epoch: 9   Global Step: 166440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:11,427-Speed 9339.75 samples/sec   Loss 6.0240   LearningRate 0.0251   Epoch: 9   Global Step: 166450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:12,539-Speed 9209.05 samples/sec   Loss 5.8037   LearningRate 0.0251   Epoch: 9   Global Step: 166460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:13,636-Speed 9342.80 samples/sec   Loss 5.8457   LearningRate 0.0251   Epoch: 9   Global Step: 166470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:14,745-Speed 9242.15 samples/sec   Loss 5.9621   LearningRate 0.0251   Epoch: 9   Global Step: 166480   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:15:15,845-Speed 9315.62 samples/sec   Loss 5.8552   LearningRate 0.0251   Epoch: 9   Global Step: 166490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:16,898-Speed 9730.10 samples/sec   Loss 5.8914   LearningRate 0.0251   Epoch: 9   Global Step: 166500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:17,992-Speed 9363.87 samples/sec   Loss 5.8500   LearningRate 0.0251   Epoch: 9   Global Step: 166510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:19,120-Speed 9082.81 samples/sec   Loss 5.8672   LearningRate 0.0251   Epoch: 9   Global Step: 166520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:20,213-Speed 9380.15 samples/sec   Loss 5.8990   LearningRate 0.0251   Epoch: 9   Global Step: 166530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:21,289-Speed 9515.47 samples/sec   Loss 5.8733   LearningRate 0.0251   Epoch: 9   Global Step: 166540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:22,378-Speed 9414.74 samples/sec   Loss 5.8285   LearningRate 0.0251   Epoch: 9   Global Step: 166550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:23,420-Speed 9830.75 samples/sec   Loss 5.8353   LearningRate 0.0251   Epoch: 9   Global Step: 166560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:24,512-Speed 9379.95 samples/sec   Loss 5.9064   LearningRate 0.0251   Epoch: 9   Global Step: 166570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:25,612-Speed 9309.21 samples/sec   Loss 5.9381   LearningRate 0.0251   Epoch: 9   Global Step: 166580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:26,671-Speed 9682.24 samples/sec   Loss 5.9156   LearningRate 0.0251   Epoch: 9   Global Step: 166590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:27,737-Speed 9610.86 samples/sec   Loss 5.8524   LearningRate 0.0251   Epoch: 9   Global Step: 166600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:28,825-Speed 9418.85 samples/sec   Loss 5.9040   LearningRate 0.0251   Epoch: 9   Global Step: 166610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:29,899-Speed 9544.52 samples/sec   Loss 5.8779   LearningRate 0.0251   Epoch: 9   Global Step: 166620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:30,995-Speed 9343.66 samples/sec   Loss 5.9213   LearningRate 0.0251   Epoch: 9   Global Step: 166630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:32,086-Speed 9388.94 samples/sec   Loss 5.9098   LearningRate 0.0251   Epoch: 9   Global Step: 166640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:33,217-Speed 9065.44 samples/sec   Loss 5.8763   LearningRate 0.0251   Epoch: 9   Global Step: 166650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:34,292-Speed 9533.51 samples/sec   Loss 5.7162   LearningRate 0.0251   Epoch: 9   Global Step: 166660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:35,396-Speed 9274.88 samples/sec   Loss 5.8495   LearningRate 0.0251   Epoch: 9   Global Step: 166670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:36,517-Speed 9145.24 samples/sec   Loss 5.8802   LearningRate 0.0251   Epoch: 9   Global Step: 166680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:37,616-Speed 9320.22 samples/sec   Loss 5.9357   LearningRate 0.0251   Epoch: 9   Global Step: 166690   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:15:38,725-Speed 9237.56 samples/sec   Loss 5.8838   LearningRate 0.0251   Epoch: 9   Global Step: 166700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:39,805-Speed 9488.82 samples/sec   Loss 5.7465   LearningRate 0.0251   Epoch: 9   Global Step: 166710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:40,926-Speed 9138.10 samples/sec   Loss 5.8707   LearningRate 0.0251   Epoch: 9   Global Step: 166720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:42,012-Speed 9433.76 samples/sec   Loss 5.9502   LearningRate 0.0251   Epoch: 9   Global Step: 166730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:43,121-Speed 9240.33 samples/sec   Loss 5.8728   LearningRate 0.0251   Epoch: 9   Global Step: 166740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:44,223-Speed 9299.59 samples/sec   Loss 5.8689   LearningRate 0.0250   Epoch: 9   Global Step: 166750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:45,337-Speed 9192.64 samples/sec   Loss 5.9342   LearningRate 0.0250   Epoch: 9   Global Step: 166760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:46,405-Speed 9607.08 samples/sec   Loss 5.9013   LearningRate 0.0250   Epoch: 9   Global Step: 166770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:47,505-Speed 9313.83 samples/sec   Loss 5.8277   LearningRate 0.0250   Epoch: 9   Global Step: 166780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:48,693-Speed 8627.09 samples/sec   Loss 5.8954   LearningRate 0.0250   Epoch: 9   Global Step: 166790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:49,764-Speed 9564.48 samples/sec   Loss 5.8911   LearningRate 0.0250   Epoch: 9   Global Step: 166800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:50,844-Speed 9484.46 samples/sec   Loss 5.8022   LearningRate 0.0250   Epoch: 9   Global Step: 166810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:51,908-Speed 9628.92 samples/sec   Loss 5.8033   LearningRate 0.0250   Epoch: 9   Global Step: 166820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:53,007-Speed 9328.20 samples/sec   Loss 5.7493   LearningRate 0.0250   Epoch: 9   Global Step: 166830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:54,143-Speed 9013.98 samples/sec   Loss 5.9195   LearningRate 0.0250   Epoch: 9   Global Step: 166840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:55,230-Speed 9426.91 samples/sec   Loss 5.8502   LearningRate 0.0250   Epoch: 9   Global Step: 166850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:56,305-Speed 9536.30 samples/sec   Loss 5.9175   LearningRate 0.0250   Epoch: 9   Global Step: 166860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:57,377-Speed 9556.24 samples/sec   Loss 5.8958   LearningRate 0.0250   Epoch: 9   Global Step: 166870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:58,495-Speed 9165.66 samples/sec   Loss 5.8182   LearningRate 0.0250   Epoch: 9   Global Step: 166880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:15:59,560-Speed 9617.27 samples/sec   Loss 5.7994   LearningRate 0.0250   Epoch: 9   Global Step: 166890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:01,138-Speed 6493.50 samples/sec   Loss 5.9078   LearningRate 0.0250   Epoch: 9   Global Step: 166900   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:16:02,148-Speed 10146.12 samples/sec   Loss 5.8624   LearningRate 0.0250   Epoch: 9   Global Step: 166910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:45,735-Speed 234.94 samples/sec   Loss 5.0620   LearningRate 0.0250   Epoch: 10   Global Step: 166920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:47,021-Speed 7974.88 samples/sec   Loss 4.9782   LearningRate 0.0250   Epoch: 10   Global Step: 166930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:48,267-Speed 8223.60 samples/sec   Loss 5.0605   LearningRate 0.0250   Epoch: 10   Global Step: 166940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:49,615-Speed 7599.32 samples/sec   Loss 5.1017   LearningRate 0.0250   Epoch: 10   Global Step: 166950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:50,705-Speed 9402.98 samples/sec   Loss 5.1200   LearningRate 0.0250   Epoch: 10   Global Step: 166960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:52,018-Speed 7802.50 samples/sec   Loss 5.0776   LearningRate 0.0250   Epoch: 10   Global Step: 166970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:53,376-Speed 7545.21 samples/sec   Loss 5.1151   LearningRate 0.0250   Epoch: 10   Global Step: 166980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:54,472-Speed 9349.27 samples/sec   Loss 5.1404   LearningRate 0.0250   Epoch: 10   Global Step: 166990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:55,837-Speed 7502.82 samples/sec   Loss 5.0729   LearningRate 0.0250   Epoch: 10   Global Step: 167000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:56,929-Speed 9385.10 samples/sec   Loss 5.0419   LearningRate 0.0250   Epoch: 10   Global Step: 167010   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:16:58,181-Speed 8186.63 samples/sec   Loss 5.1430   LearningRate 0.0250   Epoch: 10   Global Step: 167020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:16:59,286-Speed 9268.62 samples/sec   Loss 5.1004   LearningRate 0.0250   Epoch: 10   Global Step: 167030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:00,590-Speed 7859.99 samples/sec   Loss 5.0390   LearningRate 0.0250   Epoch: 10   Global Step: 167040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:01,737-Speed 8927.16 samples/sec   Loss 5.1679   LearningRate 0.0250   Epoch: 10   Global Step: 167050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:02,852-Speed 9191.86 samples/sec   Loss 5.0737   LearningRate 0.0250   Epoch: 10   Global Step: 167060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:03,974-Speed 9130.25 samples/sec   Loss 5.1314   LearningRate 0.0250   Epoch: 10   Global Step: 167070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:05,062-Speed 9426.30 samples/sec   Loss 5.0584   LearningRate 0.0249   Epoch: 10   Global Step: 167080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:06,139-Speed 9520.16 samples/sec   Loss 5.1344   LearningRate 0.0249   Epoch: 10   Global Step: 167090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:07,248-Speed 9239.22 samples/sec   Loss 5.2323   LearningRate 0.0249   Epoch: 10   Global Step: 167100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:08,371-Speed 9119.92 samples/sec   Loss 5.0975   LearningRate 0.0249   Epoch: 10   Global Step: 167110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:09,437-Speed 9608.18 samples/sec   Loss 5.2038   LearningRate 0.0249   Epoch: 10   Global Step: 167120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:10,538-Speed 9310.25 samples/sec   Loss 5.2337   LearningRate 0.0249   Epoch: 10   Global Step: 167130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:11,682-Speed 8951.93 samples/sec   Loss 5.0483   LearningRate 0.0249   Epoch: 10   Global Step: 167140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:12,770-Speed 9419.06 samples/sec   Loss 5.0497   LearningRate 0.0249   Epoch: 10   Global Step: 167150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:13,868-Speed 9331.20 samples/sec   Loss 5.1521   LearningRate 0.0249   Epoch: 10   Global Step: 167160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:14,916-Speed 9774.63 samples/sec   Loss 5.1220   LearningRate 0.0249   Epoch: 10   Global Step: 167170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:16,002-Speed 9433.57 samples/sec   Loss 5.1695   LearningRate 0.0249   Epoch: 10   Global Step: 167180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:17,091-Speed 9412.06 samples/sec   Loss 5.0713   LearningRate 0.0249   Epoch: 10   Global Step: 167190   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:18,214-Speed 9125.61 samples/sec   Loss 5.1136   LearningRate 0.0249   Epoch: 10   Global Step: 167200   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:19,256-Speed 9825.98 samples/sec   Loss 5.0867   LearningRate 0.0249   Epoch: 10   Global Step: 167210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:20,343-Speed 9431.87 samples/sec   Loss 5.0777   LearningRate 0.0249   Epoch: 10   Global Step: 167220   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:21,436-Speed 9377.68 samples/sec   Loss 5.1284   LearningRate 0.0249   Epoch: 10   Global Step: 167230   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:22,534-Speed 9332.25 samples/sec   Loss 5.0525   LearningRate 0.0249   Epoch: 10   Global Step: 167240   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:23,619-Speed 9435.25 samples/sec   Loss 5.2087   LearningRate 0.0249   Epoch: 10   Global Step: 167250   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:24,705-Speed 9435.84 samples/sec   Loss 5.1105   LearningRate 0.0249   Epoch: 10   Global Step: 167260   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:26,292-Speed 6457.04 samples/sec   Loss 5.2287   LearningRate 0.0249   Epoch: 10   Global Step: 167270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:27,426-Speed 9036.02 samples/sec   Loss 5.1159   LearningRate 0.0249   Epoch: 10   Global Step: 167280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:28,550-Speed 9129.53 samples/sec   Loss 5.2041   LearningRate 0.0249   Epoch: 10   Global Step: 167290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:29,664-Speed 9199.40 samples/sec   Loss 5.1309   LearningRate 0.0249   Epoch: 10   Global Step: 167300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:30,753-Speed 9404.14 samples/sec   Loss 5.1192   LearningRate 0.0249   Epoch: 10   Global Step: 167310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:31,820-Speed 9605.87 samples/sec   Loss 5.2199   LearningRate 0.0249   Epoch: 10   Global Step: 167320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:32,901-Speed 9478.13 samples/sec   Loss 5.1778   LearningRate 0.0249   Epoch: 10   Global Step: 167330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:33,955-Speed 9726.00 samples/sec   Loss 5.1627   LearningRate 0.0249   Epoch: 10   Global Step: 167340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:35,048-Speed 9370.12 samples/sec   Loss 5.1853   LearningRate 0.0249   Epoch: 10   Global Step: 167350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:36,170-Speed 9132.38 samples/sec   Loss 5.1712   LearningRate 0.0249   Epoch: 10   Global Step: 167360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:37,273-Speed 9289.39 samples/sec   Loss 5.1429   LearningRate 0.0249   Epoch: 10   Global Step: 167370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:38,383-Speed 9228.49 samples/sec   Loss 5.1911   LearningRate 0.0249   Epoch: 10   Global Step: 167380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:39,503-Speed 9153.81 samples/sec   Loss 5.2340   LearningRate 0.0249   Epoch: 10   Global Step: 167390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:40,565-Speed 9645.80 samples/sec   Loss 5.1838   LearningRate 0.0249   Epoch: 10   Global Step: 167400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:41,835-Speed 8068.68 samples/sec   Loss 5.1238   LearningRate 0.0249   Epoch: 10   Global Step: 167410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:42,918-Speed 9463.89 samples/sec   Loss 5.1397   LearningRate 0.0248   Epoch: 10   Global Step: 167420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:43,991-Speed 9549.35 samples/sec   Loss 5.2485   LearningRate 0.0248   Epoch: 10   Global Step: 167430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:17:45,055-Speed 9624.47 samples/sec   Loss 5.2123   LearningRate 0.0248   Epoch: 10   Global Step: 167440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:46,137-Speed 9473.46 samples/sec   Loss 5.2199   LearningRate 0.0248   Epoch: 10   Global Step: 167450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:47,245-Speed 9247.32 samples/sec   Loss 5.2207   LearningRate 0.0248   Epoch: 10   Global Step: 167460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:48,298-Speed 9730.19 samples/sec   Loss 5.2042   LearningRate 0.0248   Epoch: 10   Global Step: 167470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:49,372-Speed 9533.65 samples/sec   Loss 5.1202   LearningRate 0.0248   Epoch: 10   Global Step: 167480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:50,440-Speed 9598.15 samples/sec   Loss 5.1673   LearningRate 0.0248   Epoch: 10   Global Step: 167490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:51,531-Speed 9388.60 samples/sec   Loss 5.1878   LearningRate 0.0248   Epoch: 10   Global Step: 167500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:52,613-Speed 9474.95 samples/sec   Loss 5.2823   LearningRate 0.0248   Epoch: 10   Global Step: 167510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:53,674-Speed 9654.32 samples/sec   Loss 5.2317   LearningRate 0.0248   Epoch: 10   Global Step: 167520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:54,772-Speed 9326.64 samples/sec   Loss 5.2116   LearningRate 0.0248   Epoch: 10   Global Step: 167530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:55,857-Speed 9450.66 samples/sec   Loss 5.3045   LearningRate 0.0248   Epoch: 10   Global Step: 167540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:56,952-Speed 9352.90 samples/sec   Loss 5.1906   LearningRate 0.0248   Epoch: 10   Global Step: 167550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:58,003-Speed 9748.30 samples/sec   Loss 5.2753   LearningRate 0.0248   Epoch: 10   Global Step: 167560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:17:59,105-Speed 9299.88 samples/sec   Loss 5.2408   LearningRate 0.0248   Epoch: 10   Global Step: 167570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:00,159-Speed 9719.72 samples/sec   Loss 5.1254   LearningRate 0.0248   Epoch: 10   Global Step: 167580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:01,263-Speed 9282.83 samples/sec   Loss 5.2685   LearningRate 0.0248   Epoch: 10   Global Step: 167590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:02,328-Speed 9623.23 samples/sec   Loss 5.2200   LearningRate 0.0248   Epoch: 10   Global Step: 167600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:03,473-Speed 8949.78 samples/sec   Loss 5.1797   LearningRate 0.0248   Epoch: 10   Global Step: 167610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:04,534-Speed 9649.68 samples/sec   Loss 5.2819   LearningRate 0.0248   Epoch: 10   Global Step: 167620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:05,594-Speed 9666.89 samples/sec   Loss 5.1665   LearningRate 0.0248   Epoch: 10   Global Step: 167630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:06,638-Speed 9817.51 samples/sec   Loss 5.2388   LearningRate 0.0248   Epoch: 10   Global Step: 167640   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:18:07,718-Speed 9487.23 samples/sec   Loss 5.2124   LearningRate 0.0248   Epoch: 10   Global Step: 167650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:08,780-Speed 9648.18 samples/sec   Loss 5.2094   LearningRate 0.0248   Epoch: 10   Global Step: 167660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:09,852-Speed 9554.98 samples/sec   Loss 5.3279   LearningRate 0.0248   Epoch: 10   Global Step: 167670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:10,940-Speed 9422.41 samples/sec   Loss 5.2714   LearningRate 0.0248   Epoch: 10   Global Step: 167680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:12,087-Speed 8932.97 samples/sec   Loss 5.2366   LearningRate 0.0248   Epoch: 10   Global Step: 167690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:13,158-Speed 9564.47 samples/sec   Loss 5.3398   LearningRate 0.0248   Epoch: 10   Global Step: 167700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:14,272-Speed 9198.51 samples/sec   Loss 5.2836   LearningRate 0.0248   Epoch: 10   Global Step: 167710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:15,362-Speed 9400.02 samples/sec   Loss 5.2453   LearningRate 0.0248   Epoch: 10   Global Step: 167720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:16,454-Speed 9377.15 samples/sec   Loss 5.1898   LearningRate 0.0248   Epoch: 10   Global Step: 167730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:17,531-Speed 9514.85 samples/sec   Loss 5.0917   LearningRate 0.0248   Epoch: 10   Global Step: 167740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:18,601-Speed 9574.66 samples/sec   Loss 5.1835   LearningRate 0.0247   Epoch: 10   Global Step: 167750   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:18:19,667-Speed 9622.07 samples/sec   Loss 5.3050   LearningRate 0.0247   Epoch: 10   Global Step: 167760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:20,742-Speed 9532.86 samples/sec   Loss 5.3639   LearningRate 0.0247   Epoch: 10   Global Step: 167770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:21,788-Speed 9793.94 samples/sec   Loss 5.1965   LearningRate 0.0247   Epoch: 10   Global Step: 167780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:22,823-Speed 9896.17 samples/sec   Loss 5.2122   LearningRate 0.0247   Epoch: 10   Global Step: 167790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:23,907-Speed 9456.28 samples/sec   Loss 5.2064   LearningRate 0.0247   Epoch: 10   Global Step: 167800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:24,945-Speed 9865.02 samples/sec   Loss 5.1710   LearningRate 0.0247   Epoch: 10   Global Step: 167810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:26,015-Speed 9576.23 samples/sec   Loss 5.1772   LearningRate 0.0247   Epoch: 10   Global Step: 167820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:27,108-Speed 9380.79 samples/sec   Loss 5.3354   LearningRate 0.0247   Epoch: 10   Global Step: 167830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:28,207-Speed 9321.91 samples/sec   Loss 5.2450   LearningRate 0.0247   Epoch: 10   Global Step: 167840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:29,276-Speed 9585.84 samples/sec   Loss 5.3122   LearningRate 0.0247   Epoch: 10   Global Step: 167850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:30,341-Speed 9613.37 samples/sec   Loss 5.2733   LearningRate 0.0247   Epoch: 10   Global Step: 167860   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:18:31,447-Speed 9266.35 samples/sec   Loss 5.2587   LearningRate 0.0247   Epoch: 10   Global Step: 167870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:32,569-Speed 9130.25 samples/sec   Loss 5.2847   LearningRate 0.0247   Epoch: 10   Global Step: 167880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:33,668-Speed 9322.37 samples/sec   Loss 5.3011   LearningRate 0.0247   Epoch: 10   Global Step: 167890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:34,793-Speed 9106.22 samples/sec   Loss 5.2757   LearningRate 0.0247   Epoch: 10   Global Step: 167900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:35,879-Speed 9439.21 samples/sec   Loss 5.2776   LearningRate 0.0247   Epoch: 10   Global Step: 167910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:36,955-Speed 9525.76 samples/sec   Loss 5.3514   LearningRate 0.0247   Epoch: 10   Global Step: 167920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:38,025-Speed 9578.43 samples/sec   Loss 5.2502   LearningRate 0.0247   Epoch: 10   Global Step: 167930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:39,135-Speed 9231.43 samples/sec   Loss 5.2821   LearningRate 0.0247   Epoch: 10   Global Step: 167940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:40,230-Speed 9353.04 samples/sec   Loss 5.2944   LearningRate 0.0247   Epoch: 10   Global Step: 167950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:41,289-Speed 9677.61 samples/sec   Loss 5.1908   LearningRate 0.0247   Epoch: 10   Global Step: 167960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:18:42,366-Speed 9513.34 samples/sec   Loss 5.2223   LearningRate 0.0247   Epoch: 10   Global Step: 167970   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:18:43,453-Speed 9424.58 samples/sec   Loss 5.2910   LearningRate 0.0247   Epoch: 10   Global Step: 167980   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:18:44,543-Speed 9399.65 samples/sec   Loss 5.2275   LearningRate 0.0247   Epoch: 10   Global Step: 167990   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:18:45,630-Speed 9421.69 samples/sec   Loss 5.2949   LearningRate 0.0247   Epoch: 10   Global Step: 168000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:19:07,804-[lfw][168000]XNorm: 9.483930
Training: 2022-04-11 18:19:07,805-[lfw][168000]Accuracy-Flip: 0.99683+-0.00252
Training: 2022-04-11 18:19:07,805-[lfw][168000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:19:33,417-[cfp_fp][168000]XNorm: 8.157596
Training: 2022-04-11 18:19:33,418-[cfp_fp][168000]Accuracy-Flip: 0.96214+-0.00848
Training: 2022-04-11 18:19:33,418-[cfp_fp][168000]Accuracy-Highest: 0.96500
Training: 2022-04-11 18:19:55,590-[agedb_30][168000]XNorm: 9.209012
Training: 2022-04-11 18:19:55,591-[agedb_30][168000]Accuracy-Flip: 0.96817+-0.01020
Training: 2022-04-11 18:19:55,591-[agedb_30][168000]Accuracy-Highest: 0.96917
Training: 2022-04-11 18:19:56,665-Speed 144.16 samples/sec   Loss 5.2859   LearningRate 0.0247   Epoch: 10   Global Step: 168010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:19:57,780-Speed 9197.69 samples/sec   Loss 5.3488   LearningRate 0.0247   Epoch: 10   Global Step: 168020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:19:58,887-Speed 9254.63 samples/sec   Loss 5.1983   LearningRate 0.0247   Epoch: 10   Global Step: 168030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:19:59,991-Speed 9282.23 samples/sec   Loss 5.2899   LearningRate 0.0247   Epoch: 10   Global Step: 168040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:01,096-Speed 9265.39 samples/sec   Loss 5.2744   LearningRate 0.0247   Epoch: 10   Global Step: 168050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:02,166-Speed 9579.13 samples/sec   Loss 5.2534   LearningRate 0.0247   Epoch: 10   Global Step: 168060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:03,256-Speed 9398.66 samples/sec   Loss 5.2242   LearningRate 0.0247   Epoch: 10   Global Step: 168070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:04,371-Speed 9191.61 samples/sec   Loss 5.3337   LearningRate 0.0247   Epoch: 10   Global Step: 168080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:05,448-Speed 9510.99 samples/sec   Loss 5.2215   LearningRate 0.0246   Epoch: 10   Global Step: 168090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:06,516-Speed 9597.87 samples/sec   Loss 5.2530   LearningRate 0.0246   Epoch: 10   Global Step: 168100   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:20:07,621-Speed 9269.84 samples/sec   Loss 5.2041   LearningRate 0.0246   Epoch: 10   Global Step: 168110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:08,721-Speed 9316.22 samples/sec   Loss 5.2245   LearningRate 0.0246   Epoch: 10   Global Step: 168120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:09,787-Speed 9611.13 samples/sec   Loss 5.3240   LearningRate 0.0246   Epoch: 10   Global Step: 168130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:10,827-Speed 9849.29 samples/sec   Loss 5.2707   LearningRate 0.0246   Epoch: 10   Global Step: 168140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:11,883-Speed 9701.08 samples/sec   Loss 5.2481   LearningRate 0.0246   Epoch: 10   Global Step: 168150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:13,027-Speed 8959.82 samples/sec   Loss 5.2366   LearningRate 0.0246   Epoch: 10   Global Step: 168160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:14,112-Speed 9438.99 samples/sec   Loss 5.2910   LearningRate 0.0246   Epoch: 10   Global Step: 168170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:15,185-Speed 9552.99 samples/sec   Loss 5.3139   LearningRate 0.0246   Epoch: 10   Global Step: 168180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:16,273-Speed 9421.37 samples/sec   Loss 5.2650   LearningRate 0.0246   Epoch: 10   Global Step: 168190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:17,361-Speed 9411.38 samples/sec   Loss 5.2570   LearningRate 0.0246   Epoch: 10   Global Step: 168200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:18,410-Speed 9769.18 samples/sec   Loss 5.2845   LearningRate 0.0246   Epoch: 10   Global Step: 168210   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:20:19,495-Speed 9442.08 samples/sec   Loss 5.3163   LearningRate 0.0246   Epoch: 10   Global Step: 168220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:20,544-Speed 9767.43 samples/sec   Loss 5.3817   LearningRate 0.0246   Epoch: 10   Global Step: 168230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:21,609-Speed 9624.83 samples/sec   Loss 5.3540   LearningRate 0.0246   Epoch: 10   Global Step: 168240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:22,671-Speed 9649.62 samples/sec   Loss 5.3474   LearningRate 0.0246   Epoch: 10   Global Step: 168250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:23,769-Speed 9324.75 samples/sec   Loss 5.2647   LearningRate 0.0246   Epoch: 10   Global Step: 168260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:24,862-Speed 9377.01 samples/sec   Loss 5.2524   LearningRate 0.0246   Epoch: 10   Global Step: 168270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:25,963-Speed 9310.23 samples/sec   Loss 5.3232   LearningRate 0.0246   Epoch: 10   Global Step: 168280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:26,995-Speed 9936.16 samples/sec   Loss 5.2763   LearningRate 0.0246   Epoch: 10   Global Step: 168290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:28,079-Speed 9447.28 samples/sec   Loss 5.2829   LearningRate 0.0246   Epoch: 10   Global Step: 168300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:29,160-Speed 9479.50 samples/sec   Loss 5.2648   LearningRate 0.0246   Epoch: 10   Global Step: 168310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:30,201-Speed 9844.45 samples/sec   Loss 5.2528   LearningRate 0.0246   Epoch: 10   Global Step: 168320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:31,297-Speed 9346.65 samples/sec   Loss 5.3067   LearningRate 0.0246   Epoch: 10   Global Step: 168330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:32,386-Speed 9404.11 samples/sec   Loss 5.3759   LearningRate 0.0246   Epoch: 10   Global Step: 168340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:33,468-Speed 9473.06 samples/sec   Loss 5.3329   LearningRate 0.0246   Epoch: 10   Global Step: 168350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:34,593-Speed 9106.88 samples/sec   Loss 5.2947   LearningRate 0.0246   Epoch: 10   Global Step: 168360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:35,683-Speed 9393.12 samples/sec   Loss 5.4253   LearningRate 0.0246   Epoch: 10   Global Step: 168370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:36,776-Speed 9374.87 samples/sec   Loss 5.2042   LearningRate 0.0246   Epoch: 10   Global Step: 168380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:37,851-Speed 9533.02 samples/sec   Loss 5.4474   LearningRate 0.0246   Epoch: 10   Global Step: 168390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:38,925-Speed 9539.40 samples/sec   Loss 5.2012   LearningRate 0.0246   Epoch: 10   Global Step: 168400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:40,040-Speed 9189.27 samples/sec   Loss 5.2820   LearningRate 0.0246   Epoch: 10   Global Step: 168410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:41,170-Speed 9068.15 samples/sec   Loss 5.2867   LearningRate 0.0245   Epoch: 10   Global Step: 168420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:42,264-Speed 9371.79 samples/sec   Loss 5.4292   LearningRate 0.0245   Epoch: 10   Global Step: 168430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:43,330-Speed 9613.06 samples/sec   Loss 5.3466   LearningRate 0.0245   Epoch: 10   Global Step: 168440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:44,443-Speed 9205.72 samples/sec   Loss 5.3270   LearningRate 0.0245   Epoch: 10   Global Step: 168450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:45,488-Speed 9799.32 samples/sec   Loss 5.3528   LearningRate 0.0245   Epoch: 10   Global Step: 168460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:46,609-Speed 9147.02 samples/sec   Loss 5.3870   LearningRate 0.0245   Epoch: 10   Global Step: 168470   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:20:47,690-Speed 9474.32 samples/sec   Loss 5.4249   LearningRate 0.0245   Epoch: 10   Global Step: 168480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:48,757-Speed 9600.20 samples/sec   Loss 5.3230   LearningRate 0.0245   Epoch: 10   Global Step: 168490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:49,844-Speed 9423.35 samples/sec   Loss 5.4125   LearningRate 0.0245   Epoch: 10   Global Step: 168500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:50,950-Speed 9274.42 samples/sec   Loss 5.3698   LearningRate 0.0245   Epoch: 10   Global Step: 168510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:52,064-Speed 9194.11 samples/sec   Loss 5.4017   LearningRate 0.0245   Epoch: 10   Global Step: 168520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:53,181-Speed 9169.11 samples/sec   Loss 5.3738   LearningRate 0.0245   Epoch: 10   Global Step: 168530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:54,270-Speed 9416.25 samples/sec   Loss 5.3930   LearningRate 0.0245   Epoch: 10   Global Step: 168540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:20:55,342-Speed 9556.72 samples/sec   Loss 5.4226   LearningRate 0.0245   Epoch: 10   Global Step: 168550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:56,442-Speed 9308.43 samples/sec   Loss 5.3782   LearningRate 0.0245   Epoch: 10   Global Step: 168560   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:57,539-Speed 9345.95 samples/sec   Loss 5.3596   LearningRate 0.0245   Epoch: 10   Global Step: 168570   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:58,635-Speed 9353.50 samples/sec   Loss 5.2218   LearningRate 0.0245   Epoch: 10   Global Step: 168580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:20:59,732-Speed 9332.19 samples/sec   Loss 5.2361   LearningRate 0.0245   Epoch: 10   Global Step: 168590   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:00,873-Speed 8986.14 samples/sec   Loss 5.3454   LearningRate 0.0245   Epoch: 10   Global Step: 168600   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:01,938-Speed 9620.16 samples/sec   Loss 5.3233   LearningRate 0.0245   Epoch: 10   Global Step: 168610   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:03,060-Speed 9127.38 samples/sec   Loss 5.4084   LearningRate 0.0245   Epoch: 10   Global Step: 168620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:04,177-Speed 9175.41 samples/sec   Loss 5.3442   LearningRate 0.0245   Epoch: 10   Global Step: 168630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:05,235-Speed 9689.28 samples/sec   Loss 5.3875   LearningRate 0.0245   Epoch: 10   Global Step: 168640   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:06,306-Speed 9561.17 samples/sec   Loss 5.3704   LearningRate 0.0245   Epoch: 10   Global Step: 168650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:07,397-Speed 9393.90 samples/sec   Loss 5.3091   LearningRate 0.0245   Epoch: 10   Global Step: 168660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:08,481-Speed 9449.16 samples/sec   Loss 5.3237   LearningRate 0.0245   Epoch: 10   Global Step: 168670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:09,584-Speed 9292.60 samples/sec   Loss 5.3110   LearningRate 0.0245   Epoch: 10   Global Step: 168680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:10,664-Speed 9487.42 samples/sec   Loss 5.3930   LearningRate 0.0245   Epoch: 10   Global Step: 168690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:11,748-Speed 9451.79 samples/sec   Loss 5.3628   LearningRate 0.0245   Epoch: 10   Global Step: 168700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:12,869-Speed 9142.25 samples/sec   Loss 5.4420   LearningRate 0.0245   Epoch: 10   Global Step: 168710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:13,963-Speed 9360.20 samples/sec   Loss 5.3186   LearningRate 0.0245   Epoch: 10   Global Step: 168720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:15,046-Speed 9465.89 samples/sec   Loss 5.3080   LearningRate 0.0245   Epoch: 10   Global Step: 168730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:16,191-Speed 8942.85 samples/sec   Loss 5.3076   LearningRate 0.0245   Epoch: 10   Global Step: 168740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:17,249-Speed 9692.41 samples/sec   Loss 5.3993   LearningRate 0.0245   Epoch: 10   Global Step: 168750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:18,314-Speed 9614.24 samples/sec   Loss 5.4824   LearningRate 0.0244   Epoch: 10   Global Step: 168760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:19,399-Speed 9442.14 samples/sec   Loss 5.4362   LearningRate 0.0244   Epoch: 10   Global Step: 168770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:20,485-Speed 9440.83 samples/sec   Loss 5.3503   LearningRate 0.0244   Epoch: 10   Global Step: 168780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:21,546-Speed 9654.75 samples/sec   Loss 5.3975   LearningRate 0.0244   Epoch: 10   Global Step: 168790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:22,589-Speed 9827.48 samples/sec   Loss 5.4326   LearningRate 0.0244   Epoch: 10   Global Step: 168800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:23,662-Speed 9549.18 samples/sec   Loss 5.4167   LearningRate 0.0244   Epoch: 10   Global Step: 168810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:24,750-Speed 9419.01 samples/sec   Loss 5.3536   LearningRate 0.0244   Epoch: 10   Global Step: 168820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:25,816-Speed 9605.45 samples/sec   Loss 5.3717   LearningRate 0.0244   Epoch: 10   Global Step: 168830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:26,900-Speed 9458.98 samples/sec   Loss 5.4050   LearningRate 0.0244   Epoch: 10   Global Step: 168840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:27,973-Speed 9547.93 samples/sec   Loss 5.3482   LearningRate 0.0244   Epoch: 10   Global Step: 168850   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:21:29,035-Speed 9647.88 samples/sec   Loss 5.3023   LearningRate 0.0244   Epoch: 10   Global Step: 168860   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:21:30,125-Speed 9402.60 samples/sec   Loss 5.3436   LearningRate 0.0244   Epoch: 10   Global Step: 168870   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:21:31,186-Speed 9652.34 samples/sec   Loss 5.3733   LearningRate 0.0244   Epoch: 10   Global Step: 168880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:32,226-Speed 9854.32 samples/sec   Loss 5.4107   LearningRate 0.0244   Epoch: 10   Global Step: 168890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:33,320-Speed 9363.11 samples/sec   Loss 5.3256   LearningRate 0.0244   Epoch: 10   Global Step: 168900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:34,430-Speed 9234.53 samples/sec   Loss 5.3054   LearningRate 0.0244   Epoch: 10   Global Step: 168910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:35,509-Speed 9495.10 samples/sec   Loss 5.4668   LearningRate 0.0244   Epoch: 10   Global Step: 168920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:36,584-Speed 9531.03 samples/sec   Loss 5.4098   LearningRate 0.0244   Epoch: 10   Global Step: 168930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:37,641-Speed 9686.12 samples/sec   Loss 5.5060   LearningRate 0.0244   Epoch: 10   Global Step: 168940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:38,737-Speed 9357.04 samples/sec   Loss 5.3821   LearningRate 0.0244   Epoch: 10   Global Step: 168950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:39,843-Speed 9275.66 samples/sec   Loss 5.3673   LearningRate 0.0244   Epoch: 10   Global Step: 168960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:40,913-Speed 9568.51 samples/sec   Loss 5.2791   LearningRate 0.0244   Epoch: 10   Global Step: 168970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:41,997-Speed 9454.80 samples/sec   Loss 5.3772   LearningRate 0.0244   Epoch: 10   Global Step: 168980   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:21:43,137-Speed 8991.43 samples/sec   Loss 5.4713   LearningRate 0.0244   Epoch: 10   Global Step: 168990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:44,211-Speed 9540.46 samples/sec   Loss 5.3793   LearningRate 0.0244   Epoch: 10   Global Step: 169000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:45,286-Speed 9524.96 samples/sec   Loss 5.3593   LearningRate 0.0244   Epoch: 10   Global Step: 169010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:46,425-Speed 8999.33 samples/sec   Loss 5.3428   LearningRate 0.0244   Epoch: 10   Global Step: 169020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:47,540-Speed 9187.01 samples/sec   Loss 5.4039   LearningRate 0.0244   Epoch: 10   Global Step: 169030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:48,601-Speed 9658.48 samples/sec   Loss 5.2987   LearningRate 0.0244   Epoch: 10   Global Step: 169040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:49,679-Speed 9507.34 samples/sec   Loss 5.3563   LearningRate 0.0244   Epoch: 10   Global Step: 169050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:50,774-Speed 9358.86 samples/sec   Loss 5.4067   LearningRate 0.0244   Epoch: 10   Global Step: 169060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:51,842-Speed 9595.54 samples/sec   Loss 5.4830   LearningRate 0.0244   Epoch: 10   Global Step: 169070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:21:52,934-Speed 9383.05 samples/sec   Loss 5.3854   LearningRate 0.0244   Epoch: 10   Global Step: 169080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:53,982-Speed 9782.04 samples/sec   Loss 5.3903   LearningRate 0.0244   Epoch: 10   Global Step: 169090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:55,099-Speed 9175.27 samples/sec   Loss 5.4448   LearningRate 0.0243   Epoch: 10   Global Step: 169100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:56,175-Speed 9520.21 samples/sec   Loss 5.4781   LearningRate 0.0243   Epoch: 10   Global Step: 169110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:57,245-Speed 9584.02 samples/sec   Loss 5.4930   LearningRate 0.0243   Epoch: 10   Global Step: 169120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:58,333-Speed 9419.05 samples/sec   Loss 5.4283   LearningRate 0.0243   Epoch: 10   Global Step: 169130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:21:59,455-Speed 9130.85 samples/sec   Loss 5.4095   LearningRate 0.0243   Epoch: 10   Global Step: 169140   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:00,503-Speed 9779.88 samples/sec   Loss 5.4865   LearningRate 0.0243   Epoch: 10   Global Step: 169150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:01,572-Speed 9582.09 samples/sec   Loss 5.4153   LearningRate 0.0243   Epoch: 10   Global Step: 169160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:02,668-Speed 9354.71 samples/sec   Loss 5.3437   LearningRate 0.0243   Epoch: 10   Global Step: 169170   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:03,756-Speed 9409.82 samples/sec   Loss 5.4118   LearningRate 0.0243   Epoch: 10   Global Step: 169180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:04,821-Speed 9626.87 samples/sec   Loss 5.5152   LearningRate 0.0243   Epoch: 10   Global Step: 169190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:05,935-Speed 9193.62 samples/sec   Loss 5.4058   LearningRate 0.0243   Epoch: 10   Global Step: 169200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:07,051-Speed 9184.75 samples/sec   Loss 5.4124   LearningRate 0.0243   Epoch: 10   Global Step: 169210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:08,101-Speed 9756.13 samples/sec   Loss 5.4187   LearningRate 0.0243   Epoch: 10   Global Step: 169220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:09,182-Speed 9473.24 samples/sec   Loss 5.3629   LearningRate 0.0243   Epoch: 10   Global Step: 169230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:10,289-Speed 9264.69 samples/sec   Loss 5.5154   LearningRate 0.0243   Epoch: 10   Global Step: 169240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:11,356-Speed 9595.11 samples/sec   Loss 5.4875   LearningRate 0.0243   Epoch: 10   Global Step: 169250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:12,490-Speed 9037.56 samples/sec   Loss 5.3914   LearningRate 0.0243   Epoch: 10   Global Step: 169260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:13,604-Speed 9200.08 samples/sec   Loss 5.3951   LearningRate 0.0243   Epoch: 10   Global Step: 169270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:14,699-Speed 9350.27 samples/sec   Loss 5.4108   LearningRate 0.0243   Epoch: 10   Global Step: 169280   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:22:15,780-Speed 9477.11 samples/sec   Loss 5.4209   LearningRate 0.0243   Epoch: 10   Global Step: 169290   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:16,855-Speed 9536.35 samples/sec   Loss 5.3728   LearningRate 0.0243   Epoch: 10   Global Step: 169300   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:17,936-Speed 9475.30 samples/sec   Loss 5.3770   LearningRate 0.0243   Epoch: 10   Global Step: 169310   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:19,023-Speed 9433.69 samples/sec   Loss 5.3828   LearningRate 0.0243   Epoch: 10   Global Step: 169320   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:20,085-Speed 9640.64 samples/sec   Loss 5.4742   LearningRate 0.0243   Epoch: 10   Global Step: 169330   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:21,173-Speed 9426.29 samples/sec   Loss 5.4912   LearningRate 0.0243   Epoch: 10   Global Step: 169340   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:22,236-Speed 9635.06 samples/sec   Loss 5.5194   LearningRate 0.0243   Epoch: 10   Global Step: 169350   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:23,311-Speed 9533.17 samples/sec   Loss 5.4849   LearningRate 0.0243   Epoch: 10   Global Step: 169360   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:24,381-Speed 9577.17 samples/sec   Loss 5.4957   LearningRate 0.0243   Epoch: 10   Global Step: 169370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:25,456-Speed 9530.18 samples/sec   Loss 5.4946   LearningRate 0.0243   Epoch: 10   Global Step: 169380   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:22:26,560-Speed 9273.31 samples/sec   Loss 5.4763   LearningRate 0.0243   Epoch: 10   Global Step: 169390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:27,685-Speed 9112.99 samples/sec   Loss 5.4273   LearningRate 0.0243   Epoch: 10   Global Step: 169400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:28,794-Speed 9240.05 samples/sec   Loss 5.3479   LearningRate 0.0243   Epoch: 10   Global Step: 169410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:29,900-Speed 9259.03 samples/sec   Loss 5.4222   LearningRate 0.0243   Epoch: 10   Global Step: 169420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:30,996-Speed 9354.10 samples/sec   Loss 5.4416   LearningRate 0.0243   Epoch: 10   Global Step: 169430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:32,104-Speed 9240.67 samples/sec   Loss 5.4436   LearningRate 0.0242   Epoch: 10   Global Step: 169440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:33,251-Speed 8939.36 samples/sec   Loss 5.3906   LearningRate 0.0242   Epoch: 10   Global Step: 169450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:34,311-Speed 9659.30 samples/sec   Loss 5.4408   LearningRate 0.0242   Epoch: 10   Global Step: 169460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:35,424-Speed 9207.45 samples/sec   Loss 5.4166   LearningRate 0.0242   Epoch: 10   Global Step: 169470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:36,502-Speed 9507.99 samples/sec   Loss 5.4078   LearningRate 0.0242   Epoch: 10   Global Step: 169480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:37,607-Speed 9273.54 samples/sec   Loss 5.4280   LearningRate 0.0242   Epoch: 10   Global Step: 169490   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:22:38,664-Speed 9689.53 samples/sec   Loss 5.5519   LearningRate 0.0242   Epoch: 10   Global Step: 169500   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:22:39,739-Speed 9536.80 samples/sec   Loss 5.3574   LearningRate 0.0242   Epoch: 10   Global Step: 169510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:40,779-Speed 9851.94 samples/sec   Loss 5.4226   LearningRate 0.0242   Epoch: 10   Global Step: 169520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:41,922-Speed 8965.98 samples/sec   Loss 5.3986   LearningRate 0.0242   Epoch: 10   Global Step: 169530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:43,015-Speed 9372.80 samples/sec   Loss 5.4868   LearningRate 0.0242   Epoch: 10   Global Step: 169540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:44,138-Speed 9127.80 samples/sec   Loss 5.4303   LearningRate 0.0242   Epoch: 10   Global Step: 169550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:45,237-Speed 9319.00 samples/sec   Loss 5.3804   LearningRate 0.0242   Epoch: 10   Global Step: 169560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:46,304-Speed 9600.89 samples/sec   Loss 5.3266   LearningRate 0.0242   Epoch: 10   Global Step: 169570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:47,426-Speed 9138.62 samples/sec   Loss 5.4250   LearningRate 0.0242   Epoch: 10   Global Step: 169580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:48,498-Speed 9554.44 samples/sec   Loss 5.4349   LearningRate 0.0242   Epoch: 10   Global Step: 169590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:49,574-Speed 9519.68 samples/sec   Loss 5.4848   LearningRate 0.0242   Epoch: 10   Global Step: 169600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:50,681-Speed 9257.92 samples/sec   Loss 5.4949   LearningRate 0.0242   Epoch: 10   Global Step: 169610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:51,748-Speed 9606.89 samples/sec   Loss 5.5159   LearningRate 0.0242   Epoch: 10   Global Step: 169620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:52,838-Speed 9400.23 samples/sec   Loss 5.4174   LearningRate 0.0242   Epoch: 10   Global Step: 169630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:53,954-Speed 9176.03 samples/sec   Loss 5.3982   LearningRate 0.0242   Epoch: 10   Global Step: 169640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:55,045-Speed 9390.41 samples/sec   Loss 5.4251   LearningRate 0.0242   Epoch: 10   Global Step: 169650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:56,136-Speed 9393.20 samples/sec   Loss 5.4170   LearningRate 0.0242   Epoch: 10   Global Step: 169660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:57,211-Speed 9535.70 samples/sec   Loss 5.4258   LearningRate 0.0242   Epoch: 10   Global Step: 169670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:58,308-Speed 9343.07 samples/sec   Loss 5.4101   LearningRate 0.0242   Epoch: 10   Global Step: 169680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:22:59,390-Speed 9467.55 samples/sec   Loss 5.5128   LearningRate 0.0242   Epoch: 10   Global Step: 169690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:00,483-Speed 9372.99 samples/sec   Loss 5.4370   LearningRate 0.0242   Epoch: 10   Global Step: 169700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:01,590-Speed 9259.83 samples/sec   Loss 5.4809   LearningRate 0.0242   Epoch: 10   Global Step: 169710   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:23:02,660-Speed 9576.21 samples/sec   Loss 5.4753   LearningRate 0.0242   Epoch: 10   Global Step: 169720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:03,753-Speed 9374.17 samples/sec   Loss 5.4760   LearningRate 0.0242   Epoch: 10   Global Step: 169730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:04,811-Speed 9680.59 samples/sec   Loss 5.4206   LearningRate 0.0242   Epoch: 10   Global Step: 169740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:05,861-Speed 9755.93 samples/sec   Loss 5.4535   LearningRate 0.0242   Epoch: 10   Global Step: 169750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:06,935-Speed 9543.00 samples/sec   Loss 5.4249   LearningRate 0.0242   Epoch: 10   Global Step: 169760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:08,020-Speed 9437.07 samples/sec   Loss 5.3977   LearningRate 0.0242   Epoch: 10   Global Step: 169770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:09,151-Speed 9063.90 samples/sec   Loss 5.4157   LearningRate 0.0241   Epoch: 10   Global Step: 169780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:10,223-Speed 9561.48 samples/sec   Loss 5.4111   LearningRate 0.0241   Epoch: 10   Global Step: 169790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:11,264-Speed 9836.30 samples/sec   Loss 5.4512   LearningRate 0.0241   Epoch: 10   Global Step: 169800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:12,369-Speed 9275.13 samples/sec   Loss 5.5286   LearningRate 0.0241   Epoch: 10   Global Step: 169810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:13,529-Speed 8833.54 samples/sec   Loss 5.4829   LearningRate 0.0241   Epoch: 10   Global Step: 169820   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:23:14,635-Speed 9266.43 samples/sec   Loss 5.4663   LearningRate 0.0241   Epoch: 10   Global Step: 169830   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:23:15,754-Speed 9162.57 samples/sec   Loss 5.5038   LearningRate 0.0241   Epoch: 10   Global Step: 169840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:16,858-Speed 9279.39 samples/sec   Loss 5.3786   LearningRate 0.0241   Epoch: 10   Global Step: 169850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:18,011-Speed 8890.60 samples/sec   Loss 5.4285   LearningRate 0.0241   Epoch: 10   Global Step: 169860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:19,105-Speed 9365.71 samples/sec   Loss 5.5124   LearningRate 0.0241   Epoch: 10   Global Step: 169870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:23:20,211-Speed 9261.14 samples/sec   Loss 5.4090   LearningRate 0.0241   Epoch: 10   Global Step: 169880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:23:21,304-Speed 9375.45 samples/sec   Loss 5.5194   LearningRate 0.0241   Epoch: 10   Global Step: 169890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:23:22,367-Speed 9636.44 samples/sec   Loss 5.6298   LearningRate 0.0241   Epoch: 10   Global Step: 169900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:23:23,441-Speed 9541.20 samples/sec   Loss 5.5117   LearningRate 0.0241   Epoch: 10   Global Step: 169910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:23:24,607-Speed 8783.74 samples/sec   Loss 5.4981   LearningRate 0.0241   Epoch: 10   Global Step: 169920   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:23:25,675-Speed 9601.13 samples/sec   Loss 5.4676   LearningRate 0.0241   Epoch: 10   Global Step: 169930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:23:26,715-Speed 9855.05 samples/sec   Loss 5.5257   LearningRate 0.0241   Epoch: 10   Global Step: 169940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 18:23:27,789-Speed 9543.35 samples/sec   Loss 5.5225   LearningRate 0.0241   Epoch: 10   Global Step: 169950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 18:23:28,881-Speed 9379.32 samples/sec   Loss 5.4860   LearningRate 0.0241   Epoch: 10   Global Step: 169960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 18:23:29,945-Speed 9633.17 samples/sec   Loss 5.4784   LearningRate 0.0241   Epoch: 10   Global Step: 169970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 18:23:31,030-Speed 9441.28 samples/sec   Loss 5.4852   LearningRate 0.0241   Epoch: 10   Global Step: 169980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 18:23:32,118-Speed 9421.77 samples/sec   Loss 5.4222   LearningRate 0.0241   Epoch: 10   Global Step: 169990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 18:23:33,247-Speed 9069.67 samples/sec   Loss 5.4706   LearningRate 0.0241   Epoch: 10   Global Step: 170000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 18:23:55,361-[lfw][170000]XNorm: 9.365612
Training: 2022-04-11 18:23:55,362-[lfw][170000]Accuracy-Flip: 0.99683+-0.00241
Training: 2022-04-11 18:23:55,362-[lfw][170000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:24:20,886-[cfp_fp][170000]XNorm: 7.998890
Training: 2022-04-11 18:24:20,887-[cfp_fp][170000]Accuracy-Flip: 0.96343+-0.00876
Training: 2022-04-11 18:24:20,887-[cfp_fp][170000]Accuracy-Highest: 0.96500
Training: 2022-04-11 18:24:42,948-[agedb_30][170000]XNorm: 9.086145
Training: 2022-04-11 18:24:42,948-[agedb_30][170000]Accuracy-Flip: 0.96600+-0.01073
Training: 2022-04-11 18:24:42,949-[agedb_30][170000]Accuracy-Highest: 0.96917
Training: 2022-04-11 18:24:44,007-Speed 144.72 samples/sec   Loss 5.4968   LearningRate 0.0241   Epoch: 10   Global Step: 170010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 18:24:45,050-Speed 9822.35 samples/sec   Loss 5.4034   LearningRate 0.0241   Epoch: 10   Global Step: 170020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 18:24:46,159-Speed 9234.98 samples/sec   Loss 5.5051   LearningRate 0.0241   Epoch: 10   Global Step: 170030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-11 18:24:47,243-Speed 9448.28 samples/sec   Loss 5.5866   LearningRate 0.0241   Epoch: 10   Global Step: 170040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:24:48,328-Speed 9448.99 samples/sec   Loss 5.5471   LearningRate 0.0241   Epoch: 10   Global Step: 170050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:24:49,417-Speed 9402.68 samples/sec   Loss 5.5502   LearningRate 0.0241   Epoch: 10   Global Step: 170060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:24:50,497-Speed 9493.88 samples/sec   Loss 5.5402   LearningRate 0.0241   Epoch: 10   Global Step: 170070   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:24:51,601-Speed 9277.84 samples/sec   Loss 5.5368   LearningRate 0.0241   Epoch: 10   Global Step: 170080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:24:52,686-Speed 9443.47 samples/sec   Loss 5.5246   LearningRate 0.0241   Epoch: 10   Global Step: 170090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:24:53,849-Speed 8808.09 samples/sec   Loss 5.5093   LearningRate 0.0241   Epoch: 10   Global Step: 170100   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:24:54,951-Speed 9297.74 samples/sec   Loss 5.4802   LearningRate 0.0241   Epoch: 10   Global Step: 170110   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:24:56,020-Speed 9599.99 samples/sec   Loss 5.5595   LearningRate 0.0240   Epoch: 10   Global Step: 170120   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:24:57,115-Speed 9350.97 samples/sec   Loss 5.3908   LearningRate 0.0240   Epoch: 10   Global Step: 170130   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:24:58,253-Speed 9003.60 samples/sec   Loss 5.3814   LearningRate 0.0240   Epoch: 10   Global Step: 170140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:24:59,299-Speed 9802.33 samples/sec   Loss 5.5369   LearningRate 0.0240   Epoch: 10   Global Step: 170150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:00,378-Speed 9493.46 samples/sec   Loss 5.4974   LearningRate 0.0240   Epoch: 10   Global Step: 170160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:01,482-Speed 9279.41 samples/sec   Loss 5.4403   LearningRate 0.0240   Epoch: 10   Global Step: 170170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:02,598-Speed 9191.55 samples/sec   Loss 5.4680   LearningRate 0.0240   Epoch: 10   Global Step: 170180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:03,680-Speed 9467.40 samples/sec   Loss 5.5410   LearningRate 0.0240   Epoch: 10   Global Step: 170190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:04,763-Speed 9456.46 samples/sec   Loss 5.4962   LearningRate 0.0240   Epoch: 10   Global Step: 170200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:05,815-Speed 9744.23 samples/sec   Loss 5.4674   LearningRate 0.0240   Epoch: 10   Global Step: 170210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:06,917-Speed 9293.03 samples/sec   Loss 5.5122   LearningRate 0.0240   Epoch: 10   Global Step: 170220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:07,976-Speed 9680.01 samples/sec   Loss 5.4450   LearningRate 0.0240   Epoch: 10   Global Step: 170230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:09,088-Speed 9209.02 samples/sec   Loss 5.3993   LearningRate 0.0240   Epoch: 10   Global Step: 170240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:10,200-Speed 9214.01 samples/sec   Loss 5.4117   LearningRate 0.0240   Epoch: 10   Global Step: 170250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:11,288-Speed 9419.12 samples/sec   Loss 5.4890   LearningRate 0.0240   Epoch: 10   Global Step: 170260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:12,348-Speed 9666.97 samples/sec   Loss 5.3565   LearningRate 0.0240   Epoch: 10   Global Step: 170270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:13,490-Speed 8966.45 samples/sec   Loss 5.5264   LearningRate 0.0240   Epoch: 10   Global Step: 170280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:14,599-Speed 9241.83 samples/sec   Loss 5.4491   LearningRate 0.0240   Epoch: 10   Global Step: 170290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:15,699-Speed 9320.92 samples/sec   Loss 5.4747   LearningRate 0.0240   Epoch: 10   Global Step: 170300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:16,760-Speed 9653.37 samples/sec   Loss 5.5500   LearningRate 0.0240   Epoch: 10   Global Step: 170310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:17,851-Speed 9394.87 samples/sec   Loss 5.4447   LearningRate 0.0240   Epoch: 10   Global Step: 170320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:18,926-Speed 9532.12 samples/sec   Loss 5.5045   LearningRate 0.0240   Epoch: 10   Global Step: 170330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:19,993-Speed 9601.57 samples/sec   Loss 5.4673   LearningRate 0.0240   Epoch: 10   Global Step: 170340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:21,109-Speed 9175.54 samples/sec   Loss 5.5769   LearningRate 0.0240   Epoch: 10   Global Step: 170350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:22,197-Speed 9423.24 samples/sec   Loss 5.5412   LearningRate 0.0240   Epoch: 10   Global Step: 170360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:23,323-Speed 9098.94 samples/sec   Loss 5.5018   LearningRate 0.0240   Epoch: 10   Global Step: 170370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:24,412-Speed 9405.23 samples/sec   Loss 5.5100   LearningRate 0.0240   Epoch: 10   Global Step: 170380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:25,474-Speed 9647.10 samples/sec   Loss 5.5816   LearningRate 0.0240   Epoch: 10   Global Step: 170390   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:25:26,576-Speed 9298.08 samples/sec   Loss 5.5155   LearningRate 0.0240   Epoch: 10   Global Step: 170400   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:25:27,701-Speed 9107.87 samples/sec   Loss 5.4877   LearningRate 0.0240   Epoch: 10   Global Step: 170410   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:25:28,781-Speed 9487.47 samples/sec   Loss 5.5453   LearningRate 0.0240   Epoch: 10   Global Step: 170420   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:25:29,892-Speed 9224.38 samples/sec   Loss 5.4191   LearningRate 0.0240   Epoch: 10   Global Step: 170430   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:25:31,007-Speed 9184.33 samples/sec   Loss 5.5528   LearningRate 0.0240   Epoch: 10   Global Step: 170440   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:25:32,074-Speed 9601.12 samples/sec   Loss 5.4337   LearningRate 0.0240   Epoch: 10   Global Step: 170450   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:25:33,232-Speed 8857.88 samples/sec   Loss 5.4608   LearningRate 0.0239   Epoch: 10   Global Step: 170460   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:25:34,360-Speed 9079.40 samples/sec   Loss 5.3941   LearningRate 0.0239   Epoch: 10   Global Step: 170470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:25:35,418-Speed 9679.72 samples/sec   Loss 5.5694   LearningRate 0.0239   Epoch: 10   Global Step: 170480   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:25:36,533-Speed 9192.21 samples/sec   Loss 5.5346   LearningRate 0.0239   Epoch: 10   Global Step: 170490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:37,633-Speed 9319.21 samples/sec   Loss 5.5142   LearningRate 0.0239   Epoch: 10   Global Step: 170500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:38,757-Speed 9110.51 samples/sec   Loss 5.5642   LearningRate 0.0239   Epoch: 10   Global Step: 170510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:39,871-Speed 9202.34 samples/sec   Loss 5.5447   LearningRate 0.0239   Epoch: 10   Global Step: 170520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:40,955-Speed 9453.91 samples/sec   Loss 5.4551   LearningRate 0.0239   Epoch: 10   Global Step: 170530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:42,056-Speed 9304.82 samples/sec   Loss 5.5259   LearningRate 0.0239   Epoch: 10   Global Step: 170540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:43,137-Speed 9475.25 samples/sec   Loss 5.4545   LearningRate 0.0239   Epoch: 10   Global Step: 170550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:44,223-Speed 9438.36 samples/sec   Loss 5.5004   LearningRate 0.0239   Epoch: 10   Global Step: 170560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:45,319-Speed 9350.80 samples/sec   Loss 5.4674   LearningRate 0.0239   Epoch: 10   Global Step: 170570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:46,380-Speed 9652.45 samples/sec   Loss 5.4419   LearningRate 0.0239   Epoch: 10   Global Step: 170580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:47,456-Speed 9525.27 samples/sec   Loss 5.5694   LearningRate 0.0239   Epoch: 10   Global Step: 170590   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:25:48,559-Speed 9287.44 samples/sec   Loss 5.5242   LearningRate 0.0239   Epoch: 10   Global Step: 170600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:49,653-Speed 9362.16 samples/sec   Loss 5.5136   LearningRate 0.0239   Epoch: 10   Global Step: 170610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:50,737-Speed 9459.94 samples/sec   Loss 5.6318   LearningRate 0.0239   Epoch: 10   Global Step: 170620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:51,818-Speed 9473.32 samples/sec   Loss 5.5814   LearningRate 0.0239   Epoch: 10   Global Step: 170630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:52,946-Speed 9090.10 samples/sec   Loss 5.5015   LearningRate 0.0239   Epoch: 10   Global Step: 170640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:54,085-Speed 8993.78 samples/sec   Loss 5.6197   LearningRate 0.0239   Epoch: 10   Global Step: 170650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:55,190-Speed 9275.74 samples/sec   Loss 5.5353   LearningRate 0.0239   Epoch: 10   Global Step: 170660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:56,284-Speed 9364.07 samples/sec   Loss 5.6055   LearningRate 0.0239   Epoch: 10   Global Step: 170670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:57,386-Speed 9296.84 samples/sec   Loss 5.6092   LearningRate 0.0239   Epoch: 10   Global Step: 170680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:58,482-Speed 9351.29 samples/sec   Loss 5.6942   LearningRate 0.0239   Epoch: 10   Global Step: 170690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:25:59,601-Speed 9156.45 samples/sec   Loss 5.5025   LearningRate 0.0239   Epoch: 10   Global Step: 170700   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:26:00,681-Speed 9486.94 samples/sec   Loss 5.4927   LearningRate 0.0239   Epoch: 10   Global Step: 170710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:01,830-Speed 8926.96 samples/sec   Loss 5.5628   LearningRate 0.0239   Epoch: 10   Global Step: 170720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:02,921-Speed 9390.70 samples/sec   Loss 5.4806   LearningRate 0.0239   Epoch: 10   Global Step: 170730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:03,975-Speed 9720.08 samples/sec   Loss 5.5553   LearningRate 0.0239   Epoch: 10   Global Step: 170740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:05,139-Speed 8805.20 samples/sec   Loss 5.4719   LearningRate 0.0239   Epoch: 10   Global Step: 170750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:06,237-Speed 9327.01 samples/sec   Loss 5.4800   LearningRate 0.0239   Epoch: 10   Global Step: 170760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:07,348-Speed 9224.20 samples/sec   Loss 5.4833   LearningRate 0.0239   Epoch: 10   Global Step: 170770   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:08,460-Speed 9221.33 samples/sec   Loss 5.5335   LearningRate 0.0239   Epoch: 10   Global Step: 170780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:09,582-Speed 9128.30 samples/sec   Loss 5.5435   LearningRate 0.0239   Epoch: 10   Global Step: 170790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:10,683-Speed 9302.14 samples/sec   Loss 5.5220   LearningRate 0.0238   Epoch: 10   Global Step: 170800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:11,769-Speed 9438.87 samples/sec   Loss 5.4829   LearningRate 0.0238   Epoch: 10   Global Step: 170810   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:26:12,854-Speed 9443.10 samples/sec   Loss 5.5398   LearningRate 0.0238   Epoch: 10   Global Step: 170820   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:13,925-Speed 9564.81 samples/sec   Loss 5.4062   LearningRate 0.0238   Epoch: 10   Global Step: 170830   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:15,008-Speed 9457.89 samples/sec   Loss 5.4543   LearningRate 0.0238   Epoch: 10   Global Step: 170840   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:16,050-Speed 9833.47 samples/sec   Loss 5.4955   LearningRate 0.0238   Epoch: 10   Global Step: 170850   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:17,130-Speed 9489.90 samples/sec   Loss 5.4974   LearningRate 0.0238   Epoch: 10   Global Step: 170860   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:18,233-Speed 9287.61 samples/sec   Loss 5.4537   LearningRate 0.0238   Epoch: 10   Global Step: 170870   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:19,346-Speed 9203.60 samples/sec   Loss 5.4648   LearningRate 0.0238   Epoch: 10   Global Step: 170880   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:20,444-Speed 9338.48 samples/sec   Loss 5.4849   LearningRate 0.0238   Epoch: 10   Global Step: 170890   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:21,529-Speed 9438.49 samples/sec   Loss 5.5564   LearningRate 0.0238   Epoch: 10   Global Step: 170900   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:22,654-Speed 9112.66 samples/sec   Loss 5.5614   LearningRate 0.0238   Epoch: 10   Global Step: 170910   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:23,741-Speed 9425.43 samples/sec   Loss 5.5351   LearningRate 0.0238   Epoch: 10   Global Step: 170920   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:26:24,804-Speed 9643.99 samples/sec   Loss 5.5650   LearningRate 0.0238   Epoch: 10   Global Step: 170930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:26:25,860-Speed 9694.32 samples/sec   Loss 5.5257   LearningRate 0.0238   Epoch: 10   Global Step: 170940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:26:26,927-Speed 9602.57 samples/sec   Loss 5.5267   LearningRate 0.0238   Epoch: 10   Global Step: 170950   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:26:28,023-Speed 9351.25 samples/sec   Loss 5.4834   LearningRate 0.0238   Epoch: 10   Global Step: 170960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:26:29,106-Speed 9458.46 samples/sec   Loss 5.5675   LearningRate 0.0238   Epoch: 10   Global Step: 170970   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:26:30,186-Speed 9492.81 samples/sec   Loss 5.5020   LearningRate 0.0238   Epoch: 10   Global Step: 170980   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:26:31,256-Speed 9585.87 samples/sec   Loss 5.5465   LearningRate 0.0238   Epoch: 10   Global Step: 170990   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:26:32,396-Speed 8983.43 samples/sec   Loss 5.5343   LearningRate 0.0238   Epoch: 10   Global Step: 171000   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:26:33,487-Speed 9388.58 samples/sec   Loss 5.5768   LearningRate 0.0238   Epoch: 10   Global Step: 171010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:26:34,551-Speed 9629.77 samples/sec   Loss 5.5232   LearningRate 0.0238   Epoch: 10   Global Step: 171020   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:26:35,669-Speed 9168.18 samples/sec   Loss 5.5397   LearningRate 0.0238   Epoch: 10   Global Step: 171030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:36,771-Speed 9292.12 samples/sec   Loss 5.4636   LearningRate 0.0238   Epoch: 10   Global Step: 171040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:37,872-Speed 9311.95 samples/sec   Loss 5.5524   LearningRate 0.0238   Epoch: 10   Global Step: 171050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:38,937-Speed 9620.09 samples/sec   Loss 5.5558   LearningRate 0.0238   Epoch: 10   Global Step: 171060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:40,022-Speed 9440.18 samples/sec   Loss 5.6030   LearningRate 0.0238   Epoch: 10   Global Step: 171070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:41,150-Speed 9085.16 samples/sec   Loss 5.5794   LearningRate 0.0238   Epoch: 10   Global Step: 171080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:42,233-Speed 9466.86 samples/sec   Loss 5.4805   LearningRate 0.0238   Epoch: 10   Global Step: 171090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:43,313-Speed 9489.74 samples/sec   Loss 5.5381   LearningRate 0.0238   Epoch: 10   Global Step: 171100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:44,478-Speed 8790.06 samples/sec   Loss 5.5403   LearningRate 0.0238   Epoch: 10   Global Step: 171110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:45,568-Speed 9403.97 samples/sec   Loss 5.6051   LearningRate 0.0238   Epoch: 10   Global Step: 171120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:46,656-Speed 9409.92 samples/sec   Loss 5.5571   LearningRate 0.0238   Epoch: 10   Global Step: 171130   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:26:47,715-Speed 9682.35 samples/sec   Loss 5.4747   LearningRate 0.0237   Epoch: 10   Global Step: 171140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:48,829-Speed 9193.05 samples/sec   Loss 5.4703   LearningRate 0.0237   Epoch: 10   Global Step: 171150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:49,906-Speed 9521.05 samples/sec   Loss 5.4527   LearningRate 0.0237   Epoch: 10   Global Step: 171160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:51,011-Speed 9271.61 samples/sec   Loss 5.5124   LearningRate 0.0237   Epoch: 10   Global Step: 171170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:52,139-Speed 9080.13 samples/sec   Loss 5.6144   LearningRate 0.0237   Epoch: 10   Global Step: 171180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:53,221-Speed 9472.73 samples/sec   Loss 5.5612   LearningRate 0.0237   Epoch: 10   Global Step: 171190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:54,320-Speed 9320.00 samples/sec   Loss 5.4660   LearningRate 0.0237   Epoch: 10   Global Step: 171200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:55,379-Speed 9676.58 samples/sec   Loss 5.5171   LearningRate 0.0237   Epoch: 10   Global Step: 171210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:56,504-Speed 9106.97 samples/sec   Loss 5.6502   LearningRate 0.0237   Epoch: 10   Global Step: 171220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:57,589-Speed 9441.13 samples/sec   Loss 5.5783   LearningRate 0.0237   Epoch: 10   Global Step: 171230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:26:58,682-Speed 9373.42 samples/sec   Loss 5.6178   LearningRate 0.0237   Epoch: 10   Global Step: 171240   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:26:59,788-Speed 9262.33 samples/sec   Loss 5.4744   LearningRate 0.0237   Epoch: 10   Global Step: 171250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:00,895-Speed 9264.90 samples/sec   Loss 5.5239   LearningRate 0.0237   Epoch: 10   Global Step: 171260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:01,992-Speed 9341.27 samples/sec   Loss 5.5629   LearningRate 0.0237   Epoch: 10   Global Step: 171270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:03,127-Speed 9021.99 samples/sec   Loss 5.6025   LearningRate 0.0237   Epoch: 10   Global Step: 171280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:04,223-Speed 9346.69 samples/sec   Loss 5.6275   LearningRate 0.0237   Epoch: 10   Global Step: 171290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:05,346-Speed 9123.45 samples/sec   Loss 5.4352   LearningRate 0.0237   Epoch: 10   Global Step: 171300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:06,459-Speed 9203.02 samples/sec   Loss 5.4872   LearningRate 0.0237   Epoch: 10   Global Step: 171310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:07,585-Speed 9105.88 samples/sec   Loss 5.5624   LearningRate 0.0237   Epoch: 10   Global Step: 171320   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:08,672-Speed 9424.25 samples/sec   Loss 5.5347   LearningRate 0.0237   Epoch: 10   Global Step: 171330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:09,739-Speed 9598.88 samples/sec   Loss 5.6009   LearningRate 0.0237   Epoch: 10   Global Step: 171340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:10,799-Speed 9666.17 samples/sec   Loss 5.4960   LearningRate 0.0237   Epoch: 10   Global Step: 171350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:11,897-Speed 9336.86 samples/sec   Loss 5.5358   LearningRate 0.0237   Epoch: 10   Global Step: 171360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:12,968-Speed 9564.48 samples/sec   Loss 5.4872   LearningRate 0.0237   Epoch: 10   Global Step: 171370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:14,039-Speed 9562.88 samples/sec   Loss 5.5519   LearningRate 0.0237   Epoch: 10   Global Step: 171380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:15,145-Speed 9265.43 samples/sec   Loss 5.4932   LearningRate 0.0237   Epoch: 10   Global Step: 171390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:16,204-Speed 9672.82 samples/sec   Loss 5.4654   LearningRate 0.0237   Epoch: 10   Global Step: 171400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:17,344-Speed 8989.02 samples/sec   Loss 5.6305   LearningRate 0.0237   Epoch: 10   Global Step: 171410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:18,426-Speed 9468.63 samples/sec   Loss 5.5840   LearningRate 0.0237   Epoch: 10   Global Step: 171420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:19,515-Speed 9419.67 samples/sec   Loss 5.5152   LearningRate 0.0237   Epoch: 10   Global Step: 171430   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:20,651-Speed 9019.27 samples/sec   Loss 5.5365   LearningRate 0.0237   Epoch: 10   Global Step: 171440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:21,773-Speed 9134.72 samples/sec   Loss 5.5302   LearningRate 0.0237   Epoch: 10   Global Step: 171450   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:27:22,858-Speed 9443.72 samples/sec   Loss 5.6631   LearningRate 0.0237   Epoch: 10   Global Step: 171460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:23,924-Speed 9614.88 samples/sec   Loss 5.5453   LearningRate 0.0237   Epoch: 10   Global Step: 171470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:24,998-Speed 9540.76 samples/sec   Loss 5.5755   LearningRate 0.0236   Epoch: 10   Global Step: 171480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:26,104-Speed 9257.47 samples/sec   Loss 5.5644   LearningRate 0.0236   Epoch: 10   Global Step: 171490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:27,210-Speed 9270.96 samples/sec   Loss 5.5750   LearningRate 0.0236   Epoch: 10   Global Step: 171500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:28,275-Speed 9612.29 samples/sec   Loss 5.5567   LearningRate 0.0236   Epoch: 10   Global Step: 171510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:29,404-Speed 9081.26 samples/sec   Loss 5.5473   LearningRate 0.0236   Epoch: 10   Global Step: 171520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:30,471-Speed 9600.71 samples/sec   Loss 5.4623   LearningRate 0.0236   Epoch: 10   Global Step: 171530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:31,590-Speed 9154.59 samples/sec   Loss 5.4775   LearningRate 0.0236   Epoch: 10   Global Step: 171540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:32,712-Speed 9132.77 samples/sec   Loss 5.5077   LearningRate 0.0236   Epoch: 10   Global Step: 171550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:33,818-Speed 9263.26 samples/sec   Loss 5.5119   LearningRate 0.0236   Epoch: 10   Global Step: 171560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:34,891-Speed 9546.19 samples/sec   Loss 5.4713   LearningRate 0.0236   Epoch: 10   Global Step: 171570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:35,954-Speed 9644.69 samples/sec   Loss 5.4834   LearningRate 0.0236   Epoch: 10   Global Step: 171580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:37,073-Speed 9150.80 samples/sec   Loss 5.5795   LearningRate 0.0236   Epoch: 10   Global Step: 171590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:38,245-Speed 8747.79 samples/sec   Loss 5.5465   LearningRate 0.0236   Epoch: 10   Global Step: 171600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:39,350-Speed 9269.70 samples/sec   Loss 5.5165   LearningRate 0.0236   Epoch: 10   Global Step: 171610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:40,428-Speed 9508.84 samples/sec   Loss 5.4851   LearningRate 0.0236   Epoch: 10   Global Step: 171620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:41,528-Speed 9319.69 samples/sec   Loss 5.4647   LearningRate 0.0236   Epoch: 10   Global Step: 171630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:42,623-Speed 9356.10 samples/sec   Loss 5.5020   LearningRate 0.0236   Epoch: 10   Global Step: 171640   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:43,721-Speed 9328.15 samples/sec   Loss 5.5178   LearningRate 0.0236   Epoch: 10   Global Step: 171650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:44,843-Speed 9129.96 samples/sec   Loss 5.4866   LearningRate 0.0236   Epoch: 10   Global Step: 171660   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:27:45,909-Speed 9611.13 samples/sec   Loss 5.4920   LearningRate 0.0236   Epoch: 10   Global Step: 171670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:47,007-Speed 9334.11 samples/sec   Loss 5.6501   LearningRate 0.0236   Epoch: 10   Global Step: 171680   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:48,100-Speed 9372.69 samples/sec   Loss 5.6029   LearningRate 0.0236   Epoch: 10   Global Step: 171690   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:49,203-Speed 9291.18 samples/sec   Loss 5.5947   LearningRate 0.0236   Epoch: 10   Global Step: 171700   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:50,317-Speed 9202.53 samples/sec   Loss 5.4646   LearningRate 0.0236   Epoch: 10   Global Step: 171710   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:51,405-Speed 9412.84 samples/sec   Loss 5.5771   LearningRate 0.0236   Epoch: 10   Global Step: 171720   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:52,504-Speed 9320.97 samples/sec   Loss 5.4662   LearningRate 0.0236   Epoch: 10   Global Step: 171730   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:53,601-Speed 9345.85 samples/sec   Loss 5.5848   LearningRate 0.0236   Epoch: 10   Global Step: 171740   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:54,655-Speed 9718.85 samples/sec   Loss 5.5786   LearningRate 0.0236   Epoch: 10   Global Step: 171750   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:55,736-Speed 9479.51 samples/sec   Loss 5.5367   LearningRate 0.0236   Epoch: 10   Global Step: 171760   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:56,816-Speed 9487.14 samples/sec   Loss 5.5408   LearningRate 0.0236   Epoch: 10   Global Step: 171770   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:27:57,927-Speed 9225.62 samples/sec   Loss 5.5885   LearningRate 0.0236   Epoch: 10   Global Step: 171780   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:27:58,999-Speed 9553.59 samples/sec   Loss 5.5421   LearningRate 0.0236   Epoch: 10   Global Step: 171790   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:00,052-Speed 9727.00 samples/sec   Loss 5.5887   LearningRate 0.0236   Epoch: 10   Global Step: 171800   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:01,145-Speed 9385.14 samples/sec   Loss 5.5864   LearningRate 0.0236   Epoch: 10   Global Step: 171810   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:02,190-Speed 9806.39 samples/sec   Loss 5.5235   LearningRate 0.0236   Epoch: 10   Global Step: 171820   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:28:03,316-Speed 9095.31 samples/sec   Loss 5.5889   LearningRate 0.0235   Epoch: 10   Global Step: 171830   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:28:04,411-Speed 9355.09 samples/sec   Loss 5.5935   LearningRate 0.0235   Epoch: 10   Global Step: 171840   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:28:05,535-Speed 9114.28 samples/sec   Loss 5.6051   LearningRate 0.0235   Epoch: 10   Global Step: 171850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:28:06,628-Speed 9376.34 samples/sec   Loss 5.4631   LearningRate 0.0235   Epoch: 10   Global Step: 171860   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:28:07,693-Speed 9622.81 samples/sec   Loss 5.5725   LearningRate 0.0235   Epoch: 10   Global Step: 171870   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:28:08,763-Speed 9581.33 samples/sec   Loss 5.6211   LearningRate 0.0235   Epoch: 10   Global Step: 171880   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:28:09,848-Speed 9442.56 samples/sec   Loss 5.6272   LearningRate 0.0235   Epoch: 10   Global Step: 171890   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:28:10,921-Speed 9545.96 samples/sec   Loss 5.6065   LearningRate 0.0235   Epoch: 10   Global Step: 171900   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:28:11,991-Speed 9583.77 samples/sec   Loss 5.4647   LearningRate 0.0235   Epoch: 10   Global Step: 171910   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-11 18:28:13,159-Speed 8770.48 samples/sec   Loss 5.6828   LearningRate 0.0235   Epoch: 10   Global Step: 171920   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:14,211-Speed 9734.65 samples/sec   Loss 5.6659   LearningRate 0.0235   Epoch: 10   Global Step: 171930   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:15,240-Speed 9963.98 samples/sec   Loss 5.6464   LearningRate 0.0235   Epoch: 10   Global Step: 171940   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:16,284-Speed 9813.64 samples/sec   Loss 5.5492   LearningRate 0.0235   Epoch: 10   Global Step: 171950   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:17,331-Speed 9786.93 samples/sec   Loss 5.4665   LearningRate 0.0235   Epoch: 10   Global Step: 171960   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:18,396-Speed 9620.33 samples/sec   Loss 5.6234   LearningRate 0.0235   Epoch: 10   Global Step: 171970   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:19,455-Speed 9670.38 samples/sec   Loss 5.5691   LearningRate 0.0235   Epoch: 10   Global Step: 171980   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:20,535-Speed 9498.46 samples/sec   Loss 5.5662   LearningRate 0.0235   Epoch: 10   Global Step: 171990   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:21,631-Speed 9347.35 samples/sec   Loss 5.7322   LearningRate 0.0235   Epoch: 10   Global Step: 172000   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:28:43,747-[lfw][172000]XNorm: 9.457694
Training: 2022-04-11 18:28:43,748-[lfw][172000]Accuracy-Flip: 0.99683+-0.00252
Training: 2022-04-11 18:28:43,748-[lfw][172000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:29:09,279-[cfp_fp][172000]XNorm: 8.043288
Training: 2022-04-11 18:29:09,280-[cfp_fp][172000]Accuracy-Flip: 0.96257+-0.00983
Training: 2022-04-11 18:29:09,280-[cfp_fp][172000]Accuracy-Highest: 0.96500
Training: 2022-04-11 18:29:31,196-[agedb_30][172000]XNorm: 9.145506
Training: 2022-04-11 18:29:31,196-[agedb_30][172000]Accuracy-Flip: 0.96583+-0.00857
Training: 2022-04-11 18:29:31,197-[agedb_30][172000]Accuracy-Highest: 0.96917
Training: 2022-04-11 18:29:32,274-Speed 144.95 samples/sec   Loss 5.5827   LearningRate 0.0235   Epoch: 10   Global Step: 172010   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:33,332-Speed 9686.39 samples/sec   Loss 5.6046   LearningRate 0.0235   Epoch: 10   Global Step: 172020   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:34,468-Speed 9015.90 samples/sec   Loss 5.6371   LearningRate 0.0235   Epoch: 10   Global Step: 172030   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:35,549-Speed 9476.75 samples/sec   Loss 5.6314   LearningRate 0.0235   Epoch: 10   Global Step: 172040   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:36,618-Speed 9588.36 samples/sec   Loss 5.6463   LearningRate 0.0235   Epoch: 10   Global Step: 172050   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:37,723-Speed 9274.64 samples/sec   Loss 5.5457   LearningRate 0.0235   Epoch: 10   Global Step: 172060   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:38,848-Speed 9107.41 samples/sec   Loss 5.5555   LearningRate 0.0235   Epoch: 10   Global Step: 172070   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:39,925-Speed 9516.42 samples/sec   Loss 5.6453   LearningRate 0.0235   Epoch: 10   Global Step: 172080   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:40,997-Speed 9554.79 samples/sec   Loss 5.5321   LearningRate 0.0235   Epoch: 10   Global Step: 172090   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:42,050-Speed 9734.93 samples/sec   Loss 5.5404   LearningRate 0.0235   Epoch: 10   Global Step: 172100   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:43,122-Speed 9560.50 samples/sec   Loss 5.5767   LearningRate 0.0235   Epoch: 10   Global Step: 172110   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:44,191-Speed 9587.19 samples/sec   Loss 5.5760   LearningRate 0.0235   Epoch: 10   Global Step: 172120   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:45,261-Speed 9574.35 samples/sec   Loss 5.5661   LearningRate 0.0235   Epoch: 10   Global Step: 172130   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:46,336-Speed 9526.42 samples/sec   Loss 5.6389   LearningRate 0.0235   Epoch: 10   Global Step: 172140   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:47,430-Speed 9370.37 samples/sec   Loss 5.6453   LearningRate 0.0235   Epoch: 10   Global Step: 172150   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:48,500-Speed 9574.25 samples/sec   Loss 5.4544   LearningRate 0.0235   Epoch: 10   Global Step: 172160   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:49,588-Speed 9412.57 samples/sec   Loss 5.4998   LearningRate 0.0234   Epoch: 10   Global Step: 172170   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:50,653-Speed 9624.72 samples/sec   Loss 5.6535   LearningRate 0.0234   Epoch: 10   Global Step: 172180   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:51,727-Speed 9547.13 samples/sec   Loss 5.5529   LearningRate 0.0234   Epoch: 10   Global Step: 172190   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:52,845-Speed 9158.74 samples/sec   Loss 5.5317   LearningRate 0.0234   Epoch: 10   Global Step: 172200   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:53,941-Speed 9351.80 samples/sec   Loss 5.5066   LearningRate 0.0234   Epoch: 10   Global Step: 172210   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:55,008-Speed 9595.67 samples/sec   Loss 5.6454   LearningRate 0.0234   Epoch: 10   Global Step: 172220   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:56,064-Speed 9703.56 samples/sec   Loss 5.6063   LearningRate 0.0234   Epoch: 10   Global Step: 172230   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:57,130-Speed 9610.63 samples/sec   Loss 5.5804   LearningRate 0.0234   Epoch: 10   Global Step: 172240   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:58,215-Speed 9448.26 samples/sec   Loss 5.5378   LearningRate 0.0234   Epoch: 10   Global Step: 172250   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:29:59,334-Speed 9156.06 samples/sec   Loss 5.5594   LearningRate 0.0234   Epoch: 10   Global Step: 172260   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:00,438-Speed 9280.07 samples/sec   Loss 5.5884   LearningRate 0.0234   Epoch: 10   Global Step: 172270   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:01,528-Speed 9400.71 samples/sec   Loss 5.4831   LearningRate 0.0234   Epoch: 10   Global Step: 172280   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:02,636-Speed 9252.25 samples/sec   Loss 5.6298   LearningRate 0.0234   Epoch: 10   Global Step: 172290   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:03,736-Speed 9308.93 samples/sec   Loss 5.5594   LearningRate 0.0234   Epoch: 10   Global Step: 172300   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:04,800-Speed 9633.95 samples/sec   Loss 5.5671   LearningRate 0.0234   Epoch: 10   Global Step: 172310   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:05,888-Speed 9414.05 samples/sec   Loss 5.5861   LearningRate 0.0234   Epoch: 10   Global Step: 172320   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:30:06,954-Speed 9614.80 samples/sec   Loss 5.5587   LearningRate 0.0234   Epoch: 10   Global Step: 172330   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:08,043-Speed 9413.80 samples/sec   Loss 5.4507   LearningRate 0.0234   Epoch: 10   Global Step: 172340   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:09,113-Speed 9573.21 samples/sec   Loss 5.5378   LearningRate 0.0234   Epoch: 10   Global Step: 172350   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:10,193-Speed 9487.64 samples/sec   Loss 5.5647   LearningRate 0.0234   Epoch: 10   Global Step: 172360   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:11,271-Speed 9507.76 samples/sec   Loss 5.5757   LearningRate 0.0234   Epoch: 10   Global Step: 172370   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:12,382-Speed 9216.37 samples/sec   Loss 5.5760   LearningRate 0.0234   Epoch: 10   Global Step: 172380   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:13,535-Speed 8885.81 samples/sec   Loss 5.5871   LearningRate 0.0234   Epoch: 10   Global Step: 172390   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:14,639-Speed 9284.89 samples/sec   Loss 5.5465   LearningRate 0.0234   Epoch: 10   Global Step: 172400   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:15,741-Speed 9298.57 samples/sec   Loss 5.5674   LearningRate 0.0234   Epoch: 10   Global Step: 172410   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:16,826-Speed 9443.17 samples/sec   Loss 5.5120   LearningRate 0.0234   Epoch: 10   Global Step: 172420   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:17,906-Speed 9486.90 samples/sec   Loss 5.6068   LearningRate 0.0234   Epoch: 10   Global Step: 172430   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:30:18,995-Speed 9404.19 samples/sec   Loss 5.4996   LearningRate 0.0234   Epoch: 10   Global Step: 172440   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:20,066-Speed 9570.80 samples/sec   Loss 5.5614   LearningRate 0.0234   Epoch: 10   Global Step: 172450   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:21,162-Speed 9347.74 samples/sec   Loss 5.5789   LearningRate 0.0234   Epoch: 10   Global Step: 172460   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:22,271-Speed 9241.49 samples/sec   Loss 5.5680   LearningRate 0.0234   Epoch: 10   Global Step: 172470   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:23,397-Speed 9096.51 samples/sec   Loss 5.5016   LearningRate 0.0234   Epoch: 10   Global Step: 172480   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:24,456-Speed 9676.49 samples/sec   Loss 5.5561   LearningRate 0.0234   Epoch: 10   Global Step: 172490   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:25,549-Speed 9376.34 samples/sec   Loss 5.5426   LearningRate 0.0234   Epoch: 10   Global Step: 172500   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:26,614-Speed 9617.28 samples/sec   Loss 5.5731   LearningRate 0.0234   Epoch: 10   Global Step: 172510   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:27,709-Speed 9358.28 samples/sec   Loss 5.5701   LearningRate 0.0233   Epoch: 10   Global Step: 172520   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:28,802-Speed 9377.93 samples/sec   Loss 5.5519   LearningRate 0.0233   Epoch: 10   Global Step: 172530   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:29,871-Speed 9583.01 samples/sec   Loss 5.6179   LearningRate 0.0233   Epoch: 10   Global Step: 172540   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:30,930-Speed 9672.70 samples/sec   Loss 5.5403   LearningRate 0.0233   Epoch: 10   Global Step: 172550   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:32,020-Speed 9404.85 samples/sec   Loss 5.5642   LearningRate 0.0233   Epoch: 10   Global Step: 172560   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:33,080-Speed 9664.84 samples/sec   Loss 5.4467   LearningRate 0.0233   Epoch: 10   Global Step: 172570   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:34,149-Speed 9583.50 samples/sec   Loss 5.5802   LearningRate 0.0233   Epoch: 10   Global Step: 172580   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:35,260-Speed 9222.42 samples/sec   Loss 5.5901   LearningRate 0.0233   Epoch: 10   Global Step: 172590   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:36,351-Speed 9388.64 samples/sec   Loss 5.5232   LearningRate 0.0233   Epoch: 10   Global Step: 172600   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:37,430-Speed 9499.58 samples/sec   Loss 5.5933   LearningRate 0.0233   Epoch: 10   Global Step: 172610   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:38,531-Speed 9306.44 samples/sec   Loss 5.5855   LearningRate 0.0233   Epoch: 10   Global Step: 172620   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:39,630-Speed 9321.33 samples/sec   Loss 5.6434   LearningRate 0.0233   Epoch: 10   Global Step: 172630   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:40,716-Speed 9437.41 samples/sec   Loss 5.5760   LearningRate 0.0233   Epoch: 10   Global Step: 172640   Fp16 Grad Scale: 262144   Required: 7 hours
Training: 2022-04-11 18:30:41,777-Speed 9654.62 samples/sec   Loss 5.5378   LearningRate 0.0233   Epoch: 10   Global Step: 172650   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:42,857-Speed 9489.05 samples/sec   Loss 5.5836   LearningRate 0.0233   Epoch: 10   Global Step: 172660   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:43,922-Speed 9620.02 samples/sec   Loss 5.6738   LearningRate 0.0233   Epoch: 10   Global Step: 172670   Fp16 Grad Scale: 131072   Required: 7 hours
Training: 2022-04-11 18:30:45,000-Speed 9504.05 samples/sec   Loss 5.5561   LearningRate 0.0233   Epoch: 10   Global Step: 172680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:46,114-Speed 9197.71 samples/sec   Loss 5.5414   LearningRate 0.0233   Epoch: 10   Global Step: 172690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:47,195-Speed 9478.23 samples/sec   Loss 5.5893   LearningRate 0.0233   Epoch: 10   Global Step: 172700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:48,279-Speed 9451.60 samples/sec   Loss 5.6265   LearningRate 0.0233   Epoch: 10   Global Step: 172710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:49,363-Speed 9456.04 samples/sec   Loss 5.6916   LearningRate 0.0233   Epoch: 10   Global Step: 172720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:50,453-Speed 9392.68 samples/sec   Loss 5.5850   LearningRate 0.0233   Epoch: 10   Global Step: 172730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:51,542-Speed 9413.23 samples/sec   Loss 5.6390   LearningRate 0.0233   Epoch: 10   Global Step: 172740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:52,609-Speed 9603.45 samples/sec   Loss 5.5921   LearningRate 0.0233   Epoch: 10   Global Step: 172750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:53,693-Speed 9459.77 samples/sec   Loss 5.5584   LearningRate 0.0233   Epoch: 10   Global Step: 172760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:54,759-Speed 9608.95 samples/sec   Loss 5.5700   LearningRate 0.0233   Epoch: 10   Global Step: 172770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:55,838-Speed 9494.82 samples/sec   Loss 5.4935   LearningRate 0.0233   Epoch: 10   Global Step: 172780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:56,913-Speed 9525.34 samples/sec   Loss 5.6042   LearningRate 0.0233   Epoch: 10   Global Step: 172790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:58,009-Speed 9356.49 samples/sec   Loss 5.6547   LearningRate 0.0233   Epoch: 10   Global Step: 172800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:30:59,106-Speed 9334.31 samples/sec   Loss 5.6087   LearningRate 0.0233   Epoch: 10   Global Step: 172810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:00,209-Speed 9286.19 samples/sec   Loss 5.5477   LearningRate 0.0233   Epoch: 10   Global Step: 172820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:01,282-Speed 9560.91 samples/sec   Loss 5.6586   LearningRate 0.0233   Epoch: 10   Global Step: 172830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:02,387-Speed 9272.03 samples/sec   Loss 5.6409   LearningRate 0.0233   Epoch: 10   Global Step: 172840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:03,469-Speed 9471.08 samples/sec   Loss 5.5002   LearningRate 0.0233   Epoch: 10   Global Step: 172850   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:31:04,520-Speed 9749.53 samples/sec   Loss 5.6158   LearningRate 0.0232   Epoch: 10   Global Step: 172860   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:31:05,588-Speed 9594.44 samples/sec   Loss 5.6422   LearningRate 0.0232   Epoch: 10   Global Step: 172870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:06,666-Speed 9502.73 samples/sec   Loss 5.5283   LearningRate 0.0232   Epoch: 10   Global Step: 172880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:07,786-Speed 9154.84 samples/sec   Loss 5.5753   LearningRate 0.0232   Epoch: 10   Global Step: 172890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:08,880-Speed 9360.64 samples/sec   Loss 5.6212   LearningRate 0.0232   Epoch: 10   Global Step: 172900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:09,991-Speed 9225.89 samples/sec   Loss 5.7212   LearningRate 0.0232   Epoch: 10   Global Step: 172910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:11,081-Speed 9392.34 samples/sec   Loss 5.5544   LearningRate 0.0232   Epoch: 10   Global Step: 172920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:12,177-Speed 9353.06 samples/sec   Loss 5.4916   LearningRate 0.0232   Epoch: 10   Global Step: 172930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:13,279-Speed 9298.24 samples/sec   Loss 5.6053   LearningRate 0.0232   Epoch: 10   Global Step: 172940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:14,328-Speed 9764.13 samples/sec   Loss 5.6503   LearningRate 0.0232   Epoch: 10   Global Step: 172950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:15,437-Speed 9234.04 samples/sec   Loss 5.5632   LearningRate 0.0232   Epoch: 10   Global Step: 172960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:16,570-Speed 9042.70 samples/sec   Loss 5.6808   LearningRate 0.0232   Epoch: 10   Global Step: 172970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:17,649-Speed 9501.98 samples/sec   Loss 5.5070   LearningRate 0.0232   Epoch: 10   Global Step: 172980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:18,730-Speed 9478.37 samples/sec   Loss 5.5502   LearningRate 0.0232   Epoch: 10   Global Step: 172990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:19,813-Speed 9463.88 samples/sec   Loss 5.5930   LearningRate 0.0232   Epoch: 10   Global Step: 173000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:20,882-Speed 9582.10 samples/sec   Loss 5.6079   LearningRate 0.0232   Epoch: 10   Global Step: 173010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:21,983-Speed 9304.59 samples/sec   Loss 5.4618   LearningRate 0.0232   Epoch: 10   Global Step: 173020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:23,079-Speed 9354.50 samples/sec   Loss 5.5735   LearningRate 0.0232   Epoch: 10   Global Step: 173030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:24,152-Speed 9545.09 samples/sec   Loss 5.5949   LearningRate 0.0232   Epoch: 10   Global Step: 173040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:25,248-Speed 9352.39 samples/sec   Loss 5.4886   LearningRate 0.0232   Epoch: 10   Global Step: 173050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:26,364-Speed 9179.44 samples/sec   Loss 5.5332   LearningRate 0.0232   Epoch: 10   Global Step: 173060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:27,477-Speed 9205.33 samples/sec   Loss 5.5173   LearningRate 0.0232   Epoch: 10   Global Step: 173070   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:31:28,548-Speed 9560.71 samples/sec   Loss 5.5007   LearningRate 0.0232   Epoch: 10   Global Step: 173080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:29,612-Speed 9636.96 samples/sec   Loss 5.7483   LearningRate 0.0232   Epoch: 10   Global Step: 173090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:30,696-Speed 9445.62 samples/sec   Loss 5.6028   LearningRate 0.0232   Epoch: 10   Global Step: 173100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:31,788-Speed 9387.28 samples/sec   Loss 5.6370   LearningRate 0.0232   Epoch: 10   Global Step: 173110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:32,888-Speed 9315.71 samples/sec   Loss 5.7136   LearningRate 0.0232   Epoch: 10   Global Step: 173120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:34,004-Speed 9177.89 samples/sec   Loss 5.5803   LearningRate 0.0232   Epoch: 10   Global Step: 173130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:35,072-Speed 9599.67 samples/sec   Loss 5.6178   LearningRate 0.0232   Epoch: 10   Global Step: 173140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:36,150-Speed 9499.23 samples/sec   Loss 5.5502   LearningRate 0.0232   Epoch: 10   Global Step: 173150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:37,245-Speed 9362.96 samples/sec   Loss 5.6483   LearningRate 0.0232   Epoch: 10   Global Step: 173160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:38,352-Speed 9256.74 samples/sec   Loss 5.5507   LearningRate 0.0232   Epoch: 10   Global Step: 173170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:39,422-Speed 9573.57 samples/sec   Loss 5.5569   LearningRate 0.0232   Epoch: 10   Global Step: 173180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:40,536-Speed 9200.59 samples/sec   Loss 5.6275   LearningRate 0.0232   Epoch: 10   Global Step: 173190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:41,674-Speed 9002.09 samples/sec   Loss 5.5848   LearningRate 0.0232   Epoch: 10   Global Step: 173200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:42,754-Speed 9491.64 samples/sec   Loss 5.6528   LearningRate 0.0231   Epoch: 10   Global Step: 173210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:43,868-Speed 9197.53 samples/sec   Loss 5.5468   LearningRate 0.0231   Epoch: 10   Global Step: 173220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:44,916-Speed 9775.95 samples/sec   Loss 5.6013   LearningRate 0.0231   Epoch: 10   Global Step: 173230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:31:46,003-Speed 9426.09 samples/sec   Loss 5.6013   LearningRate 0.0231   Epoch: 10   Global Step: 173240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:31:47,089-Speed 9431.99 samples/sec   Loss 5.6857   LearningRate 0.0231   Epoch: 10   Global Step: 173250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:31:48,152-Speed 9639.24 samples/sec   Loss 5.5188   LearningRate 0.0231   Epoch: 10   Global Step: 173260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:31:49,212-Speed 9667.06 samples/sec   Loss 5.5418   LearningRate 0.0231   Epoch: 10   Global Step: 173270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:31:50,283-Speed 9571.17 samples/sec   Loss 5.6274   LearningRate 0.0231   Epoch: 10   Global Step: 173280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:31:51,347-Speed 9625.48 samples/sec   Loss 5.5808   LearningRate 0.0231   Epoch: 10   Global Step: 173290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:31:52,469-Speed 9137.87 samples/sec   Loss 5.5421   LearningRate 0.0231   Epoch: 10   Global Step: 173300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:31:53,567-Speed 9324.52 samples/sec   Loss 5.6001   LearningRate 0.0231   Epoch: 10   Global Step: 173310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:31:54,646-Speed 9495.26 samples/sec   Loss 5.6347   LearningRate 0.0231   Epoch: 10   Global Step: 173320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:31:55,698-Speed 9738.89 samples/sec   Loss 5.6382   LearningRate 0.0231   Epoch: 10   Global Step: 173330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:56,773-Speed 9535.92 samples/sec   Loss 5.5516   LearningRate 0.0231   Epoch: 10   Global Step: 173340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:57,893-Speed 9146.61 samples/sec   Loss 5.5835   LearningRate 0.0231   Epoch: 10   Global Step: 173350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:31:59,008-Speed 9183.65 samples/sec   Loss 5.6896   LearningRate 0.0231   Epoch: 10   Global Step: 173360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:00,169-Speed 8829.63 samples/sec   Loss 5.6523   LearningRate 0.0231   Epoch: 10   Global Step: 173370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:01,298-Speed 9080.12 samples/sec   Loss 5.5013   LearningRate 0.0231   Epoch: 10   Global Step: 173380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:02,396-Speed 9337.05 samples/sec   Loss 5.6405   LearningRate 0.0231   Epoch: 10   Global Step: 173390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:03,480-Speed 9451.29 samples/sec   Loss 5.6574   LearningRate 0.0231   Epoch: 10   Global Step: 173400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:04,571-Speed 9387.24 samples/sec   Loss 5.5748   LearningRate 0.0231   Epoch: 10   Global Step: 173410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:05,666-Speed 9362.87 samples/sec   Loss 5.5732   LearningRate 0.0231   Epoch: 10   Global Step: 173420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:06,767-Speed 9300.52 samples/sec   Loss 5.5888   LearningRate 0.0231   Epoch: 10   Global Step: 173430   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:32:07,843-Speed 9522.53 samples/sec   Loss 5.6739   LearningRate 0.0231   Epoch: 10   Global Step: 173440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:08,919-Speed 9522.31 samples/sec   Loss 5.6224   LearningRate 0.0231   Epoch: 10   Global Step: 173450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:09,990-Speed 9566.43 samples/sec   Loss 5.5204   LearningRate 0.0231   Epoch: 10   Global Step: 173460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:11,069-Speed 9502.46 samples/sec   Loss 5.6693   LearningRate 0.0231   Epoch: 10   Global Step: 173470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:12,171-Speed 9291.84 samples/sec   Loss 5.6338   LearningRate 0.0231   Epoch: 10   Global Step: 173480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:13,274-Speed 9292.22 samples/sec   Loss 5.6032   LearningRate 0.0231   Epoch: 10   Global Step: 173490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:14,325-Speed 9749.81 samples/sec   Loss 5.5986   LearningRate 0.0231   Epoch: 10   Global Step: 173500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:15,436-Speed 9222.32 samples/sec   Loss 5.4989   LearningRate 0.0231   Epoch: 10   Global Step: 173510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:16,503-Speed 9603.88 samples/sec   Loss 5.6024   LearningRate 0.0231   Epoch: 10   Global Step: 173520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:17,577-Speed 9534.57 samples/sec   Loss 5.6039   LearningRate 0.0231   Epoch: 10   Global Step: 173530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:18,757-Speed 8683.27 samples/sec   Loss 5.6394   LearningRate 0.0231   Epoch: 10   Global Step: 173540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:19,811-Speed 9726.22 samples/sec   Loss 5.5180   LearningRate 0.0231   Epoch: 10   Global Step: 173550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:20,917-Speed 9264.72 samples/sec   Loss 5.5800   LearningRate 0.0230   Epoch: 10   Global Step: 173560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:22,018-Speed 9309.15 samples/sec   Loss 5.5840   LearningRate 0.0230   Epoch: 10   Global Step: 173570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:23,117-Speed 9321.90 samples/sec   Loss 5.6129   LearningRate 0.0230   Epoch: 10   Global Step: 173580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:24,230-Speed 9201.27 samples/sec   Loss 5.7171   LearningRate 0.0230   Epoch: 10   Global Step: 173590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:25,309-Speed 9502.53 samples/sec   Loss 5.5539   LearningRate 0.0230   Epoch: 10   Global Step: 173600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:26,374-Speed 9617.91 samples/sec   Loss 5.5415   LearningRate 0.0230   Epoch: 10   Global Step: 173610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:27,463-Speed 9406.38 samples/sec   Loss 5.6385   LearningRate 0.0230   Epoch: 10   Global Step: 173620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:28,514-Speed 9749.40 samples/sec   Loss 5.6611   LearningRate 0.0230   Epoch: 10   Global Step: 173630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:29,552-Speed 9868.60 samples/sec   Loss 5.6599   LearningRate 0.0230   Epoch: 10   Global Step: 173640   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:32:30,639-Speed 9424.11 samples/sec   Loss 5.5635   LearningRate 0.0230   Epoch: 10   Global Step: 173650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:31,760-Speed 9148.20 samples/sec   Loss 5.5240   LearningRate 0.0230   Epoch: 10   Global Step: 173660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:32,820-Speed 9660.29 samples/sec   Loss 5.4595   LearningRate 0.0230   Epoch: 10   Global Step: 173670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:33,895-Speed 9530.47 samples/sec   Loss 5.6009   LearningRate 0.0230   Epoch: 10   Global Step: 173680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:34,993-Speed 9334.75 samples/sec   Loss 5.6463   LearningRate 0.0230   Epoch: 10   Global Step: 173690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:36,085-Speed 9385.18 samples/sec   Loss 5.6475   LearningRate 0.0230   Epoch: 10   Global Step: 173700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:37,164-Speed 9492.56 samples/sec   Loss 5.6653   LearningRate 0.0230   Epoch: 10   Global Step: 173710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:38,213-Speed 9773.30 samples/sec   Loss 5.6323   LearningRate 0.0230   Epoch: 10   Global Step: 173720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:39,272-Speed 9675.06 samples/sec   Loss 5.6031   LearningRate 0.0230   Epoch: 10   Global Step: 173730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:40,331-Speed 9680.53 samples/sec   Loss 5.7045   LearningRate 0.0230   Epoch: 10   Global Step: 173740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:41,434-Speed 9288.39 samples/sec   Loss 5.5311   LearningRate 0.0230   Epoch: 10   Global Step: 173750   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:32:42,533-Speed 9318.78 samples/sec   Loss 5.6388   LearningRate 0.0230   Epoch: 10   Global Step: 173760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:43,593-Speed 9663.61 samples/sec   Loss 5.6243   LearningRate 0.0230   Epoch: 10   Global Step: 173770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:32:44,698-Speed 9278.85 samples/sec   Loss 5.6181   LearningRate 0.0230   Epoch: 10   Global Step: 173780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:32:45,729-Speed 9930.59 samples/sec   Loss 5.6748   LearningRate 0.0230   Epoch: 10   Global Step: 173790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:32:46,836-Speed 9254.23 samples/sec   Loss 5.6485   LearningRate 0.0230   Epoch: 10   Global Step: 173800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:32:47,915-Speed 9498.10 samples/sec   Loss 5.7208   LearningRate 0.0230   Epoch: 10   Global Step: 173810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:32:48,989-Speed 9543.13 samples/sec   Loss 5.6074   LearningRate 0.0230   Epoch: 10   Global Step: 173820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:32:50,044-Speed 9711.54 samples/sec   Loss 5.6008   LearningRate 0.0230   Epoch: 10   Global Step: 173830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:32:51,120-Speed 9525.11 samples/sec   Loss 5.6252   LearningRate 0.0230   Epoch: 10   Global Step: 173840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:32:52,172-Speed 9736.83 samples/sec   Loss 5.5671   LearningRate 0.0230   Epoch: 10   Global Step: 173850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:32:53,260-Speed 9413.68 samples/sec   Loss 5.6425   LearningRate 0.0230   Epoch: 10   Global Step: 173860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:32:54,357-Speed 9346.43 samples/sec   Loss 5.6208   LearningRate 0.0230   Epoch: 10   Global Step: 173870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:55,426-Speed 9581.75 samples/sec   Loss 5.6055   LearningRate 0.0230   Epoch: 10   Global Step: 173880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:56,517-Speed 9392.39 samples/sec   Loss 5.5638   LearningRate 0.0230   Epoch: 10   Global Step: 173890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:57,618-Speed 9306.95 samples/sec   Loss 5.5687   LearningRate 0.0229   Epoch: 10   Global Step: 173900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:58,720-Speed 9298.85 samples/sec   Loss 5.6595   LearningRate 0.0229   Epoch: 10   Global Step: 173910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:32:59,780-Speed 9664.44 samples/sec   Loss 5.6358   LearningRate 0.0229   Epoch: 10   Global Step: 173920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:33:00,857-Speed 9519.14 samples/sec   Loss 5.5878   LearningRate 0.0229   Epoch: 10   Global Step: 173930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:33:01,922-Speed 9618.18 samples/sec   Loss 5.6249   LearningRate 0.0229   Epoch: 10   Global Step: 173940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:33:02,998-Speed 9523.13 samples/sec   Loss 5.6627   LearningRate 0.0229   Epoch: 10   Global Step: 173950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:33:04,043-Speed 9810.60 samples/sec   Loss 5.7093   LearningRate 0.0229   Epoch: 10   Global Step: 173960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:33:05,114-Speed 9561.44 samples/sec   Loss 5.5373   LearningRate 0.0229   Epoch: 10   Global Step: 173970   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:33:06,192-Speed 9507.31 samples/sec   Loss 5.6511   LearningRate 0.0229   Epoch: 10   Global Step: 173980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:33:07,287-Speed 9359.57 samples/sec   Loss 5.5062   LearningRate 0.0229   Epoch: 10   Global Step: 173990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:33:08,389-Speed 9295.13 samples/sec   Loss 5.5919   LearningRate 0.0229   Epoch: 10   Global Step: 174000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:33:30,363-[lfw][174000]XNorm: 9.322621
Training: 2022-04-11 18:33:30,364-[lfw][174000]Accuracy-Flip: 0.99683+-0.00283
Training: 2022-04-11 18:33:30,365-[lfw][174000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:33:55,602-[cfp_fp][174000]XNorm: 7.952245
Training: 2022-04-11 18:33:55,603-[cfp_fp][174000]Accuracy-Flip: 0.96443+-0.00970
Training: 2022-04-11 18:33:55,604-[cfp_fp][174000]Accuracy-Highest: 0.96500
Training: 2022-04-11 18:34:17,465-[agedb_30][174000]XNorm: 9.038745
Training: 2022-04-11 18:34:17,466-[agedb_30][174000]Accuracy-Flip: 0.96750+-0.01047
Training: 2022-04-11 18:34:17,466-[agedb_30][174000]Accuracy-Highest: 0.96917
Training: 2022-04-11 18:34:18,565-Speed 145.92 samples/sec   Loss 5.5199   LearningRate 0.0229   Epoch: 10   Global Step: 174010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:19,619-Speed 9720.20 samples/sec   Loss 5.6029   LearningRate 0.0229   Epoch: 10   Global Step: 174020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:20,668-Speed 9766.90 samples/sec   Loss 5.6372   LearningRate 0.0229   Epoch: 10   Global Step: 174030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:21,728-Speed 9668.50 samples/sec   Loss 5.7038   LearningRate 0.0229   Epoch: 10   Global Step: 174040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:22,804-Speed 9519.81 samples/sec   Loss 5.6057   LearningRate 0.0229   Epoch: 10   Global Step: 174050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:23,915-Speed 9220.95 samples/sec   Loss 5.5419   LearningRate 0.0229   Epoch: 10   Global Step: 174060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:25,011-Speed 9354.56 samples/sec   Loss 5.5655   LearningRate 0.0229   Epoch: 10   Global Step: 174070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:26,095-Speed 9455.18 samples/sec   Loss 5.5056   LearningRate 0.0229   Epoch: 10   Global Step: 174080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:27,192-Speed 9332.22 samples/sec   Loss 5.5824   LearningRate 0.0229   Epoch: 10   Global Step: 174090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:28,282-Speed 9402.10 samples/sec   Loss 5.6776   LearningRate 0.0229   Epoch: 10   Global Step: 174100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:29,383-Speed 9308.36 samples/sec   Loss 5.6224   LearningRate 0.0229   Epoch: 10   Global Step: 174110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:30,437-Speed 9720.91 samples/sec   Loss 5.7111   LearningRate 0.0229   Epoch: 10   Global Step: 174120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:31,535-Speed 9329.29 samples/sec   Loss 5.5851   LearningRate 0.0229   Epoch: 10   Global Step: 174130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:32,641-Speed 9264.90 samples/sec   Loss 5.6479   LearningRate 0.0229   Epoch: 10   Global Step: 174140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:33,761-Speed 9149.32 samples/sec   Loss 5.6929   LearningRate 0.0229   Epoch: 10   Global Step: 174150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:34,842-Speed 9474.52 samples/sec   Loss 5.6792   LearningRate 0.0229   Epoch: 10   Global Step: 174160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:35,938-Speed 9346.63 samples/sec   Loss 5.6205   LearningRate 0.0229   Epoch: 10   Global Step: 174170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:36,998-Speed 9673.99 samples/sec   Loss 5.6458   LearningRate 0.0229   Epoch: 10   Global Step: 174180   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:34:38,088-Speed 9404.29 samples/sec   Loss 5.5614   LearningRate 0.0229   Epoch: 10   Global Step: 174190   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:34:39,147-Speed 9673.41 samples/sec   Loss 5.6048   LearningRate 0.0229   Epoch: 10   Global Step: 174200   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:34:40,204-Speed 9694.14 samples/sec   Loss 5.6430   LearningRate 0.0229   Epoch: 10   Global Step: 174210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:41,281-Speed 9514.28 samples/sec   Loss 5.5842   LearningRate 0.0229   Epoch: 10   Global Step: 174220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:42,395-Speed 9196.53 samples/sec   Loss 5.6079   LearningRate 0.0229   Epoch: 10   Global Step: 174230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:43,467-Speed 9557.10 samples/sec   Loss 5.6532   LearningRate 0.0229   Epoch: 10   Global Step: 174240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:44,530-Speed 9642.02 samples/sec   Loss 5.7281   LearningRate 0.0228   Epoch: 10   Global Step: 174250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:45,588-Speed 9686.22 samples/sec   Loss 5.6002   LearningRate 0.0228   Epoch: 10   Global Step: 174260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:46,670-Speed 9465.95 samples/sec   Loss 5.6559   LearningRate 0.0228   Epoch: 10   Global Step: 174270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:47,771-Speed 9304.61 samples/sec   Loss 5.5384   LearningRate 0.0228   Epoch: 10   Global Step: 174280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:48,854-Speed 9463.91 samples/sec   Loss 5.6080   LearningRate 0.0228   Epoch: 10   Global Step: 174290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:34:49,920-Speed 9608.79 samples/sec   Loss 5.6096   LearningRate 0.0228   Epoch: 10   Global Step: 174300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:34:51,000-Speed 9492.63 samples/sec   Loss 5.6364   LearningRate 0.0228   Epoch: 10   Global Step: 174310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:34:52,133-Speed 9038.84 samples/sec   Loss 5.5721   LearningRate 0.0228   Epoch: 10   Global Step: 174320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:34:53,221-Speed 9416.26 samples/sec   Loss 5.6569   LearningRate 0.0228   Epoch: 10   Global Step: 174330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:34:54,328-Speed 9261.09 samples/sec   Loss 5.5632   LearningRate 0.0228   Epoch: 10   Global Step: 174340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:34:55,454-Speed 9094.02 samples/sec   Loss 5.6527   LearningRate 0.0228   Epoch: 10   Global Step: 174350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:34:56,526-Speed 9564.05 samples/sec   Loss 5.6006   LearningRate 0.0228   Epoch: 10   Global Step: 174360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:34:57,636-Speed 9231.81 samples/sec   Loss 5.5520   LearningRate 0.0228   Epoch: 10   Global Step: 174370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:34:58,734-Speed 9324.72 samples/sec   Loss 5.5320   LearningRate 0.0228   Epoch: 10   Global Step: 174380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:34:59,856-Speed 9137.23 samples/sec   Loss 5.5280   LearningRate 0.0228   Epoch: 10   Global Step: 174390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:00,952-Speed 9346.45 samples/sec   Loss 5.5696   LearningRate 0.0228   Epoch: 10   Global Step: 174400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:02,072-Speed 9146.97 samples/sec   Loss 5.6434   LearningRate 0.0228   Epoch: 10   Global Step: 174410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:03,138-Speed 9614.65 samples/sec   Loss 5.6533   LearningRate 0.0228   Epoch: 10   Global Step: 174420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:04,246-Speed 9246.67 samples/sec   Loss 5.6886   LearningRate 0.0228   Epoch: 10   Global Step: 174430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:05,359-Speed 9208.63 samples/sec   Loss 5.6965   LearningRate 0.0228   Epoch: 10   Global Step: 174440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:06,484-Speed 9105.09 samples/sec   Loss 5.6072   LearningRate 0.0228   Epoch: 10   Global Step: 174450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:07,564-Speed 9486.32 samples/sec   Loss 5.5338   LearningRate 0.0228   Epoch: 10   Global Step: 174460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:08,604-Speed 9858.04 samples/sec   Loss 5.6277   LearningRate 0.0228   Epoch: 10   Global Step: 174470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:09,670-Speed 9615.80 samples/sec   Loss 5.6957   LearningRate 0.0228   Epoch: 10   Global Step: 174480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:10,787-Speed 9171.87 samples/sec   Loss 5.6925   LearningRate 0.0228   Epoch: 10   Global Step: 174490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:11,902-Speed 9190.88 samples/sec   Loss 5.6239   LearningRate 0.0228   Epoch: 10   Global Step: 174500   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:35:12,955-Speed 9726.70 samples/sec   Loss 5.6452   LearningRate 0.0228   Epoch: 10   Global Step: 174510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:14,068-Speed 9210.22 samples/sec   Loss 5.6603   LearningRate 0.0228   Epoch: 10   Global Step: 174520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:15,179-Speed 9228.34 samples/sec   Loss 5.6667   LearningRate 0.0228   Epoch: 10   Global Step: 174530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:16,264-Speed 9442.63 samples/sec   Loss 5.6968   LearningRate 0.0228   Epoch: 10   Global Step: 174540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:17,360-Speed 9345.97 samples/sec   Loss 5.5446   LearningRate 0.0228   Epoch: 10   Global Step: 174550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:18,454-Speed 9366.63 samples/sec   Loss 5.7333   LearningRate 0.0228   Epoch: 10   Global Step: 174560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:19,533-Speed 9489.09 samples/sec   Loss 5.6429   LearningRate 0.0228   Epoch: 10   Global Step: 174570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:20,630-Speed 9339.89 samples/sec   Loss 5.5744   LearningRate 0.0228   Epoch: 10   Global Step: 174580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:21,690-Speed 9666.93 samples/sec   Loss 5.5233   LearningRate 0.0228   Epoch: 10   Global Step: 174590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:22,776-Speed 9440.20 samples/sec   Loss 5.6351   LearningRate 0.0227   Epoch: 10   Global Step: 174600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:23,877-Speed 9306.15 samples/sec   Loss 5.5867   LearningRate 0.0227   Epoch: 10   Global Step: 174610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:24,965-Speed 9416.71 samples/sec   Loss 5.6746   LearningRate 0.0227   Epoch: 10   Global Step: 174620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:26,034-Speed 9585.85 samples/sec   Loss 5.5461   LearningRate 0.0227   Epoch: 10   Global Step: 174630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:27,141-Speed 9257.15 samples/sec   Loss 5.6742   LearningRate 0.0227   Epoch: 10   Global Step: 174640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:28,223-Speed 9474.28 samples/sec   Loss 5.5964   LearningRate 0.0227   Epoch: 10   Global Step: 174650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:29,334-Speed 9218.92 samples/sec   Loss 5.6052   LearningRate 0.0227   Epoch: 10   Global Step: 174660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:30,434-Speed 9316.60 samples/sec   Loss 5.6589   LearningRate 0.0227   Epoch: 10   Global Step: 174670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:31,529-Speed 9357.14 samples/sec   Loss 5.5429   LearningRate 0.0227   Epoch: 10   Global Step: 174680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:32,647-Speed 9162.35 samples/sec   Loss 5.5763   LearningRate 0.0227   Epoch: 10   Global Step: 174690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:33,798-Speed 8896.35 samples/sec   Loss 5.5335   LearningRate 0.0227   Epoch: 10   Global Step: 174700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:34,882-Speed 9457.78 samples/sec   Loss 5.5552   LearningRate 0.0227   Epoch: 10   Global Step: 174710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:35,949-Speed 9596.51 samples/sec   Loss 5.6328   LearningRate 0.0227   Epoch: 10   Global Step: 174720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:37,008-Speed 9682.82 samples/sec   Loss 5.6462   LearningRate 0.0227   Epoch: 10   Global Step: 174730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:38,098-Speed 9398.12 samples/sec   Loss 5.7009   LearningRate 0.0227   Epoch: 10   Global Step: 174740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:39,213-Speed 9192.78 samples/sec   Loss 5.7491   LearningRate 0.0227   Epoch: 10   Global Step: 174750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:40,279-Speed 9613.23 samples/sec   Loss 5.5741   LearningRate 0.0227   Epoch: 10   Global Step: 174760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:41,355-Speed 9518.00 samples/sec   Loss 5.6262   LearningRate 0.0227   Epoch: 10   Global Step: 174770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:42,468-Speed 9209.86 samples/sec   Loss 5.7000   LearningRate 0.0227   Epoch: 10   Global Step: 174780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:43,574-Speed 9284.53 samples/sec   Loss 5.5319   LearningRate 0.0227   Epoch: 10   Global Step: 174790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:44,651-Speed 9515.26 samples/sec   Loss 5.4583   LearningRate 0.0227   Epoch: 10   Global Step: 174800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:45,773-Speed 9130.67 samples/sec   Loss 5.5525   LearningRate 0.0227   Epoch: 10   Global Step: 174810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:46,838-Speed 9618.85 samples/sec   Loss 5.5648   LearningRate 0.0227   Epoch: 10   Global Step: 174820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:47,894-Speed 9707.14 samples/sec   Loss 5.5498   LearningRate 0.0227   Epoch: 10   Global Step: 174830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:48,971-Speed 9515.71 samples/sec   Loss 5.6339   LearningRate 0.0227   Epoch: 10   Global Step: 174840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:50,071-Speed 9313.19 samples/sec   Loss 5.5583   LearningRate 0.0227   Epoch: 10   Global Step: 174850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:51,186-Speed 9184.94 samples/sec   Loss 5.7076   LearningRate 0.0227   Epoch: 10   Global Step: 174860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:52,305-Speed 9154.43 samples/sec   Loss 5.5726   LearningRate 0.0227   Epoch: 10   Global Step: 174870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:53,405-Speed 9315.10 samples/sec   Loss 5.6078   LearningRate 0.0227   Epoch: 10   Global Step: 174880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:35:54,445-Speed 9851.35 samples/sec   Loss 5.6041   LearningRate 0.0227   Epoch: 10   Global Step: 174890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:55,526-Speed 9481.66 samples/sec   Loss 5.6018   LearningRate 0.0227   Epoch: 10   Global Step: 174900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:56,576-Speed 9759.18 samples/sec   Loss 5.5558   LearningRate 0.0227   Epoch: 10   Global Step: 174910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:57,666-Speed 9403.15 samples/sec   Loss 5.5093   LearningRate 0.0227   Epoch: 10   Global Step: 174920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:58,770-Speed 9281.29 samples/sec   Loss 5.5244   LearningRate 0.0227   Epoch: 10   Global Step: 174930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:35:59,844-Speed 9539.24 samples/sec   Loss 5.5964   LearningRate 0.0227   Epoch: 10   Global Step: 174940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:00,913-Speed 9583.85 samples/sec   Loss 5.6865   LearningRate 0.0226   Epoch: 10   Global Step: 174950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:02,012-Speed 9319.97 samples/sec   Loss 5.5841   LearningRate 0.0226   Epoch: 10   Global Step: 174960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:03,091-Speed 9494.75 samples/sec   Loss 5.5565   LearningRate 0.0226   Epoch: 10   Global Step: 174970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:04,200-Speed 9245.85 samples/sec   Loss 5.5983   LearningRate 0.0226   Epoch: 10   Global Step: 174980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:05,319-Speed 9150.43 samples/sec   Loss 5.5939   LearningRate 0.0226   Epoch: 10   Global Step: 174990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:06,375-Speed 9706.00 samples/sec   Loss 5.5800   LearningRate 0.0226   Epoch: 10   Global Step: 175000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:07,494-Speed 9161.54 samples/sec   Loss 5.5921   LearningRate 0.0226   Epoch: 10   Global Step: 175010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:08,556-Speed 9643.11 samples/sec   Loss 5.7622   LearningRate 0.0226   Epoch: 10   Global Step: 175020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:09,629-Speed 9548.24 samples/sec   Loss 5.5135   LearningRate 0.0226   Epoch: 10   Global Step: 175030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:10,688-Speed 9680.13 samples/sec   Loss 5.5762   LearningRate 0.0226   Epoch: 10   Global Step: 175040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:11,778-Speed 9397.61 samples/sec   Loss 5.6196   LearningRate 0.0226   Epoch: 10   Global Step: 175050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:12,867-Speed 9411.02 samples/sec   Loss 5.6024   LearningRate 0.0226   Epoch: 10   Global Step: 175060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:13,963-Speed 9354.67 samples/sec   Loss 5.5582   LearningRate 0.0226   Epoch: 10   Global Step: 175070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:15,073-Speed 9225.54 samples/sec   Loss 5.7766   LearningRate 0.0226   Epoch: 10   Global Step: 175080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:16,197-Speed 9115.46 samples/sec   Loss 5.6555   LearningRate 0.0226   Epoch: 10   Global Step: 175090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:17,288-Speed 9390.09 samples/sec   Loss 5.6176   LearningRate 0.0226   Epoch: 10   Global Step: 175100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:18,359-Speed 9568.12 samples/sec   Loss 5.6587   LearningRate 0.0226   Epoch: 10   Global Step: 175110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:19,436-Speed 9517.01 samples/sec   Loss 5.6315   LearningRate 0.0226   Epoch: 10   Global Step: 175120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:20,579-Speed 8962.81 samples/sec   Loss 5.5144   LearningRate 0.0226   Epoch: 10   Global Step: 175130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:21,658-Speed 9490.84 samples/sec   Loss 5.6435   LearningRate 0.0226   Epoch: 10   Global Step: 175140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:22,727-Speed 9592.16 samples/sec   Loss 5.6486   LearningRate 0.0226   Epoch: 10   Global Step: 175150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:23,836-Speed 9237.37 samples/sec   Loss 5.6563   LearningRate 0.0226   Epoch: 10   Global Step: 175160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:24,924-Speed 9418.91 samples/sec   Loss 5.5878   LearningRate 0.0226   Epoch: 10   Global Step: 175170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:26,039-Speed 9189.37 samples/sec   Loss 5.6828   LearningRate 0.0226   Epoch: 10   Global Step: 175180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:27,151-Speed 9209.89 samples/sec   Loss 5.6952   LearningRate 0.0226   Epoch: 10   Global Step: 175190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:28,225-Speed 9544.15 samples/sec   Loss 5.5344   LearningRate 0.0226   Epoch: 10   Global Step: 175200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:29,294-Speed 9579.48 samples/sec   Loss 5.6268   LearningRate 0.0226   Epoch: 10   Global Step: 175210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:30,380-Speed 9436.24 samples/sec   Loss 5.6528   LearningRate 0.0226   Epoch: 10   Global Step: 175220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:31,485-Speed 9275.40 samples/sec   Loss 5.5608   LearningRate 0.0226   Epoch: 10   Global Step: 175230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:32,559-Speed 9539.20 samples/sec   Loss 5.6392   LearningRate 0.0226   Epoch: 10   Global Step: 175240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:33,646-Speed 9425.48 samples/sec   Loss 5.5277   LearningRate 0.0226   Epoch: 10   Global Step: 175250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:34,732-Speed 9437.97 samples/sec   Loss 5.6857   LearningRate 0.0226   Epoch: 10   Global Step: 175260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:35,841-Speed 9238.25 samples/sec   Loss 5.6201   LearningRate 0.0226   Epoch: 10   Global Step: 175270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:36,925-Speed 9449.68 samples/sec   Loss 5.5464   LearningRate 0.0226   Epoch: 10   Global Step: 175280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:37,991-Speed 9611.93 samples/sec   Loss 5.6056   LearningRate 0.0226   Epoch: 10   Global Step: 175290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:39,049-Speed 9683.58 samples/sec   Loss 5.7413   LearningRate 0.0225   Epoch: 10   Global Step: 175300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:40,118-Speed 9584.06 samples/sec   Loss 5.5819   LearningRate 0.0225   Epoch: 10   Global Step: 175310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:41,189-Speed 9568.90 samples/sec   Loss 5.5567   LearningRate 0.0225   Epoch: 10   Global Step: 175320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:42,268-Speed 9499.41 samples/sec   Loss 5.5510   LearningRate 0.0225   Epoch: 10   Global Step: 175330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:43,327-Speed 9683.23 samples/sec   Loss 5.6016   LearningRate 0.0225   Epoch: 10   Global Step: 175340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:44,401-Speed 9531.87 samples/sec   Loss 5.6367   LearningRate 0.0225   Epoch: 10   Global Step: 175350   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:36:45,453-Speed 9743.57 samples/sec   Loss 5.6344   LearningRate 0.0225   Epoch: 10   Global Step: 175360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:46,544-Speed 9393.76 samples/sec   Loss 5.5321   LearningRate 0.0225   Epoch: 10   Global Step: 175370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:47,594-Speed 9754.31 samples/sec   Loss 5.5255   LearningRate 0.0225   Epoch: 10   Global Step: 175380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:48,696-Speed 9297.58 samples/sec   Loss 5.5818   LearningRate 0.0225   Epoch: 10   Global Step: 175390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:49,810-Speed 9198.91 samples/sec   Loss 5.6840   LearningRate 0.0225   Epoch: 10   Global Step: 175400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:50,936-Speed 9100.55 samples/sec   Loss 5.6639   LearningRate 0.0225   Epoch: 10   Global Step: 175410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:52,003-Speed 9598.34 samples/sec   Loss 5.6409   LearningRate 0.0225   Epoch: 10   Global Step: 175420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:53,101-Speed 9328.75 samples/sec   Loss 5.7135   LearningRate 0.0225   Epoch: 10   Global Step: 175430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:54,200-Speed 9322.31 samples/sec   Loss 5.5544   LearningRate 0.0225   Epoch: 10   Global Step: 175440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:55,254-Speed 9721.38 samples/sec   Loss 5.6983   LearningRate 0.0225   Epoch: 10   Global Step: 175450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:36:56,360-Speed 9268.37 samples/sec   Loss 5.5833   LearningRate 0.0225   Epoch: 10   Global Step: 175460   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:36:57,463-Speed 9291.01 samples/sec   Loss 5.6317   LearningRate 0.0225   Epoch: 10   Global Step: 175470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:58,577-Speed 9200.03 samples/sec   Loss 5.7184   LearningRate 0.0225   Epoch: 10   Global Step: 175480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:36:59,661-Speed 9452.44 samples/sec   Loss 5.5428   LearningRate 0.0225   Epoch: 10   Global Step: 175490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:37:00,726-Speed 9615.97 samples/sec   Loss 5.8387   LearningRate 0.0225   Epoch: 10   Global Step: 175500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:37:01,826-Speed 9315.07 samples/sec   Loss 5.6147   LearningRate 0.0225   Epoch: 10   Global Step: 175510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:37:02,916-Speed 9405.38 samples/sec   Loss 5.6912   LearningRate 0.0225   Epoch: 10   Global Step: 175520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:37:04,002-Speed 9435.18 samples/sec   Loss 5.5784   LearningRate 0.0225   Epoch: 10   Global Step: 175530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:37:05,086-Speed 9447.24 samples/sec   Loss 5.6207   LearningRate 0.0225   Epoch: 10   Global Step: 175540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:37:06,142-Speed 9698.49 samples/sec   Loss 5.4579   LearningRate 0.0225   Epoch: 10   Global Step: 175550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:37:07,235-Speed 9380.69 samples/sec   Loss 5.6327   LearningRate 0.0225   Epoch: 10   Global Step: 175560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:37:08,392-Speed 8855.31 samples/sec   Loss 5.6691   LearningRate 0.0225   Epoch: 10   Global Step: 175570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:09,458-Speed 9614.90 samples/sec   Loss 5.7077   LearningRate 0.0225   Epoch: 10   Global Step: 175580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:10,513-Speed 9704.31 samples/sec   Loss 5.6884   LearningRate 0.0225   Epoch: 10   Global Step: 175590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:11,604-Speed 9398.29 samples/sec   Loss 5.6767   LearningRate 0.0225   Epoch: 10   Global Step: 175600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:12,731-Speed 9087.00 samples/sec   Loss 5.6179   LearningRate 0.0225   Epoch: 10   Global Step: 175610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:13,808-Speed 9513.78 samples/sec   Loss 5.5571   LearningRate 0.0225   Epoch: 10   Global Step: 175620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:14,861-Speed 9734.53 samples/sec   Loss 5.7246   LearningRate 0.0225   Epoch: 10   Global Step: 175630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:15,961-Speed 9312.81 samples/sec   Loss 5.6290   LearningRate 0.0225   Epoch: 10   Global Step: 175640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:17,069-Speed 9244.63 samples/sec   Loss 5.7200   LearningRate 0.0225   Epoch: 10   Global Step: 175650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:18,186-Speed 9172.58 samples/sec   Loss 5.5802   LearningRate 0.0224   Epoch: 10   Global Step: 175660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:19,297-Speed 9225.19 samples/sec   Loss 5.6374   LearningRate 0.0224   Epoch: 10   Global Step: 175670   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:37:20,373-Speed 9524.50 samples/sec   Loss 5.6585   LearningRate 0.0224   Epoch: 10   Global Step: 175680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:21,459-Speed 9435.95 samples/sec   Loss 5.6588   LearningRate 0.0224   Epoch: 10   Global Step: 175690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:22,507-Speed 9774.22 samples/sec   Loss 5.5977   LearningRate 0.0224   Epoch: 10   Global Step: 175700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:23,621-Speed 9202.08 samples/sec   Loss 5.5534   LearningRate 0.0224   Epoch: 10   Global Step: 175710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:24,691-Speed 9576.28 samples/sec   Loss 5.7039   LearningRate 0.0224   Epoch: 10   Global Step: 175720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:25,777-Speed 9431.91 samples/sec   Loss 5.6603   LearningRate 0.0224   Epoch: 10   Global Step: 175730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:26,832-Speed 9715.14 samples/sec   Loss 5.5905   LearningRate 0.0224   Epoch: 10   Global Step: 175740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:27,889-Speed 9692.70 samples/sec   Loss 5.6108   LearningRate 0.0224   Epoch: 10   Global Step: 175750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:28,954-Speed 9622.06 samples/sec   Loss 5.5740   LearningRate 0.0224   Epoch: 10   Global Step: 175760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:30,085-Speed 9059.17 samples/sec   Loss 5.5795   LearningRate 0.0224   Epoch: 10   Global Step: 175770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:31,198-Speed 9199.83 samples/sec   Loss 5.8018   LearningRate 0.0224   Epoch: 10   Global Step: 175780   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:37:32,273-Speed 9533.39 samples/sec   Loss 5.6908   LearningRate 0.0224   Epoch: 10   Global Step: 175790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:33,406-Speed 9040.88 samples/sec   Loss 5.6513   LearningRate 0.0224   Epoch: 10   Global Step: 175800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:34,514-Speed 9248.06 samples/sec   Loss 5.5340   LearningRate 0.0224   Epoch: 10   Global Step: 175810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:35,602-Speed 9419.55 samples/sec   Loss 5.6526   LearningRate 0.0224   Epoch: 10   Global Step: 175820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:36,675-Speed 9551.67 samples/sec   Loss 5.6359   LearningRate 0.0224   Epoch: 10   Global Step: 175830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:37,747-Speed 9556.63 samples/sec   Loss 5.5887   LearningRate 0.0224   Epoch: 10   Global Step: 175840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:38,839-Speed 9391.36 samples/sec   Loss 5.6507   LearningRate 0.0224   Epoch: 10   Global Step: 175850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:39,923-Speed 9455.77 samples/sec   Loss 5.6767   LearningRate 0.0224   Epoch: 10   Global Step: 175860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:41,008-Speed 9440.35 samples/sec   Loss 5.6785   LearningRate 0.0224   Epoch: 10   Global Step: 175870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:42,116-Speed 9246.56 samples/sec   Loss 5.6615   LearningRate 0.0224   Epoch: 10   Global Step: 175880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:43,196-Speed 9489.25 samples/sec   Loss 5.6381   LearningRate 0.0224   Epoch: 10   Global Step: 175890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:44,313-Speed 9172.32 samples/sec   Loss 5.6604   LearningRate 0.0224   Epoch: 10   Global Step: 175900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:45,379-Speed 9612.30 samples/sec   Loss 5.5885   LearningRate 0.0224   Epoch: 10   Global Step: 175910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:46,442-Speed 9643.79 samples/sec   Loss 5.5998   LearningRate 0.0224   Epoch: 10   Global Step: 175920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:47,537-Speed 9352.18 samples/sec   Loss 5.6579   LearningRate 0.0224   Epoch: 10   Global Step: 175930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:48,634-Speed 9341.91 samples/sec   Loss 5.4957   LearningRate 0.0224   Epoch: 10   Global Step: 175940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:49,756-Speed 9128.49 samples/sec   Loss 5.5974   LearningRate 0.0224   Epoch: 10   Global Step: 175950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:50,847-Speed 9389.52 samples/sec   Loss 5.6044   LearningRate 0.0224   Epoch: 10   Global Step: 175960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:51,936-Speed 9409.04 samples/sec   Loss 5.5227   LearningRate 0.0224   Epoch: 10   Global Step: 175970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:53,075-Speed 9000.09 samples/sec   Loss 5.5337   LearningRate 0.0224   Epoch: 10   Global Step: 175980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:37:54,171-Speed 9346.19 samples/sec   Loss 5.6387   LearningRate 0.0224   Epoch: 10   Global Step: 175990   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:37:55,248-Speed 9514.58 samples/sec   Loss 5.7409   LearningRate 0.0224   Epoch: 10   Global Step: 176000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:38:17,021-[lfw][176000]XNorm: 9.283242
Training: 2022-04-11 18:38:17,022-[lfw][176000]Accuracy-Flip: 0.99550+-0.00279
Training: 2022-04-11 18:38:17,023-[lfw][176000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:38:42,188-[cfp_fp][176000]XNorm: 8.007009
Training: 2022-04-11 18:38:42,189-[cfp_fp][176000]Accuracy-Flip: 0.96100+-0.01002
Training: 2022-04-11 18:38:42,189-[cfp_fp][176000]Accuracy-Highest: 0.96500
Training: 2022-04-11 18:39:03,905-[agedb_30][176000]XNorm: 8.989899
Training: 2022-04-11 18:39:03,906-[agedb_30][176000]Accuracy-Flip: 0.96550+-0.00931
Training: 2022-04-11 18:39:03,906-[agedb_30][176000]Accuracy-Highest: 0.96917
Training: 2022-04-11 18:39:04,959-Speed 146.89 samples/sec   Loss 5.5873   LearningRate 0.0223   Epoch: 10   Global Step: 176010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:06,004-Speed 9795.05 samples/sec   Loss 5.6207   LearningRate 0.0223   Epoch: 10   Global Step: 176020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:07,064-Speed 9666.30 samples/sec   Loss 5.6416   LearningRate 0.0223   Epoch: 10   Global Step: 176030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:08,184-Speed 9151.08 samples/sec   Loss 5.5674   LearningRate 0.0223   Epoch: 10   Global Step: 176040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:09,266-Speed 9471.01 samples/sec   Loss 5.7244   LearningRate 0.0223   Epoch: 10   Global Step: 176050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:10,354-Speed 9412.49 samples/sec   Loss 5.5498   LearningRate 0.0223   Epoch: 10   Global Step: 176060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:11,436-Speed 9468.07 samples/sec   Loss 5.5895   LearningRate 0.0223   Epoch: 10   Global Step: 176070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:12,504-Speed 9598.30 samples/sec   Loss 5.6402   LearningRate 0.0223   Epoch: 10   Global Step: 176080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:13,619-Speed 9185.01 samples/sec   Loss 5.6699   LearningRate 0.0223   Epoch: 10   Global Step: 176090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:14,681-Speed 9649.68 samples/sec   Loss 5.6574   LearningRate 0.0223   Epoch: 10   Global Step: 176100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:15,765-Speed 9450.17 samples/sec   Loss 5.7409   LearningRate 0.0223   Epoch: 10   Global Step: 176110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:16,833-Speed 9599.32 samples/sec   Loss 5.5235   LearningRate 0.0223   Epoch: 10   Global Step: 176120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:17,973-Speed 8991.26 samples/sec   Loss 5.6368   LearningRate 0.0223   Epoch: 10   Global Step: 176130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:19,058-Speed 9439.56 samples/sec   Loss 5.5672   LearningRate 0.0223   Epoch: 10   Global Step: 176140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:20,173-Speed 9190.42 samples/sec   Loss 5.6815   LearningRate 0.0223   Epoch: 10   Global Step: 176150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:21,247-Speed 9536.17 samples/sec   Loss 5.6182   LearningRate 0.0223   Epoch: 10   Global Step: 176160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:39:22,363-Speed 9177.78 samples/sec   Loss 5.4764   LearningRate 0.0223   Epoch: 10   Global Step: 176170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:39:23,451-Speed 9426.30 samples/sec   Loss 5.6420   LearningRate 0.0223   Epoch: 10   Global Step: 176180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:39:24,587-Speed 9015.86 samples/sec   Loss 5.7506   LearningRate 0.0223   Epoch: 10   Global Step: 176190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:39:25,693-Speed 9269.39 samples/sec   Loss 5.5643   LearningRate 0.0223   Epoch: 10   Global Step: 176200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:39:26,789-Speed 9348.24 samples/sec   Loss 5.7092   LearningRate 0.0223   Epoch: 10   Global Step: 176210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:39:27,858-Speed 9589.40 samples/sec   Loss 5.6900   LearningRate 0.0223   Epoch: 10   Global Step: 176220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:39:28,959-Speed 9302.91 samples/sec   Loss 5.7405   LearningRate 0.0223   Epoch: 10   Global Step: 176230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:39:30,053-Speed 9361.98 samples/sec   Loss 5.6190   LearningRate 0.0223   Epoch: 10   Global Step: 176240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:39:31,089-Speed 9890.17 samples/sec   Loss 5.6986   LearningRate 0.0223   Epoch: 10   Global Step: 176250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:39:32,188-Speed 9327.10 samples/sec   Loss 5.5847   LearningRate 0.0223   Epoch: 10   Global Step: 176260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:33,332-Speed 8950.29 samples/sec   Loss 5.5840   LearningRate 0.0223   Epoch: 10   Global Step: 176270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:34,412-Speed 9485.03 samples/sec   Loss 5.5823   LearningRate 0.0223   Epoch: 10   Global Step: 176280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:35,529-Speed 9180.04 samples/sec   Loss 5.6069   LearningRate 0.0223   Epoch: 10   Global Step: 176290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:36,622-Speed 9376.32 samples/sec   Loss 5.7007   LearningRate 0.0223   Epoch: 10   Global Step: 176300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:37,726-Speed 9282.19 samples/sec   Loss 5.6093   LearningRate 0.0223   Epoch: 10   Global Step: 176310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:38,871-Speed 8948.72 samples/sec   Loss 5.6192   LearningRate 0.0223   Epoch: 10   Global Step: 176320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:40,036-Speed 8791.31 samples/sec   Loss 5.6347   LearningRate 0.0223   Epoch: 10   Global Step: 176330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:41,136-Speed 9320.53 samples/sec   Loss 5.6088   LearningRate 0.0223   Epoch: 10   Global Step: 176340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:42,245-Speed 9236.05 samples/sec   Loss 5.5839   LearningRate 0.0223   Epoch: 10   Global Step: 176350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:43,329-Speed 9456.02 samples/sec   Loss 5.5815   LearningRate 0.0222   Epoch: 10   Global Step: 176360   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:39:44,433-Speed 9277.45 samples/sec   Loss 5.6291   LearningRate 0.0222   Epoch: 10   Global Step: 176370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:45,532-Speed 9322.75 samples/sec   Loss 5.5633   LearningRate 0.0222   Epoch: 10   Global Step: 176380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:46,639-Speed 9260.30 samples/sec   Loss 5.7331   LearningRate 0.0222   Epoch: 10   Global Step: 176390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:47,750-Speed 9221.52 samples/sec   Loss 5.6109   LearningRate 0.0222   Epoch: 10   Global Step: 176400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:48,840-Speed 9406.29 samples/sec   Loss 5.6082   LearningRate 0.0222   Epoch: 10   Global Step: 176410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:49,914-Speed 9532.41 samples/sec   Loss 5.6885   LearningRate 0.0222   Epoch: 10   Global Step: 176420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:51,073-Speed 8844.48 samples/sec   Loss 5.7015   LearningRate 0.0222   Epoch: 10   Global Step: 176430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:52,186-Speed 9202.82 samples/sec   Loss 5.6878   LearningRate 0.0222   Epoch: 10   Global Step: 176440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:53,292-Speed 9268.33 samples/sec   Loss 5.6956   LearningRate 0.0222   Epoch: 10   Global Step: 176450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:54,407-Speed 9190.47 samples/sec   Loss 5.6092   LearningRate 0.0222   Epoch: 10   Global Step: 176460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:55,522-Speed 9185.47 samples/sec   Loss 5.5872   LearningRate 0.0222   Epoch: 10   Global Step: 176470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:56,596-Speed 9541.10 samples/sec   Loss 5.6440   LearningRate 0.0222   Epoch: 10   Global Step: 176480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:57,690-Speed 9362.25 samples/sec   Loss 5.7211   LearningRate 0.0222   Epoch: 10   Global Step: 176490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:58,762-Speed 9556.37 samples/sec   Loss 5.6030   LearningRate 0.0222   Epoch: 10   Global Step: 176500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:39:59,889-Speed 9093.80 samples/sec   Loss 5.5993   LearningRate 0.0222   Epoch: 10   Global Step: 176510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:01,018-Speed 9072.40 samples/sec   Loss 5.6482   LearningRate 0.0222   Epoch: 10   Global Step: 176520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:02,121-Speed 9296.03 samples/sec   Loss 5.5816   LearningRate 0.0222   Epoch: 10   Global Step: 176530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:03,252-Speed 9054.46 samples/sec   Loss 5.6626   LearningRate 0.0222   Epoch: 10   Global Step: 176540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:04,319-Speed 9602.92 samples/sec   Loss 5.6804   LearningRate 0.0222   Epoch: 10   Global Step: 176550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:05,366-Speed 9796.21 samples/sec   Loss 5.6762   LearningRate 0.0222   Epoch: 10   Global Step: 176560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:06,496-Speed 9061.29 samples/sec   Loss 5.5991   LearningRate 0.0222   Epoch: 10   Global Step: 176570   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:40:07,581-Speed 9447.49 samples/sec   Loss 5.6493   LearningRate 0.0222   Epoch: 10   Global Step: 176580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:08,656-Speed 9530.12 samples/sec   Loss 5.5905   LearningRate 0.0222   Epoch: 10   Global Step: 176590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:09,727-Speed 9571.12 samples/sec   Loss 5.5832   LearningRate 0.0222   Epoch: 10   Global Step: 176600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:10,825-Speed 9325.84 samples/sec   Loss 5.6079   LearningRate 0.0222   Epoch: 10   Global Step: 176610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:11,912-Speed 9422.03 samples/sec   Loss 5.6760   LearningRate 0.0222   Epoch: 10   Global Step: 176620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:12,969-Speed 9700.05 samples/sec   Loss 5.5921   LearningRate 0.0222   Epoch: 10   Global Step: 176630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:14,122-Speed 8882.65 samples/sec   Loss 5.6668   LearningRate 0.0222   Epoch: 10   Global Step: 176640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:15,217-Speed 9358.96 samples/sec   Loss 5.5178   LearningRate 0.0222   Epoch: 10   Global Step: 176650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:16,307-Speed 9401.31 samples/sec   Loss 5.6215   LearningRate 0.0222   Epoch: 10   Global Step: 176660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:17,384-Speed 9517.65 samples/sec   Loss 5.5758   LearningRate 0.0222   Epoch: 10   Global Step: 176670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:18,515-Speed 9058.83 samples/sec   Loss 5.5817   LearningRate 0.0222   Epoch: 10   Global Step: 176680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:19,602-Speed 9424.02 samples/sec   Loss 5.5608   LearningRate 0.0222   Epoch: 10   Global Step: 176690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:20,671-Speed 9582.82 samples/sec   Loss 5.6230   LearningRate 0.0222   Epoch: 10   Global Step: 176700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:21,754-Speed 9461.12 samples/sec   Loss 5.6215   LearningRate 0.0222   Epoch: 10   Global Step: 176710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:22,846-Speed 9381.66 samples/sec   Loss 5.6143   LearningRate 0.0221   Epoch: 10   Global Step: 176720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:23,980-Speed 9040.93 samples/sec   Loss 5.7216   LearningRate 0.0221   Epoch: 10   Global Step: 176730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:25,044-Speed 9635.13 samples/sec   Loss 5.6030   LearningRate 0.0221   Epoch: 10   Global Step: 176740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:26,110-Speed 9609.58 samples/sec   Loss 5.6095   LearningRate 0.0221   Epoch: 10   Global Step: 176750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:27,201-Speed 9387.42 samples/sec   Loss 5.6838   LearningRate 0.0221   Epoch: 10   Global Step: 176760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:28,266-Speed 9624.25 samples/sec   Loss 5.6826   LearningRate 0.0221   Epoch: 10   Global Step: 176770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:29,399-Speed 9039.86 samples/sec   Loss 5.7198   LearningRate 0.0221   Epoch: 10   Global Step: 176780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:30,455-Speed 9709.72 samples/sec   Loss 5.7016   LearningRate 0.0221   Epoch: 10   Global Step: 176790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:31,543-Speed 9412.89 samples/sec   Loss 5.6474   LearningRate 0.0221   Epoch: 10   Global Step: 176800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:32,601-Speed 9685.01 samples/sec   Loss 5.5905   LearningRate 0.0221   Epoch: 10   Global Step: 176810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:33,678-Speed 9513.93 samples/sec   Loss 5.6407   LearningRate 0.0221   Epoch: 10   Global Step: 176820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:40:34,763-Speed 9436.85 samples/sec   Loss 5.6841   LearningRate 0.0221   Epoch: 10   Global Step: 176830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:40:35,849-Speed 9440.98 samples/sec   Loss 5.6408   LearningRate 0.0221   Epoch: 10   Global Step: 176840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:40:36,927-Speed 9506.65 samples/sec   Loss 5.6145   LearningRate 0.0221   Epoch: 10   Global Step: 176850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:40:38,020-Speed 9372.39 samples/sec   Loss 5.6073   LearningRate 0.0221   Epoch: 10   Global Step: 176860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:40:39,093-Speed 9549.71 samples/sec   Loss 5.7179   LearningRate 0.0221   Epoch: 10   Global Step: 176870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:40:40,153-Speed 9659.47 samples/sec   Loss 5.6832   LearningRate 0.0221   Epoch: 10   Global Step: 176880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:40:41,215-Speed 9647.55 samples/sec   Loss 5.5906   LearningRate 0.0221   Epoch: 10   Global Step: 176890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:40:42,308-Speed 9377.39 samples/sec   Loss 5.6645   LearningRate 0.0221   Epoch: 10   Global Step: 176900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:40:43,406-Speed 9328.13 samples/sec   Loss 5.7371   LearningRate 0.0221   Epoch: 10   Global Step: 176910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:40:44,502-Speed 9351.92 samples/sec   Loss 5.5605   LearningRate 0.0221   Epoch: 10   Global Step: 176920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:45,607-Speed 9267.52 samples/sec   Loss 5.6809   LearningRate 0.0221   Epoch: 10   Global Step: 176930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:46,694-Speed 9429.12 samples/sec   Loss 5.5599   LearningRate 0.0221   Epoch: 10   Global Step: 176940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:47,778-Speed 9455.47 samples/sec   Loss 5.7195   LearningRate 0.0221   Epoch: 10   Global Step: 176950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:48,872-Speed 9368.36 samples/sec   Loss 5.6693   LearningRate 0.0221   Epoch: 10   Global Step: 176960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:49,986-Speed 9192.64 samples/sec   Loss 5.5787   LearningRate 0.0221   Epoch: 10   Global Step: 176970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:51,093-Speed 9258.63 samples/sec   Loss 5.6221   LearningRate 0.0221   Epoch: 10   Global Step: 176980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:52,171-Speed 9507.88 samples/sec   Loss 5.6414   LearningRate 0.0221   Epoch: 10   Global Step: 176990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:53,321-Speed 8913.76 samples/sec   Loss 5.6257   LearningRate 0.0221   Epoch: 10   Global Step: 177000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:54,422-Speed 9308.62 samples/sec   Loss 5.6782   LearningRate 0.0221   Epoch: 10   Global Step: 177010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:55,516-Speed 9369.82 samples/sec   Loss 5.6960   LearningRate 0.0221   Epoch: 10   Global Step: 177020   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:40:56,595-Speed 9493.77 samples/sec   Loss 5.6874   LearningRate 0.0221   Epoch: 10   Global Step: 177030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:57,674-Speed 9492.32 samples/sec   Loss 5.5657   LearningRate 0.0221   Epoch: 10   Global Step: 177040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:58,754-Speed 9483.93 samples/sec   Loss 5.6215   LearningRate 0.0221   Epoch: 10   Global Step: 177050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:40:59,875-Speed 9137.66 samples/sec   Loss 5.5971   LearningRate 0.0221   Epoch: 10   Global Step: 177060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:00,941-Speed 9619.80 samples/sec   Loss 5.6483   LearningRate 0.0220   Epoch: 10   Global Step: 177070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:02,070-Speed 9069.17 samples/sec   Loss 5.6512   LearningRate 0.0220   Epoch: 10   Global Step: 177080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:03,117-Speed 9790.60 samples/sec   Loss 5.6215   LearningRate 0.0220   Epoch: 10   Global Step: 177090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:04,237-Speed 9144.84 samples/sec   Loss 5.7160   LearningRate 0.0220   Epoch: 10   Global Step: 177100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:41:05,329-Speed 9386.64 samples/sec   Loss 5.5712   LearningRate 0.0220   Epoch: 10   Global Step: 177110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:41:06,442-Speed 9205.08 samples/sec   Loss 5.7283   LearningRate 0.0220   Epoch: 10   Global Step: 177120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:41:07,562-Speed 9149.66 samples/sec   Loss 5.6099   LearningRate 0.0220   Epoch: 10   Global Step: 177130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:41:08,644-Speed 9466.40 samples/sec   Loss 5.7241   LearningRate 0.0220   Epoch: 10   Global Step: 177140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:41:09,731-Speed 9428.11 samples/sec   Loss 5.5610   LearningRate 0.0220   Epoch: 10   Global Step: 177150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:41:10,800-Speed 9588.48 samples/sec   Loss 5.5997   LearningRate 0.0220   Epoch: 10   Global Step: 177160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:41:11,882-Speed 9470.07 samples/sec   Loss 5.6210   LearningRate 0.0220   Epoch: 10   Global Step: 177170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:41:12,995-Speed 9197.98 samples/sec   Loss 5.6832   LearningRate 0.0220   Epoch: 10   Global Step: 177180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:41:14,081-Speed 9441.38 samples/sec   Loss 5.6487   LearningRate 0.0220   Epoch: 10   Global Step: 177190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:41:15,140-Speed 9673.35 samples/sec   Loss 5.5713   LearningRate 0.0220   Epoch: 10   Global Step: 177200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:16,246-Speed 9263.03 samples/sec   Loss 5.7188   LearningRate 0.0220   Epoch: 10   Global Step: 177210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:17,337-Speed 9402.95 samples/sec   Loss 5.6455   LearningRate 0.0220   Epoch: 10   Global Step: 177220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:18,443-Speed 9265.07 samples/sec   Loss 5.5972   LearningRate 0.0220   Epoch: 10   Global Step: 177230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:19,513-Speed 9571.78 samples/sec   Loss 5.5954   LearningRate 0.0220   Epoch: 10   Global Step: 177240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:20,598-Speed 9445.53 samples/sec   Loss 5.5056   LearningRate 0.0220   Epoch: 10   Global Step: 177250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:21,670-Speed 9560.30 samples/sec   Loss 5.5789   LearningRate 0.0220   Epoch: 10   Global Step: 177260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:22,762-Speed 9379.36 samples/sec   Loss 5.6260   LearningRate 0.0220   Epoch: 10   Global Step: 177270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:23,850-Speed 9420.44 samples/sec   Loss 5.5602   LearningRate 0.0220   Epoch: 10   Global Step: 177280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:24,926-Speed 9528.97 samples/sec   Loss 5.5937   LearningRate 0.0220   Epoch: 10   Global Step: 177290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:26,003-Speed 9514.45 samples/sec   Loss 5.6996   LearningRate 0.0220   Epoch: 10   Global Step: 177300   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:41:27,095-Speed 9376.46 samples/sec   Loss 5.6338   LearningRate 0.0220   Epoch: 10   Global Step: 177310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:28,226-Speed 9059.30 samples/sec   Loss 5.6042   LearningRate 0.0220   Epoch: 10   Global Step: 177320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:29,292-Speed 9613.73 samples/sec   Loss 5.7469   LearningRate 0.0220   Epoch: 10   Global Step: 177330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:30,332-Speed 9849.81 samples/sec   Loss 5.6728   LearningRate 0.0220   Epoch: 10   Global Step: 177340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:31,435-Speed 9291.01 samples/sec   Loss 5.7280   LearningRate 0.0220   Epoch: 10   Global Step: 177350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:32,524-Speed 9403.32 samples/sec   Loss 5.6625   LearningRate 0.0220   Epoch: 10   Global Step: 177360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:33,594-Speed 9583.29 samples/sec   Loss 5.7124   LearningRate 0.0220   Epoch: 10   Global Step: 177370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:34,671-Speed 9508.11 samples/sec   Loss 5.6668   LearningRate 0.0220   Epoch: 10   Global Step: 177380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:35,723-Speed 9746.13 samples/sec   Loss 5.6383   LearningRate 0.0220   Epoch: 10   Global Step: 177390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:36,813-Speed 9401.05 samples/sec   Loss 5.6701   LearningRate 0.0220   Epoch: 10   Global Step: 177400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:37,943-Speed 9065.08 samples/sec   Loss 5.6433   LearningRate 0.0220   Epoch: 10   Global Step: 177410   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:41:39,022-Speed 9500.38 samples/sec   Loss 5.7070   LearningRate 0.0220   Epoch: 10   Global Step: 177420   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:41:40,099-Speed 9512.02 samples/sec   Loss 5.6636   LearningRate 0.0219   Epoch: 10   Global Step: 177430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:41,180-Speed 9482.72 samples/sec   Loss 5.5542   LearningRate 0.0219   Epoch: 10   Global Step: 177440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:42,301-Speed 9136.43 samples/sec   Loss 5.6727   LearningRate 0.0219   Epoch: 10   Global Step: 177450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:43,406-Speed 9274.72 samples/sec   Loss 5.6419   LearningRate 0.0219   Epoch: 10   Global Step: 177460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:44,513-Speed 9257.31 samples/sec   Loss 5.7026   LearningRate 0.0219   Epoch: 10   Global Step: 177470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:45,584-Speed 9561.17 samples/sec   Loss 5.5812   LearningRate 0.0219   Epoch: 10   Global Step: 177480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:46,673-Speed 9416.46 samples/sec   Loss 5.6783   LearningRate 0.0219   Epoch: 10   Global Step: 177490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:47,778-Speed 9272.47 samples/sec   Loss 5.6623   LearningRate 0.0219   Epoch: 10   Global Step: 177500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:48,881-Speed 9281.87 samples/sec   Loss 5.5880   LearningRate 0.0219   Epoch: 10   Global Step: 177510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:49,954-Speed 9551.94 samples/sec   Loss 5.7698   LearningRate 0.0219   Epoch: 10   Global Step: 177520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:51,027-Speed 9545.82 samples/sec   Loss 5.6682   LearningRate 0.0219   Epoch: 10   Global Step: 177530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:52,128-Speed 9303.48 samples/sec   Loss 5.5857   LearningRate 0.0219   Epoch: 10   Global Step: 177540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:53,202-Speed 9551.27 samples/sec   Loss 5.6878   LearningRate 0.0219   Epoch: 10   Global Step: 177550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:54,294-Speed 9379.84 samples/sec   Loss 5.6398   LearningRate 0.0219   Epoch: 10   Global Step: 177560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:55,402-Speed 9250.16 samples/sec   Loss 5.6622   LearningRate 0.0219   Epoch: 10   Global Step: 177570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:56,472-Speed 9578.62 samples/sec   Loss 5.6084   LearningRate 0.0219   Epoch: 10   Global Step: 177580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:57,538-Speed 9609.49 samples/sec   Loss 5.6146   LearningRate 0.0219   Epoch: 10   Global Step: 177590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:58,614-Speed 9528.94 samples/sec   Loss 5.6426   LearningRate 0.0219   Epoch: 10   Global Step: 177600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:41:59,673-Speed 9670.43 samples/sec   Loss 5.7032   LearningRate 0.0219   Epoch: 10   Global Step: 177610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:00,775-Speed 9300.96 samples/sec   Loss 5.6798   LearningRate 0.0219   Epoch: 10   Global Step: 177620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:01,882-Speed 9259.07 samples/sec   Loss 5.6443   LearningRate 0.0219   Epoch: 10   Global Step: 177630   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:42:02,937-Speed 9711.08 samples/sec   Loss 5.6303   LearningRate 0.0219   Epoch: 10   Global Step: 177640   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:42:04,022-Speed 9443.09 samples/sec   Loss 5.6054   LearningRate 0.0219   Epoch: 10   Global Step: 177650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:05,104-Speed 9465.56 samples/sec   Loss 5.6080   LearningRate 0.0219   Epoch: 10   Global Step: 177660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:42:06,217-Speed 9210.48 samples/sec   Loss 5.6838   LearningRate 0.0219   Epoch: 10   Global Step: 177670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:42:07,295-Speed 9500.76 samples/sec   Loss 5.6428   LearningRate 0.0219   Epoch: 10   Global Step: 177680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:42:08,477-Speed 8672.84 samples/sec   Loss 5.7342   LearningRate 0.0219   Epoch: 10   Global Step: 177690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:42:09,590-Speed 9204.93 samples/sec   Loss 5.6431   LearningRate 0.0219   Epoch: 10   Global Step: 177700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:42:10,694-Speed 9278.58 samples/sec   Loss 5.6919   LearningRate 0.0219   Epoch: 10   Global Step: 177710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:42:11,793-Speed 9325.70 samples/sec   Loss 5.6387   LearningRate 0.0219   Epoch: 10   Global Step: 177720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:42:12,894-Speed 9304.88 samples/sec   Loss 5.5788   LearningRate 0.0219   Epoch: 10   Global Step: 177730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:42:14,020-Speed 9106.48 samples/sec   Loss 5.5696   LearningRate 0.0219   Epoch: 10   Global Step: 177740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:42:15,124-Speed 9279.07 samples/sec   Loss 5.5716   LearningRate 0.0219   Epoch: 10   Global Step: 177750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:42:16,201-Speed 9509.23 samples/sec   Loss 5.6893   LearningRate 0.0219   Epoch: 10   Global Step: 177760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:17,322-Speed 9158.73 samples/sec   Loss 5.6187   LearningRate 0.0219   Epoch: 10   Global Step: 177770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:18,442-Speed 9146.28 samples/sec   Loss 5.5252   LearningRate 0.0218   Epoch: 10   Global Step: 177780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:19,529-Speed 9429.52 samples/sec   Loss 5.7383   LearningRate 0.0218   Epoch: 10   Global Step: 177790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:20,609-Speed 9486.58 samples/sec   Loss 5.6875   LearningRate 0.0218   Epoch: 10   Global Step: 177800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:21,710-Speed 9306.73 samples/sec   Loss 5.6540   LearningRate 0.0218   Epoch: 10   Global Step: 177810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:22,800-Speed 9396.20 samples/sec   Loss 5.7208   LearningRate 0.0218   Epoch: 10   Global Step: 177820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:23,865-Speed 9625.46 samples/sec   Loss 5.6828   LearningRate 0.0218   Epoch: 10   Global Step: 177830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:24,935-Speed 9573.53 samples/sec   Loss 5.5604   LearningRate 0.0218   Epoch: 10   Global Step: 177840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:26,027-Speed 9380.00 samples/sec   Loss 5.5758   LearningRate 0.0218   Epoch: 10   Global Step: 177850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:27,143-Speed 9186.86 samples/sec   Loss 5.6067   LearningRate 0.0218   Epoch: 10   Global Step: 177860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:28,231-Speed 9419.06 samples/sec   Loss 5.6172   LearningRate 0.0218   Epoch: 10   Global Step: 177870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:29,286-Speed 9711.13 samples/sec   Loss 5.7662   LearningRate 0.0218   Epoch: 10   Global Step: 177880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:30,371-Speed 9443.93 samples/sec   Loss 5.7206   LearningRate 0.0218   Epoch: 10   Global Step: 177890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:31,460-Speed 9407.35 samples/sec   Loss 5.5881   LearningRate 0.0218   Epoch: 10   Global Step: 177900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:32,619-Speed 8845.02 samples/sec   Loss 5.6345   LearningRate 0.0218   Epoch: 10   Global Step: 177910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:33,707-Speed 9416.71 samples/sec   Loss 5.6762   LearningRate 0.0218   Epoch: 10   Global Step: 177920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:34,806-Speed 9323.09 samples/sec   Loss 5.7180   LearningRate 0.0218   Epoch: 10   Global Step: 177930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:35,901-Speed 9359.88 samples/sec   Loss 5.6263   LearningRate 0.0218   Epoch: 10   Global Step: 177940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:37,018-Speed 9167.00 samples/sec   Loss 5.5218   LearningRate 0.0218   Epoch: 10   Global Step: 177950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:38,130-Speed 9216.97 samples/sec   Loss 5.5799   LearningRate 0.0218   Epoch: 10   Global Step: 177960   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:42:39,194-Speed 9627.69 samples/sec   Loss 5.5668   LearningRate 0.0218   Epoch: 10   Global Step: 177970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:40,286-Speed 9385.51 samples/sec   Loss 5.5736   LearningRate 0.0218   Epoch: 10   Global Step: 177980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:41,348-Speed 9643.57 samples/sec   Loss 5.5965   LearningRate 0.0218   Epoch: 10   Global Step: 177990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:42:42,496-Speed 8922.72 samples/sec   Loss 5.6176   LearningRate 0.0218   Epoch: 10   Global Step: 178000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:43:04,678-[lfw][178000]XNorm: 9.328771
Training: 2022-04-11 18:43:04,679-[lfw][178000]Accuracy-Flip: 0.99617+-0.00269
Training: 2022-04-11 18:43:04,679-[lfw][178000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:43:30,233-[cfp_fp][178000]XNorm: 7.926289
Training: 2022-04-11 18:43:30,234-[cfp_fp][178000]Accuracy-Flip: 0.96586+-0.00765
Training: 2022-04-11 18:43:30,234-[cfp_fp][178000]Accuracy-Highest: 0.96586
Training: 2022-04-11 18:43:52,234-[agedb_30][178000]XNorm: 8.942136
Training: 2022-04-11 18:43:52,235-[agedb_30][178000]Accuracy-Flip: 0.96867+-0.01137
Training: 2022-04-11 18:43:52,236-[agedb_30][178000]Accuracy-Highest: 0.96917
Training: 2022-04-11 18:43:53,332-Speed 144.56 samples/sec   Loss 5.6423   LearningRate 0.0218   Epoch: 10   Global Step: 178010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:43:54,433-Speed 9310.02 samples/sec   Loss 5.7036   LearningRate 0.0218   Epoch: 10   Global Step: 178020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:43:55,545-Speed 9210.04 samples/sec   Loss 5.7103   LearningRate 0.0218   Epoch: 10   Global Step: 178030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:43:56,632-Speed 9425.47 samples/sec   Loss 5.6158   LearningRate 0.0218   Epoch: 10   Global Step: 178040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:43:57,685-Speed 9730.95 samples/sec   Loss 5.6494   LearningRate 0.0218   Epoch: 10   Global Step: 178050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:43:58,749-Speed 9630.00 samples/sec   Loss 5.6988   LearningRate 0.0218   Epoch: 10   Global Step: 178060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:43:59,837-Speed 9419.18 samples/sec   Loss 5.5224   LearningRate 0.0218   Epoch: 10   Global Step: 178070   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:44:00,920-Speed 9465.05 samples/sec   Loss 5.7026   LearningRate 0.0218   Epoch: 10   Global Step: 178080   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:44:02,007-Speed 9425.77 samples/sec   Loss 5.6855   LearningRate 0.0218   Epoch: 10   Global Step: 178090   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:44:03,078-Speed 9564.16 samples/sec   Loss 5.6272   LearningRate 0.0218   Epoch: 10   Global Step: 178100   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:44:04,171-Speed 9374.88 samples/sec   Loss 5.6299   LearningRate 0.0218   Epoch: 10   Global Step: 178110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:05,238-Speed 9613.66 samples/sec   Loss 5.5649   LearningRate 0.0218   Epoch: 10   Global Step: 178120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:06,309-Speed 9566.50 samples/sec   Loss 5.5788   LearningRate 0.0218   Epoch: 10   Global Step: 178130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:07,380-Speed 9563.26 samples/sec   Loss 5.5745   LearningRate 0.0217   Epoch: 10   Global Step: 178140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:08,470-Speed 9400.13 samples/sec   Loss 5.7367   LearningRate 0.0217   Epoch: 10   Global Step: 178150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:09,563-Speed 9374.06 samples/sec   Loss 5.6530   LearningRate 0.0217   Epoch: 10   Global Step: 178160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:10,650-Speed 9421.68 samples/sec   Loss 5.5318   LearningRate 0.0217   Epoch: 10   Global Step: 178170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:11,728-Speed 9506.15 samples/sec   Loss 5.5698   LearningRate 0.0217   Epoch: 10   Global Step: 178180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:12,783-Speed 9717.69 samples/sec   Loss 5.6518   LearningRate 0.0217   Epoch: 10   Global Step: 178190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:13,870-Speed 9422.88 samples/sec   Loss 5.6929   LearningRate 0.0217   Epoch: 10   Global Step: 178200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:14,949-Speed 9497.76 samples/sec   Loss 5.5490   LearningRate 0.0217   Epoch: 10   Global Step: 178210   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:44:16,035-Speed 9434.28 samples/sec   Loss 5.5929   LearningRate 0.0217   Epoch: 10   Global Step: 178220   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:44:17,102-Speed 9601.58 samples/sec   Loss 5.5842   LearningRate 0.0217   Epoch: 10   Global Step: 178230   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:44:18,232-Speed 9070.91 samples/sec   Loss 5.6232   LearningRate 0.0217   Epoch: 10   Global Step: 178240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:19,295-Speed 9638.00 samples/sec   Loss 5.6194   LearningRate 0.0217   Epoch: 10   Global Step: 178250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:20,346-Speed 9747.09 samples/sec   Loss 5.6783   LearningRate 0.0217   Epoch: 10   Global Step: 178260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:21,387-Speed 9843.36 samples/sec   Loss 5.6611   LearningRate 0.0217   Epoch: 10   Global Step: 178270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:22,475-Speed 9418.54 samples/sec   Loss 5.5915   LearningRate 0.0217   Epoch: 10   Global Step: 178280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:23,632-Speed 8851.05 samples/sec   Loss 5.7012   LearningRate 0.0217   Epoch: 10   Global Step: 178290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:24,740-Speed 9245.71 samples/sec   Loss 5.7186   LearningRate 0.0217   Epoch: 10   Global Step: 178300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:25,813-Speed 9555.31 samples/sec   Loss 5.6158   LearningRate 0.0217   Epoch: 10   Global Step: 178310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:26,900-Speed 9421.22 samples/sec   Loss 5.6789   LearningRate 0.0217   Epoch: 10   Global Step: 178320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:27,999-Speed 9326.12 samples/sec   Loss 5.6180   LearningRate 0.0217   Epoch: 10   Global Step: 178330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:29,082-Speed 9461.76 samples/sec   Loss 5.5843   LearningRate 0.0217   Epoch: 10   Global Step: 178340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:30,165-Speed 9461.95 samples/sec   Loss 5.7125   LearningRate 0.0217   Epoch: 10   Global Step: 178350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:31,252-Speed 9423.93 samples/sec   Loss 5.6895   LearningRate 0.0217   Epoch: 10   Global Step: 178360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:32,325-Speed 9550.77 samples/sec   Loss 5.5757   LearningRate 0.0217   Epoch: 10   Global Step: 178370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:33,442-Speed 9174.95 samples/sec   Loss 5.6273   LearningRate 0.0217   Epoch: 10   Global Step: 178380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:34,546-Speed 9273.82 samples/sec   Loss 5.6101   LearningRate 0.0217   Epoch: 10   Global Step: 178390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:35,578-Speed 9935.07 samples/sec   Loss 5.6221   LearningRate 0.0217   Epoch: 10   Global Step: 178400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:36,654-Speed 9526.40 samples/sec   Loss 5.7227   LearningRate 0.0217   Epoch: 10   Global Step: 178410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:37,743-Speed 9407.17 samples/sec   Loss 5.6924   LearningRate 0.0217   Epoch: 10   Global Step: 178420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:38,821-Speed 9503.95 samples/sec   Loss 5.7494   LearningRate 0.0217   Epoch: 10   Global Step: 178430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:39,859-Speed 9869.31 samples/sec   Loss 5.5943   LearningRate 0.0217   Epoch: 10   Global Step: 178440   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:44:40,909-Speed 9762.83 samples/sec   Loss 5.6337   LearningRate 0.0217   Epoch: 10   Global Step: 178450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:42,011-Speed 9295.71 samples/sec   Loss 5.6676   LearningRate 0.0217   Epoch: 10   Global Step: 178460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:43,124-Speed 9207.12 samples/sec   Loss 5.5937   LearningRate 0.0217   Epoch: 10   Global Step: 178470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:44,242-Speed 9157.77 samples/sec   Loss 5.6157   LearningRate 0.0217   Epoch: 10   Global Step: 178480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:45,324-Speed 9474.20 samples/sec   Loss 5.5897   LearningRate 0.0217   Epoch: 10   Global Step: 178490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:46,373-Speed 9767.31 samples/sec   Loss 5.6945   LearningRate 0.0216   Epoch: 10   Global Step: 178500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:47,441-Speed 9599.81 samples/sec   Loss 5.6506   LearningRate 0.0216   Epoch: 10   Global Step: 178510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:48,529-Speed 9413.53 samples/sec   Loss 5.6019   LearningRate 0.0216   Epoch: 10   Global Step: 178520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:49,602-Speed 9550.48 samples/sec   Loss 5.6676   LearningRate 0.0216   Epoch: 10   Global Step: 178530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:50,678-Speed 9523.65 samples/sec   Loss 5.6004   LearningRate 0.0216   Epoch: 10   Global Step: 178540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:51,759-Speed 9471.02 samples/sec   Loss 5.6723   LearningRate 0.0216   Epoch: 10   Global Step: 178550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:44:52,848-Speed 9406.54 samples/sec   Loss 5.6847   LearningRate 0.0216   Epoch: 10   Global Step: 178560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:44:53,927-Speed 9499.61 samples/sec   Loss 5.6864   LearningRate 0.0216   Epoch: 10   Global Step: 178570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:44:54,998-Speed 9567.18 samples/sec   Loss 5.5744   LearningRate 0.0216   Epoch: 10   Global Step: 178580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:44:56,096-Speed 9332.88 samples/sec   Loss 5.5241   LearningRate 0.0216   Epoch: 10   Global Step: 178590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:44:57,180-Speed 9460.52 samples/sec   Loss 5.6090   LearningRate 0.0216   Epoch: 10   Global Step: 178600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:44:58,271-Speed 9392.55 samples/sec   Loss 5.6632   LearningRate 0.0216   Epoch: 10   Global Step: 178610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:44:59,324-Speed 9723.99 samples/sec   Loss 5.6681   LearningRate 0.0216   Epoch: 10   Global Step: 178620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:00,409-Speed 9441.51 samples/sec   Loss 5.6815   LearningRate 0.0216   Epoch: 10   Global Step: 178630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:01,472-Speed 9638.16 samples/sec   Loss 5.7832   LearningRate 0.0216   Epoch: 10   Global Step: 178640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:02,546-Speed 9540.10 samples/sec   Loss 5.6706   LearningRate 0.0216   Epoch: 10   Global Step: 178650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:03,671-Speed 9109.05 samples/sec   Loss 5.5593   LearningRate 0.0216   Epoch: 10   Global Step: 178660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:04,703-Speed 9930.53 samples/sec   Loss 5.6089   LearningRate 0.0216   Epoch: 10   Global Step: 178670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:05,783-Speed 9488.61 samples/sec   Loss 5.5803   LearningRate 0.0216   Epoch: 10   Global Step: 178680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:06,866-Speed 9461.13 samples/sec   Loss 5.4717   LearningRate 0.0216   Epoch: 10   Global Step: 178690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:07,935-Speed 9579.12 samples/sec   Loss 5.6709   LearningRate 0.0216   Epoch: 10   Global Step: 178700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:08,992-Speed 9696.08 samples/sec   Loss 5.5352   LearningRate 0.0216   Epoch: 10   Global Step: 178710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:10,142-Speed 8909.44 samples/sec   Loss 5.6683   LearningRate 0.0216   Epoch: 10   Global Step: 178720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:11,226-Speed 9455.51 samples/sec   Loss 5.5864   LearningRate 0.0216   Epoch: 10   Global Step: 178730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:12,377-Speed 8895.03 samples/sec   Loss 5.6356   LearningRate 0.0216   Epoch: 10   Global Step: 178740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:13,463-Speed 9439.95 samples/sec   Loss 5.6711   LearningRate 0.0216   Epoch: 10   Global Step: 178750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:14,550-Speed 9429.28 samples/sec   Loss 5.6479   LearningRate 0.0216   Epoch: 10   Global Step: 178760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:15,612-Speed 9647.09 samples/sec   Loss 5.6402   LearningRate 0.0216   Epoch: 10   Global Step: 178770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:16,706-Speed 9370.23 samples/sec   Loss 5.6833   LearningRate 0.0216   Epoch: 10   Global Step: 178780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:17,834-Speed 9080.46 samples/sec   Loss 5.6505   LearningRate 0.0216   Epoch: 10   Global Step: 178790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:18,946-Speed 9217.80 samples/sec   Loss 5.5769   LearningRate 0.0216   Epoch: 10   Global Step: 178800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:20,042-Speed 9344.35 samples/sec   Loss 5.5615   LearningRate 0.0216   Epoch: 10   Global Step: 178810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:21,135-Speed 9378.68 samples/sec   Loss 5.6850   LearningRate 0.0216   Epoch: 10   Global Step: 178820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:22,245-Speed 9223.33 samples/sec   Loss 5.6038   LearningRate 0.0216   Epoch: 10   Global Step: 178830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:23,347-Speed 9298.79 samples/sec   Loss 5.6985   LearningRate 0.0216   Epoch: 10   Global Step: 178840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:24,406-Speed 9673.72 samples/sec   Loss 5.6847   LearningRate 0.0216   Epoch: 10   Global Step: 178850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:25,510-Speed 9282.85 samples/sec   Loss 5.6602   LearningRate 0.0215   Epoch: 10   Global Step: 178860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:26,546-Speed 9891.55 samples/sec   Loss 5.6882   LearningRate 0.0215   Epoch: 10   Global Step: 178870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:27,646-Speed 9320.70 samples/sec   Loss 5.6556   LearningRate 0.0215   Epoch: 10   Global Step: 178880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:28,736-Speed 9396.76 samples/sec   Loss 5.7584   LearningRate 0.0215   Epoch: 10   Global Step: 178890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:29,819-Speed 9464.36 samples/sec   Loss 5.6402   LearningRate 0.0215   Epoch: 10   Global Step: 178900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:30,930-Speed 9222.06 samples/sec   Loss 5.5846   LearningRate 0.0215   Epoch: 10   Global Step: 178910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:32,009-Speed 9498.18 samples/sec   Loss 5.5628   LearningRate 0.0215   Epoch: 10   Global Step: 178920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:33,096-Speed 9432.93 samples/sec   Loss 5.7236   LearningRate 0.0215   Epoch: 10   Global Step: 178930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:34,198-Speed 9297.90 samples/sec   Loss 5.6579   LearningRate 0.0215   Epoch: 10   Global Step: 178940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:35,305-Speed 9250.09 samples/sec   Loss 5.7603   LearningRate 0.0215   Epoch: 10   Global Step: 178950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:36,430-Speed 9111.10 samples/sec   Loss 5.7157   LearningRate 0.0215   Epoch: 10   Global Step: 178960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:37,548-Speed 9163.82 samples/sec   Loss 5.6961   LearningRate 0.0215   Epoch: 10   Global Step: 178970   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:45:38,668-Speed 9142.33 samples/sec   Loss 5.4796   LearningRate 0.0215   Epoch: 10   Global Step: 178980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:39,719-Speed 9749.53 samples/sec   Loss 5.5922   LearningRate 0.0215   Epoch: 10   Global Step: 178990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:40,794-Speed 9532.76 samples/sec   Loss 5.6456   LearningRate 0.0215   Epoch: 10   Global Step: 179000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:41,940-Speed 8937.53 samples/sec   Loss 5.6611   LearningRate 0.0215   Epoch: 10   Global Step: 179010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:43,028-Speed 9423.30 samples/sec   Loss 5.6361   LearningRate 0.0215   Epoch: 10   Global Step: 179020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:44,121-Speed 9369.83 samples/sec   Loss 5.6326   LearningRate 0.0215   Epoch: 10   Global Step: 179030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:45,196-Speed 9535.79 samples/sec   Loss 5.6512   LearningRate 0.0215   Epoch: 10   Global Step: 179040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:46,271-Speed 9528.58 samples/sec   Loss 5.5765   LearningRate 0.0215   Epoch: 10   Global Step: 179050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:47,385-Speed 9194.54 samples/sec   Loss 5.6186   LearningRate 0.0215   Epoch: 10   Global Step: 179060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:48,480-Speed 9360.79 samples/sec   Loss 5.5535   LearningRate 0.0215   Epoch: 10   Global Step: 179070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:49,619-Speed 8991.72 samples/sec   Loss 5.6771   LearningRate 0.0215   Epoch: 10   Global Step: 179080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:50,728-Speed 9246.45 samples/sec   Loss 5.6427   LearningRate 0.0215   Epoch: 10   Global Step: 179090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:51,847-Speed 9152.12 samples/sec   Loss 5.5962   LearningRate 0.0215   Epoch: 10   Global Step: 179100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:52,982-Speed 9029.88 samples/sec   Loss 5.6463   LearningRate 0.0215   Epoch: 10   Global Step: 179110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:45:54,074-Speed 9382.64 samples/sec   Loss 5.6365   LearningRate 0.0215   Epoch: 10   Global Step: 179120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:55,171-Speed 9337.04 samples/sec   Loss 5.5761   LearningRate 0.0215   Epoch: 10   Global Step: 179130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:56,280-Speed 9242.95 samples/sec   Loss 5.7159   LearningRate 0.0215   Epoch: 10   Global Step: 179140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:57,380-Speed 9320.16 samples/sec   Loss 5.6870   LearningRate 0.0215   Epoch: 10   Global Step: 179150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:58,486-Speed 9255.84 samples/sec   Loss 5.7652   LearningRate 0.0215   Epoch: 10   Global Step: 179160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:45:59,579-Speed 9373.65 samples/sec   Loss 5.6321   LearningRate 0.0215   Epoch: 10   Global Step: 179170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:00,656-Speed 9516.81 samples/sec   Loss 5.6479   LearningRate 0.0215   Epoch: 10   Global Step: 179180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:01,704-Speed 9774.47 samples/sec   Loss 5.5936   LearningRate 0.0215   Epoch: 10   Global Step: 179190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:02,802-Speed 9336.53 samples/sec   Loss 5.6767   LearningRate 0.0215   Epoch: 10   Global Step: 179200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:03,884-Speed 9472.36 samples/sec   Loss 5.6970   LearningRate 0.0215   Epoch: 10   Global Step: 179210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:04,935-Speed 9747.08 samples/sec   Loss 5.6843   LearningRate 0.0214   Epoch: 10   Global Step: 179220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:06,021-Speed 9438.94 samples/sec   Loss 5.6077   LearningRate 0.0214   Epoch: 10   Global Step: 179230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:07,091-Speed 9572.20 samples/sec   Loss 5.6720   LearningRate 0.0214   Epoch: 10   Global Step: 179240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:08,149-Speed 9681.01 samples/sec   Loss 5.5739   LearningRate 0.0214   Epoch: 10   Global Step: 179250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:09,196-Speed 9790.21 samples/sec   Loss 5.6943   LearningRate 0.0214   Epoch: 10   Global Step: 179260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:10,237-Speed 9847.17 samples/sec   Loss 5.4959   LearningRate 0.0214   Epoch: 10   Global Step: 179270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:11,306-Speed 9581.64 samples/sec   Loss 5.5517   LearningRate 0.0214   Epoch: 10   Global Step: 179280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:12,393-Speed 9425.30 samples/sec   Loss 5.5701   LearningRate 0.0214   Epoch: 10   Global Step: 179290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:13,446-Speed 9731.88 samples/sec   Loss 5.5839   LearningRate 0.0214   Epoch: 10   Global Step: 179300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:14,530-Speed 9454.37 samples/sec   Loss 5.5755   LearningRate 0.0214   Epoch: 10   Global Step: 179310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:15,622-Speed 9375.78 samples/sec   Loss 5.5552   LearningRate 0.0214   Epoch: 10   Global Step: 179320   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:46:16,723-Speed 9317.52 samples/sec   Loss 5.6352   LearningRate 0.0214   Epoch: 10   Global Step: 179330   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:46:17,783-Speed 9663.97 samples/sec   Loss 5.6484   LearningRate 0.0214   Epoch: 10   Global Step: 179340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:18,864-Speed 9480.75 samples/sec   Loss 5.7064   LearningRate 0.0214   Epoch: 10   Global Step: 179350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:19,946-Speed 9464.45 samples/sec   Loss 5.5829   LearningRate 0.0214   Epoch: 10   Global Step: 179360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:21,053-Speed 9257.60 samples/sec   Loss 5.6370   LearningRate 0.0214   Epoch: 10   Global Step: 179370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:22,106-Speed 9729.78 samples/sec   Loss 5.7036   LearningRate 0.0214   Epoch: 10   Global Step: 179380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:23,176-Speed 9576.07 samples/sec   Loss 5.7670   LearningRate 0.0214   Epoch: 10   Global Step: 179390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:24,285-Speed 9239.60 samples/sec   Loss 5.6715   LearningRate 0.0214   Epoch: 10   Global Step: 179400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:25,363-Speed 9502.23 samples/sec   Loss 5.6789   LearningRate 0.0214   Epoch: 10   Global Step: 179410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:26,466-Speed 9289.44 samples/sec   Loss 5.7175   LearningRate 0.0214   Epoch: 10   Global Step: 179420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:27,583-Speed 9176.09 samples/sec   Loss 5.7011   LearningRate 0.0214   Epoch: 10   Global Step: 179430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:28,654-Speed 9565.83 samples/sec   Loss 5.6702   LearningRate 0.0214   Epoch: 10   Global Step: 179440   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:46:29,753-Speed 9329.33 samples/sec   Loss 5.5791   LearningRate 0.0214   Epoch: 10   Global Step: 179450   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:46:30,795-Speed 9826.33 samples/sec   Loss 5.6335   LearningRate 0.0214   Epoch: 10   Global Step: 179460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:31,873-Speed 9510.76 samples/sec   Loss 5.6177   LearningRate 0.0214   Epoch: 10   Global Step: 179470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:32,970-Speed 9334.17 samples/sec   Loss 5.5655   LearningRate 0.0214   Epoch: 10   Global Step: 179480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:34,061-Speed 9397.97 samples/sec   Loss 5.6933   LearningRate 0.0214   Epoch: 10   Global Step: 179490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:35,136-Speed 9532.94 samples/sec   Loss 5.5463   LearningRate 0.0214   Epoch: 10   Global Step: 179500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:36,227-Speed 9385.30 samples/sec   Loss 5.5650   LearningRate 0.0214   Epoch: 10   Global Step: 179510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:37,325-Speed 9338.08 samples/sec   Loss 5.6519   LearningRate 0.0214   Epoch: 10   Global Step: 179520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:38,397-Speed 9555.09 samples/sec   Loss 5.7354   LearningRate 0.0214   Epoch: 10   Global Step: 179530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:39,464-Speed 9596.62 samples/sec   Loss 5.5716   LearningRate 0.0214   Epoch: 10   Global Step: 179540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:40,555-Speed 9393.45 samples/sec   Loss 5.6040   LearningRate 0.0214   Epoch: 10   Global Step: 179550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:41,649-Speed 9363.76 samples/sec   Loss 5.4922   LearningRate 0.0214   Epoch: 10   Global Step: 179560   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:46:42,719-Speed 9581.10 samples/sec   Loss 5.6029   LearningRate 0.0214   Epoch: 10   Global Step: 179570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:43,788-Speed 9584.15 samples/sec   Loss 5.6137   LearningRate 0.0213   Epoch: 10   Global Step: 179580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:44,872-Speed 9451.38 samples/sec   Loss 5.6099   LearningRate 0.0213   Epoch: 10   Global Step: 179590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:45,940-Speed 9587.16 samples/sec   Loss 5.5929   LearningRate 0.0213   Epoch: 10   Global Step: 179600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:47,029-Speed 9413.74 samples/sec   Loss 5.5888   LearningRate 0.0213   Epoch: 10   Global Step: 179610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:48,080-Speed 9759.25 samples/sec   Loss 5.6242   LearningRate 0.0213   Epoch: 10   Global Step: 179620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:49,140-Speed 9658.03 samples/sec   Loss 5.6957   LearningRate 0.0213   Epoch: 10   Global Step: 179630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:50,241-Speed 9311.54 samples/sec   Loss 5.6218   LearningRate 0.0213   Epoch: 10   Global Step: 179640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:51,294-Speed 9731.10 samples/sec   Loss 5.6811   LearningRate 0.0213   Epoch: 10   Global Step: 179650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:52,401-Speed 9254.96 samples/sec   Loss 5.6362   LearningRate 0.0213   Epoch: 10   Global Step: 179660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:53,497-Speed 9343.68 samples/sec   Loss 5.6804   LearningRate 0.0213   Epoch: 10   Global Step: 179670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:54,593-Speed 9351.07 samples/sec   Loss 5.6567   LearningRate 0.0213   Epoch: 10   Global Step: 179680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:55,714-Speed 9141.77 samples/sec   Loss 5.6594   LearningRate 0.0213   Epoch: 10   Global Step: 179690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:56,776-Speed 9651.64 samples/sec   Loss 5.6444   LearningRate 0.0213   Epoch: 10   Global Step: 179700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:57,867-Speed 9387.52 samples/sec   Loss 5.5692   LearningRate 0.0213   Epoch: 10   Global Step: 179710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:58,939-Speed 9561.99 samples/sec   Loss 5.7212   LearningRate 0.0213   Epoch: 10   Global Step: 179720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:46:59,983-Speed 9809.23 samples/sec   Loss 5.7158   LearningRate 0.0213   Epoch: 10   Global Step: 179730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:01,067-Speed 9459.52 samples/sec   Loss 5.6814   LearningRate 0.0213   Epoch: 10   Global Step: 179740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:02,130-Speed 9636.68 samples/sec   Loss 5.5769   LearningRate 0.0213   Epoch: 10   Global Step: 179750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:03,190-Speed 9663.88 samples/sec   Loss 5.5907   LearningRate 0.0213   Epoch: 10   Global Step: 179760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:04,248-Speed 9683.79 samples/sec   Loss 5.6624   LearningRate 0.0213   Epoch: 10   Global Step: 179770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:05,345-Speed 9343.88 samples/sec   Loss 5.7115   LearningRate 0.0213   Epoch: 10   Global Step: 179780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:06,425-Speed 9489.02 samples/sec   Loss 5.6487   LearningRate 0.0213   Epoch: 10   Global Step: 179790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:07,496-Speed 9565.57 samples/sec   Loss 5.6620   LearningRate 0.0213   Epoch: 10   Global Step: 179800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:08,601-Speed 9272.14 samples/sec   Loss 5.7764   LearningRate 0.0213   Epoch: 10   Global Step: 179810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:09,661-Speed 9669.61 samples/sec   Loss 5.6091   LearningRate 0.0213   Epoch: 10   Global Step: 179820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:10,740-Speed 9497.44 samples/sec   Loss 5.5935   LearningRate 0.0213   Epoch: 10   Global Step: 179830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:11,791-Speed 9746.51 samples/sec   Loss 5.6686   LearningRate 0.0213   Epoch: 10   Global Step: 179840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:12,927-Speed 9015.19 samples/sec   Loss 5.6782   LearningRate 0.0213   Epoch: 10   Global Step: 179850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:14,027-Speed 9311.94 samples/sec   Loss 5.5745   LearningRate 0.0213   Epoch: 10   Global Step: 179860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:15,104-Speed 9529.10 samples/sec   Loss 5.6497   LearningRate 0.0213   Epoch: 10   Global Step: 179870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:16,152-Speed 9782.02 samples/sec   Loss 5.7403   LearningRate 0.0213   Epoch: 10   Global Step: 179880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:17,234-Speed 9469.37 samples/sec   Loss 5.5906   LearningRate 0.0213   Epoch: 10   Global Step: 179890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:18,347-Speed 9204.84 samples/sec   Loss 5.4919   LearningRate 0.0213   Epoch: 10   Global Step: 179900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:19,493-Speed 8942.67 samples/sec   Loss 5.6631   LearningRate 0.0213   Epoch: 10   Global Step: 179910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:20,570-Speed 9514.11 samples/sec   Loss 5.6664   LearningRate 0.0213   Epoch: 10   Global Step: 179920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:21,649-Speed 9493.65 samples/sec   Loss 5.5232   LearningRate 0.0213   Epoch: 10   Global Step: 179930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:22,713-Speed 9632.40 samples/sec   Loss 5.6240   LearningRate 0.0212   Epoch: 10   Global Step: 179940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:23,790-Speed 9506.41 samples/sec   Loss 5.7183   LearningRate 0.0212   Epoch: 10   Global Step: 179950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:24,900-Speed 9233.45 samples/sec   Loss 5.7076   LearningRate 0.0212   Epoch: 10   Global Step: 179960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:26,000-Speed 9319.26 samples/sec   Loss 5.5832   LearningRate 0.0212   Epoch: 10   Global Step: 179970   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:47:27,074-Speed 9542.91 samples/sec   Loss 5.6386   LearningRate 0.0212   Epoch: 10   Global Step: 179980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:28,177-Speed 9293.62 samples/sec   Loss 5.6799   LearningRate 0.0212   Epoch: 10   Global Step: 179990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:29,264-Speed 9418.83 samples/sec   Loss 5.6542   LearningRate 0.0212   Epoch: 10   Global Step: 180000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:47:51,352-[lfw][180000]XNorm: 9.247617
Training: 2022-04-11 18:47:51,353-[lfw][180000]Accuracy-Flip: 0.99650+-0.00361
Training: 2022-04-11 18:47:51,354-[lfw][180000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:48:16,872-[cfp_fp][180000]XNorm: 7.956881
Training: 2022-04-11 18:48:16,873-[cfp_fp][180000]Accuracy-Flip: 0.96357+-0.00924
Training: 2022-04-11 18:48:16,873-[cfp_fp][180000]Accuracy-Highest: 0.96586
Training: 2022-04-11 18:48:38,932-[agedb_30][180000]XNorm: 8.940597
Training: 2022-04-11 18:48:38,932-[agedb_30][180000]Accuracy-Flip: 0.96733+-0.00937
Training: 2022-04-11 18:48:38,933-[agedb_30][180000]Accuracy-Highest: 0.96917
Training: 2022-04-11 18:48:40,014-Speed 144.74 samples/sec   Loss 5.6204   LearningRate 0.0212   Epoch: 10   Global Step: 180010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:41,104-Speed 9400.70 samples/sec   Loss 5.6844   LearningRate 0.0212   Epoch: 10   Global Step: 180020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:42,201-Speed 9338.76 samples/sec   Loss 5.5781   LearningRate 0.0212   Epoch: 10   Global Step: 180030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:43,305-Speed 9282.81 samples/sec   Loss 5.6363   LearningRate 0.0212   Epoch: 10   Global Step: 180040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:44,390-Speed 9437.60 samples/sec   Loss 5.7741   LearningRate 0.0212   Epoch: 10   Global Step: 180050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:45,445-Speed 9719.64 samples/sec   Loss 5.6623   LearningRate 0.0212   Epoch: 10   Global Step: 180060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:46,533-Speed 9412.10 samples/sec   Loss 5.6776   LearningRate 0.0212   Epoch: 10   Global Step: 180070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:47,603-Speed 9577.57 samples/sec   Loss 5.7317   LearningRate 0.0212   Epoch: 10   Global Step: 180080   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:48:48,677-Speed 9538.71 samples/sec   Loss 5.5854   LearningRate 0.0212   Epoch: 10   Global Step: 180090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:49,793-Speed 9180.56 samples/sec   Loss 5.6296   LearningRate 0.0212   Epoch: 10   Global Step: 180100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:50,890-Speed 9340.97 samples/sec   Loss 5.5866   LearningRate 0.0212   Epoch: 10   Global Step: 180110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:51,949-Speed 9680.03 samples/sec   Loss 5.6243   LearningRate 0.0212   Epoch: 10   Global Step: 180120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:53,025-Speed 9518.89 samples/sec   Loss 5.5315   LearningRate 0.0212   Epoch: 10   Global Step: 180130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:54,191-Speed 8788.27 samples/sec   Loss 5.6843   LearningRate 0.0212   Epoch: 10   Global Step: 180140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:55,310-Speed 9158.48 samples/sec   Loss 5.6038   LearningRate 0.0212   Epoch: 10   Global Step: 180150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:56,412-Speed 9299.31 samples/sec   Loss 5.5562   LearningRate 0.0212   Epoch: 10   Global Step: 180160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:57,508-Speed 9343.14 samples/sec   Loss 5.6679   LearningRate 0.0212   Epoch: 10   Global Step: 180170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:58,602-Speed 9369.35 samples/sec   Loss 5.6423   LearningRate 0.0212   Epoch: 10   Global Step: 180180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:48:59,680-Speed 9507.33 samples/sec   Loss 5.6314   LearningRate 0.0212   Epoch: 10   Global Step: 180190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:00,773-Speed 9372.51 samples/sec   Loss 5.6666   LearningRate 0.0212   Epoch: 10   Global Step: 180200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:01,845-Speed 9562.51 samples/sec   Loss 5.5473   LearningRate 0.0212   Epoch: 10   Global Step: 180210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:02,909-Speed 9624.16 samples/sec   Loss 5.5719   LearningRate 0.0212   Epoch: 10   Global Step: 180220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:04,004-Speed 9355.85 samples/sec   Loss 5.6256   LearningRate 0.0212   Epoch: 10   Global Step: 180230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:05,089-Speed 9440.20 samples/sec   Loss 5.7098   LearningRate 0.0212   Epoch: 10   Global Step: 180240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:06,150-Speed 9659.50 samples/sec   Loss 5.6585   LearningRate 0.0212   Epoch: 10   Global Step: 180250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:07,231-Speed 9477.58 samples/sec   Loss 5.5324   LearningRate 0.0212   Epoch: 10   Global Step: 180260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:08,309-Speed 9506.54 samples/sec   Loss 5.5343   LearningRate 0.0212   Epoch: 10   Global Step: 180270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:09,404-Speed 9353.30 samples/sec   Loss 5.7549   LearningRate 0.0212   Epoch: 10   Global Step: 180280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:10,453-Speed 9772.81 samples/sec   Loss 5.6131   LearningRate 0.0212   Epoch: 10   Global Step: 180290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:11,524-Speed 9573.24 samples/sec   Loss 5.5757   LearningRate 0.0211   Epoch: 10   Global Step: 180300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:12,638-Speed 9197.11 samples/sec   Loss 5.6421   LearningRate 0.0211   Epoch: 10   Global Step: 180310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:13,758-Speed 9143.12 samples/sec   Loss 5.6852   LearningRate 0.0211   Epoch: 10   Global Step: 180320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:14,835-Speed 9512.69 samples/sec   Loss 5.5999   LearningRate 0.0211   Epoch: 10   Global Step: 180330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:15,950-Speed 9192.03 samples/sec   Loss 5.5765   LearningRate 0.0211   Epoch: 10   Global Step: 180340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:17,069-Speed 9159.19 samples/sec   Loss 5.5357   LearningRate 0.0211   Epoch: 10   Global Step: 180350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:18,125-Speed 9701.78 samples/sec   Loss 5.6061   LearningRate 0.0211   Epoch: 10   Global Step: 180360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:19,205-Speed 9487.91 samples/sec   Loss 5.6122   LearningRate 0.0211   Epoch: 10   Global Step: 180370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:20,320-Speed 9190.33 samples/sec   Loss 5.6663   LearningRate 0.0211   Epoch: 10   Global Step: 180380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:21,382-Speed 9647.97 samples/sec   Loss 5.5569   LearningRate 0.0211   Epoch: 10   Global Step: 180390   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:49:22,481-Speed 9324.54 samples/sec   Loss 5.6064   LearningRate 0.0211   Epoch: 10   Global Step: 180400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:23,584-Speed 9285.89 samples/sec   Loss 5.6214   LearningRate 0.0211   Epoch: 10   Global Step: 180410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:24,682-Speed 9335.17 samples/sec   Loss 5.7100   LearningRate 0.0211   Epoch: 10   Global Step: 180420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:25,752-Speed 9571.34 samples/sec   Loss 5.5579   LearningRate 0.0211   Epoch: 10   Global Step: 180430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:26,852-Speed 9316.73 samples/sec   Loss 5.5754   LearningRate 0.0211   Epoch: 10   Global Step: 180440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:27,924-Speed 9565.43 samples/sec   Loss 5.5594   LearningRate 0.0211   Epoch: 10   Global Step: 180450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:29,007-Speed 9452.44 samples/sec   Loss 5.6418   LearningRate 0.0211   Epoch: 10   Global Step: 180460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:30,150-Speed 8970.46 samples/sec   Loss 5.6015   LearningRate 0.0211   Epoch: 10   Global Step: 180470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:31,232-Speed 9471.43 samples/sec   Loss 5.6993   LearningRate 0.0211   Epoch: 10   Global Step: 180480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:32,330-Speed 9327.59 samples/sec   Loss 5.6124   LearningRate 0.0211   Epoch: 10   Global Step: 180490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:33,430-Speed 9317.83 samples/sec   Loss 5.6498   LearningRate 0.0211   Epoch: 10   Global Step: 180500   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:49:34,543-Speed 9200.88 samples/sec   Loss 5.6828   LearningRate 0.0211   Epoch: 10   Global Step: 180510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:35,631-Speed 9421.94 samples/sec   Loss 5.7167   LearningRate 0.0211   Epoch: 10   Global Step: 180520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:36,750-Speed 9158.14 samples/sec   Loss 5.5211   LearningRate 0.0211   Epoch: 10   Global Step: 180530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:37,814-Speed 9626.28 samples/sec   Loss 5.6241   LearningRate 0.0211   Epoch: 10   Global Step: 180540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:38,940-Speed 9106.23 samples/sec   Loss 5.5788   LearningRate 0.0211   Epoch: 10   Global Step: 180550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:40,034-Speed 9361.12 samples/sec   Loss 5.6768   LearningRate 0.0211   Epoch: 10   Global Step: 180560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:41,131-Speed 9346.73 samples/sec   Loss 5.7130   LearningRate 0.0211   Epoch: 10   Global Step: 180570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:42,214-Speed 9459.87 samples/sec   Loss 5.7697   LearningRate 0.0211   Epoch: 10   Global Step: 180580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:43,320-Speed 9262.57 samples/sec   Loss 5.7095   LearningRate 0.0211   Epoch: 10   Global Step: 180590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:44,442-Speed 9129.84 samples/sec   Loss 5.5048   LearningRate 0.0211   Epoch: 10   Global Step: 180600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:45,530-Speed 9416.33 samples/sec   Loss 5.5987   LearningRate 0.0211   Epoch: 10   Global Step: 180610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:46,637-Speed 9260.12 samples/sec   Loss 5.6356   LearningRate 0.0211   Epoch: 10   Global Step: 180620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:47,715-Speed 9505.22 samples/sec   Loss 5.6238   LearningRate 0.0211   Epoch: 10   Global Step: 180630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:48,800-Speed 9441.52 samples/sec   Loss 5.6660   LearningRate 0.0211   Epoch: 10   Global Step: 180640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:49,929-Speed 9075.81 samples/sec   Loss 5.6314   LearningRate 0.0211   Epoch: 10   Global Step: 180650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:50,997-Speed 9589.63 samples/sec   Loss 5.6908   LearningRate 0.0211   Epoch: 10   Global Step: 180660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:52,059-Speed 9653.56 samples/sec   Loss 5.6982   LearningRate 0.0210   Epoch: 10   Global Step: 180670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:53,182-Speed 9122.84 samples/sec   Loss 5.5809   LearningRate 0.0210   Epoch: 10   Global Step: 180680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:54,317-Speed 9029.56 samples/sec   Loss 5.6048   LearningRate 0.0210   Epoch: 10   Global Step: 180690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:55,412-Speed 9352.71 samples/sec   Loss 5.5708   LearningRate 0.0210   Epoch: 10   Global Step: 180700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:56,495-Speed 9460.95 samples/sec   Loss 5.6906   LearningRate 0.0210   Epoch: 10   Global Step: 180710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:57,585-Speed 9407.69 samples/sec   Loss 5.6313   LearningRate 0.0210   Epoch: 10   Global Step: 180720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:58,795-Speed 8470.09 samples/sec   Loss 5.7131   LearningRate 0.0210   Epoch: 10   Global Step: 180730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:49:59,904-Speed 9230.48 samples/sec   Loss 5.6350   LearningRate 0.0210   Epoch: 10   Global Step: 180740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:00,990-Speed 9437.47 samples/sec   Loss 5.6016   LearningRate 0.0210   Epoch: 10   Global Step: 180750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:02,099-Speed 9240.06 samples/sec   Loss 5.5453   LearningRate 0.0210   Epoch: 10   Global Step: 180760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:03,208-Speed 9241.65 samples/sec   Loss 5.6001   LearningRate 0.0210   Epoch: 10   Global Step: 180770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:04,309-Speed 9305.23 samples/sec   Loss 5.6071   LearningRate 0.0210   Epoch: 10   Global Step: 180780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:05,397-Speed 9409.55 samples/sec   Loss 5.6707   LearningRate 0.0210   Epoch: 10   Global Step: 180790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:06,517-Speed 9152.30 samples/sec   Loss 5.5990   LearningRate 0.0210   Epoch: 10   Global Step: 180800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:07,609-Speed 9378.98 samples/sec   Loss 5.6595   LearningRate 0.0210   Epoch: 10   Global Step: 180810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:08,699-Speed 9405.36 samples/sec   Loss 5.5782   LearningRate 0.0210   Epoch: 10   Global Step: 180820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:09,771-Speed 9551.07 samples/sec   Loss 5.5398   LearningRate 0.0210   Epoch: 10   Global Step: 180830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:10,844-Speed 9556.42 samples/sec   Loss 5.6444   LearningRate 0.0210   Epoch: 10   Global Step: 180840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:11,914-Speed 9569.13 samples/sec   Loss 5.5036   LearningRate 0.0210   Epoch: 10   Global Step: 180850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:13,036-Speed 9135.51 samples/sec   Loss 5.5650   LearningRate 0.0210   Epoch: 10   Global Step: 180860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:14,128-Speed 9380.57 samples/sec   Loss 5.5918   LearningRate 0.0210   Epoch: 10   Global Step: 180870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:15,174-Speed 9803.85 samples/sec   Loss 5.6872   LearningRate 0.0210   Epoch: 10   Global Step: 180880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:16,220-Speed 9795.18 samples/sec   Loss 5.6435   LearningRate 0.0210   Epoch: 10   Global Step: 180890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:17,305-Speed 9445.55 samples/sec   Loss 5.5992   LearningRate 0.0210   Epoch: 10   Global Step: 180900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:18,350-Speed 9797.36 samples/sec   Loss 5.5788   LearningRate 0.0210   Epoch: 10   Global Step: 180910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:19,441-Speed 9394.12 samples/sec   Loss 5.7157   LearningRate 0.0210   Epoch: 10   Global Step: 180920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:20,558-Speed 9176.61 samples/sec   Loss 5.6408   LearningRate 0.0210   Epoch: 10   Global Step: 180930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:21,654-Speed 9349.34 samples/sec   Loss 5.6264   LearningRate 0.0210   Epoch: 10   Global Step: 180940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:22,704-Speed 9754.57 samples/sec   Loss 5.6743   LearningRate 0.0210   Epoch: 10   Global Step: 180950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:23,783-Speed 9494.23 samples/sec   Loss 5.7442   LearningRate 0.0210   Epoch: 10   Global Step: 180960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:24,838-Speed 9719.79 samples/sec   Loss 5.5779   LearningRate 0.0210   Epoch: 10   Global Step: 180970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:25,957-Speed 9150.21 samples/sec   Loss 5.7216   LearningRate 0.0210   Epoch: 10   Global Step: 180980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:27,043-Speed 9435.83 samples/sec   Loss 5.5420   LearningRate 0.0210   Epoch: 10   Global Step: 180990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:28,136-Speed 9381.07 samples/sec   Loss 5.6821   LearningRate 0.0210   Epoch: 10   Global Step: 181000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:29,222-Speed 9429.99 samples/sec   Loss 5.5576   LearningRate 0.0210   Epoch: 10   Global Step: 181010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:30,311-Speed 9413.28 samples/sec   Loss 5.6353   LearningRate 0.0210   Epoch: 10   Global Step: 181020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:31,370-Speed 9670.63 samples/sec   Loss 5.6050   LearningRate 0.0209   Epoch: 10   Global Step: 181030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:32,530-Speed 8833.41 samples/sec   Loss 5.7147   LearningRate 0.0209   Epoch: 10   Global Step: 181040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:33,606-Speed 9522.39 samples/sec   Loss 5.6316   LearningRate 0.0209   Epoch: 10   Global Step: 181050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:34,712-Speed 9266.43 samples/sec   Loss 5.5758   LearningRate 0.0209   Epoch: 10   Global Step: 181060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:35,829-Speed 9176.67 samples/sec   Loss 5.7086   LearningRate 0.0209   Epoch: 10   Global Step: 181070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:36,908-Speed 9495.01 samples/sec   Loss 5.6009   LearningRate 0.0209   Epoch: 10   Global Step: 181080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:37,991-Speed 9461.30 samples/sec   Loss 5.5266   LearningRate 0.0209   Epoch: 10   Global Step: 181090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:39,080-Speed 9409.05 samples/sec   Loss 5.6452   LearningRate 0.0209   Epoch: 10   Global Step: 181100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:40,145-Speed 9621.72 samples/sec   Loss 5.6058   LearningRate 0.0209   Epoch: 10   Global Step: 181110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:41,217-Speed 9552.57 samples/sec   Loss 5.6062   LearningRate 0.0209   Epoch: 10   Global Step: 181120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:42,266-Speed 9768.86 samples/sec   Loss 5.5085   LearningRate 0.0209   Epoch: 10   Global Step: 181130   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:50:43,395-Speed 9078.60 samples/sec   Loss 5.5726   LearningRate 0.0209   Epoch: 10   Global Step: 181140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:44,455-Speed 9661.08 samples/sec   Loss 5.6247   LearningRate 0.0209   Epoch: 10   Global Step: 181150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:45,537-Speed 9473.99 samples/sec   Loss 5.5418   LearningRate 0.0209   Epoch: 10   Global Step: 181160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:46,606-Speed 9579.48 samples/sec   Loss 5.6141   LearningRate 0.0209   Epoch: 10   Global Step: 181170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:47,705-Speed 9319.19 samples/sec   Loss 5.4677   LearningRate 0.0209   Epoch: 10   Global Step: 181180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:48,826-Speed 9140.51 samples/sec   Loss 5.5491   LearningRate 0.0209   Epoch: 10   Global Step: 181190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:49,933-Speed 9258.73 samples/sec   Loss 5.5575   LearningRate 0.0209   Epoch: 10   Global Step: 181200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:51,030-Speed 9344.15 samples/sec   Loss 5.6622   LearningRate 0.0209   Epoch: 10   Global Step: 181210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:52,091-Speed 9661.01 samples/sec   Loss 5.6933   LearningRate 0.0209   Epoch: 10   Global Step: 181220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:53,171-Speed 9490.10 samples/sec   Loss 5.6974   LearningRate 0.0209   Epoch: 10   Global Step: 181230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:50:54,270-Speed 9323.54 samples/sec   Loss 5.5561   LearningRate 0.0209   Epoch: 10   Global Step: 181240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:55,380-Speed 9226.09 samples/sec   Loss 5.7878   LearningRate 0.0209   Epoch: 10   Global Step: 181250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:56,502-Speed 9134.04 samples/sec   Loss 5.5603   LearningRate 0.0209   Epoch: 10   Global Step: 181260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:57,620-Speed 9165.98 samples/sec   Loss 5.6754   LearningRate 0.0209   Epoch: 10   Global Step: 181270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:58,717-Speed 9356.03 samples/sec   Loss 5.6441   LearningRate 0.0209   Epoch: 10   Global Step: 181280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:50:59,812-Speed 9359.42 samples/sec   Loss 5.5476   LearningRate 0.0209   Epoch: 10   Global Step: 181290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:51:00,914-Speed 9291.27 samples/sec   Loss 5.6210   LearningRate 0.0209   Epoch: 10   Global Step: 181300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:51:01,997-Speed 9467.08 samples/sec   Loss 5.6936   LearningRate 0.0209   Epoch: 10   Global Step: 181310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:51:03,136-Speed 8994.49 samples/sec   Loss 5.6400   LearningRate 0.0209   Epoch: 10   Global Step: 181320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:51:04,250-Speed 9199.25 samples/sec   Loss 5.5381   LearningRate 0.0209   Epoch: 10   Global Step: 181330   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:51:05,345-Speed 9361.92 samples/sec   Loss 5.6691   LearningRate 0.0209   Epoch: 10   Global Step: 181340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:06,439-Speed 9359.34 samples/sec   Loss 5.5773   LearningRate 0.0209   Epoch: 10   Global Step: 181350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:07,537-Speed 9333.68 samples/sec   Loss 5.7516   LearningRate 0.0209   Epoch: 10   Global Step: 181360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:08,657-Speed 9148.35 samples/sec   Loss 5.6163   LearningRate 0.0209   Epoch: 10   Global Step: 181370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:09,765-Speed 9243.85 samples/sec   Loss 5.6372   LearningRate 0.0209   Epoch: 10   Global Step: 181380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:10,862-Speed 9347.72 samples/sec   Loss 5.5625   LearningRate 0.0209   Epoch: 10   Global Step: 181390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:11,968-Speed 9260.14 samples/sec   Loss 5.5311   LearningRate 0.0208   Epoch: 10   Global Step: 181400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:13,026-Speed 9681.01 samples/sec   Loss 5.6505   LearningRate 0.0208   Epoch: 10   Global Step: 181410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:14,149-Speed 9130.88 samples/sec   Loss 5.5357   LearningRate 0.0208   Epoch: 10   Global Step: 181420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:15,231-Speed 9468.18 samples/sec   Loss 5.5377   LearningRate 0.0208   Epoch: 10   Global Step: 181430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:16,345-Speed 9196.96 samples/sec   Loss 5.6003   LearningRate 0.0208   Epoch: 10   Global Step: 181440   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:51:17,474-Speed 9069.89 samples/sec   Loss 5.5989   LearningRate 0.0208   Epoch: 10   Global Step: 181450   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:51:18,572-Speed 9336.65 samples/sec   Loss 5.6531   LearningRate 0.0208   Epoch: 10   Global Step: 181460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:19,667-Speed 9353.52 samples/sec   Loss 5.5019   LearningRate 0.0208   Epoch: 10   Global Step: 181470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:20,749-Speed 9471.89 samples/sec   Loss 5.6164   LearningRate 0.0208   Epoch: 10   Global Step: 181480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:21,870-Speed 9144.14 samples/sec   Loss 5.5038   LearningRate 0.0208   Epoch: 10   Global Step: 181490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:22,955-Speed 9446.39 samples/sec   Loss 5.7318   LearningRate 0.0208   Epoch: 10   Global Step: 181500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:24,141-Speed 8636.44 samples/sec   Loss 5.5975   LearningRate 0.0208   Epoch: 10   Global Step: 181510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:25,211-Speed 9576.14 samples/sec   Loss 5.4638   LearningRate 0.0208   Epoch: 10   Global Step: 181520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:26,342-Speed 9060.35 samples/sec   Loss 5.6532   LearningRate 0.0208   Epoch: 10   Global Step: 181530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:27,464-Speed 9125.77 samples/sec   Loss 5.6067   LearningRate 0.0208   Epoch: 10   Global Step: 181540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:28,521-Speed 9704.16 samples/sec   Loss 5.5547   LearningRate 0.0208   Epoch: 10   Global Step: 181550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:29,602-Speed 9477.21 samples/sec   Loss 5.5008   LearningRate 0.0208   Epoch: 10   Global Step: 181560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:30,683-Speed 9470.15 samples/sec   Loss 5.5585   LearningRate 0.0208   Epoch: 10   Global Step: 181570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:31,774-Speed 9395.95 samples/sec   Loss 5.5616   LearningRate 0.0208   Epoch: 10   Global Step: 181580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:32,899-Speed 9110.49 samples/sec   Loss 5.6655   LearningRate 0.0208   Epoch: 10   Global Step: 181590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:34,036-Speed 9005.40 samples/sec   Loss 5.7240   LearningRate 0.0208   Epoch: 10   Global Step: 181600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:35,098-Speed 9647.51 samples/sec   Loss 5.5580   LearningRate 0.0208   Epoch: 10   Global Step: 181610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:36,192-Speed 9372.37 samples/sec   Loss 5.5240   LearningRate 0.0208   Epoch: 10   Global Step: 181620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:37,279-Speed 9421.66 samples/sec   Loss 5.5062   LearningRate 0.0208   Epoch: 10   Global Step: 181630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:38,361-Speed 9469.47 samples/sec   Loss 5.5274   LearningRate 0.0208   Epoch: 10   Global Step: 181640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:39,471-Speed 9235.13 samples/sec   Loss 5.5004   LearningRate 0.0208   Epoch: 10   Global Step: 181650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:40,587-Speed 9183.89 samples/sec   Loss 5.5862   LearningRate 0.0208   Epoch: 10   Global Step: 181660   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:51:41,647-Speed 9664.17 samples/sec   Loss 5.6505   LearningRate 0.0208   Epoch: 10   Global Step: 181670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:42,762-Speed 9190.53 samples/sec   Loss 5.6566   LearningRate 0.0208   Epoch: 10   Global Step: 181680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:43,898-Speed 9016.39 samples/sec   Loss 5.5602   LearningRate 0.0208   Epoch: 10   Global Step: 181690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:44,982-Speed 9454.06 samples/sec   Loss 5.6773   LearningRate 0.0208   Epoch: 10   Global Step: 181700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:46,067-Speed 9437.92 samples/sec   Loss 5.5398   LearningRate 0.0208   Epoch: 10   Global Step: 181710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:47,106-Speed 9868.97 samples/sec   Loss 5.4874   LearningRate 0.0208   Epoch: 10   Global Step: 181720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:48,186-Speed 9485.68 samples/sec   Loss 5.6235   LearningRate 0.0208   Epoch: 10   Global Step: 181730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:49,235-Speed 9763.71 samples/sec   Loss 5.6116   LearningRate 0.0208   Epoch: 10   Global Step: 181740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:50,326-Speed 9392.31 samples/sec   Loss 5.6387   LearningRate 0.0208   Epoch: 10   Global Step: 181750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:51,414-Speed 9418.15 samples/sec   Loss 5.6454   LearningRate 0.0207   Epoch: 10   Global Step: 181760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:52,518-Speed 9282.51 samples/sec   Loss 5.7010   LearningRate 0.0207   Epoch: 10   Global Step: 181770   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:51:53,637-Speed 9155.42 samples/sec   Loss 5.6186   LearningRate 0.0207   Epoch: 10   Global Step: 181780   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:51:54,713-Speed 9522.46 samples/sec   Loss 5.5325   LearningRate 0.0207   Epoch: 10   Global Step: 181790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:55,781-Speed 9596.16 samples/sec   Loss 5.5884   LearningRate 0.0207   Epoch: 10   Global Step: 181800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:56,892-Speed 9220.32 samples/sec   Loss 5.6106   LearningRate 0.0207   Epoch: 10   Global Step: 181810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:58,007-Speed 9199.06 samples/sec   Loss 5.5213   LearningRate 0.0207   Epoch: 10   Global Step: 181820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:51:59,158-Speed 8901.22 samples/sec   Loss 5.5667   LearningRate 0.0207   Epoch: 10   Global Step: 181830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:00,212-Speed 9723.24 samples/sec   Loss 5.6580   LearningRate 0.0207   Epoch: 10   Global Step: 181840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:01,296-Speed 9450.79 samples/sec   Loss 5.5230   LearningRate 0.0207   Epoch: 10   Global Step: 181850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:02,396-Speed 9316.12 samples/sec   Loss 5.5810   LearningRate 0.0207   Epoch: 10   Global Step: 181860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:03,479-Speed 9461.34 samples/sec   Loss 5.6915   LearningRate 0.0207   Epoch: 10   Global Step: 181870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:04,582-Speed 9282.88 samples/sec   Loss 5.6160   LearningRate 0.0207   Epoch: 10   Global Step: 181880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:05,649-Speed 9609.94 samples/sec   Loss 5.5538   LearningRate 0.0207   Epoch: 10   Global Step: 181890   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:52:06,692-Speed 9819.89 samples/sec   Loss 5.5279   LearningRate 0.0207   Epoch: 10   Global Step: 181900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:07,785-Speed 9374.92 samples/sec   Loss 5.5133   LearningRate 0.0207   Epoch: 10   Global Step: 181910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:08,891-Speed 9257.69 samples/sec   Loss 5.5091   LearningRate 0.0207   Epoch: 10   Global Step: 181920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:10,066-Speed 8727.68 samples/sec   Loss 5.5121   LearningRate 0.0207   Epoch: 10   Global Step: 181930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:11,169-Speed 9282.96 samples/sec   Loss 5.5437   LearningRate 0.0207   Epoch: 10   Global Step: 181940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:12,216-Speed 9788.59 samples/sec   Loss 5.6042   LearningRate 0.0207   Epoch: 10   Global Step: 181950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:13,318-Speed 9301.67 samples/sec   Loss 5.6618   LearningRate 0.0207   Epoch: 10   Global Step: 181960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:14,451-Speed 9043.06 samples/sec   Loss 5.6189   LearningRate 0.0207   Epoch: 10   Global Step: 181970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:15,499-Speed 9775.16 samples/sec   Loss 5.5977   LearningRate 0.0207   Epoch: 10   Global Step: 181980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:16,582-Speed 9457.61 samples/sec   Loss 5.5196   LearningRate 0.0207   Epoch: 10   Global Step: 181990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:17,634-Speed 9742.04 samples/sec   Loss 5.5282   LearningRate 0.0207   Epoch: 10   Global Step: 182000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:52:39,709-[lfw][182000]XNorm: 9.092343
Training: 2022-04-11 18:52:39,710-[lfw][182000]Accuracy-Flip: 0.99667+-0.00269
Training: 2022-04-11 18:52:39,710-[lfw][182000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:53:05,080-[cfp_fp][182000]XNorm: 7.787156
Training: 2022-04-11 18:53:05,080-[cfp_fp][182000]Accuracy-Flip: 0.96643+-0.00806
Training: 2022-04-11 18:53:05,081-[cfp_fp][182000]Accuracy-Highest: 0.96643
Training: 2022-04-11 18:53:26,990-[agedb_30][182000]XNorm: 8.836864
Training: 2022-04-11 18:53:26,991-[agedb_30][182000]Accuracy-Flip: 0.96800+-0.00994
Training: 2022-04-11 18:53:26,992-[agedb_30][182000]Accuracy-Highest: 0.96917
Training: 2022-04-11 18:53:28,065-Speed 145.39 samples/sec   Loss 5.6238   LearningRate 0.0207   Epoch: 10   Global Step: 182010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:29,127-Speed 9654.11 samples/sec   Loss 5.6082   LearningRate 0.0207   Epoch: 10   Global Step: 182020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:30,184-Speed 9687.22 samples/sec   Loss 5.5817   LearningRate 0.0207   Epoch: 10   Global Step: 182030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:31,305-Speed 9145.97 samples/sec   Loss 5.5862   LearningRate 0.0207   Epoch: 10   Global Step: 182040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:53:32,380-Speed 9528.06 samples/sec   Loss 5.6473   LearningRate 0.0207   Epoch: 10   Global Step: 182050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:53:33,458-Speed 9500.03 samples/sec   Loss 5.6451   LearningRate 0.0207   Epoch: 10   Global Step: 182060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:53:34,533-Speed 9541.66 samples/sec   Loss 5.5579   LearningRate 0.0207   Epoch: 10   Global Step: 182070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:53:35,607-Speed 9531.32 samples/sec   Loss 5.6759   LearningRate 0.0207   Epoch: 10   Global Step: 182080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:53:36,687-Speed 9492.06 samples/sec   Loss 5.6180   LearningRate 0.0207   Epoch: 10   Global Step: 182090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:53:37,782-Speed 9357.20 samples/sec   Loss 5.6159   LearningRate 0.0207   Epoch: 10   Global Step: 182100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:53:38,880-Speed 9331.92 samples/sec   Loss 5.6732   LearningRate 0.0207   Epoch: 10   Global Step: 182110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:53:40,042-Speed 8812.22 samples/sec   Loss 5.7330   LearningRate 0.0207   Epoch: 10   Global Step: 182120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:53:41,110-Speed 9601.20 samples/sec   Loss 5.5622   LearningRate 0.0206   Epoch: 10   Global Step: 182130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:53:42,203-Speed 9373.15 samples/sec   Loss 5.6258   LearningRate 0.0206   Epoch: 10   Global Step: 182140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:43,298-Speed 9355.38 samples/sec   Loss 5.6507   LearningRate 0.0206   Epoch: 10   Global Step: 182150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:44,372-Speed 9538.11 samples/sec   Loss 5.6195   LearningRate 0.0206   Epoch: 10   Global Step: 182160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:45,444-Speed 9555.46 samples/sec   Loss 5.5303   LearningRate 0.0206   Epoch: 10   Global Step: 182170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:46,567-Speed 9129.51 samples/sec   Loss 5.5612   LearningRate 0.0206   Epoch: 10   Global Step: 182180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:47,657-Speed 9400.32 samples/sec   Loss 5.6525   LearningRate 0.0206   Epoch: 10   Global Step: 182190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:48,732-Speed 9527.91 samples/sec   Loss 5.6620   LearningRate 0.0206   Epoch: 10   Global Step: 182200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:49,852-Speed 9151.55 samples/sec   Loss 5.5979   LearningRate 0.0206   Epoch: 10   Global Step: 182210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:50,937-Speed 9443.12 samples/sec   Loss 5.5870   LearningRate 0.0206   Epoch: 10   Global Step: 182220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:52,016-Speed 9497.78 samples/sec   Loss 5.6357   LearningRate 0.0206   Epoch: 10   Global Step: 182230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:53,056-Speed 9852.54 samples/sec   Loss 5.5885   LearningRate 0.0206   Epoch: 10   Global Step: 182240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:54,157-Speed 9309.39 samples/sec   Loss 5.6244   LearningRate 0.0206   Epoch: 10   Global Step: 182250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:55,252-Speed 9358.31 samples/sec   Loss 5.5810   LearningRate 0.0206   Epoch: 10   Global Step: 182260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:56,333-Speed 9475.40 samples/sec   Loss 5.7566   LearningRate 0.0206   Epoch: 10   Global Step: 182270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:57,396-Speed 9644.03 samples/sec   Loss 5.6259   LearningRate 0.0206   Epoch: 10   Global Step: 182280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:58,450-Speed 9721.33 samples/sec   Loss 5.5914   LearningRate 0.0206   Epoch: 10   Global Step: 182290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:53:59,537-Speed 9425.81 samples/sec   Loss 5.6213   LearningRate 0.0206   Epoch: 10   Global Step: 182300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:00,674-Speed 9014.11 samples/sec   Loss 5.5508   LearningRate 0.0206   Epoch: 10   Global Step: 182310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:01,754-Speed 9489.83 samples/sec   Loss 5.6738   LearningRate 0.0206   Epoch: 10   Global Step: 182320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:02,832-Speed 9495.69 samples/sec   Loss 5.6995   LearningRate 0.0206   Epoch: 10   Global Step: 182330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:03,891-Speed 9681.78 samples/sec   Loss 5.6355   LearningRate 0.0206   Epoch: 10   Global Step: 182340   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:54:04,972-Speed 9479.68 samples/sec   Loss 5.6295   LearningRate 0.0206   Epoch: 10   Global Step: 182350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:06,059-Speed 9428.04 samples/sec   Loss 5.6184   LearningRate 0.0206   Epoch: 10   Global Step: 182360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:07,171-Speed 9214.44 samples/sec   Loss 5.6524   LearningRate 0.0206   Epoch: 10   Global Step: 182370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:08,254-Speed 9460.59 samples/sec   Loss 5.6582   LearningRate 0.0206   Epoch: 10   Global Step: 182380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:09,342-Speed 9410.68 samples/sec   Loss 5.5448   LearningRate 0.0206   Epoch: 10   Global Step: 182390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:10,428-Speed 9437.89 samples/sec   Loss 5.5687   LearningRate 0.0206   Epoch: 10   Global Step: 182400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:11,474-Speed 9793.42 samples/sec   Loss 5.6035   LearningRate 0.0206   Epoch: 10   Global Step: 182410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:12,559-Speed 9445.18 samples/sec   Loss 5.5611   LearningRate 0.0206   Epoch: 10   Global Step: 182420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:13,638-Speed 9501.08 samples/sec   Loss 5.5608   LearningRate 0.0206   Epoch: 10   Global Step: 182430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:14,691-Speed 9729.09 samples/sec   Loss 5.6674   LearningRate 0.0206   Epoch: 10   Global Step: 182440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:54:15,762-Speed 9564.08 samples/sec   Loss 5.6098   LearningRate 0.0206   Epoch: 10   Global Step: 182450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:54:16,849-Speed 9425.79 samples/sec   Loss 5.6631   LearningRate 0.0206   Epoch: 10   Global Step: 182460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:54:17,931-Speed 9476.75 samples/sec   Loss 5.7545   LearningRate 0.0206   Epoch: 10   Global Step: 182470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:54:19,027-Speed 9344.83 samples/sec   Loss 5.4929   LearningRate 0.0206   Epoch: 10   Global Step: 182480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:54:20,102-Speed 9532.86 samples/sec   Loss 5.5826   LearningRate 0.0206   Epoch: 10   Global Step: 182490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:54:21,194-Speed 9382.36 samples/sec   Loss 5.5249   LearningRate 0.0205   Epoch: 10   Global Step: 182500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:54:22,295-Speed 9302.69 samples/sec   Loss 5.4344   LearningRate 0.0205   Epoch: 10   Global Step: 182510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:54:23,406-Speed 9226.23 samples/sec   Loss 5.5435   LearningRate 0.0205   Epoch: 10   Global Step: 182520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:54:24,487-Speed 9477.90 samples/sec   Loss 5.5617   LearningRate 0.0205   Epoch: 10   Global Step: 182530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:54:25,565-Speed 9503.94 samples/sec   Loss 5.5567   LearningRate 0.0205   Epoch: 10   Global Step: 182540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:26,661-Speed 9351.39 samples/sec   Loss 5.5380   LearningRate 0.0205   Epoch: 10   Global Step: 182550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:27,750-Speed 9404.57 samples/sec   Loss 5.5377   LearningRate 0.0205   Epoch: 10   Global Step: 182560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:28,813-Speed 9644.50 samples/sec   Loss 5.6343   LearningRate 0.0205   Epoch: 10   Global Step: 182570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:29,918-Speed 9272.82 samples/sec   Loss 5.5837   LearningRate 0.0205   Epoch: 10   Global Step: 182580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:30,982-Speed 9631.97 samples/sec   Loss 5.6204   LearningRate 0.0205   Epoch: 10   Global Step: 182590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:32,041-Speed 9676.32 samples/sec   Loss 5.5737   LearningRate 0.0205   Epoch: 10   Global Step: 182600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:33,158-Speed 9166.57 samples/sec   Loss 5.5436   LearningRate 0.0205   Epoch: 10   Global Step: 182610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:34,243-Speed 9447.67 samples/sec   Loss 5.6154   LearningRate 0.0205   Epoch: 10   Global Step: 182620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:35,369-Speed 9097.99 samples/sec   Loss 5.7129   LearningRate 0.0205   Epoch: 10   Global Step: 182630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:36,476-Speed 9252.50 samples/sec   Loss 5.5720   LearningRate 0.0205   Epoch: 10   Global Step: 182640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:37,550-Speed 9545.06 samples/sec   Loss 5.6051   LearningRate 0.0205   Epoch: 10   Global Step: 182650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:38,682-Speed 9044.39 samples/sec   Loss 5.5697   LearningRate 0.0205   Epoch: 10   Global Step: 182660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:39,766-Speed 9458.13 samples/sec   Loss 5.5202   LearningRate 0.0205   Epoch: 10   Global Step: 182670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:40,811-Speed 9800.11 samples/sec   Loss 5.5546   LearningRate 0.0205   Epoch: 10   Global Step: 182680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:41,847-Speed 9895.70 samples/sec   Loss 5.5447   LearningRate 0.0205   Epoch: 10   Global Step: 182690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:42,948-Speed 9309.04 samples/sec   Loss 5.5890   LearningRate 0.0205   Epoch: 10   Global Step: 182700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:44,012-Speed 9630.42 samples/sec   Loss 5.6779   LearningRate 0.0205   Epoch: 10   Global Step: 182710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:45,085-Speed 9549.86 samples/sec   Loss 5.5508   LearningRate 0.0205   Epoch: 10   Global Step: 182720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:46,188-Speed 9287.69 samples/sec   Loss 5.6389   LearningRate 0.0205   Epoch: 10   Global Step: 182730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:47,302-Speed 9190.58 samples/sec   Loss 5.6313   LearningRate 0.0205   Epoch: 10   Global Step: 182740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:48,388-Speed 9436.24 samples/sec   Loss 5.5885   LearningRate 0.0205   Epoch: 10   Global Step: 182750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:49,442-Speed 9724.90 samples/sec   Loss 5.5795   LearningRate 0.0205   Epoch: 10   Global Step: 182760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:50,530-Speed 9414.00 samples/sec   Loss 5.5564   LearningRate 0.0205   Epoch: 10   Global Step: 182770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:51,590-Speed 9675.26 samples/sec   Loss 5.6013   LearningRate 0.0205   Epoch: 10   Global Step: 182780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:52,629-Speed 9855.74 samples/sec   Loss 5.6042   LearningRate 0.0205   Epoch: 10   Global Step: 182790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:53,786-Speed 8856.23 samples/sec   Loss 5.5566   LearningRate 0.0205   Epoch: 10   Global Step: 182800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:54,867-Speed 9482.50 samples/sec   Loss 5.5655   LearningRate 0.0205   Epoch: 10   Global Step: 182810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:55,936-Speed 9578.95 samples/sec   Loss 5.5553   LearningRate 0.0205   Epoch: 10   Global Step: 182820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:57,003-Speed 9602.68 samples/sec   Loss 5.6045   LearningRate 0.0205   Epoch: 10   Global Step: 182830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:54:58,041-Speed 9880.28 samples/sec   Loss 5.6326   LearningRate 0.0205   Epoch: 10   Global Step: 182840   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:54:59,095-Speed 9716.16 samples/sec   Loss 5.5484   LearningRate 0.0205   Epoch: 10   Global Step: 182850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:00,159-Speed 9627.15 samples/sec   Loss 5.5424   LearningRate 0.0205   Epoch: 10   Global Step: 182860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:01,243-Speed 9453.14 samples/sec   Loss 5.5995   LearningRate 0.0204   Epoch: 10   Global Step: 182870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:02,303-Speed 9683.56 samples/sec   Loss 5.5921   LearningRate 0.0204   Epoch: 10   Global Step: 182880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:03,347-Speed 9815.19 samples/sec   Loss 5.6609   LearningRate 0.0204   Epoch: 10   Global Step: 182890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:04,436-Speed 9403.43 samples/sec   Loss 5.6274   LearningRate 0.0204   Epoch: 10   Global Step: 182900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:05,526-Speed 9405.60 samples/sec   Loss 5.6448   LearningRate 0.0204   Epoch: 10   Global Step: 182910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:06,639-Speed 9199.12 samples/sec   Loss 5.5837   LearningRate 0.0204   Epoch: 10   Global Step: 182920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:07,724-Speed 9445.35 samples/sec   Loss 5.5744   LearningRate 0.0204   Epoch: 10   Global Step: 182930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:08,820-Speed 9351.82 samples/sec   Loss 5.6055   LearningRate 0.0204   Epoch: 10   Global Step: 182940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:09,899-Speed 9492.34 samples/sec   Loss 5.5055   LearningRate 0.0204   Epoch: 10   Global Step: 182950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:10,953-Speed 9729.43 samples/sec   Loss 5.6040   LearningRate 0.0204   Epoch: 10   Global Step: 182960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:12,047-Speed 9365.02 samples/sec   Loss 5.4835   LearningRate 0.0204   Epoch: 10   Global Step: 182970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:13,079-Speed 9926.40 samples/sec   Loss 5.6108   LearningRate 0.0204   Epoch: 10   Global Step: 182980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:14,157-Speed 9504.17 samples/sec   Loss 5.5905   LearningRate 0.0204   Epoch: 10   Global Step: 182990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:15,222-Speed 9622.28 samples/sec   Loss 5.4704   LearningRate 0.0204   Epoch: 10   Global Step: 183000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:16,332-Speed 9226.09 samples/sec   Loss 5.5883   LearningRate 0.0204   Epoch: 10   Global Step: 183010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:17,407-Speed 9526.15 samples/sec   Loss 5.5622   LearningRate 0.0204   Epoch: 10   Global Step: 183020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:18,497-Speed 9407.50 samples/sec   Loss 5.5588   LearningRate 0.0204   Epoch: 10   Global Step: 183030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:19,609-Speed 9219.15 samples/sec   Loss 5.6118   LearningRate 0.0204   Epoch: 10   Global Step: 183040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:20,681-Speed 9555.69 samples/sec   Loss 5.5545   LearningRate 0.0204   Epoch: 10   Global Step: 183050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:21,735-Speed 9723.28 samples/sec   Loss 5.5672   LearningRate 0.0204   Epoch: 10   Global Step: 183060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:22,860-Speed 9107.23 samples/sec   Loss 5.5944   LearningRate 0.0204   Epoch: 10   Global Step: 183070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:24,012-Speed 8899.89 samples/sec   Loss 5.6068   LearningRate 0.0204   Epoch: 10   Global Step: 183080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:25,104-Speed 9377.92 samples/sec   Loss 5.5958   LearningRate 0.0204   Epoch: 10   Global Step: 183090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:26,165-Speed 9654.45 samples/sec   Loss 5.5829   LearningRate 0.0204   Epoch: 10   Global Step: 183100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:27,295-Speed 9067.54 samples/sec   Loss 5.5000   LearningRate 0.0204   Epoch: 10   Global Step: 183110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:28,393-Speed 9338.17 samples/sec   Loss 5.5956   LearningRate 0.0204   Epoch: 10   Global Step: 183120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:29,477-Speed 9452.25 samples/sec   Loss 5.6265   LearningRate 0.0204   Epoch: 10   Global Step: 183130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:30,547-Speed 9572.98 samples/sec   Loss 5.5586   LearningRate 0.0204   Epoch: 10   Global Step: 183140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:31,651-Speed 9283.36 samples/sec   Loss 5.6304   LearningRate 0.0204   Epoch: 10   Global Step: 183150   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:55:32,745-Speed 9364.32 samples/sec   Loss 5.5727   LearningRate 0.0204   Epoch: 10   Global Step: 183160   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:55:33,865-Speed 9149.84 samples/sec   Loss 5.5119   LearningRate 0.0204   Epoch: 10   Global Step: 183170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:35,038-Speed 8735.18 samples/sec   Loss 5.6349   LearningRate 0.0204   Epoch: 10   Global Step: 183180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:36,096-Speed 9685.14 samples/sec   Loss 5.7069   LearningRate 0.0204   Epoch: 10   Global Step: 183190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:37,180-Speed 9448.62 samples/sec   Loss 5.5557   LearningRate 0.0204   Epoch: 10   Global Step: 183200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:38,254-Speed 9538.62 samples/sec   Loss 5.5420   LearningRate 0.0204   Epoch: 10   Global Step: 183210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:39,357-Speed 9295.05 samples/sec   Loss 5.5582   LearningRate 0.0204   Epoch: 10   Global Step: 183220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:40,409-Speed 9743.24 samples/sec   Loss 5.5730   LearningRate 0.0204   Epoch: 10   Global Step: 183230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:41,479-Speed 9573.96 samples/sec   Loss 5.7083   LearningRate 0.0203   Epoch: 10   Global Step: 183240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:42,592-Speed 9207.03 samples/sec   Loss 5.6759   LearningRate 0.0203   Epoch: 10   Global Step: 183250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:43,648-Speed 9694.78 samples/sec   Loss 5.6249   LearningRate 0.0203   Epoch: 10   Global Step: 183260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:44,743-Speed 9360.67 samples/sec   Loss 5.5807   LearningRate 0.0203   Epoch: 10   Global Step: 183270   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:55:45,836-Speed 9378.71 samples/sec   Loss 5.6503   LearningRate 0.0203   Epoch: 10   Global Step: 183280   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:55:46,939-Speed 9286.58 samples/sec   Loss 5.6135   LearningRate 0.0203   Epoch: 10   Global Step: 183290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:48,053-Speed 9193.50 samples/sec   Loss 5.5318   LearningRate 0.0203   Epoch: 10   Global Step: 183300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:49,137-Speed 9451.64 samples/sec   Loss 5.5127   LearningRate 0.0203   Epoch: 10   Global Step: 183310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:50,240-Speed 9291.35 samples/sec   Loss 5.6676   LearningRate 0.0203   Epoch: 10   Global Step: 183320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:51,314-Speed 9537.95 samples/sec   Loss 5.6666   LearningRate 0.0203   Epoch: 10   Global Step: 183330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:52,392-Speed 9508.32 samples/sec   Loss 5.5407   LearningRate 0.0203   Epoch: 10   Global Step: 183340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:53,518-Speed 9100.11 samples/sec   Loss 5.5676   LearningRate 0.0203   Epoch: 10   Global Step: 183350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:54,642-Speed 9114.97 samples/sec   Loss 5.5671   LearningRate 0.0203   Epoch: 10   Global Step: 183360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:55:55,704-Speed 9652.39 samples/sec   Loss 5.5550   LearningRate 0.0203   Epoch: 10   Global Step: 183370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:55:56,783-Speed 9493.17 samples/sec   Loss 5.6711   LearningRate 0.0203   Epoch: 10   Global Step: 183380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:55:57,881-Speed 9335.78 samples/sec   Loss 5.4884   LearningRate 0.0203   Epoch: 10   Global Step: 183390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:55:58,930-Speed 9771.17 samples/sec   Loss 5.6652   LearningRate 0.0203   Epoch: 10   Global Step: 183400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:55:59,990-Speed 9673.11 samples/sec   Loss 5.5816   LearningRate 0.0203   Epoch: 10   Global Step: 183410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:56:01,084-Speed 9360.50 samples/sec   Loss 5.5389   LearningRate 0.0203   Epoch: 10   Global Step: 183420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:56:02,134-Speed 9760.64 samples/sec   Loss 5.5983   LearningRate 0.0203   Epoch: 10   Global Step: 183430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:56:03,187-Speed 9728.76 samples/sec   Loss 5.6523   LearningRate 0.0203   Epoch: 10   Global Step: 183440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:56:04,280-Speed 9370.67 samples/sec   Loss 5.5508   LearningRate 0.0203   Epoch: 10   Global Step: 183450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:56:05,348-Speed 9596.50 samples/sec   Loss 5.6522   LearningRate 0.0203   Epoch: 10   Global Step: 183460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:56:06,395-Speed 9787.31 samples/sec   Loss 5.5624   LearningRate 0.0203   Epoch: 10   Global Step: 183470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:07,507-Speed 9213.67 samples/sec   Loss 5.5962   LearningRate 0.0203   Epoch: 10   Global Step: 183480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:08,621-Speed 9199.14 samples/sec   Loss 5.6105   LearningRate 0.0203   Epoch: 10   Global Step: 183490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:09,709-Speed 9413.01 samples/sec   Loss 5.6253   LearningRate 0.0203   Epoch: 10   Global Step: 183500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:10,817-Speed 9247.45 samples/sec   Loss 5.6466   LearningRate 0.0203   Epoch: 10   Global Step: 183510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:11,851-Speed 9920.22 samples/sec   Loss 5.5812   LearningRate 0.0203   Epoch: 10   Global Step: 183520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:12,954-Speed 9289.19 samples/sec   Loss 5.6129   LearningRate 0.0203   Epoch: 10   Global Step: 183530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:14,008-Speed 9715.42 samples/sec   Loss 5.6037   LearningRate 0.0203   Epoch: 10   Global Step: 183540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:15,063-Speed 9712.94 samples/sec   Loss 5.5188   LearningRate 0.0203   Epoch: 10   Global Step: 183550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:16,126-Speed 9643.33 samples/sec   Loss 5.5483   LearningRate 0.0203   Epoch: 10   Global Step: 183560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:17,204-Speed 9503.76 samples/sec   Loss 5.5635   LearningRate 0.0203   Epoch: 10   Global Step: 183570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:18,274-Speed 9574.32 samples/sec   Loss 5.5489   LearningRate 0.0203   Epoch: 10   Global Step: 183580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:19,605-Speed 7694.51 samples/sec   Loss 5.5103   LearningRate 0.0203   Epoch: 10   Global Step: 183590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:20,616-Speed 10135.68 samples/sec   Loss 5.5595   LearningRate 0.0203   Epoch: 10   Global Step: 183600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:48,665-Speed 365.09 samples/sec   Loss 4.9268   LearningRate 0.0202   Epoch: 11   Global Step: 183610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:50,040-Speed 7449.33 samples/sec   Loss 4.8029   LearningRate 0.0202   Epoch: 11   Global Step: 183620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:51,201-Speed 8838.64 samples/sec   Loss 4.9345   LearningRate 0.0202   Epoch: 11   Global Step: 183630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:52,336-Speed 9031.47 samples/sec   Loss 4.8140   LearningRate 0.0202   Epoch: 11   Global Step: 183640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:53,869-Speed 6682.77 samples/sec   Loss 4.7780   LearningRate 0.0202   Epoch: 11   Global Step: 183650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:55,302-Speed 7148.93 samples/sec   Loss 4.7841   LearningRate 0.0202   Epoch: 11   Global Step: 183660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:56,390-Speed 9412.16 samples/sec   Loss 4.8076   LearningRate 0.0202   Epoch: 11   Global Step: 183670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:57,475-Speed 9446.82 samples/sec   Loss 4.7526   LearningRate 0.0202   Epoch: 11   Global Step: 183680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:58,650-Speed 8725.75 samples/sec   Loss 4.8343   LearningRate 0.0202   Epoch: 11   Global Step: 183690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:56:59,754-Speed 9282.14 samples/sec   Loss 4.9393   LearningRate 0.0202   Epoch: 11   Global Step: 183700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:00,841-Speed 9427.36 samples/sec   Loss 4.9270   LearningRate 0.0202   Epoch: 11   Global Step: 183710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:01,986-Speed 8954.68 samples/sec   Loss 4.7821   LearningRate 0.0202   Epoch: 11   Global Step: 183720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:03,156-Speed 8755.53 samples/sec   Loss 4.7759   LearningRate 0.0202   Epoch: 11   Global Step: 183730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:04,250-Speed 9363.43 samples/sec   Loss 4.8451   LearningRate 0.0202   Epoch: 11   Global Step: 183740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:05,313-Speed 9634.49 samples/sec   Loss 4.8778   LearningRate 0.0202   Epoch: 11   Global Step: 183750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:06,433-Speed 9151.17 samples/sec   Loss 4.9040   LearningRate 0.0202   Epoch: 11   Global Step: 183760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:07,531-Speed 9331.12 samples/sec   Loss 4.8773   LearningRate 0.0202   Epoch: 11   Global Step: 183770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:08,598-Speed 9597.86 samples/sec   Loss 4.9153   LearningRate 0.0202   Epoch: 11   Global Step: 183780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:09,655-Speed 9694.50 samples/sec   Loss 4.7982   LearningRate 0.0202   Epoch: 11   Global Step: 183790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:10,748-Speed 9381.38 samples/sec   Loss 4.8464   LearningRate 0.0202   Epoch: 11   Global Step: 183800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:11,868-Speed 9144.15 samples/sec   Loss 4.8691   LearningRate 0.0202   Epoch: 11   Global Step: 183810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:12,966-Speed 9334.33 samples/sec   Loss 4.8941   LearningRate 0.0202   Epoch: 11   Global Step: 183820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:14,047-Speed 9477.02 samples/sec   Loss 4.7691   LearningRate 0.0202   Epoch: 11   Global Step: 183830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:15,137-Speed 9397.55 samples/sec   Loss 4.8517   LearningRate 0.0202   Epoch: 11   Global Step: 183840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:16,574-Speed 7130.00 samples/sec   Loss 4.8946   LearningRate 0.0202   Epoch: 11   Global Step: 183850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:17,823-Speed 8202.65 samples/sec   Loss 4.9000   LearningRate 0.0202   Epoch: 11   Global Step: 183860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:19,488-Speed 6150.86 samples/sec   Loss 4.8430   LearningRate 0.0202   Epoch: 11   Global Step: 183870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:20,548-Speed 9671.04 samples/sec   Loss 4.8591   LearningRate 0.0202   Epoch: 11   Global Step: 183880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:21,652-Speed 9280.29 samples/sec   Loss 4.9169   LearningRate 0.0202   Epoch: 11   Global Step: 183890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:22,797-Speed 8954.66 samples/sec   Loss 4.9579   LearningRate 0.0202   Epoch: 11   Global Step: 183900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:24,107-Speed 7816.87 samples/sec   Loss 4.9482   LearningRate 0.0202   Epoch: 11   Global Step: 183910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:25,242-Speed 9030.53 samples/sec   Loss 4.9164   LearningRate 0.0202   Epoch: 11   Global Step: 183920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:26,354-Speed 9208.02 samples/sec   Loss 4.7748   LearningRate 0.0202   Epoch: 11   Global Step: 183930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:27,446-Speed 9386.44 samples/sec   Loss 4.8495   LearningRate 0.0202   Epoch: 11   Global Step: 183940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:28,488-Speed 9829.92 samples/sec   Loss 4.9308   LearningRate 0.0202   Epoch: 11   Global Step: 183950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:29,620-Speed 9059.30 samples/sec   Loss 4.9244   LearningRate 0.0202   Epoch: 11   Global Step: 183960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:30,693-Speed 9548.61 samples/sec   Loss 4.8406   LearningRate 0.0202   Epoch: 11   Global Step: 183970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:57:31,774-Speed 9473.97 samples/sec   Loss 4.8899   LearningRate 0.0201   Epoch: 11   Global Step: 183980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:32,843-Speed 9587.25 samples/sec   Loss 4.9610   LearningRate 0.0201   Epoch: 11   Global Step: 183990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:33,930-Speed 9425.75 samples/sec   Loss 4.9508   LearningRate 0.0201   Epoch: 11   Global Step: 184000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:57:55,934-[lfw][184000]XNorm: 9.015050
Training: 2022-04-11 18:57:55,934-[lfw][184000]Accuracy-Flip: 0.99583+-0.00310
Training: 2022-04-11 18:57:55,935-[lfw][184000]Accuracy-Highest: 0.99683
Training: 2022-04-11 18:58:21,329-[cfp_fp][184000]XNorm: 7.712500
Training: 2022-04-11 18:58:21,330-[cfp_fp][184000]Accuracy-Flip: 0.96586+-0.00839
Training: 2022-04-11 18:58:21,331-[cfp_fp][184000]Accuracy-Highest: 0.96643
Training: 2022-04-11 18:58:43,151-[agedb_30][184000]XNorm: 8.715467
Training: 2022-04-11 18:58:43,152-[agedb_30][184000]Accuracy-Flip: 0.96383+-0.01000
Training: 2022-04-11 18:58:43,152-[agedb_30][184000]Accuracy-Highest: 0.96917
Training: 2022-04-11 18:58:44,240-Speed 145.64 samples/sec   Loss 4.8704   LearningRate 0.0201   Epoch: 11   Global Step: 184010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:58:45,313-Speed 9552.59 samples/sec   Loss 4.8666   LearningRate 0.0201   Epoch: 11   Global Step: 184020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:58:46,400-Speed 9421.12 samples/sec   Loss 4.9195   LearningRate 0.0201   Epoch: 11   Global Step: 184030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:58:47,467-Speed 9605.65 samples/sec   Loss 4.9889   LearningRate 0.0201   Epoch: 11   Global Step: 184040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:58:48,551-Speed 9448.92 samples/sec   Loss 4.9943   LearningRate 0.0201   Epoch: 11   Global Step: 184050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:58:49,614-Speed 9640.84 samples/sec   Loss 4.9615   LearningRate 0.0201   Epoch: 11   Global Step: 184060   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:58:50,697-Speed 9461.96 samples/sec   Loss 5.0203   LearningRate 0.0201   Epoch: 11   Global Step: 184070   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 18:58:51,795-Speed 9328.08 samples/sec   Loss 4.9540   LearningRate 0.0201   Epoch: 11   Global Step: 184080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:58:52,908-Speed 9213.43 samples/sec   Loss 4.8982   LearningRate 0.0201   Epoch: 11   Global Step: 184090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:58:53,974-Speed 9611.70 samples/sec   Loss 4.9565   LearningRate 0.0201   Epoch: 11   Global Step: 184100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:58:55,029-Speed 9710.71 samples/sec   Loss 5.0132   LearningRate 0.0201   Epoch: 11   Global Step: 184110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:58:56,073-Speed 9814.51 samples/sec   Loss 5.0160   LearningRate 0.0201   Epoch: 11   Global Step: 184120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:58:57,168-Speed 9351.16 samples/sec   Loss 4.8398   LearningRate 0.0201   Epoch: 11   Global Step: 184130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:58:58,232-Speed 9634.44 samples/sec   Loss 4.9400   LearningRate 0.0201   Epoch: 11   Global Step: 184140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:58:59,317-Speed 9447.61 samples/sec   Loss 4.9002   LearningRate 0.0201   Epoch: 11   Global Step: 184150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:00,399-Speed 9470.22 samples/sec   Loss 4.9262   LearningRate 0.0201   Epoch: 11   Global Step: 184160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:01,698-Speed 7885.94 samples/sec   Loss 4.9077   LearningRate 0.0201   Epoch: 11   Global Step: 184170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:02,790-Speed 9381.44 samples/sec   Loss 5.0080   LearningRate 0.0201   Epoch: 11   Global Step: 184180   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:59:03,877-Speed 9424.02 samples/sec   Loss 4.8504   LearningRate 0.0201   Epoch: 11   Global Step: 184190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:05,015-Speed 9001.50 samples/sec   Loss 5.0051   LearningRate 0.0201   Epoch: 11   Global Step: 184200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:06,100-Speed 9447.40 samples/sec   Loss 5.0051   LearningRate 0.0201   Epoch: 11   Global Step: 184210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:07,216-Speed 9181.72 samples/sec   Loss 5.0235   LearningRate 0.0201   Epoch: 11   Global Step: 184220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:08,335-Speed 9157.06 samples/sec   Loss 4.8270   LearningRate 0.0201   Epoch: 11   Global Step: 184230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:09,388-Speed 9730.79 samples/sec   Loss 4.8826   LearningRate 0.0201   Epoch: 11   Global Step: 184240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:10,446-Speed 9683.00 samples/sec   Loss 4.9405   LearningRate 0.0201   Epoch: 11   Global Step: 184250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:11,542-Speed 9351.97 samples/sec   Loss 4.9825   LearningRate 0.0201   Epoch: 11   Global Step: 184260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:12,623-Speed 9481.90 samples/sec   Loss 4.8906   LearningRate 0.0201   Epoch: 11   Global Step: 184270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:13,714-Speed 9388.93 samples/sec   Loss 4.9033   LearningRate 0.0201   Epoch: 11   Global Step: 184280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:14,779-Speed 9620.19 samples/sec   Loss 4.9671   LearningRate 0.0201   Epoch: 11   Global Step: 184290   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:59:15,851-Speed 9560.40 samples/sec   Loss 4.9376   LearningRate 0.0201   Epoch: 11   Global Step: 184300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:16,929-Speed 9502.78 samples/sec   Loss 4.9059   LearningRate 0.0201   Epoch: 11   Global Step: 184310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:18,009-Speed 9483.67 samples/sec   Loss 4.9334   LearningRate 0.0201   Epoch: 11   Global Step: 184320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:19,125-Speed 9182.44 samples/sec   Loss 4.9056   LearningRate 0.0201   Epoch: 11   Global Step: 184330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:20,201-Speed 9528.24 samples/sec   Loss 4.8996   LearningRate 0.0201   Epoch: 11   Global Step: 184340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:21,281-Speed 9483.56 samples/sec   Loss 5.0497   LearningRate 0.0200   Epoch: 11   Global Step: 184350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:22,392-Speed 9234.03 samples/sec   Loss 4.9743   LearningRate 0.0200   Epoch: 11   Global Step: 184360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:23,528-Speed 9015.22 samples/sec   Loss 4.9630   LearningRate 0.0200   Epoch: 11   Global Step: 184370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:24,662-Speed 9035.45 samples/sec   Loss 5.0720   LearningRate 0.0200   Epoch: 11   Global Step: 184380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:25,754-Speed 9381.10 samples/sec   Loss 4.9187   LearningRate 0.0200   Epoch: 11   Global Step: 184390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:26,827-Speed 9547.66 samples/sec   Loss 5.0154   LearningRate 0.0200   Epoch: 11   Global Step: 184400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:27,910-Speed 9462.23 samples/sec   Loss 4.9531   LearningRate 0.0200   Epoch: 11   Global Step: 184410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:29,035-Speed 9116.17 samples/sec   Loss 4.9653   LearningRate 0.0200   Epoch: 11   Global Step: 184420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:30,080-Speed 9803.08 samples/sec   Loss 5.0883   LearningRate 0.0200   Epoch: 11   Global Step: 184430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:31,166-Speed 9438.27 samples/sec   Loss 4.9243   LearningRate 0.0200   Epoch: 11   Global Step: 184440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:32,234-Speed 9590.66 samples/sec   Loss 5.0714   LearningRate 0.0200   Epoch: 11   Global Step: 184450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:33,317-Speed 9459.53 samples/sec   Loss 5.0511   LearningRate 0.0200   Epoch: 11   Global Step: 184460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:34,389-Speed 9560.18 samples/sec   Loss 4.9576   LearningRate 0.0200   Epoch: 11   Global Step: 184470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:35,501-Speed 9212.75 samples/sec   Loss 4.9890   LearningRate 0.0200   Epoch: 11   Global Step: 184480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:36,593-Speed 9383.31 samples/sec   Loss 5.0397   LearningRate 0.0200   Epoch: 11   Global Step: 184490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:37,683-Speed 9400.62 samples/sec   Loss 4.9498   LearningRate 0.0200   Epoch: 11   Global Step: 184500   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 18:59:38,737-Speed 9718.05 samples/sec   Loss 4.9557   LearningRate 0.0200   Epoch: 11   Global Step: 184510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:39,860-Speed 9126.98 samples/sec   Loss 4.9647   LearningRate 0.0200   Epoch: 11   Global Step: 184520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:40,935-Speed 9527.62 samples/sec   Loss 4.8969   LearningRate 0.0200   Epoch: 11   Global Step: 184530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:42,040-Speed 9277.75 samples/sec   Loss 4.9401   LearningRate 0.0200   Epoch: 11   Global Step: 184540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:43,117-Speed 9513.04 samples/sec   Loss 4.9732   LearningRate 0.0200   Epoch: 11   Global Step: 184550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:44,224-Speed 9257.29 samples/sec   Loss 4.9558   LearningRate 0.0200   Epoch: 11   Global Step: 184560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:45,317-Speed 9374.67 samples/sec   Loss 5.1519   LearningRate 0.0200   Epoch: 11   Global Step: 184570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:46,378-Speed 9653.37 samples/sec   Loss 4.9444   LearningRate 0.0200   Epoch: 11   Global Step: 184580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:47,428-Speed 9763.40 samples/sec   Loss 5.1081   LearningRate 0.0200   Epoch: 11   Global Step: 184590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:48,472-Speed 9809.62 samples/sec   Loss 5.0185   LearningRate 0.0200   Epoch: 11   Global Step: 184600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:49,527-Speed 9718.98 samples/sec   Loss 4.9611   LearningRate 0.0200   Epoch: 11   Global Step: 184610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:50,608-Speed 9473.90 samples/sec   Loss 5.0117   LearningRate 0.0200   Epoch: 11   Global Step: 184620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:51,685-Speed 9512.96 samples/sec   Loss 4.9720   LearningRate 0.0200   Epoch: 11   Global Step: 184630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:52,722-Speed 9888.53 samples/sec   Loss 4.9262   LearningRate 0.0200   Epoch: 11   Global Step: 184640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:53,801-Speed 9493.64 samples/sec   Loss 4.9664   LearningRate 0.0200   Epoch: 11   Global Step: 184650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:54,899-Speed 9334.62 samples/sec   Loss 5.0768   LearningRate 0.0200   Epoch: 11   Global Step: 184660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:55,942-Speed 9817.39 samples/sec   Loss 4.9497   LearningRate 0.0200   Epoch: 11   Global Step: 184670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:57,001-Speed 9673.34 samples/sec   Loss 4.8980   LearningRate 0.0200   Epoch: 11   Global Step: 184680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:58,114-Speed 9206.53 samples/sec   Loss 4.9444   LearningRate 0.0200   Epoch: 11   Global Step: 184690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 18:59:59,186-Speed 9564.69 samples/sec   Loss 4.9389   LearningRate 0.0200   Epoch: 11   Global Step: 184700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:00,256-Speed 9576.87 samples/sec   Loss 4.9458   LearningRate 0.0200   Epoch: 11   Global Step: 184710   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:00:01,393-Speed 9007.13 samples/sec   Loss 4.9826   LearningRate 0.0199   Epoch: 11   Global Step: 184720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:02,505-Speed 9211.21 samples/sec   Loss 5.1168   LearningRate 0.0199   Epoch: 11   Global Step: 184730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:03,606-Speed 9311.46 samples/sec   Loss 5.0223   LearningRate 0.0199   Epoch: 11   Global Step: 184740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:04,706-Speed 9315.73 samples/sec   Loss 5.0012   LearningRate 0.0199   Epoch: 11   Global Step: 184750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:05,769-Speed 9637.83 samples/sec   Loss 4.9814   LearningRate 0.0199   Epoch: 11   Global Step: 184760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:06,817-Speed 9781.42 samples/sec   Loss 4.9704   LearningRate 0.0199   Epoch: 11   Global Step: 184770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:07,885-Speed 9595.34 samples/sec   Loss 4.9974   LearningRate 0.0199   Epoch: 11   Global Step: 184780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:08,974-Speed 9406.36 samples/sec   Loss 4.8379   LearningRate 0.0199   Epoch: 11   Global Step: 184790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:10,087-Speed 9200.66 samples/sec   Loss 4.9665   LearningRate 0.0199   Epoch: 11   Global Step: 184800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:11,206-Speed 9157.04 samples/sec   Loss 5.0285   LearningRate 0.0199   Epoch: 11   Global Step: 184810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:12,272-Speed 9620.40 samples/sec   Loss 5.0655   LearningRate 0.0199   Epoch: 11   Global Step: 184820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:13,326-Speed 9722.63 samples/sec   Loss 4.9257   LearningRate 0.0199   Epoch: 11   Global Step: 184830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:14,381-Speed 9711.85 samples/sec   Loss 4.9964   LearningRate 0.0199   Epoch: 11   Global Step: 184840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:15,424-Speed 9820.54 samples/sec   Loss 5.0860   LearningRate 0.0199   Epoch: 11   Global Step: 184850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:16,495-Speed 9571.03 samples/sec   Loss 5.0771   LearningRate 0.0199   Epoch: 11   Global Step: 184860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:17,596-Speed 9303.37 samples/sec   Loss 5.0875   LearningRate 0.0199   Epoch: 11   Global Step: 184870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:18,700-Speed 9274.80 samples/sec   Loss 5.0197   LearningRate 0.0199   Epoch: 11   Global Step: 184880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:19,793-Speed 9379.85 samples/sec   Loss 5.0203   LearningRate 0.0199   Epoch: 11   Global Step: 184890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:20,876-Speed 9459.61 samples/sec   Loss 5.0834   LearningRate 0.0199   Epoch: 11   Global Step: 184900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:21,953-Speed 9510.17 samples/sec   Loss 5.0878   LearningRate 0.0199   Epoch: 11   Global Step: 184910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:23,031-Speed 9507.65 samples/sec   Loss 5.1517   LearningRate 0.0199   Epoch: 11   Global Step: 184920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:24,092-Speed 9652.33 samples/sec   Loss 5.0246   LearningRate 0.0199   Epoch: 11   Global Step: 184930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:25,170-Speed 9511.91 samples/sec   Loss 5.0587   LearningRate 0.0199   Epoch: 11   Global Step: 184940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:26,290-Speed 9146.68 samples/sec   Loss 5.0144   LearningRate 0.0199   Epoch: 11   Global Step: 184950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:27,395-Speed 9272.25 samples/sec   Loss 4.9734   LearningRate 0.0199   Epoch: 11   Global Step: 184960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:28,490-Speed 9354.21 samples/sec   Loss 5.0016   LearningRate 0.0199   Epoch: 11   Global Step: 184970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:29,570-Speed 9497.63 samples/sec   Loss 5.0511   LearningRate 0.0199   Epoch: 11   Global Step: 184980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:30,701-Speed 9056.31 samples/sec   Loss 5.0711   LearningRate 0.0199   Epoch: 11   Global Step: 184990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:31,805-Speed 9275.96 samples/sec   Loss 5.1738   LearningRate 0.0199   Epoch: 11   Global Step: 185000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:32,930-Speed 9114.51 samples/sec   Loss 5.1008   LearningRate 0.0199   Epoch: 11   Global Step: 185010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:33,999-Speed 9579.63 samples/sec   Loss 5.0458   LearningRate 0.0199   Epoch: 11   Global Step: 185020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:35,085-Speed 9431.55 samples/sec   Loss 4.9649   LearningRate 0.0199   Epoch: 11   Global Step: 185030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:36,198-Speed 9207.57 samples/sec   Loss 5.0783   LearningRate 0.0199   Epoch: 11   Global Step: 185040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:37,291-Speed 9378.35 samples/sec   Loss 4.9867   LearningRate 0.0199   Epoch: 11   Global Step: 185050   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:38,349-Speed 9683.04 samples/sec   Loss 5.1372   LearningRate 0.0199   Epoch: 11   Global Step: 185060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:39,422-Speed 9547.37 samples/sec   Loss 5.0468   LearningRate 0.0199   Epoch: 11   Global Step: 185070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:40,516-Speed 9359.94 samples/sec   Loss 5.0160   LearningRate 0.0199   Epoch: 11   Global Step: 185080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:41,616-Speed 9325.62 samples/sec   Loss 5.0661   LearningRate 0.0199   Epoch: 11   Global Step: 185090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:42,718-Speed 9290.59 samples/sec   Loss 5.0360   LearningRate 0.0198   Epoch: 11   Global Step: 185100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:43,787-Speed 9588.06 samples/sec   Loss 4.9953   LearningRate 0.0198   Epoch: 11   Global Step: 185110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:44,851-Speed 9635.60 samples/sec   Loss 5.0402   LearningRate 0.0198   Epoch: 11   Global Step: 185120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:45,965-Speed 9190.97 samples/sec   Loss 5.0259   LearningRate 0.0198   Epoch: 11   Global Step: 185130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:47,004-Speed 9863.69 samples/sec   Loss 5.0879   LearningRate 0.0198   Epoch: 11   Global Step: 185140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:48,068-Speed 9629.31 samples/sec   Loss 4.9577   LearningRate 0.0198   Epoch: 11   Global Step: 185150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:49,134-Speed 9612.38 samples/sec   Loss 4.9642   LearningRate 0.0198   Epoch: 11   Global Step: 185160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:50,221-Speed 9426.10 samples/sec   Loss 5.1123   LearningRate 0.0198   Epoch: 11   Global Step: 185170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:51,305-Speed 9454.59 samples/sec   Loss 5.1389   LearningRate 0.0198   Epoch: 11   Global Step: 185180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:52,365-Speed 9671.08 samples/sec   Loss 5.1102   LearningRate 0.0198   Epoch: 11   Global Step: 185190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:53,410-Speed 9802.22 samples/sec   Loss 4.9574   LearningRate 0.0198   Epoch: 11   Global Step: 185200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:54,469-Speed 9672.20 samples/sec   Loss 5.0123   LearningRate 0.0198   Epoch: 11   Global Step: 185210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:55,540-Speed 9568.80 samples/sec   Loss 5.0993   LearningRate 0.0198   Epoch: 11   Global Step: 185220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:00:56,646-Speed 9266.80 samples/sec   Loss 5.0835   LearningRate 0.0198   Epoch: 11   Global Step: 185230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:57,722-Speed 9525.25 samples/sec   Loss 5.0583   LearningRate 0.0198   Epoch: 11   Global Step: 185240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:58,782-Speed 9669.70 samples/sec   Loss 5.0656   LearningRate 0.0198   Epoch: 11   Global Step: 185250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:00:59,923-Speed 8975.34 samples/sec   Loss 5.0316   LearningRate 0.0198   Epoch: 11   Global Step: 185260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:00,993-Speed 9574.17 samples/sec   Loss 5.0160   LearningRate 0.0198   Epoch: 11   Global Step: 185270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:02,077-Speed 9461.77 samples/sec   Loss 5.0325   LearningRate 0.0198   Epoch: 11   Global Step: 185280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:03,162-Speed 9439.86 samples/sec   Loss 5.0641   LearningRate 0.0198   Epoch: 11   Global Step: 185290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:04,219-Speed 9698.15 samples/sec   Loss 5.1305   LearningRate 0.0198   Epoch: 11   Global Step: 185300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:05,294-Speed 9522.29 samples/sec   Loss 5.0476   LearningRate 0.0198   Epoch: 11   Global Step: 185310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:06,403-Speed 9244.76 samples/sec   Loss 5.1513   LearningRate 0.0198   Epoch: 11   Global Step: 185320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:07,478-Speed 9527.83 samples/sec   Loss 5.0616   LearningRate 0.0198   Epoch: 11   Global Step: 185330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:08,590-Speed 9217.86 samples/sec   Loss 5.1141   LearningRate 0.0198   Epoch: 11   Global Step: 185340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:09,664-Speed 9533.35 samples/sec   Loss 5.0677   LearningRate 0.0198   Epoch: 11   Global Step: 185350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:10,801-Speed 9013.32 samples/sec   Loss 5.1108   LearningRate 0.0198   Epoch: 11   Global Step: 185360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:11,931-Speed 9065.46 samples/sec   Loss 5.0810   LearningRate 0.0198   Epoch: 11   Global Step: 185370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:13,001-Speed 9579.32 samples/sec   Loss 5.0718   LearningRate 0.0198   Epoch: 11   Global Step: 185380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:14,124-Speed 9126.02 samples/sec   Loss 5.1281   LearningRate 0.0198   Epoch: 11   Global Step: 185390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:15,193-Speed 9584.73 samples/sec   Loss 5.0483   LearningRate 0.0198   Epoch: 11   Global Step: 185400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:16,294-Speed 9303.75 samples/sec   Loss 4.9874   LearningRate 0.0198   Epoch: 11   Global Step: 185410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:17,396-Speed 9295.36 samples/sec   Loss 5.0039   LearningRate 0.0198   Epoch: 11   Global Step: 185420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:18,496-Speed 9314.73 samples/sec   Loss 5.0578   LearningRate 0.0198   Epoch: 11   Global Step: 185430   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:01:19,605-Speed 9242.05 samples/sec   Loss 5.0692   LearningRate 0.0198   Epoch: 11   Global Step: 185440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:20,747-Speed 8967.21 samples/sec   Loss 4.9882   LearningRate 0.0198   Epoch: 11   Global Step: 185450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:21,843-Speed 9351.08 samples/sec   Loss 5.1450   LearningRate 0.0198   Epoch: 11   Global Step: 185460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:22,947-Speed 9281.51 samples/sec   Loss 5.1633   LearningRate 0.0197   Epoch: 11   Global Step: 185470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:24,076-Speed 9076.50 samples/sec   Loss 5.0451   LearningRate 0.0197   Epoch: 11   Global Step: 185480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:25,156-Speed 9490.94 samples/sec   Loss 5.0752   LearningRate 0.0197   Epoch: 11   Global Step: 185490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:26,231-Speed 9528.54 samples/sec   Loss 5.0609   LearningRate 0.0197   Epoch: 11   Global Step: 185500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:27,275-Speed 9811.87 samples/sec   Loss 5.0356   LearningRate 0.0197   Epoch: 11   Global Step: 185510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:28,321-Speed 9795.29 samples/sec   Loss 5.2283   LearningRate 0.0197   Epoch: 11   Global Step: 185520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:29,405-Speed 9454.25 samples/sec   Loss 5.1222   LearningRate 0.0197   Epoch: 11   Global Step: 185530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:30,479-Speed 9544.53 samples/sec   Loss 5.0278   LearningRate 0.0197   Epoch: 11   Global Step: 185540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:31,548-Speed 9585.22 samples/sec   Loss 5.0752   LearningRate 0.0197   Epoch: 11   Global Step: 185550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:32,650-Speed 9292.50 samples/sec   Loss 5.1677   LearningRate 0.0197   Epoch: 11   Global Step: 185560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:33,738-Speed 9418.06 samples/sec   Loss 5.0854   LearningRate 0.0197   Epoch: 11   Global Step: 185570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:34,853-Speed 9191.02 samples/sec   Loss 5.0935   LearningRate 0.0197   Epoch: 11   Global Step: 185580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:35,922-Speed 9584.26 samples/sec   Loss 5.0839   LearningRate 0.0197   Epoch: 11   Global Step: 185590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:37,004-Speed 9470.79 samples/sec   Loss 5.2100   LearningRate 0.0197   Epoch: 11   Global Step: 185600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:38,091-Speed 9426.48 samples/sec   Loss 5.1797   LearningRate 0.0197   Epoch: 11   Global Step: 185610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:39,180-Speed 9405.32 samples/sec   Loss 5.0750   LearningRate 0.0197   Epoch: 11   Global Step: 185620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:40,260-Speed 9488.56 samples/sec   Loss 5.0557   LearningRate 0.0197   Epoch: 11   Global Step: 185630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:41,338-Speed 9510.62 samples/sec   Loss 5.0936   LearningRate 0.0197   Epoch: 11   Global Step: 185640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:42,435-Speed 9340.37 samples/sec   Loss 5.1522   LearningRate 0.0197   Epoch: 11   Global Step: 185650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:43,499-Speed 9627.28 samples/sec   Loss 5.0860   LearningRate 0.0197   Epoch: 11   Global Step: 185660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:44,589-Speed 9405.05 samples/sec   Loss 5.1532   LearningRate 0.0197   Epoch: 11   Global Step: 185670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:45,662-Speed 9543.48 samples/sec   Loss 5.1032   LearningRate 0.0197   Epoch: 11   Global Step: 185680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:46,714-Speed 9742.14 samples/sec   Loss 5.1292   LearningRate 0.0197   Epoch: 11   Global Step: 185690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:47,787-Speed 9553.02 samples/sec   Loss 5.1467   LearningRate 0.0197   Epoch: 11   Global Step: 185700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:48,867-Speed 9479.17 samples/sec   Loss 5.0591   LearningRate 0.0197   Epoch: 11   Global Step: 185710   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:01:49,924-Speed 9700.49 samples/sec   Loss 5.0276   LearningRate 0.0197   Epoch: 11   Global Step: 185720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:50,994-Speed 9569.18 samples/sec   Loss 5.2269   LearningRate 0.0197   Epoch: 11   Global Step: 185730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:52,091-Speed 9340.23 samples/sec   Loss 5.0910   LearningRate 0.0197   Epoch: 11   Global Step: 185740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:53,174-Speed 9468.61 samples/sec   Loss 5.0691   LearningRate 0.0197   Epoch: 11   Global Step: 185750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:54,251-Speed 9509.38 samples/sec   Loss 5.1600   LearningRate 0.0197   Epoch: 11   Global Step: 185760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:01:55,333-Speed 9466.56 samples/sec   Loss 5.1115   LearningRate 0.0197   Epoch: 11   Global Step: 185770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:56,436-Speed 9293.42 samples/sec   Loss 5.0308   LearningRate 0.0197   Epoch: 11   Global Step: 185780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:57,521-Speed 9444.69 samples/sec   Loss 5.0205   LearningRate 0.0197   Epoch: 11   Global Step: 185790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:58,599-Speed 9501.72 samples/sec   Loss 5.1337   LearningRate 0.0197   Epoch: 11   Global Step: 185800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:01:59,679-Speed 9495.50 samples/sec   Loss 5.1236   LearningRate 0.0197   Epoch: 11   Global Step: 185810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:02:00,746-Speed 9598.20 samples/sec   Loss 5.1519   LearningRate 0.0197   Epoch: 11   Global Step: 185820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:02:01,828-Speed 9468.59 samples/sec   Loss 5.1496   LearningRate 0.0197   Epoch: 11   Global Step: 185830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:02:02,891-Speed 9640.66 samples/sec   Loss 5.1714   LearningRate 0.0197   Epoch: 11   Global Step: 185840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:02:03,985-Speed 9363.44 samples/sec   Loss 5.1652   LearningRate 0.0196   Epoch: 11   Global Step: 185850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:02:05,048-Speed 9638.43 samples/sec   Loss 5.0879   LearningRate 0.0196   Epoch: 11   Global Step: 185860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:02:06,139-Speed 9387.87 samples/sec   Loss 5.1479   LearningRate 0.0196   Epoch: 11   Global Step: 185870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:07,220-Speed 9481.18 samples/sec   Loss 5.1829   LearningRate 0.0196   Epoch: 11   Global Step: 185880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:08,258-Speed 9877.28 samples/sec   Loss 5.1662   LearningRate 0.0196   Epoch: 11   Global Step: 185890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:09,347-Speed 9404.17 samples/sec   Loss 5.0926   LearningRate 0.0196   Epoch: 11   Global Step: 185900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:10,413-Speed 9614.51 samples/sec   Loss 5.1651   LearningRate 0.0196   Epoch: 11   Global Step: 185910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:11,477-Speed 9628.81 samples/sec   Loss 5.1012   LearningRate 0.0196   Epoch: 11   Global Step: 185920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:12,572-Speed 9357.87 samples/sec   Loss 5.0824   LearningRate 0.0196   Epoch: 11   Global Step: 185930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:13,631-Speed 9676.89 samples/sec   Loss 5.2122   LearningRate 0.0196   Epoch: 11   Global Step: 185940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:14,737-Speed 9262.38 samples/sec   Loss 5.1423   LearningRate 0.0196   Epoch: 11   Global Step: 185950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:15,826-Speed 9404.38 samples/sec   Loss 5.1980   LearningRate 0.0196   Epoch: 11   Global Step: 185960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:16,929-Speed 9290.79 samples/sec   Loss 5.2001   LearningRate 0.0196   Epoch: 11   Global Step: 185970   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:02:18,017-Speed 9419.69 samples/sec   Loss 5.1583   LearningRate 0.0196   Epoch: 11   Global Step: 185980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:19,078-Speed 9656.59 samples/sec   Loss 5.1847   LearningRate 0.0196   Epoch: 11   Global Step: 185990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:20,150-Speed 9562.01 samples/sec   Loss 4.9934   LearningRate 0.0196   Epoch: 11   Global Step: 186000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:02:42,012-[lfw][186000]XNorm: 9.011460
Training: 2022-04-11 19:02:42,013-[lfw][186000]Accuracy-Flip: 0.99683+-0.00293
Training: 2022-04-11 19:02:42,014-[lfw][186000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:03:07,309-[cfp_fp][186000]XNorm: 7.702360
Training: 2022-04-11 19:03:07,310-[cfp_fp][186000]Accuracy-Flip: 0.95914+-0.01125
Training: 2022-04-11 19:03:07,311-[cfp_fp][186000]Accuracy-Highest: 0.96643
Training: 2022-04-11 19:03:29,144-[agedb_30][186000]XNorm: 8.867334
Training: 2022-04-11 19:03:29,145-[agedb_30][186000]Accuracy-Flip: 0.96800+-0.01032
Training: 2022-04-11 19:03:29,145-[agedb_30][186000]Accuracy-Highest: 0.96917
Training: 2022-04-11 19:03:30,225-Speed 146.13 samples/sec   Loss 5.1681   LearningRate 0.0196   Epoch: 11   Global Step: 186010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:31,306-Speed 9479.55 samples/sec   Loss 5.1802   LearningRate 0.0196   Epoch: 11   Global Step: 186020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:32,425-Speed 9154.10 samples/sec   Loss 5.1861   LearningRate 0.0196   Epoch: 11   Global Step: 186030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:33,528-Speed 9288.43 samples/sec   Loss 5.1569   LearningRate 0.0196   Epoch: 11   Global Step: 186040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:34,644-Speed 9183.66 samples/sec   Loss 5.1740   LearningRate 0.0196   Epoch: 11   Global Step: 186050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:35,721-Speed 9508.54 samples/sec   Loss 5.1827   LearningRate 0.0196   Epoch: 11   Global Step: 186060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:36,783-Speed 9651.02 samples/sec   Loss 5.2189   LearningRate 0.0196   Epoch: 11   Global Step: 186070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:37,858-Speed 9532.58 samples/sec   Loss 5.0898   LearningRate 0.0196   Epoch: 11   Global Step: 186080   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:03:38,942-Speed 9457.12 samples/sec   Loss 5.1352   LearningRate 0.0196   Epoch: 11   Global Step: 186090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:40,054-Speed 9209.52 samples/sec   Loss 5.1464   LearningRate 0.0196   Epoch: 11   Global Step: 186100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:41,135-Speed 9479.59 samples/sec   Loss 5.2153   LearningRate 0.0196   Epoch: 11   Global Step: 186110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:42,212-Speed 9513.58 samples/sec   Loss 5.1056   LearningRate 0.0196   Epoch: 11   Global Step: 186120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:43,331-Speed 9160.82 samples/sec   Loss 5.0775   LearningRate 0.0196   Epoch: 11   Global Step: 186130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:44,435-Speed 9281.56 samples/sec   Loss 5.2057   LearningRate 0.0196   Epoch: 11   Global Step: 186140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:45,539-Speed 9275.44 samples/sec   Loss 5.1430   LearningRate 0.0196   Epoch: 11   Global Step: 186150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:46,666-Speed 9095.31 samples/sec   Loss 5.1776   LearningRate 0.0196   Epoch: 11   Global Step: 186160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:47,708-Speed 9826.12 samples/sec   Loss 5.1760   LearningRate 0.0196   Epoch: 11   Global Step: 186170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:48,804-Speed 9351.51 samples/sec   Loss 5.1443   LearningRate 0.0196   Epoch: 11   Global Step: 186180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:49,897-Speed 9372.99 samples/sec   Loss 5.1530   LearningRate 0.0196   Epoch: 11   Global Step: 186190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:50,977-Speed 9493.55 samples/sec   Loss 5.1410   LearningRate 0.0196   Epoch: 11   Global Step: 186200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:52,028-Speed 9745.14 samples/sec   Loss 5.1385   LearningRate 0.0196   Epoch: 11   Global Step: 186210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:53,076-Speed 9776.06 samples/sec   Loss 5.1293   LearningRate 0.0196   Epoch: 11   Global Step: 186220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:03:54,129-Speed 9732.48 samples/sec   Loss 5.0998   LearningRate 0.0195   Epoch: 11   Global Step: 186230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:03:55,197-Speed 9589.08 samples/sec   Loss 5.1546   LearningRate 0.0195   Epoch: 11   Global Step: 186240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:03:56,243-Speed 9794.44 samples/sec   Loss 5.2717   LearningRate 0.0195   Epoch: 11   Global Step: 186250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:03:57,320-Speed 9519.82 samples/sec   Loss 5.1148   LearningRate 0.0195   Epoch: 11   Global Step: 186260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:03:58,383-Speed 9634.83 samples/sec   Loss 5.2696   LearningRate 0.0195   Epoch: 11   Global Step: 186270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:03:59,448-Speed 9626.74 samples/sec   Loss 5.1545   LearningRate 0.0195   Epoch: 11   Global Step: 186280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:00,541-Speed 9371.86 samples/sec   Loss 5.1479   LearningRate 0.0195   Epoch: 11   Global Step: 186290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:01,591-Speed 9758.33 samples/sec   Loss 5.1307   LearningRate 0.0195   Epoch: 11   Global Step: 186300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:02,671-Speed 9483.77 samples/sec   Loss 5.0937   LearningRate 0.0195   Epoch: 11   Global Step: 186310   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:03,718-Speed 9786.85 samples/sec   Loss 5.2152   LearningRate 0.0195   Epoch: 11   Global Step: 186320   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:04,795-Speed 9515.63 samples/sec   Loss 5.1494   LearningRate 0.0195   Epoch: 11   Global Step: 186330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:05,836-Speed 9846.31 samples/sec   Loss 5.2243   LearningRate 0.0195   Epoch: 11   Global Step: 186340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:06,911-Speed 9536.59 samples/sec   Loss 5.1619   LearningRate 0.0195   Epoch: 11   Global Step: 186350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:07,978-Speed 9596.65 samples/sec   Loss 5.1657   LearningRate 0.0195   Epoch: 11   Global Step: 186360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:09,027-Speed 9771.00 samples/sec   Loss 5.2244   LearningRate 0.0195   Epoch: 11   Global Step: 186370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:10,084-Speed 9690.59 samples/sec   Loss 5.1169   LearningRate 0.0195   Epoch: 11   Global Step: 186380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:11,180-Speed 9348.31 samples/sec   Loss 5.1218   LearningRate 0.0195   Epoch: 11   Global Step: 186390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:12,289-Speed 9239.29 samples/sec   Loss 5.2180   LearningRate 0.0195   Epoch: 11   Global Step: 186400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:13,387-Speed 9334.85 samples/sec   Loss 5.1874   LearningRate 0.0195   Epoch: 11   Global Step: 186410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:14,503-Speed 9181.50 samples/sec   Loss 5.2127   LearningRate 0.0195   Epoch: 11   Global Step: 186420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:15,571-Speed 9592.08 samples/sec   Loss 5.1609   LearningRate 0.0195   Epoch: 11   Global Step: 186430   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:04:16,641-Speed 9571.60 samples/sec   Loss 5.1431   LearningRate 0.0195   Epoch: 11   Global Step: 186440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:17,802-Speed 8828.76 samples/sec   Loss 5.0126   LearningRate 0.0195   Epoch: 11   Global Step: 186450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:18,896-Speed 9371.02 samples/sec   Loss 5.2837   LearningRate 0.0195   Epoch: 11   Global Step: 186460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:19,940-Speed 9812.64 samples/sec   Loss 5.1721   LearningRate 0.0195   Epoch: 11   Global Step: 186470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:21,042-Speed 9291.98 samples/sec   Loss 5.1993   LearningRate 0.0195   Epoch: 11   Global Step: 186480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:22,186-Speed 8960.77 samples/sec   Loss 5.1844   LearningRate 0.0195   Epoch: 11   Global Step: 186490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:23,240-Speed 9724.26 samples/sec   Loss 5.1746   LearningRate 0.0195   Epoch: 11   Global Step: 186500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:24,353-Speed 9200.84 samples/sec   Loss 5.2097   LearningRate 0.0195   Epoch: 11   Global Step: 186510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:25,442-Speed 9409.35 samples/sec   Loss 5.2783   LearningRate 0.0195   Epoch: 11   Global Step: 186520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:26,524-Speed 9469.66 samples/sec   Loss 5.1549   LearningRate 0.0195   Epoch: 11   Global Step: 186530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:27,566-Speed 9834.14 samples/sec   Loss 5.0665   LearningRate 0.0195   Epoch: 11   Global Step: 186540   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:04:28,646-Speed 9491.84 samples/sec   Loss 5.1049   LearningRate 0.0195   Epoch: 11   Global Step: 186550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:29,699-Speed 9726.71 samples/sec   Loss 5.1247   LearningRate 0.0195   Epoch: 11   Global Step: 186560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:30,835-Speed 9022.80 samples/sec   Loss 5.1372   LearningRate 0.0195   Epoch: 11   Global Step: 186570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:31,926-Speed 9386.49 samples/sec   Loss 5.1746   LearningRate 0.0195   Epoch: 11   Global Step: 186580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:32,989-Speed 9637.91 samples/sec   Loss 5.0985   LearningRate 0.0195   Epoch: 11   Global Step: 186590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:34,087-Speed 9337.63 samples/sec   Loss 5.1090   LearningRate 0.0194   Epoch: 11   Global Step: 186600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:35,165-Speed 9502.02 samples/sec   Loss 5.1451   LearningRate 0.0194   Epoch: 11   Global Step: 186610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:36,255-Speed 9406.50 samples/sec   Loss 5.2141   LearningRate 0.0194   Epoch: 11   Global Step: 186620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:37,322-Speed 9594.83 samples/sec   Loss 5.1675   LearningRate 0.0194   Epoch: 11   Global Step: 186630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:38,400-Speed 9510.09 samples/sec   Loss 5.1987   LearningRate 0.0194   Epoch: 11   Global Step: 186640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:39,486-Speed 9428.42 samples/sec   Loss 5.0925   LearningRate 0.0194   Epoch: 11   Global Step: 186650   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:04:40,558-Speed 9556.88 samples/sec   Loss 5.1651   LearningRate 0.0194   Epoch: 11   Global Step: 186660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:41,618-Speed 9668.92 samples/sec   Loss 5.1761   LearningRate 0.0194   Epoch: 11   Global Step: 186670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:42,711-Speed 9379.48 samples/sec   Loss 5.2409   LearningRate 0.0194   Epoch: 11   Global Step: 186680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:43,759-Speed 9775.80 samples/sec   Loss 5.3034   LearningRate 0.0194   Epoch: 11   Global Step: 186690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:44,863-Speed 9283.19 samples/sec   Loss 5.1645   LearningRate 0.0194   Epoch: 11   Global Step: 186700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:45,931-Speed 9596.62 samples/sec   Loss 5.1628   LearningRate 0.0194   Epoch: 11   Global Step: 186710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:47,018-Speed 9428.47 samples/sec   Loss 5.2297   LearningRate 0.0194   Epoch: 11   Global Step: 186720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:48,070-Speed 9736.38 samples/sec   Loss 5.2121   LearningRate 0.0194   Epoch: 11   Global Step: 186730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:49,164-Speed 9367.86 samples/sec   Loss 5.1226   LearningRate 0.0194   Epoch: 11   Global Step: 186740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:50,254-Speed 9396.52 samples/sec   Loss 5.1867   LearningRate 0.0194   Epoch: 11   Global Step: 186750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:51,332-Speed 9502.88 samples/sec   Loss 5.2327   LearningRate 0.0194   Epoch: 11   Global Step: 186760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:52,503-Speed 8747.50 samples/sec   Loss 5.3096   LearningRate 0.0194   Epoch: 11   Global Step: 186770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:53,621-Speed 9164.15 samples/sec   Loss 5.1798   LearningRate 0.0194   Epoch: 11   Global Step: 186780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:04:54,724-Speed 9290.15 samples/sec   Loss 5.2187   LearningRate 0.0194   Epoch: 11   Global Step: 186790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:55,828-Speed 9282.87 samples/sec   Loss 5.1519   LearningRate 0.0194   Epoch: 11   Global Step: 186800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:56,895-Speed 9598.84 samples/sec   Loss 5.2383   LearningRate 0.0194   Epoch: 11   Global Step: 186810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:57,992-Speed 9345.85 samples/sec   Loss 5.2650   LearningRate 0.0194   Epoch: 11   Global Step: 186820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:04:59,084-Speed 9377.66 samples/sec   Loss 5.2053   LearningRate 0.0194   Epoch: 11   Global Step: 186830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:00,196-Speed 9222.17 samples/sec   Loss 5.0933   LearningRate 0.0194   Epoch: 11   Global Step: 186840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:01,275-Speed 9501.55 samples/sec   Loss 5.2289   LearningRate 0.0194   Epoch: 11   Global Step: 186850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:02,374-Speed 9320.69 samples/sec   Loss 5.0884   LearningRate 0.0194   Epoch: 11   Global Step: 186860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:03,441-Speed 9600.07 samples/sec   Loss 5.1590   LearningRate 0.0194   Epoch: 11   Global Step: 186870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:04,512-Speed 9571.85 samples/sec   Loss 5.1278   LearningRate 0.0194   Epoch: 11   Global Step: 186880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:05,593-Speed 9483.57 samples/sec   Loss 5.1674   LearningRate 0.0194   Epoch: 11   Global Step: 186890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:06,701-Speed 9241.90 samples/sec   Loss 5.1898   LearningRate 0.0194   Epoch: 11   Global Step: 186900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:07,788-Speed 9425.61 samples/sec   Loss 5.2427   LearningRate 0.0194   Epoch: 11   Global Step: 186910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:05:08,931-Speed 8964.63 samples/sec   Loss 5.1596   LearningRate 0.0194   Epoch: 11   Global Step: 186920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:05:10,026-Speed 9359.46 samples/sec   Loss 5.2205   LearningRate 0.0194   Epoch: 11   Global Step: 186930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:05:11,119-Speed 9372.79 samples/sec   Loss 5.1481   LearningRate 0.0194   Epoch: 11   Global Step: 186940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:05:12,200-Speed 9479.84 samples/sec   Loss 5.2172   LearningRate 0.0194   Epoch: 11   Global Step: 186950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:05:13,281-Speed 9474.07 samples/sec   Loss 5.2515   LearningRate 0.0194   Epoch: 11   Global Step: 186960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:05:14,356-Speed 9532.65 samples/sec   Loss 5.2148   LearningRate 0.0194   Epoch: 11   Global Step: 186970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:05:15,431-Speed 9534.57 samples/sec   Loss 5.1457   LearningRate 0.0193   Epoch: 11   Global Step: 186980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:05:16,548-Speed 9166.47 samples/sec   Loss 5.0909   LearningRate 0.0193   Epoch: 11   Global Step: 186990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:05:17,631-Speed 9468.36 samples/sec   Loss 5.1780   LearningRate 0.0193   Epoch: 11   Global Step: 187000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:05:18,708-Speed 9521.39 samples/sec   Loss 5.1404   LearningRate 0.0193   Epoch: 11   Global Step: 187010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:19,796-Speed 9416.74 samples/sec   Loss 5.2634   LearningRate 0.0193   Epoch: 11   Global Step: 187020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:20,910-Speed 9204.17 samples/sec   Loss 5.2157   LearningRate 0.0193   Epoch: 11   Global Step: 187030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:22,013-Speed 9287.24 samples/sec   Loss 5.2381   LearningRate 0.0193   Epoch: 11   Global Step: 187040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:23,119-Speed 9265.82 samples/sec   Loss 5.2345   LearningRate 0.0193   Epoch: 11   Global Step: 187050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:24,231-Speed 9215.35 samples/sec   Loss 5.1943   LearningRate 0.0193   Epoch: 11   Global Step: 187060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:25,315-Speed 9450.09 samples/sec   Loss 5.1864   LearningRate 0.0193   Epoch: 11   Global Step: 187070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:26,428-Speed 9203.48 samples/sec   Loss 5.1797   LearningRate 0.0193   Epoch: 11   Global Step: 187080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:27,517-Speed 9412.66 samples/sec   Loss 5.2031   LearningRate 0.0193   Epoch: 11   Global Step: 187090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:28,646-Speed 9135.21 samples/sec   Loss 5.1809   LearningRate 0.0193   Epoch: 11   Global Step: 187100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:29,708-Speed 9650.74 samples/sec   Loss 5.2190   LearningRate 0.0193   Epoch: 11   Global Step: 187110   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:05:30,809-Speed 9299.54 samples/sec   Loss 5.1051   LearningRate 0.0193   Epoch: 11   Global Step: 187120   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:05:31,881-Speed 9565.45 samples/sec   Loss 5.1828   LearningRate 0.0193   Epoch: 11   Global Step: 187130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:32,938-Speed 9686.44 samples/sec   Loss 5.2423   LearningRate 0.0193   Epoch: 11   Global Step: 187140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:34,007-Speed 9585.36 samples/sec   Loss 5.1669   LearningRate 0.0193   Epoch: 11   Global Step: 187150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:35,165-Speed 8854.51 samples/sec   Loss 5.2770   LearningRate 0.0193   Epoch: 11   Global Step: 187160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:36,290-Speed 9101.20 samples/sec   Loss 5.2217   LearningRate 0.0193   Epoch: 11   Global Step: 187170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:37,353-Speed 9641.90 samples/sec   Loss 5.2392   LearningRate 0.0193   Epoch: 11   Global Step: 187180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:38,434-Speed 9487.16 samples/sec   Loss 5.1485   LearningRate 0.0193   Epoch: 11   Global Step: 187190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:39,540-Speed 9260.29 samples/sec   Loss 5.1272   LearningRate 0.0193   Epoch: 11   Global Step: 187200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:40,656-Speed 9182.88 samples/sec   Loss 5.1396   LearningRate 0.0193   Epoch: 11   Global Step: 187210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:41,758-Speed 9298.94 samples/sec   Loss 5.1379   LearningRate 0.0193   Epoch: 11   Global Step: 187220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:42,850-Speed 9380.63 samples/sec   Loss 5.1964   LearningRate 0.0193   Epoch: 11   Global Step: 187230   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:05:43,900-Speed 9755.49 samples/sec   Loss 5.2207   LearningRate 0.0193   Epoch: 11   Global Step: 187240   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:05:45,007-Speed 9256.30 samples/sec   Loss 5.1874   LearningRate 0.0193   Epoch: 11   Global Step: 187250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:46,130-Speed 9127.97 samples/sec   Loss 5.1665   LearningRate 0.0193   Epoch: 11   Global Step: 187260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:47,195-Speed 9624.81 samples/sec   Loss 5.2414   LearningRate 0.0193   Epoch: 11   Global Step: 187270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:48,287-Speed 9375.97 samples/sec   Loss 5.2244   LearningRate 0.0193   Epoch: 11   Global Step: 187280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:49,347-Speed 9668.26 samples/sec   Loss 5.2489   LearningRate 0.0193   Epoch: 11   Global Step: 187290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:50,421-Speed 9542.31 samples/sec   Loss 5.1882   LearningRate 0.0193   Epoch: 11   Global Step: 187300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:51,500-Speed 9496.07 samples/sec   Loss 5.0715   LearningRate 0.0193   Epoch: 11   Global Step: 187310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:52,622-Speed 9135.53 samples/sec   Loss 5.1590   LearningRate 0.0193   Epoch: 11   Global Step: 187320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:53,730-Speed 9242.99 samples/sec   Loss 5.3053   LearningRate 0.0193   Epoch: 11   Global Step: 187330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:54,832-Speed 9301.67 samples/sec   Loss 5.2079   LearningRate 0.0193   Epoch: 11   Global Step: 187340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:55,952-Speed 9147.99 samples/sec   Loss 5.3064   LearningRate 0.0193   Epoch: 11   Global Step: 187350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:57,025-Speed 9556.18 samples/sec   Loss 5.1101   LearningRate 0.0192   Epoch: 11   Global Step: 187360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:58,131-Speed 9270.30 samples/sec   Loss 5.2920   LearningRate 0.0192   Epoch: 11   Global Step: 187370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:05:59,241-Speed 9235.42 samples/sec   Loss 5.3131   LearningRate 0.0192   Epoch: 11   Global Step: 187380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:00,368-Speed 9091.69 samples/sec   Loss 5.1865   LearningRate 0.0192   Epoch: 11   Global Step: 187390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:01,472-Speed 9279.56 samples/sec   Loss 5.2228   LearningRate 0.0192   Epoch: 11   Global Step: 187400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:02,574-Speed 9299.54 samples/sec   Loss 5.2727   LearningRate 0.0192   Epoch: 11   Global Step: 187410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:03,660-Speed 9435.13 samples/sec   Loss 5.1586   LearningRate 0.0192   Epoch: 11   Global Step: 187420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:04,765-Speed 9271.22 samples/sec   Loss 5.1867   LearningRate 0.0192   Epoch: 11   Global Step: 187430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:05,835-Speed 9574.47 samples/sec   Loss 5.2155   LearningRate 0.0192   Epoch: 11   Global Step: 187440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:06,907-Speed 9560.53 samples/sec   Loss 5.1751   LearningRate 0.0192   Epoch: 11   Global Step: 187450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:07,990-Speed 9457.23 samples/sec   Loss 5.1060   LearningRate 0.0192   Epoch: 11   Global Step: 187460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:09,077-Speed 9422.34 samples/sec   Loss 5.1787   LearningRate 0.0192   Epoch: 11   Global Step: 187470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:10,161-Speed 9462.55 samples/sec   Loss 5.1994   LearningRate 0.0192   Epoch: 11   Global Step: 187480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:11,218-Speed 9687.02 samples/sec   Loss 5.1166   LearningRate 0.0192   Epoch: 11   Global Step: 187490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:12,300-Speed 9473.15 samples/sec   Loss 5.3164   LearningRate 0.0192   Epoch: 11   Global Step: 187500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:13,375-Speed 9526.03 samples/sec   Loss 5.2303   LearningRate 0.0192   Epoch: 11   Global Step: 187510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:14,448-Speed 9551.24 samples/sec   Loss 5.2720   LearningRate 0.0192   Epoch: 11   Global Step: 187520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:15,528-Speed 9486.57 samples/sec   Loss 5.2664   LearningRate 0.0192   Epoch: 11   Global Step: 187530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:16,649-Speed 9145.69 samples/sec   Loss 5.1673   LearningRate 0.0192   Epoch: 11   Global Step: 187540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:17,709-Speed 9659.92 samples/sec   Loss 5.1811   LearningRate 0.0192   Epoch: 11   Global Step: 187550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:18,801-Speed 9383.35 samples/sec   Loss 5.2257   LearningRate 0.0192   Epoch: 11   Global Step: 187560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:19,866-Speed 9624.12 samples/sec   Loss 5.2697   LearningRate 0.0192   Epoch: 11   Global Step: 187570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:20,972-Speed 9264.83 samples/sec   Loss 5.1951   LearningRate 0.0192   Epoch: 11   Global Step: 187580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:22,080-Speed 9241.47 samples/sec   Loss 5.1825   LearningRate 0.0192   Epoch: 11   Global Step: 187590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:23,203-Speed 9131.88 samples/sec   Loss 5.2464   LearningRate 0.0192   Epoch: 11   Global Step: 187600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:24,306-Speed 9286.27 samples/sec   Loss 5.1653   LearningRate 0.0192   Epoch: 11   Global Step: 187610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:25,363-Speed 9689.43 samples/sec   Loss 5.1581   LearningRate 0.0192   Epoch: 11   Global Step: 187620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:26,438-Speed 9532.27 samples/sec   Loss 5.2188   LearningRate 0.0192   Epoch: 11   Global Step: 187630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:27,495-Speed 9697.17 samples/sec   Loss 5.2085   LearningRate 0.0192   Epoch: 11   Global Step: 187640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:28,585-Speed 9397.91 samples/sec   Loss 5.2183   LearningRate 0.0192   Epoch: 11   Global Step: 187650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:29,684-Speed 9323.15 samples/sec   Loss 5.2327   LearningRate 0.0192   Epoch: 11   Global Step: 187660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:30,765-Speed 9477.39 samples/sec   Loss 5.1301   LearningRate 0.0192   Epoch: 11   Global Step: 187670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:31,832-Speed 9597.95 samples/sec   Loss 5.1947   LearningRate 0.0192   Epoch: 11   Global Step: 187680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:32,884-Speed 9747.34 samples/sec   Loss 5.1526   LearningRate 0.0192   Epoch: 11   Global Step: 187690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:33,989-Speed 9273.62 samples/sec   Loss 5.2532   LearningRate 0.0192   Epoch: 11   Global Step: 187700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:35,048-Speed 9677.99 samples/sec   Loss 5.2980   LearningRate 0.0192   Epoch: 11   Global Step: 187710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:36,168-Speed 9142.46 samples/sec   Loss 5.2635   LearningRate 0.0192   Epoch: 11   Global Step: 187720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:37,280-Speed 9215.31 samples/sec   Loss 5.2273   LearningRate 0.0192   Epoch: 11   Global Step: 187730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:38,369-Speed 9407.26 samples/sec   Loss 5.1462   LearningRate 0.0191   Epoch: 11   Global Step: 187740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:39,462-Speed 9384.43 samples/sec   Loss 5.2396   LearningRate 0.0191   Epoch: 11   Global Step: 187750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:40,552-Speed 9398.84 samples/sec   Loss 5.3067   LearningRate 0.0191   Epoch: 11   Global Step: 187760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:41,622-Speed 9571.55 samples/sec   Loss 5.2318   LearningRate 0.0191   Epoch: 11   Global Step: 187770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:42,711-Speed 9414.25 samples/sec   Loss 5.1785   LearningRate 0.0191   Epoch: 11   Global Step: 187780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:43,815-Speed 9279.51 samples/sec   Loss 5.2232   LearningRate 0.0191   Epoch: 11   Global Step: 187790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:44,925-Speed 9231.35 samples/sec   Loss 5.3055   LearningRate 0.0191   Epoch: 11   Global Step: 187800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:45,980-Speed 9711.51 samples/sec   Loss 5.2292   LearningRate 0.0191   Epoch: 11   Global Step: 187810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:47,080-Speed 9313.83 samples/sec   Loss 5.2173   LearningRate 0.0191   Epoch: 11   Global Step: 187820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:48,193-Speed 9202.85 samples/sec   Loss 5.2936   LearningRate 0.0191   Epoch: 11   Global Step: 187830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:49,296-Speed 9291.07 samples/sec   Loss 5.1330   LearningRate 0.0191   Epoch: 11   Global Step: 187840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:50,396-Speed 9319.72 samples/sec   Loss 5.2722   LearningRate 0.0191   Epoch: 11   Global Step: 187850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:51,526-Speed 9066.71 samples/sec   Loss 5.3366   LearningRate 0.0191   Epoch: 11   Global Step: 187860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:52,641-Speed 9190.64 samples/sec   Loss 5.1632   LearningRate 0.0191   Epoch: 11   Global Step: 187870   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:06:53,760-Speed 9154.91 samples/sec   Loss 5.2788   LearningRate 0.0191   Epoch: 11   Global Step: 187880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:06:54,861-Speed 9313.23 samples/sec   Loss 5.2619   LearningRate 0.0191   Epoch: 11   Global Step: 187890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:55,983-Speed 9126.85 samples/sec   Loss 5.2239   LearningRate 0.0191   Epoch: 11   Global Step: 187900   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:57,078-Speed 9355.07 samples/sec   Loss 5.3058   LearningRate 0.0191   Epoch: 11   Global Step: 187910   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:58,213-Speed 9030.20 samples/sec   Loss 5.1704   LearningRate 0.0191   Epoch: 11   Global Step: 187920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:06:59,341-Speed 9080.62 samples/sec   Loss 5.2504   LearningRate 0.0191   Epoch: 11   Global Step: 187930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:07:00,451-Speed 9230.15 samples/sec   Loss 5.1935   LearningRate 0.0191   Epoch: 11   Global Step: 187940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:07:01,538-Speed 9426.09 samples/sec   Loss 5.2652   LearningRate 0.0191   Epoch: 11   Global Step: 187950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:07:02,640-Speed 9298.84 samples/sec   Loss 5.3076   LearningRate 0.0191   Epoch: 11   Global Step: 187960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:07:03,728-Speed 9427.50 samples/sec   Loss 5.1871   LearningRate 0.0191   Epoch: 11   Global Step: 187970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:07:04,817-Speed 9401.81 samples/sec   Loss 5.3295   LearningRate 0.0191   Epoch: 11   Global Step: 187980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:07:05,910-Speed 9374.44 samples/sec   Loss 5.2901   LearningRate 0.0191   Epoch: 11   Global Step: 187990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:07:07,023-Speed 9211.37 samples/sec   Loss 5.2511   LearningRate 0.0191   Epoch: 11   Global Step: 188000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:07:29,105-[lfw][188000]XNorm: 9.039301
Training: 2022-04-11 19:07:29,106-[lfw][188000]Accuracy-Flip: 0.99567+-0.00343
Training: 2022-04-11 19:07:29,106-[lfw][188000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:07:54,447-[cfp_fp][188000]XNorm: 7.775957
Training: 2022-04-11 19:07:54,448-[cfp_fp][188000]Accuracy-Flip: 0.96714+-0.00833
Training: 2022-04-11 19:07:54,448-[cfp_fp][188000]Accuracy-Highest: 0.96714
Training: 2022-04-11 19:08:16,257-[agedb_30][188000]XNorm: 8.748694
Training: 2022-04-11 19:08:16,257-[agedb_30][188000]Accuracy-Flip: 0.96833+-0.00860
Training: 2022-04-11 19:08:16,258-[agedb_30][188000]Accuracy-Highest: 0.96917
Training: 2022-04-11 19:08:17,340-Speed 145.63 samples/sec   Loss 5.3247   LearningRate 0.0191   Epoch: 11   Global Step: 188010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:18,377-Speed 9876.20 samples/sec   Loss 5.2771   LearningRate 0.0191   Epoch: 11   Global Step: 188020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:19,446-Speed 9588.53 samples/sec   Loss 5.2194   LearningRate 0.0191   Epoch: 11   Global Step: 188030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:20,575-Speed 9072.17 samples/sec   Loss 5.2292   LearningRate 0.0191   Epoch: 11   Global Step: 188040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:21,638-Speed 9636.21 samples/sec   Loss 5.2523   LearningRate 0.0191   Epoch: 11   Global Step: 188050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:22,722-Speed 9456.59 samples/sec   Loss 5.2568   LearningRate 0.0191   Epoch: 11   Global Step: 188060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:23,826-Speed 9288.67 samples/sec   Loss 5.2644   LearningRate 0.0191   Epoch: 11   Global Step: 188070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:24,913-Speed 9426.37 samples/sec   Loss 5.2680   LearningRate 0.0191   Epoch: 11   Global Step: 188080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:08:26,060-Speed 8930.19 samples/sec   Loss 5.2308   LearningRate 0.0191   Epoch: 11   Global Step: 188090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:08:27,202-Speed 8969.13 samples/sec   Loss 5.2772   LearningRate 0.0191   Epoch: 11   Global Step: 188100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:08:28,310-Speed 9246.15 samples/sec   Loss 5.2493   LearningRate 0.0191   Epoch: 11   Global Step: 188110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:08:29,390-Speed 9492.24 samples/sec   Loss 5.2727   LearningRate 0.0190   Epoch: 11   Global Step: 188120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:08:30,500-Speed 9232.50 samples/sec   Loss 5.2840   LearningRate 0.0190   Epoch: 11   Global Step: 188130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:08:31,566-Speed 9604.94 samples/sec   Loss 5.2052   LearningRate 0.0190   Epoch: 11   Global Step: 188140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:08:32,691-Speed 9107.93 samples/sec   Loss 5.2507   LearningRate 0.0190   Epoch: 11   Global Step: 188150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:08:33,791-Speed 9317.26 samples/sec   Loss 5.2226   LearningRate 0.0190   Epoch: 11   Global Step: 188160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:08:34,925-Speed 9035.37 samples/sec   Loss 5.2611   LearningRate 0.0190   Epoch: 11   Global Step: 188170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:08:36,045-Speed 9148.53 samples/sec   Loss 5.3138   LearningRate 0.0190   Epoch: 11   Global Step: 188180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:37,136-Speed 9390.51 samples/sec   Loss 5.3163   LearningRate 0.0190   Epoch: 11   Global Step: 188190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:38,215-Speed 9497.57 samples/sec   Loss 5.2142   LearningRate 0.0190   Epoch: 11   Global Step: 188200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:39,317-Speed 9300.12 samples/sec   Loss 5.2320   LearningRate 0.0190   Epoch: 11   Global Step: 188210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:40,411-Speed 9372.45 samples/sec   Loss 5.2460   LearningRate 0.0190   Epoch: 11   Global Step: 188220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:41,511-Speed 9306.97 samples/sec   Loss 5.2143   LearningRate 0.0190   Epoch: 11   Global Step: 188230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:42,610-Speed 9324.31 samples/sec   Loss 5.2117   LearningRate 0.0190   Epoch: 11   Global Step: 188240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:43,741-Speed 9065.67 samples/sec   Loss 5.2941   LearningRate 0.0190   Epoch: 11   Global Step: 188250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:44,888-Speed 8927.23 samples/sec   Loss 5.2074   LearningRate 0.0190   Epoch: 11   Global Step: 188260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:45,981-Speed 9380.03 samples/sec   Loss 5.3042   LearningRate 0.0190   Epoch: 11   Global Step: 188270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:47,038-Speed 9691.26 samples/sec   Loss 5.3014   LearningRate 0.0190   Epoch: 11   Global Step: 188280   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:08:48,104-Speed 9606.01 samples/sec   Loss 5.2938   LearningRate 0.0190   Epoch: 11   Global Step: 188290   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:08:49,250-Speed 8943.46 samples/sec   Loss 5.2320   LearningRate 0.0190   Epoch: 11   Global Step: 188300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:50,322-Speed 9560.02 samples/sec   Loss 5.2842   LearningRate 0.0190   Epoch: 11   Global Step: 188310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:51,395-Speed 9546.87 samples/sec   Loss 5.2239   LearningRate 0.0190   Epoch: 11   Global Step: 188320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:52,458-Speed 9641.20 samples/sec   Loss 5.2412   LearningRate 0.0190   Epoch: 11   Global Step: 188330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:53,577-Speed 9158.61 samples/sec   Loss 5.2793   LearningRate 0.0190   Epoch: 11   Global Step: 188340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:54,660-Speed 9457.95 samples/sec   Loss 5.1736   LearningRate 0.0190   Epoch: 11   Global Step: 188350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:55,703-Speed 9820.50 samples/sec   Loss 5.3358   LearningRate 0.0190   Epoch: 11   Global Step: 188360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:56,803-Speed 9319.75 samples/sec   Loss 5.1981   LearningRate 0.0190   Epoch: 11   Global Step: 188370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:57,857-Speed 9715.17 samples/sec   Loss 5.1990   LearningRate 0.0190   Epoch: 11   Global Step: 188380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:08:59,004-Speed 8938.26 samples/sec   Loss 5.2183   LearningRate 0.0190   Epoch: 11   Global Step: 188390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:00,071-Speed 9599.78 samples/sec   Loss 5.2290   LearningRate 0.0190   Epoch: 11   Global Step: 188400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:01,178-Speed 9254.91 samples/sec   Loss 5.2118   LearningRate 0.0190   Epoch: 11   Global Step: 188410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:02,279-Speed 9305.76 samples/sec   Loss 5.1839   LearningRate 0.0190   Epoch: 11   Global Step: 188420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:03,374-Speed 9360.56 samples/sec   Loss 5.2833   LearningRate 0.0190   Epoch: 11   Global Step: 188430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:04,468-Speed 9371.29 samples/sec   Loss 5.2405   LearningRate 0.0190   Epoch: 11   Global Step: 188440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:05,569-Speed 9301.33 samples/sec   Loss 5.2701   LearningRate 0.0190   Epoch: 11   Global Step: 188450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:06,649-Speed 9492.35 samples/sec   Loss 5.1640   LearningRate 0.0190   Epoch: 11   Global Step: 188460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:07,712-Speed 9637.39 samples/sec   Loss 5.2224   LearningRate 0.0190   Epoch: 11   Global Step: 188470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:08,832-Speed 9143.87 samples/sec   Loss 5.2517   LearningRate 0.0190   Epoch: 11   Global Step: 188480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:09,949-Speed 9182.24 samples/sec   Loss 5.2239   LearningRate 0.0190   Epoch: 11   Global Step: 188490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:11,025-Speed 9519.73 samples/sec   Loss 5.1516   LearningRate 0.0190   Epoch: 11   Global Step: 188500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:12,108-Speed 9461.26 samples/sec   Loss 5.1849   LearningRate 0.0189   Epoch: 11   Global Step: 188510   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:13,174-Speed 9606.96 samples/sec   Loss 5.2459   LearningRate 0.0189   Epoch: 11   Global Step: 188520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:14,297-Speed 9126.68 samples/sec   Loss 5.3310   LearningRate 0.0189   Epoch: 11   Global Step: 188530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:15,446-Speed 8912.20 samples/sec   Loss 5.3189   LearningRate 0.0189   Epoch: 11   Global Step: 188540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:16,598-Speed 8898.54 samples/sec   Loss 5.2676   LearningRate 0.0189   Epoch: 11   Global Step: 188550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:17,738-Speed 8985.22 samples/sec   Loss 5.3226   LearningRate 0.0189   Epoch: 11   Global Step: 188560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:18,839-Speed 9305.69 samples/sec   Loss 5.2121   LearningRate 0.0189   Epoch: 11   Global Step: 188570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:19,934-Speed 9358.01 samples/sec   Loss 5.3260   LearningRate 0.0189   Epoch: 11   Global Step: 188580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:21,017-Speed 9458.96 samples/sec   Loss 5.2246   LearningRate 0.0189   Epoch: 11   Global Step: 188590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:22,115-Speed 9338.61 samples/sec   Loss 5.2754   LearningRate 0.0189   Epoch: 11   Global Step: 188600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:23,209-Speed 9361.51 samples/sec   Loss 5.1984   LearningRate 0.0189   Epoch: 11   Global Step: 188610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:24,312-Speed 9292.44 samples/sec   Loss 5.2842   LearningRate 0.0189   Epoch: 11   Global Step: 188620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:25,416-Speed 9281.22 samples/sec   Loss 5.4052   LearningRate 0.0189   Epoch: 11   Global Step: 188630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:26,562-Speed 8935.93 samples/sec   Loss 5.2235   LearningRate 0.0189   Epoch: 11   Global Step: 188640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:27,664-Speed 9304.60 samples/sec   Loss 5.1447   LearningRate 0.0189   Epoch: 11   Global Step: 188650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:28,770-Speed 9261.17 samples/sec   Loss 5.2056   LearningRate 0.0189   Epoch: 11   Global Step: 188660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:29,875-Speed 9272.24 samples/sec   Loss 5.2808   LearningRate 0.0189   Epoch: 11   Global Step: 188670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:30,959-Speed 9455.40 samples/sec   Loss 5.1513   LearningRate 0.0189   Epoch: 11   Global Step: 188680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:32,051-Speed 9379.23 samples/sec   Loss 5.3337   LearningRate 0.0189   Epoch: 11   Global Step: 188690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:33,123-Speed 9558.41 samples/sec   Loss 5.1918   LearningRate 0.0189   Epoch: 11   Global Step: 188700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:34,272-Speed 8923.60 samples/sec   Loss 5.1939   LearningRate 0.0189   Epoch: 11   Global Step: 188710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:35,326-Speed 9712.90 samples/sec   Loss 5.2718   LearningRate 0.0189   Epoch: 11   Global Step: 188720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:36,411-Speed 9443.68 samples/sec   Loss 5.2465   LearningRate 0.0189   Epoch: 11   Global Step: 188730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:37,516-Speed 9272.62 samples/sec   Loss 5.2918   LearningRate 0.0189   Epoch: 11   Global Step: 188740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:38,653-Speed 9016.63 samples/sec   Loss 5.3135   LearningRate 0.0189   Epoch: 11   Global Step: 188750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:39,782-Speed 9080.58 samples/sec   Loss 5.4141   LearningRate 0.0189   Epoch: 11   Global Step: 188760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:40,928-Speed 8938.64 samples/sec   Loss 5.3203   LearningRate 0.0189   Epoch: 11   Global Step: 188770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:42,017-Speed 9410.65 samples/sec   Loss 5.1622   LearningRate 0.0189   Epoch: 11   Global Step: 188780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:43,110-Speed 9380.33 samples/sec   Loss 5.2319   LearningRate 0.0189   Epoch: 11   Global Step: 188790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:44,226-Speed 9174.07 samples/sec   Loss 5.3614   LearningRate 0.0189   Epoch: 11   Global Step: 188800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:45,327-Speed 9311.31 samples/sec   Loss 5.3573   LearningRate 0.0189   Epoch: 11   Global Step: 188810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:46,406-Speed 9496.65 samples/sec   Loss 5.2538   LearningRate 0.0189   Epoch: 11   Global Step: 188820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:47,527-Speed 9136.64 samples/sec   Loss 5.2112   LearningRate 0.0189   Epoch: 11   Global Step: 188830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:48,690-Speed 8813.19 samples/sec   Loss 5.2966   LearningRate 0.0189   Epoch: 11   Global Step: 188840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:49,787-Speed 9339.07 samples/sec   Loss 5.2452   LearningRate 0.0189   Epoch: 11   Global Step: 188850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:50,851-Speed 9625.20 samples/sec   Loss 5.3001   LearningRate 0.0189   Epoch: 11   Global Step: 188860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:51,941-Speed 9403.61 samples/sec   Loss 5.3303   LearningRate 0.0189   Epoch: 11   Global Step: 188870   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:53,031-Speed 9403.51 samples/sec   Loss 5.2718   LearningRate 0.0189   Epoch: 11   Global Step: 188880   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:54,135-Speed 9277.92 samples/sec   Loss 5.2866   LearningRate 0.0188   Epoch: 11   Global Step: 188890   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:09:55,228-Speed 9370.61 samples/sec   Loss 5.1316   LearningRate 0.0188   Epoch: 11   Global Step: 188900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:56,324-Speed 9353.81 samples/sec   Loss 5.2246   LearningRate 0.0188   Epoch: 11   Global Step: 188910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:57,425-Speed 9302.07 samples/sec   Loss 5.3523   LearningRate 0.0188   Epoch: 11   Global Step: 188920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:58,543-Speed 9162.05 samples/sec   Loss 5.2675   LearningRate 0.0188   Epoch: 11   Global Step: 188930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:09:59,632-Speed 9413.36 samples/sec   Loss 5.2385   LearningRate 0.0188   Epoch: 11   Global Step: 188940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:00,693-Speed 9657.80 samples/sec   Loss 5.2606   LearningRate 0.0188   Epoch: 11   Global Step: 188950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:01,769-Speed 9522.60 samples/sec   Loss 5.3541   LearningRate 0.0188   Epoch: 11   Global Step: 188960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:02,829-Speed 9662.79 samples/sec   Loss 5.2803   LearningRate 0.0188   Epoch: 11   Global Step: 188970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:03,892-Speed 9642.34 samples/sec   Loss 5.2778   LearningRate 0.0188   Epoch: 11   Global Step: 188980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:04,930-Speed 9877.88 samples/sec   Loss 5.3208   LearningRate 0.0188   Epoch: 11   Global Step: 188990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:06,007-Speed 9510.07 samples/sec   Loss 5.1922   LearningRate 0.0188   Epoch: 11   Global Step: 189000   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:10:07,101-Speed 9368.68 samples/sec   Loss 5.2578   LearningRate 0.0188   Epoch: 11   Global Step: 189010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:08,172-Speed 9559.15 samples/sec   Loss 5.2504   LearningRate 0.0188   Epoch: 11   Global Step: 189020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:09,231-Speed 9678.29 samples/sec   Loss 5.3580   LearningRate 0.0188   Epoch: 11   Global Step: 189030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:10,312-Speed 9478.43 samples/sec   Loss 5.2344   LearningRate 0.0188   Epoch: 11   Global Step: 189040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:11,393-Speed 9480.29 samples/sec   Loss 5.2877   LearningRate 0.0188   Epoch: 11   Global Step: 189050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:12,448-Speed 9711.81 samples/sec   Loss 5.4023   LearningRate 0.0188   Epoch: 11   Global Step: 189060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:13,532-Speed 9451.87 samples/sec   Loss 5.4113   LearningRate 0.0188   Epoch: 11   Global Step: 189070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:14,674-Speed 8970.10 samples/sec   Loss 5.2168   LearningRate 0.0188   Epoch: 11   Global Step: 189080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:15,769-Speed 9358.59 samples/sec   Loss 5.2855   LearningRate 0.0188   Epoch: 11   Global Step: 189090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:16,875-Speed 9264.08 samples/sec   Loss 5.1798   LearningRate 0.0188   Epoch: 11   Global Step: 189100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:18,020-Speed 8954.29 samples/sec   Loss 5.1877   LearningRate 0.0188   Epoch: 11   Global Step: 189110   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:10:19,101-Speed 9478.93 samples/sec   Loss 5.2395   LearningRate 0.0188   Epoch: 11   Global Step: 189120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:20,174-Speed 9547.61 samples/sec   Loss 5.2987   LearningRate 0.0188   Epoch: 11   Global Step: 189130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:21,223-Speed 9765.93 samples/sec   Loss 5.2852   LearningRate 0.0188   Epoch: 11   Global Step: 189140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:22,308-Speed 9443.94 samples/sec   Loss 5.3209   LearningRate 0.0188   Epoch: 11   Global Step: 189150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:23,444-Speed 9021.28 samples/sec   Loss 5.2837   LearningRate 0.0188   Epoch: 11   Global Step: 189160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:24,548-Speed 9275.96 samples/sec   Loss 5.4039   LearningRate 0.0188   Epoch: 11   Global Step: 189170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:25,630-Speed 9466.36 samples/sec   Loss 5.3550   LearningRate 0.0188   Epoch: 11   Global Step: 189180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:26,721-Speed 9397.05 samples/sec   Loss 5.3098   LearningRate 0.0188   Epoch: 11   Global Step: 189190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:27,848-Speed 9091.18 samples/sec   Loss 5.2405   LearningRate 0.0188   Epoch: 11   Global Step: 189200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:28,930-Speed 9463.97 samples/sec   Loss 5.2910   LearningRate 0.0188   Epoch: 11   Global Step: 189210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:30,002-Speed 9563.16 samples/sec   Loss 5.2805   LearningRate 0.0188   Epoch: 11   Global Step: 189220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:31,076-Speed 9541.72 samples/sec   Loss 5.2957   LearningRate 0.0188   Epoch: 11   Global Step: 189230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:32,177-Speed 9304.06 samples/sec   Loss 5.3087   LearningRate 0.0188   Epoch: 11   Global Step: 189240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:33,246-Speed 9592.32 samples/sec   Loss 5.4194   LearningRate 0.0188   Epoch: 11   Global Step: 189250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:34,357-Speed 9219.84 samples/sec   Loss 5.3532   LearningRate 0.0188   Epoch: 11   Global Step: 189260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:35,481-Speed 9119.08 samples/sec   Loss 5.3806   LearningRate 0.0188   Epoch: 11   Global Step: 189270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:36,579-Speed 9333.60 samples/sec   Loss 5.3111   LearningRate 0.0187   Epoch: 11   Global Step: 189280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:37,659-Speed 9487.97 samples/sec   Loss 5.2096   LearningRate 0.0187   Epoch: 11   Global Step: 189290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:38,702-Speed 9824.76 samples/sec   Loss 5.3095   LearningRate 0.0187   Epoch: 11   Global Step: 189300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:39,793-Speed 9396.25 samples/sec   Loss 5.2635   LearningRate 0.0187   Epoch: 11   Global Step: 189310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:40,845-Speed 9739.32 samples/sec   Loss 5.3474   LearningRate 0.0187   Epoch: 11   Global Step: 189320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:41,927-Speed 9467.15 samples/sec   Loss 5.2220   LearningRate 0.0187   Epoch: 11   Global Step: 189330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:43,029-Speed 9296.90 samples/sec   Loss 5.3127   LearningRate 0.0187   Epoch: 11   Global Step: 189340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:44,092-Speed 9644.45 samples/sec   Loss 5.2431   LearningRate 0.0187   Epoch: 11   Global Step: 189350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:45,175-Speed 9451.88 samples/sec   Loss 5.2524   LearningRate 0.0187   Epoch: 11   Global Step: 189360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:46,261-Speed 9434.98 samples/sec   Loss 5.3067   LearningRate 0.0187   Epoch: 11   Global Step: 189370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:47,336-Speed 9531.02 samples/sec   Loss 5.2776   LearningRate 0.0187   Epoch: 11   Global Step: 189380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:48,420-Speed 9456.00 samples/sec   Loss 5.3049   LearningRate 0.0187   Epoch: 11   Global Step: 189390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:49,479-Speed 9675.84 samples/sec   Loss 5.3005   LearningRate 0.0187   Epoch: 11   Global Step: 189400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:50,547-Speed 9589.57 samples/sec   Loss 5.3175   LearningRate 0.0187   Epoch: 11   Global Step: 189410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:51,668-Speed 9141.47 samples/sec   Loss 5.3716   LearningRate 0.0187   Epoch: 11   Global Step: 189420   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:10:52,763-Speed 9361.48 samples/sec   Loss 5.1858   LearningRate 0.0187   Epoch: 11   Global Step: 189430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:53,810-Speed 9788.02 samples/sec   Loss 5.3559   LearningRate 0.0187   Epoch: 11   Global Step: 189440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:54,921-Speed 9218.00 samples/sec   Loss 5.3972   LearningRate 0.0187   Epoch: 11   Global Step: 189450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:56,017-Speed 9355.47 samples/sec   Loss 5.2252   LearningRate 0.0187   Epoch: 11   Global Step: 189460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:57,124-Speed 9255.63 samples/sec   Loss 5.3120   LearningRate 0.0187   Epoch: 11   Global Step: 189470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:58,200-Speed 9518.00 samples/sec   Loss 5.2938   LearningRate 0.0187   Epoch: 11   Global Step: 189480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:10:59,268-Speed 9591.84 samples/sec   Loss 5.3452   LearningRate 0.0187   Epoch: 11   Global Step: 189490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:00,353-Speed 9443.32 samples/sec   Loss 5.3469   LearningRate 0.0187   Epoch: 11   Global Step: 189500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:01,429-Speed 9522.82 samples/sec   Loss 5.3267   LearningRate 0.0187   Epoch: 11   Global Step: 189510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:02,542-Speed 9206.65 samples/sec   Loss 5.3377   LearningRate 0.0187   Epoch: 11   Global Step: 189520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:03,662-Speed 9149.92 samples/sec   Loss 5.2778   LearningRate 0.0187   Epoch: 11   Global Step: 189530   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:11:04,728-Speed 9612.12 samples/sec   Loss 5.3181   LearningRate 0.0187   Epoch: 11   Global Step: 189540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:05,834-Speed 9267.27 samples/sec   Loss 5.4650   LearningRate 0.0187   Epoch: 11   Global Step: 189550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:06,915-Speed 9474.58 samples/sec   Loss 5.3229   LearningRate 0.0187   Epoch: 11   Global Step: 189560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:07,999-Speed 9449.01 samples/sec   Loss 5.3104   LearningRate 0.0187   Epoch: 11   Global Step: 189570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:09,086-Speed 9430.21 samples/sec   Loss 5.2404   LearningRate 0.0187   Epoch: 11   Global Step: 189580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:10,148-Speed 9651.04 samples/sec   Loss 5.2402   LearningRate 0.0187   Epoch: 11   Global Step: 189590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:11,234-Speed 9433.44 samples/sec   Loss 5.2943   LearningRate 0.0187   Epoch: 11   Global Step: 189600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:12,312-Speed 9497.69 samples/sec   Loss 5.3422   LearningRate 0.0187   Epoch: 11   Global Step: 189610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:13,429-Speed 9181.53 samples/sec   Loss 5.2363   LearningRate 0.0187   Epoch: 11   Global Step: 189620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:14,483-Speed 9715.98 samples/sec   Loss 5.2145   LearningRate 0.0187   Epoch: 11   Global Step: 189630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:15,536-Speed 9734.15 samples/sec   Loss 5.2122   LearningRate 0.0187   Epoch: 11   Global Step: 189640   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:11:16,585-Speed 9767.92 samples/sec   Loss 5.2068   LearningRate 0.0187   Epoch: 11   Global Step: 189650   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:11:17,715-Speed 9064.86 samples/sec   Loss 5.3200   LearningRate 0.0186   Epoch: 11   Global Step: 189660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:18,790-Speed 9527.29 samples/sec   Loss 5.2646   LearningRate 0.0186   Epoch: 11   Global Step: 189670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:19,843-Speed 9731.73 samples/sec   Loss 5.3346   LearningRate 0.0186   Epoch: 11   Global Step: 189680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:20,886-Speed 9828.66 samples/sec   Loss 5.2698   LearningRate 0.0186   Epoch: 11   Global Step: 189690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:21,960-Speed 9539.50 samples/sec   Loss 5.2026   LearningRate 0.0186   Epoch: 11   Global Step: 189700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:23,058-Speed 9332.36 samples/sec   Loss 5.3650   LearningRate 0.0186   Epoch: 11   Global Step: 189710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:24,198-Speed 8983.81 samples/sec   Loss 5.2578   LearningRate 0.0186   Epoch: 11   Global Step: 189720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:25,356-Speed 8851.72 samples/sec   Loss 5.3737   LearningRate 0.0186   Epoch: 11   Global Step: 189730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:26,429-Speed 9546.08 samples/sec   Loss 5.3047   LearningRate 0.0186   Epoch: 11   Global Step: 189740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:27,540-Speed 9227.16 samples/sec   Loss 5.1863   LearningRate 0.0186   Epoch: 11   Global Step: 189750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:28,670-Speed 9061.45 samples/sec   Loss 5.3314   LearningRate 0.0186   Epoch: 11   Global Step: 189760   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:11:29,788-Speed 9166.59 samples/sec   Loss 5.2887   LearningRate 0.0186   Epoch: 11   Global Step: 189770   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:11:30,853-Speed 9617.82 samples/sec   Loss 5.2092   LearningRate 0.0186   Epoch: 11   Global Step: 189780   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:11:31,925-Speed 9568.63 samples/sec   Loss 5.3269   LearningRate 0.0186   Epoch: 11   Global Step: 189790   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:11:33,016-Speed 9389.16 samples/sec   Loss 5.1980   LearningRate 0.0186   Epoch: 11   Global Step: 189800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:34,134-Speed 9174.79 samples/sec   Loss 5.3936   LearningRate 0.0186   Epoch: 11   Global Step: 189810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:35,208-Speed 9532.40 samples/sec   Loss 5.3672   LearningRate 0.0186   Epoch: 11   Global Step: 189820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:36,293-Speed 9446.68 samples/sec   Loss 5.3191   LearningRate 0.0186   Epoch: 11   Global Step: 189830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:37,396-Speed 9292.16 samples/sec   Loss 5.2029   LearningRate 0.0186   Epoch: 11   Global Step: 189840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:38,480-Speed 9450.93 samples/sec   Loss 5.2784   LearningRate 0.0186   Epoch: 11   Global Step: 189850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:39,576-Speed 9347.14 samples/sec   Loss 5.2437   LearningRate 0.0186   Epoch: 11   Global Step: 189860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:40,695-Speed 9162.46 samples/sec   Loss 5.3162   LearningRate 0.0186   Epoch: 11   Global Step: 189870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:41,772-Speed 9506.25 samples/sec   Loss 5.3530   LearningRate 0.0186   Epoch: 11   Global Step: 189880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:42,856-Speed 9455.13 samples/sec   Loss 5.3523   LearningRate 0.0186   Epoch: 11   Global Step: 189890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:43,890-Speed 9906.34 samples/sec   Loss 5.3453   LearningRate 0.0186   Epoch: 11   Global Step: 189900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:45,034-Speed 8969.70 samples/sec   Loss 5.2741   LearningRate 0.0186   Epoch: 11   Global Step: 189910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:46,091-Speed 9690.94 samples/sec   Loss 5.2728   LearningRate 0.0186   Epoch: 11   Global Step: 189920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:47,168-Speed 9510.86 samples/sec   Loss 5.2408   LearningRate 0.0186   Epoch: 11   Global Step: 189930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:48,262-Speed 9364.42 samples/sec   Loss 5.3175   LearningRate 0.0186   Epoch: 11   Global Step: 189940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:49,343-Speed 9475.01 samples/sec   Loss 5.2981   LearningRate 0.0186   Epoch: 11   Global Step: 189950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:50,446-Speed 9293.99 samples/sec   Loss 5.3043   LearningRate 0.0186   Epoch: 11   Global Step: 189960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:51,508-Speed 9647.08 samples/sec   Loss 5.3073   LearningRate 0.0186   Epoch: 11   Global Step: 189970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:52,626-Speed 9168.84 samples/sec   Loss 5.2534   LearningRate 0.0186   Epoch: 11   Global Step: 189980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:53,685-Speed 9673.65 samples/sec   Loss 5.2781   LearningRate 0.0186   Epoch: 11   Global Step: 189990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:11:54,759-Speed 9541.81 samples/sec   Loss 5.3541   LearningRate 0.0186   Epoch: 11   Global Step: 190000   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:12:16,931-[lfw][190000]XNorm: 8.862163
Training: 2022-04-11 19:12:16,932-[lfw][190000]Accuracy-Flip: 0.99633+-0.00287
Training: 2022-04-11 19:12:16,933-[lfw][190000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:12:42,341-[cfp_fp][190000]XNorm: 7.633612
Training: 2022-04-11 19:12:42,342-[cfp_fp][190000]Accuracy-Flip: 0.96400+-0.01051
Training: 2022-04-11 19:12:42,342-[cfp_fp][190000]Accuracy-Highest: 0.96714
Training: 2022-04-11 19:13:04,211-[agedb_30][190000]XNorm: 8.639777
Training: 2022-04-11 19:13:04,212-[agedb_30][190000]Accuracy-Flip: 0.96600+-0.00943
Training: 2022-04-11 19:13:04,212-[agedb_30][190000]Accuracy-Highest: 0.96917
Training: 2022-04-11 19:13:05,282-Speed 145.20 samples/sec   Loss 5.3711   LearningRate 0.0186   Epoch: 11   Global Step: 190010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:06,338-Speed 9701.50 samples/sec   Loss 5.3241   LearningRate 0.0186   Epoch: 11   Global Step: 190020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:07,384-Speed 9796.73 samples/sec   Loss 5.3147   LearningRate 0.0186   Epoch: 11   Global Step: 190030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:08,468-Speed 9450.17 samples/sec   Loss 5.4267   LearningRate 0.0186   Epoch: 11   Global Step: 190040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:09,545-Speed 9513.05 samples/sec   Loss 5.3927   LearningRate 0.0185   Epoch: 11   Global Step: 190050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:10,683-Speed 9004.15 samples/sec   Loss 5.3012   LearningRate 0.0185   Epoch: 11   Global Step: 190060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:11,809-Speed 9107.54 samples/sec   Loss 5.2620   LearningRate 0.0185   Epoch: 11   Global Step: 190070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:12,895-Speed 9430.85 samples/sec   Loss 5.2355   LearningRate 0.0185   Epoch: 11   Global Step: 190080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:13,972-Speed 9512.34 samples/sec   Loss 5.3131   LearningRate 0.0185   Epoch: 11   Global Step: 190090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:15,055-Speed 9466.48 samples/sec   Loss 5.3228   LearningRate 0.0185   Epoch: 11   Global Step: 190100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:16,139-Speed 9451.88 samples/sec   Loss 5.3350   LearningRate 0.0185   Epoch: 11   Global Step: 190110   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:13:17,204-Speed 9618.85 samples/sec   Loss 5.2680   LearningRate 0.0185   Epoch: 11   Global Step: 190120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:18,320-Speed 9180.12 samples/sec   Loss 5.2702   LearningRate 0.0185   Epoch: 11   Global Step: 190130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:19,432-Speed 9209.08 samples/sec   Loss 5.2480   LearningRate 0.0185   Epoch: 11   Global Step: 190140   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:20,523-Speed 9398.40 samples/sec   Loss 5.3839   LearningRate 0.0185   Epoch: 11   Global Step: 190150   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:21,562-Speed 9867.87 samples/sec   Loss 5.3831   LearningRate 0.0185   Epoch: 11   Global Step: 190160   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:22,658-Speed 9341.64 samples/sec   Loss 5.3353   LearningRate 0.0185   Epoch: 11   Global Step: 190170   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:23,720-Speed 9653.74 samples/sec   Loss 5.2403   LearningRate 0.0185   Epoch: 11   Global Step: 190180   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:24,799-Speed 9493.04 samples/sec   Loss 5.2332   LearningRate 0.0185   Epoch: 11   Global Step: 190190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:25,880-Speed 9475.03 samples/sec   Loss 5.3316   LearningRate 0.0185   Epoch: 11   Global Step: 190200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:26,964-Speed 9455.59 samples/sec   Loss 5.3982   LearningRate 0.0185   Epoch: 11   Global Step: 190210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:28,078-Speed 9201.18 samples/sec   Loss 5.2947   LearningRate 0.0185   Epoch: 11   Global Step: 190220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:29,161-Speed 9461.23 samples/sec   Loss 5.2630   LearningRate 0.0185   Epoch: 11   Global Step: 190230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:30,268-Speed 9256.65 samples/sec   Loss 5.2325   LearningRate 0.0185   Epoch: 11   Global Step: 190240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:31,390-Speed 9128.47 samples/sec   Loss 5.3332   LearningRate 0.0185   Epoch: 11   Global Step: 190250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:32,468-Speed 9510.58 samples/sec   Loss 5.2981   LearningRate 0.0185   Epoch: 11   Global Step: 190260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:33,519-Speed 9745.69 samples/sec   Loss 5.2920   LearningRate 0.0185   Epoch: 11   Global Step: 190270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:34,569-Speed 9754.26 samples/sec   Loss 5.2880   LearningRate 0.0185   Epoch: 11   Global Step: 190280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:35,659-Speed 9400.47 samples/sec   Loss 5.3440   LearningRate 0.0185   Epoch: 11   Global Step: 190290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:36,767-Speed 9247.56 samples/sec   Loss 5.3224   LearningRate 0.0185   Epoch: 11   Global Step: 190300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:37,868-Speed 9310.38 samples/sec   Loss 5.4176   LearningRate 0.0185   Epoch: 11   Global Step: 190310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:38,940-Speed 9553.36 samples/sec   Loss 5.2955   LearningRate 0.0185   Epoch: 11   Global Step: 190320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:40,013-Speed 9553.21 samples/sec   Loss 5.3723   LearningRate 0.0185   Epoch: 11   Global Step: 190330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:41,047-Speed 9911.52 samples/sec   Loss 5.3527   LearningRate 0.0185   Epoch: 11   Global Step: 190340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:42,123-Speed 9523.88 samples/sec   Loss 5.3577   LearningRate 0.0185   Epoch: 11   Global Step: 190350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:43,221-Speed 9326.90 samples/sec   Loss 5.3303   LearningRate 0.0185   Epoch: 11   Global Step: 190360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:44,296-Speed 9536.42 samples/sec   Loss 5.2386   LearningRate 0.0185   Epoch: 11   Global Step: 190370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:45,357-Speed 9649.11 samples/sec   Loss 5.3580   LearningRate 0.0185   Epoch: 11   Global Step: 190380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:46,430-Speed 9554.90 samples/sec   Loss 5.3189   LearningRate 0.0185   Epoch: 11   Global Step: 190390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:47,511-Speed 9474.70 samples/sec   Loss 5.2921   LearningRate 0.0185   Epoch: 11   Global Step: 190400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:48,625-Speed 9201.54 samples/sec   Loss 5.3715   LearningRate 0.0185   Epoch: 11   Global Step: 190410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:49,719-Speed 9360.86 samples/sec   Loss 5.1784   LearningRate 0.0185   Epoch: 11   Global Step: 190420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:50,799-Speed 9483.76 samples/sec   Loss 5.2798   LearningRate 0.0185   Epoch: 11   Global Step: 190430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:13:51,892-Speed 9377.99 samples/sec   Loss 5.2741   LearningRate 0.0184   Epoch: 11   Global Step: 190440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:52,994-Speed 9294.68 samples/sec   Loss 5.3282   LearningRate 0.0184   Epoch: 11   Global Step: 190450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:54,117-Speed 9129.54 samples/sec   Loss 5.2974   LearningRate 0.0184   Epoch: 11   Global Step: 190460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:55,166-Speed 9768.80 samples/sec   Loss 5.3205   LearningRate 0.0184   Epoch: 11   Global Step: 190470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:56,247-Speed 9477.72 samples/sec   Loss 5.2386   LearningRate 0.0184   Epoch: 11   Global Step: 190480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:57,440-Speed 8588.23 samples/sec   Loss 5.2077   LearningRate 0.0184   Epoch: 11   Global Step: 190490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:58,555-Speed 9187.24 samples/sec   Loss 5.2158   LearningRate 0.0184   Epoch: 11   Global Step: 190500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:13:59,631-Speed 9528.33 samples/sec   Loss 5.2393   LearningRate 0.0184   Epoch: 11   Global Step: 190510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:00,696-Speed 9621.21 samples/sec   Loss 5.2125   LearningRate 0.0184   Epoch: 11   Global Step: 190520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:01,797-Speed 9303.74 samples/sec   Loss 5.2676   LearningRate 0.0184   Epoch: 11   Global Step: 190530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:02,872-Speed 9532.49 samples/sec   Loss 5.3209   LearningRate 0.0184   Epoch: 11   Global Step: 190540   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:14:03,962-Speed 9391.73 samples/sec   Loss 5.3756   LearningRate 0.0184   Epoch: 11   Global Step: 190550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:05,045-Speed 9465.83 samples/sec   Loss 5.3535   LearningRate 0.0184   Epoch: 11   Global Step: 190560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:06,102-Speed 9697.13 samples/sec   Loss 5.1801   LearningRate 0.0184   Epoch: 11   Global Step: 190570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:07,191-Speed 9406.82 samples/sec   Loss 5.2848   LearningRate 0.0184   Epoch: 11   Global Step: 190580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:08,299-Speed 9247.73 samples/sec   Loss 5.2394   LearningRate 0.0184   Epoch: 11   Global Step: 190590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:09,394-Speed 9355.72 samples/sec   Loss 5.2571   LearningRate 0.0184   Epoch: 11   Global Step: 190600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:10,472-Speed 9499.90 samples/sec   Loss 5.3032   LearningRate 0.0184   Epoch: 11   Global Step: 190610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:11,554-Speed 9480.06 samples/sec   Loss 5.3561   LearningRate 0.0184   Epoch: 11   Global Step: 190620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:12,644-Speed 9393.38 samples/sec   Loss 5.3462   LearningRate 0.0184   Epoch: 11   Global Step: 190630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:13,753-Speed 9242.86 samples/sec   Loss 5.2984   LearningRate 0.0184   Epoch: 11   Global Step: 190640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:14,848-Speed 9359.08 samples/sec   Loss 5.2770   LearningRate 0.0184   Epoch: 11   Global Step: 190650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:15,940-Speed 9384.04 samples/sec   Loss 5.2776   LearningRate 0.0184   Epoch: 11   Global Step: 190660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:17,015-Speed 9529.30 samples/sec   Loss 5.2890   LearningRate 0.0184   Epoch: 11   Global Step: 190670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:18,073-Speed 9684.11 samples/sec   Loss 5.2835   LearningRate 0.0184   Epoch: 11   Global Step: 190680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:19,191-Speed 9170.01 samples/sec   Loss 5.3129   LearningRate 0.0184   Epoch: 11   Global Step: 190690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:20,265-Speed 9532.82 samples/sec   Loss 5.2844   LearningRate 0.0184   Epoch: 11   Global Step: 190700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:21,336-Speed 9571.03 samples/sec   Loss 5.3751   LearningRate 0.0184   Epoch: 11   Global Step: 190710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:22,393-Speed 9697.85 samples/sec   Loss 5.3112   LearningRate 0.0184   Epoch: 11   Global Step: 190720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:23,465-Speed 9559.02 samples/sec   Loss 5.3300   LearningRate 0.0184   Epoch: 11   Global Step: 190730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:24,535-Speed 9575.63 samples/sec   Loss 5.4223   LearningRate 0.0184   Epoch: 11   Global Step: 190740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:25,637-Speed 9292.73 samples/sec   Loss 5.2487   LearningRate 0.0184   Epoch: 11   Global Step: 190750   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:14:26,683-Speed 9796.05 samples/sec   Loss 5.4037   LearningRate 0.0184   Epoch: 11   Global Step: 190760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:27,796-Speed 9205.73 samples/sec   Loss 5.2982   LearningRate 0.0184   Epoch: 11   Global Step: 190770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:28,902-Speed 9270.80 samples/sec   Loss 5.3087   LearningRate 0.0184   Epoch: 11   Global Step: 190780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:29,987-Speed 9439.85 samples/sec   Loss 5.2909   LearningRate 0.0184   Epoch: 11   Global Step: 190790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:31,081-Speed 9363.18 samples/sec   Loss 5.3069   LearningRate 0.0184   Epoch: 11   Global Step: 190800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:32,183-Speed 9298.93 samples/sec   Loss 5.4106   LearningRate 0.0184   Epoch: 11   Global Step: 190810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:33,271-Speed 9423.73 samples/sec   Loss 5.3226   LearningRate 0.0184   Epoch: 11   Global Step: 190820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:34,336-Speed 9623.47 samples/sec   Loss 5.2384   LearningRate 0.0183   Epoch: 11   Global Step: 190830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:35,414-Speed 9503.39 samples/sec   Loss 5.3702   LearningRate 0.0183   Epoch: 11   Global Step: 190840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:36,480-Speed 9609.54 samples/sec   Loss 5.2699   LearningRate 0.0183   Epoch: 11   Global Step: 190850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:37,533-Speed 9733.65 samples/sec   Loss 5.2772   LearningRate 0.0183   Epoch: 11   Global Step: 190860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:38,589-Speed 9709.63 samples/sec   Loss 5.3400   LearningRate 0.0183   Epoch: 11   Global Step: 190870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:39,678-Speed 9413.53 samples/sec   Loss 5.3660   LearningRate 0.0183   Epoch: 11   Global Step: 190880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:40,784-Speed 9263.52 samples/sec   Loss 5.3965   LearningRate 0.0183   Epoch: 11   Global Step: 190890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:41,873-Speed 9410.93 samples/sec   Loss 5.2483   LearningRate 0.0183   Epoch: 11   Global Step: 190900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:42,926-Speed 9727.43 samples/sec   Loss 5.2608   LearningRate 0.0183   Epoch: 11   Global Step: 190910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:44,029-Speed 9293.99 samples/sec   Loss 5.4057   LearningRate 0.0183   Epoch: 11   Global Step: 190920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:45,149-Speed 9143.19 samples/sec   Loss 5.3412   LearningRate 0.0183   Epoch: 11   Global Step: 190930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:46,218-Speed 9585.69 samples/sec   Loss 5.2824   LearningRate 0.0183   Epoch: 11   Global Step: 190940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:47,310-Speed 9378.80 samples/sec   Loss 5.3598   LearningRate 0.0183   Epoch: 11   Global Step: 190950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:48,405-Speed 9360.44 samples/sec   Loss 5.3369   LearningRate 0.0183   Epoch: 11   Global Step: 190960   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:14:49,546-Speed 8978.36 samples/sec   Loss 5.3622   LearningRate 0.0183   Epoch: 11   Global Step: 190970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:50,606-Speed 9667.74 samples/sec   Loss 5.2984   LearningRate 0.0183   Epoch: 11   Global Step: 190980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:51,734-Speed 9089.68 samples/sec   Loss 5.3086   LearningRate 0.0183   Epoch: 11   Global Step: 190990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:52,833-Speed 9324.92 samples/sec   Loss 5.2995   LearningRate 0.0183   Epoch: 11   Global Step: 191000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:53,987-Speed 8878.23 samples/sec   Loss 5.2706   LearningRate 0.0183   Epoch: 11   Global Step: 191010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:55,073-Speed 9428.93 samples/sec   Loss 5.3972   LearningRate 0.0183   Epoch: 11   Global Step: 191020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:56,192-Speed 9156.73 samples/sec   Loss 5.3439   LearningRate 0.0183   Epoch: 11   Global Step: 191030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:57,333-Speed 8981.52 samples/sec   Loss 5.2638   LearningRate 0.0183   Epoch: 11   Global Step: 191040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:58,478-Speed 8953.76 samples/sec   Loss 5.3190   LearningRate 0.0183   Epoch: 11   Global Step: 191050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:14:59,596-Speed 9165.01 samples/sec   Loss 5.2627   LearningRate 0.0183   Epoch: 11   Global Step: 191060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:00,676-Speed 9485.44 samples/sec   Loss 5.2868   LearningRate 0.0183   Epoch: 11   Global Step: 191070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:01,768-Speed 9380.53 samples/sec   Loss 5.2782   LearningRate 0.0183   Epoch: 11   Global Step: 191080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:02,886-Speed 9164.43 samples/sec   Loss 5.3503   LearningRate 0.0183   Epoch: 11   Global Step: 191090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:03,959-Speed 9543.46 samples/sec   Loss 5.3418   LearningRate 0.0183   Epoch: 11   Global Step: 191100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:05,066-Speed 9257.23 samples/sec   Loss 5.3560   LearningRate 0.0183   Epoch: 11   Global Step: 191110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:06,165-Speed 9327.70 samples/sec   Loss 5.2957   LearningRate 0.0183   Epoch: 11   Global Step: 191120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:07,244-Speed 9489.74 samples/sec   Loss 5.3684   LearningRate 0.0183   Epoch: 11   Global Step: 191130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:08,307-Speed 9638.94 samples/sec   Loss 5.3203   LearningRate 0.0183   Epoch: 11   Global Step: 191140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:09,367-Speed 9669.03 samples/sec   Loss 5.3671   LearningRate 0.0183   Epoch: 11   Global Step: 191150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:10,399-Speed 9930.62 samples/sec   Loss 5.3635   LearningRate 0.0183   Epoch: 11   Global Step: 191160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:11,491-Speed 9385.81 samples/sec   Loss 5.3402   LearningRate 0.0183   Epoch: 11   Global Step: 191170   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:15:12,598-Speed 9257.14 samples/sec   Loss 5.3543   LearningRate 0.0183   Epoch: 11   Global Step: 191180   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:15:13,689-Speed 9391.70 samples/sec   Loss 5.2437   LearningRate 0.0183   Epoch: 11   Global Step: 191190   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:15:14,740-Speed 9744.25 samples/sec   Loss 5.3332   LearningRate 0.0183   Epoch: 11   Global Step: 191200   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:15:15,782-Speed 9839.93 samples/sec   Loss 5.3840   LearningRate 0.0183   Epoch: 11   Global Step: 191210   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:15:16,849-Speed 9596.58 samples/sec   Loss 5.1738   LearningRate 0.0182   Epoch: 11   Global Step: 191220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:17,942-Speed 9390.33 samples/sec   Loss 5.2178   LearningRate 0.0182   Epoch: 11   Global Step: 191230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:19,033-Speed 9389.38 samples/sec   Loss 5.3724   LearningRate 0.0182   Epoch: 11   Global Step: 191240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:20,090-Speed 9697.27 samples/sec   Loss 5.2960   LearningRate 0.0182   Epoch: 11   Global Step: 191250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:21,208-Speed 9163.16 samples/sec   Loss 5.3022   LearningRate 0.0182   Epoch: 11   Global Step: 191260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:22,314-Speed 9269.36 samples/sec   Loss 5.2726   LearningRate 0.0182   Epoch: 11   Global Step: 191270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:23,388-Speed 9542.07 samples/sec   Loss 5.3380   LearningRate 0.0182   Epoch: 11   Global Step: 191280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:24,514-Speed 9094.60 samples/sec   Loss 5.2719   LearningRate 0.0182   Epoch: 11   Global Step: 191290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:25,615-Speed 9308.45 samples/sec   Loss 5.3522   LearningRate 0.0182   Epoch: 11   Global Step: 191300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:26,693-Speed 9502.68 samples/sec   Loss 5.3199   LearningRate 0.0182   Epoch: 11   Global Step: 191310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:27,805-Speed 9213.09 samples/sec   Loss 5.2440   LearningRate 0.0182   Epoch: 11   Global Step: 191320   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:15:28,887-Speed 9471.42 samples/sec   Loss 5.3157   LearningRate 0.0182   Epoch: 11   Global Step: 191330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:29,956-Speed 9590.18 samples/sec   Loss 5.3752   LearningRate 0.0182   Epoch: 11   Global Step: 191340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:31,020-Speed 9632.92 samples/sec   Loss 5.3124   LearningRate 0.0182   Epoch: 11   Global Step: 191350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:32,143-Speed 9122.60 samples/sec   Loss 5.2294   LearningRate 0.0182   Epoch: 11   Global Step: 191360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:33,241-Speed 9334.98 samples/sec   Loss 5.3882   LearningRate 0.0182   Epoch: 11   Global Step: 191370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:34,316-Speed 9531.97 samples/sec   Loss 5.3240   LearningRate 0.0182   Epoch: 11   Global Step: 191380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:35,396-Speed 9484.91 samples/sec   Loss 5.3311   LearningRate 0.0182   Epoch: 11   Global Step: 191390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:36,453-Speed 9688.17 samples/sec   Loss 5.3760   LearningRate 0.0182   Epoch: 11   Global Step: 191400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:37,544-Speed 9388.58 samples/sec   Loss 5.3344   LearningRate 0.0182   Epoch: 11   Global Step: 191410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:38,628-Speed 9459.16 samples/sec   Loss 5.3525   LearningRate 0.0182   Epoch: 11   Global Step: 191420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:39,706-Speed 9501.82 samples/sec   Loss 5.2809   LearningRate 0.0182   Epoch: 11   Global Step: 191430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:40,812-Speed 9263.71 samples/sec   Loss 5.3830   LearningRate 0.0182   Epoch: 11   Global Step: 191440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:41,890-Speed 9510.35 samples/sec   Loss 5.4146   LearningRate 0.0182   Epoch: 11   Global Step: 191450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:42,959-Speed 9581.65 samples/sec   Loss 5.3499   LearningRate 0.0182   Epoch: 11   Global Step: 191460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:44,044-Speed 9444.46 samples/sec   Loss 5.2652   LearningRate 0.0182   Epoch: 11   Global Step: 191470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:45,115-Speed 9566.28 samples/sec   Loss 5.3790   LearningRate 0.0182   Epoch: 11   Global Step: 191480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:46,179-Speed 9628.73 samples/sec   Loss 5.3272   LearningRate 0.0182   Epoch: 11   Global Step: 191490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:47,237-Speed 9683.16 samples/sec   Loss 5.4226   LearningRate 0.0182   Epoch: 11   Global Step: 191500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:48,333-Speed 9352.22 samples/sec   Loss 5.3644   LearningRate 0.0182   Epoch: 11   Global Step: 191510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:49,464-Speed 9056.99 samples/sec   Loss 5.2831   LearningRate 0.0182   Epoch: 11   Global Step: 191520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:50,550-Speed 9440.19 samples/sec   Loss 5.3913   LearningRate 0.0182   Epoch: 11   Global Step: 191530   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:15:51,618-Speed 9590.97 samples/sec   Loss 5.2899   LearningRate 0.0182   Epoch: 11   Global Step: 191540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:52,706-Speed 9416.10 samples/sec   Loss 5.3544   LearningRate 0.0182   Epoch: 11   Global Step: 191550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:53,751-Speed 9810.84 samples/sec   Loss 5.3003   LearningRate 0.0182   Epoch: 11   Global Step: 191560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:54,839-Speed 9417.91 samples/sec   Loss 5.3840   LearningRate 0.0182   Epoch: 11   Global Step: 191570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:55,913-Speed 9537.06 samples/sec   Loss 5.4249   LearningRate 0.0182   Epoch: 11   Global Step: 191580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:56,978-Speed 9620.38 samples/sec   Loss 5.3205   LearningRate 0.0182   Epoch: 11   Global Step: 191590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:58,082-Speed 9284.06 samples/sec   Loss 5.2844   LearningRate 0.0182   Epoch: 11   Global Step: 191600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:15:59,159-Speed 9513.24 samples/sec   Loss 5.2482   LearningRate 0.0181   Epoch: 11   Global Step: 191610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:00,239-Speed 9481.82 samples/sec   Loss 5.2203   LearningRate 0.0181   Epoch: 11   Global Step: 191620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:01,341-Speed 9303.31 samples/sec   Loss 5.3102   LearningRate 0.0181   Epoch: 11   Global Step: 191630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:02,424-Speed 9459.97 samples/sec   Loss 5.3041   LearningRate 0.0181   Epoch: 11   Global Step: 191640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:03,498-Speed 9536.23 samples/sec   Loss 5.4088   LearningRate 0.0181   Epoch: 11   Global Step: 191650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:04,574-Speed 9523.95 samples/sec   Loss 5.2871   LearningRate 0.0181   Epoch: 11   Global Step: 191660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:05,654-Speed 9489.65 samples/sec   Loss 5.3009   LearningRate 0.0181   Epoch: 11   Global Step: 191670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:06,743-Speed 9407.81 samples/sec   Loss 5.3276   LearningRate 0.0181   Epoch: 11   Global Step: 191680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:07,855-Speed 9213.98 samples/sec   Loss 5.2352   LearningRate 0.0181   Epoch: 11   Global Step: 191690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:08,908-Speed 9735.59 samples/sec   Loss 5.3844   LearningRate 0.0181   Epoch: 11   Global Step: 191700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:09,986-Speed 9508.07 samples/sec   Loss 5.2869   LearningRate 0.0181   Epoch: 11   Global Step: 191710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:11,090-Speed 9282.16 samples/sec   Loss 5.3891   LearningRate 0.0181   Epoch: 11   Global Step: 191720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:12,212-Speed 9130.09 samples/sec   Loss 5.2683   LearningRate 0.0181   Epoch: 11   Global Step: 191730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:13,310-Speed 9328.88 samples/sec   Loss 5.2802   LearningRate 0.0181   Epoch: 11   Global Step: 191740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:14,405-Speed 9357.10 samples/sec   Loss 5.1986   LearningRate 0.0181   Epoch: 11   Global Step: 191750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:15,489-Speed 9457.40 samples/sec   Loss 5.3372   LearningRate 0.0181   Epoch: 11   Global Step: 191760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:16,573-Speed 9446.49 samples/sec   Loss 5.3119   LearningRate 0.0181   Epoch: 11   Global Step: 191770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:17,684-Speed 9230.66 samples/sec   Loss 5.3379   LearningRate 0.0181   Epoch: 11   Global Step: 191780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:18,771-Speed 9433.03 samples/sec   Loss 5.3305   LearningRate 0.0181   Epoch: 11   Global Step: 191790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:19,868-Speed 9339.40 samples/sec   Loss 5.3308   LearningRate 0.0181   Epoch: 11   Global Step: 191800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:20,953-Speed 9437.67 samples/sec   Loss 5.3020   LearningRate 0.0181   Epoch: 11   Global Step: 191810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:22,050-Speed 9345.40 samples/sec   Loss 5.3663   LearningRate 0.0181   Epoch: 11   Global Step: 191820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:23,158-Speed 9248.70 samples/sec   Loss 5.2797   LearningRate 0.0181   Epoch: 11   Global Step: 191830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:24,217-Speed 9680.63 samples/sec   Loss 5.4519   LearningRate 0.0181   Epoch: 11   Global Step: 191840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:25,290-Speed 9544.91 samples/sec   Loss 5.3555   LearningRate 0.0181   Epoch: 11   Global Step: 191850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:26,345-Speed 9711.51 samples/sec   Loss 5.4943   LearningRate 0.0181   Epoch: 11   Global Step: 191860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:27,417-Speed 9562.24 samples/sec   Loss 5.1945   LearningRate 0.0181   Epoch: 11   Global Step: 191870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:28,548-Speed 9066.51 samples/sec   Loss 5.2661   LearningRate 0.0181   Epoch: 11   Global Step: 191880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:29,634-Speed 9432.05 samples/sec   Loss 5.2655   LearningRate 0.0181   Epoch: 11   Global Step: 191890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:30,713-Speed 9491.38 samples/sec   Loss 5.3083   LearningRate 0.0181   Epoch: 11   Global Step: 191900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:31,801-Speed 9417.23 samples/sec   Loss 5.2777   LearningRate 0.0181   Epoch: 11   Global Step: 191910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:32,897-Speed 9349.58 samples/sec   Loss 5.3146   LearningRate 0.0181   Epoch: 11   Global Step: 191920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:16:33,980-Speed 9459.41 samples/sec   Loss 5.3309   LearningRate 0.0181   Epoch: 11   Global Step: 191930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:35,068-Speed 9415.58 samples/sec   Loss 5.3824   LearningRate 0.0181   Epoch: 11   Global Step: 191940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:36,153-Speed 9444.22 samples/sec   Loss 5.3621   LearningRate 0.0181   Epoch: 11   Global Step: 191950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:37,282-Speed 9081.26 samples/sec   Loss 5.2586   LearningRate 0.0181   Epoch: 11   Global Step: 191960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:38,373-Speed 9388.96 samples/sec   Loss 5.3996   LearningRate 0.0181   Epoch: 11   Global Step: 191970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:39,462-Speed 9404.30 samples/sec   Loss 5.3694   LearningRate 0.0181   Epoch: 11   Global Step: 191980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:40,518-Speed 9714.01 samples/sec   Loss 5.3176   LearningRate 0.0181   Epoch: 11   Global Step: 191990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:16:41,592-Speed 9532.20 samples/sec   Loss 5.3646   LearningRate 0.0180   Epoch: 11   Global Step: 192000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:17:03,354-[lfw][192000]XNorm: 8.933503
Training: 2022-04-11 19:17:03,354-[lfw][192000]Accuracy-Flip: 0.99617+-0.00269
Training: 2022-04-11 19:17:03,355-[lfw][192000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:17:28,532-[cfp_fp][192000]XNorm: 7.654611
Training: 2022-04-11 19:17:28,533-[cfp_fp][192000]Accuracy-Flip: 0.96457+-0.00951
Training: 2022-04-11 19:17:28,533-[cfp_fp][192000]Accuracy-Highest: 0.96714
Training: 2022-04-11 19:17:50,241-[agedb_30][192000]XNorm: 8.630226
Training: 2022-04-11 19:17:50,242-[agedb_30][192000]Accuracy-Flip: 0.96733+-0.00863
Training: 2022-04-11 19:17:50,243-[agedb_30][192000]Accuracy-Highest: 0.96917
Training: 2022-04-11 19:17:51,327-Speed 146.84 samples/sec   Loss 5.3863   LearningRate 0.0180   Epoch: 11   Global Step: 192010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:17:52,377-Speed 9754.60 samples/sec   Loss 5.2680   LearningRate 0.0180   Epoch: 11   Global Step: 192020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:17:53,441-Speed 9624.16 samples/sec   Loss 5.3513   LearningRate 0.0180   Epoch: 11   Global Step: 192030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:17:54,484-Speed 9828.64 samples/sec   Loss 5.3244   LearningRate 0.0180   Epoch: 11   Global Step: 192040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:17:55,585-Speed 9304.13 samples/sec   Loss 5.3055   LearningRate 0.0180   Epoch: 11   Global Step: 192050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:17:56,664-Speed 9495.56 samples/sec   Loss 5.2687   LearningRate 0.0180   Epoch: 11   Global Step: 192060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:17:57,777-Speed 9208.61 samples/sec   Loss 5.2554   LearningRate 0.0180   Epoch: 11   Global Step: 192070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:17:58,869-Speed 9383.40 samples/sec   Loss 5.3774   LearningRate 0.0180   Epoch: 11   Global Step: 192080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:17:59,958-Speed 9406.90 samples/sec   Loss 5.3235   LearningRate 0.0180   Epoch: 11   Global Step: 192090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:01,073-Speed 9191.42 samples/sec   Loss 5.3135   LearningRate 0.0180   Epoch: 11   Global Step: 192100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:02,182-Speed 9234.52 samples/sec   Loss 5.2761   LearningRate 0.0180   Epoch: 11   Global Step: 192110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:03,294-Speed 9220.11 samples/sec   Loss 5.4014   LearningRate 0.0180   Epoch: 11   Global Step: 192120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:04,400-Speed 9264.33 samples/sec   Loss 5.3180   LearningRate 0.0180   Epoch: 11   Global Step: 192130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:05,482-Speed 9466.96 samples/sec   Loss 5.4617   LearningRate 0.0180   Epoch: 11   Global Step: 192140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:06,556-Speed 9543.52 samples/sec   Loss 5.3350   LearningRate 0.0180   Epoch: 11   Global Step: 192150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:07,580-Speed 10004.87 samples/sec   Loss 5.3899   LearningRate 0.0180   Epoch: 11   Global Step: 192160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:08,691-Speed 9224.76 samples/sec   Loss 5.3480   LearningRate 0.0180   Epoch: 11   Global Step: 192170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:09,787-Speed 9347.26 samples/sec   Loss 5.2993   LearningRate 0.0180   Epoch: 11   Global Step: 192180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:10,914-Speed 9089.90 samples/sec   Loss 5.3740   LearningRate 0.0180   Epoch: 11   Global Step: 192190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:11,974-Speed 9669.17 samples/sec   Loss 5.3223   LearningRate 0.0180   Epoch: 11   Global Step: 192200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:13,065-Speed 9394.82 samples/sec   Loss 5.3975   LearningRate 0.0180   Epoch: 11   Global Step: 192210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:14,181-Speed 9176.99 samples/sec   Loss 5.3242   LearningRate 0.0180   Epoch: 11   Global Step: 192220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:15,267-Speed 9441.46 samples/sec   Loss 5.3362   LearningRate 0.0180   Epoch: 11   Global Step: 192230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:16,342-Speed 9525.12 samples/sec   Loss 5.4185   LearningRate 0.0180   Epoch: 11   Global Step: 192240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:17,409-Speed 9603.31 samples/sec   Loss 5.3456   LearningRate 0.0180   Epoch: 11   Global Step: 192250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:18,466-Speed 9689.85 samples/sec   Loss 5.3414   LearningRate 0.0180   Epoch: 11   Global Step: 192260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:19,539-Speed 9553.16 samples/sec   Loss 5.3279   LearningRate 0.0180   Epoch: 11   Global Step: 192270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:20,619-Speed 9488.92 samples/sec   Loss 5.3146   LearningRate 0.0180   Epoch: 11   Global Step: 192280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:21,681-Speed 9640.79 samples/sec   Loss 5.4189   LearningRate 0.0180   Epoch: 11   Global Step: 192290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:22,790-Speed 9249.01 samples/sec   Loss 5.3884   LearningRate 0.0180   Epoch: 11   Global Step: 192300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:23,856-Speed 9607.02 samples/sec   Loss 5.3051   LearningRate 0.0180   Epoch: 11   Global Step: 192310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:24,954-Speed 9329.27 samples/sec   Loss 5.3472   LearningRate 0.0180   Epoch: 11   Global Step: 192320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:26,029-Speed 9529.70 samples/sec   Loss 5.4177   LearningRate 0.0180   Epoch: 11   Global Step: 192330   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:18:27,112-Speed 9461.77 samples/sec   Loss 5.4341   LearningRate 0.0180   Epoch: 11   Global Step: 192340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:28,169-Speed 9693.91 samples/sec   Loss 5.2477   LearningRate 0.0180   Epoch: 11   Global Step: 192350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:29,263-Speed 9366.65 samples/sec   Loss 5.2716   LearningRate 0.0180   Epoch: 11   Global Step: 192360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:30,325-Speed 9646.24 samples/sec   Loss 5.3689   LearningRate 0.0180   Epoch: 11   Global Step: 192370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:31,423-Speed 9334.04 samples/sec   Loss 5.3502   LearningRate 0.0180   Epoch: 11   Global Step: 192380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:32,533-Speed 9236.39 samples/sec   Loss 5.3847   LearningRate 0.0179   Epoch: 11   Global Step: 192390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:33,665-Speed 9054.45 samples/sec   Loss 5.3093   LearningRate 0.0179   Epoch: 11   Global Step: 192400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:34,759-Speed 9362.96 samples/sec   Loss 5.4060   LearningRate 0.0179   Epoch: 11   Global Step: 192410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:35,807-Speed 9771.08 samples/sec   Loss 5.3398   LearningRate 0.0179   Epoch: 11   Global Step: 192420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:36,902-Speed 9362.12 samples/sec   Loss 5.2411   LearningRate 0.0179   Epoch: 11   Global Step: 192430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:38,000-Speed 9327.26 samples/sec   Loss 5.3433   LearningRate 0.0179   Epoch: 11   Global Step: 192440   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:18:39,105-Speed 9271.68 samples/sec   Loss 5.2754   LearningRate 0.0179   Epoch: 11   Global Step: 192450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:40,169-Speed 9638.94 samples/sec   Loss 5.3091   LearningRate 0.0179   Epoch: 11   Global Step: 192460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:41,268-Speed 9322.48 samples/sec   Loss 5.4040   LearningRate 0.0179   Epoch: 11   Global Step: 192470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:42,365-Speed 9334.51 samples/sec   Loss 5.3491   LearningRate 0.0179   Epoch: 11   Global Step: 192480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:43,463-Speed 9336.77 samples/sec   Loss 5.2475   LearningRate 0.0179   Epoch: 11   Global Step: 192490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:44,548-Speed 9441.04 samples/sec   Loss 5.3496   LearningRate 0.0179   Epoch: 11   Global Step: 192500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:45,611-Speed 9636.79 samples/sec   Loss 5.3466   LearningRate 0.0179   Epoch: 11   Global Step: 192510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:46,728-Speed 9175.41 samples/sec   Loss 5.3935   LearningRate 0.0179   Epoch: 11   Global Step: 192520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:47,856-Speed 9078.83 samples/sec   Loss 5.4080   LearningRate 0.0179   Epoch: 11   Global Step: 192530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:48,958-Speed 9302.94 samples/sec   Loss 5.2791   LearningRate 0.0179   Epoch: 11   Global Step: 192540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:50,045-Speed 9424.49 samples/sec   Loss 5.3891   LearningRate 0.0179   Epoch: 11   Global Step: 192550   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:18:51,104-Speed 9674.64 samples/sec   Loss 5.2669   LearningRate 0.0179   Epoch: 11   Global Step: 192560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:52,177-Speed 9549.74 samples/sec   Loss 5.3302   LearningRate 0.0179   Epoch: 11   Global Step: 192570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:53,295-Speed 9167.21 samples/sec   Loss 5.2674   LearningRate 0.0179   Epoch: 11   Global Step: 192580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:54,358-Speed 9638.54 samples/sec   Loss 5.3898   LearningRate 0.0179   Epoch: 11   Global Step: 192590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:55,458-Speed 9319.90 samples/sec   Loss 5.4121   LearningRate 0.0179   Epoch: 11   Global Step: 192600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:56,541-Speed 9460.97 samples/sec   Loss 5.3457   LearningRate 0.0179   Epoch: 11   Global Step: 192610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:57,611-Speed 9574.23 samples/sec   Loss 5.2491   LearningRate 0.0179   Epoch: 11   Global Step: 192620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:58,676-Speed 9619.08 samples/sec   Loss 5.3550   LearningRate 0.0179   Epoch: 11   Global Step: 192630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:18:59,759-Speed 9461.73 samples/sec   Loss 5.3009   LearningRate 0.0179   Epoch: 11   Global Step: 192640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:00,832-Speed 9550.61 samples/sec   Loss 5.2979   LearningRate 0.0179   Epoch: 11   Global Step: 192650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:01,906-Speed 9540.83 samples/sec   Loss 5.3693   LearningRate 0.0179   Epoch: 11   Global Step: 192660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:03,056-Speed 8907.83 samples/sec   Loss 5.3801   LearningRate 0.0179   Epoch: 11   Global Step: 192670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:04,221-Speed 8797.44 samples/sec   Loss 5.2786   LearningRate 0.0179   Epoch: 11   Global Step: 192680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:05,291-Speed 9580.76 samples/sec   Loss 5.3172   LearningRate 0.0179   Epoch: 11   Global Step: 192690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:06,330-Speed 9863.87 samples/sec   Loss 5.3615   LearningRate 0.0179   Epoch: 11   Global Step: 192700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:07,400-Speed 9572.51 samples/sec   Loss 5.3296   LearningRate 0.0179   Epoch: 11   Global Step: 192710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:08,487-Speed 9426.38 samples/sec   Loss 5.3271   LearningRate 0.0179   Epoch: 11   Global Step: 192720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:09,549-Speed 9646.97 samples/sec   Loss 5.3706   LearningRate 0.0179   Epoch: 11   Global Step: 192730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:10,629-Speed 9491.46 samples/sec   Loss 5.3100   LearningRate 0.0179   Epoch: 11   Global Step: 192740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:11,712-Speed 9458.59 samples/sec   Loss 5.3721   LearningRate 0.0179   Epoch: 11   Global Step: 192750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:12,793-Speed 9478.70 samples/sec   Loss 5.3657   LearningRate 0.0179   Epoch: 11   Global Step: 192760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:13,902-Speed 9236.25 samples/sec   Loss 5.2774   LearningRate 0.0179   Epoch: 11   Global Step: 192770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:19:14,972-Speed 9575.07 samples/sec   Loss 5.4543   LearningRate 0.0179   Epoch: 11   Global Step: 192780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:19:16,082-Speed 9232.55 samples/sec   Loss 5.2903   LearningRate 0.0178   Epoch: 11   Global Step: 192790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:19:17,172-Speed 9405.33 samples/sec   Loss 5.2450   LearningRate 0.0178   Epoch: 11   Global Step: 192800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:19:18,276-Speed 9274.34 samples/sec   Loss 5.2898   LearningRate 0.0178   Epoch: 11   Global Step: 192810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:19:19,366-Speed 9399.38 samples/sec   Loss 5.3429   LearningRate 0.0178   Epoch: 11   Global Step: 192820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:19:20,465-Speed 9326.86 samples/sec   Loss 5.4385   LearningRate 0.0178   Epoch: 11   Global Step: 192830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:19:21,548-Speed 9462.77 samples/sec   Loss 5.4323   LearningRate 0.0178   Epoch: 11   Global Step: 192840   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:19:22,639-Speed 9415.19 samples/sec   Loss 5.2882   LearningRate 0.0178   Epoch: 11   Global Step: 192850   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:19:23,733-Speed 9364.26 samples/sec   Loss 5.3096   LearningRate 0.0178   Epoch: 11   Global Step: 192860   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:19:24,804-Speed 9567.29 samples/sec   Loss 5.4348   LearningRate 0.0178   Epoch: 11   Global Step: 192870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:25,900-Speed 9342.12 samples/sec   Loss 5.4375   LearningRate 0.0178   Epoch: 11   Global Step: 192880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:26,964-Speed 9630.38 samples/sec   Loss 5.3322   LearningRate 0.0178   Epoch: 11   Global Step: 192890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:28,070-Speed 9261.22 samples/sec   Loss 5.3895   LearningRate 0.0178   Epoch: 11   Global Step: 192900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:29,167-Speed 9347.49 samples/sec   Loss 5.4314   LearningRate 0.0178   Epoch: 11   Global Step: 192910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:30,239-Speed 9554.99 samples/sec   Loss 5.2926   LearningRate 0.0178   Epoch: 11   Global Step: 192920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:31,315-Speed 9520.11 samples/sec   Loss 5.2957   LearningRate 0.0178   Epoch: 11   Global Step: 192930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:32,400-Speed 9444.39 samples/sec   Loss 5.4375   LearningRate 0.0178   Epoch: 11   Global Step: 192940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:33,502-Speed 9302.65 samples/sec   Loss 5.3463   LearningRate 0.0178   Epoch: 11   Global Step: 192950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:34,588-Speed 9434.93 samples/sec   Loss 5.3773   LearningRate 0.0178   Epoch: 11   Global Step: 192960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:35,645-Speed 9695.09 samples/sec   Loss 5.2735   LearningRate 0.0178   Epoch: 11   Global Step: 192970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:36,753-Speed 9245.90 samples/sec   Loss 5.2984   LearningRate 0.0178   Epoch: 11   Global Step: 192980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:37,862-Speed 9235.32 samples/sec   Loss 5.3067   LearningRate 0.0178   Epoch: 11   Global Step: 192990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:38,948-Speed 9436.78 samples/sec   Loss 5.3352   LearningRate 0.0178   Epoch: 11   Global Step: 193000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:40,017-Speed 9591.01 samples/sec   Loss 5.3855   LearningRate 0.0178   Epoch: 11   Global Step: 193010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:41,070-Speed 9729.44 samples/sec   Loss 5.3068   LearningRate 0.0178   Epoch: 11   Global Step: 193020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:42,156-Speed 9440.18 samples/sec   Loss 5.3350   LearningRate 0.0178   Epoch: 11   Global Step: 193030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:43,231-Speed 9533.34 samples/sec   Loss 5.3217   LearningRate 0.0178   Epoch: 11   Global Step: 193040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:44,310-Speed 9492.43 samples/sec   Loss 5.2713   LearningRate 0.0178   Epoch: 11   Global Step: 193050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:45,385-Speed 9529.96 samples/sec   Loss 5.3596   LearningRate 0.0178   Epoch: 11   Global Step: 193060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:46,470-Speed 9440.14 samples/sec   Loss 5.3755   LearningRate 0.0178   Epoch: 11   Global Step: 193070   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:19:47,561-Speed 9394.80 samples/sec   Loss 5.2476   LearningRate 0.0178   Epoch: 11   Global Step: 193080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:48,606-Speed 9799.91 samples/sec   Loss 5.3291   LearningRate 0.0178   Epoch: 11   Global Step: 193090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:49,687-Speed 9478.31 samples/sec   Loss 5.2749   LearningRate 0.0178   Epoch: 11   Global Step: 193100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:50,772-Speed 9446.02 samples/sec   Loss 5.3939   LearningRate 0.0178   Epoch: 11   Global Step: 193110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:51,821-Speed 9763.12 samples/sec   Loss 5.3953   LearningRate 0.0178   Epoch: 11   Global Step: 193120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:52,883-Speed 9660.30 samples/sec   Loss 5.3935   LearningRate 0.0178   Epoch: 11   Global Step: 193130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:53,952-Speed 9578.15 samples/sec   Loss 5.3320   LearningRate 0.0178   Epoch: 11   Global Step: 193140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:55,023-Speed 9568.83 samples/sec   Loss 5.3811   LearningRate 0.0178   Epoch: 11   Global Step: 193150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:56,094-Speed 9564.55 samples/sec   Loss 5.4635   LearningRate 0.0178   Epoch: 11   Global Step: 193160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:57,178-Speed 9454.88 samples/sec   Loss 5.2640   LearningRate 0.0178   Epoch: 11   Global Step: 193170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:19:58,277-Speed 9324.46 samples/sec   Loss 5.3458   LearningRate 0.0177   Epoch: 11   Global Step: 193180   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:19:59,352-Speed 9533.85 samples/sec   Loss 5.2842   LearningRate 0.0177   Epoch: 11   Global Step: 193190   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:20:00,420-Speed 9590.09 samples/sec   Loss 5.4196   LearningRate 0.0177   Epoch: 11   Global Step: 193200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:01,518-Speed 9337.24 samples/sec   Loss 5.3622   LearningRate 0.0177   Epoch: 11   Global Step: 193210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:02,676-Speed 8842.94 samples/sec   Loss 5.2710   LearningRate 0.0177   Epoch: 11   Global Step: 193220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:03,750-Speed 9545.97 samples/sec   Loss 5.2965   LearningRate 0.0177   Epoch: 11   Global Step: 193230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:04,838-Speed 9420.63 samples/sec   Loss 5.4114   LearningRate 0.0177   Epoch: 11   Global Step: 193240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:05,908-Speed 9569.53 samples/sec   Loss 5.2705   LearningRate 0.0177   Epoch: 11   Global Step: 193250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:06,981-Speed 9547.30 samples/sec   Loss 5.3459   LearningRate 0.0177   Epoch: 11   Global Step: 193260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:08,071-Speed 9399.32 samples/sec   Loss 5.3098   LearningRate 0.0177   Epoch: 11   Global Step: 193270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:09,137-Speed 9618.09 samples/sec   Loss 5.3160   LearningRate 0.0177   Epoch: 11   Global Step: 193280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:10,249-Speed 9216.29 samples/sec   Loss 5.2802   LearningRate 0.0177   Epoch: 11   Global Step: 193290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:11,332-Speed 9453.89 samples/sec   Loss 5.2830   LearningRate 0.0177   Epoch: 11   Global Step: 193300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:12,434-Speed 9304.11 samples/sec   Loss 5.3595   LearningRate 0.0177   Epoch: 11   Global Step: 193310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:13,511-Speed 9506.94 samples/sec   Loss 5.3567   LearningRate 0.0177   Epoch: 11   Global Step: 193320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:14,628-Speed 9173.35 samples/sec   Loss 5.3169   LearningRate 0.0177   Epoch: 11   Global Step: 193330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:15,714-Speed 9434.70 samples/sec   Loss 5.3274   LearningRate 0.0177   Epoch: 11   Global Step: 193340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:16,835-Speed 9144.13 samples/sec   Loss 5.3251   LearningRate 0.0177   Epoch: 11   Global Step: 193350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:17,921-Speed 9433.46 samples/sec   Loss 5.3868   LearningRate 0.0177   Epoch: 11   Global Step: 193360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:18,995-Speed 9544.93 samples/sec   Loss 5.2788   LearningRate 0.0177   Epoch: 11   Global Step: 193370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:20,062-Speed 9593.67 samples/sec   Loss 5.2986   LearningRate 0.0177   Epoch: 11   Global Step: 193380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:21,131-Speed 9587.06 samples/sec   Loss 5.3395   LearningRate 0.0177   Epoch: 11   Global Step: 193390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:22,245-Speed 9202.31 samples/sec   Loss 5.2933   LearningRate 0.0177   Epoch: 11   Global Step: 193400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:23,350-Speed 9271.04 samples/sec   Loss 5.2925   LearningRate 0.0177   Epoch: 11   Global Step: 193410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:20:24,423-Speed 9555.03 samples/sec   Loss 5.3003   LearningRate 0.0177   Epoch: 11   Global Step: 193420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:20:25,527-Speed 9276.06 samples/sec   Loss 5.3444   LearningRate 0.0177   Epoch: 11   Global Step: 193430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:20:26,598-Speed 9564.76 samples/sec   Loss 5.3252   LearningRate 0.0177   Epoch: 11   Global Step: 193440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:20:27,659-Speed 9661.70 samples/sec   Loss 5.3631   LearningRate 0.0177   Epoch: 11   Global Step: 193450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:20:28,781-Speed 9130.98 samples/sec   Loss 5.3049   LearningRate 0.0177   Epoch: 11   Global Step: 193460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:20:29,868-Speed 9422.61 samples/sec   Loss 5.3501   LearningRate 0.0177   Epoch: 11   Global Step: 193470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:20:30,971-Speed 9287.64 samples/sec   Loss 5.3815   LearningRate 0.0177   Epoch: 11   Global Step: 193480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:20:32,094-Speed 9129.19 samples/sec   Loss 5.3314   LearningRate 0.0177   Epoch: 11   Global Step: 193490   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:20:33,239-Speed 8952.49 samples/sec   Loss 5.3252   LearningRate 0.0177   Epoch: 11   Global Step: 193500   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:20:34,303-Speed 9628.31 samples/sec   Loss 5.2744   LearningRate 0.0177   Epoch: 11   Global Step: 193510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:35,404-Speed 9305.83 samples/sec   Loss 5.3026   LearningRate 0.0177   Epoch: 11   Global Step: 193520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:36,489-Speed 9443.46 samples/sec   Loss 5.4309   LearningRate 0.0177   Epoch: 11   Global Step: 193530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:37,592-Speed 9291.80 samples/sec   Loss 5.3200   LearningRate 0.0177   Epoch: 11   Global Step: 193540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:38,679-Speed 9426.00 samples/sec   Loss 5.2606   LearningRate 0.0177   Epoch: 11   Global Step: 193550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:39,756-Speed 9518.39 samples/sec   Loss 5.3391   LearningRate 0.0177   Epoch: 11   Global Step: 193560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:40,829-Speed 9551.73 samples/sec   Loss 5.5056   LearningRate 0.0177   Epoch: 11   Global Step: 193570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:41,932-Speed 9284.32 samples/sec   Loss 5.3241   LearningRate 0.0176   Epoch: 11   Global Step: 193580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:43,021-Speed 9408.91 samples/sec   Loss 5.3676   LearningRate 0.0176   Epoch: 11   Global Step: 193590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:44,171-Speed 8908.20 samples/sec   Loss 5.3777   LearningRate 0.0176   Epoch: 11   Global Step: 193600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:45,258-Speed 9430.72 samples/sec   Loss 5.3890   LearningRate 0.0176   Epoch: 11   Global Step: 193610   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:20:46,321-Speed 9640.51 samples/sec   Loss 5.2703   LearningRate 0.0176   Epoch: 11   Global Step: 193620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:47,411-Speed 9404.08 samples/sec   Loss 5.3708   LearningRate 0.0176   Epoch: 11   Global Step: 193630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:48,542-Speed 9055.44 samples/sec   Loss 5.2796   LearningRate 0.0176   Epoch: 11   Global Step: 193640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:49,574-Speed 9934.96 samples/sec   Loss 5.3136   LearningRate 0.0176   Epoch: 11   Global Step: 193650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:50,668-Speed 9363.46 samples/sec   Loss 5.3854   LearningRate 0.0176   Epoch: 11   Global Step: 193660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:51,755-Speed 9425.97 samples/sec   Loss 5.3826   LearningRate 0.0176   Epoch: 11   Global Step: 193670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:52,818-Speed 9636.83 samples/sec   Loss 5.3558   LearningRate 0.0176   Epoch: 11   Global Step: 193680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:53,906-Speed 9418.22 samples/sec   Loss 5.3232   LearningRate 0.0176   Epoch: 11   Global Step: 193690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:54,991-Speed 9448.72 samples/sec   Loss 5.3178   LearningRate 0.0176   Epoch: 11   Global Step: 193700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:56,065-Speed 9544.64 samples/sec   Loss 5.3955   LearningRate 0.0176   Epoch: 11   Global Step: 193710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:57,147-Speed 9470.74 samples/sec   Loss 5.2914   LearningRate 0.0176   Epoch: 11   Global Step: 193720   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:20:58,246-Speed 9325.38 samples/sec   Loss 5.3142   LearningRate 0.0176   Epoch: 11   Global Step: 193730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:20:59,319-Speed 9550.80 samples/sec   Loss 5.4162   LearningRate 0.0176   Epoch: 11   Global Step: 193740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:00,423-Speed 9282.77 samples/sec   Loss 5.2799   LearningRate 0.0176   Epoch: 11   Global Step: 193750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:01,538-Speed 9186.88 samples/sec   Loss 5.3214   LearningRate 0.0176   Epoch: 11   Global Step: 193760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:02,629-Speed 9385.65 samples/sec   Loss 5.2485   LearningRate 0.0176   Epoch: 11   Global Step: 193770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:03,733-Speed 9287.65 samples/sec   Loss 5.3417   LearningRate 0.0176   Epoch: 11   Global Step: 193780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:04,791-Speed 9686.74 samples/sec   Loss 5.3564   LearningRate 0.0176   Epoch: 11   Global Step: 193790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:05,858-Speed 9595.29 samples/sec   Loss 5.3326   LearningRate 0.0176   Epoch: 11   Global Step: 193800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:06,912-Speed 9726.20 samples/sec   Loss 5.3913   LearningRate 0.0176   Epoch: 11   Global Step: 193810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:08,002-Speed 9395.16 samples/sec   Loss 5.2558   LearningRate 0.0176   Epoch: 11   Global Step: 193820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:09,068-Speed 9617.33 samples/sec   Loss 5.3064   LearningRate 0.0176   Epoch: 11   Global Step: 193830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:10,134-Speed 9613.36 samples/sec   Loss 5.2773   LearningRate 0.0176   Epoch: 11   Global Step: 193840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:11,196-Speed 9649.35 samples/sec   Loss 5.3522   LearningRate 0.0176   Epoch: 11   Global Step: 193850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:12,320-Speed 9111.26 samples/sec   Loss 5.2298   LearningRate 0.0176   Epoch: 11   Global Step: 193860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:13,412-Speed 9383.49 samples/sec   Loss 5.2720   LearningRate 0.0176   Epoch: 11   Global Step: 193870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:14,506-Speed 9366.40 samples/sec   Loss 5.2534   LearningRate 0.0176   Epoch: 11   Global Step: 193880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:15,576-Speed 9582.30 samples/sec   Loss 5.4330   LearningRate 0.0176   Epoch: 11   Global Step: 193890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:16,648-Speed 9554.50 samples/sec   Loss 5.2794   LearningRate 0.0176   Epoch: 11   Global Step: 193900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:17,737-Speed 9405.21 samples/sec   Loss 5.2557   LearningRate 0.0176   Epoch: 11   Global Step: 193910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:18,803-Speed 9612.91 samples/sec   Loss 5.3919   LearningRate 0.0176   Epoch: 11   Global Step: 193920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:19,918-Speed 9190.86 samples/sec   Loss 5.3759   LearningRate 0.0176   Epoch: 11   Global Step: 193930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:20,965-Speed 9783.24 samples/sec   Loss 5.3605   LearningRate 0.0176   Epoch: 11   Global Step: 193940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:22,046-Speed 9482.97 samples/sec   Loss 5.3298   LearningRate 0.0176   Epoch: 11   Global Step: 193950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:23,132-Speed 9435.93 samples/sec   Loss 5.3590   LearningRate 0.0176   Epoch: 11   Global Step: 193960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:24,217-Speed 9440.15 samples/sec   Loss 5.2786   LearningRate 0.0176   Epoch: 11   Global Step: 193970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:25,270-Speed 9733.10 samples/sec   Loss 5.4155   LearningRate 0.0175   Epoch: 11   Global Step: 193980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:26,373-Speed 9286.44 samples/sec   Loss 5.4080   LearningRate 0.0175   Epoch: 11   Global Step: 193990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:27,477-Speed 9285.59 samples/sec   Loss 5.3456   LearningRate 0.0175   Epoch: 11   Global Step: 194000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:21:49,420-[lfw][194000]XNorm: 8.972369
Training: 2022-04-11 19:21:49,421-[lfw][194000]Accuracy-Flip: 0.99650+-0.00283
Training: 2022-04-11 19:21:49,421-[lfw][194000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:22:14,791-[cfp_fp][194000]XNorm: 7.656675
Training: 2022-04-11 19:22:14,792-[cfp_fp][194000]Accuracy-Flip: 0.96457+-0.00514
Training: 2022-04-11 19:22:14,792-[cfp_fp][194000]Accuracy-Highest: 0.96714
Training: 2022-04-11 19:22:36,647-[agedb_30][194000]XNorm: 8.662819
Training: 2022-04-11 19:22:36,648-[agedb_30][194000]Accuracy-Flip: 0.96917+-0.00768
Training: 2022-04-11 19:22:36,648-[agedb_30][194000]Accuracy-Highest: 0.96917
Training: 2022-04-11 19:22:37,725-Speed 145.77 samples/sec   Loss 5.3366   LearningRate 0.0175   Epoch: 11   Global Step: 194010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:38,812-Speed 9430.35 samples/sec   Loss 5.2909   LearningRate 0.0175   Epoch: 11   Global Step: 194020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:39,887-Speed 9527.31 samples/sec   Loss 5.2799   LearningRate 0.0175   Epoch: 11   Global Step: 194030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:40,973-Speed 9437.29 samples/sec   Loss 5.3817   LearningRate 0.0175   Epoch: 11   Global Step: 194040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:42,016-Speed 9824.82 samples/sec   Loss 5.3158   LearningRate 0.0175   Epoch: 11   Global Step: 194050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:43,058-Speed 9826.66 samples/sec   Loss 5.3416   LearningRate 0.0175   Epoch: 11   Global Step: 194060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:44,156-Speed 9334.93 samples/sec   Loss 5.3021   LearningRate 0.0175   Epoch: 11   Global Step: 194070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:45,241-Speed 9440.82 samples/sec   Loss 5.4085   LearningRate 0.0175   Epoch: 11   Global Step: 194080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:46,327-Speed 9437.12 samples/sec   Loss 5.1977   LearningRate 0.0175   Epoch: 11   Global Step: 194090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:47,398-Speed 9564.35 samples/sec   Loss 5.3532   LearningRate 0.0175   Epoch: 11   Global Step: 194100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:48,483-Speed 9447.42 samples/sec   Loss 5.3133   LearningRate 0.0175   Epoch: 11   Global Step: 194110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:49,515-Speed 9928.34 samples/sec   Loss 5.2474   LearningRate 0.0175   Epoch: 11   Global Step: 194120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:50,597-Speed 9466.70 samples/sec   Loss 5.2909   LearningRate 0.0175   Epoch: 11   Global Step: 194130   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:22:51,669-Speed 9564.69 samples/sec   Loss 5.3413   LearningRate 0.0175   Epoch: 11   Global Step: 194140   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:22:52,729-Speed 9660.98 samples/sec   Loss 5.2414   LearningRate 0.0175   Epoch: 11   Global Step: 194150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:53,815-Speed 9436.94 samples/sec   Loss 5.2773   LearningRate 0.0175   Epoch: 11   Global Step: 194160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:54,921-Speed 9260.97 samples/sec   Loss 5.3599   LearningRate 0.0175   Epoch: 11   Global Step: 194170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:56,013-Speed 9389.03 samples/sec   Loss 5.3775   LearningRate 0.0175   Epoch: 11   Global Step: 194180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:57,112-Speed 9320.34 samples/sec   Loss 5.3692   LearningRate 0.0175   Epoch: 11   Global Step: 194190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:58,204-Speed 9381.93 samples/sec   Loss 5.4333   LearningRate 0.0175   Epoch: 11   Global Step: 194200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:22:59,306-Speed 9298.93 samples/sec   Loss 5.3229   LearningRate 0.0175   Epoch: 11   Global Step: 194210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:00,397-Speed 9387.17 samples/sec   Loss 5.3339   LearningRate 0.0175   Epoch: 11   Global Step: 194220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:01,475-Speed 9511.90 samples/sec   Loss 5.2864   LearningRate 0.0175   Epoch: 11   Global Step: 194230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:02,557-Speed 9472.73 samples/sec   Loss 5.3168   LearningRate 0.0175   Epoch: 11   Global Step: 194240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:03,663-Speed 9265.98 samples/sec   Loss 5.2960   LearningRate 0.0175   Epoch: 11   Global Step: 194250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:04,743-Speed 9491.69 samples/sec   Loss 5.3560   LearningRate 0.0175   Epoch: 11   Global Step: 194260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:05,865-Speed 9130.83 samples/sec   Loss 5.2549   LearningRate 0.0175   Epoch: 11   Global Step: 194270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:06,944-Speed 9490.07 samples/sec   Loss 5.2016   LearningRate 0.0175   Epoch: 11   Global Step: 194280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:08,031-Speed 9430.44 samples/sec   Loss 5.2935   LearningRate 0.0175   Epoch: 11   Global Step: 194290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:09,192-Speed 8824.99 samples/sec   Loss 5.4205   LearningRate 0.0175   Epoch: 11   Global Step: 194300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:10,255-Speed 9640.70 samples/sec   Loss 5.2492   LearningRate 0.0175   Epoch: 11   Global Step: 194310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:11,360-Speed 9271.12 samples/sec   Loss 5.4284   LearningRate 0.0175   Epoch: 11   Global Step: 194320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:12,468-Speed 9251.23 samples/sec   Loss 5.3505   LearningRate 0.0175   Epoch: 11   Global Step: 194330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:13,554-Speed 9427.71 samples/sec   Loss 5.3615   LearningRate 0.0175   Epoch: 11   Global Step: 194340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:14,643-Speed 9409.77 samples/sec   Loss 5.3953   LearningRate 0.0175   Epoch: 11   Global Step: 194350   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:23:15,716-Speed 9551.87 samples/sec   Loss 5.3451   LearningRate 0.0175   Epoch: 11   Global Step: 194360   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:23:16,802-Speed 9428.55 samples/sec   Loss 5.2623   LearningRate 0.0175   Epoch: 11   Global Step: 194370   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:23:17,957-Speed 8872.97 samples/sec   Loss 5.3044   LearningRate 0.0174   Epoch: 11   Global Step: 194380   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:19,038-Speed 9498.30 samples/sec   Loss 5.3682   LearningRate 0.0174   Epoch: 11   Global Step: 194390   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:20,095-Speed 9696.10 samples/sec   Loss 5.3055   LearningRate 0.0174   Epoch: 11   Global Step: 194400   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:21,153-Speed 9686.79 samples/sec   Loss 5.3917   LearningRate 0.0174   Epoch: 11   Global Step: 194410   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:22,219-Speed 9607.43 samples/sec   Loss 5.4210   LearningRate 0.0174   Epoch: 11   Global Step: 194420   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:23,325-Speed 9267.21 samples/sec   Loss 5.3116   LearningRate 0.0174   Epoch: 11   Global Step: 194430   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:24,419-Speed 9366.04 samples/sec   Loss 5.3041   LearningRate 0.0174   Epoch: 11   Global Step: 194440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:25,457-Speed 9872.73 samples/sec   Loss 5.3951   LearningRate 0.0174   Epoch: 11   Global Step: 194450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:26,533-Speed 9519.90 samples/sec   Loss 5.3668   LearningRate 0.0174   Epoch: 11   Global Step: 194460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:27,587-Speed 9726.12 samples/sec   Loss 5.4180   LearningRate 0.0174   Epoch: 11   Global Step: 194470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:28,686-Speed 9319.98 samples/sec   Loss 5.3128   LearningRate 0.0174   Epoch: 11   Global Step: 194480   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:23:29,770-Speed 9454.60 samples/sec   Loss 5.2584   LearningRate 0.0174   Epoch: 11   Global Step: 194490   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:23:30,850-Speed 9479.85 samples/sec   Loss 5.2719   LearningRate 0.0174   Epoch: 11   Global Step: 194500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:31,909-Speed 9680.18 samples/sec   Loss 5.3540   LearningRate 0.0174   Epoch: 11   Global Step: 194510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:33,009-Speed 9310.32 samples/sec   Loss 5.4131   LearningRate 0.0174   Epoch: 11   Global Step: 194520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:34,080-Speed 9570.59 samples/sec   Loss 5.3441   LearningRate 0.0174   Epoch: 11   Global Step: 194530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:35,174-Speed 9367.23 samples/sec   Loss 5.3601   LearningRate 0.0174   Epoch: 11   Global Step: 194540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:36,259-Speed 9438.28 samples/sec   Loss 5.4718   LearningRate 0.0174   Epoch: 11   Global Step: 194550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:37,365-Speed 9266.73 samples/sec   Loss 5.2953   LearningRate 0.0174   Epoch: 11   Global Step: 194560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:38,448-Speed 9464.36 samples/sec   Loss 5.3395   LearningRate 0.0174   Epoch: 11   Global Step: 194570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:39,549-Speed 9303.43 samples/sec   Loss 5.4915   LearningRate 0.0174   Epoch: 11   Global Step: 194580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:40,609-Speed 9672.14 samples/sec   Loss 5.3562   LearningRate 0.0174   Epoch: 11   Global Step: 194590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:41,679-Speed 9572.27 samples/sec   Loss 5.3021   LearningRate 0.0174   Epoch: 11   Global Step: 194600   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:23:42,749-Speed 9574.92 samples/sec   Loss 5.3188   LearningRate 0.0174   Epoch: 11   Global Step: 194610   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:43,832-Speed 9464.85 samples/sec   Loss 5.3548   LearningRate 0.0174   Epoch: 11   Global Step: 194620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:44,942-Speed 9228.00 samples/sec   Loss 5.3393   LearningRate 0.0174   Epoch: 11   Global Step: 194630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:46,057-Speed 9186.73 samples/sec   Loss 5.3449   LearningRate 0.0174   Epoch: 11   Global Step: 194640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:47,113-Speed 9704.48 samples/sec   Loss 5.2528   LearningRate 0.0174   Epoch: 11   Global Step: 194650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:48,218-Speed 9270.86 samples/sec   Loss 5.2933   LearningRate 0.0174   Epoch: 11   Global Step: 194660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:49,302-Speed 9450.85 samples/sec   Loss 5.3762   LearningRate 0.0174   Epoch: 11   Global Step: 194670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:50,388-Speed 9435.43 samples/sec   Loss 5.3290   LearningRate 0.0174   Epoch: 11   Global Step: 194680   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:51,461-Speed 9554.78 samples/sec   Loss 5.3173   LearningRate 0.0174   Epoch: 11   Global Step: 194690   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:52,543-Speed 9464.77 samples/sec   Loss 5.3289   LearningRate 0.0174   Epoch: 11   Global Step: 194700   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:53,659-Speed 9177.46 samples/sec   Loss 5.2720   LearningRate 0.0174   Epoch: 11   Global Step: 194710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:54,709-Speed 9762.33 samples/sec   Loss 5.3691   LearningRate 0.0174   Epoch: 11   Global Step: 194720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:55,816-Speed 9255.95 samples/sec   Loss 5.4574   LearningRate 0.0174   Epoch: 11   Global Step: 194730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:23:56,891-Speed 9535.13 samples/sec   Loss 5.3782   LearningRate 0.0174   Epoch: 11   Global Step: 194740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:23:58,039-Speed 8925.42 samples/sec   Loss 5.3010   LearningRate 0.0174   Epoch: 11   Global Step: 194750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:23:59,166-Speed 9090.68 samples/sec   Loss 5.3153   LearningRate 0.0174   Epoch: 11   Global Step: 194760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:24:00,236-Speed 9580.42 samples/sec   Loss 5.3414   LearningRate 0.0174   Epoch: 11   Global Step: 194770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:24:01,297-Speed 9651.06 samples/sec   Loss 5.4597   LearningRate 0.0173   Epoch: 11   Global Step: 194780   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:24:02,403-Speed 9271.01 samples/sec   Loss 5.3203   LearningRate 0.0173   Epoch: 11   Global Step: 194790   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:24:03,501-Speed 9327.16 samples/sec   Loss 5.3076   LearningRate 0.0173   Epoch: 11   Global Step: 194800   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:24:04,589-Speed 9419.54 samples/sec   Loss 5.3327   LearningRate 0.0173   Epoch: 11   Global Step: 194810   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:24:05,668-Speed 9491.50 samples/sec   Loss 5.3669   LearningRate 0.0173   Epoch: 11   Global Step: 194820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:24:06,767-Speed 9321.85 samples/sec   Loss 5.2551   LearningRate 0.0173   Epoch: 11   Global Step: 194830   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:24:07,846-Speed 9505.54 samples/sec   Loss 5.2641   LearningRate 0.0173   Epoch: 11   Global Step: 194840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:08,920-Speed 9535.79 samples/sec   Loss 5.3464   LearningRate 0.0173   Epoch: 11   Global Step: 194850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:10,031-Speed 9218.86 samples/sec   Loss 5.3896   LearningRate 0.0173   Epoch: 11   Global Step: 194860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:11,091-Speed 9673.60 samples/sec   Loss 5.3916   LearningRate 0.0173   Epoch: 11   Global Step: 194870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:12,184-Speed 9372.95 samples/sec   Loss 5.3801   LearningRate 0.0173   Epoch: 11   Global Step: 194880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:13,253-Speed 9585.12 samples/sec   Loss 5.3513   LearningRate 0.0173   Epoch: 11   Global Step: 194890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:14,352-Speed 9322.35 samples/sec   Loss 5.3563   LearningRate 0.0173   Epoch: 11   Global Step: 194900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:15,455-Speed 9289.34 samples/sec   Loss 5.3558   LearningRate 0.0173   Epoch: 11   Global Step: 194910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:16,550-Speed 9350.41 samples/sec   Loss 5.3631   LearningRate 0.0173   Epoch: 11   Global Step: 194920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:17,602-Speed 9748.30 samples/sec   Loss 5.2984   LearningRate 0.0173   Epoch: 11   Global Step: 194930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:18,693-Speed 9386.20 samples/sec   Loss 5.3478   LearningRate 0.0173   Epoch: 11   Global Step: 194940   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:24:19,770-Speed 9519.15 samples/sec   Loss 5.4340   LearningRate 0.0173   Epoch: 11   Global Step: 194950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:20,841-Speed 9569.90 samples/sec   Loss 5.2584   LearningRate 0.0173   Epoch: 11   Global Step: 194960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:21,941-Speed 9311.96 samples/sec   Loss 5.3238   LearningRate 0.0173   Epoch: 11   Global Step: 194970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:23,063-Speed 9128.57 samples/sec   Loss 5.3370   LearningRate 0.0173   Epoch: 11   Global Step: 194980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:24,168-Speed 9279.93 samples/sec   Loss 5.4028   LearningRate 0.0173   Epoch: 11   Global Step: 194990   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:25,211-Speed 9816.41 samples/sec   Loss 5.3849   LearningRate 0.0173   Epoch: 11   Global Step: 195000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:26,321-Speed 9235.40 samples/sec   Loss 5.3740   LearningRate 0.0173   Epoch: 11   Global Step: 195010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:27,414-Speed 9373.86 samples/sec   Loss 5.3844   LearningRate 0.0173   Epoch: 11   Global Step: 195020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:28,508-Speed 9362.59 samples/sec   Loss 5.2479   LearningRate 0.0173   Epoch: 11   Global Step: 195030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:29,576-Speed 9597.64 samples/sec   Loss 5.3052   LearningRate 0.0173   Epoch: 11   Global Step: 195040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:30,637-Speed 9650.09 samples/sec   Loss 5.3680   LearningRate 0.0173   Epoch: 11   Global Step: 195050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:31,680-Speed 9836.01 samples/sec   Loss 5.3663   LearningRate 0.0173   Epoch: 11   Global Step: 195060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:32,733-Speed 9730.25 samples/sec   Loss 5.3712   LearningRate 0.0173   Epoch: 11   Global Step: 195070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:33,780-Speed 9787.30 samples/sec   Loss 5.3780   LearningRate 0.0173   Epoch: 11   Global Step: 195080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:34,863-Speed 9463.58 samples/sec   Loss 5.3743   LearningRate 0.0173   Epoch: 11   Global Step: 195090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:35,952-Speed 9412.61 samples/sec   Loss 5.3762   LearningRate 0.0173   Epoch: 11   Global Step: 195100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:37,030-Speed 9502.30 samples/sec   Loss 5.3457   LearningRate 0.0173   Epoch: 11   Global Step: 195110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:38,138-Speed 9247.12 samples/sec   Loss 5.2503   LearningRate 0.0173   Epoch: 11   Global Step: 195120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:39,250-Speed 9215.50 samples/sec   Loss 5.3684   LearningRate 0.0173   Epoch: 11   Global Step: 195130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:40,320-Speed 9578.95 samples/sec   Loss 5.2669   LearningRate 0.0173   Epoch: 11   Global Step: 195140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:41,374-Speed 9724.88 samples/sec   Loss 5.3156   LearningRate 0.0173   Epoch: 11   Global Step: 195150   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:24:42,453-Speed 9497.28 samples/sec   Loss 5.3872   LearningRate 0.0173   Epoch: 11   Global Step: 195160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:43,612-Speed 8833.70 samples/sec   Loss 5.3191   LearningRate 0.0173   Epoch: 11   Global Step: 195170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:44,699-Speed 9430.84 samples/sec   Loss 5.3477   LearningRate 0.0172   Epoch: 11   Global Step: 195180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:45,759-Speed 9664.26 samples/sec   Loss 5.2643   LearningRate 0.0172   Epoch: 11   Global Step: 195190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:46,834-Speed 9533.35 samples/sec   Loss 5.3278   LearningRate 0.0172   Epoch: 11   Global Step: 195200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:47,895-Speed 9656.07 samples/sec   Loss 5.3301   LearningRate 0.0172   Epoch: 11   Global Step: 195210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:48,996-Speed 9305.99 samples/sec   Loss 5.3499   LearningRate 0.0172   Epoch: 11   Global Step: 195220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:50,062-Speed 9616.44 samples/sec   Loss 5.2826   LearningRate 0.0172   Epoch: 11   Global Step: 195230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:51,137-Speed 9530.43 samples/sec   Loss 5.2924   LearningRate 0.0172   Epoch: 11   Global Step: 195240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:52,245-Speed 9242.52 samples/sec   Loss 5.2975   LearningRate 0.0172   Epoch: 11   Global Step: 195250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:53,339-Speed 9364.11 samples/sec   Loss 5.3443   LearningRate 0.0172   Epoch: 11   Global Step: 195260   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:24:54,401-Speed 9647.71 samples/sec   Loss 5.3314   LearningRate 0.0172   Epoch: 11   Global Step: 195270   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:24:55,493-Speed 9387.84 samples/sec   Loss 5.4226   LearningRate 0.0172   Epoch: 11   Global Step: 195280   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:24:56,592-Speed 9331.06 samples/sec   Loss 5.3971   LearningRate 0.0172   Epoch: 11   Global Step: 195290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:57,699-Speed 9251.15 samples/sec   Loss 5.3474   LearningRate 0.0172   Epoch: 11   Global Step: 195300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:58,813-Speed 9199.13 samples/sec   Loss 5.2936   LearningRate 0.0172   Epoch: 11   Global Step: 195310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:24:59,867-Speed 9722.66 samples/sec   Loss 5.3027   LearningRate 0.0172   Epoch: 11   Global Step: 195320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:00,994-Speed 9088.41 samples/sec   Loss 5.2225   LearningRate 0.0172   Epoch: 11   Global Step: 195330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:02,081-Speed 9424.25 samples/sec   Loss 5.3172   LearningRate 0.0172   Epoch: 11   Global Step: 195340   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:03,159-Speed 9506.02 samples/sec   Loss 5.3194   LearningRate 0.0172   Epoch: 11   Global Step: 195350   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:04,224-Speed 9621.05 samples/sec   Loss 5.3415   LearningRate 0.0172   Epoch: 11   Global Step: 195360   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:05,294-Speed 9574.96 samples/sec   Loss 5.3580   LearningRate 0.0172   Epoch: 11   Global Step: 195370   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:06,364-Speed 9575.23 samples/sec   Loss 5.3305   LearningRate 0.0172   Epoch: 11   Global Step: 195380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:07,438-Speed 9543.31 samples/sec   Loss 5.2692   LearningRate 0.0172   Epoch: 11   Global Step: 195390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:08,541-Speed 9289.86 samples/sec   Loss 5.3368   LearningRate 0.0172   Epoch: 11   Global Step: 195400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:09,639-Speed 9333.85 samples/sec   Loss 5.3745   LearningRate 0.0172   Epoch: 11   Global Step: 195410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:10,694-Speed 9706.42 samples/sec   Loss 5.3174   LearningRate 0.0172   Epoch: 11   Global Step: 195420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:11,835-Speed 8986.02 samples/sec   Loss 5.3466   LearningRate 0.0172   Epoch: 11   Global Step: 195430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:12,957-Speed 9131.69 samples/sec   Loss 5.3158   LearningRate 0.0172   Epoch: 11   Global Step: 195440   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:14,081-Speed 9114.90 samples/sec   Loss 5.2613   LearningRate 0.0172   Epoch: 11   Global Step: 195450   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:15,148-Speed 9598.53 samples/sec   Loss 5.3780   LearningRate 0.0172   Epoch: 11   Global Step: 195460   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:16,208-Speed 9670.64 samples/sec   Loss 5.3164   LearningRate 0.0172   Epoch: 11   Global Step: 195470   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:17,292-Speed 9448.20 samples/sec   Loss 5.3461   LearningRate 0.0172   Epoch: 11   Global Step: 195480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:18,365-Speed 9549.45 samples/sec   Loss 5.3111   LearningRate 0.0172   Epoch: 11   Global Step: 195490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:19,471-Speed 9269.09 samples/sec   Loss 5.2848   LearningRate 0.0172   Epoch: 11   Global Step: 195500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:20,591-Speed 9151.32 samples/sec   Loss 5.3730   LearningRate 0.0172   Epoch: 11   Global Step: 195510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:21,713-Speed 9132.12 samples/sec   Loss 5.3923   LearningRate 0.0172   Epoch: 11   Global Step: 195520   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:22,775-Speed 9641.03 samples/sec   Loss 5.2727   LearningRate 0.0172   Epoch: 11   Global Step: 195530   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:23,820-Speed 9810.04 samples/sec   Loss 5.3437   LearningRate 0.0172   Epoch: 11   Global Step: 195540   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:24,873-Speed 9722.87 samples/sec   Loss 5.4497   LearningRate 0.0172   Epoch: 11   Global Step: 195550   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:25,972-Speed 9332.46 samples/sec   Loss 5.2875   LearningRate 0.0172   Epoch: 11   Global Step: 195560   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:27,095-Speed 9121.87 samples/sec   Loss 5.3076   LearningRate 0.0172   Epoch: 11   Global Step: 195570   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:28,151-Speed 9704.36 samples/sec   Loss 5.2888   LearningRate 0.0171   Epoch: 11   Global Step: 195580   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:29,227-Speed 9519.28 samples/sec   Loss 5.3048   LearningRate 0.0171   Epoch: 11   Global Step: 195590   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:30,363-Speed 9017.36 samples/sec   Loss 5.3974   LearningRate 0.0171   Epoch: 11   Global Step: 195600   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:31,469-Speed 9266.60 samples/sec   Loss 5.3623   LearningRate 0.0171   Epoch: 11   Global Step: 195610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:32,582-Speed 9198.95 samples/sec   Loss 5.2778   LearningRate 0.0171   Epoch: 11   Global Step: 195620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:33,669-Speed 9432.54 samples/sec   Loss 5.3629   LearningRate 0.0171   Epoch: 11   Global Step: 195630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:34,732-Speed 9637.37 samples/sec   Loss 5.3714   LearningRate 0.0171   Epoch: 11   Global Step: 195640   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:35,872-Speed 8989.34 samples/sec   Loss 5.2547   LearningRate 0.0171   Epoch: 11   Global Step: 195650   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:36,967-Speed 9352.67 samples/sec   Loss 5.3279   LearningRate 0.0171   Epoch: 11   Global Step: 195660   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:38,033-Speed 9618.41 samples/sec   Loss 5.3609   LearningRate 0.0171   Epoch: 11   Global Step: 195670   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:39,137-Speed 9278.26 samples/sec   Loss 5.3680   LearningRate 0.0171   Epoch: 11   Global Step: 195680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:40,197-Speed 9665.13 samples/sec   Loss 5.3763   LearningRate 0.0171   Epoch: 11   Global Step: 195690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:41,325-Speed 9085.94 samples/sec   Loss 5.1696   LearningRate 0.0171   Epoch: 11   Global Step: 195700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:25:42,451-Speed 9099.84 samples/sec   Loss 5.3639   LearningRate 0.0171   Epoch: 11   Global Step: 195710   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:43,557-Speed 9265.40 samples/sec   Loss 5.2974   LearningRate 0.0171   Epoch: 11   Global Step: 195720   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:44,634-Speed 9507.40 samples/sec   Loss 5.2945   LearningRate 0.0171   Epoch: 11   Global Step: 195730   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:45,669-Speed 9900.38 samples/sec   Loss 5.3211   LearningRate 0.0171   Epoch: 11   Global Step: 195740   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:46,797-Speed 9085.78 samples/sec   Loss 5.3856   LearningRate 0.0171   Epoch: 11   Global Step: 195750   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:47,904-Speed 9258.59 samples/sec   Loss 5.3462   LearningRate 0.0171   Epoch: 11   Global Step: 195760   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:48,981-Speed 9509.21 samples/sec   Loss 5.2973   LearningRate 0.0171   Epoch: 11   Global Step: 195770   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:50,078-Speed 9349.01 samples/sec   Loss 5.3306   LearningRate 0.0171   Epoch: 11   Global Step: 195780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:51,182-Speed 9280.20 samples/sec   Loss 5.3715   LearningRate 0.0171   Epoch: 11   Global Step: 195790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:52,224-Speed 9831.98 samples/sec   Loss 5.2896   LearningRate 0.0171   Epoch: 11   Global Step: 195800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:53,348-Speed 9116.50 samples/sec   Loss 5.2655   LearningRate 0.0171   Epoch: 11   Global Step: 195810   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:25:54,457-Speed 9240.32 samples/sec   Loss 5.2840   LearningRate 0.0171   Epoch: 11   Global Step: 195820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:55,539-Speed 9471.81 samples/sec   Loss 5.2063   LearningRate 0.0171   Epoch: 11   Global Step: 195830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:56,673-Speed 9037.22 samples/sec   Loss 5.2627   LearningRate 0.0171   Epoch: 11   Global Step: 195840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:57,797-Speed 9113.89 samples/sec   Loss 5.3274   LearningRate 0.0171   Epoch: 11   Global Step: 195850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:58,864-Speed 9601.17 samples/sec   Loss 5.2664   LearningRate 0.0171   Epoch: 11   Global Step: 195860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:25:59,945-Speed 9477.08 samples/sec   Loss 5.3010   LearningRate 0.0171   Epoch: 11   Global Step: 195870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:26:01,023-Speed 9510.88 samples/sec   Loss 5.2738   LearningRate 0.0171   Epoch: 11   Global Step: 195880   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:26:02,112-Speed 9408.26 samples/sec   Loss 5.2007   LearningRate 0.0171   Epoch: 11   Global Step: 195890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:26:03,232-Speed 9140.70 samples/sec   Loss 5.4236   LearningRate 0.0171   Epoch: 11   Global Step: 195900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:26:04,313-Speed 9482.63 samples/sec   Loss 5.3752   LearningRate 0.0171   Epoch: 11   Global Step: 195910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:26:05,421-Speed 9248.36 samples/sec   Loss 5.3361   LearningRate 0.0171   Epoch: 11   Global Step: 195920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:26:06,515-Speed 9369.07 samples/sec   Loss 5.3549   LearningRate 0.0171   Epoch: 11   Global Step: 195930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:26:07,625-Speed 9222.79 samples/sec   Loss 5.3338   LearningRate 0.0171   Epoch: 11   Global Step: 195940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:26:08,749-Speed 9122.93 samples/sec   Loss 5.2891   LearningRate 0.0171   Epoch: 11   Global Step: 195950   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:26:09,851-Speed 9298.38 samples/sec   Loss 5.3865   LearningRate 0.0171   Epoch: 11   Global Step: 195960   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:26:10,944-Speed 9374.03 samples/sec   Loss 5.2699   LearningRate 0.0171   Epoch: 11   Global Step: 195970   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:26:11,996-Speed 9743.43 samples/sec   Loss 5.2425   LearningRate 0.0171   Epoch: 11   Global Step: 195980   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:26:13,069-Speed 9548.35 samples/sec   Loss 5.3089   LearningRate 0.0170   Epoch: 11   Global Step: 195990   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:26:14,152-Speed 9464.23 samples/sec   Loss 5.3746   LearningRate 0.0170   Epoch: 11   Global Step: 196000   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:26:35,926-[lfw][196000]XNorm: 8.772796
Training: 2022-04-11 19:26:35,926-[lfw][196000]Accuracy-Flip: 0.99667+-0.00269
Training: 2022-04-11 19:26:35,926-[lfw][196000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:27:01,378-[cfp_fp][196000]XNorm: 7.544887
Training: 2022-04-11 19:27:01,379-[cfp_fp][196000]Accuracy-Flip: 0.96386+-0.00948
Training: 2022-04-11 19:27:01,379-[cfp_fp][196000]Accuracy-Highest: 0.96714
Training: 2022-04-11 19:27:23,137-[agedb_30][196000]XNorm: 8.494056
Training: 2022-04-11 19:27:23,138-[agedb_30][196000]Accuracy-Flip: 0.96983+-0.00886
Training: 2022-04-11 19:27:23,138-[agedb_30][196000]Accuracy-Highest: 0.96983
Training: 2022-04-11 19:27:24,192-Speed 146.20 samples/sec   Loss 5.3655   LearningRate 0.0170   Epoch: 11   Global Step: 196010   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:25,248-Speed 9699.20 samples/sec   Loss 5.3619   LearningRate 0.0170   Epoch: 11   Global Step: 196020   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:26,307-Speed 9674.83 samples/sec   Loss 5.3413   LearningRate 0.0170   Epoch: 11   Global Step: 196030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:27,368-Speed 9659.68 samples/sec   Loss 5.3550   LearningRate 0.0170   Epoch: 11   Global Step: 196040   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:28,420-Speed 9737.29 samples/sec   Loss 5.2804   LearningRate 0.0170   Epoch: 11   Global Step: 196050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:29,508-Speed 9413.47 samples/sec   Loss 5.3889   LearningRate 0.0170   Epoch: 11   Global Step: 196060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:30,574-Speed 9623.79 samples/sec   Loss 5.3183   LearningRate 0.0170   Epoch: 11   Global Step: 196070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:31,657-Speed 9454.05 samples/sec   Loss 5.3361   LearningRate 0.0170   Epoch: 11   Global Step: 196080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:32,791-Speed 9035.12 samples/sec   Loss 5.3121   LearningRate 0.0170   Epoch: 11   Global Step: 196090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:33,861-Speed 9574.60 samples/sec   Loss 5.4372   LearningRate 0.0170   Epoch: 11   Global Step: 196100   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:34,928-Speed 9603.93 samples/sec   Loss 5.2827   LearningRate 0.0170   Epoch: 11   Global Step: 196110   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:36,006-Speed 9501.82 samples/sec   Loss 5.3238   LearningRate 0.0170   Epoch: 11   Global Step: 196120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:37,066-Speed 9672.51 samples/sec   Loss 5.3351   LearningRate 0.0170   Epoch: 11   Global Step: 196130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:38,185-Speed 9151.99 samples/sec   Loss 5.3048   LearningRate 0.0170   Epoch: 11   Global Step: 196140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:39,287-Speed 9295.57 samples/sec   Loss 5.3375   LearningRate 0.0170   Epoch: 11   Global Step: 196150   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:27:40,322-Speed 9907.23 samples/sec   Loss 5.3127   LearningRate 0.0170   Epoch: 11   Global Step: 196160   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:27:41,400-Speed 9502.99 samples/sec   Loss 5.3088   LearningRate 0.0170   Epoch: 11   Global Step: 196170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:42,478-Speed 9504.93 samples/sec   Loss 5.3814   LearningRate 0.0170   Epoch: 11   Global Step: 196180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:43,553-Speed 9527.68 samples/sec   Loss 5.3857   LearningRate 0.0170   Epoch: 11   Global Step: 196190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:44,625-Speed 9561.55 samples/sec   Loss 5.3122   LearningRate 0.0170   Epoch: 11   Global Step: 196200   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:45,684-Speed 9677.10 samples/sec   Loss 5.3707   LearningRate 0.0170   Epoch: 11   Global Step: 196210   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:46,798-Speed 9206.64 samples/sec   Loss 5.3744   LearningRate 0.0170   Epoch: 11   Global Step: 196220   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:47,908-Speed 9226.98 samples/sec   Loss 5.3104   LearningRate 0.0170   Epoch: 11   Global Step: 196230   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:48,971-Speed 9645.19 samples/sec   Loss 5.3128   LearningRate 0.0170   Epoch: 11   Global Step: 196240   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:50,059-Speed 9409.68 samples/sec   Loss 5.3143   LearningRate 0.0170   Epoch: 11   Global Step: 196250   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:51,095-Speed 9889.77 samples/sec   Loss 5.2999   LearningRate 0.0170   Epoch: 11   Global Step: 196260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:52,196-Speed 9307.65 samples/sec   Loss 5.4012   LearningRate 0.0170   Epoch: 11   Global Step: 196270   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:53,242-Speed 9798.39 samples/sec   Loss 5.3540   LearningRate 0.0170   Epoch: 11   Global Step: 196280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:54,304-Speed 9642.39 samples/sec   Loss 5.2949   LearningRate 0.0170   Epoch: 11   Global Step: 196290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:27:55,379-Speed 9537.82 samples/sec   Loss 5.3269   LearningRate 0.0170   Epoch: 11   Global Step: 196300   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:56,461-Speed 9468.02 samples/sec   Loss 5.2768   LearningRate 0.0170   Epoch: 11   Global Step: 196310   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:57,563-Speed 9296.90 samples/sec   Loss 5.2429   LearningRate 0.0170   Epoch: 11   Global Step: 196320   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:58,666-Speed 9284.24 samples/sec   Loss 5.2804   LearningRate 0.0170   Epoch: 11   Global Step: 196330   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:27:59,738-Speed 9559.46 samples/sec   Loss 5.2897   LearningRate 0.0170   Epoch: 11   Global Step: 196340   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:00,816-Speed 9507.31 samples/sec   Loss 5.2893   LearningRate 0.0170   Epoch: 11   Global Step: 196350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:01,867-Speed 9752.88 samples/sec   Loss 5.3565   LearningRate 0.0170   Epoch: 11   Global Step: 196360   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:02,998-Speed 9054.19 samples/sec   Loss 5.3260   LearningRate 0.0170   Epoch: 11   Global Step: 196370   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:04,121-Speed 9128.36 samples/sec   Loss 5.3651   LearningRate 0.0170   Epoch: 11   Global Step: 196380   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:05,211-Speed 9397.92 samples/sec   Loss 5.3196   LearningRate 0.0169   Epoch: 11   Global Step: 196390   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:06,282-Speed 9562.61 samples/sec   Loss 5.2925   LearningRate 0.0169   Epoch: 11   Global Step: 196400   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:07,362-Speed 9491.70 samples/sec   Loss 5.3171   LearningRate 0.0169   Epoch: 11   Global Step: 196410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:08,456-Speed 9368.08 samples/sec   Loss 5.2962   LearningRate 0.0169   Epoch: 11   Global Step: 196420   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:09,526-Speed 9568.16 samples/sec   Loss 5.3201   LearningRate 0.0169   Epoch: 11   Global Step: 196430   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:10,608-Speed 9479.41 samples/sec   Loss 5.4775   LearningRate 0.0169   Epoch: 11   Global Step: 196440   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:11,698-Speed 9399.63 samples/sec   Loss 5.4507   LearningRate 0.0169   Epoch: 11   Global Step: 196450   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:12,802-Speed 9277.45 samples/sec   Loss 5.3606   LearningRate 0.0169   Epoch: 11   Global Step: 196460   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:13,913-Speed 9222.20 samples/sec   Loss 5.2806   LearningRate 0.0169   Epoch: 11   Global Step: 196470   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:15,014-Speed 9304.16 samples/sec   Loss 5.3003   LearningRate 0.0169   Epoch: 11   Global Step: 196480   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:16,081-Speed 9607.79 samples/sec   Loss 5.3060   LearningRate 0.0169   Epoch: 11   Global Step: 196490   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:17,153-Speed 9556.34 samples/sec   Loss 5.3598   LearningRate 0.0169   Epoch: 11   Global Step: 196500   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:18,239-Speed 9432.41 samples/sec   Loss 5.3485   LearningRate 0.0169   Epoch: 11   Global Step: 196510   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:19,345-Speed 9269.67 samples/sec   Loss 5.3771   LearningRate 0.0169   Epoch: 11   Global Step: 196520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:20,475-Speed 9064.61 samples/sec   Loss 5.3262   LearningRate 0.0169   Epoch: 11   Global Step: 196530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:21,599-Speed 9117.81 samples/sec   Loss 5.2890   LearningRate 0.0169   Epoch: 11   Global Step: 196540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:22,657-Speed 9681.55 samples/sec   Loss 5.1568   LearningRate 0.0169   Epoch: 11   Global Step: 196550   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:23,747-Speed 9403.18 samples/sec   Loss 5.2944   LearningRate 0.0169   Epoch: 11   Global Step: 196560   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:24,835-Speed 9417.85 samples/sec   Loss 5.3629   LearningRate 0.0169   Epoch: 11   Global Step: 196570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:25,899-Speed 9629.92 samples/sec   Loss 5.3076   LearningRate 0.0169   Epoch: 11   Global Step: 196580   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:26,954-Speed 9709.13 samples/sec   Loss 5.1319   LearningRate 0.0169   Epoch: 11   Global Step: 196590   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:28,019-Speed 9624.49 samples/sec   Loss 5.2598   LearningRate 0.0169   Epoch: 11   Global Step: 196600   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:29,108-Speed 9408.23 samples/sec   Loss 5.3491   LearningRate 0.0169   Epoch: 11   Global Step: 196610   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:30,197-Speed 9408.75 samples/sec   Loss 5.4879   LearningRate 0.0169   Epoch: 11   Global Step: 196620   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:31,270-Speed 9550.81 samples/sec   Loss 5.3002   LearningRate 0.0169   Epoch: 11   Global Step: 196630   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:32,357-Speed 9423.13 samples/sec   Loss 5.2783   LearningRate 0.0169   Epoch: 11   Global Step: 196640   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:33,448-Speed 9391.39 samples/sec   Loss 5.3645   LearningRate 0.0169   Epoch: 11   Global Step: 196650   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:34,533-Speed 9450.42 samples/sec   Loss 5.3081   LearningRate 0.0169   Epoch: 11   Global Step: 196660   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:35,564-Speed 9928.96 samples/sec   Loss 5.3879   LearningRate 0.0169   Epoch: 11   Global Step: 196670   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:36,674-Speed 9235.68 samples/sec   Loss 5.3035   LearningRate 0.0169   Epoch: 11   Global Step: 196680   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:37,729-Speed 9706.55 samples/sec   Loss 5.3290   LearningRate 0.0169   Epoch: 11   Global Step: 196690   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:38,796-Speed 9603.01 samples/sec   Loss 5.3422   LearningRate 0.0169   Epoch: 11   Global Step: 196700   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:39,889-Speed 9376.72 samples/sec   Loss 5.3052   LearningRate 0.0169   Epoch: 11   Global Step: 196710   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:40,944-Speed 9714.03 samples/sec   Loss 5.3440   LearningRate 0.0169   Epoch: 11   Global Step: 196720   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:41,998-Speed 9723.15 samples/sec   Loss 5.3068   LearningRate 0.0169   Epoch: 11   Global Step: 196730   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:43,056-Speed 9686.72 samples/sec   Loss 5.3668   LearningRate 0.0169   Epoch: 11   Global Step: 196740   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:44,175-Speed 9153.68 samples/sec   Loss 5.2818   LearningRate 0.0169   Epoch: 11   Global Step: 196750   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:45,243-Speed 9595.92 samples/sec   Loss 5.2717   LearningRate 0.0169   Epoch: 11   Global Step: 196760   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:46,362-Speed 9160.73 samples/sec   Loss 5.3734   LearningRate 0.0169   Epoch: 11   Global Step: 196770   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-11 19:28:47,477-Speed 9189.30 samples/sec   Loss 5.3775   LearningRate 0.0169   Epoch: 11   Global Step: 196780   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:48,568-Speed 9387.97 samples/sec   Loss 5.2774   LearningRate 0.0169   Epoch: 11   Global Step: 196790   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:49,627-Speed 9679.10 samples/sec   Loss 5.3266   LearningRate 0.0168   Epoch: 11   Global Step: 196800   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:50,718-Speed 9389.65 samples/sec   Loss 5.2469   LearningRate 0.0168   Epoch: 11   Global Step: 196810   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:51,761-Speed 9819.51 samples/sec   Loss 5.4513   LearningRate 0.0168   Epoch: 11   Global Step: 196820   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:52,838-Speed 9519.50 samples/sec   Loss 5.3178   LearningRate 0.0168   Epoch: 11   Global Step: 196830   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:53,900-Speed 9647.87 samples/sec   Loss 5.4058   LearningRate 0.0168   Epoch: 11   Global Step: 196840   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:55,010-Speed 9225.85 samples/sec   Loss 5.3745   LearningRate 0.0168   Epoch: 11   Global Step: 196850   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:56,062-Speed 9745.02 samples/sec   Loss 5.3763   LearningRate 0.0168   Epoch: 11   Global Step: 196860   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:57,163-Speed 9304.57 samples/sec   Loss 5.2644   LearningRate 0.0168   Epoch: 11   Global Step: 196870   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:28:58,230-Speed 9596.52 samples/sec   Loss 5.3203   LearningRate 0.0168   Epoch: 11   Global Step: 196880   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:28:59,287-Speed 9694.39 samples/sec   Loss 5.3856   LearningRate 0.0168   Epoch: 11   Global Step: 196890   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:00,369-Speed 9479.13 samples/sec   Loss 5.3720   LearningRate 0.0168   Epoch: 11   Global Step: 196900   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:01,464-Speed 9357.88 samples/sec   Loss 5.3122   LearningRate 0.0168   Epoch: 11   Global Step: 196910   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:02,534-Speed 9577.00 samples/sec   Loss 5.4353   LearningRate 0.0168   Epoch: 11   Global Step: 196920   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:03,650-Speed 9181.50 samples/sec   Loss 5.3161   LearningRate 0.0168   Epoch: 11   Global Step: 196930   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:04,722-Speed 9555.64 samples/sec   Loss 5.2910   LearningRate 0.0168   Epoch: 11   Global Step: 196940   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:05,806-Speed 9450.06 samples/sec   Loss 5.2568   LearningRate 0.0168   Epoch: 11   Global Step: 196950   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:06,897-Speed 9396.09 samples/sec   Loss 5.3961   LearningRate 0.0168   Epoch: 11   Global Step: 196960   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:08,038-Speed 8976.04 samples/sec   Loss 5.2805   LearningRate 0.0168   Epoch: 11   Global Step: 196970   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:09,154-Speed 9183.68 samples/sec   Loss 5.3701   LearningRate 0.0168   Epoch: 11   Global Step: 196980   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:10,244-Speed 9399.90 samples/sec   Loss 5.3048   LearningRate 0.0168   Epoch: 11   Global Step: 196990   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:29:11,273-Speed 9955.96 samples/sec   Loss 5.3817   LearningRate 0.0168   Epoch: 11   Global Step: 197000   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:12,346-Speed 9556.23 samples/sec   Loss 5.2454   LearningRate 0.0168   Epoch: 11   Global Step: 197010   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:13,461-Speed 9184.38 samples/sec   Loss 5.3166   LearningRate 0.0168   Epoch: 11   Global Step: 197020   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:14,567-Speed 9268.29 samples/sec   Loss 5.4434   LearningRate 0.0168   Epoch: 11   Global Step: 197030   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:15,684-Speed 9166.68 samples/sec   Loss 5.4057   LearningRate 0.0168   Epoch: 11   Global Step: 197040   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:16,764-Speed 9490.74 samples/sec   Loss 5.4572   LearningRate 0.0168   Epoch: 11   Global Step: 197050   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:17,866-Speed 9302.06 samples/sec   Loss 5.4157   LearningRate 0.0168   Epoch: 11   Global Step: 197060   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:18,960-Speed 9365.50 samples/sec   Loss 5.3504   LearningRate 0.0168   Epoch: 11   Global Step: 197070   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:20,051-Speed 9395.28 samples/sec   Loss 5.4418   LearningRate 0.0168   Epoch: 11   Global Step: 197080   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:21,108-Speed 9689.82 samples/sec   Loss 5.3032   LearningRate 0.0168   Epoch: 11   Global Step: 197090   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:22,173-Speed 9624.46 samples/sec   Loss 5.4287   LearningRate 0.0168   Epoch: 11   Global Step: 197100   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:29:23,291-Speed 9159.73 samples/sec   Loss 5.2947   LearningRate 0.0168   Epoch: 11   Global Step: 197110   Fp16 Grad Scale: 262144   Required: 6 hours
Training: 2022-04-11 19:29:24,340-Speed 9773.60 samples/sec   Loss 5.2360   LearningRate 0.0168   Epoch: 11   Global Step: 197120   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:25,414-Speed 9533.91 samples/sec   Loss 5.3996   LearningRate 0.0168   Epoch: 11   Global Step: 197130   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:26,524-Speed 9228.42 samples/sec   Loss 5.3060   LearningRate 0.0168   Epoch: 11   Global Step: 197140   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:27,605-Speed 9479.20 samples/sec   Loss 5.2736   LearningRate 0.0168   Epoch: 11   Global Step: 197150   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:28,711-Speed 9267.65 samples/sec   Loss 5.3277   LearningRate 0.0168   Epoch: 11   Global Step: 197160   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:29,796-Speed 9443.13 samples/sec   Loss 5.3201   LearningRate 0.0168   Epoch: 11   Global Step: 197170   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:30,885-Speed 9408.83 samples/sec   Loss 5.3597   LearningRate 0.0168   Epoch: 11   Global Step: 197180   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:32,005-Speed 9152.08 samples/sec   Loss 5.2877   LearningRate 0.0168   Epoch: 11   Global Step: 197190   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:33,083-Speed 9505.41 samples/sec   Loss 5.2320   LearningRate 0.0167   Epoch: 11   Global Step: 197200   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:34,191-Speed 9245.30 samples/sec   Loss 5.3813   LearningRate 0.0167   Epoch: 11   Global Step: 197210   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:35,273-Speed 9471.88 samples/sec   Loss 5.2845   LearningRate 0.0167   Epoch: 11   Global Step: 197220   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:36,342-Speed 9581.07 samples/sec   Loss 5.3377   LearningRate 0.0167   Epoch: 11   Global Step: 197230   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:37,399-Speed 9692.86 samples/sec   Loss 5.3426   LearningRate 0.0167   Epoch: 11   Global Step: 197240   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:38,508-Speed 9245.32 samples/sec   Loss 5.3548   LearningRate 0.0167   Epoch: 11   Global Step: 197250   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:39,586-Speed 9501.68 samples/sec   Loss 5.3110   LearningRate 0.0167   Epoch: 11   Global Step: 197260   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:40,666-Speed 9488.79 samples/sec   Loss 5.3115   LearningRate 0.0167   Epoch: 11   Global Step: 197270   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:41,781-Speed 9189.32 samples/sec   Loss 5.3093   LearningRate 0.0167   Epoch: 11   Global Step: 197280   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:42,879-Speed 9335.69 samples/sec   Loss 5.4150   LearningRate 0.0167   Epoch: 11   Global Step: 197290   Fp16 Grad Scale: 131072   Required: 6 hours
Training: 2022-04-11 19:29:43,950-Speed 9563.31 samples/sec   Loss 5.3122   LearningRate 0.0167   Epoch: 11   Global Step: 197300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:45,021-Speed 9569.46 samples/sec   Loss 5.3414   LearningRate 0.0167   Epoch: 11   Global Step: 197310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:46,087-Speed 9605.29 samples/sec   Loss 5.4466   LearningRate 0.0167   Epoch: 11   Global Step: 197320   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:29:47,153-Speed 9622.58 samples/sec   Loss 5.3376   LearningRate 0.0167   Epoch: 11   Global Step: 197330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:48,261-Speed 9247.02 samples/sec   Loss 5.4106   LearningRate 0.0167   Epoch: 11   Global Step: 197340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:49,341-Speed 9486.61 samples/sec   Loss 5.2992   LearningRate 0.0167   Epoch: 11   Global Step: 197350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:50,413-Speed 9554.39 samples/sec   Loss 5.2814   LearningRate 0.0167   Epoch: 11   Global Step: 197360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:51,478-Speed 9625.76 samples/sec   Loss 5.3218   LearningRate 0.0167   Epoch: 11   Global Step: 197370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:52,630-Speed 8895.01 samples/sec   Loss 5.4039   LearningRate 0.0167   Epoch: 11   Global Step: 197380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:53,730-Speed 9309.28 samples/sec   Loss 5.3234   LearningRate 0.0167   Epoch: 11   Global Step: 197390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:54,790-Speed 9669.49 samples/sec   Loss 5.3090   LearningRate 0.0167   Epoch: 11   Global Step: 197400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:55,874-Speed 9451.98 samples/sec   Loss 5.2482   LearningRate 0.0167   Epoch: 11   Global Step: 197410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:56,958-Speed 9449.35 samples/sec   Loss 5.3260   LearningRate 0.0167   Epoch: 11   Global Step: 197420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:29:58,077-Speed 9159.13 samples/sec   Loss 5.2886   LearningRate 0.0167   Epoch: 11   Global Step: 197430   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:29:59,158-Speed 9478.73 samples/sec   Loss 5.3485   LearningRate 0.0167   Epoch: 11   Global Step: 197440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:00,243-Speed 9443.06 samples/sec   Loss 5.3949   LearningRate 0.0167   Epoch: 11   Global Step: 197450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:01,340-Speed 9342.11 samples/sec   Loss 5.4250   LearningRate 0.0167   Epoch: 11   Global Step: 197460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:02,437-Speed 9342.98 samples/sec   Loss 5.2667   LearningRate 0.0167   Epoch: 11   Global Step: 197470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:03,579-Speed 8973.18 samples/sec   Loss 5.4156   LearningRate 0.0167   Epoch: 11   Global Step: 197480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:04,655-Speed 9518.61 samples/sec   Loss 5.3367   LearningRate 0.0167   Epoch: 11   Global Step: 197490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:05,747-Speed 9380.17 samples/sec   Loss 5.2304   LearningRate 0.0167   Epoch: 11   Global Step: 197500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:06,857-Speed 9235.64 samples/sec   Loss 5.2667   LearningRate 0.0167   Epoch: 11   Global Step: 197510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:07,959-Speed 9296.20 samples/sec   Loss 5.2771   LearningRate 0.0167   Epoch: 11   Global Step: 197520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:09,044-Speed 9446.25 samples/sec   Loss 5.3485   LearningRate 0.0167   Epoch: 11   Global Step: 197530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:10,124-Speed 9485.86 samples/sec   Loss 5.4397   LearningRate 0.0167   Epoch: 11   Global Step: 197540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:11,178-Speed 9726.43 samples/sec   Loss 5.3614   LearningRate 0.0167   Epoch: 11   Global Step: 197550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:12,254-Speed 9523.89 samples/sec   Loss 5.3733   LearningRate 0.0167   Epoch: 11   Global Step: 197560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:13,377-Speed 9119.10 samples/sec   Loss 5.3651   LearningRate 0.0167   Epoch: 11   Global Step: 197570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:14,445-Speed 9597.69 samples/sec   Loss 5.3986   LearningRate 0.0167   Epoch: 11   Global Step: 197580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:15,520-Speed 9528.58 samples/sec   Loss 5.2811   LearningRate 0.0167   Epoch: 11   Global Step: 197590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:16,613-Speed 9374.48 samples/sec   Loss 5.3177   LearningRate 0.0167   Epoch: 11   Global Step: 197600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:17,685-Speed 9566.80 samples/sec   Loss 5.3524   LearningRate 0.0166   Epoch: 11   Global Step: 197610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:18,754-Speed 9581.32 samples/sec   Loss 5.2687   LearningRate 0.0166   Epoch: 11   Global Step: 197620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:30:19,854-Speed 9314.25 samples/sec   Loss 5.3352   LearningRate 0.0166   Epoch: 11   Global Step: 197630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:30:20,920-Speed 9613.33 samples/sec   Loss 5.3773   LearningRate 0.0166   Epoch: 11   Global Step: 197640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:30:21,996-Speed 9517.44 samples/sec   Loss 5.3256   LearningRate 0.0166   Epoch: 11   Global Step: 197650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:30:23,063-Speed 9608.13 samples/sec   Loss 5.2650   LearningRate 0.0166   Epoch: 11   Global Step: 197660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:30:24,117-Speed 9713.93 samples/sec   Loss 5.2173   LearningRate 0.0166   Epoch: 11   Global Step: 197670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:30:25,171-Speed 9727.38 samples/sec   Loss 5.3614   LearningRate 0.0166   Epoch: 11   Global Step: 197680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:30:26,265-Speed 9367.54 samples/sec   Loss 5.3458   LearningRate 0.0166   Epoch: 11   Global Step: 197690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:30:27,331-Speed 9610.37 samples/sec   Loss 5.3332   LearningRate 0.0166   Epoch: 11   Global Step: 197700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:30:28,410-Speed 9500.04 samples/sec   Loss 5.3237   LearningRate 0.0166   Epoch: 11   Global Step: 197710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:30:29,492-Speed 9464.08 samples/sec   Loss 5.2989   LearningRate 0.0166   Epoch: 11   Global Step: 197720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:30,577-Speed 9455.40 samples/sec   Loss 5.3104   LearningRate 0.0166   Epoch: 11   Global Step: 197730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:31,687-Speed 9227.46 samples/sec   Loss 5.3911   LearningRate 0.0166   Epoch: 11   Global Step: 197740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:32,786-Speed 9324.38 samples/sec   Loss 5.4197   LearningRate 0.0166   Epoch: 11   Global Step: 197750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:33,842-Speed 9698.11 samples/sec   Loss 5.3205   LearningRate 0.0166   Epoch: 11   Global Step: 197760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:34,935-Speed 9379.06 samples/sec   Loss 5.3255   LearningRate 0.0166   Epoch: 11   Global Step: 197770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:36,057-Speed 9130.35 samples/sec   Loss 5.3410   LearningRate 0.0166   Epoch: 11   Global Step: 197780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:37,149-Speed 9380.17 samples/sec   Loss 5.4634   LearningRate 0.0166   Epoch: 11   Global Step: 197790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:38,210-Speed 9656.00 samples/sec   Loss 5.4394   LearningRate 0.0166   Epoch: 11   Global Step: 197800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:39,279-Speed 9583.94 samples/sec   Loss 5.3426   LearningRate 0.0166   Epoch: 11   Global Step: 197810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:40,367-Speed 9421.05 samples/sec   Loss 5.3835   LearningRate 0.0166   Epoch: 11   Global Step: 197820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:41,448-Speed 9482.76 samples/sec   Loss 5.2459   LearningRate 0.0166   Epoch: 11   Global Step: 197830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:42,486-Speed 9865.22 samples/sec   Loss 5.2896   LearningRate 0.0166   Epoch: 11   Global Step: 197840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:43,578-Speed 9379.89 samples/sec   Loss 5.2378   LearningRate 0.0166   Epoch: 11   Global Step: 197850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:44,665-Speed 9429.56 samples/sec   Loss 5.2844   LearningRate 0.0166   Epoch: 11   Global Step: 197860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:45,771-Speed 9267.68 samples/sec   Loss 5.3905   LearningRate 0.0166   Epoch: 11   Global Step: 197870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:46,887-Speed 9183.62 samples/sec   Loss 5.3226   LearningRate 0.0166   Epoch: 11   Global Step: 197880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:47,946-Speed 9670.83 samples/sec   Loss 5.3664   LearningRate 0.0166   Epoch: 11   Global Step: 197890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:49,025-Speed 9501.82 samples/sec   Loss 5.4279   LearningRate 0.0166   Epoch: 11   Global Step: 197900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:50,108-Speed 9455.54 samples/sec   Loss 5.3101   LearningRate 0.0166   Epoch: 11   Global Step: 197910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:51,179-Speed 9570.55 samples/sec   Loss 5.3341   LearningRate 0.0166   Epoch: 11   Global Step: 197920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:52,227-Speed 9774.47 samples/sec   Loss 5.3668   LearningRate 0.0166   Epoch: 11   Global Step: 197930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:53,292-Speed 9619.86 samples/sec   Loss 5.3342   LearningRate 0.0166   Epoch: 11   Global Step: 197940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:54,353-Speed 9662.27 samples/sec   Loss 5.3654   LearningRate 0.0166   Epoch: 11   Global Step: 197950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:55,465-Speed 9210.90 samples/sec   Loss 5.2659   LearningRate 0.0166   Epoch: 11   Global Step: 197960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:56,519-Speed 9716.31 samples/sec   Loss 5.2523   LearningRate 0.0166   Epoch: 11   Global Step: 197970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:57,601-Speed 9475.14 samples/sec   Loss 5.3615   LearningRate 0.0166   Epoch: 11   Global Step: 197980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:58,666-Speed 9614.55 samples/sec   Loss 5.3300   LearningRate 0.0166   Epoch: 11   Global Step: 197990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:30:59,786-Speed 9150.55 samples/sec   Loss 5.4674   LearningRate 0.0166   Epoch: 11   Global Step: 198000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:31:21,782-[lfw][198000]XNorm: 8.728273
Training: 2022-04-11 19:31:21,783-[lfw][198000]Accuracy-Flip: 0.99667+-0.00247
Training: 2022-04-11 19:31:21,783-[lfw][198000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:31:47,178-[cfp_fp][198000]XNorm: 7.472294
Training: 2022-04-11 19:31:47,179-[cfp_fp][198000]Accuracy-Flip: 0.96300+-0.00970
Training: 2022-04-11 19:31:47,180-[cfp_fp][198000]Accuracy-Highest: 0.96714
Training: 2022-04-11 19:32:09,134-[agedb_30][198000]XNorm: 8.502214
Training: 2022-04-11 19:32:09,135-[agedb_30][198000]Accuracy-Flip: 0.96367+-0.01183
Training: 2022-04-11 19:32:09,135-[agedb_30][198000]Accuracy-Highest: 0.96983
Training: 2022-04-11 19:32:10,207-Speed 145.41 samples/sec   Loss 5.3627   LearningRate 0.0166   Epoch: 11   Global Step: 198010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:11,278-Speed 9571.89 samples/sec   Loss 5.3808   LearningRate 0.0165   Epoch: 11   Global Step: 198020   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:32:12,369-Speed 9392.91 samples/sec   Loss 5.2270   LearningRate 0.0165   Epoch: 11   Global Step: 198030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:13,462-Speed 9372.51 samples/sec   Loss 5.3342   LearningRate 0.0165   Epoch: 11   Global Step: 198040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:14,575-Speed 9200.31 samples/sec   Loss 5.2733   LearningRate 0.0165   Epoch: 11   Global Step: 198050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:15,664-Speed 9410.96 samples/sec   Loss 5.3619   LearningRate 0.0165   Epoch: 11   Global Step: 198060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:16,756-Speed 9385.54 samples/sec   Loss 5.4010   LearningRate 0.0165   Epoch: 11   Global Step: 198070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:17,858-Speed 9299.92 samples/sec   Loss 5.2539   LearningRate 0.0165   Epoch: 11   Global Step: 198080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:18,910-Speed 9738.43 samples/sec   Loss 5.3014   LearningRate 0.0165   Epoch: 11   Global Step: 198090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:19,977-Speed 9607.07 samples/sec   Loss 5.3191   LearningRate 0.0165   Epoch: 11   Global Step: 198100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:21,056-Speed 9496.86 samples/sec   Loss 5.2597   LearningRate 0.0165   Epoch: 11   Global Step: 198110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:22,151-Speed 9353.75 samples/sec   Loss 5.2792   LearningRate 0.0165   Epoch: 11   Global Step: 198120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:23,209-Speed 9686.26 samples/sec   Loss 5.3172   LearningRate 0.0165   Epoch: 11   Global Step: 198130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:24,354-Speed 8946.19 samples/sec   Loss 5.4145   LearningRate 0.0165   Epoch: 11   Global Step: 198140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:25,437-Speed 9464.73 samples/sec   Loss 5.3075   LearningRate 0.0165   Epoch: 11   Global Step: 198150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:26,536-Speed 9325.71 samples/sec   Loss 5.3773   LearningRate 0.0165   Epoch: 11   Global Step: 198160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:27,604-Speed 9586.40 samples/sec   Loss 5.3155   LearningRate 0.0165   Epoch: 11   Global Step: 198170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:28,710-Speed 9263.30 samples/sec   Loss 5.2594   LearningRate 0.0165   Epoch: 11   Global Step: 198180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:29,802-Speed 9384.35 samples/sec   Loss 5.3453   LearningRate 0.0165   Epoch: 11   Global Step: 198190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:30,902-Speed 9314.56 samples/sec   Loss 5.2444   LearningRate 0.0165   Epoch: 11   Global Step: 198200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:31,980-Speed 9511.86 samples/sec   Loss 5.3705   LearningRate 0.0165   Epoch: 11   Global Step: 198210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:33,041-Speed 9653.56 samples/sec   Loss 5.2601   LearningRate 0.0165   Epoch: 11   Global Step: 198220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:34,151-Speed 9230.58 samples/sec   Loss 5.2975   LearningRate 0.0165   Epoch: 11   Global Step: 198230   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:32:35,213-Speed 9645.78 samples/sec   Loss 5.2393   LearningRate 0.0165   Epoch: 11   Global Step: 198240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:36,252-Speed 9865.35 samples/sec   Loss 5.4286   LearningRate 0.0165   Epoch: 11   Global Step: 198250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:37,307-Speed 9710.74 samples/sec   Loss 5.3419   LearningRate 0.0165   Epoch: 11   Global Step: 198260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:38,405-Speed 9330.55 samples/sec   Loss 5.2664   LearningRate 0.0165   Epoch: 11   Global Step: 198270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:39,525-Speed 9141.78 samples/sec   Loss 5.2498   LearningRate 0.0165   Epoch: 11   Global Step: 198280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:40,640-Speed 9193.50 samples/sec   Loss 5.2303   LearningRate 0.0165   Epoch: 11   Global Step: 198290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:41,743-Speed 9300.29 samples/sec   Loss 5.2328   LearningRate 0.0165   Epoch: 11   Global Step: 198300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:42,811-Speed 9589.16 samples/sec   Loss 5.3306   LearningRate 0.0165   Epoch: 11   Global Step: 198310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:43,900-Speed 9414.21 samples/sec   Loss 5.3683   LearningRate 0.0165   Epoch: 11   Global Step: 198320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:44,995-Speed 9355.60 samples/sec   Loss 5.2865   LearningRate 0.0165   Epoch: 11   Global Step: 198330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:46,078-Speed 9454.00 samples/sec   Loss 5.3232   LearningRate 0.0165   Epoch: 11   Global Step: 198340   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:32:47,148-Speed 9580.59 samples/sec   Loss 5.3664   LearningRate 0.0165   Epoch: 11   Global Step: 198350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:48,204-Speed 9707.16 samples/sec   Loss 5.3282   LearningRate 0.0165   Epoch: 11   Global Step: 198360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:49,278-Speed 9538.03 samples/sec   Loss 5.3644   LearningRate 0.0165   Epoch: 11   Global Step: 198370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:50,345-Speed 9603.74 samples/sec   Loss 5.1984   LearningRate 0.0165   Epoch: 11   Global Step: 198380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:51,411-Speed 9611.60 samples/sec   Loss 5.3633   LearningRate 0.0165   Epoch: 11   Global Step: 198390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:52,490-Speed 9498.86 samples/sec   Loss 5.3161   LearningRate 0.0165   Epoch: 11   Global Step: 198400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:53,541-Speed 9745.46 samples/sec   Loss 5.2972   LearningRate 0.0165   Epoch: 11   Global Step: 198410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:54,604-Speed 9641.26 samples/sec   Loss 5.1562   LearningRate 0.0165   Epoch: 11   Global Step: 198420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:55,661-Speed 9687.17 samples/sec   Loss 5.3860   LearningRate 0.0164   Epoch: 11   Global Step: 198430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:56,715-Speed 9725.46 samples/sec   Loss 5.3964   LearningRate 0.0164   Epoch: 11   Global Step: 198440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:57,777-Speed 9641.90 samples/sec   Loss 5.1871   LearningRate 0.0164   Epoch: 11   Global Step: 198450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:58,867-Speed 9406.74 samples/sec   Loss 5.3334   LearningRate 0.0164   Epoch: 11   Global Step: 198460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:32:59,919-Speed 9742.58 samples/sec   Loss 5.1726   LearningRate 0.0164   Epoch: 11   Global Step: 198470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:00,989-Speed 9575.77 samples/sec   Loss 5.3809   LearningRate 0.0164   Epoch: 11   Global Step: 198480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:33:02,061-Speed 9555.33 samples/sec   Loss 5.3412   LearningRate 0.0164   Epoch: 11   Global Step: 198490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:33:03,158-Speed 9339.49 samples/sec   Loss 5.2851   LearningRate 0.0164   Epoch: 11   Global Step: 198500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:33:04,253-Speed 9358.69 samples/sec   Loss 5.3778   LearningRate 0.0164   Epoch: 11   Global Step: 198510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:33:05,373-Speed 9147.28 samples/sec   Loss 5.2820   LearningRate 0.0164   Epoch: 11   Global Step: 198520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:33:06,469-Speed 9352.38 samples/sec   Loss 5.3205   LearningRate 0.0164   Epoch: 11   Global Step: 198530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:33:07,546-Speed 9515.43 samples/sec   Loss 5.3080   LearningRate 0.0164   Epoch: 11   Global Step: 198540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:33:08,603-Speed 9687.59 samples/sec   Loss 5.2708   LearningRate 0.0164   Epoch: 11   Global Step: 198550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:33:09,721-Speed 9169.47 samples/sec   Loss 5.3159   LearningRate 0.0164   Epoch: 11   Global Step: 198560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:33:10,810-Speed 9405.25 samples/sec   Loss 5.4019   LearningRate 0.0164   Epoch: 11   Global Step: 198570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:33:11,883-Speed 9555.08 samples/sec   Loss 5.2668   LearningRate 0.0164   Epoch: 11   Global Step: 198580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:12,957-Speed 9533.44 samples/sec   Loss 5.3346   LearningRate 0.0164   Epoch: 11   Global Step: 198590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:14,040-Speed 9461.82 samples/sec   Loss 5.2358   LearningRate 0.0164   Epoch: 11   Global Step: 198600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:15,116-Speed 9525.83 samples/sec   Loss 5.3701   LearningRate 0.0164   Epoch: 11   Global Step: 198610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:16,206-Speed 9399.93 samples/sec   Loss 5.1620   LearningRate 0.0164   Epoch: 11   Global Step: 198620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:17,320-Speed 9197.50 samples/sec   Loss 5.3142   LearningRate 0.0164   Epoch: 11   Global Step: 198630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:18,451-Speed 9062.55 samples/sec   Loss 5.3006   LearningRate 0.0164   Epoch: 11   Global Step: 198640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:19,569-Speed 9159.67 samples/sec   Loss 5.2598   LearningRate 0.0164   Epoch: 11   Global Step: 198650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:20,644-Speed 9534.96 samples/sec   Loss 5.3434   LearningRate 0.0164   Epoch: 11   Global Step: 198660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:21,708-Speed 9634.62 samples/sec   Loss 5.3752   LearningRate 0.0164   Epoch: 11   Global Step: 198670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:22,759-Speed 9745.63 samples/sec   Loss 5.3339   LearningRate 0.0164   Epoch: 11   Global Step: 198680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:23,824-Speed 9615.25 samples/sec   Loss 5.3169   LearningRate 0.0164   Epoch: 11   Global Step: 198690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:24,945-Speed 9141.36 samples/sec   Loss 5.2978   LearningRate 0.0164   Epoch: 11   Global Step: 198700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:26,025-Speed 9491.40 samples/sec   Loss 5.3001   LearningRate 0.0164   Epoch: 11   Global Step: 198710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:27,073-Speed 9770.97 samples/sec   Loss 5.3486   LearningRate 0.0164   Epoch: 11   Global Step: 198720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:28,189-Speed 9181.07 samples/sec   Loss 5.3241   LearningRate 0.0164   Epoch: 11   Global Step: 198730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:29,330-Speed 8980.24 samples/sec   Loss 5.2713   LearningRate 0.0164   Epoch: 11   Global Step: 198740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:30,391-Speed 9660.78 samples/sec   Loss 5.3575   LearningRate 0.0164   Epoch: 11   Global Step: 198750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:31,441-Speed 9760.27 samples/sec   Loss 5.3344   LearningRate 0.0164   Epoch: 11   Global Step: 198760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:32,516-Speed 9528.21 samples/sec   Loss 5.2831   LearningRate 0.0164   Epoch: 11   Global Step: 198770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:33,547-Speed 9937.78 samples/sec   Loss 5.2972   LearningRate 0.0164   Epoch: 11   Global Step: 198780   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:33:34,628-Speed 9479.42 samples/sec   Loss 5.3326   LearningRate 0.0164   Epoch: 11   Global Step: 198790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:35,752-Speed 9111.64 samples/sec   Loss 5.2956   LearningRate 0.0164   Epoch: 11   Global Step: 198800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:36,821-Speed 9590.73 samples/sec   Loss 5.2911   LearningRate 0.0164   Epoch: 11   Global Step: 198810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:37,874-Speed 9731.88 samples/sec   Loss 5.3130   LearningRate 0.0164   Epoch: 11   Global Step: 198820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:38,957-Speed 9461.15 samples/sec   Loss 5.3500   LearningRate 0.0164   Epoch: 11   Global Step: 198830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:40,034-Speed 9517.41 samples/sec   Loss 5.4738   LearningRate 0.0163   Epoch: 11   Global Step: 198840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:41,149-Speed 9199.56 samples/sec   Loss 5.3582   LearningRate 0.0163   Epoch: 11   Global Step: 198850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:42,241-Speed 9381.24 samples/sec   Loss 5.3546   LearningRate 0.0163   Epoch: 11   Global Step: 198860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:43,322-Speed 9477.21 samples/sec   Loss 5.2930   LearningRate 0.0163   Epoch: 11   Global Step: 198870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:44,423-Speed 9305.09 samples/sec   Loss 5.3204   LearningRate 0.0163   Epoch: 11   Global Step: 198880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:45,497-Speed 9543.82 samples/sec   Loss 5.3849   LearningRate 0.0163   Epoch: 11   Global Step: 198890   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:33:46,600-Speed 9287.01 samples/sec   Loss 5.3358   LearningRate 0.0163   Epoch: 11   Global Step: 198900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:47,709-Speed 9243.66 samples/sec   Loss 5.3412   LearningRate 0.0163   Epoch: 11   Global Step: 198910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:48,858-Speed 8910.51 samples/sec   Loss 5.2647   LearningRate 0.0163   Epoch: 11   Global Step: 198920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:49,936-Speed 9506.62 samples/sec   Loss 5.4347   LearningRate 0.0163   Epoch: 11   Global Step: 198930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:50,996-Speed 9664.30 samples/sec   Loss 5.2619   LearningRate 0.0163   Epoch: 11   Global Step: 198940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:52,106-Speed 9235.46 samples/sec   Loss 5.3031   LearningRate 0.0163   Epoch: 11   Global Step: 198950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:53,159-Speed 9722.85 samples/sec   Loss 5.2732   LearningRate 0.0163   Epoch: 11   Global Step: 198960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:54,216-Speed 9698.79 samples/sec   Loss 5.2942   LearningRate 0.0163   Epoch: 11   Global Step: 198970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:55,305-Speed 9408.18 samples/sec   Loss 5.3063   LearningRate 0.0163   Epoch: 11   Global Step: 198980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:56,370-Speed 9615.25 samples/sec   Loss 5.2323   LearningRate 0.0163   Epoch: 11   Global Step: 198990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:57,426-Speed 9711.33 samples/sec   Loss 5.2675   LearningRate 0.0163   Epoch: 11   Global Step: 199000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:58,469-Speed 9824.09 samples/sec   Loss 5.3144   LearningRate 0.0163   Epoch: 11   Global Step: 199010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:33:59,599-Speed 9072.60 samples/sec   Loss 5.3021   LearningRate 0.0163   Epoch: 11   Global Step: 199020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:00,660-Speed 9657.03 samples/sec   Loss 5.3152   LearningRate 0.0163   Epoch: 11   Global Step: 199030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:01,768-Speed 9245.40 samples/sec   Loss 5.2455   LearningRate 0.0163   Epoch: 11   Global Step: 199040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:02,863-Speed 9358.39 samples/sec   Loss 5.1508   LearningRate 0.0163   Epoch: 11   Global Step: 199050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:03,996-Speed 9045.29 samples/sec   Loss 5.3595   LearningRate 0.0163   Epoch: 11   Global Step: 199060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:05,065-Speed 9585.46 samples/sec   Loss 5.2901   LearningRate 0.0163   Epoch: 11   Global Step: 199070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:06,158-Speed 9367.08 samples/sec   Loss 5.2896   LearningRate 0.0163   Epoch: 11   Global Step: 199080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:07,221-Speed 9638.43 samples/sec   Loss 5.3450   LearningRate 0.0163   Epoch: 11   Global Step: 199090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:08,320-Speed 9323.61 samples/sec   Loss 5.3568   LearningRate 0.0163   Epoch: 11   Global Step: 199100   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:34:09,392-Speed 9562.95 samples/sec   Loss 5.2393   LearningRate 0.0163   Epoch: 11   Global Step: 199110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:10,453-Speed 9652.24 samples/sec   Loss 5.3109   LearningRate 0.0163   Epoch: 11   Global Step: 199120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:11,526-Speed 9550.22 samples/sec   Loss 5.3097   LearningRate 0.0163   Epoch: 11   Global Step: 199130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:12,626-Speed 9314.33 samples/sec   Loss 5.2617   LearningRate 0.0163   Epoch: 11   Global Step: 199140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:13,767-Speed 8978.11 samples/sec   Loss 5.2701   LearningRate 0.0163   Epoch: 11   Global Step: 199150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:14,887-Speed 9154.71 samples/sec   Loss 5.2593   LearningRate 0.0163   Epoch: 11   Global Step: 199160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:15,966-Speed 9497.90 samples/sec   Loss 5.3474   LearningRate 0.0163   Epoch: 11   Global Step: 199170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:17,034-Speed 9591.52 samples/sec   Loss 5.2747   LearningRate 0.0163   Epoch: 11   Global Step: 199180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:18,113-Speed 9497.58 samples/sec   Loss 5.3300   LearningRate 0.0163   Epoch: 11   Global Step: 199190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:19,223-Speed 9228.15 samples/sec   Loss 5.3640   LearningRate 0.0163   Epoch: 11   Global Step: 199200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:20,311-Speed 9419.97 samples/sec   Loss 5.2681   LearningRate 0.0163   Epoch: 11   Global Step: 199210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:21,374-Speed 9640.68 samples/sec   Loss 5.3392   LearningRate 0.0163   Epoch: 11   Global Step: 199220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:22,406-Speed 9926.29 samples/sec   Loss 5.3017   LearningRate 0.0163   Epoch: 11   Global Step: 199230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:23,457-Speed 9750.93 samples/sec   Loss 5.3302   LearningRate 0.0163   Epoch: 11   Global Step: 199240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:24,539-Speed 9473.99 samples/sec   Loss 5.2815   LearningRate 0.0163   Epoch: 11   Global Step: 199250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:25,609-Speed 9568.44 samples/sec   Loss 5.4222   LearningRate 0.0162   Epoch: 11   Global Step: 199260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:26,692-Speed 9463.92 samples/sec   Loss 5.4165   LearningRate 0.0162   Epoch: 11   Global Step: 199270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:27,825-Speed 9040.46 samples/sec   Loss 5.3729   LearningRate 0.0162   Epoch: 11   Global Step: 199280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:28,886-Speed 9659.71 samples/sec   Loss 5.3859   LearningRate 0.0162   Epoch: 11   Global Step: 199290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:29,959-Speed 9547.09 samples/sec   Loss 5.2788   LearningRate 0.0162   Epoch: 11   Global Step: 199300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:31,024-Speed 9619.58 samples/sec   Loss 5.2905   LearningRate 0.0162   Epoch: 11   Global Step: 199310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:32,103-Speed 9492.33 samples/sec   Loss 5.3146   LearningRate 0.0162   Epoch: 11   Global Step: 199320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:33,180-Speed 9515.60 samples/sec   Loss 5.3226   LearningRate 0.0162   Epoch: 11   Global Step: 199330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:34,300-Speed 9149.83 samples/sec   Loss 5.2694   LearningRate 0.0162   Epoch: 11   Global Step: 199340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:35,385-Speed 9446.33 samples/sec   Loss 5.3173   LearningRate 0.0162   Epoch: 11   Global Step: 199350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:36,489-Speed 9280.91 samples/sec   Loss 5.3596   LearningRate 0.0162   Epoch: 11   Global Step: 199360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:37,572-Speed 9468.11 samples/sec   Loss 5.3419   LearningRate 0.0162   Epoch: 11   Global Step: 199370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:38,638-Speed 9608.08 samples/sec   Loss 5.2920   LearningRate 0.0162   Epoch: 11   Global Step: 199380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:39,704-Speed 9611.16 samples/sec   Loss 5.3135   LearningRate 0.0162   Epoch: 11   Global Step: 199390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:40,778-Speed 9534.51 samples/sec   Loss 5.3623   LearningRate 0.0162   Epoch: 11   Global Step: 199400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:41,863-Speed 9448.75 samples/sec   Loss 5.2926   LearningRate 0.0162   Epoch: 11   Global Step: 199410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:42,977-Speed 9192.23 samples/sec   Loss 5.2184   LearningRate 0.0162   Epoch: 11   Global Step: 199420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:44,036-Speed 9679.74 samples/sec   Loss 5.3812   LearningRate 0.0162   Epoch: 11   Global Step: 199430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:45,112-Speed 9520.33 samples/sec   Loss 5.2832   LearningRate 0.0162   Epoch: 11   Global Step: 199440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:46,236-Speed 9117.79 samples/sec   Loss 5.2716   LearningRate 0.0162   Epoch: 11   Global Step: 199450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:47,272-Speed 9887.62 samples/sec   Loss 5.3786   LearningRate 0.0162   Epoch: 11   Global Step: 199460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:48,379-Speed 9253.67 samples/sec   Loss 5.3260   LearningRate 0.0162   Epoch: 11   Global Step: 199470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:49,504-Speed 9113.12 samples/sec   Loss 5.3559   LearningRate 0.0162   Epoch: 11   Global Step: 199480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:50,621-Speed 9165.16 samples/sec   Loss 5.3361   LearningRate 0.0162   Epoch: 11   Global Step: 199490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:51,695-Speed 9546.45 samples/sec   Loss 5.2827   LearningRate 0.0162   Epoch: 11   Global Step: 199500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:52,821-Speed 9099.87 samples/sec   Loss 5.3541   LearningRate 0.0162   Epoch: 11   Global Step: 199510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:53,980-Speed 8847.87 samples/sec   Loss 5.3100   LearningRate 0.0162   Epoch: 11   Global Step: 199520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:55,073-Speed 9369.05 samples/sec   Loss 5.3055   LearningRate 0.0162   Epoch: 11   Global Step: 199530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:56,165-Speed 9383.74 samples/sec   Loss 5.2743   LearningRate 0.0162   Epoch: 11   Global Step: 199540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:57,251-Speed 9439.34 samples/sec   Loss 5.2409   LearningRate 0.0162   Epoch: 11   Global Step: 199550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:34:58,370-Speed 9149.60 samples/sec   Loss 5.3519   LearningRate 0.0162   Epoch: 11   Global Step: 199560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:34:59,468-Speed 9334.95 samples/sec   Loss 5.3174   LearningRate 0.0162   Epoch: 11   Global Step: 199570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:00,543-Speed 9529.96 samples/sec   Loss 5.2137   LearningRate 0.0162   Epoch: 11   Global Step: 199580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:01,611-Speed 9590.42 samples/sec   Loss 5.2725   LearningRate 0.0162   Epoch: 11   Global Step: 199590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:02,714-Speed 9287.82 samples/sec   Loss 5.3329   LearningRate 0.0162   Epoch: 11   Global Step: 199600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:03,800-Speed 9434.70 samples/sec   Loss 5.3028   LearningRate 0.0162   Epoch: 11   Global Step: 199610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:04,874-Speed 9540.35 samples/sec   Loss 5.2798   LearningRate 0.0162   Epoch: 11   Global Step: 199620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:05,967-Speed 9381.04 samples/sec   Loss 5.3040   LearningRate 0.0162   Epoch: 11   Global Step: 199630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:07,070-Speed 9285.24 samples/sec   Loss 5.3552   LearningRate 0.0162   Epoch: 11   Global Step: 199640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:08,179-Speed 9241.97 samples/sec   Loss 5.4156   LearningRate 0.0162   Epoch: 11   Global Step: 199650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:09,283-Speed 9276.92 samples/sec   Loss 5.2814   LearningRate 0.0162   Epoch: 11   Global Step: 199660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:10,383-Speed 9317.43 samples/sec   Loss 5.3512   LearningRate 0.0161   Epoch: 11   Global Step: 199670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:11,516-Speed 9043.88 samples/sec   Loss 5.3093   LearningRate 0.0161   Epoch: 11   Global Step: 199680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:12,623-Speed 9260.69 samples/sec   Loss 5.2085   LearningRate 0.0161   Epoch: 11   Global Step: 199690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:13,683-Speed 9663.28 samples/sec   Loss 5.3209   LearningRate 0.0161   Epoch: 11   Global Step: 199700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:14,806-Speed 9124.70 samples/sec   Loss 5.3375   LearningRate 0.0161   Epoch: 11   Global Step: 199710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:15,901-Speed 9359.90 samples/sec   Loss 5.3182   LearningRate 0.0161   Epoch: 11   Global Step: 199720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:17,032-Speed 9058.39 samples/sec   Loss 5.3858   LearningRate 0.0161   Epoch: 11   Global Step: 199730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:18,099-Speed 9601.62 samples/sec   Loss 5.2814   LearningRate 0.0161   Epoch: 11   Global Step: 199740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:19,166-Speed 9606.74 samples/sec   Loss 5.3933   LearningRate 0.0161   Epoch: 11   Global Step: 199750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:20,251-Speed 9442.68 samples/sec   Loss 5.1923   LearningRate 0.0161   Epoch: 11   Global Step: 199760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:21,348-Speed 9341.83 samples/sec   Loss 5.4207   LearningRate 0.0161   Epoch: 11   Global Step: 199770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:22,472-Speed 9113.44 samples/sec   Loss 5.2858   LearningRate 0.0161   Epoch: 11   Global Step: 199780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:23,528-Speed 9706.94 samples/sec   Loss 5.3074   LearningRate 0.0161   Epoch: 11   Global Step: 199790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:35:24,607-Speed 9496.81 samples/sec   Loss 5.2302   LearningRate 0.0161   Epoch: 11   Global Step: 199800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:25,726-Speed 9154.62 samples/sec   Loss 5.3190   LearningRate 0.0161   Epoch: 11   Global Step: 199810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:26,819-Speed 9367.72 samples/sec   Loss 5.3685   LearningRate 0.0161   Epoch: 11   Global Step: 199820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:27,933-Speed 9197.65 samples/sec   Loss 5.3212   LearningRate 0.0161   Epoch: 11   Global Step: 199830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:29,038-Speed 9279.72 samples/sec   Loss 5.4038   LearningRate 0.0161   Epoch: 11   Global Step: 199840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:30,147-Speed 9231.41 samples/sec   Loss 5.4289   LearningRate 0.0161   Epoch: 11   Global Step: 199850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:31,208-Speed 9663.74 samples/sec   Loss 5.2619   LearningRate 0.0161   Epoch: 11   Global Step: 199860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:32,305-Speed 9339.76 samples/sec   Loss 5.2421   LearningRate 0.0161   Epoch: 11   Global Step: 199870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:33,403-Speed 9337.90 samples/sec   Loss 5.2463   LearningRate 0.0161   Epoch: 11   Global Step: 199880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:34,455-Speed 9742.15 samples/sec   Loss 5.2011   LearningRate 0.0161   Epoch: 11   Global Step: 199890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:35,522-Speed 9596.02 samples/sec   Loss 5.1276   LearningRate 0.0161   Epoch: 11   Global Step: 199900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:36,582-Speed 9670.38 samples/sec   Loss 5.2622   LearningRate 0.0161   Epoch: 11   Global Step: 199910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:37,641-Speed 9673.91 samples/sec   Loss 5.2622   LearningRate 0.0161   Epoch: 11   Global Step: 199920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:38,791-Speed 8909.84 samples/sec   Loss 5.3144   LearningRate 0.0161   Epoch: 11   Global Step: 199930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:39,876-Speed 9445.16 samples/sec   Loss 5.3433   LearningRate 0.0161   Epoch: 11   Global Step: 199940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:40,960-Speed 9449.66 samples/sec   Loss 5.2989   LearningRate 0.0161   Epoch: 11   Global Step: 199950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:42,044-Speed 9450.59 samples/sec   Loss 5.3021   LearningRate 0.0161   Epoch: 11   Global Step: 199960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:43,189-Speed 8953.10 samples/sec   Loss 5.3182   LearningRate 0.0161   Epoch: 11   Global Step: 199970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:44,293-Speed 9282.88 samples/sec   Loss 5.3473   LearningRate 0.0161   Epoch: 11   Global Step: 199980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:45,399-Speed 9261.33 samples/sec   Loss 5.3355   LearningRate 0.0161   Epoch: 11   Global Step: 199990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:35:46,436-Speed 9878.96 samples/sec   Loss 5.3712   LearningRate 0.0161   Epoch: 11   Global Step: 200000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:36:08,610-[lfw][200000]XNorm: 8.798848
Training: 2022-04-11 19:36:08,610-[lfw][200000]Accuracy-Flip: 0.99650+-0.00283
Training: 2022-04-11 19:36:08,611-[lfw][200000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:36:34,132-[cfp_fp][200000]XNorm: 7.570795
Training: 2022-04-11 19:36:34,133-[cfp_fp][200000]Accuracy-Flip: 0.96771+-0.00858
Training: 2022-04-11 19:36:34,133-[cfp_fp][200000]Accuracy-Highest: 0.96771
Training: 2022-04-11 19:36:56,129-[agedb_30][200000]XNorm: 8.513522
Training: 2022-04-11 19:36:56,130-[agedb_30][200000]Accuracy-Flip: 0.96467+-0.00971
Training: 2022-04-11 19:36:56,131-[agedb_30][200000]Accuracy-Highest: 0.96983
Training: 2022-04-11 19:36:57,221-Speed 144.67 samples/sec   Loss 5.3801   LearningRate 0.0161   Epoch: 11   Global Step: 200010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:36:58,352-Speed 9059.76 samples/sec   Loss 5.3245   LearningRate 0.0161   Epoch: 11   Global Step: 200020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:36:59,432-Speed 9481.19 samples/sec   Loss 5.3992   LearningRate 0.0161   Epoch: 11   Global Step: 200030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:00,517-Speed 9450.56 samples/sec   Loss 5.3688   LearningRate 0.0161   Epoch: 11   Global Step: 200040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:01,594-Speed 9506.48 samples/sec   Loss 5.2778   LearningRate 0.0161   Epoch: 11   Global Step: 200050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:02,681-Speed 9432.07 samples/sec   Loss 5.2911   LearningRate 0.0161   Epoch: 11   Global Step: 200060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:37:03,790-Speed 9235.00 samples/sec   Loss 5.1641   LearningRate 0.0161   Epoch: 11   Global Step: 200070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:37:04,886-Speed 9351.96 samples/sec   Loss 5.2919   LearningRate 0.0161   Epoch: 11   Global Step: 200080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:37:05,963-Speed 9510.69 samples/sec   Loss 5.3231   LearningRate 0.0160   Epoch: 11   Global Step: 200090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:37:07,061-Speed 9332.37 samples/sec   Loss 5.3277   LearningRate 0.0160   Epoch: 11   Global Step: 200100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:37:08,152-Speed 9391.95 samples/sec   Loss 5.3080   LearningRate 0.0160   Epoch: 11   Global Step: 200110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:37:09,231-Speed 9494.56 samples/sec   Loss 5.2699   LearningRate 0.0160   Epoch: 11   Global Step: 200120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:37:10,317-Speed 9433.99 samples/sec   Loss 5.3509   LearningRate 0.0160   Epoch: 11   Global Step: 200130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:37:11,424-Speed 9257.73 samples/sec   Loss 5.2632   LearningRate 0.0160   Epoch: 11   Global Step: 200140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:37:12,493-Speed 9586.27 samples/sec   Loss 5.2691   LearningRate 0.0160   Epoch: 11   Global Step: 200150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:37:13,611-Speed 9164.55 samples/sec   Loss 5.3178   LearningRate 0.0160   Epoch: 11   Global Step: 200160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:14,696-Speed 9437.34 samples/sec   Loss 5.4581   LearningRate 0.0160   Epoch: 11   Global Step: 200170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:15,773-Speed 9514.74 samples/sec   Loss 5.3763   LearningRate 0.0160   Epoch: 11   Global Step: 200180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:16,898-Speed 9110.43 samples/sec   Loss 5.3643   LearningRate 0.0160   Epoch: 11   Global Step: 200190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:17,995-Speed 9336.81 samples/sec   Loss 5.2771   LearningRate 0.0160   Epoch: 11   Global Step: 200200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:19,082-Speed 9426.61 samples/sec   Loss 5.4147   LearningRate 0.0160   Epoch: 11   Global Step: 200210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:20,127-Speed 9806.11 samples/sec   Loss 5.2634   LearningRate 0.0160   Epoch: 11   Global Step: 200220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:21,235-Speed 9247.09 samples/sec   Loss 5.1984   LearningRate 0.0160   Epoch: 11   Global Step: 200230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:22,343-Speed 9254.77 samples/sec   Loss 5.1925   LearningRate 0.0160   Epoch: 11   Global Step: 200240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:23,452-Speed 9239.52 samples/sec   Loss 5.1695   LearningRate 0.0160   Epoch: 11   Global Step: 200250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:24,537-Speed 9440.86 samples/sec   Loss 5.4040   LearningRate 0.0160   Epoch: 11   Global Step: 200260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:25,664-Speed 9091.87 samples/sec   Loss 5.2301   LearningRate 0.0160   Epoch: 11   Global Step: 200270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:26,749-Speed 9444.92 samples/sec   Loss 5.3398   LearningRate 0.0160   Epoch: 11   Global Step: 200280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:28,079-Speed 7697.90 samples/sec   Loss 5.3355   LearningRate 0.0160   Epoch: 11   Global Step: 200290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:55,540-Speed 372.91 samples/sec   Loss 4.7374   LearningRate 0.0160   Epoch: 12   Global Step: 200300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:57,379-Speed 5572.66 samples/sec   Loss 4.5564   LearningRate 0.0160   Epoch: 12   Global Step: 200310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:58,619-Speed 8265.20 samples/sec   Loss 4.5416   LearningRate 0.0160   Epoch: 12   Global Step: 200320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:37:59,930-Speed 7816.95 samples/sec   Loss 4.5539   LearningRate 0.0160   Epoch: 12   Global Step: 200330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:01,353-Speed 7203.19 samples/sec   Loss 4.5835   LearningRate 0.0160   Epoch: 12   Global Step: 200340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:02,797-Speed 7094.98 samples/sec   Loss 4.6246   LearningRate 0.0160   Epoch: 12   Global Step: 200350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:03,859-Speed 9652.97 samples/sec   Loss 4.5743   LearningRate 0.0160   Epoch: 12   Global Step: 200360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:04,950-Speed 9387.76 samples/sec   Loss 4.5719   LearningRate 0.0160   Epoch: 12   Global Step: 200370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:06,015-Speed 9624.98 samples/sec   Loss 4.5592   LearningRate 0.0160   Epoch: 12   Global Step: 200380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:07,256-Speed 8258.69 samples/sec   Loss 4.5833   LearningRate 0.0160   Epoch: 12   Global Step: 200390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:08,380-Speed 9116.11 samples/sec   Loss 4.5994   LearningRate 0.0160   Epoch: 12   Global Step: 200400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:09,487-Speed 9255.49 samples/sec   Loss 4.6441   LearningRate 0.0160   Epoch: 12   Global Step: 200410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:10,581-Speed 9369.77 samples/sec   Loss 4.5565   LearningRate 0.0160   Epoch: 12   Global Step: 200420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:11,716-Speed 9031.62 samples/sec   Loss 4.5087   LearningRate 0.0160   Epoch: 12   Global Step: 200430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:12,828-Speed 9218.64 samples/sec   Loss 4.5497   LearningRate 0.0160   Epoch: 12   Global Step: 200440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:13,941-Speed 9201.21 samples/sec   Loss 4.6402   LearningRate 0.0160   Epoch: 12   Global Step: 200450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:15,032-Speed 9397.30 samples/sec   Loss 4.5633   LearningRate 0.0160   Epoch: 12   Global Step: 200460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:16,108-Speed 9532.82 samples/sec   Loss 4.5854   LearningRate 0.0160   Epoch: 12   Global Step: 200470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:17,254-Speed 8938.59 samples/sec   Loss 4.5104   LearningRate 0.0160   Epoch: 12   Global Step: 200480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:18,403-Speed 8918.78 samples/sec   Loss 4.5589   LearningRate 0.0160   Epoch: 12   Global Step: 200490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:19,526-Speed 9125.93 samples/sec   Loss 4.5186   LearningRate 0.0160   Epoch: 12   Global Step: 200500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:20,590-Speed 9629.94 samples/sec   Loss 4.6101   LearningRate 0.0159   Epoch: 12   Global Step: 200510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:21,686-Speed 9347.31 samples/sec   Loss 4.6141   LearningRate 0.0159   Epoch: 12   Global Step: 200520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:22,800-Speed 9202.50 samples/sec   Loss 4.6651   LearningRate 0.0159   Epoch: 12   Global Step: 200530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:23,919-Speed 9156.90 samples/sec   Loss 4.6247   LearningRate 0.0159   Epoch: 12   Global Step: 200540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:25,068-Speed 8913.74 samples/sec   Loss 4.5887   LearningRate 0.0159   Epoch: 12   Global Step: 200550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:26,207-Speed 8999.48 samples/sec   Loss 4.6099   LearningRate 0.0159   Epoch: 12   Global Step: 200560   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:38:27,359-Speed 8892.71 samples/sec   Loss 4.5717   LearningRate 0.0159   Epoch: 12   Global Step: 200570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:28,441-Speed 9469.89 samples/sec   Loss 4.6042   LearningRate 0.0159   Epoch: 12   Global Step: 200580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:29,525-Speed 9455.02 samples/sec   Loss 4.6425   LearningRate 0.0159   Epoch: 12   Global Step: 200590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:30,598-Speed 9551.07 samples/sec   Loss 4.6221   LearningRate 0.0159   Epoch: 12   Global Step: 200600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:31,761-Speed 8808.96 samples/sec   Loss 4.7000   LearningRate 0.0159   Epoch: 12   Global Step: 200610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:32,862-Speed 9309.39 samples/sec   Loss 4.5734   LearningRate 0.0159   Epoch: 12   Global Step: 200620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:33,930-Speed 9599.50 samples/sec   Loss 4.5932   LearningRate 0.0159   Epoch: 12   Global Step: 200630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:35,045-Speed 9191.32 samples/sec   Loss 4.5536   LearningRate 0.0159   Epoch: 12   Global Step: 200640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:36,109-Speed 9630.84 samples/sec   Loss 4.6190   LearningRate 0.0159   Epoch: 12   Global Step: 200650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:37,257-Speed 8921.91 samples/sec   Loss 4.6208   LearningRate 0.0159   Epoch: 12   Global Step: 200660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:38,355-Speed 9333.12 samples/sec   Loss 4.5183   LearningRate 0.0159   Epoch: 12   Global Step: 200670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:39,449-Speed 9366.30 samples/sec   Loss 4.6455   LearningRate 0.0159   Epoch: 12   Global Step: 200680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:40,554-Speed 9275.31 samples/sec   Loss 4.6957   LearningRate 0.0159   Epoch: 12   Global Step: 200690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:41,683-Speed 9077.77 samples/sec   Loss 4.6222   LearningRate 0.0159   Epoch: 12   Global Step: 200700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:42,750-Speed 9599.24 samples/sec   Loss 4.6090   LearningRate 0.0159   Epoch: 12   Global Step: 200710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:44,388-Speed 6254.48 samples/sec   Loss 4.6129   LearningRate 0.0159   Epoch: 12   Global Step: 200720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:45,635-Speed 8213.09 samples/sec   Loss 4.7218   LearningRate 0.0159   Epoch: 12   Global Step: 200730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:47,074-Speed 7121.18 samples/sec   Loss 4.6718   LearningRate 0.0159   Epoch: 12   Global Step: 200740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:48,139-Speed 9613.59 samples/sec   Loss 4.6204   LearningRate 0.0159   Epoch: 12   Global Step: 200750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:49,256-Speed 9176.42 samples/sec   Loss 4.7092   LearningRate 0.0159   Epoch: 12   Global Step: 200760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:50,496-Speed 8265.95 samples/sec   Loss 4.6302   LearningRate 0.0159   Epoch: 12   Global Step: 200770   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:38:51,576-Speed 9495.79 samples/sec   Loss 4.5472   LearningRate 0.0159   Epoch: 12   Global Step: 200780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:52,699-Speed 9120.23 samples/sec   Loss 4.6720   LearningRate 0.0159   Epoch: 12   Global Step: 200790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:53,772-Speed 9551.64 samples/sec   Loss 4.6089   LearningRate 0.0159   Epoch: 12   Global Step: 200800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:54,834-Speed 9646.04 samples/sec   Loss 4.5638   LearningRate 0.0159   Epoch: 12   Global Step: 200810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:55,919-Speed 9449.39 samples/sec   Loss 4.6095   LearningRate 0.0159   Epoch: 12   Global Step: 200820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:57,002-Speed 9461.12 samples/sec   Loss 4.7544   LearningRate 0.0159   Epoch: 12   Global Step: 200830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:58,102-Speed 9308.50 samples/sec   Loss 4.5940   LearningRate 0.0159   Epoch: 12   Global Step: 200840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:38:59,223-Speed 9146.41 samples/sec   Loss 4.6085   LearningRate 0.0159   Epoch: 12   Global Step: 200850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:00,318-Speed 9357.60 samples/sec   Loss 4.6463   LearningRate 0.0159   Epoch: 12   Global Step: 200860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:01,418-Speed 9314.49 samples/sec   Loss 4.5787   LearningRate 0.0159   Epoch: 12   Global Step: 200870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:02,486-Speed 9590.66 samples/sec   Loss 4.5889   LearningRate 0.0159   Epoch: 12   Global Step: 200880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:03,586-Speed 9315.98 samples/sec   Loss 4.6748   LearningRate 0.0159   Epoch: 12   Global Step: 200890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:04,692-Speed 9264.94 samples/sec   Loss 4.7241   LearningRate 0.0159   Epoch: 12   Global Step: 200900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:05,820-Speed 9079.33 samples/sec   Loss 4.7110   LearningRate 0.0159   Epoch: 12   Global Step: 200910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:06,900-Speed 9488.52 samples/sec   Loss 4.6708   LearningRate 0.0158   Epoch: 12   Global Step: 200920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:08,173-Speed 8053.77 samples/sec   Loss 4.6737   LearningRate 0.0158   Epoch: 12   Global Step: 200930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:09,353-Speed 8684.29 samples/sec   Loss 4.6085   LearningRate 0.0158   Epoch: 12   Global Step: 200940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:10,453-Speed 9317.05 samples/sec   Loss 4.6889   LearningRate 0.0158   Epoch: 12   Global Step: 200950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:11,548-Speed 9358.53 samples/sec   Loss 4.6546   LearningRate 0.0158   Epoch: 12   Global Step: 200960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:12,617-Speed 9581.11 samples/sec   Loss 4.6997   LearningRate 0.0158   Epoch: 12   Global Step: 200970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:13,698-Speed 9481.21 samples/sec   Loss 4.6269   LearningRate 0.0158   Epoch: 12   Global Step: 200980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:14,794-Speed 9350.86 samples/sec   Loss 4.6516   LearningRate 0.0158   Epoch: 12   Global Step: 200990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:15,882-Speed 9410.85 samples/sec   Loss 4.6037   LearningRate 0.0158   Epoch: 12   Global Step: 201000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:17,016-Speed 9038.83 samples/sec   Loss 4.6309   LearningRate 0.0158   Epoch: 12   Global Step: 201010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:18,139-Speed 9126.51 samples/sec   Loss 4.6821   LearningRate 0.0158   Epoch: 12   Global Step: 201020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:19,247-Speed 9239.26 samples/sec   Loss 4.6666   LearningRate 0.0158   Epoch: 12   Global Step: 201030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:20,323-Speed 9528.56 samples/sec   Loss 4.6333   LearningRate 0.0158   Epoch: 12   Global Step: 201040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:21,409-Speed 9434.75 samples/sec   Loss 4.6683   LearningRate 0.0158   Epoch: 12   Global Step: 201050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:22,498-Speed 9409.41 samples/sec   Loss 4.6383   LearningRate 0.0158   Epoch: 12   Global Step: 201060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:23,590-Speed 9380.16 samples/sec   Loss 4.5913   LearningRate 0.0158   Epoch: 12   Global Step: 201070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:24,684-Speed 9366.24 samples/sec   Loss 4.6217   LearningRate 0.0158   Epoch: 12   Global Step: 201080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:25,778-Speed 9361.94 samples/sec   Loss 4.6320   LearningRate 0.0158   Epoch: 12   Global Step: 201090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:26,960-Speed 8670.07 samples/sec   Loss 4.7373   LearningRate 0.0158   Epoch: 12   Global Step: 201100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:28,085-Speed 9107.81 samples/sec   Loss 4.6236   LearningRate 0.0158   Epoch: 12   Global Step: 201110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:29,234-Speed 8921.21 samples/sec   Loss 4.6000   LearningRate 0.0158   Epoch: 12   Global Step: 201120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:30,330-Speed 9348.81 samples/sec   Loss 4.7455   LearningRate 0.0158   Epoch: 12   Global Step: 201130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:31,426-Speed 9351.88 samples/sec   Loss 4.6155   LearningRate 0.0158   Epoch: 12   Global Step: 201140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:32,515-Speed 9409.18 samples/sec   Loss 4.7721   LearningRate 0.0158   Epoch: 12   Global Step: 201150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:33,603-Speed 9422.20 samples/sec   Loss 4.6712   LearningRate 0.0158   Epoch: 12   Global Step: 201160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:34,723-Speed 9150.03 samples/sec   Loss 4.7304   LearningRate 0.0158   Epoch: 12   Global Step: 201170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:35,831-Speed 9245.43 samples/sec   Loss 4.7479   LearningRate 0.0158   Epoch: 12   Global Step: 201180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:36,971-Speed 8993.21 samples/sec   Loss 4.8029   LearningRate 0.0158   Epoch: 12   Global Step: 201190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:38,062-Speed 9395.93 samples/sec   Loss 4.7155   LearningRate 0.0158   Epoch: 12   Global Step: 201200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:39,216-Speed 8875.12 samples/sec   Loss 4.6468   LearningRate 0.0158   Epoch: 12   Global Step: 201210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:40,330-Speed 9199.12 samples/sec   Loss 4.7041   LearningRate 0.0158   Epoch: 12   Global Step: 201220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:41,433-Speed 9295.62 samples/sec   Loss 4.7263   LearningRate 0.0158   Epoch: 12   Global Step: 201230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:42,550-Speed 9176.69 samples/sec   Loss 4.7032   LearningRate 0.0158   Epoch: 12   Global Step: 201240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:43,654-Speed 9277.30 samples/sec   Loss 4.6617   LearningRate 0.0158   Epoch: 12   Global Step: 201250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:44,728-Speed 9543.01 samples/sec   Loss 4.7505   LearningRate 0.0158   Epoch: 12   Global Step: 201260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:45,783-Speed 9713.79 samples/sec   Loss 4.7215   LearningRate 0.0158   Epoch: 12   Global Step: 201270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:46,892-Speed 9237.61 samples/sec   Loss 4.6893   LearningRate 0.0158   Epoch: 12   Global Step: 201280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:47,954-Speed 9652.43 samples/sec   Loss 4.6691   LearningRate 0.0158   Epoch: 12   Global Step: 201290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:39:49,057-Speed 9282.64 samples/sec   Loss 4.6966   LearningRate 0.0158   Epoch: 12   Global Step: 201300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:50,206-Speed 8929.19 samples/sec   Loss 4.6344   LearningRate 0.0158   Epoch: 12   Global Step: 201310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:51,295-Speed 9408.32 samples/sec   Loss 4.7172   LearningRate 0.0158   Epoch: 12   Global Step: 201320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:52,421-Speed 9098.23 samples/sec   Loss 4.8109   LearningRate 0.0158   Epoch: 12   Global Step: 201330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:53,520-Speed 9322.85 samples/sec   Loss 4.6645   LearningRate 0.0157   Epoch: 12   Global Step: 201340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:54,594-Speed 9545.75 samples/sec   Loss 4.6932   LearningRate 0.0157   Epoch: 12   Global Step: 201350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:55,714-Speed 9142.68 samples/sec   Loss 4.7488   LearningRate 0.0157   Epoch: 12   Global Step: 201360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:56,838-Speed 9119.24 samples/sec   Loss 4.7987   LearningRate 0.0157   Epoch: 12   Global Step: 201370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:57,951-Speed 9201.65 samples/sec   Loss 4.6559   LearningRate 0.0157   Epoch: 12   Global Step: 201380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:39:59,093-Speed 8975.00 samples/sec   Loss 4.7138   LearningRate 0.0157   Epoch: 12   Global Step: 201390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:00,189-Speed 9348.64 samples/sec   Loss 4.7178   LearningRate 0.0157   Epoch: 12   Global Step: 201400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:01,258-Speed 9588.17 samples/sec   Loss 4.7511   LearningRate 0.0157   Epoch: 12   Global Step: 201410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:02,332-Speed 9535.77 samples/sec   Loss 4.7195   LearningRate 0.0157   Epoch: 12   Global Step: 201420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:40:03,423-Speed 9390.17 samples/sec   Loss 4.6947   LearningRate 0.0157   Epoch: 12   Global Step: 201430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:40:04,498-Speed 9536.05 samples/sec   Loss 4.7451   LearningRate 0.0157   Epoch: 12   Global Step: 201440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:40:05,584-Speed 9432.30 samples/sec   Loss 4.7321   LearningRate 0.0157   Epoch: 12   Global Step: 201450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:40:06,706-Speed 9127.76 samples/sec   Loss 4.7380   LearningRate 0.0157   Epoch: 12   Global Step: 201460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:40:07,765-Speed 9686.07 samples/sec   Loss 4.7006   LearningRate 0.0157   Epoch: 12   Global Step: 201470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:40:08,888-Speed 9125.57 samples/sec   Loss 4.6844   LearningRate 0.0157   Epoch: 12   Global Step: 201480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:40:10,010-Speed 9132.90 samples/sec   Loss 4.6774   LearningRate 0.0157   Epoch: 12   Global Step: 201490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:40:11,058-Speed 9783.72 samples/sec   Loss 4.7153   LearningRate 0.0157   Epoch: 12   Global Step: 201500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:40:12,118-Speed 9667.53 samples/sec   Loss 4.7678   LearningRate 0.0157   Epoch: 12   Global Step: 201510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:40:13,201-Speed 9452.80 samples/sec   Loss 4.7484   LearningRate 0.0157   Epoch: 12   Global Step: 201520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:14,307-Speed 9266.53 samples/sec   Loss 4.7237   LearningRate 0.0157   Epoch: 12   Global Step: 201530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:15,568-Speed 8127.71 samples/sec   Loss 4.6994   LearningRate 0.0157   Epoch: 12   Global Step: 201540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:16,679-Speed 9222.61 samples/sec   Loss 4.8221   LearningRate 0.0157   Epoch: 12   Global Step: 201550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:17,802-Speed 9118.42 samples/sec   Loss 4.6015   LearningRate 0.0157   Epoch: 12   Global Step: 201560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:18,929-Speed 9095.17 samples/sec   Loss 4.8317   LearningRate 0.0157   Epoch: 12   Global Step: 201570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:20,045-Speed 9188.64 samples/sec   Loss 4.6217   LearningRate 0.0157   Epoch: 12   Global Step: 201580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:21,220-Speed 8715.87 samples/sec   Loss 4.7397   LearningRate 0.0157   Epoch: 12   Global Step: 201590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:22,303-Speed 9464.05 samples/sec   Loss 4.6670   LearningRate 0.0157   Epoch: 12   Global Step: 201600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:23,678-Speed 7452.03 samples/sec   Loss 4.7307   LearningRate 0.0157   Epoch: 12   Global Step: 201610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:24,729-Speed 9755.59 samples/sec   Loss 4.8012   LearningRate 0.0157   Epoch: 12   Global Step: 201620   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:40:25,803-Speed 9538.20 samples/sec   Loss 4.7188   LearningRate 0.0157   Epoch: 12   Global Step: 201630   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:40:26,870-Speed 9601.82 samples/sec   Loss 4.7714   LearningRate 0.0157   Epoch: 12   Global Step: 201640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:28,057-Speed 8635.24 samples/sec   Loss 4.7055   LearningRate 0.0157   Epoch: 12   Global Step: 201650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:29,187-Speed 9062.32 samples/sec   Loss 4.6806   LearningRate 0.0157   Epoch: 12   Global Step: 201660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:30,315-Speed 9089.25 samples/sec   Loss 4.7446   LearningRate 0.0157   Epoch: 12   Global Step: 201670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:31,438-Speed 9127.87 samples/sec   Loss 4.7051   LearningRate 0.0157   Epoch: 12   Global Step: 201680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:32,589-Speed 8895.23 samples/sec   Loss 4.6981   LearningRate 0.0157   Epoch: 12   Global Step: 201690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:33,742-Speed 8889.63 samples/sec   Loss 4.7220   LearningRate 0.0157   Epoch: 12   Global Step: 201700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:34,872-Speed 9070.91 samples/sec   Loss 4.7558   LearningRate 0.0157   Epoch: 12   Global Step: 201710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:35,974-Speed 9292.04 samples/sec   Loss 4.7939   LearningRate 0.0157   Epoch: 12   Global Step: 201720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:37,068-Speed 9367.42 samples/sec   Loss 4.9498   LearningRate 0.0157   Epoch: 12   Global Step: 201730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:38,150-Speed 9478.06 samples/sec   Loss 4.8290   LearningRate 0.0157   Epoch: 12   Global Step: 201740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:39,226-Speed 9523.15 samples/sec   Loss 4.7227   LearningRate 0.0157   Epoch: 12   Global Step: 201750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:40,324-Speed 9336.30 samples/sec   Loss 4.8509   LearningRate 0.0157   Epoch: 12   Global Step: 201760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:41,417-Speed 9366.94 samples/sec   Loss 4.6612   LearningRate 0.0156   Epoch: 12   Global Step: 201770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:42,482-Speed 9627.46 samples/sec   Loss 4.7413   LearningRate 0.0156   Epoch: 12   Global Step: 201780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:43,604-Speed 9136.93 samples/sec   Loss 4.6815   LearningRate 0.0156   Epoch: 12   Global Step: 201790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:44,689-Speed 9442.15 samples/sec   Loss 4.7541   LearningRate 0.0156   Epoch: 12   Global Step: 201800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:45,777-Speed 9419.58 samples/sec   Loss 4.7560   LearningRate 0.0156   Epoch: 12   Global Step: 201810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:46,910-Speed 9048.91 samples/sec   Loss 4.7273   LearningRate 0.0156   Epoch: 12   Global Step: 201820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:47,984-Speed 9539.51 samples/sec   Loss 4.7866   LearningRate 0.0156   Epoch: 12   Global Step: 201830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:49,119-Speed 9027.54 samples/sec   Loss 4.7080   LearningRate 0.0156   Epoch: 12   Global Step: 201840   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:40:50,185-Speed 9614.78 samples/sec   Loss 4.7650   LearningRate 0.0156   Epoch: 12   Global Step: 201850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:51,270-Speed 9445.18 samples/sec   Loss 4.6597   LearningRate 0.0156   Epoch: 12   Global Step: 201860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:52,441-Speed 8744.20 samples/sec   Loss 4.7923   LearningRate 0.0156   Epoch: 12   Global Step: 201870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:53,543-Speed 9294.94 samples/sec   Loss 4.7906   LearningRate 0.0156   Epoch: 12   Global Step: 201880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:54,625-Speed 9471.18 samples/sec   Loss 4.7885   LearningRate 0.0156   Epoch: 12   Global Step: 201890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:55,715-Speed 9406.09 samples/sec   Loss 4.7960   LearningRate 0.0156   Epoch: 12   Global Step: 201900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:56,859-Speed 8952.39 samples/sec   Loss 4.7686   LearningRate 0.0156   Epoch: 12   Global Step: 201910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:58,054-Speed 8579.66 samples/sec   Loss 4.7912   LearningRate 0.0156   Epoch: 12   Global Step: 201920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:40:59,149-Speed 9357.87 samples/sec   Loss 4.8263   LearningRate 0.0156   Epoch: 12   Global Step: 201930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:41:00,242-Speed 9373.04 samples/sec   Loss 4.7970   LearningRate 0.0156   Epoch: 12   Global Step: 201940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:41:01,410-Speed 8780.71 samples/sec   Loss 4.7284   LearningRate 0.0156   Epoch: 12   Global Step: 201950   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:41:02,486-Speed 9517.59 samples/sec   Loss 4.6900   LearningRate 0.0156   Epoch: 12   Global Step: 201960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:41:03,571-Speed 9443.16 samples/sec   Loss 4.7523   LearningRate 0.0156   Epoch: 12   Global Step: 201970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:41:04,690-Speed 9266.86 samples/sec   Loss 4.7835   LearningRate 0.0156   Epoch: 12   Global Step: 201980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:41:05,753-Speed 9648.85 samples/sec   Loss 4.8489   LearningRate 0.0156   Epoch: 12   Global Step: 201990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:41:06,812-Speed 9672.94 samples/sec   Loss 4.8108   LearningRate 0.0156   Epoch: 12   Global Step: 202000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:41:28,659-[lfw][202000]XNorm: 8.694646
Training: 2022-04-11 19:41:28,660-[lfw][202000]Accuracy-Flip: 0.99617+-0.00325
Training: 2022-04-11 19:41:28,660-[lfw][202000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:41:53,851-[cfp_fp][202000]XNorm: 7.466227
Training: 2022-04-11 19:41:53,852-[cfp_fp][202000]Accuracy-Flip: 0.96171+-0.00921
Training: 2022-04-11 19:41:53,852-[cfp_fp][202000]Accuracy-Highest: 0.96771
Training: 2022-04-11 19:42:15,575-[agedb_30][202000]XNorm: 8.526358
Training: 2022-04-11 19:42:15,576-[agedb_30][202000]Accuracy-Flip: 0.96667+-0.01057
Training: 2022-04-11 19:42:15,576-[agedb_30][202000]Accuracy-Highest: 0.96983
Training: 2022-04-11 19:42:16,702-Speed 146.52 samples/sec   Loss 4.8032   LearningRate 0.0156   Epoch: 12   Global Step: 202010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:17,771-Speed 9588.82 samples/sec   Loss 4.7937   LearningRate 0.0156   Epoch: 12   Global Step: 202020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:18,883-Speed 9213.74 samples/sec   Loss 4.7664   LearningRate 0.0156   Epoch: 12   Global Step: 202030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:19,970-Speed 9421.15 samples/sec   Loss 4.7829   LearningRate 0.0156   Epoch: 12   Global Step: 202040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:21,072-Speed 9306.40 samples/sec   Loss 4.8075   LearningRate 0.0156   Epoch: 12   Global Step: 202050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:22,151-Speed 9494.63 samples/sec   Loss 4.7980   LearningRate 0.0156   Epoch: 12   Global Step: 202060   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:42:23,196-Speed 9805.32 samples/sec   Loss 4.8458   LearningRate 0.0156   Epoch: 12   Global Step: 202070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:24,258-Speed 9648.92 samples/sec   Loss 4.7667   LearningRate 0.0156   Epoch: 12   Global Step: 202080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:25,381-Speed 9124.62 samples/sec   Loss 4.7678   LearningRate 0.0156   Epoch: 12   Global Step: 202090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:26,468-Speed 9432.55 samples/sec   Loss 4.7590   LearningRate 0.0156   Epoch: 12   Global Step: 202100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:27,526-Speed 9676.79 samples/sec   Loss 4.8679   LearningRate 0.0156   Epoch: 12   Global Step: 202110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:28,599-Speed 9551.43 samples/sec   Loss 4.8647   LearningRate 0.0156   Epoch: 12   Global Step: 202120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:29,696-Speed 9432.48 samples/sec   Loss 4.7708   LearningRate 0.0156   Epoch: 12   Global Step: 202130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:30,807-Speed 9222.49 samples/sec   Loss 4.8198   LearningRate 0.0156   Epoch: 12   Global Step: 202140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:31,880-Speed 9547.16 samples/sec   Loss 4.7478   LearningRate 0.0156   Epoch: 12   Global Step: 202150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:32,994-Speed 9200.39 samples/sec   Loss 4.8309   LearningRate 0.0156   Epoch: 12   Global Step: 202160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:34,044-Speed 9762.04 samples/sec   Loss 4.8555   LearningRate 0.0156   Epoch: 12   Global Step: 202170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:35,112-Speed 9591.15 samples/sec   Loss 4.9010   LearningRate 0.0156   Epoch: 12   Global Step: 202180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:36,192-Speed 9491.33 samples/sec   Loss 4.7993   LearningRate 0.0155   Epoch: 12   Global Step: 202190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:37,298-Speed 9260.41 samples/sec   Loss 4.8646   LearningRate 0.0155   Epoch: 12   Global Step: 202200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:38,389-Speed 9399.76 samples/sec   Loss 4.7794   LearningRate 0.0155   Epoch: 12   Global Step: 202210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:42:39,469-Speed 9485.18 samples/sec   Loss 4.7964   LearningRate 0.0155   Epoch: 12   Global Step: 202220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:42:40,524-Speed 9809.96 samples/sec   Loss 4.8162   LearningRate 0.0155   Epoch: 12   Global Step: 202230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:42:41,660-Speed 9021.88 samples/sec   Loss 4.7864   LearningRate 0.0155   Epoch: 12   Global Step: 202240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:42:42,789-Speed 9084.17 samples/sec   Loss 4.8079   LearningRate 0.0155   Epoch: 12   Global Step: 202250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:42:43,874-Speed 9442.12 samples/sec   Loss 4.8354   LearningRate 0.0155   Epoch: 12   Global Step: 202260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:42:44,998-Speed 9117.90 samples/sec   Loss 4.8028   LearningRate 0.0155   Epoch: 12   Global Step: 202270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:42:46,104-Speed 9268.07 samples/sec   Loss 4.8128   LearningRate 0.0155   Epoch: 12   Global Step: 202280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:42:47,186-Speed 9463.70 samples/sec   Loss 4.7273   LearningRate 0.0155   Epoch: 12   Global Step: 202290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:42:48,322-Speed 9019.27 samples/sec   Loss 4.7645   LearningRate 0.0155   Epoch: 12   Global Step: 202300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:42:49,454-Speed 9049.55 samples/sec   Loss 4.8754   LearningRate 0.0155   Epoch: 12   Global Step: 202310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:50,575-Speed 9142.23 samples/sec   Loss 4.8590   LearningRate 0.0155   Epoch: 12   Global Step: 202320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:51,629-Speed 9726.92 samples/sec   Loss 4.8013   LearningRate 0.0155   Epoch: 12   Global Step: 202330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:52,795-Speed 8787.32 samples/sec   Loss 4.7356   LearningRate 0.0155   Epoch: 12   Global Step: 202340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:53,896-Speed 9302.94 samples/sec   Loss 4.7071   LearningRate 0.0155   Epoch: 12   Global Step: 202350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:55,123-Speed 8421.31 samples/sec   Loss 4.7868   LearningRate 0.0155   Epoch: 12   Global Step: 202360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:56,206-Speed 9466.07 samples/sec   Loss 4.8425   LearningRate 0.0155   Epoch: 12   Global Step: 202370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:57,306-Speed 9311.34 samples/sec   Loss 4.9268   LearningRate 0.0155   Epoch: 12   Global Step: 202380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:58,365-Speed 9673.64 samples/sec   Loss 4.7438   LearningRate 0.0155   Epoch: 12   Global Step: 202390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:42:59,452-Speed 9431.63 samples/sec   Loss 4.8873   LearningRate 0.0155   Epoch: 12   Global Step: 202400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:00,570-Speed 9161.49 samples/sec   Loss 4.8678   LearningRate 0.0155   Epoch: 12   Global Step: 202410   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:43:01,648-Speed 9511.49 samples/sec   Loss 4.8007   LearningRate 0.0155   Epoch: 12   Global Step: 202420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:02,738-Speed 9399.92 samples/sec   Loss 4.7941   LearningRate 0.0155   Epoch: 12   Global Step: 202430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:03,904-Speed 8781.65 samples/sec   Loss 4.7029   LearningRate 0.0155   Epoch: 12   Global Step: 202440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:05,039-Speed 9032.49 samples/sec   Loss 4.9281   LearningRate 0.0155   Epoch: 12   Global Step: 202450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:06,130-Speed 9400.64 samples/sec   Loss 4.7992   LearningRate 0.0155   Epoch: 12   Global Step: 202460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:07,228-Speed 9331.48 samples/sec   Loss 4.7963   LearningRate 0.0155   Epoch: 12   Global Step: 202470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:08,340-Speed 9212.81 samples/sec   Loss 4.7945   LearningRate 0.0155   Epoch: 12   Global Step: 202480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:09,443-Speed 9291.99 samples/sec   Loss 4.7324   LearningRate 0.0155   Epoch: 12   Global Step: 202490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:10,527-Speed 9452.49 samples/sec   Loss 4.8590   LearningRate 0.0155   Epoch: 12   Global Step: 202500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:11,624-Speed 9344.14 samples/sec   Loss 4.6881   LearningRate 0.0155   Epoch: 12   Global Step: 202510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:12,735-Speed 9217.14 samples/sec   Loss 4.8251   LearningRate 0.0155   Epoch: 12   Global Step: 202520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:13,825-Speed 9405.07 samples/sec   Loss 4.8514   LearningRate 0.0155   Epoch: 12   Global Step: 202530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:14,898-Speed 9540.77 samples/sec   Loss 4.8638   LearningRate 0.0155   Epoch: 12   Global Step: 202540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:15,979-Speed 9484.42 samples/sec   Loss 4.7645   LearningRate 0.0155   Epoch: 12   Global Step: 202550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:17,064-Speed 9446.81 samples/sec   Loss 4.8886   LearningRate 0.0155   Epoch: 12   Global Step: 202560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:18,151-Speed 9426.03 samples/sec   Loss 4.8209   LearningRate 0.0155   Epoch: 12   Global Step: 202570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:19,298-Speed 8932.73 samples/sec   Loss 4.9314   LearningRate 0.0155   Epoch: 12   Global Step: 202580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:20,404-Speed 9264.33 samples/sec   Loss 4.8517   LearningRate 0.0155   Epoch: 12   Global Step: 202590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:21,478-Speed 9548.09 samples/sec   Loss 4.8624   LearningRate 0.0155   Epoch: 12   Global Step: 202600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:22,528-Speed 9752.00 samples/sec   Loss 4.8266   LearningRate 0.0154   Epoch: 12   Global Step: 202610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:23,664-Speed 9022.52 samples/sec   Loss 4.8596   LearningRate 0.0154   Epoch: 12   Global Step: 202620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:24,784-Speed 9147.43 samples/sec   Loss 4.7793   LearningRate 0.0154   Epoch: 12   Global Step: 202630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:25,885-Speed 9310.64 samples/sec   Loss 4.7575   LearningRate 0.0154   Epoch: 12   Global Step: 202640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:26,996-Speed 9220.64 samples/sec   Loss 4.7749   LearningRate 0.0154   Epoch: 12   Global Step: 202650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:28,096-Speed 9308.89 samples/sec   Loss 4.8390   LearningRate 0.0154   Epoch: 12   Global Step: 202660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:29,227-Speed 9065.49 samples/sec   Loss 4.8379   LearningRate 0.0154   Epoch: 12   Global Step: 202670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:30,285-Speed 9688.69 samples/sec   Loss 4.8921   LearningRate 0.0154   Epoch: 12   Global Step: 202680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:31,355-Speed 9571.28 samples/sec   Loss 4.9915   LearningRate 0.0154   Epoch: 12   Global Step: 202690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:32,462-Speed 9251.80 samples/sec   Loss 4.8696   LearningRate 0.0154   Epoch: 12   Global Step: 202700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:33,571-Speed 9246.03 samples/sec   Loss 4.8563   LearningRate 0.0154   Epoch: 12   Global Step: 202710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:34,648-Speed 9515.43 samples/sec   Loss 4.9142   LearningRate 0.0154   Epoch: 12   Global Step: 202720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:35,700-Speed 9735.12 samples/sec   Loss 4.7741   LearningRate 0.0154   Epoch: 12   Global Step: 202730   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:43:36,854-Speed 8881.46 samples/sec   Loss 4.8518   LearningRate 0.0154   Epoch: 12   Global Step: 202740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:37,948-Speed 9367.29 samples/sec   Loss 4.8394   LearningRate 0.0154   Epoch: 12   Global Step: 202750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:39,040-Speed 9379.25 samples/sec   Loss 4.8320   LearningRate 0.0154   Epoch: 12   Global Step: 202760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:40,153-Speed 9209.18 samples/sec   Loss 4.8192   LearningRate 0.0154   Epoch: 12   Global Step: 202770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:41,276-Speed 9121.93 samples/sec   Loss 4.8442   LearningRate 0.0154   Epoch: 12   Global Step: 202780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:42,409-Speed 9044.37 samples/sec   Loss 4.8589   LearningRate 0.0154   Epoch: 12   Global Step: 202790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:43,507-Speed 9335.44 samples/sec   Loss 4.9015   LearningRate 0.0154   Epoch: 12   Global Step: 202800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:44,617-Speed 9228.69 samples/sec   Loss 4.7847   LearningRate 0.0154   Epoch: 12   Global Step: 202810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:45,692-Speed 9532.56 samples/sec   Loss 4.8952   LearningRate 0.0154   Epoch: 12   Global Step: 202820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:46,808-Speed 9186.37 samples/sec   Loss 4.7451   LearningRate 0.0154   Epoch: 12   Global Step: 202830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:47,885-Speed 9511.01 samples/sec   Loss 4.8482   LearningRate 0.0154   Epoch: 12   Global Step: 202840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:48,983-Speed 9337.88 samples/sec   Loss 4.8322   LearningRate 0.0154   Epoch: 12   Global Step: 202850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:50,091-Speed 9244.65 samples/sec   Loss 4.8483   LearningRate 0.0154   Epoch: 12   Global Step: 202860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:51,167-Speed 9523.45 samples/sec   Loss 4.8532   LearningRate 0.0154   Epoch: 12   Global Step: 202870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:52,274-Speed 9258.57 samples/sec   Loss 4.7868   LearningRate 0.0154   Epoch: 12   Global Step: 202880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:53,388-Speed 9196.17 samples/sec   Loss 4.8993   LearningRate 0.0154   Epoch: 12   Global Step: 202890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:43:54,449-Speed 9653.10 samples/sec   Loss 4.8388   LearningRate 0.0154   Epoch: 12   Global Step: 202900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:55,585-Speed 9026.27 samples/sec   Loss 4.9031   LearningRate 0.0154   Epoch: 12   Global Step: 202910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:56,694-Speed 9234.77 samples/sec   Loss 4.8695   LearningRate 0.0154   Epoch: 12   Global Step: 202920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:57,797-Speed 9288.93 samples/sec   Loss 4.9587   LearningRate 0.0154   Epoch: 12   Global Step: 202930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:58,885-Speed 9420.31 samples/sec   Loss 4.8972   LearningRate 0.0154   Epoch: 12   Global Step: 202940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:43:59,990-Speed 9273.70 samples/sec   Loss 4.8749   LearningRate 0.0154   Epoch: 12   Global Step: 202950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:01,084-Speed 9373.63 samples/sec   Loss 4.8107   LearningRate 0.0154   Epoch: 12   Global Step: 202960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:02,214-Speed 9061.99 samples/sec   Loss 4.8767   LearningRate 0.0154   Epoch: 12   Global Step: 202970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:03,346-Speed 9055.32 samples/sec   Loss 4.8780   LearningRate 0.0154   Epoch: 12   Global Step: 202980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:04,472-Speed 9097.98 samples/sec   Loss 4.9200   LearningRate 0.0154   Epoch: 12   Global Step: 202990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:05,537-Speed 9616.14 samples/sec   Loss 4.8523   LearningRate 0.0154   Epoch: 12   Global Step: 203000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:06,665-Speed 9084.07 samples/sec   Loss 4.8679   LearningRate 0.0154   Epoch: 12   Global Step: 203010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:07,808-Speed 8968.32 samples/sec   Loss 4.9163   LearningRate 0.0154   Epoch: 12   Global Step: 203020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:08,952-Speed 8950.67 samples/sec   Loss 4.9404   LearningRate 0.0154   Epoch: 12   Global Step: 203030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:10,104-Speed 8895.72 samples/sec   Loss 4.8629   LearningRate 0.0153   Epoch: 12   Global Step: 203040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:11,195-Speed 9401.33 samples/sec   Loss 4.7625   LearningRate 0.0153   Epoch: 12   Global Step: 203050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:12,278-Speed 9463.33 samples/sec   Loss 4.8863   LearningRate 0.0153   Epoch: 12   Global Step: 203060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:13,400-Speed 9128.37 samples/sec   Loss 4.8672   LearningRate 0.0153   Epoch: 12   Global Step: 203070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:14,490-Speed 9402.28 samples/sec   Loss 4.8073   LearningRate 0.0153   Epoch: 12   Global Step: 203080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:15,619-Speed 9073.50 samples/sec   Loss 4.8968   LearningRate 0.0153   Epoch: 12   Global Step: 203090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:16,707-Speed 9411.99 samples/sec   Loss 4.8118   LearningRate 0.0153   Epoch: 12   Global Step: 203100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:17,797-Speed 9406.60 samples/sec   Loss 4.8875   LearningRate 0.0153   Epoch: 12   Global Step: 203110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:18,850-Speed 9727.80 samples/sec   Loss 4.8234   LearningRate 0.0153   Epoch: 12   Global Step: 203120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:19,949-Speed 9322.66 samples/sec   Loss 4.8910   LearningRate 0.0153   Epoch: 12   Global Step: 203130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:21,069-Speed 9152.10 samples/sec   Loss 4.8882   LearningRate 0.0153   Epoch: 12   Global Step: 203140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:22,234-Speed 8793.68 samples/sec   Loss 4.7985   LearningRate 0.0153   Epoch: 12   Global Step: 203150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:23,309-Speed 9534.17 samples/sec   Loss 4.8037   LearningRate 0.0153   Epoch: 12   Global Step: 203160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:24,363-Speed 9716.52 samples/sec   Loss 4.8514   LearningRate 0.0153   Epoch: 12   Global Step: 203170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:25,448-Speed 9440.86 samples/sec   Loss 4.8687   LearningRate 0.0153   Epoch: 12   Global Step: 203180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:26,533-Speed 9446.45 samples/sec   Loss 4.9061   LearningRate 0.0153   Epoch: 12   Global Step: 203190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:27,603-Speed 9573.23 samples/sec   Loss 5.0033   LearningRate 0.0153   Epoch: 12   Global Step: 203200   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:44:28,683-Speed 9489.13 samples/sec   Loss 4.9000   LearningRate 0.0153   Epoch: 12   Global Step: 203210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:29,786-Speed 9293.73 samples/sec   Loss 4.8567   LearningRate 0.0153   Epoch: 12   Global Step: 203220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:30,853-Speed 9608.90 samples/sec   Loss 4.9474   LearningRate 0.0153   Epoch: 12   Global Step: 203230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:31,923-Speed 9572.03 samples/sec   Loss 4.9169   LearningRate 0.0153   Epoch: 12   Global Step: 203240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:33,040-Speed 9174.75 samples/sec   Loss 4.9164   LearningRate 0.0153   Epoch: 12   Global Step: 203250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:34,153-Speed 9205.70 samples/sec   Loss 4.8371   LearningRate 0.0153   Epoch: 12   Global Step: 203260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:35,227-Speed 9546.84 samples/sec   Loss 4.9584   LearningRate 0.0153   Epoch: 12   Global Step: 203270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:36,305-Speed 9507.89 samples/sec   Loss 4.8522   LearningRate 0.0153   Epoch: 12   Global Step: 203280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:37,375-Speed 9569.00 samples/sec   Loss 4.9127   LearningRate 0.0153   Epoch: 12   Global Step: 203290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:38,445-Speed 9573.81 samples/sec   Loss 4.9089   LearningRate 0.0153   Epoch: 12   Global Step: 203300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:39,547-Speed 9303.00 samples/sec   Loss 4.8043   LearningRate 0.0153   Epoch: 12   Global Step: 203310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:40,636-Speed 9410.51 samples/sec   Loss 4.9145   LearningRate 0.0153   Epoch: 12   Global Step: 203320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:41,738-Speed 9296.08 samples/sec   Loss 4.8442   LearningRate 0.0153   Epoch: 12   Global Step: 203330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:42,846-Speed 9247.85 samples/sec   Loss 4.8060   LearningRate 0.0153   Epoch: 12   Global Step: 203340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:43,969-Speed 9123.42 samples/sec   Loss 4.9674   LearningRate 0.0153   Epoch: 12   Global Step: 203350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:45,083-Speed 9198.87 samples/sec   Loss 4.8853   LearningRate 0.0153   Epoch: 12   Global Step: 203360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:46,217-Speed 9038.13 samples/sec   Loss 4.8575   LearningRate 0.0153   Epoch: 12   Global Step: 203370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:47,318-Speed 9309.70 samples/sec   Loss 4.9158   LearningRate 0.0153   Epoch: 12   Global Step: 203380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:48,441-Speed 9127.66 samples/sec   Loss 4.8996   LearningRate 0.0153   Epoch: 12   Global Step: 203390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:49,541-Speed 9311.20 samples/sec   Loss 4.9220   LearningRate 0.0153   Epoch: 12   Global Step: 203400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:50,648-Speed 9258.74 samples/sec   Loss 4.9666   LearningRate 0.0153   Epoch: 12   Global Step: 203410   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:44:51,717-Speed 9583.89 samples/sec   Loss 4.9228   LearningRate 0.0153   Epoch: 12   Global Step: 203420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:52,807-Speed 9398.58 samples/sec   Loss 4.9709   LearningRate 0.0153   Epoch: 12   Global Step: 203430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:53,930-Speed 9121.44 samples/sec   Loss 4.8648   LearningRate 0.0153   Epoch: 12   Global Step: 203440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:55,001-Speed 9572.40 samples/sec   Loss 4.7941   LearningRate 0.0153   Epoch: 12   Global Step: 203450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:56,091-Speed 9400.18 samples/sec   Loss 4.8552   LearningRate 0.0152   Epoch: 12   Global Step: 203460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:57,177-Speed 9426.60 samples/sec   Loss 4.8806   LearningRate 0.0152   Epoch: 12   Global Step: 203470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:58,275-Speed 9338.16 samples/sec   Loss 4.8433   LearningRate 0.0152   Epoch: 12   Global Step: 203480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:44:59,374-Speed 9315.06 samples/sec   Loss 4.8643   LearningRate 0.0152   Epoch: 12   Global Step: 203490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:00,482-Speed 9254.25 samples/sec   Loss 4.8484   LearningRate 0.0152   Epoch: 12   Global Step: 203500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:01,577-Speed 9356.62 samples/sec   Loss 4.7817   LearningRate 0.0152   Epoch: 12   Global Step: 203510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:02,683-Speed 9267.91 samples/sec   Loss 4.9685   LearningRate 0.0152   Epoch: 12   Global Step: 203520   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:45:03,748-Speed 9620.41 samples/sec   Loss 4.9130   LearningRate 0.0152   Epoch: 12   Global Step: 203530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:04,901-Speed 8883.33 samples/sec   Loss 4.8854   LearningRate 0.0152   Epoch: 12   Global Step: 203540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:06,009-Speed 9246.43 samples/sec   Loss 4.9692   LearningRate 0.0152   Epoch: 12   Global Step: 203550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:07,105-Speed 9349.29 samples/sec   Loss 4.8539   LearningRate 0.0152   Epoch: 12   Global Step: 203560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:08,184-Speed 9494.54 samples/sec   Loss 4.8990   LearningRate 0.0152   Epoch: 12   Global Step: 203570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:09,276-Speed 9384.17 samples/sec   Loss 4.9022   LearningRate 0.0152   Epoch: 12   Global Step: 203580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:10,341-Speed 9616.81 samples/sec   Loss 4.7821   LearningRate 0.0152   Epoch: 12   Global Step: 203590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:11,413-Speed 9566.62 samples/sec   Loss 4.9348   LearningRate 0.0152   Epoch: 12   Global Step: 203600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:12,520-Speed 9249.72 samples/sec   Loss 4.8660   LearningRate 0.0152   Epoch: 12   Global Step: 203610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:13,658-Speed 9006.28 samples/sec   Loss 4.9392   LearningRate 0.0152   Epoch: 12   Global Step: 203620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:14,731-Speed 9544.61 samples/sec   Loss 4.9962   LearningRate 0.0152   Epoch: 12   Global Step: 203630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:15,799-Speed 9597.36 samples/sec   Loss 4.9211   LearningRate 0.0152   Epoch: 12   Global Step: 203640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:16,867-Speed 9591.83 samples/sec   Loss 4.8693   LearningRate 0.0152   Epoch: 12   Global Step: 203650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:17,999-Speed 9058.10 samples/sec   Loss 4.8984   LearningRate 0.0152   Epoch: 12   Global Step: 203660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:19,128-Speed 9071.70 samples/sec   Loss 4.8934   LearningRate 0.0152   Epoch: 12   Global Step: 203670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:20,259-Speed 9061.67 samples/sec   Loss 4.9896   LearningRate 0.0152   Epoch: 12   Global Step: 203680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:21,357-Speed 9333.33 samples/sec   Loss 4.8900   LearningRate 0.0152   Epoch: 12   Global Step: 203690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:22,473-Speed 9177.17 samples/sec   Loss 4.9034   LearningRate 0.0152   Epoch: 12   Global Step: 203700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:23,599-Speed 9101.15 samples/sec   Loss 4.9323   LearningRate 0.0152   Epoch: 12   Global Step: 203710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:24,712-Speed 9205.99 samples/sec   Loss 4.8029   LearningRate 0.0152   Epoch: 12   Global Step: 203720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:25,804-Speed 9385.02 samples/sec   Loss 4.9692   LearningRate 0.0152   Epoch: 12   Global Step: 203730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:26,861-Speed 9690.50 samples/sec   Loss 4.8480   LearningRate 0.0152   Epoch: 12   Global Step: 203740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:27,943-Speed 9473.50 samples/sec   Loss 4.9266   LearningRate 0.0152   Epoch: 12   Global Step: 203750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:29,060-Speed 9166.34 samples/sec   Loss 4.8818   LearningRate 0.0152   Epoch: 12   Global Step: 203760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:30,130-Speed 9587.31 samples/sec   Loss 4.9374   LearningRate 0.0152   Epoch: 12   Global Step: 203770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:31,183-Speed 9724.81 samples/sec   Loss 4.9497   LearningRate 0.0152   Epoch: 12   Global Step: 203780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:32,264-Speed 9480.07 samples/sec   Loss 4.9231   LearningRate 0.0152   Epoch: 12   Global Step: 203790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:33,384-Speed 9147.03 samples/sec   Loss 4.8339   LearningRate 0.0152   Epoch: 12   Global Step: 203800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:34,457-Speed 9553.56 samples/sec   Loss 4.9022   LearningRate 0.0152   Epoch: 12   Global Step: 203810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:35,540-Speed 9454.91 samples/sec   Loss 4.9225   LearningRate 0.0152   Epoch: 12   Global Step: 203820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:36,607-Speed 9604.12 samples/sec   Loss 4.9581   LearningRate 0.0152   Epoch: 12   Global Step: 203830   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:45:37,723-Speed 9177.68 samples/sec   Loss 4.9142   LearningRate 0.0152   Epoch: 12   Global Step: 203840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:38,804-Speed 9483.02 samples/sec   Loss 4.9883   LearningRate 0.0152   Epoch: 12   Global Step: 203850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:39,879-Speed 9535.28 samples/sec   Loss 4.9581   LearningRate 0.0152   Epoch: 12   Global Step: 203860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:40,973-Speed 9361.95 samples/sec   Loss 4.8584   LearningRate 0.0152   Epoch: 12   Global Step: 203870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:42,047-Speed 9548.37 samples/sec   Loss 4.9626   LearningRate 0.0152   Epoch: 12   Global Step: 203880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:43,146-Speed 9319.83 samples/sec   Loss 4.9046   LearningRate 0.0151   Epoch: 12   Global Step: 203890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:44,229-Speed 9463.16 samples/sec   Loss 4.9390   LearningRate 0.0151   Epoch: 12   Global Step: 203900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:45,300-Speed 9566.40 samples/sec   Loss 4.9226   LearningRate 0.0151   Epoch: 12   Global Step: 203910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:46,397-Speed 9334.85 samples/sec   Loss 4.8569   LearningRate 0.0151   Epoch: 12   Global Step: 203920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:47,533-Speed 9023.95 samples/sec   Loss 4.8864   LearningRate 0.0151   Epoch: 12   Global Step: 203930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:48,605-Speed 9560.61 samples/sec   Loss 4.8530   LearningRate 0.0151   Epoch: 12   Global Step: 203940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:49,660-Speed 9710.75 samples/sec   Loss 4.9283   LearningRate 0.0151   Epoch: 12   Global Step: 203950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:50,733-Speed 9554.15 samples/sec   Loss 4.9262   LearningRate 0.0151   Epoch: 12   Global Step: 203960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:51,808-Speed 9528.78 samples/sec   Loss 4.8248   LearningRate 0.0151   Epoch: 12   Global Step: 203970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:52,883-Speed 9529.75 samples/sec   Loss 4.8714   LearningRate 0.0151   Epoch: 12   Global Step: 203980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:54,003-Speed 9155.24 samples/sec   Loss 4.9544   LearningRate 0.0151   Epoch: 12   Global Step: 203990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:45:55,125-Speed 9128.24 samples/sec   Loss 4.9197   LearningRate 0.0151   Epoch: 12   Global Step: 204000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:46:17,180-[lfw][204000]XNorm: 8.552353
Training: 2022-04-11 19:46:17,181-[lfw][204000]Accuracy-Flip: 0.99533+-0.00314
Training: 2022-04-11 19:46:17,181-[lfw][204000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:46:42,645-[cfp_fp][204000]XNorm: 7.345010
Training: 2022-04-11 19:46:42,645-[cfp_fp][204000]Accuracy-Flip: 0.96443+-0.00853
Training: 2022-04-11 19:46:42,646-[cfp_fp][204000]Accuracy-Highest: 0.96771
Training: 2022-04-11 19:47:04,546-[agedb_30][204000]XNorm: 8.247466
Training: 2022-04-11 19:47:04,547-[agedb_30][204000]Accuracy-Flip: 0.96950+-0.00898
Training: 2022-04-11 19:47:04,547-[agedb_30][204000]Accuracy-Highest: 0.96983
Training: 2022-04-11 19:47:05,634-Speed 145.23 samples/sec   Loss 4.8886   LearningRate 0.0151   Epoch: 12   Global Step: 204010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:06,732-Speed 9332.22 samples/sec   Loss 4.9113   LearningRate 0.0151   Epoch: 12   Global Step: 204020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:07,867-Speed 9028.22 samples/sec   Loss 4.9535   LearningRate 0.0151   Epoch: 12   Global Step: 204030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:08,945-Speed 9503.65 samples/sec   Loss 4.9898   LearningRate 0.0151   Epoch: 12   Global Step: 204040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:10,025-Speed 9485.03 samples/sec   Loss 5.0058   LearningRate 0.0151   Epoch: 12   Global Step: 204050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:11,093-Speed 9593.51 samples/sec   Loss 5.0017   LearningRate 0.0151   Epoch: 12   Global Step: 204060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:12,176-Speed 9467.70 samples/sec   Loss 4.8669   LearningRate 0.0151   Epoch: 12   Global Step: 204070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:13,227-Speed 9748.62 samples/sec   Loss 4.8928   LearningRate 0.0151   Epoch: 12   Global Step: 204080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:14,338-Speed 9220.92 samples/sec   Loss 4.8974   LearningRate 0.0151   Epoch: 12   Global Step: 204090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:15,434-Speed 9350.14 samples/sec   Loss 5.0190   LearningRate 0.0151   Epoch: 12   Global Step: 204100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:16,508-Speed 9542.48 samples/sec   Loss 4.9638   LearningRate 0.0151   Epoch: 12   Global Step: 204110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:17,570-Speed 9653.09 samples/sec   Loss 5.0591   LearningRate 0.0151   Epoch: 12   Global Step: 204120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:18,644-Speed 9539.02 samples/sec   Loss 4.9767   LearningRate 0.0151   Epoch: 12   Global Step: 204130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:19,719-Speed 9536.97 samples/sec   Loss 5.0221   LearningRate 0.0151   Epoch: 12   Global Step: 204140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:20,796-Speed 9515.81 samples/sec   Loss 4.8839   LearningRate 0.0151   Epoch: 12   Global Step: 204150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:21,873-Speed 9510.72 samples/sec   Loss 4.9770   LearningRate 0.0151   Epoch: 12   Global Step: 204160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:22,986-Speed 9210.97 samples/sec   Loss 4.9873   LearningRate 0.0151   Epoch: 12   Global Step: 204170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:24,070-Speed 9451.16 samples/sec   Loss 4.8085   LearningRate 0.0151   Epoch: 12   Global Step: 204180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:25,123-Speed 9730.00 samples/sec   Loss 4.9608   LearningRate 0.0151   Epoch: 12   Global Step: 204190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:26,184-Speed 9650.38 samples/sec   Loss 4.8283   LearningRate 0.0151   Epoch: 12   Global Step: 204200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:27,299-Speed 9195.30 samples/sec   Loss 4.9511   LearningRate 0.0151   Epoch: 12   Global Step: 204210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:28,362-Speed 9636.71 samples/sec   Loss 4.9549   LearningRate 0.0151   Epoch: 12   Global Step: 204220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:29,483-Speed 9139.04 samples/sec   Loss 4.9254   LearningRate 0.0151   Epoch: 12   Global Step: 204230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:30,570-Speed 9432.65 samples/sec   Loss 4.8518   LearningRate 0.0151   Epoch: 12   Global Step: 204240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:31,675-Speed 9269.78 samples/sec   Loss 4.9015   LearningRate 0.0151   Epoch: 12   Global Step: 204250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:32,802-Speed 9094.80 samples/sec   Loss 4.9194   LearningRate 0.0151   Epoch: 12   Global Step: 204260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:33,854-Speed 9739.59 samples/sec   Loss 4.9279   LearningRate 0.0151   Epoch: 12   Global Step: 204270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:34,967-Speed 9210.15 samples/sec   Loss 4.9231   LearningRate 0.0151   Epoch: 12   Global Step: 204280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:36,035-Speed 9594.60 samples/sec   Loss 5.0590   LearningRate 0.0151   Epoch: 12   Global Step: 204290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:37,111-Speed 9520.21 samples/sec   Loss 5.1033   LearningRate 0.0151   Epoch: 12   Global Step: 204300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:38,214-Speed 9286.54 samples/sec   Loss 4.9691   LearningRate 0.0151   Epoch: 12   Global Step: 204310   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:47:39,289-Speed 9537.50 samples/sec   Loss 4.9871   LearningRate 0.0150   Epoch: 12   Global Step: 204320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:40,397-Speed 9241.03 samples/sec   Loss 4.9889   LearningRate 0.0150   Epoch: 12   Global Step: 204330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:41,496-Speed 9329.21 samples/sec   Loss 4.9547   LearningRate 0.0150   Epoch: 12   Global Step: 204340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:42,580-Speed 9453.04 samples/sec   Loss 4.9732   LearningRate 0.0150   Epoch: 12   Global Step: 204350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:43,632-Speed 9739.47 samples/sec   Loss 4.8523   LearningRate 0.0150   Epoch: 12   Global Step: 204360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:44,779-Speed 8930.54 samples/sec   Loss 4.9389   LearningRate 0.0150   Epoch: 12   Global Step: 204370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:45,820-Speed 9841.59 samples/sec   Loss 4.7979   LearningRate 0.0150   Epoch: 12   Global Step: 204380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:46,906-Speed 9442.33 samples/sec   Loss 4.9313   LearningRate 0.0150   Epoch: 12   Global Step: 204390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:47,993-Speed 9421.06 samples/sec   Loss 4.9807   LearningRate 0.0150   Epoch: 12   Global Step: 204400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:49,062-Speed 9591.04 samples/sec   Loss 4.8770   LearningRate 0.0150   Epoch: 12   Global Step: 204410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:50,119-Speed 9695.27 samples/sec   Loss 4.8798   LearningRate 0.0150   Epoch: 12   Global Step: 204420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:51,176-Speed 9695.10 samples/sec   Loss 4.8902   LearningRate 0.0150   Epoch: 12   Global Step: 204430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:52,259-Speed 9458.17 samples/sec   Loss 5.0219   LearningRate 0.0150   Epoch: 12   Global Step: 204440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:53,406-Speed 8932.74 samples/sec   Loss 4.9241   LearningRate 0.0150   Epoch: 12   Global Step: 204450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:47:54,494-Speed 9423.84 samples/sec   Loss 5.0144   LearningRate 0.0150   Epoch: 12   Global Step: 204460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:55,607-Speed 9200.78 samples/sec   Loss 4.9081   LearningRate 0.0150   Epoch: 12   Global Step: 204470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:56,701-Speed 9367.97 samples/sec   Loss 4.9782   LearningRate 0.0150   Epoch: 12   Global Step: 204480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:57,795-Speed 9363.89 samples/sec   Loss 5.0102   LearningRate 0.0150   Epoch: 12   Global Step: 204490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:58,904-Speed 9244.11 samples/sec   Loss 4.9539   LearningRate 0.0150   Epoch: 12   Global Step: 204500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:47:59,971-Speed 9603.62 samples/sec   Loss 5.0146   LearningRate 0.0150   Epoch: 12   Global Step: 204510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:01,057-Speed 9435.60 samples/sec   Loss 4.9430   LearningRate 0.0150   Epoch: 12   Global Step: 204520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:02,137-Speed 9488.53 samples/sec   Loss 5.0838   LearningRate 0.0150   Epoch: 12   Global Step: 204530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:03,241-Speed 9282.96 samples/sec   Loss 4.9412   LearningRate 0.0150   Epoch: 12   Global Step: 204540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:04,363-Speed 9131.53 samples/sec   Loss 4.8827   LearningRate 0.0150   Epoch: 12   Global Step: 204550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:05,434-Speed 9571.93 samples/sec   Loss 4.9978   LearningRate 0.0150   Epoch: 12   Global Step: 204560   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:48:06,524-Speed 9397.28 samples/sec   Loss 4.8949   LearningRate 0.0150   Epoch: 12   Global Step: 204570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:07,647-Speed 9125.01 samples/sec   Loss 4.9385   LearningRate 0.0150   Epoch: 12   Global Step: 204580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:08,740-Speed 9372.27 samples/sec   Loss 5.0092   LearningRate 0.0150   Epoch: 12   Global Step: 204590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:09,908-Speed 8778.96 samples/sec   Loss 4.8922   LearningRate 0.0150   Epoch: 12   Global Step: 204600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:11,000-Speed 9380.39 samples/sec   Loss 4.9376   LearningRate 0.0150   Epoch: 12   Global Step: 204610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:12,080-Speed 9484.51 samples/sec   Loss 4.9507   LearningRate 0.0150   Epoch: 12   Global Step: 204620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:13,161-Speed 9483.85 samples/sec   Loss 4.9241   LearningRate 0.0150   Epoch: 12   Global Step: 204630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:14,238-Speed 9515.78 samples/sec   Loss 4.8861   LearningRate 0.0150   Epoch: 12   Global Step: 204640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:15,333-Speed 9352.31 samples/sec   Loss 4.9133   LearningRate 0.0150   Epoch: 12   Global Step: 204650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:16,388-Speed 9720.59 samples/sec   Loss 4.9268   LearningRate 0.0150   Epoch: 12   Global Step: 204660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:17,467-Speed 9490.58 samples/sec   Loss 5.0745   LearningRate 0.0150   Epoch: 12   Global Step: 204670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:18,564-Speed 9338.04 samples/sec   Loss 4.9976   LearningRate 0.0150   Epoch: 12   Global Step: 204680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:19,646-Speed 9476.11 samples/sec   Loss 4.9464   LearningRate 0.0150   Epoch: 12   Global Step: 204690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:20,729-Speed 9457.50 samples/sec   Loss 5.0118   LearningRate 0.0150   Epoch: 12   Global Step: 204700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:21,802-Speed 9552.13 samples/sec   Loss 4.9775   LearningRate 0.0150   Epoch: 12   Global Step: 204710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:22,883-Speed 9477.49 samples/sec   Loss 5.0085   LearningRate 0.0150   Epoch: 12   Global Step: 204720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:24,009-Speed 9101.04 samples/sec   Loss 4.9109   LearningRate 0.0150   Epoch: 12   Global Step: 204730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:25,100-Speed 9391.49 samples/sec   Loss 4.8804   LearningRate 0.0150   Epoch: 12   Global Step: 204740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:26,168-Speed 9598.89 samples/sec   Loss 5.0233   LearningRate 0.0149   Epoch: 12   Global Step: 204750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:27,233-Speed 9619.68 samples/sec   Loss 4.8672   LearningRate 0.0149   Epoch: 12   Global Step: 204760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:28,372-Speed 8993.99 samples/sec   Loss 4.9115   LearningRate 0.0149   Epoch: 12   Global Step: 204770   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:48:29,439-Speed 9606.27 samples/sec   Loss 4.9425   LearningRate 0.0149   Epoch: 12   Global Step: 204780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:30,509-Speed 9579.14 samples/sec   Loss 4.9391   LearningRate 0.0149   Epoch: 12   Global Step: 204790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:31,611-Speed 9295.50 samples/sec   Loss 4.8395   LearningRate 0.0149   Epoch: 12   Global Step: 204800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:32,701-Speed 9400.69 samples/sec   Loss 4.9039   LearningRate 0.0149   Epoch: 12   Global Step: 204810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:33,795-Speed 9366.53 samples/sec   Loss 5.0250   LearningRate 0.0149   Epoch: 12   Global Step: 204820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:34,912-Speed 9174.36 samples/sec   Loss 4.9871   LearningRate 0.0149   Epoch: 12   Global Step: 204830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:36,027-Speed 9193.00 samples/sec   Loss 4.8719   LearningRate 0.0149   Epoch: 12   Global Step: 204840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:37,141-Speed 9200.14 samples/sec   Loss 4.9570   LearningRate 0.0149   Epoch: 12   Global Step: 204850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:48:38,270-Speed 9077.19 samples/sec   Loss 4.9642   LearningRate 0.0149   Epoch: 12   Global Step: 204860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:48:39,330-Speed 9663.86 samples/sec   Loss 4.9508   LearningRate 0.0149   Epoch: 12   Global Step: 204870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:48:40,379-Speed 9764.40 samples/sec   Loss 4.9120   LearningRate 0.0149   Epoch: 12   Global Step: 204880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:48:41,487-Speed 9249.81 samples/sec   Loss 4.8994   LearningRate 0.0149   Epoch: 12   Global Step: 204890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:48:42,572-Speed 9438.41 samples/sec   Loss 4.9266   LearningRate 0.0149   Epoch: 12   Global Step: 204900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:48:43,665-Speed 9382.38 samples/sec   Loss 4.9552   LearningRate 0.0149   Epoch: 12   Global Step: 204910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:48:44,693-Speed 9967.79 samples/sec   Loss 5.0073   LearningRate 0.0149   Epoch: 12   Global Step: 204920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:48:45,797-Speed 9280.22 samples/sec   Loss 4.9346   LearningRate 0.0149   Epoch: 12   Global Step: 204930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:48:46,848-Speed 9749.32 samples/sec   Loss 5.0120   LearningRate 0.0149   Epoch: 12   Global Step: 204940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:48:47,919-Speed 9566.83 samples/sec   Loss 4.9800   LearningRate 0.0149   Epoch: 12   Global Step: 204950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:49,003-Speed 9449.91 samples/sec   Loss 4.9449   LearningRate 0.0149   Epoch: 12   Global Step: 204960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:50,107-Speed 9281.87 samples/sec   Loss 4.9394   LearningRate 0.0149   Epoch: 12   Global Step: 204970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:51,207-Speed 9321.80 samples/sec   Loss 4.9914   LearningRate 0.0149   Epoch: 12   Global Step: 204980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:52,285-Speed 9503.45 samples/sec   Loss 4.9435   LearningRate 0.0149   Epoch: 12   Global Step: 204990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:53,385-Speed 9321.56 samples/sec   Loss 4.9801   LearningRate 0.0149   Epoch: 12   Global Step: 205000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:54,501-Speed 9179.98 samples/sec   Loss 4.9999   LearningRate 0.0149   Epoch: 12   Global Step: 205010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:55,621-Speed 9148.14 samples/sec   Loss 4.8310   LearningRate 0.0149   Epoch: 12   Global Step: 205020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:56,748-Speed 9098.06 samples/sec   Loss 4.9292   LearningRate 0.0149   Epoch: 12   Global Step: 205030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:57,839-Speed 9389.14 samples/sec   Loss 5.0152   LearningRate 0.0149   Epoch: 12   Global Step: 205040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:48:59,018-Speed 8694.74 samples/sec   Loss 4.9590   LearningRate 0.0149   Epoch: 12   Global Step: 205050   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:49:00,082-Speed 9632.97 samples/sec   Loss 4.9916   LearningRate 0.0149   Epoch: 12   Global Step: 205060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:01,117-Speed 9892.72 samples/sec   Loss 4.9858   LearningRate 0.0149   Epoch: 12   Global Step: 205070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:02,187-Speed 9577.81 samples/sec   Loss 5.0030   LearningRate 0.0149   Epoch: 12   Global Step: 205080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:03,295-Speed 9248.12 samples/sec   Loss 5.0088   LearningRate 0.0149   Epoch: 12   Global Step: 205090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:04,355-Speed 9669.85 samples/sec   Loss 4.9342   LearningRate 0.0149   Epoch: 12   Global Step: 205100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:05,503-Speed 8928.95 samples/sec   Loss 4.9963   LearningRate 0.0149   Epoch: 12   Global Step: 205110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:06,619-Speed 9174.53 samples/sec   Loss 4.9421   LearningRate 0.0149   Epoch: 12   Global Step: 205120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:07,714-Speed 9360.32 samples/sec   Loss 4.9855   LearningRate 0.0149   Epoch: 12   Global Step: 205130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:08,822-Speed 9248.34 samples/sec   Loss 4.9763   LearningRate 0.0149   Epoch: 12   Global Step: 205140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:09,908-Speed 9431.20 samples/sec   Loss 4.9946   LearningRate 0.0149   Epoch: 12   Global Step: 205150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:10,988-Speed 9491.73 samples/sec   Loss 5.0193   LearningRate 0.0149   Epoch: 12   Global Step: 205160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:12,090-Speed 9292.27 samples/sec   Loss 4.9730   LearningRate 0.0149   Epoch: 12   Global Step: 205170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:13,182-Speed 9387.25 samples/sec   Loss 5.0449   LearningRate 0.0149   Epoch: 12   Global Step: 205180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:14,290-Speed 9245.90 samples/sec   Loss 4.9402   LearningRate 0.0148   Epoch: 12   Global Step: 205190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:15,424-Speed 9039.28 samples/sec   Loss 4.9132   LearningRate 0.0148   Epoch: 12   Global Step: 205200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:16,571-Speed 8935.26 samples/sec   Loss 5.0110   LearningRate 0.0148   Epoch: 12   Global Step: 205210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:17,672-Speed 9303.41 samples/sec   Loss 4.9721   LearningRate 0.0148   Epoch: 12   Global Step: 205220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:18,826-Speed 8880.54 samples/sec   Loss 5.1063   LearningRate 0.0148   Epoch: 12   Global Step: 205230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:19,909-Speed 9462.45 samples/sec   Loss 4.9952   LearningRate 0.0148   Epoch: 12   Global Step: 205240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:20,974-Speed 9617.72 samples/sec   Loss 5.0110   LearningRate 0.0148   Epoch: 12   Global Step: 205250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:22,062-Speed 9424.55 samples/sec   Loss 4.9605   LearningRate 0.0148   Epoch: 12   Global Step: 205260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:23,140-Speed 9505.49 samples/sec   Loss 4.9230   LearningRate 0.0148   Epoch: 12   Global Step: 205270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:24,265-Speed 9106.77 samples/sec   Loss 4.9907   LearningRate 0.0148   Epoch: 12   Global Step: 205280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:25,343-Speed 9504.88 samples/sec   Loss 4.9679   LearningRate 0.0148   Epoch: 12   Global Step: 205290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:49:26,427-Speed 9449.81 samples/sec   Loss 4.9564   LearningRate 0.0148   Epoch: 12   Global Step: 205300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:49:27,504-Speed 9515.48 samples/sec   Loss 5.0260   LearningRate 0.0148   Epoch: 12   Global Step: 205310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:49:28,592-Speed 9419.09 samples/sec   Loss 4.8772   LearningRate 0.0148   Epoch: 12   Global Step: 205320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:49:29,688-Speed 9347.30 samples/sec   Loss 4.9327   LearningRate 0.0148   Epoch: 12   Global Step: 205330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:49:30,827-Speed 8994.00 samples/sec   Loss 4.9744   LearningRate 0.0148   Epoch: 12   Global Step: 205340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:49:31,931-Speed 9280.00 samples/sec   Loss 5.0494   LearningRate 0.0148   Epoch: 12   Global Step: 205350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:49:33,027-Speed 9348.40 samples/sec   Loss 4.9086   LearningRate 0.0148   Epoch: 12   Global Step: 205360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:49:34,159-Speed 9052.52 samples/sec   Loss 5.0792   LearningRate 0.0148   Epoch: 12   Global Step: 205370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:49:35,257-Speed 9333.01 samples/sec   Loss 4.9438   LearningRate 0.0148   Epoch: 12   Global Step: 205380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:49:36,385-Speed 9081.69 samples/sec   Loss 4.9494   LearningRate 0.0148   Epoch: 12   Global Step: 205390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:37,473-Speed 9420.32 samples/sec   Loss 4.9149   LearningRate 0.0148   Epoch: 12   Global Step: 205400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:38,556-Speed 9460.88 samples/sec   Loss 4.9249   LearningRate 0.0148   Epoch: 12   Global Step: 205410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:39,674-Speed 9168.40 samples/sec   Loss 4.9164   LearningRate 0.0148   Epoch: 12   Global Step: 205420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:40,783-Speed 9242.06 samples/sec   Loss 5.0254   LearningRate 0.0148   Epoch: 12   Global Step: 205430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:41,934-Speed 8897.22 samples/sec   Loss 5.0616   LearningRate 0.0148   Epoch: 12   Global Step: 205440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:43,025-Speed 9387.66 samples/sec   Loss 5.0051   LearningRate 0.0148   Epoch: 12   Global Step: 205450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:44,108-Speed 9458.77 samples/sec   Loss 4.9207   LearningRate 0.0148   Epoch: 12   Global Step: 205460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:45,208-Speed 9315.80 samples/sec   Loss 4.9718   LearningRate 0.0148   Epoch: 12   Global Step: 205470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:46,347-Speed 9000.47 samples/sec   Loss 4.8872   LearningRate 0.0148   Epoch: 12   Global Step: 205480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:47,464-Speed 9177.96 samples/sec   Loss 4.9893   LearningRate 0.0148   Epoch: 12   Global Step: 205490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:48,581-Speed 9169.35 samples/sec   Loss 4.9505   LearningRate 0.0148   Epoch: 12   Global Step: 205500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:49,682-Speed 9309.11 samples/sec   Loss 5.0116   LearningRate 0.0148   Epoch: 12   Global Step: 205510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:50,793-Speed 9221.03 samples/sec   Loss 4.9689   LearningRate 0.0148   Epoch: 12   Global Step: 205520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:51,851-Speed 9682.72 samples/sec   Loss 4.8841   LearningRate 0.0148   Epoch: 12   Global Step: 205530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:52,964-Speed 9205.41 samples/sec   Loss 4.9900   LearningRate 0.0148   Epoch: 12   Global Step: 205540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:54,076-Speed 9225.68 samples/sec   Loss 5.0566   LearningRate 0.0148   Epoch: 12   Global Step: 205550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:55,169-Speed 9372.30 samples/sec   Loss 5.0673   LearningRate 0.0148   Epoch: 12   Global Step: 205560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:56,290-Speed 9137.68 samples/sec   Loss 4.9909   LearningRate 0.0148   Epoch: 12   Global Step: 205570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:57,372-Speed 9469.65 samples/sec   Loss 4.9428   LearningRate 0.0148   Epoch: 12   Global Step: 205580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:49:58,465-Speed 9369.80 samples/sec   Loss 4.9404   LearningRate 0.0148   Epoch: 12   Global Step: 205590   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:49:59,639-Speed 8729.70 samples/sec   Loss 5.0751   LearningRate 0.0148   Epoch: 12   Global Step: 205600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:00,724-Speed 9442.36 samples/sec   Loss 4.9732   LearningRate 0.0148   Epoch: 12   Global Step: 205610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:01,893-Speed 8769.27 samples/sec   Loss 5.0645   LearningRate 0.0147   Epoch: 12   Global Step: 205620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:02,979-Speed 9434.52 samples/sec   Loss 5.0134   LearningRate 0.0147   Epoch: 12   Global Step: 205630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:04,040-Speed 9656.56 samples/sec   Loss 5.0144   LearningRate 0.0147   Epoch: 12   Global Step: 205640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:05,137-Speed 9339.92 samples/sec   Loss 5.0387   LearningRate 0.0147   Epoch: 12   Global Step: 205650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:06,232-Speed 9361.01 samples/sec   Loss 4.9082   LearningRate 0.0147   Epoch: 12   Global Step: 205660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:07,366-Speed 9035.93 samples/sec   Loss 5.0440   LearningRate 0.0147   Epoch: 12   Global Step: 205670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:08,476-Speed 9233.68 samples/sec   Loss 4.9902   LearningRate 0.0147   Epoch: 12   Global Step: 205680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:09,562-Speed 9437.79 samples/sec   Loss 5.0578   LearningRate 0.0147   Epoch: 12   Global Step: 205690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:10,617-Speed 9711.21 samples/sec   Loss 5.1294   LearningRate 0.0147   Epoch: 12   Global Step: 205700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:11,684-Speed 9600.80 samples/sec   Loss 4.9646   LearningRate 0.0147   Epoch: 12   Global Step: 205710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:12,743-Speed 9676.63 samples/sec   Loss 4.9002   LearningRate 0.0147   Epoch: 12   Global Step: 205720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:13,840-Speed 9341.85 samples/sec   Loss 4.9747   LearningRate 0.0147   Epoch: 12   Global Step: 205730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:14,932-Speed 9383.78 samples/sec   Loss 4.8832   LearningRate 0.0147   Epoch: 12   Global Step: 205740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:16,037-Speed 9272.76 samples/sec   Loss 4.9852   LearningRate 0.0147   Epoch: 12   Global Step: 205750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:17,120-Speed 9463.33 samples/sec   Loss 4.9510   LearningRate 0.0147   Epoch: 12   Global Step: 205760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:18,204-Speed 9458.57 samples/sec   Loss 5.0327   LearningRate 0.0147   Epoch: 12   Global Step: 205770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:19,303-Speed 9321.29 samples/sec   Loss 5.0302   LearningRate 0.0147   Epoch: 12   Global Step: 205780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:20,362-Speed 9671.49 samples/sec   Loss 4.9434   LearningRate 0.0147   Epoch: 12   Global Step: 205790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:50:21,427-Speed 9628.09 samples/sec   Loss 4.8893   LearningRate 0.0147   Epoch: 12   Global Step: 205800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:50:22,493-Speed 9608.63 samples/sec   Loss 5.1033   LearningRate 0.0147   Epoch: 12   Global Step: 205810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:50:23,601-Speed 9252.43 samples/sec   Loss 5.0034   LearningRate 0.0147   Epoch: 12   Global Step: 205820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:50:24,677-Speed 9519.20 samples/sec   Loss 5.0068   LearningRate 0.0147   Epoch: 12   Global Step: 205830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:50:25,778-Speed 9312.38 samples/sec   Loss 5.0606   LearningRate 0.0147   Epoch: 12   Global Step: 205840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:50:26,878-Speed 9308.47 samples/sec   Loss 4.9862   LearningRate 0.0147   Epoch: 12   Global Step: 205850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:50:27,934-Speed 9710.01 samples/sec   Loss 5.0357   LearningRate 0.0147   Epoch: 12   Global Step: 205860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:50:28,983-Speed 9769.00 samples/sec   Loss 4.9760   LearningRate 0.0147   Epoch: 12   Global Step: 205870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:50:30,102-Speed 9154.95 samples/sec   Loss 4.9809   LearningRate 0.0147   Epoch: 12   Global Step: 205880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:50:31,189-Speed 9428.32 samples/sec   Loss 4.9378   LearningRate 0.0147   Epoch: 12   Global Step: 205890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:32,273-Speed 9450.55 samples/sec   Loss 5.0334   LearningRate 0.0147   Epoch: 12   Global Step: 205900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:33,375-Speed 9292.23 samples/sec   Loss 4.9928   LearningRate 0.0147   Epoch: 12   Global Step: 205910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:34,471-Speed 9353.73 samples/sec   Loss 5.0313   LearningRate 0.0147   Epoch: 12   Global Step: 205920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:35,596-Speed 9104.71 samples/sec   Loss 4.9665   LearningRate 0.0147   Epoch: 12   Global Step: 205930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:36,661-Speed 9620.83 samples/sec   Loss 5.0658   LearningRate 0.0147   Epoch: 12   Global Step: 205940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:37,772-Speed 9221.72 samples/sec   Loss 5.0306   LearningRate 0.0147   Epoch: 12   Global Step: 205950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:38,867-Speed 9359.38 samples/sec   Loss 4.9746   LearningRate 0.0147   Epoch: 12   Global Step: 205960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:39,937-Speed 9573.66 samples/sec   Loss 5.0046   LearningRate 0.0147   Epoch: 12   Global Step: 205970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:41,002-Speed 9625.17 samples/sec   Loss 4.8837   LearningRate 0.0147   Epoch: 12   Global Step: 205980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:50:42,140-Speed 9001.97 samples/sec   Loss 5.0358   LearningRate 0.0147   Epoch: 12   Global Step: 205990   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:50:43,218-Speed 9510.08 samples/sec   Loss 4.8676   LearningRate 0.0147   Epoch: 12   Global Step: 206000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:51:05,313-[lfw][206000]XNorm: 8.476825
Training: 2022-04-11 19:51:05,313-[lfw][206000]Accuracy-Flip: 0.99600+-0.00309
Training: 2022-04-11 19:51:05,314-[lfw][206000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:51:30,535-[cfp_fp][206000]XNorm: 7.316313
Training: 2022-04-11 19:51:30,536-[cfp_fp][206000]Accuracy-Flip: 0.96657+-0.00911
Training: 2022-04-11 19:51:30,536-[cfp_fp][206000]Accuracy-Highest: 0.96771
Training: 2022-04-11 19:51:52,242-[agedb_30][206000]XNorm: 8.179552
Training: 2022-04-11 19:51:52,243-[agedb_30][206000]Accuracy-Flip: 0.96733+-0.00981
Training: 2022-04-11 19:51:52,243-[agedb_30][206000]Accuracy-Highest: 0.96983
Training: 2022-04-11 19:51:53,298-Speed 146.12 samples/sec   Loss 5.0208   LearningRate 0.0147   Epoch: 12   Global Step: 206010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:51:54,416-Speed 9160.47 samples/sec   Loss 5.0577   LearningRate 0.0147   Epoch: 12   Global Step: 206020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:51:55,503-Speed 9425.76 samples/sec   Loss 4.9557   LearningRate 0.0147   Epoch: 12   Global Step: 206030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:51:56,584-Speed 9481.57 samples/sec   Loss 4.9153   LearningRate 0.0147   Epoch: 12   Global Step: 206040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:51:57,660-Speed 9520.68 samples/sec   Loss 5.0015   LearningRate 0.0146   Epoch: 12   Global Step: 206050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:51:58,724-Speed 9633.05 samples/sec   Loss 4.9436   LearningRate 0.0146   Epoch: 12   Global Step: 206060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:51:59,810-Speed 9437.64 samples/sec   Loss 4.9457   LearningRate 0.0146   Epoch: 12   Global Step: 206070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:52:00,905-Speed 9360.36 samples/sec   Loss 4.9370   LearningRate 0.0146   Epoch: 12   Global Step: 206080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:52:01,995-Speed 9396.67 samples/sec   Loss 4.9674   LearningRate 0.0146   Epoch: 12   Global Step: 206090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:52:03,109-Speed 9195.84 samples/sec   Loss 4.9715   LearningRate 0.0146   Epoch: 12   Global Step: 206100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:52:04,187-Speed 9504.92 samples/sec   Loss 5.0040   LearningRate 0.0146   Epoch: 12   Global Step: 206110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:52:05,245-Speed 9680.24 samples/sec   Loss 5.0301   LearningRate 0.0146   Epoch: 12   Global Step: 206120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:52:06,334-Speed 9414.91 samples/sec   Loss 4.9291   LearningRate 0.0146   Epoch: 12   Global Step: 206130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:07,441-Speed 9247.23 samples/sec   Loss 4.9900   LearningRate 0.0146   Epoch: 12   Global Step: 206140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:08,501-Speed 9667.08 samples/sec   Loss 5.1395   LearningRate 0.0146   Epoch: 12   Global Step: 206150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:09,576-Speed 9534.96 samples/sec   Loss 5.0797   LearningRate 0.0146   Epoch: 12   Global Step: 206160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:10,659-Speed 9455.59 samples/sec   Loss 4.9666   LearningRate 0.0146   Epoch: 12   Global Step: 206170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:11,790-Speed 9062.08 samples/sec   Loss 4.9847   LearningRate 0.0146   Epoch: 12   Global Step: 206180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:12,852-Speed 9656.28 samples/sec   Loss 4.9546   LearningRate 0.0146   Epoch: 12   Global Step: 206190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:13,901-Speed 9760.04 samples/sec   Loss 5.0307   LearningRate 0.0146   Epoch: 12   Global Step: 206200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:14,984-Speed 9466.81 samples/sec   Loss 5.0395   LearningRate 0.0146   Epoch: 12   Global Step: 206210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:16,055-Speed 9568.96 samples/sec   Loss 4.9341   LearningRate 0.0146   Epoch: 12   Global Step: 206220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:17,129-Speed 9538.37 samples/sec   Loss 4.9293   LearningRate 0.0146   Epoch: 12   Global Step: 206230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:18,237-Speed 9249.34 samples/sec   Loss 4.9694   LearningRate 0.0146   Epoch: 12   Global Step: 206240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:19,381-Speed 8960.70 samples/sec   Loss 4.9825   LearningRate 0.0146   Epoch: 12   Global Step: 206250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:20,485-Speed 9276.03 samples/sec   Loss 5.0367   LearningRate 0.0146   Epoch: 12   Global Step: 206260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:21,571-Speed 9440.90 samples/sec   Loss 4.9680   LearningRate 0.0146   Epoch: 12   Global Step: 206270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:22,674-Speed 9289.70 samples/sec   Loss 4.8944   LearningRate 0.0146   Epoch: 12   Global Step: 206280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:23,768-Speed 9365.41 samples/sec   Loss 4.9213   LearningRate 0.0146   Epoch: 12   Global Step: 206290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:24,912-Speed 8947.99 samples/sec   Loss 4.8586   LearningRate 0.0146   Epoch: 12   Global Step: 206300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:25,994-Speed 9473.30 samples/sec   Loss 4.9874   LearningRate 0.0146   Epoch: 12   Global Step: 206310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:27,059-Speed 9632.03 samples/sec   Loss 4.9693   LearningRate 0.0146   Epoch: 12   Global Step: 206320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:28,112-Speed 9724.66 samples/sec   Loss 4.9817   LearningRate 0.0146   Epoch: 12   Global Step: 206330   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:52:29,202-Speed 9401.08 samples/sec   Loss 5.0414   LearningRate 0.0146   Epoch: 12   Global Step: 206340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:30,297-Speed 9358.53 samples/sec   Loss 5.0016   LearningRate 0.0146   Epoch: 12   Global Step: 206350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:31,352-Speed 9705.27 samples/sec   Loss 4.9979   LearningRate 0.0146   Epoch: 12   Global Step: 206360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:32,471-Speed 9159.60 samples/sec   Loss 4.9680   LearningRate 0.0146   Epoch: 12   Global Step: 206370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:33,562-Speed 9396.35 samples/sec   Loss 5.0394   LearningRate 0.0146   Epoch: 12   Global Step: 206380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:34,672-Speed 9229.69 samples/sec   Loss 4.9266   LearningRate 0.0146   Epoch: 12   Global Step: 206390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:35,802-Speed 9065.32 samples/sec   Loss 4.9325   LearningRate 0.0146   Epoch: 12   Global Step: 206400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:36,883-Speed 9480.87 samples/sec   Loss 4.9292   LearningRate 0.0146   Epoch: 12   Global Step: 206410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:37,979-Speed 9349.53 samples/sec   Loss 5.0141   LearningRate 0.0146   Epoch: 12   Global Step: 206420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:39,057-Speed 9500.50 samples/sec   Loss 4.9277   LearningRate 0.0146   Epoch: 12   Global Step: 206430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:40,156-Speed 9324.88 samples/sec   Loss 5.0107   LearningRate 0.0146   Epoch: 12   Global Step: 206440   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:52:41,241-Speed 9445.51 samples/sec   Loss 4.9915   LearningRate 0.0146   Epoch: 12   Global Step: 206450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:42,323-Speed 9472.02 samples/sec   Loss 4.9521   LearningRate 0.0146   Epoch: 12   Global Step: 206460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:43,372-Speed 9768.58 samples/sec   Loss 4.9824   LearningRate 0.0146   Epoch: 12   Global Step: 206470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:44,487-Speed 9187.09 samples/sec   Loss 4.8919   LearningRate 0.0146   Epoch: 12   Global Step: 206480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:45,586-Speed 9327.24 samples/sec   Loss 4.9694   LearningRate 0.0145   Epoch: 12   Global Step: 206490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:46,731-Speed 8946.31 samples/sec   Loss 5.0492   LearningRate 0.0145   Epoch: 12   Global Step: 206500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:47,887-Speed 8866.88 samples/sec   Loss 5.0031   LearningRate 0.0145   Epoch: 12   Global Step: 206510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:48,947-Speed 9665.46 samples/sec   Loss 5.0010   LearningRate 0.0145   Epoch: 12   Global Step: 206520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:50,025-Speed 9506.40 samples/sec   Loss 5.0838   LearningRate 0.0145   Epoch: 12   Global Step: 206530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:51,125-Speed 9311.61 samples/sec   Loss 4.9573   LearningRate 0.0145   Epoch: 12   Global Step: 206540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:52,183-Speed 9690.91 samples/sec   Loss 4.9749   LearningRate 0.0145   Epoch: 12   Global Step: 206550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:53,267-Speed 9450.06 samples/sec   Loss 5.0497   LearningRate 0.0145   Epoch: 12   Global Step: 206560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:54,345-Speed 9505.15 samples/sec   Loss 5.0219   LearningRate 0.0145   Epoch: 12   Global Step: 206570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:55,391-Speed 9800.39 samples/sec   Loss 4.9501   LearningRate 0.0145   Epoch: 12   Global Step: 206580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:56,448-Speed 9697.14 samples/sec   Loss 5.0890   LearningRate 0.0145   Epoch: 12   Global Step: 206590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:57,542-Speed 9357.53 samples/sec   Loss 5.0747   LearningRate 0.0145   Epoch: 12   Global Step: 206600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:58,635-Speed 9376.89 samples/sec   Loss 4.9430   LearningRate 0.0145   Epoch: 12   Global Step: 206610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:52:59,682-Speed 9787.59 samples/sec   Loss 4.9499   LearningRate 0.0145   Epoch: 12   Global Step: 206620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:00,743-Speed 9654.47 samples/sec   Loss 5.0534   LearningRate 0.0145   Epoch: 12   Global Step: 206630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:01,824-Speed 9482.00 samples/sec   Loss 4.9999   LearningRate 0.0145   Epoch: 12   Global Step: 206640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:02,939-Speed 9188.53 samples/sec   Loss 4.9722   LearningRate 0.0145   Epoch: 12   Global Step: 206650   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:53:04,003-Speed 9631.87 samples/sec   Loss 5.1238   LearningRate 0.0145   Epoch: 12   Global Step: 206660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:05,059-Speed 9704.32 samples/sec   Loss 5.0437   LearningRate 0.0145   Epoch: 12   Global Step: 206670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:06,171-Speed 9213.74 samples/sec   Loss 4.9015   LearningRate 0.0145   Epoch: 12   Global Step: 206680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:07,220-Speed 9768.57 samples/sec   Loss 5.0185   LearningRate 0.0145   Epoch: 12   Global Step: 206690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:08,334-Speed 9202.80 samples/sec   Loss 5.0324   LearningRate 0.0145   Epoch: 12   Global Step: 206700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:09,413-Speed 9488.52 samples/sec   Loss 5.0601   LearningRate 0.0145   Epoch: 12   Global Step: 206710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:10,518-Speed 9273.77 samples/sec   Loss 5.0719   LearningRate 0.0145   Epoch: 12   Global Step: 206720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:11,611-Speed 9374.79 samples/sec   Loss 5.1183   LearningRate 0.0145   Epoch: 12   Global Step: 206730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:12,750-Speed 8997.87 samples/sec   Loss 4.9824   LearningRate 0.0145   Epoch: 12   Global Step: 206740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:13,811-Speed 9654.25 samples/sec   Loss 5.1023   LearningRate 0.0145   Epoch: 12   Global Step: 206750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:14,869-Speed 9691.17 samples/sec   Loss 5.0097   LearningRate 0.0145   Epoch: 12   Global Step: 206760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:15,990-Speed 9138.59 samples/sec   Loss 5.0502   LearningRate 0.0145   Epoch: 12   Global Step: 206770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:17,107-Speed 9177.37 samples/sec   Loss 4.9774   LearningRate 0.0145   Epoch: 12   Global Step: 206780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:18,231-Speed 9114.45 samples/sec   Loss 5.0411   LearningRate 0.0145   Epoch: 12   Global Step: 206790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:19,322-Speed 9394.34 samples/sec   Loss 4.9636   LearningRate 0.0145   Epoch: 12   Global Step: 206800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:20,420-Speed 9331.40 samples/sec   Loss 5.0228   LearningRate 0.0145   Epoch: 12   Global Step: 206810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:21,518-Speed 9327.24 samples/sec   Loss 5.0312   LearningRate 0.0145   Epoch: 12   Global Step: 206820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:22,633-Speed 9188.21 samples/sec   Loss 5.0386   LearningRate 0.0145   Epoch: 12   Global Step: 206830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:23,757-Speed 9117.13 samples/sec   Loss 5.0323   LearningRate 0.0145   Epoch: 12   Global Step: 206840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:24,822-Speed 9621.55 samples/sec   Loss 5.0272   LearningRate 0.0145   Epoch: 12   Global Step: 206850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:25,916-Speed 9362.80 samples/sec   Loss 4.9638   LearningRate 0.0145   Epoch: 12   Global Step: 206860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:27,020-Speed 9290.83 samples/sec   Loss 5.0683   LearningRate 0.0145   Epoch: 12   Global Step: 206870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:28,171-Speed 8903.39 samples/sec   Loss 5.0195   LearningRate 0.0145   Epoch: 12   Global Step: 206880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:29,268-Speed 9338.03 samples/sec   Loss 4.9737   LearningRate 0.0145   Epoch: 12   Global Step: 206890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:30,342-Speed 9539.71 samples/sec   Loss 5.0109   LearningRate 0.0145   Epoch: 12   Global Step: 206900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:31,426-Speed 9453.67 samples/sec   Loss 5.0137   LearningRate 0.0145   Epoch: 12   Global Step: 206910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:32,552-Speed 9103.27 samples/sec   Loss 5.0812   LearningRate 0.0145   Epoch: 12   Global Step: 206920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:33,627-Speed 9531.49 samples/sec   Loss 4.9378   LearningRate 0.0144   Epoch: 12   Global Step: 206930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:34,713-Speed 9434.39 samples/sec   Loss 5.0400   LearningRate 0.0144   Epoch: 12   Global Step: 206940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:35,799-Speed 9438.13 samples/sec   Loss 5.0100   LearningRate 0.0144   Epoch: 12   Global Step: 206950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:36,866-Speed 9605.70 samples/sec   Loss 4.9817   LearningRate 0.0144   Epoch: 12   Global Step: 206960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:37,991-Speed 9109.49 samples/sec   Loss 5.0491   LearningRate 0.0144   Epoch: 12   Global Step: 206970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:39,109-Speed 9170.72 samples/sec   Loss 5.0109   LearningRate 0.0144   Epoch: 12   Global Step: 206980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:40,173-Speed 9624.69 samples/sec   Loss 4.9755   LearningRate 0.0144   Epoch: 12   Global Step: 206990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:41,249-Speed 9524.45 samples/sec   Loss 5.0665   LearningRate 0.0144   Epoch: 12   Global Step: 207000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:42,348-Speed 9325.70 samples/sec   Loss 5.0309   LearningRate 0.0144   Epoch: 12   Global Step: 207010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:43,460-Speed 9216.71 samples/sec   Loss 5.0327   LearningRate 0.0144   Epoch: 12   Global Step: 207020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:44,567-Speed 9253.38 samples/sec   Loss 5.0859   LearningRate 0.0144   Epoch: 12   Global Step: 207030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:45,667-Speed 9320.38 samples/sec   Loss 5.0037   LearningRate 0.0144   Epoch: 12   Global Step: 207040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:46,757-Speed 9402.51 samples/sec   Loss 5.0257   LearningRate 0.0144   Epoch: 12   Global Step: 207050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:47,842-Speed 9444.08 samples/sec   Loss 5.0553   LearningRate 0.0144   Epoch: 12   Global Step: 207060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:53:48,933-Speed 9389.37 samples/sec   Loss 5.0542   LearningRate 0.0144   Epoch: 12   Global Step: 207070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:50,061-Speed 9084.62 samples/sec   Loss 4.9956   LearningRate 0.0144   Epoch: 12   Global Step: 207080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:51,150-Speed 9409.90 samples/sec   Loss 4.9652   LearningRate 0.0144   Epoch: 12   Global Step: 207090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:52,229-Speed 9497.53 samples/sec   Loss 5.0989   LearningRate 0.0144   Epoch: 12   Global Step: 207100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:53,338-Speed 9238.88 samples/sec   Loss 4.9373   LearningRate 0.0144   Epoch: 12   Global Step: 207110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:54,448-Speed 9227.60 samples/sec   Loss 5.0296   LearningRate 0.0144   Epoch: 12   Global Step: 207120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:55,559-Speed 9227.89 samples/sec   Loss 5.0050   LearningRate 0.0144   Epoch: 12   Global Step: 207130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:56,641-Speed 9469.21 samples/sec   Loss 4.9528   LearningRate 0.0144   Epoch: 12   Global Step: 207140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:57,767-Speed 9104.08 samples/sec   Loss 5.0039   LearningRate 0.0144   Epoch: 12   Global Step: 207150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:53:58,917-Speed 8906.95 samples/sec   Loss 4.9537   LearningRate 0.0144   Epoch: 12   Global Step: 207160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:00,000-Speed 9465.73 samples/sec   Loss 5.0594   LearningRate 0.0144   Epoch: 12   Global Step: 207170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:01,068-Speed 9588.96 samples/sec   Loss 5.0389   LearningRate 0.0144   Epoch: 12   Global Step: 207180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:02,150-Speed 9475.20 samples/sec   Loss 4.9913   LearningRate 0.0144   Epoch: 12   Global Step: 207190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:03,301-Speed 8902.65 samples/sec   Loss 4.9911   LearningRate 0.0144   Epoch: 12   Global Step: 207200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:04,395-Speed 9370.88 samples/sec   Loss 5.0889   LearningRate 0.0144   Epoch: 12   Global Step: 207210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:05,480-Speed 9442.11 samples/sec   Loss 4.9413   LearningRate 0.0144   Epoch: 12   Global Step: 207220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:06,565-Speed 9443.01 samples/sec   Loss 4.9083   LearningRate 0.0144   Epoch: 12   Global Step: 207230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:07,666-Speed 9304.71 samples/sec   Loss 4.9927   LearningRate 0.0144   Epoch: 12   Global Step: 207240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:08,800-Speed 9038.18 samples/sec   Loss 5.0059   LearningRate 0.0144   Epoch: 12   Global Step: 207250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:09,893-Speed 9373.96 samples/sec   Loss 5.0294   LearningRate 0.0144   Epoch: 12   Global Step: 207260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:10,982-Speed 9410.94 samples/sec   Loss 4.9820   LearningRate 0.0144   Epoch: 12   Global Step: 207270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:12,047-Speed 9611.64 samples/sec   Loss 5.1115   LearningRate 0.0144   Epoch: 12   Global Step: 207280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:13,117-Speed 9582.16 samples/sec   Loss 5.0328   LearningRate 0.0144   Epoch: 12   Global Step: 207290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:14,195-Speed 9507.71 samples/sec   Loss 5.1159   LearningRate 0.0144   Epoch: 12   Global Step: 207300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:15,281-Speed 9437.18 samples/sec   Loss 4.9755   LearningRate 0.0144   Epoch: 12   Global Step: 207310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:16,351-Speed 9573.69 samples/sec   Loss 5.0629   LearningRate 0.0144   Epoch: 12   Global Step: 207320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:17,416-Speed 9622.47 samples/sec   Loss 5.0174   LearningRate 0.0144   Epoch: 12   Global Step: 207330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:18,530-Speed 9203.01 samples/sec   Loss 4.9575   LearningRate 0.0144   Epoch: 12   Global Step: 207340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:19,627-Speed 9342.32 samples/sec   Loss 5.0285   LearningRate 0.0144   Epoch: 12   Global Step: 207350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:20,710-Speed 9454.47 samples/sec   Loss 4.9920   LearningRate 0.0144   Epoch: 12   Global Step: 207360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:21,785-Speed 9530.82 samples/sec   Loss 5.0198   LearningRate 0.0143   Epoch: 12   Global Step: 207370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:22,871-Speed 9439.62 samples/sec   Loss 5.0413   LearningRate 0.0143   Epoch: 12   Global Step: 207380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:23,941-Speed 9567.44 samples/sec   Loss 4.9824   LearningRate 0.0143   Epoch: 12   Global Step: 207390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:25,059-Speed 9169.31 samples/sec   Loss 5.0703   LearningRate 0.0143   Epoch: 12   Global Step: 207400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:26,153-Speed 9363.30 samples/sec   Loss 4.9961   LearningRate 0.0143   Epoch: 12   Global Step: 207410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:27,256-Speed 9294.58 samples/sec   Loss 5.0841   LearningRate 0.0143   Epoch: 12   Global Step: 207420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:28,343-Speed 9425.87 samples/sec   Loss 4.9782   LearningRate 0.0143   Epoch: 12   Global Step: 207430   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:54:29,411-Speed 9595.13 samples/sec   Loss 4.9671   LearningRate 0.0143   Epoch: 12   Global Step: 207440   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:54:30,541-Speed 9073.55 samples/sec   Loss 5.0878   LearningRate 0.0143   Epoch: 12   Global Step: 207450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:31,642-Speed 9306.04 samples/sec   Loss 4.9679   LearningRate 0.0143   Epoch: 12   Global Step: 207460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:32,718-Speed 9515.30 samples/sec   Loss 5.0567   LearningRate 0.0143   Epoch: 12   Global Step: 207470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:33,828-Speed 9234.77 samples/sec   Loss 4.9482   LearningRate 0.0143   Epoch: 12   Global Step: 207480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:34,899-Speed 9572.40 samples/sec   Loss 5.0806   LearningRate 0.0143   Epoch: 12   Global Step: 207490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:35,995-Speed 9346.74 samples/sec   Loss 4.9413   LearningRate 0.0143   Epoch: 12   Global Step: 207500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:37,111-Speed 9179.24 samples/sec   Loss 5.0358   LearningRate 0.0143   Epoch: 12   Global Step: 207510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:38,192-Speed 9483.19 samples/sec   Loss 5.0440   LearningRate 0.0143   Epoch: 12   Global Step: 207520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:39,338-Speed 8932.69 samples/sec   Loss 4.9725   LearningRate 0.0143   Epoch: 12   Global Step: 207530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:40,423-Speed 9443.25 samples/sec   Loss 4.9366   LearningRate 0.0143   Epoch: 12   Global Step: 207540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:41,519-Speed 9348.23 samples/sec   Loss 5.0693   LearningRate 0.0143   Epoch: 12   Global Step: 207550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:42,582-Speed 9645.04 samples/sec   Loss 5.0067   LearningRate 0.0143   Epoch: 12   Global Step: 207560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:43,677-Speed 9355.98 samples/sec   Loss 5.0305   LearningRate 0.0143   Epoch: 12   Global Step: 207570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:44,786-Speed 9234.73 samples/sec   Loss 5.1375   LearningRate 0.0143   Epoch: 12   Global Step: 207580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:45,873-Speed 9435.19 samples/sec   Loss 5.0436   LearningRate 0.0143   Epoch: 12   Global Step: 207590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:46,958-Speed 9441.66 samples/sec   Loss 5.0985   LearningRate 0.0143   Epoch: 12   Global Step: 207600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:48,036-Speed 9500.63 samples/sec   Loss 5.0479   LearningRate 0.0143   Epoch: 12   Global Step: 207610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:49,126-Speed 9399.39 samples/sec   Loss 4.9433   LearningRate 0.0143   Epoch: 12   Global Step: 207620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:50,258-Speed 9051.11 samples/sec   Loss 5.0258   LearningRate 0.0143   Epoch: 12   Global Step: 207630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:51,341-Speed 9457.96 samples/sec   Loss 5.0067   LearningRate 0.0143   Epoch: 12   Global Step: 207640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:52,493-Speed 8895.58 samples/sec   Loss 5.1112   LearningRate 0.0143   Epoch: 12   Global Step: 207650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:53,584-Speed 9388.90 samples/sec   Loss 5.0691   LearningRate 0.0143   Epoch: 12   Global Step: 207660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:54,694-Speed 9236.08 samples/sec   Loss 5.0380   LearningRate 0.0143   Epoch: 12   Global Step: 207670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:55,753-Speed 9676.18 samples/sec   Loss 5.0431   LearningRate 0.0143   Epoch: 12   Global Step: 207680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:54:56,836-Speed 9468.60 samples/sec   Loss 5.0405   LearningRate 0.0143   Epoch: 12   Global Step: 207690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:57,939-Speed 9285.90 samples/sec   Loss 5.0831   LearningRate 0.0143   Epoch: 12   Global Step: 207700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:54:59,075-Speed 9022.05 samples/sec   Loss 4.9634   LearningRate 0.0143   Epoch: 12   Global Step: 207710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:00,161-Speed 9434.75 samples/sec   Loss 5.0397   LearningRate 0.0143   Epoch: 12   Global Step: 207720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:01,259-Speed 9329.43 samples/sec   Loss 5.0148   LearningRate 0.0143   Epoch: 12   Global Step: 207730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:02,356-Speed 9344.03 samples/sec   Loss 5.0538   LearningRate 0.0143   Epoch: 12   Global Step: 207740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:03,515-Speed 8842.22 samples/sec   Loss 5.0302   LearningRate 0.0143   Epoch: 12   Global Step: 207750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:04,659-Speed 8955.69 samples/sec   Loss 5.0478   LearningRate 0.0143   Epoch: 12   Global Step: 207760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:05,742-Speed 9465.10 samples/sec   Loss 5.0095   LearningRate 0.0143   Epoch: 12   Global Step: 207770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:06,824-Speed 9471.71 samples/sec   Loss 5.1087   LearningRate 0.0143   Epoch: 12   Global Step: 207780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:07,896-Speed 9554.53 samples/sec   Loss 4.9614   LearningRate 0.0143   Epoch: 12   Global Step: 207790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:08,941-Speed 9809.16 samples/sec   Loss 5.0543   LearningRate 0.0143   Epoch: 12   Global Step: 207800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:10,040-Speed 9325.90 samples/sec   Loss 5.0676   LearningRate 0.0142   Epoch: 12   Global Step: 207810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:11,130-Speed 9398.59 samples/sec   Loss 5.0700   LearningRate 0.0142   Epoch: 12   Global Step: 207820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:12,212-Speed 9468.78 samples/sec   Loss 5.1254   LearningRate 0.0142   Epoch: 12   Global Step: 207830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:13,324-Speed 9211.75 samples/sec   Loss 4.9673   LearningRate 0.0142   Epoch: 12   Global Step: 207840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:14,393-Speed 9589.66 samples/sec   Loss 5.0155   LearningRate 0.0142   Epoch: 12   Global Step: 207850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:15,498-Speed 9271.92 samples/sec   Loss 5.0074   LearningRate 0.0142   Epoch: 12   Global Step: 207860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:16,604-Speed 9262.80 samples/sec   Loss 5.0369   LearningRate 0.0142   Epoch: 12   Global Step: 207870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:17,674-Speed 9578.05 samples/sec   Loss 5.0350   LearningRate 0.0142   Epoch: 12   Global Step: 207880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:18,725-Speed 9748.76 samples/sec   Loss 5.1246   LearningRate 0.0142   Epoch: 12   Global Step: 207890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:19,771-Speed 9800.92 samples/sec   Loss 4.9660   LearningRate 0.0142   Epoch: 12   Global Step: 207900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:20,904-Speed 9039.05 samples/sec   Loss 5.0270   LearningRate 0.0142   Epoch: 12   Global Step: 207910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:22,038-Speed 9040.38 samples/sec   Loss 5.0843   LearningRate 0.0142   Epoch: 12   Global Step: 207920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:23,164-Speed 9092.97 samples/sec   Loss 5.0326   LearningRate 0.0142   Epoch: 12   Global Step: 207930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:24,246-Speed 9476.36 samples/sec   Loss 5.0538   LearningRate 0.0142   Epoch: 12   Global Step: 207940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:25,331-Speed 9440.21 samples/sec   Loss 5.0273   LearningRate 0.0142   Epoch: 12   Global Step: 207950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:26,413-Speed 9470.08 samples/sec   Loss 5.0141   LearningRate 0.0142   Epoch: 12   Global Step: 207960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:27,538-Speed 9107.46 samples/sec   Loss 5.0316   LearningRate 0.0142   Epoch: 12   Global Step: 207970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:55:28,666-Speed 9079.81 samples/sec   Loss 4.9445   LearningRate 0.0142   Epoch: 12   Global Step: 207980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:29,720-Speed 9727.33 samples/sec   Loss 5.0990   LearningRate 0.0142   Epoch: 12   Global Step: 207990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:30,835-Speed 9183.41 samples/sec   Loss 5.0474   LearningRate 0.0142   Epoch: 12   Global Step: 208000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:55:53,356-[lfw][208000]XNorm: 8.578325
Training: 2022-04-11 19:55:53,356-[lfw][208000]Accuracy-Flip: 0.99600+-0.00291
Training: 2022-04-11 19:55:53,356-[lfw][208000]Accuracy-Highest: 0.99683
Training: 2022-04-11 19:56:19,053-[cfp_fp][208000]XNorm: 7.294262
Training: 2022-04-11 19:56:19,054-[cfp_fp][208000]Accuracy-Flip: 0.96571+-0.00932
Training: 2022-04-11 19:56:19,054-[cfp_fp][208000]Accuracy-Highest: 0.96771
Training: 2022-04-11 19:56:41,458-[agedb_30][208000]XNorm: 8.298497
Training: 2022-04-11 19:56:41,459-[agedb_30][208000]Accuracy-Flip: 0.96883+-0.00837
Training: 2022-04-11 19:56:41,459-[agedb_30][208000]Accuracy-Highest: 0.96983
Training: 2022-04-11 19:56:42,534-Speed 142.82 samples/sec   Loss 5.0363   LearningRate 0.0142   Epoch: 12   Global Step: 208010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:56:43,609-Speed 9531.17 samples/sec   Loss 5.0898   LearningRate 0.0142   Epoch: 12   Global Step: 208020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:56:44,664-Speed 9711.31 samples/sec   Loss 4.9662   LearningRate 0.0142   Epoch: 12   Global Step: 208030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:56:45,742-Speed 9501.41 samples/sec   Loss 5.0186   LearningRate 0.0142   Epoch: 12   Global Step: 208040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:56:46,831-Speed 9411.58 samples/sec   Loss 5.0039   LearningRate 0.0142   Epoch: 12   Global Step: 208050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:56:47,902-Speed 9567.72 samples/sec   Loss 4.9874   LearningRate 0.0142   Epoch: 12   Global Step: 208060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:56:49,012-Speed 9236.44 samples/sec   Loss 5.1053   LearningRate 0.0142   Epoch: 12   Global Step: 208070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:56:50,115-Speed 9294.02 samples/sec   Loss 5.0408   LearningRate 0.0142   Epoch: 12   Global Step: 208080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:56:51,220-Speed 9267.09 samples/sec   Loss 5.0833   LearningRate 0.0142   Epoch: 12   Global Step: 208090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:56:52,287-Speed 9606.72 samples/sec   Loss 5.0013   LearningRate 0.0142   Epoch: 12   Global Step: 208100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:56:53,424-Speed 9004.77 samples/sec   Loss 5.0092   LearningRate 0.0142   Epoch: 12   Global Step: 208110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:56:54,506-Speed 9471.34 samples/sec   Loss 5.0405   LearningRate 0.0142   Epoch: 12   Global Step: 208120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:56:55,581-Speed 9540.77 samples/sec   Loss 5.0527   LearningRate 0.0142   Epoch: 12   Global Step: 208130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:56:56,664-Speed 9454.01 samples/sec   Loss 5.0897   LearningRate 0.0142   Epoch: 12   Global Step: 208140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:56:57,787-Speed 9131.63 samples/sec   Loss 4.9546   LearningRate 0.0142   Epoch: 12   Global Step: 208150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:56:58,886-Speed 9322.92 samples/sec   Loss 4.9480   LearningRate 0.0142   Epoch: 12   Global Step: 208160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:56:59,997-Speed 9217.87 samples/sec   Loss 4.9852   LearningRate 0.0142   Epoch: 12   Global Step: 208170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:01,088-Speed 9398.61 samples/sec   Loss 4.9432   LearningRate 0.0142   Epoch: 12   Global Step: 208180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:02,160-Speed 9551.48 samples/sec   Loss 5.0178   LearningRate 0.0142   Epoch: 12   Global Step: 208190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:03,394-Speed 8306.96 samples/sec   Loss 5.0404   LearningRate 0.0142   Epoch: 12   Global Step: 208200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:04,501-Speed 9259.04 samples/sec   Loss 4.9583   LearningRate 0.0142   Epoch: 12   Global Step: 208210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:05,650-Speed 8920.58 samples/sec   Loss 5.0585   LearningRate 0.0142   Epoch: 12   Global Step: 208220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:06,734-Speed 9450.56 samples/sec   Loss 5.0931   LearningRate 0.0142   Epoch: 12   Global Step: 208230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:07,831-Speed 9338.74 samples/sec   Loss 5.0448   LearningRate 0.0142   Epoch: 12   Global Step: 208240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:08,936-Speed 9274.87 samples/sec   Loss 5.0175   LearningRate 0.0141   Epoch: 12   Global Step: 208250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:10,030-Speed 9364.56 samples/sec   Loss 5.0279   LearningRate 0.0141   Epoch: 12   Global Step: 208260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:11,136-Speed 9265.15 samples/sec   Loss 5.0962   LearningRate 0.0141   Epoch: 12   Global Step: 208270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:12,252-Speed 9177.17 samples/sec   Loss 4.9563   LearningRate 0.0141   Epoch: 12   Global Step: 208280   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:57:13,373-Speed 9142.21 samples/sec   Loss 5.0334   LearningRate 0.0141   Epoch: 12   Global Step: 208290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:14,434-Speed 9660.21 samples/sec   Loss 4.9690   LearningRate 0.0141   Epoch: 12   Global Step: 208300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:15,577-Speed 8962.32 samples/sec   Loss 5.0623   LearningRate 0.0141   Epoch: 12   Global Step: 208310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:16,700-Speed 9119.83 samples/sec   Loss 5.0925   LearningRate 0.0141   Epoch: 12   Global Step: 208320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:17,821-Speed 9141.17 samples/sec   Loss 4.9862   LearningRate 0.0141   Epoch: 12   Global Step: 208330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:18,865-Speed 9816.58 samples/sec   Loss 5.0934   LearningRate 0.0141   Epoch: 12   Global Step: 208340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:19,935-Speed 9576.73 samples/sec   Loss 4.9897   LearningRate 0.0141   Epoch: 12   Global Step: 208350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:21,020-Speed 9441.36 samples/sec   Loss 5.0336   LearningRate 0.0141   Epoch: 12   Global Step: 208360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:22,116-Speed 9354.14 samples/sec   Loss 5.0009   LearningRate 0.0141   Epoch: 12   Global Step: 208370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:23,205-Speed 9402.16 samples/sec   Loss 5.0382   LearningRate 0.0141   Epoch: 12   Global Step: 208380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:24,297-Speed 9385.22 samples/sec   Loss 5.0979   LearningRate 0.0141   Epoch: 12   Global Step: 208390   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:57:25,420-Speed 9125.57 samples/sec   Loss 5.1080   LearningRate 0.0141   Epoch: 12   Global Step: 208400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:26,488-Speed 9597.66 samples/sec   Loss 5.0426   LearningRate 0.0141   Epoch: 12   Global Step: 208410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:27,596-Speed 9251.25 samples/sec   Loss 5.0643   LearningRate 0.0141   Epoch: 12   Global Step: 208420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:28,682-Speed 9431.43 samples/sec   Loss 5.0444   LearningRate 0.0141   Epoch: 12   Global Step: 208430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:29,831-Speed 8916.91 samples/sec   Loss 5.0429   LearningRate 0.0141   Epoch: 12   Global Step: 208440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:30,938-Speed 9254.33 samples/sec   Loss 5.0112   LearningRate 0.0141   Epoch: 12   Global Step: 208450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:32,036-Speed 9340.74 samples/sec   Loss 5.0217   LearningRate 0.0141   Epoch: 12   Global Step: 208460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:33,161-Speed 9103.27 samples/sec   Loss 5.0925   LearningRate 0.0141   Epoch: 12   Global Step: 208470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:34,245-Speed 9455.92 samples/sec   Loss 4.9839   LearningRate 0.0141   Epoch: 12   Global Step: 208480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:35,316-Speed 9562.96 samples/sec   Loss 5.0086   LearningRate 0.0141   Epoch: 12   Global Step: 208490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:36,418-Speed 9303.96 samples/sec   Loss 5.0410   LearningRate 0.0141   Epoch: 12   Global Step: 208500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:37,527-Speed 9237.55 samples/sec   Loss 5.0950   LearningRate 0.0141   Epoch: 12   Global Step: 208510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:38,603-Speed 9519.24 samples/sec   Loss 5.0152   LearningRate 0.0141   Epoch: 12   Global Step: 208520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:39,675-Speed 9564.59 samples/sec   Loss 4.9391   LearningRate 0.0141   Epoch: 12   Global Step: 208530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:40,787-Speed 9210.66 samples/sec   Loss 5.0661   LearningRate 0.0141   Epoch: 12   Global Step: 208540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:41,846-Speed 9678.16 samples/sec   Loss 4.9741   LearningRate 0.0141   Epoch: 12   Global Step: 208550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:42,921-Speed 9533.75 samples/sec   Loss 4.9254   LearningRate 0.0141   Epoch: 12   Global Step: 208560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:44,001-Speed 9487.25 samples/sec   Loss 5.1292   LearningRate 0.0141   Epoch: 12   Global Step: 208570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:45,076-Speed 9526.53 samples/sec   Loss 5.0378   LearningRate 0.0141   Epoch: 12   Global Step: 208580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:46,180-Speed 9287.26 samples/sec   Loss 5.1196   LearningRate 0.0141   Epoch: 12   Global Step: 208590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:47,270-Speed 9401.13 samples/sec   Loss 5.0005   LearningRate 0.0141   Epoch: 12   Global Step: 208600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:57:48,351-Speed 9477.16 samples/sec   Loss 4.9982   LearningRate 0.0141   Epoch: 12   Global Step: 208610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:57:49,403-Speed 9743.36 samples/sec   Loss 4.9328   LearningRate 0.0141   Epoch: 12   Global Step: 208620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:57:50,461-Speed 9688.42 samples/sec   Loss 5.0184   LearningRate 0.0141   Epoch: 12   Global Step: 208630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:57:51,548-Speed 9420.97 samples/sec   Loss 4.9679   LearningRate 0.0141   Epoch: 12   Global Step: 208640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:57:52,606-Speed 9689.44 samples/sec   Loss 5.0425   LearningRate 0.0141   Epoch: 12   Global Step: 208650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:57:53,670-Speed 9631.84 samples/sec   Loss 5.0866   LearningRate 0.0141   Epoch: 12   Global Step: 208660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:57:54,766-Speed 9349.51 samples/sec   Loss 5.0305   LearningRate 0.0141   Epoch: 12   Global Step: 208670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:57:55,851-Speed 9438.50 samples/sec   Loss 4.9643   LearningRate 0.0141   Epoch: 12   Global Step: 208680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:57:56,953-Speed 9304.02 samples/sec   Loss 5.1063   LearningRate 0.0141   Epoch: 12   Global Step: 208690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:57:58,044-Speed 9387.18 samples/sec   Loss 5.0678   LearningRate 0.0140   Epoch: 12   Global Step: 208700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:57:59,112-Speed 9596.65 samples/sec   Loss 5.0020   LearningRate 0.0140   Epoch: 12   Global Step: 208710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:00,191-Speed 9494.32 samples/sec   Loss 5.0294   LearningRate 0.0140   Epoch: 12   Global Step: 208720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:01,301-Speed 9229.54 samples/sec   Loss 5.0027   LearningRate 0.0140   Epoch: 12   Global Step: 208730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:02,384-Speed 9461.01 samples/sec   Loss 5.1144   LearningRate 0.0140   Epoch: 12   Global Step: 208740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:03,488-Speed 9286.22 samples/sec   Loss 4.9644   LearningRate 0.0140   Epoch: 12   Global Step: 208750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:04,570-Speed 9473.07 samples/sec   Loss 4.9919   LearningRate 0.0140   Epoch: 12   Global Step: 208760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:05,660-Speed 9397.60 samples/sec   Loss 5.0126   LearningRate 0.0140   Epoch: 12   Global Step: 208770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:06,726-Speed 9613.50 samples/sec   Loss 4.9564   LearningRate 0.0140   Epoch: 12   Global Step: 208780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:07,862-Speed 9013.80 samples/sec   Loss 5.0263   LearningRate 0.0140   Epoch: 12   Global Step: 208790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:08,929-Speed 9606.04 samples/sec   Loss 4.9834   LearningRate 0.0140   Epoch: 12   Global Step: 208800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:10,001-Speed 9558.63 samples/sec   Loss 4.9515   LearningRate 0.0140   Epoch: 12   Global Step: 208810   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:58:11,127-Speed 9095.96 samples/sec   Loss 5.0643   LearningRate 0.0140   Epoch: 12   Global Step: 208820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:12,221-Speed 9361.88 samples/sec   Loss 5.0733   LearningRate 0.0140   Epoch: 12   Global Step: 208830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:13,282-Speed 9658.39 samples/sec   Loss 5.0353   LearningRate 0.0140   Epoch: 12   Global Step: 208840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:14,348-Speed 9613.83 samples/sec   Loss 5.0799   LearningRate 0.0140   Epoch: 12   Global Step: 208850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:15,429-Speed 9475.19 samples/sec   Loss 5.0492   LearningRate 0.0140   Epoch: 12   Global Step: 208860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:16,458-Speed 9966.46 samples/sec   Loss 5.0514   LearningRate 0.0140   Epoch: 12   Global Step: 208870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:17,573-Speed 9191.67 samples/sec   Loss 5.0269   LearningRate 0.0140   Epoch: 12   Global Step: 208880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:18,660-Speed 9418.99 samples/sec   Loss 5.0364   LearningRate 0.0140   Epoch: 12   Global Step: 208890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:19,777-Speed 9174.36 samples/sec   Loss 5.0122   LearningRate 0.0140   Epoch: 12   Global Step: 208900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:20,878-Speed 9308.71 samples/sec   Loss 5.0953   LearningRate 0.0140   Epoch: 12   Global Step: 208910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:21,952-Speed 9540.41 samples/sec   Loss 4.9398   LearningRate 0.0140   Epoch: 12   Global Step: 208920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:23,064-Speed 9215.10 samples/sec   Loss 5.0659   LearningRate 0.0140   Epoch: 12   Global Step: 208930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:24,236-Speed 8739.61 samples/sec   Loss 5.1401   LearningRate 0.0140   Epoch: 12   Global Step: 208940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:25,315-Speed 9497.56 samples/sec   Loss 5.0120   LearningRate 0.0140   Epoch: 12   Global Step: 208950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:26,410-Speed 9359.52 samples/sec   Loss 4.9716   LearningRate 0.0140   Epoch: 12   Global Step: 208960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:27,503-Speed 9372.04 samples/sec   Loss 5.0747   LearningRate 0.0140   Epoch: 12   Global Step: 208970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:28,619-Speed 9186.34 samples/sec   Loss 4.9484   LearningRate 0.0140   Epoch: 12   Global Step: 208980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:29,686-Speed 9601.04 samples/sec   Loss 5.1222   LearningRate 0.0140   Epoch: 12   Global Step: 208990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:30,776-Speed 9402.06 samples/sec   Loss 4.9832   LearningRate 0.0140   Epoch: 12   Global Step: 209000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:31,815-Speed 9857.97 samples/sec   Loss 4.9123   LearningRate 0.0140   Epoch: 12   Global Step: 209010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:32,923-Speed 9255.67 samples/sec   Loss 5.0644   LearningRate 0.0140   Epoch: 12   Global Step: 209020   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:58:34,010-Speed 9425.12 samples/sec   Loss 5.0356   LearningRate 0.0140   Epoch: 12   Global Step: 209030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:35,067-Speed 9687.52 samples/sec   Loss 5.0136   LearningRate 0.0140   Epoch: 12   Global Step: 209040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:36,173-Speed 9268.95 samples/sec   Loss 5.0184   LearningRate 0.0140   Epoch: 12   Global Step: 209050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:37,260-Speed 9423.95 samples/sec   Loss 5.0925   LearningRate 0.0140   Epoch: 12   Global Step: 209060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:38,324-Speed 9633.85 samples/sec   Loss 5.0236   LearningRate 0.0140   Epoch: 12   Global Step: 209070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:39,436-Speed 9218.72 samples/sec   Loss 4.9612   LearningRate 0.0140   Epoch: 12   Global Step: 209080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:40,495-Speed 9671.94 samples/sec   Loss 4.9725   LearningRate 0.0140   Epoch: 12   Global Step: 209090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:41,559-Speed 9629.07 samples/sec   Loss 4.9949   LearningRate 0.0140   Epoch: 12   Global Step: 209100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:42,667-Speed 9251.78 samples/sec   Loss 5.0269   LearningRate 0.0140   Epoch: 12   Global Step: 209110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:43,765-Speed 9330.95 samples/sec   Loss 5.0376   LearningRate 0.0140   Epoch: 12   Global Step: 209120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:44,814-Speed 9767.40 samples/sec   Loss 4.9614   LearningRate 0.0140   Epoch: 12   Global Step: 209130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:45,932-Speed 9165.31 samples/sec   Loss 5.0019   LearningRate 0.0139   Epoch: 12   Global Step: 209140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:47,045-Speed 9205.26 samples/sec   Loss 5.0032   LearningRate 0.0139   Epoch: 12   Global Step: 209150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:48,114-Speed 9588.86 samples/sec   Loss 5.1013   LearningRate 0.0139   Epoch: 12   Global Step: 209160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:49,280-Speed 8784.41 samples/sec   Loss 5.0075   LearningRate 0.0139   Epoch: 12   Global Step: 209170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:50,355-Speed 9538.14 samples/sec   Loss 4.9514   LearningRate 0.0139   Epoch: 12   Global Step: 209180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:51,434-Speed 9497.69 samples/sec   Loss 5.0809   LearningRate 0.0139   Epoch: 12   Global Step: 209190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:52,572-Speed 8998.84 samples/sec   Loss 5.0413   LearningRate 0.0139   Epoch: 12   Global Step: 209200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:53,739-Speed 8783.39 samples/sec   Loss 5.0698   LearningRate 0.0139   Epoch: 12   Global Step: 209210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:54,807-Speed 9596.28 samples/sec   Loss 5.0766   LearningRate 0.0139   Epoch: 12   Global Step: 209220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:55,903-Speed 9350.39 samples/sec   Loss 5.0278   LearningRate 0.0139   Epoch: 12   Global Step: 209230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:56,975-Speed 9557.13 samples/sec   Loss 5.0123   LearningRate 0.0139   Epoch: 12   Global Step: 209240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:58,068-Speed 9374.21 samples/sec   Loss 5.0111   LearningRate 0.0139   Epoch: 12   Global Step: 209250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:58:59,139-Speed 9568.60 samples/sec   Loss 5.0197   LearningRate 0.0139   Epoch: 12   Global Step: 209260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:00,207-Speed 9592.64 samples/sec   Loss 5.0431   LearningRate 0.0139   Epoch: 12   Global Step: 209270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:01,297-Speed 9400.83 samples/sec   Loss 5.0943   LearningRate 0.0139   Epoch: 12   Global Step: 209280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:02,385-Speed 9413.06 samples/sec   Loss 5.0120   LearningRate 0.0139   Epoch: 12   Global Step: 209290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:03,498-Speed 9209.23 samples/sec   Loss 5.1079   LearningRate 0.0139   Epoch: 12   Global Step: 209300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:04,581-Speed 9463.18 samples/sec   Loss 5.1039   LearningRate 0.0139   Epoch: 12   Global Step: 209310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:05,652-Speed 9567.41 samples/sec   Loss 4.9798   LearningRate 0.0139   Epoch: 12   Global Step: 209320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:06,725-Speed 9549.43 samples/sec   Loss 5.0115   LearningRate 0.0139   Epoch: 12   Global Step: 209330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:07,839-Speed 9192.96 samples/sec   Loss 5.0482   LearningRate 0.0139   Epoch: 12   Global Step: 209340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:08,925-Speed 9436.40 samples/sec   Loss 5.0337   LearningRate 0.0139   Epoch: 12   Global Step: 209350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:10,088-Speed 8812.74 samples/sec   Loss 5.0557   LearningRate 0.0139   Epoch: 12   Global Step: 209360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:11,165-Speed 9519.40 samples/sec   Loss 4.9770   LearningRate 0.0139   Epoch: 12   Global Step: 209370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:12,250-Speed 9437.70 samples/sec   Loss 4.9848   LearningRate 0.0139   Epoch: 12   Global Step: 209380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:13,397-Speed 8934.53 samples/sec   Loss 5.0400   LearningRate 0.0139   Epoch: 12   Global Step: 209390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:14,483-Speed 9435.62 samples/sec   Loss 5.1109   LearningRate 0.0139   Epoch: 12   Global Step: 209400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:15,596-Speed 9204.05 samples/sec   Loss 5.0474   LearningRate 0.0139   Epoch: 12   Global Step: 209410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:16,704-Speed 9247.32 samples/sec   Loss 5.0655   LearningRate 0.0139   Epoch: 12   Global Step: 209420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:17,809-Speed 9271.98 samples/sec   Loss 5.0401   LearningRate 0.0139   Epoch: 12   Global Step: 209430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:18,893-Speed 9449.16 samples/sec   Loss 5.0086   LearningRate 0.0139   Epoch: 12   Global Step: 209440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:19,991-Speed 9336.31 samples/sec   Loss 5.1058   LearningRate 0.0139   Epoch: 12   Global Step: 209450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:21,106-Speed 9192.68 samples/sec   Loss 4.9710   LearningRate 0.0139   Epoch: 12   Global Step: 209460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:22,167-Speed 9658.30 samples/sec   Loss 4.9598   LearningRate 0.0139   Epoch: 12   Global Step: 209470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:23,295-Speed 9079.51 samples/sec   Loss 5.0305   LearningRate 0.0139   Epoch: 12   Global Step: 209480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:24,393-Speed 9334.69 samples/sec   Loss 5.0916   LearningRate 0.0139   Epoch: 12   Global Step: 209490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:25,507-Speed 9195.95 samples/sec   Loss 5.0907   LearningRate 0.0139   Epoch: 12   Global Step: 209500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:26,587-Speed 9483.94 samples/sec   Loss 4.9739   LearningRate 0.0139   Epoch: 12   Global Step: 209510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:27,725-Speed 9002.52 samples/sec   Loss 5.0556   LearningRate 0.0139   Epoch: 12   Global Step: 209520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:28,836-Speed 9224.05 samples/sec   Loss 4.9825   LearningRate 0.0139   Epoch: 12   Global Step: 209530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:29,976-Speed 8989.08 samples/sec   Loss 5.0735   LearningRate 0.0139   Epoch: 12   Global Step: 209540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:31,066-Speed 9405.26 samples/sec   Loss 5.0003   LearningRate 0.0139   Epoch: 12   Global Step: 209550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:32,207-Speed 8985.10 samples/sec   Loss 5.0848   LearningRate 0.0139   Epoch: 12   Global Step: 209560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:33,318-Speed 9220.44 samples/sec   Loss 5.0831   LearningRate 0.0139   Epoch: 12   Global Step: 209570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:34,374-Speed 9701.13 samples/sec   Loss 5.1063   LearningRate 0.0139   Epoch: 12   Global Step: 209580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:35,457-Speed 9465.80 samples/sec   Loss 5.0106   LearningRate 0.0138   Epoch: 12   Global Step: 209590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:36,507-Speed 9751.98 samples/sec   Loss 5.0146   LearningRate 0.0138   Epoch: 12   Global Step: 209600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:37,600-Speed 9379.44 samples/sec   Loss 5.1950   LearningRate 0.0138   Epoch: 12   Global Step: 209610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:38,670-Speed 9580.03 samples/sec   Loss 5.0550   LearningRate 0.0138   Epoch: 12   Global Step: 209620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:39,771-Speed 9305.73 samples/sec   Loss 5.0320   LearningRate 0.0138   Epoch: 12   Global Step: 209630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:40,864-Speed 9378.40 samples/sec   Loss 5.0361   LearningRate 0.0138   Epoch: 12   Global Step: 209640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:41,953-Speed 9404.13 samples/sec   Loss 4.9733   LearningRate 0.0138   Epoch: 12   Global Step: 209650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:43,047-Speed 9364.27 samples/sec   Loss 5.0006   LearningRate 0.0138   Epoch: 12   Global Step: 209660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:44,118-Speed 9567.23 samples/sec   Loss 5.0224   LearningRate 0.0138   Epoch: 12   Global Step: 209670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:45,217-Speed 9331.42 samples/sec   Loss 5.0834   LearningRate 0.0138   Epoch: 12   Global Step: 209680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:46,302-Speed 9436.69 samples/sec   Loss 5.0293   LearningRate 0.0138   Epoch: 12   Global Step: 209690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:47,394-Speed 9381.96 samples/sec   Loss 4.9463   LearningRate 0.0138   Epoch: 12   Global Step: 209700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:48,471-Speed 9512.94 samples/sec   Loss 5.0350   LearningRate 0.0138   Epoch: 12   Global Step: 209710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:49,647-Speed 8715.16 samples/sec   Loss 4.9772   LearningRate 0.0138   Epoch: 12   Global Step: 209720   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 19:59:50,724-Speed 9534.85 samples/sec   Loss 4.9319   LearningRate 0.0138   Epoch: 12   Global Step: 209730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:51,863-Speed 8995.24 samples/sec   Loss 5.0306   LearningRate 0.0138   Epoch: 12   Global Step: 209740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:53,028-Speed 8788.35 samples/sec   Loss 4.9384   LearningRate 0.0138   Epoch: 12   Global Step: 209750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:54,149-Speed 9139.33 samples/sec   Loss 5.1124   LearningRate 0.0138   Epoch: 12   Global Step: 209760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:55,246-Speed 9342.47 samples/sec   Loss 4.9920   LearningRate 0.0138   Epoch: 12   Global Step: 209770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:56,350-Speed 9287.19 samples/sec   Loss 4.9982   LearningRate 0.0138   Epoch: 12   Global Step: 209780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:57,445-Speed 9358.06 samples/sec   Loss 5.0437   LearningRate 0.0138   Epoch: 12   Global Step: 209790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 19:59:58,529-Speed 9447.98 samples/sec   Loss 5.0683   LearningRate 0.0138   Epoch: 12   Global Step: 209800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 19:59:59,636-Speed 9256.25 samples/sec   Loss 5.0009   LearningRate 0.0138   Epoch: 12   Global Step: 209810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:00:00,712-Speed 9520.82 samples/sec   Loss 5.0009   LearningRate 0.0138   Epoch: 12   Global Step: 209820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:00:01,819-Speed 9256.48 samples/sec   Loss 4.9886   LearningRate 0.0138   Epoch: 12   Global Step: 209830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:00:02,934-Speed 9194.08 samples/sec   Loss 5.0567   LearningRate 0.0138   Epoch: 12   Global Step: 209840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:00:04,006-Speed 9555.66 samples/sec   Loss 5.0533   LearningRate 0.0138   Epoch: 12   Global Step: 209850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:00:05,101-Speed 9362.03 samples/sec   Loss 5.0353   LearningRate 0.0138   Epoch: 12   Global Step: 209860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:00:06,251-Speed 8906.04 samples/sec   Loss 5.0902   LearningRate 0.0138   Epoch: 12   Global Step: 209870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:00:07,343-Speed 9383.93 samples/sec   Loss 4.9892   LearningRate 0.0138   Epoch: 12   Global Step: 209880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:00:08,445-Speed 9296.24 samples/sec   Loss 5.0282   LearningRate 0.0138   Epoch: 12   Global Step: 209890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:00:09,518-Speed 9553.79 samples/sec   Loss 5.0347   LearningRate 0.0138   Epoch: 12   Global Step: 209900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:10,576-Speed 9685.21 samples/sec   Loss 5.1065   LearningRate 0.0138   Epoch: 12   Global Step: 209910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:11,667-Speed 9391.08 samples/sec   Loss 5.0313   LearningRate 0.0138   Epoch: 12   Global Step: 209920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:12,725-Speed 9679.51 samples/sec   Loss 5.1142   LearningRate 0.0138   Epoch: 12   Global Step: 209930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:13,818-Speed 9376.10 samples/sec   Loss 4.9715   LearningRate 0.0138   Epoch: 12   Global Step: 209940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:14,887-Speed 9582.41 samples/sec   Loss 4.9994   LearningRate 0.0138   Epoch: 12   Global Step: 209950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:15,968-Speed 9479.39 samples/sec   Loss 5.0316   LearningRate 0.0138   Epoch: 12   Global Step: 209960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:17,087-Speed 9150.74 samples/sec   Loss 5.0756   LearningRate 0.0138   Epoch: 12   Global Step: 209970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:18,161-Speed 9546.60 samples/sec   Loss 5.0757   LearningRate 0.0138   Epoch: 12   Global Step: 209980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:19,264-Speed 9288.75 samples/sec   Loss 5.0161   LearningRate 0.0138   Epoch: 12   Global Step: 209990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:20,399-Speed 9023.00 samples/sec   Loss 5.0189   LearningRate 0.0138   Epoch: 12   Global Step: 210000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:00:42,359-[lfw][210000]XNorm: 8.414431
Training: 2022-04-11 20:00:42,360-[lfw][210000]Accuracy-Flip: 0.99550+-0.00325
Training: 2022-04-11 20:00:42,360-[lfw][210000]Accuracy-Highest: 0.99683
Training: 2022-04-11 20:01:07,857-[cfp_fp][210000]XNorm: 7.211093
Training: 2022-04-11 20:01:07,858-[cfp_fp][210000]Accuracy-Flip: 0.96514+-0.00872
Training: 2022-04-11 20:01:07,858-[cfp_fp][210000]Accuracy-Highest: 0.96771
Training: 2022-04-11 20:01:29,903-[agedb_30][210000]XNorm: 8.184819
Training: 2022-04-11 20:01:29,904-[agedb_30][210000]Accuracy-Flip: 0.96933+-0.00946
Training: 2022-04-11 20:01:29,904-[agedb_30][210000]Accuracy-Highest: 0.96983
Training: 2022-04-11 20:01:30,970-Speed 145.10 samples/sec   Loss 5.0565   LearningRate 0.0138   Epoch: 12   Global Step: 210010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:32,053-Speed 9464.83 samples/sec   Loss 5.1059   LearningRate 0.0138   Epoch: 12   Global Step: 210020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:33,172-Speed 9154.74 samples/sec   Loss 5.0650   LearningRate 0.0138   Epoch: 12   Global Step: 210030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:34,260-Speed 9413.21 samples/sec   Loss 4.9930   LearningRate 0.0137   Epoch: 12   Global Step: 210040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:35,362-Speed 9296.43 samples/sec   Loss 5.0610   LearningRate 0.0137   Epoch: 12   Global Step: 210050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:36,452-Speed 9404.35 samples/sec   Loss 5.0318   LearningRate 0.0137   Epoch: 12   Global Step: 210060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:37,492-Speed 9843.77 samples/sec   Loss 5.0023   LearningRate 0.0137   Epoch: 12   Global Step: 210070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:38,620-Speed 9082.40 samples/sec   Loss 5.0394   LearningRate 0.0137   Epoch: 12   Global Step: 210080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:39,708-Speed 9416.55 samples/sec   Loss 5.1297   LearningRate 0.0137   Epoch: 12   Global Step: 210090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:40,801-Speed 9375.94 samples/sec   Loss 4.9661   LearningRate 0.0137   Epoch: 12   Global Step: 210100   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:01:41,874-Speed 9550.68 samples/sec   Loss 5.0267   LearningRate 0.0137   Epoch: 12   Global Step: 210110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:42,928-Speed 9725.26 samples/sec   Loss 5.0796   LearningRate 0.0137   Epoch: 12   Global Step: 210120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:43,976-Speed 9773.51 samples/sec   Loss 4.9941   LearningRate 0.0137   Epoch: 12   Global Step: 210130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:45,070-Speed 9362.90 samples/sec   Loss 4.9320   LearningRate 0.0137   Epoch: 12   Global Step: 210140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:46,152-Speed 9472.66 samples/sec   Loss 5.0862   LearningRate 0.0137   Epoch: 12   Global Step: 210150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:47,222-Speed 9571.63 samples/sec   Loss 4.9585   LearningRate 0.0137   Epoch: 12   Global Step: 210160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:48,287-Speed 9621.99 samples/sec   Loss 5.1075   LearningRate 0.0137   Epoch: 12   Global Step: 210170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:49,374-Speed 9439.13 samples/sec   Loss 5.0620   LearningRate 0.0137   Epoch: 12   Global Step: 210180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:01:50,440-Speed 9613.75 samples/sec   Loss 5.0382   LearningRate 0.0137   Epoch: 12   Global Step: 210190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:01:51,544-Speed 9280.68 samples/sec   Loss 5.0192   LearningRate 0.0137   Epoch: 12   Global Step: 210200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:01:52,634-Speed 9403.40 samples/sec   Loss 4.9632   LearningRate 0.0137   Epoch: 12   Global Step: 210210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:01:53,730-Speed 9346.66 samples/sec   Loss 4.9984   LearningRate 0.0137   Epoch: 12   Global Step: 210220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:01:54,776-Speed 9789.62 samples/sec   Loss 5.0643   LearningRate 0.0137   Epoch: 12   Global Step: 210230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:01:55,868-Speed 9387.49 samples/sec   Loss 5.0158   LearningRate 0.0137   Epoch: 12   Global Step: 210240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:01:56,983-Speed 9183.66 samples/sec   Loss 5.0960   LearningRate 0.0137   Epoch: 12   Global Step: 210250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:01:58,060-Speed 9519.46 samples/sec   Loss 5.0514   LearningRate 0.0137   Epoch: 12   Global Step: 210260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:01:59,150-Speed 9399.00 samples/sec   Loss 5.0710   LearningRate 0.0137   Epoch: 12   Global Step: 210270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:02:00,242-Speed 9382.63 samples/sec   Loss 5.0090   LearningRate 0.0137   Epoch: 12   Global Step: 210280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:02:01,332-Speed 9395.30 samples/sec   Loss 5.0133   LearningRate 0.0137   Epoch: 12   Global Step: 210290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:02,458-Speed 9113.65 samples/sec   Loss 5.0849   LearningRate 0.0137   Epoch: 12   Global Step: 210300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:03,554-Speed 9347.33 samples/sec   Loss 5.1371   LearningRate 0.0137   Epoch: 12   Global Step: 210310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:04,627-Speed 9545.91 samples/sec   Loss 5.0780   LearningRate 0.0137   Epoch: 12   Global Step: 210320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:05,689-Speed 9651.84 samples/sec   Loss 5.0748   LearningRate 0.0137   Epoch: 12   Global Step: 210330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:06,767-Speed 9498.24 samples/sec   Loss 5.0797   LearningRate 0.0137   Epoch: 12   Global Step: 210340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:07,881-Speed 9201.29 samples/sec   Loss 4.9885   LearningRate 0.0137   Epoch: 12   Global Step: 210350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:08,967-Speed 9439.36 samples/sec   Loss 4.9992   LearningRate 0.0137   Epoch: 12   Global Step: 210360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:10,074-Speed 9251.04 samples/sec   Loss 4.9653   LearningRate 0.0137   Epoch: 12   Global Step: 210370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:11,102-Speed 9965.67 samples/sec   Loss 5.0775   LearningRate 0.0137   Epoch: 12   Global Step: 210380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:12,187-Speed 9446.16 samples/sec   Loss 5.0575   LearningRate 0.0137   Epoch: 12   Global Step: 210390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:13,286-Speed 9323.85 samples/sec   Loss 4.9191   LearningRate 0.0137   Epoch: 12   Global Step: 210400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:14,380-Speed 9365.02 samples/sec   Loss 4.9609   LearningRate 0.0137   Epoch: 12   Global Step: 210410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:15,494-Speed 9197.51 samples/sec   Loss 5.0368   LearningRate 0.0137   Epoch: 12   Global Step: 210420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:16,569-Speed 9530.15 samples/sec   Loss 5.0660   LearningRate 0.0137   Epoch: 12   Global Step: 210430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:17,641-Speed 9561.00 samples/sec   Loss 4.9920   LearningRate 0.0137   Epoch: 12   Global Step: 210440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:18,767-Speed 9099.15 samples/sec   Loss 4.9989   LearningRate 0.0137   Epoch: 12   Global Step: 210450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:19,844-Speed 9518.16 samples/sec   Loss 4.9755   LearningRate 0.0137   Epoch: 12   Global Step: 210460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:20,938-Speed 9367.61 samples/sec   Loss 5.1656   LearningRate 0.0137   Epoch: 12   Global Step: 210470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:22,041-Speed 9283.49 samples/sec   Loss 5.0578   LearningRate 0.0137   Epoch: 12   Global Step: 210480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:23,123-Speed 9473.52 samples/sec   Loss 5.0568   LearningRate 0.0136   Epoch: 12   Global Step: 210490   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:02:24,217-Speed 9366.09 samples/sec   Loss 4.9984   LearningRate 0.0136   Epoch: 12   Global Step: 210500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:25,287-Speed 9575.13 samples/sec   Loss 5.0916   LearningRate 0.0136   Epoch: 12   Global Step: 210510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:26,422-Speed 9024.76 samples/sec   Loss 5.0179   LearningRate 0.0136   Epoch: 12   Global Step: 210520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:27,555-Speed 9049.24 samples/sec   Loss 5.1198   LearningRate 0.0136   Epoch: 12   Global Step: 210530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:28,639-Speed 9457.45 samples/sec   Loss 5.0574   LearningRate 0.0136   Epoch: 12   Global Step: 210540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:29,722-Speed 9460.37 samples/sec   Loss 5.0584   LearningRate 0.0136   Epoch: 12   Global Step: 210550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:30,824-Speed 9300.30 samples/sec   Loss 5.0074   LearningRate 0.0136   Epoch: 12   Global Step: 210560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:31,923-Speed 9325.57 samples/sec   Loss 5.1046   LearningRate 0.0136   Epoch: 12   Global Step: 210570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:33,006-Speed 9455.17 samples/sec   Loss 5.1024   LearningRate 0.0136   Epoch: 12   Global Step: 210580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:34,096-Speed 9403.21 samples/sec   Loss 5.0202   LearningRate 0.0136   Epoch: 12   Global Step: 210590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:35,186-Speed 9404.24 samples/sec   Loss 5.1317   LearningRate 0.0136   Epoch: 12   Global Step: 210600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:36,323-Speed 9006.65 samples/sec   Loss 5.0230   LearningRate 0.0136   Epoch: 12   Global Step: 210610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:37,444-Speed 9140.27 samples/sec   Loss 4.8815   LearningRate 0.0136   Epoch: 12   Global Step: 210620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:38,561-Speed 9172.66 samples/sec   Loss 5.0491   LearningRate 0.0136   Epoch: 12   Global Step: 210630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:39,671-Speed 9235.37 samples/sec   Loss 5.0117   LearningRate 0.0136   Epoch: 12   Global Step: 210640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:40,772-Speed 9307.40 samples/sec   Loss 4.9908   LearningRate 0.0136   Epoch: 12   Global Step: 210650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:41,865-Speed 9372.79 samples/sec   Loss 5.1144   LearningRate 0.0136   Epoch: 12   Global Step: 210660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:42,955-Speed 9397.97 samples/sec   Loss 5.0659   LearningRate 0.0136   Epoch: 12   Global Step: 210670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:44,084-Speed 9079.43 samples/sec   Loss 5.0219   LearningRate 0.0136   Epoch: 12   Global Step: 210680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:45,169-Speed 9443.27 samples/sec   Loss 5.0709   LearningRate 0.0136   Epoch: 12   Global Step: 210690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:46,302-Speed 9043.28 samples/sec   Loss 5.0516   LearningRate 0.0136   Epoch: 12   Global Step: 210700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:47,405-Speed 9292.11 samples/sec   Loss 4.9913   LearningRate 0.0136   Epoch: 12   Global Step: 210710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:48,502-Speed 9336.19 samples/sec   Loss 5.0380   LearningRate 0.0136   Epoch: 12   Global Step: 210720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:49,586-Speed 9456.95 samples/sec   Loss 5.2134   LearningRate 0.0136   Epoch: 12   Global Step: 210730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:50,685-Speed 9323.89 samples/sec   Loss 5.0497   LearningRate 0.0136   Epoch: 12   Global Step: 210740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:51,830-Speed 8947.36 samples/sec   Loss 5.0263   LearningRate 0.0136   Epoch: 12   Global Step: 210750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:52,895-Speed 9616.27 samples/sec   Loss 5.0100   LearningRate 0.0136   Epoch: 12   Global Step: 210760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:54,010-Speed 9192.96 samples/sec   Loss 5.0092   LearningRate 0.0136   Epoch: 12   Global Step: 210770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:55,189-Speed 8687.78 samples/sec   Loss 5.0085   LearningRate 0.0136   Epoch: 12   Global Step: 210780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:56,292-Speed 9290.68 samples/sec   Loss 5.0162   LearningRate 0.0136   Epoch: 12   Global Step: 210790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:57,410-Speed 9167.25 samples/sec   Loss 5.0258   LearningRate 0.0136   Epoch: 12   Global Step: 210800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:58,562-Speed 8889.96 samples/sec   Loss 5.0807   LearningRate 0.0136   Epoch: 12   Global Step: 210810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:02:59,643-Speed 9484.54 samples/sec   Loss 5.1093   LearningRate 0.0136   Epoch: 12   Global Step: 210820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:00,761-Speed 9168.18 samples/sec   Loss 5.1155   LearningRate 0.0136   Epoch: 12   Global Step: 210830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:01,843-Speed 9471.06 samples/sec   Loss 5.0304   LearningRate 0.0136   Epoch: 12   Global Step: 210840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:02,926-Speed 9454.92 samples/sec   Loss 5.1264   LearningRate 0.0136   Epoch: 12   Global Step: 210850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:03,998-Speed 9557.09 samples/sec   Loss 5.0507   LearningRate 0.0136   Epoch: 12   Global Step: 210860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:05,069-Speed 9573.40 samples/sec   Loss 5.0907   LearningRate 0.0136   Epoch: 12   Global Step: 210870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:06,218-Speed 8914.60 samples/sec   Loss 5.0493   LearningRate 0.0136   Epoch: 12   Global Step: 210880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:07,321-Speed 9294.87 samples/sec   Loss 5.0154   LearningRate 0.0136   Epoch: 12   Global Step: 210890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:08,416-Speed 9351.39 samples/sec   Loss 5.1540   LearningRate 0.0136   Epoch: 12   Global Step: 210900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:09,476-Speed 9665.03 samples/sec   Loss 5.0599   LearningRate 0.0136   Epoch: 12   Global Step: 210910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:10,534-Speed 9691.85 samples/sec   Loss 5.1686   LearningRate 0.0136   Epoch: 12   Global Step: 210920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:11,636-Speed 9290.62 samples/sec   Loss 5.1761   LearningRate 0.0136   Epoch: 12   Global Step: 210930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:12,787-Speed 8913.43 samples/sec   Loss 4.9808   LearningRate 0.0135   Epoch: 12   Global Step: 210940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:13,894-Speed 9257.27 samples/sec   Loss 4.9654   LearningRate 0.0135   Epoch: 12   Global Step: 210950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:14,943-Speed 9759.12 samples/sec   Loss 5.0693   LearningRate 0.0135   Epoch: 12   Global Step: 210960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:16,055-Speed 9214.41 samples/sec   Loss 4.9998   LearningRate 0.0135   Epoch: 12   Global Step: 210970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:17,199-Speed 8955.78 samples/sec   Loss 5.0616   LearningRate 0.0135   Epoch: 12   Global Step: 210980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:18,257-Speed 9691.54 samples/sec   Loss 5.1241   LearningRate 0.0135   Epoch: 12   Global Step: 210990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:19,367-Speed 9232.27 samples/sec   Loss 5.0890   LearningRate 0.0135   Epoch: 12   Global Step: 211000   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:03:20,426-Speed 9682.36 samples/sec   Loss 5.0628   LearningRate 0.0135   Epoch: 12   Global Step: 211010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:21,520-Speed 9365.82 samples/sec   Loss 5.0247   LearningRate 0.0135   Epoch: 12   Global Step: 211020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:22,642-Speed 9134.82 samples/sec   Loss 5.0258   LearningRate 0.0135   Epoch: 12   Global Step: 211030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:23,744-Speed 9296.07 samples/sec   Loss 4.9716   LearningRate 0.0135   Epoch: 12   Global Step: 211040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:24,845-Speed 9304.53 samples/sec   Loss 5.0755   LearningRate 0.0135   Epoch: 12   Global Step: 211050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:25,911-Speed 9608.34 samples/sec   Loss 5.0402   LearningRate 0.0135   Epoch: 12   Global Step: 211060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:27,003-Speed 9381.16 samples/sec   Loss 4.9891   LearningRate 0.0135   Epoch: 12   Global Step: 211070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:28,131-Speed 9083.93 samples/sec   Loss 5.0831   LearningRate 0.0135   Epoch: 12   Global Step: 211080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:29,248-Speed 9181.58 samples/sec   Loss 5.0726   LearningRate 0.0135   Epoch: 12   Global Step: 211090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:30,315-Speed 9595.71 samples/sec   Loss 5.1020   LearningRate 0.0135   Epoch: 12   Global Step: 211100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:31,423-Speed 9255.00 samples/sec   Loss 5.0408   LearningRate 0.0135   Epoch: 12   Global Step: 211110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:32,557-Speed 9031.49 samples/sec   Loss 5.0353   LearningRate 0.0135   Epoch: 12   Global Step: 211120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:33,636-Speed 9499.02 samples/sec   Loss 4.9740   LearningRate 0.0135   Epoch: 12   Global Step: 211130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:34,710-Speed 9540.09 samples/sec   Loss 5.0595   LearningRate 0.0135   Epoch: 12   Global Step: 211140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:35,874-Speed 8801.17 samples/sec   Loss 5.0504   LearningRate 0.0135   Epoch: 12   Global Step: 211150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:36,973-Speed 9324.49 samples/sec   Loss 4.9515   LearningRate 0.0135   Epoch: 12   Global Step: 211160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:38,038-Speed 9626.03 samples/sec   Loss 5.0107   LearningRate 0.0135   Epoch: 12   Global Step: 211170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:39,134-Speed 9349.86 samples/sec   Loss 4.9975   LearningRate 0.0135   Epoch: 12   Global Step: 211180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:40,220-Speed 9428.32 samples/sec   Loss 5.0131   LearningRate 0.0135   Epoch: 12   Global Step: 211190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:41,331-Speed 9226.76 samples/sec   Loss 5.0239   LearningRate 0.0135   Epoch: 12   Global Step: 211200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:42,430-Speed 9319.19 samples/sec   Loss 5.0525   LearningRate 0.0135   Epoch: 12   Global Step: 211210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:43,539-Speed 9239.32 samples/sec   Loss 5.0875   LearningRate 0.0135   Epoch: 12   Global Step: 211220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:03:44,632-Speed 9375.22 samples/sec   Loss 4.9834   LearningRate 0.0135   Epoch: 12   Global Step: 211230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:45,747-Speed 9192.46 samples/sec   Loss 5.0040   LearningRate 0.0135   Epoch: 12   Global Step: 211240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:46,853-Speed 9261.92 samples/sec   Loss 5.0393   LearningRate 0.0135   Epoch: 12   Global Step: 211250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:47,924-Speed 9568.39 samples/sec   Loss 5.0743   LearningRate 0.0135   Epoch: 12   Global Step: 211260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:49,066-Speed 8978.25 samples/sec   Loss 4.9549   LearningRate 0.0135   Epoch: 12   Global Step: 211270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:50,167-Speed 9301.14 samples/sec   Loss 5.1097   LearningRate 0.0135   Epoch: 12   Global Step: 211280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:51,376-Speed 8479.24 samples/sec   Loss 4.9839   LearningRate 0.0135   Epoch: 12   Global Step: 211290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:52,436-Speed 9663.81 samples/sec   Loss 5.0400   LearningRate 0.0135   Epoch: 12   Global Step: 211300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:53,522-Speed 9432.14 samples/sec   Loss 5.0871   LearningRate 0.0135   Epoch: 12   Global Step: 211310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:54,584-Speed 9650.88 samples/sec   Loss 5.0563   LearningRate 0.0135   Epoch: 12   Global Step: 211320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:55,660-Speed 9522.05 samples/sec   Loss 5.0063   LearningRate 0.0135   Epoch: 12   Global Step: 211330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:56,846-Speed 8637.08 samples/sec   Loss 5.0475   LearningRate 0.0135   Epoch: 12   Global Step: 211340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:57,915-Speed 9582.76 samples/sec   Loss 5.0196   LearningRate 0.0135   Epoch: 12   Global Step: 211350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:03:58,998-Speed 9461.88 samples/sec   Loss 5.0328   LearningRate 0.0135   Epoch: 12   Global Step: 211360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:00,105-Speed 9261.64 samples/sec   Loss 5.0194   LearningRate 0.0135   Epoch: 12   Global Step: 211370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:01,200-Speed 9352.92 samples/sec   Loss 4.9826   LearningRate 0.0135   Epoch: 12   Global Step: 211380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:02,295-Speed 9356.19 samples/sec   Loss 5.0149   LearningRate 0.0135   Epoch: 12   Global Step: 211390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:03,410-Speed 9192.58 samples/sec   Loss 5.1974   LearningRate 0.0134   Epoch: 12   Global Step: 211400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:04,512-Speed 9297.51 samples/sec   Loss 5.0115   LearningRate 0.0134   Epoch: 12   Global Step: 211410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:05,650-Speed 9001.55 samples/sec   Loss 5.0797   LearningRate 0.0134   Epoch: 12   Global Step: 211420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:06,715-Speed 9616.39 samples/sec   Loss 5.0663   LearningRate 0.0134   Epoch: 12   Global Step: 211430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:07,835-Speed 9157.04 samples/sec   Loss 5.0435   LearningRate 0.0134   Epoch: 12   Global Step: 211440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:08,932-Speed 9337.41 samples/sec   Loss 5.0714   LearningRate 0.0134   Epoch: 12   Global Step: 211450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:10,033-Speed 9308.23 samples/sec   Loss 5.1009   LearningRate 0.0134   Epoch: 12   Global Step: 211460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:11,093-Speed 9669.40 samples/sec   Loss 5.0852   LearningRate 0.0134   Epoch: 12   Global Step: 211470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:04:12,158-Speed 9621.77 samples/sec   Loss 5.0691   LearningRate 0.0134   Epoch: 12   Global Step: 211480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:04:13,233-Speed 9528.60 samples/sec   Loss 4.9898   LearningRate 0.0134   Epoch: 12   Global Step: 211490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:04:14,329-Speed 9354.42 samples/sec   Loss 5.0654   LearningRate 0.0134   Epoch: 12   Global Step: 211500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:04:15,428-Speed 9319.35 samples/sec   Loss 5.1131   LearningRate 0.0134   Epoch: 12   Global Step: 211510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:04:16,514-Speed 9436.40 samples/sec   Loss 5.1274   LearningRate 0.0134   Epoch: 12   Global Step: 211520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:04:17,575-Speed 9648.60 samples/sec   Loss 5.0835   LearningRate 0.0134   Epoch: 12   Global Step: 211530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:04:18,671-Speed 9355.03 samples/sec   Loss 5.0842   LearningRate 0.0134   Epoch: 12   Global Step: 211540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:04:19,779-Speed 9251.56 samples/sec   Loss 4.9933   LearningRate 0.0134   Epoch: 12   Global Step: 211550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:04:20,845-Speed 9612.55 samples/sec   Loss 5.0497   LearningRate 0.0134   Epoch: 12   Global Step: 211560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:04:21,892-Speed 9781.18 samples/sec   Loss 4.9987   LearningRate 0.0134   Epoch: 12   Global Step: 211570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:22,978-Speed 9442.76 samples/sec   Loss 4.9829   LearningRate 0.0134   Epoch: 12   Global Step: 211580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:24,085-Speed 9253.80 samples/sec   Loss 4.9884   LearningRate 0.0134   Epoch: 12   Global Step: 211590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:25,182-Speed 9345.26 samples/sec   Loss 4.9924   LearningRate 0.0134   Epoch: 12   Global Step: 211600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:26,267-Speed 9442.51 samples/sec   Loss 5.0135   LearningRate 0.0134   Epoch: 12   Global Step: 211610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:27,389-Speed 9130.13 samples/sec   Loss 4.9610   LearningRate 0.0134   Epoch: 12   Global Step: 211620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:28,456-Speed 9601.58 samples/sec   Loss 5.0528   LearningRate 0.0134   Epoch: 12   Global Step: 211630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:29,587-Speed 9067.36 samples/sec   Loss 5.0524   LearningRate 0.0134   Epoch: 12   Global Step: 211640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:30,678-Speed 9388.12 samples/sec   Loss 5.0857   LearningRate 0.0134   Epoch: 12   Global Step: 211650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:31,726-Speed 9777.58 samples/sec   Loss 5.0325   LearningRate 0.0134   Epoch: 12   Global Step: 211660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:32,792-Speed 9613.37 samples/sec   Loss 4.9822   LearningRate 0.0134   Epoch: 12   Global Step: 211670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:33,912-Speed 9144.94 samples/sec   Loss 5.0939   LearningRate 0.0134   Epoch: 12   Global Step: 211680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:34,984-Speed 9555.97 samples/sec   Loss 5.0540   LearningRate 0.0134   Epoch: 12   Global Step: 211690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:36,070-Speed 9437.14 samples/sec   Loss 5.0867   LearningRate 0.0134   Epoch: 12   Global Step: 211700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:37,206-Speed 9022.23 samples/sec   Loss 5.0149   LearningRate 0.0134   Epoch: 12   Global Step: 211710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:38,330-Speed 9114.73 samples/sec   Loss 5.0725   LearningRate 0.0134   Epoch: 12   Global Step: 211720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:39,426-Speed 9346.41 samples/sec   Loss 5.0889   LearningRate 0.0134   Epoch: 12   Global Step: 211730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:40,490-Speed 9628.27 samples/sec   Loss 4.9289   LearningRate 0.0134   Epoch: 12   Global Step: 211740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:41,578-Speed 9419.38 samples/sec   Loss 5.1909   LearningRate 0.0134   Epoch: 12   Global Step: 211750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:42,705-Speed 9096.57 samples/sec   Loss 5.0851   LearningRate 0.0134   Epoch: 12   Global Step: 211760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:43,791-Speed 9428.96 samples/sec   Loss 5.0875   LearningRate 0.0134   Epoch: 12   Global Step: 211770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:44,902-Speed 9225.66 samples/sec   Loss 5.0334   LearningRate 0.0134   Epoch: 12   Global Step: 211780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:45,980-Speed 9498.97 samples/sec   Loss 4.9646   LearningRate 0.0134   Epoch: 12   Global Step: 211790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:47,070-Speed 9400.73 samples/sec   Loss 5.0383   LearningRate 0.0134   Epoch: 12   Global Step: 211800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:48,152-Speed 9472.92 samples/sec   Loss 5.0185   LearningRate 0.0134   Epoch: 12   Global Step: 211810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:49,266-Speed 9199.80 samples/sec   Loss 5.0458   LearningRate 0.0134   Epoch: 12   Global Step: 211820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:50,345-Speed 9494.44 samples/sec   Loss 5.0762   LearningRate 0.0134   Epoch: 12   Global Step: 211830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:51,522-Speed 8705.20 samples/sec   Loss 4.9824   LearningRate 0.0134   Epoch: 12   Global Step: 211840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:52,628-Speed 9270.87 samples/sec   Loss 5.1005   LearningRate 0.0134   Epoch: 12   Global Step: 211850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:53,724-Speed 9347.61 samples/sec   Loss 4.9950   LearningRate 0.0133   Epoch: 12   Global Step: 211860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:54,806-Speed 9473.47 samples/sec   Loss 5.0555   LearningRate 0.0133   Epoch: 12   Global Step: 211870   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:04:55,910-Speed 9275.97 samples/sec   Loss 5.0103   LearningRate 0.0133   Epoch: 12   Global Step: 211880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:57,013-Speed 9291.14 samples/sec   Loss 5.0694   LearningRate 0.0133   Epoch: 12   Global Step: 211890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:58,066-Speed 9727.40 samples/sec   Loss 5.0748   LearningRate 0.0133   Epoch: 12   Global Step: 211900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:04:59,172-Speed 9264.67 samples/sec   Loss 5.0275   LearningRate 0.0133   Epoch: 12   Global Step: 211910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:05:00,280-Speed 9246.85 samples/sec   Loss 5.0581   LearningRate 0.0133   Epoch: 12   Global Step: 211920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:05:01,391-Speed 9225.36 samples/sec   Loss 5.0994   LearningRate 0.0133   Epoch: 12   Global Step: 211930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:05:02,476-Speed 9441.65 samples/sec   Loss 5.0751   LearningRate 0.0133   Epoch: 12   Global Step: 211940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:05:03,579-Speed 9287.42 samples/sec   Loss 4.9977   LearningRate 0.0133   Epoch: 12   Global Step: 211950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:05:04,682-Speed 9293.46 samples/sec   Loss 5.0520   LearningRate 0.0133   Epoch: 12   Global Step: 211960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:05:05,756-Speed 9538.72 samples/sec   Loss 5.0468   LearningRate 0.0133   Epoch: 12   Global Step: 211970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:05:06,862-Speed 9267.84 samples/sec   Loss 5.0010   LearningRate 0.0133   Epoch: 12   Global Step: 211980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:05:07,968-Speed 9265.89 samples/sec   Loss 5.0724   LearningRate 0.0133   Epoch: 12   Global Step: 211990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:05:09,040-Speed 9555.91 samples/sec   Loss 5.1244   LearningRate 0.0133   Epoch: 12   Global Step: 212000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:05:30,970-[lfw][212000]XNorm: 8.396976
Training: 2022-04-11 20:05:30,971-[lfw][212000]Accuracy-Flip: 0.99600+-0.00367
Training: 2022-04-11 20:05:30,971-[lfw][212000]Accuracy-Highest: 0.99683
Training: 2022-04-11 20:05:56,314-[cfp_fp][212000]XNorm: 7.223426
Training: 2022-04-11 20:05:56,314-[cfp_fp][212000]Accuracy-Flip: 0.96757+-0.00876
Training: 2022-04-11 20:05:56,315-[cfp_fp][212000]Accuracy-Highest: 0.96771
Training: 2022-04-11 20:06:18,207-[agedb_30][212000]XNorm: 8.128869
Training: 2022-04-11 20:06:18,208-[agedb_30][212000]Accuracy-Flip: 0.96933+-0.00867
Training: 2022-04-11 20:06:18,208-[agedb_30][212000]Accuracy-Highest: 0.96983
Training: 2022-04-11 20:06:19,305-Speed 145.73 samples/sec   Loss 5.0870   LearningRate 0.0133   Epoch: 12   Global Step: 212010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:20,406-Speed 9306.64 samples/sec   Loss 5.0613   LearningRate 0.0133   Epoch: 12   Global Step: 212020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:21,496-Speed 9401.55 samples/sec   Loss 4.9038   LearningRate 0.0133   Epoch: 12   Global Step: 212030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:22,593-Speed 9344.77 samples/sec   Loss 5.1054   LearningRate 0.0133   Epoch: 12   Global Step: 212040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:23,688-Speed 9355.45 samples/sec   Loss 5.1498   LearningRate 0.0133   Epoch: 12   Global Step: 212050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:24,782-Speed 9364.61 samples/sec   Loss 5.0281   LearningRate 0.0133   Epoch: 12   Global Step: 212060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:25,954-Speed 8749.36 samples/sec   Loss 5.1288   LearningRate 0.0133   Epoch: 12   Global Step: 212070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:27,037-Speed 9462.57 samples/sec   Loss 5.1157   LearningRate 0.0133   Epoch: 12   Global Step: 212080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:28,136-Speed 9323.45 samples/sec   Loss 5.0484   LearningRate 0.0133   Epoch: 12   Global Step: 212090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:29,193-Speed 9692.85 samples/sec   Loss 5.0375   LearningRate 0.0133   Epoch: 12   Global Step: 212100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:30,339-Speed 8941.86 samples/sec   Loss 5.1613   LearningRate 0.0133   Epoch: 12   Global Step: 212110   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:31,409-Speed 9571.77 samples/sec   Loss 5.0656   LearningRate 0.0133   Epoch: 12   Global Step: 212120   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:32,523-Speed 9204.00 samples/sec   Loss 5.1488   LearningRate 0.0133   Epoch: 12   Global Step: 212130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:33,623-Speed 9311.56 samples/sec   Loss 5.1096   LearningRate 0.0133   Epoch: 12   Global Step: 212140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:34,698-Speed 9528.66 samples/sec   Loss 5.0391   LearningRate 0.0133   Epoch: 12   Global Step: 212150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:35,747-Speed 9771.43 samples/sec   Loss 5.0251   LearningRate 0.0133   Epoch: 12   Global Step: 212160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:36,829-Speed 9463.27 samples/sec   Loss 5.0723   LearningRate 0.0133   Epoch: 12   Global Step: 212170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:37,954-Speed 9114.45 samples/sec   Loss 5.0089   LearningRate 0.0133   Epoch: 12   Global Step: 212180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:39,059-Speed 9275.27 samples/sec   Loss 5.1068   LearningRate 0.0133   Epoch: 12   Global Step: 212190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:40,166-Speed 9251.35 samples/sec   Loss 4.9822   LearningRate 0.0133   Epoch: 12   Global Step: 212200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:41,315-Speed 8923.31 samples/sec   Loss 5.0943   LearningRate 0.0133   Epoch: 12   Global Step: 212210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:42,400-Speed 9438.25 samples/sec   Loss 5.0897   LearningRate 0.0133   Epoch: 12   Global Step: 212220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:43,514-Speed 9197.23 samples/sec   Loss 5.0810   LearningRate 0.0133   Epoch: 12   Global Step: 212230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:44,630-Speed 9184.39 samples/sec   Loss 5.0735   LearningRate 0.0133   Epoch: 12   Global Step: 212240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:45,738-Speed 9247.14 samples/sec   Loss 5.0521   LearningRate 0.0133   Epoch: 12   Global Step: 212250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:46,843-Speed 9277.41 samples/sec   Loss 4.9621   LearningRate 0.0133   Epoch: 12   Global Step: 212260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:47,911-Speed 9596.24 samples/sec   Loss 5.0449   LearningRate 0.0133   Epoch: 12   Global Step: 212270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:48,967-Speed 9704.25 samples/sec   Loss 4.9972   LearningRate 0.0133   Epoch: 12   Global Step: 212280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:50,045-Speed 9501.80 samples/sec   Loss 4.9712   LearningRate 0.0133   Epoch: 12   Global Step: 212290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:51,126-Speed 9476.91 samples/sec   Loss 4.9907   LearningRate 0.0133   Epoch: 12   Global Step: 212300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:52,167-Speed 9840.59 samples/sec   Loss 5.0687   LearningRate 0.0132   Epoch: 12   Global Step: 212310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:53,249-Speed 9475.40 samples/sec   Loss 5.0729   LearningRate 0.0132   Epoch: 12   Global Step: 212320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:54,348-Speed 9323.44 samples/sec   Loss 5.0131   LearningRate 0.0132   Epoch: 12   Global Step: 212330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:55,452-Speed 9288.45 samples/sec   Loss 5.0927   LearningRate 0.0132   Epoch: 12   Global Step: 212340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:56,599-Speed 8932.35 samples/sec   Loss 5.0218   LearningRate 0.0132   Epoch: 12   Global Step: 212350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:06:57,694-Speed 9363.53 samples/sec   Loss 5.0417   LearningRate 0.0132   Epoch: 12   Global Step: 212360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:58,765-Speed 9566.00 samples/sec   Loss 5.0693   LearningRate 0.0132   Epoch: 12   Global Step: 212370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:06:59,834-Speed 9577.38 samples/sec   Loss 5.0292   LearningRate 0.0132   Epoch: 12   Global Step: 212380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:00,905-Speed 9568.32 samples/sec   Loss 5.0478   LearningRate 0.0132   Epoch: 12   Global Step: 212390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:01,983-Speed 9503.17 samples/sec   Loss 5.1294   LearningRate 0.0132   Epoch: 12   Global Step: 212400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:03,137-Speed 8881.94 samples/sec   Loss 5.0759   LearningRate 0.0132   Epoch: 12   Global Step: 212410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:04,249-Speed 9214.80 samples/sec   Loss 4.9399   LearningRate 0.0132   Epoch: 12   Global Step: 212420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:05,326-Speed 9516.66 samples/sec   Loss 5.1002   LearningRate 0.0132   Epoch: 12   Global Step: 212430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:06,390-Speed 9639.71 samples/sec   Loss 5.0356   LearningRate 0.0132   Epoch: 12   Global Step: 212440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:07,506-Speed 9180.35 samples/sec   Loss 5.0779   LearningRate 0.0132   Epoch: 12   Global Step: 212450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:08,586-Speed 9487.99 samples/sec   Loss 5.0154   LearningRate 0.0132   Epoch: 12   Global Step: 212460   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:07:09,639-Speed 9727.13 samples/sec   Loss 5.0641   LearningRate 0.0132   Epoch: 12   Global Step: 212470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:10,747-Speed 9249.98 samples/sec   Loss 5.0491   LearningRate 0.0132   Epoch: 12   Global Step: 212480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:11,873-Speed 9097.44 samples/sec   Loss 5.0391   LearningRate 0.0132   Epoch: 12   Global Step: 212490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:12,954-Speed 9478.84 samples/sec   Loss 5.0512   LearningRate 0.0132   Epoch: 12   Global Step: 212500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:14,074-Speed 9151.34 samples/sec   Loss 5.0255   LearningRate 0.0132   Epoch: 12   Global Step: 212510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:15,160-Speed 9435.38 samples/sec   Loss 5.0954   LearningRate 0.0132   Epoch: 12   Global Step: 212520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:16,251-Speed 9388.03 samples/sec   Loss 5.0164   LearningRate 0.0132   Epoch: 12   Global Step: 212530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:17,372-Speed 9139.20 samples/sec   Loss 5.0707   LearningRate 0.0132   Epoch: 12   Global Step: 212540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:18,440-Speed 9599.39 samples/sec   Loss 5.1742   LearningRate 0.0132   Epoch: 12   Global Step: 212550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:19,515-Speed 9529.99 samples/sec   Loss 5.0610   LearningRate 0.0132   Epoch: 12   Global Step: 212560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:20,558-Speed 9821.03 samples/sec   Loss 5.1344   LearningRate 0.0132   Epoch: 12   Global Step: 212570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:21,600-Speed 9837.69 samples/sec   Loss 5.0646   LearningRate 0.0132   Epoch: 12   Global Step: 212580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:22,666-Speed 9607.24 samples/sec   Loss 5.0762   LearningRate 0.0132   Epoch: 12   Global Step: 212590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:23,780-Speed 9198.07 samples/sec   Loss 5.0925   LearningRate 0.0132   Epoch: 12   Global Step: 212600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:24,848-Speed 9605.43 samples/sec   Loss 5.0434   LearningRate 0.0132   Epoch: 12   Global Step: 212610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:07:25,913-Speed 9621.70 samples/sec   Loss 4.9966   LearningRate 0.0132   Epoch: 12   Global Step: 212620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:07:26,979-Speed 9606.83 samples/sec   Loss 4.9697   LearningRate 0.0132   Epoch: 12   Global Step: 212630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:07:28,095-Speed 9185.70 samples/sec   Loss 4.9188   LearningRate 0.0132   Epoch: 12   Global Step: 212640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:07:29,262-Speed 8777.67 samples/sec   Loss 5.0349   LearningRate 0.0132   Epoch: 12   Global Step: 212650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:07:30,349-Speed 9430.71 samples/sec   Loss 4.9637   LearningRate 0.0132   Epoch: 12   Global Step: 212660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:07:31,420-Speed 9563.63 samples/sec   Loss 5.0517   LearningRate 0.0132   Epoch: 12   Global Step: 212670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:07:32,472-Speed 9741.68 samples/sec   Loss 5.0291   LearningRate 0.0132   Epoch: 12   Global Step: 212680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:07:33,608-Speed 9022.30 samples/sec   Loss 5.0345   LearningRate 0.0132   Epoch: 12   Global Step: 212690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:07:34,722-Speed 9191.63 samples/sec   Loss 5.0104   LearningRate 0.0132   Epoch: 12   Global Step: 212700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:07:35,819-Speed 9345.29 samples/sec   Loss 5.0979   LearningRate 0.0132   Epoch: 12   Global Step: 212710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:36,887-Speed 9594.82 samples/sec   Loss 5.0917   LearningRate 0.0132   Epoch: 12   Global Step: 212720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:37,937-Speed 9753.28 samples/sec   Loss 5.0851   LearningRate 0.0132   Epoch: 12   Global Step: 212730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:39,050-Speed 9206.44 samples/sec   Loss 5.1089   LearningRate 0.0132   Epoch: 12   Global Step: 212740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:40,111-Speed 9663.98 samples/sec   Loss 5.1569   LearningRate 0.0132   Epoch: 12   Global Step: 212750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:41,166-Speed 9707.60 samples/sec   Loss 5.1114   LearningRate 0.0132   Epoch: 12   Global Step: 212760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:42,270-Speed 9277.75 samples/sec   Loss 5.0493   LearningRate 0.0131   Epoch: 12   Global Step: 212770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:43,374-Speed 9284.20 samples/sec   Loss 5.0269   LearningRate 0.0131   Epoch: 12   Global Step: 212780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:44,421-Speed 9783.95 samples/sec   Loss 5.0113   LearningRate 0.0131   Epoch: 12   Global Step: 212790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:45,577-Speed 8865.37 samples/sec   Loss 5.0845   LearningRate 0.0131   Epoch: 12   Global Step: 212800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:46,638-Speed 9661.12 samples/sec   Loss 5.0358   LearningRate 0.0131   Epoch: 12   Global Step: 212810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:47,747-Speed 9243.52 samples/sec   Loss 5.0022   LearningRate 0.0131   Epoch: 12   Global Step: 212820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:48,894-Speed 8927.37 samples/sec   Loss 5.0008   LearningRate 0.0131   Epoch: 12   Global Step: 212830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:49,978-Speed 9455.44 samples/sec   Loss 5.0790   LearningRate 0.0131   Epoch: 12   Global Step: 212840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:51,060-Speed 9473.11 samples/sec   Loss 5.0583   LearningRate 0.0131   Epoch: 12   Global Step: 212850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:52,186-Speed 9094.81 samples/sec   Loss 5.0640   LearningRate 0.0131   Epoch: 12   Global Step: 212860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:53,297-Speed 9230.77 samples/sec   Loss 5.0233   LearningRate 0.0131   Epoch: 12   Global Step: 212870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:54,387-Speed 9402.78 samples/sec   Loss 5.0087   LearningRate 0.0131   Epoch: 12   Global Step: 212880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:55,460-Speed 9547.54 samples/sec   Loss 5.0302   LearningRate 0.0131   Epoch: 12   Global Step: 212890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:56,576-Speed 9183.56 samples/sec   Loss 4.9676   LearningRate 0.0131   Epoch: 12   Global Step: 212900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:57,652-Speed 9524.40 samples/sec   Loss 5.0521   LearningRate 0.0131   Epoch: 12   Global Step: 212910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:58,736-Speed 9447.27 samples/sec   Loss 5.1754   LearningRate 0.0131   Epoch: 12   Global Step: 212920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:07:59,809-Speed 9552.29 samples/sec   Loss 5.0384   LearningRate 0.0131   Epoch: 12   Global Step: 212930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:00,884-Speed 9533.60 samples/sec   Loss 5.0427   LearningRate 0.0131   Epoch: 12   Global Step: 212940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:01,955-Speed 9563.88 samples/sec   Loss 5.1467   LearningRate 0.0131   Epoch: 12   Global Step: 212950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:03,088-Speed 9044.27 samples/sec   Loss 5.0263   LearningRate 0.0131   Epoch: 12   Global Step: 212960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:04,189-Speed 9307.27 samples/sec   Loss 5.0638   LearningRate 0.0131   Epoch: 12   Global Step: 212970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:05,251-Speed 9651.74 samples/sec   Loss 5.1351   LearningRate 0.0131   Epoch: 12   Global Step: 212980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:06,350-Speed 9322.87 samples/sec   Loss 5.0229   LearningRate 0.0131   Epoch: 12   Global Step: 212990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:07,417-Speed 9599.76 samples/sec   Loss 5.1103   LearningRate 0.0131   Epoch: 12   Global Step: 213000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:08,513-Speed 9351.44 samples/sec   Loss 4.9674   LearningRate 0.0131   Epoch: 12   Global Step: 213010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:09,572-Speed 9674.86 samples/sec   Loss 5.0784   LearningRate 0.0131   Epoch: 12   Global Step: 213020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:10,643-Speed 9568.80 samples/sec   Loss 5.0938   LearningRate 0.0131   Epoch: 12   Global Step: 213030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:11,733-Speed 9398.62 samples/sec   Loss 5.0520   LearningRate 0.0131   Epoch: 12   Global Step: 213040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:12,843-Speed 9237.14 samples/sec   Loss 5.0862   LearningRate 0.0131   Epoch: 12   Global Step: 213050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:13,961-Speed 9161.17 samples/sec   Loss 4.9996   LearningRate 0.0131   Epoch: 12   Global Step: 213060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:15,057-Speed 9352.02 samples/sec   Loss 5.0328   LearningRate 0.0131   Epoch: 12   Global Step: 213070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:16,165-Speed 9250.80 samples/sec   Loss 5.1764   LearningRate 0.0131   Epoch: 12   Global Step: 213080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:17,276-Speed 9220.61 samples/sec   Loss 5.0403   LearningRate 0.0131   Epoch: 12   Global Step: 213090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:18,361-Speed 9447.46 samples/sec   Loss 5.0104   LearningRate 0.0131   Epoch: 12   Global Step: 213100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:19,437-Speed 9525.13 samples/sec   Loss 5.0591   LearningRate 0.0131   Epoch: 12   Global Step: 213110   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:08:20,583-Speed 8936.58 samples/sec   Loss 4.9975   LearningRate 0.0131   Epoch: 12   Global Step: 213120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:21,691-Speed 9251.93 samples/sec   Loss 5.0731   LearningRate 0.0131   Epoch: 12   Global Step: 213130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:22,784-Speed 9373.76 samples/sec   Loss 4.9599   LearningRate 0.0131   Epoch: 12   Global Step: 213140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:23,892-Speed 9244.00 samples/sec   Loss 5.0204   LearningRate 0.0131   Epoch: 12   Global Step: 213150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:24,974-Speed 9469.08 samples/sec   Loss 5.1253   LearningRate 0.0131   Epoch: 12   Global Step: 213160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:26,046-Speed 9561.52 samples/sec   Loss 5.0895   LearningRate 0.0131   Epoch: 12   Global Step: 213170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:27,106-Speed 9665.29 samples/sec   Loss 4.9714   LearningRate 0.0131   Epoch: 12   Global Step: 213180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:28,219-Speed 9208.16 samples/sec   Loss 5.0616   LearningRate 0.0131   Epoch: 12   Global Step: 213190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:29,328-Speed 9242.89 samples/sec   Loss 4.9819   LearningRate 0.0131   Epoch: 12   Global Step: 213200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:30,408-Speed 9486.81 samples/sec   Loss 4.9815   LearningRate 0.0131   Epoch: 12   Global Step: 213210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:31,516-Speed 9248.22 samples/sec   Loss 4.9435   LearningRate 0.0131   Epoch: 12   Global Step: 213220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:32,620-Speed 9277.99 samples/sec   Loss 5.0810   LearningRate 0.0130   Epoch: 12   Global Step: 213230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:33,713-Speed 9378.98 samples/sec   Loss 5.0487   LearningRate 0.0130   Epoch: 12   Global Step: 213240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:34,790-Speed 9513.31 samples/sec   Loss 5.0235   LearningRate 0.0130   Epoch: 12   Global Step: 213250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:35,863-Speed 9543.79 samples/sec   Loss 5.0003   LearningRate 0.0130   Epoch: 12   Global Step: 213260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:36,971-Speed 9248.01 samples/sec   Loss 4.9381   LearningRate 0.0130   Epoch: 12   Global Step: 213270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:38,061-Speed 9403.84 samples/sec   Loss 5.0404   LearningRate 0.0130   Epoch: 12   Global Step: 213280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:39,123-Speed 9647.90 samples/sec   Loss 4.9668   LearningRate 0.0130   Epoch: 12   Global Step: 213290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:40,241-Speed 9164.62 samples/sec   Loss 4.9579   LearningRate 0.0130   Epoch: 12   Global Step: 213300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:41,419-Speed 8700.32 samples/sec   Loss 5.0412   LearningRate 0.0130   Epoch: 12   Global Step: 213310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:42,496-Speed 9519.68 samples/sec   Loss 5.0441   LearningRate 0.0130   Epoch: 12   Global Step: 213320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:43,591-Speed 9351.12 samples/sec   Loss 5.0627   LearningRate 0.0130   Epoch: 12   Global Step: 213330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:44,751-Speed 8837.68 samples/sec   Loss 5.0191   LearningRate 0.0130   Epoch: 12   Global Step: 213340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:45,852-Speed 9302.81 samples/sec   Loss 5.0622   LearningRate 0.0130   Epoch: 12   Global Step: 213350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:46,958-Speed 9266.07 samples/sec   Loss 4.9043   LearningRate 0.0130   Epoch: 12   Global Step: 213360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:48,046-Speed 9423.15 samples/sec   Loss 4.9782   LearningRate 0.0130   Epoch: 12   Global Step: 213370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:49,169-Speed 9128.04 samples/sec   Loss 4.9783   LearningRate 0.0130   Epoch: 12   Global Step: 213380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:50,271-Speed 9292.05 samples/sec   Loss 5.0127   LearningRate 0.0130   Epoch: 12   Global Step: 213390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:51,369-Speed 9334.41 samples/sec   Loss 5.0046   LearningRate 0.0130   Epoch: 12   Global Step: 213400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:52,450-Speed 9475.64 samples/sec   Loss 5.0250   LearningRate 0.0130   Epoch: 12   Global Step: 213410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:53,553-Speed 9294.96 samples/sec   Loss 4.9885   LearningRate 0.0130   Epoch: 12   Global Step: 213420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:54,636-Speed 9466.81 samples/sec   Loss 5.0497   LearningRate 0.0130   Epoch: 12   Global Step: 213430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:08:55,779-Speed 8970.93 samples/sec   Loss 5.0658   LearningRate 0.0130   Epoch: 12   Global Step: 213440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:56,855-Speed 9524.58 samples/sec   Loss 5.0127   LearningRate 0.0130   Epoch: 12   Global Step: 213450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:57,946-Speed 9387.94 samples/sec   Loss 5.0991   LearningRate 0.0130   Epoch: 12   Global Step: 213460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:08:59,029-Speed 9460.48 samples/sec   Loss 5.0943   LearningRate 0.0130   Epoch: 12   Global Step: 213470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:00,161-Speed 9058.76 samples/sec   Loss 4.9931   LearningRate 0.0130   Epoch: 12   Global Step: 213480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:01,218-Speed 9695.31 samples/sec   Loss 5.0277   LearningRate 0.0130   Epoch: 12   Global Step: 213490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:02,334-Speed 9175.14 samples/sec   Loss 5.0746   LearningRate 0.0130   Epoch: 12   Global Step: 213500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:03,477-Speed 8963.27 samples/sec   Loss 4.9763   LearningRate 0.0130   Epoch: 12   Global Step: 213510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:04,603-Speed 9102.43 samples/sec   Loss 4.9335   LearningRate 0.0130   Epoch: 12   Global Step: 213520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:05,695-Speed 9386.31 samples/sec   Loss 5.0629   LearningRate 0.0130   Epoch: 12   Global Step: 213530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:06,769-Speed 9544.64 samples/sec   Loss 5.0058   LearningRate 0.0130   Epoch: 12   Global Step: 213540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:07,868-Speed 9330.03 samples/sec   Loss 4.9637   LearningRate 0.0130   Epoch: 12   Global Step: 213550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:08,985-Speed 9165.57 samples/sec   Loss 5.0459   LearningRate 0.0130   Epoch: 12   Global Step: 213560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:10,075-Speed 9400.19 samples/sec   Loss 5.0245   LearningRate 0.0130   Epoch: 12   Global Step: 213570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:11,193-Speed 9168.62 samples/sec   Loss 5.1206   LearningRate 0.0130   Epoch: 12   Global Step: 213580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:12,281-Speed 9417.40 samples/sec   Loss 5.0685   LearningRate 0.0130   Epoch: 12   Global Step: 213590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:13,353-Speed 9553.29 samples/sec   Loss 5.0995   LearningRate 0.0130   Epoch: 12   Global Step: 213600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:14,475-Speed 9131.27 samples/sec   Loss 4.9158   LearningRate 0.0130   Epoch: 12   Global Step: 213610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:15,604-Speed 9077.34 samples/sec   Loss 4.9873   LearningRate 0.0130   Epoch: 12   Global Step: 213620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:16,691-Speed 9430.89 samples/sec   Loss 5.0581   LearningRate 0.0130   Epoch: 12   Global Step: 213630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:17,786-Speed 9363.30 samples/sec   Loss 5.0579   LearningRate 0.0130   Epoch: 12   Global Step: 213640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:18,886-Speed 9309.41 samples/sec   Loss 4.9945   LearningRate 0.0130   Epoch: 12   Global Step: 213650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:20,002-Speed 9186.57 samples/sec   Loss 4.9744   LearningRate 0.0130   Epoch: 12   Global Step: 213660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:21,111-Speed 9237.99 samples/sec   Loss 4.9976   LearningRate 0.0130   Epoch: 12   Global Step: 213670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:22,225-Speed 9199.23 samples/sec   Loss 4.9964   LearningRate 0.0130   Epoch: 12   Global Step: 213680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:23,350-Speed 9108.15 samples/sec   Loss 5.1092   LearningRate 0.0130   Epoch: 12   Global Step: 213690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:24,401-Speed 9751.23 samples/sec   Loss 5.0605   LearningRate 0.0129   Epoch: 12   Global Step: 213700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:25,462-Speed 9654.86 samples/sec   Loss 5.1158   LearningRate 0.0129   Epoch: 12   Global Step: 213710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:26,554-Speed 9382.69 samples/sec   Loss 4.9641   LearningRate 0.0129   Epoch: 12   Global Step: 213720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:27,679-Speed 9115.07 samples/sec   Loss 4.9805   LearningRate 0.0129   Epoch: 12   Global Step: 213730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:28,770-Speed 9390.77 samples/sec   Loss 4.9309   LearningRate 0.0129   Epoch: 12   Global Step: 213740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:29,839-Speed 9582.54 samples/sec   Loss 5.0396   LearningRate 0.0129   Epoch: 12   Global Step: 213750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:30,942-Speed 9287.03 samples/sec   Loss 4.9335   LearningRate 0.0129   Epoch: 12   Global Step: 213760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:32,048-Speed 9269.11 samples/sec   Loss 5.0575   LearningRate 0.0129   Epoch: 12   Global Step: 213770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:33,175-Speed 9094.40 samples/sec   Loss 5.0395   LearningRate 0.0129   Epoch: 12   Global Step: 213780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:34,239-Speed 9622.40 samples/sec   Loss 5.1232   LearningRate 0.0129   Epoch: 12   Global Step: 213790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:35,328-Speed 9411.67 samples/sec   Loss 4.9922   LearningRate 0.0129   Epoch: 12   Global Step: 213800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:36,426-Speed 9331.64 samples/sec   Loss 5.0557   LearningRate 0.0129   Epoch: 12   Global Step: 213810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:37,499-Speed 9551.31 samples/sec   Loss 5.0949   LearningRate 0.0129   Epoch: 12   Global Step: 213820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:38,598-Speed 9324.48 samples/sec   Loss 5.0174   LearningRate 0.0129   Epoch: 12   Global Step: 213830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:39,691-Speed 9369.21 samples/sec   Loss 5.0619   LearningRate 0.0129   Epoch: 12   Global Step: 213840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:40,787-Speed 9351.71 samples/sec   Loss 5.0175   LearningRate 0.0129   Epoch: 12   Global Step: 213850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:41,882-Speed 9360.33 samples/sec   Loss 5.0151   LearningRate 0.0129   Epoch: 12   Global Step: 213860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:42,951-Speed 9579.99 samples/sec   Loss 5.1169   LearningRate 0.0129   Epoch: 12   Global Step: 213870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:44,008-Speed 9697.91 samples/sec   Loss 5.0653   LearningRate 0.0129   Epoch: 12   Global Step: 213880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:45,086-Speed 9503.57 samples/sec   Loss 5.0454   LearningRate 0.0129   Epoch: 12   Global Step: 213890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:46,221-Speed 9023.35 samples/sec   Loss 5.0214   LearningRate 0.0129   Epoch: 12   Global Step: 213900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:47,322-Speed 9310.12 samples/sec   Loss 4.9761   LearningRate 0.0129   Epoch: 12   Global Step: 213910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:48,360-Speed 9880.88 samples/sec   Loss 5.0333   LearningRate 0.0129   Epoch: 12   Global Step: 213920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:49,459-Speed 9316.67 samples/sec   Loss 5.0069   LearningRate 0.0129   Epoch: 12   Global Step: 213930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:50,567-Speed 9250.28 samples/sec   Loss 5.0171   LearningRate 0.0129   Epoch: 12   Global Step: 213940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:09:51,670-Speed 9290.66 samples/sec   Loss 4.9902   LearningRate 0.0129   Epoch: 12   Global Step: 213950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:52,755-Speed 9439.00 samples/sec   Loss 5.0903   LearningRate 0.0129   Epoch: 12   Global Step: 213960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:53,820-Speed 9622.76 samples/sec   Loss 5.0650   LearningRate 0.0129   Epoch: 12   Global Step: 213970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:54,924-Speed 9279.35 samples/sec   Loss 4.9773   LearningRate 0.0129   Epoch: 12   Global Step: 213980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:56,003-Speed 9508.33 samples/sec   Loss 5.1213   LearningRate 0.0129   Epoch: 12   Global Step: 213990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:09:57,052-Speed 9766.28 samples/sec   Loss 5.0268   LearningRate 0.0129   Epoch: 12   Global Step: 214000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:10:18,859-[lfw][214000]XNorm: 8.206974
Training: 2022-04-11 20:10:18,859-[lfw][214000]Accuracy-Flip: 0.99583+-0.00291
Training: 2022-04-11 20:10:18,860-[lfw][214000]Accuracy-Highest: 0.99683
Training: 2022-04-11 20:10:44,202-[cfp_fp][214000]XNorm: 7.060784
Training: 2022-04-11 20:10:44,203-[cfp_fp][214000]Accuracy-Flip: 0.96557+-0.00925
Training: 2022-04-11 20:10:44,203-[cfp_fp][214000]Accuracy-Highest: 0.96771
Training: 2022-04-11 20:11:06,175-[agedb_30][214000]XNorm: 7.938451
Training: 2022-04-11 20:11:06,176-[agedb_30][214000]Accuracy-Flip: 0.96833+-0.01220
Training: 2022-04-11 20:11:06,176-[agedb_30][214000]Accuracy-Highest: 0.96983
Training: 2022-04-11 20:11:07,288-Speed 145.80 samples/sec   Loss 4.9959   LearningRate 0.0129   Epoch: 12   Global Step: 214010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:08,365-Speed 9516.93 samples/sec   Loss 5.0938   LearningRate 0.0129   Epoch: 12   Global Step: 214020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:09,429-Speed 9630.96 samples/sec   Loss 5.1041   LearningRate 0.0129   Epoch: 12   Global Step: 214030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:10,476-Speed 9786.23 samples/sec   Loss 5.0920   LearningRate 0.0129   Epoch: 12   Global Step: 214040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:11,578-Speed 9298.52 samples/sec   Loss 5.0656   LearningRate 0.0129   Epoch: 12   Global Step: 214050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:12,698-Speed 9144.65 samples/sec   Loss 5.0706   LearningRate 0.0129   Epoch: 12   Global Step: 214060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:13,792-Speed 9373.56 samples/sec   Loss 5.0554   LearningRate 0.0129   Epoch: 12   Global Step: 214070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:14,847-Speed 9706.97 samples/sec   Loss 4.9393   LearningRate 0.0129   Epoch: 12   Global Step: 214080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:15,947-Speed 9318.21 samples/sec   Loss 5.0140   LearningRate 0.0129   Epoch: 12   Global Step: 214090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:17,032-Speed 9445.60 samples/sec   Loss 5.0600   LearningRate 0.0129   Epoch: 12   Global Step: 214100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:18,111-Speed 9491.20 samples/sec   Loss 5.1123   LearningRate 0.0129   Epoch: 12   Global Step: 214110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:19,211-Speed 9316.67 samples/sec   Loss 4.9674   LearningRate 0.0129   Epoch: 12   Global Step: 214120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:20,329-Speed 9160.04 samples/sec   Loss 5.0145   LearningRate 0.0129   Epoch: 12   Global Step: 214130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:21,397-Speed 9598.24 samples/sec   Loss 5.0580   LearningRate 0.0129   Epoch: 12   Global Step: 214140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:22,492-Speed 9358.46 samples/sec   Loss 5.0308   LearningRate 0.0129   Epoch: 12   Global Step: 214150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:23,568-Speed 9521.01 samples/sec   Loss 5.0064   LearningRate 0.0128   Epoch: 12   Global Step: 214160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:24,638-Speed 9571.97 samples/sec   Loss 5.1390   LearningRate 0.0128   Epoch: 12   Global Step: 214170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:25,678-Speed 9859.54 samples/sec   Loss 5.1068   LearningRate 0.0128   Epoch: 12   Global Step: 214180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:26,749-Speed 9560.36 samples/sec   Loss 4.9981   LearningRate 0.0128   Epoch: 12   Global Step: 214190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:27,810-Speed 9655.98 samples/sec   Loss 4.9449   LearningRate 0.0128   Epoch: 12   Global Step: 214200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:28,879-Speed 9587.13 samples/sec   Loss 5.0644   LearningRate 0.0128   Epoch: 12   Global Step: 214210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:29,978-Speed 9326.34 samples/sec   Loss 5.1108   LearningRate 0.0128   Epoch: 12   Global Step: 214220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:31,020-Speed 9836.78 samples/sec   Loss 5.0982   LearningRate 0.0128   Epoch: 12   Global Step: 214230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:32,102-Speed 9467.89 samples/sec   Loss 5.0033   LearningRate 0.0128   Epoch: 12   Global Step: 214240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:33,188-Speed 9434.87 samples/sec   Loss 4.9264   LearningRate 0.0128   Epoch: 12   Global Step: 214250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:34,259-Speed 9564.50 samples/sec   Loss 5.0109   LearningRate 0.0128   Epoch: 12   Global Step: 214260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:35,311-Speed 9742.74 samples/sec   Loss 4.9788   LearningRate 0.0128   Epoch: 12   Global Step: 214270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:36,399-Speed 9413.91 samples/sec   Loss 5.0561   LearningRate 0.0128   Epoch: 12   Global Step: 214280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:37,487-Speed 9419.47 samples/sec   Loss 5.0159   LearningRate 0.0128   Epoch: 12   Global Step: 214290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:38,606-Speed 9159.73 samples/sec   Loss 5.0085   LearningRate 0.0128   Epoch: 12   Global Step: 214300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:39,690-Speed 9448.08 samples/sec   Loss 4.9966   LearningRate 0.0128   Epoch: 12   Global Step: 214310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:40,802-Speed 9212.31 samples/sec   Loss 5.0413   LearningRate 0.0128   Epoch: 12   Global Step: 214320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:41,889-Speed 9425.53 samples/sec   Loss 5.0418   LearningRate 0.0128   Epoch: 12   Global Step: 214330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:42,954-Speed 9623.77 samples/sec   Loss 5.1465   LearningRate 0.0128   Epoch: 12   Global Step: 214340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:44,042-Speed 9417.84 samples/sec   Loss 4.9785   LearningRate 0.0128   Epoch: 12   Global Step: 214350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:45,124-Speed 9466.64 samples/sec   Loss 4.9623   LearningRate 0.0128   Epoch: 12   Global Step: 214360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:46,223-Speed 9325.19 samples/sec   Loss 5.0005   LearningRate 0.0128   Epoch: 12   Global Step: 214370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:47,277-Speed 9723.49 samples/sec   Loss 5.0249   LearningRate 0.0128   Epoch: 12   Global Step: 214380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:48,407-Speed 9064.87 samples/sec   Loss 5.0483   LearningRate 0.0128   Epoch: 12   Global Step: 214390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:49,481-Speed 9536.20 samples/sec   Loss 5.0455   LearningRate 0.0128   Epoch: 12   Global Step: 214400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:11:50,545-Speed 9636.33 samples/sec   Loss 4.9756   LearningRate 0.0128   Epoch: 12   Global Step: 214410   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:11:51,638-Speed 9368.23 samples/sec   Loss 4.9738   LearningRate 0.0128   Epoch: 12   Global Step: 214420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:52,700-Speed 9653.02 samples/sec   Loss 5.1161   LearningRate 0.0128   Epoch: 12   Global Step: 214430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:53,775-Speed 9530.68 samples/sec   Loss 5.0884   LearningRate 0.0128   Epoch: 12   Global Step: 214440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:54,867-Speed 9382.12 samples/sec   Loss 5.0754   LearningRate 0.0128   Epoch: 12   Global Step: 214450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:55,978-Speed 9225.58 samples/sec   Loss 5.0536   LearningRate 0.0128   Epoch: 12   Global Step: 214460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:57,060-Speed 9467.95 samples/sec   Loss 5.0808   LearningRate 0.0128   Epoch: 12   Global Step: 214470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:58,140-Speed 9487.26 samples/sec   Loss 5.0174   LearningRate 0.0128   Epoch: 12   Global Step: 214480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:11:59,200-Speed 9663.21 samples/sec   Loss 5.0308   LearningRate 0.0128   Epoch: 12   Global Step: 214490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:12:00,268-Speed 9597.68 samples/sec   Loss 5.0614   LearningRate 0.0128   Epoch: 12   Global Step: 214500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:12:01,344-Speed 9521.06 samples/sec   Loss 5.0079   LearningRate 0.0128   Epoch: 12   Global Step: 214510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:12:02,449-Speed 9270.10 samples/sec   Loss 5.0367   LearningRate 0.0128   Epoch: 12   Global Step: 214520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:03,539-Speed 9400.48 samples/sec   Loss 4.9296   LearningRate 0.0128   Epoch: 12   Global Step: 214530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:04,613-Speed 9540.89 samples/sec   Loss 5.0221   LearningRate 0.0128   Epoch: 12   Global Step: 214540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:05,712-Speed 9324.84 samples/sec   Loss 5.0184   LearningRate 0.0128   Epoch: 12   Global Step: 214550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:06,804-Speed 9386.01 samples/sec   Loss 5.0428   LearningRate 0.0128   Epoch: 12   Global Step: 214560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:07,928-Speed 9107.98 samples/sec   Loss 4.9959   LearningRate 0.0128   Epoch: 12   Global Step: 214570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:09,055-Speed 9099.74 samples/sec   Loss 4.9892   LearningRate 0.0128   Epoch: 12   Global Step: 214580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:10,137-Speed 9461.42 samples/sec   Loss 4.9733   LearningRate 0.0128   Epoch: 12   Global Step: 214590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:11,235-Speed 9334.51 samples/sec   Loss 5.0622   LearningRate 0.0128   Epoch: 12   Global Step: 214600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:12,338-Speed 9295.79 samples/sec   Loss 4.9891   LearningRate 0.0128   Epoch: 12   Global Step: 214610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:13,484-Speed 8939.52 samples/sec   Loss 5.1243   LearningRate 0.0128   Epoch: 12   Global Step: 214620   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:12:14,581-Speed 9337.56 samples/sec   Loss 5.1394   LearningRate 0.0127   Epoch: 12   Global Step: 214630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:15,675-Speed 9368.80 samples/sec   Loss 5.0761   LearningRate 0.0127   Epoch: 12   Global Step: 214640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:16,771-Speed 9347.46 samples/sec   Loss 5.0317   LearningRate 0.0127   Epoch: 12   Global Step: 214650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:17,862-Speed 9392.76 samples/sec   Loss 5.0332   LearningRate 0.0127   Epoch: 12   Global Step: 214660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:18,990-Speed 9078.23 samples/sec   Loss 4.9744   LearningRate 0.0127   Epoch: 12   Global Step: 214670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:20,091-Speed 9311.39 samples/sec   Loss 5.0263   LearningRate 0.0127   Epoch: 12   Global Step: 214680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:21,148-Speed 9686.53 samples/sec   Loss 5.0278   LearningRate 0.0127   Epoch: 12   Global Step: 214690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:22,238-Speed 9399.63 samples/sec   Loss 5.0334   LearningRate 0.0127   Epoch: 12   Global Step: 214700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:23,383-Speed 8955.39 samples/sec   Loss 5.0405   LearningRate 0.0127   Epoch: 12   Global Step: 214710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:24,452-Speed 9582.24 samples/sec   Loss 4.9840   LearningRate 0.0127   Epoch: 12   Global Step: 214720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:25,535-Speed 9457.08 samples/sec   Loss 5.0842   LearningRate 0.0127   Epoch: 12   Global Step: 214730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:26,609-Speed 9545.81 samples/sec   Loss 5.0788   LearningRate 0.0127   Epoch: 12   Global Step: 214740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:27,666-Speed 9697.49 samples/sec   Loss 5.0418   LearningRate 0.0127   Epoch: 12   Global Step: 214750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:28,782-Speed 9178.54 samples/sec   Loss 5.0320   LearningRate 0.0127   Epoch: 12   Global Step: 214760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:29,836-Speed 9720.44 samples/sec   Loss 5.0337   LearningRate 0.0127   Epoch: 12   Global Step: 214770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:30,929-Speed 9382.87 samples/sec   Loss 5.0006   LearningRate 0.0127   Epoch: 12   Global Step: 214780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:32,052-Speed 9124.95 samples/sec   Loss 5.1030   LearningRate 0.0127   Epoch: 12   Global Step: 214790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:33,169-Speed 9173.12 samples/sec   Loss 5.1420   LearningRate 0.0127   Epoch: 12   Global Step: 214800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:34,274-Speed 9268.44 samples/sec   Loss 4.9614   LearningRate 0.0127   Epoch: 12   Global Step: 214810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:35,365-Speed 9388.19 samples/sec   Loss 5.0938   LearningRate 0.0127   Epoch: 12   Global Step: 214820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:36,431-Speed 9610.64 samples/sec   Loss 5.0715   LearningRate 0.0127   Epoch: 12   Global Step: 214830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:37,528-Speed 9346.52 samples/sec   Loss 4.9815   LearningRate 0.0127   Epoch: 12   Global Step: 214840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:38,585-Speed 9691.26 samples/sec   Loss 5.0454   LearningRate 0.0127   Epoch: 12   Global Step: 214850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:39,660-Speed 9529.11 samples/sec   Loss 5.0603   LearningRate 0.0127   Epoch: 12   Global Step: 214860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:40,713-Speed 9733.77 samples/sec   Loss 4.9915   LearningRate 0.0127   Epoch: 12   Global Step: 214870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:41,780-Speed 9595.16 samples/sec   Loss 5.1134   LearningRate 0.0127   Epoch: 12   Global Step: 214880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:42,885-Speed 9282.10 samples/sec   Loss 5.0559   LearningRate 0.0127   Epoch: 12   Global Step: 214890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:43,988-Speed 9281.52 samples/sec   Loss 5.0321   LearningRate 0.0127   Epoch: 12   Global Step: 214900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:45,067-Speed 9502.67 samples/sec   Loss 5.1272   LearningRate 0.0127   Epoch: 12   Global Step: 214910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:46,176-Speed 9236.46 samples/sec   Loss 5.1341   LearningRate 0.0127   Epoch: 12   Global Step: 214920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:47,242-Speed 9606.97 samples/sec   Loss 4.9524   LearningRate 0.0127   Epoch: 12   Global Step: 214930   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:12:48,343-Speed 9308.10 samples/sec   Loss 4.9945   LearningRate 0.0127   Epoch: 12   Global Step: 214940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:49,400-Speed 9692.16 samples/sec   Loss 5.0707   LearningRate 0.0127   Epoch: 12   Global Step: 214950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:50,462-Speed 9652.17 samples/sec   Loss 4.9526   LearningRate 0.0127   Epoch: 12   Global Step: 214960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:51,530-Speed 9588.55 samples/sec   Loss 4.9967   LearningRate 0.0127   Epoch: 12   Global Step: 214970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:52,621-Speed 9391.16 samples/sec   Loss 4.9587   LearningRate 0.0127   Epoch: 12   Global Step: 214980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:53,703-Speed 9473.48 samples/sec   Loss 5.0156   LearningRate 0.0127   Epoch: 12   Global Step: 214990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:54,781-Speed 9503.51 samples/sec   Loss 4.9643   LearningRate 0.0127   Epoch: 12   Global Step: 215000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:55,857-Speed 9520.85 samples/sec   Loss 5.0033   LearningRate 0.0127   Epoch: 12   Global Step: 215010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:56,971-Speed 9203.69 samples/sec   Loss 5.0010   LearningRate 0.0127   Epoch: 12   Global Step: 215020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:58,043-Speed 9550.04 samples/sec   Loss 5.0704   LearningRate 0.0127   Epoch: 12   Global Step: 215030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:12:59,080-Speed 9881.16 samples/sec   Loss 5.0011   LearningRate 0.0127   Epoch: 12   Global Step: 215040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:00,177-Speed 9346.48 samples/sec   Loss 4.9636   LearningRate 0.0127   Epoch: 12   Global Step: 215050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:01,285-Speed 9244.94 samples/sec   Loss 5.0145   LearningRate 0.0127   Epoch: 12   Global Step: 215060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:02,337-Speed 9742.04 samples/sec   Loss 5.0338   LearningRate 0.0127   Epoch: 12   Global Step: 215070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:03,423-Speed 9435.85 samples/sec   Loss 5.0150   LearningRate 0.0127   Epoch: 12   Global Step: 215080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:04,533-Speed 9224.73 samples/sec   Loss 5.0752   LearningRate 0.0127   Epoch: 12   Global Step: 215090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:05,658-Speed 9109.37 samples/sec   Loss 5.0369   LearningRate 0.0126   Epoch: 12   Global Step: 215100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:06,798-Speed 8983.41 samples/sec   Loss 4.9985   LearningRate 0.0126   Epoch: 12   Global Step: 215110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:07,878-Speed 9489.90 samples/sec   Loss 5.0510   LearningRate 0.0126   Epoch: 12   Global Step: 215120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:08,972-Speed 9368.63 samples/sec   Loss 5.0673   LearningRate 0.0126   Epoch: 12   Global Step: 215130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:10,047-Speed 9535.31 samples/sec   Loss 4.9792   LearningRate 0.0126   Epoch: 12   Global Step: 215140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:11,148-Speed 9302.20 samples/sec   Loss 5.0480   LearningRate 0.0126   Epoch: 12   Global Step: 215150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:12,196-Speed 9781.10 samples/sec   Loss 4.9509   LearningRate 0.0126   Epoch: 12   Global Step: 215160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:13,274-Speed 9502.99 samples/sec   Loss 5.0513   LearningRate 0.0126   Epoch: 12   Global Step: 215170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:14,394-Speed 9149.80 samples/sec   Loss 5.0365   LearningRate 0.0126   Epoch: 12   Global Step: 215180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:15,415-Speed 10030.43 samples/sec   Loss 4.9866   LearningRate 0.0126   Epoch: 12   Global Step: 215190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:16,484-Speed 9582.96 samples/sec   Loss 4.9534   LearningRate 0.0126   Epoch: 12   Global Step: 215200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:17,606-Speed 9135.50 samples/sec   Loss 4.9451   LearningRate 0.0126   Epoch: 12   Global Step: 215210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:18,705-Speed 9318.35 samples/sec   Loss 5.0826   LearningRate 0.0126   Epoch: 12   Global Step: 215220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:19,785-Speed 9489.73 samples/sec   Loss 4.9820   LearningRate 0.0126   Epoch: 12   Global Step: 215230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:20,900-Speed 9194.83 samples/sec   Loss 5.0475   LearningRate 0.0126   Epoch: 12   Global Step: 215240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:22,018-Speed 9160.37 samples/sec   Loss 5.0844   LearningRate 0.0126   Epoch: 12   Global Step: 215250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:23,133-Speed 9189.17 samples/sec   Loss 4.9548   LearningRate 0.0126   Epoch: 12   Global Step: 215260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:24,215-Speed 9476.58 samples/sec   Loss 5.0052   LearningRate 0.0126   Epoch: 12   Global Step: 215270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:25,294-Speed 9489.42 samples/sec   Loss 5.0446   LearningRate 0.0126   Epoch: 12   Global Step: 215280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:26,394-Speed 9319.13 samples/sec   Loss 5.0484   LearningRate 0.0126   Epoch: 12   Global Step: 215290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:27,524-Speed 9065.51 samples/sec   Loss 5.0860   LearningRate 0.0126   Epoch: 12   Global Step: 215300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:28,592-Speed 9597.88 samples/sec   Loss 4.9807   LearningRate 0.0126   Epoch: 12   Global Step: 215310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:29,732-Speed 8990.75 samples/sec   Loss 5.0442   LearningRate 0.0126   Epoch: 12   Global Step: 215320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:30,839-Speed 9254.56 samples/sec   Loss 5.0359   LearningRate 0.0126   Epoch: 12   Global Step: 215330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:31,929-Speed 9402.68 samples/sec   Loss 4.9727   LearningRate 0.0126   Epoch: 12   Global Step: 215340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:33,011-Speed 9470.79 samples/sec   Loss 5.0065   LearningRate 0.0126   Epoch: 12   Global Step: 215350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:34,133-Speed 9126.15 samples/sec   Loss 4.9685   LearningRate 0.0126   Epoch: 12   Global Step: 215360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:35,233-Speed 9317.56 samples/sec   Loss 5.0653   LearningRate 0.0126   Epoch: 12   Global Step: 215370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:36,314-Speed 9474.01 samples/sec   Loss 5.0292   LearningRate 0.0126   Epoch: 12   Global Step: 215380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:37,390-Speed 9520.19 samples/sec   Loss 5.0440   LearningRate 0.0126   Epoch: 12   Global Step: 215390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:38,482-Speed 9384.35 samples/sec   Loss 4.9982   LearningRate 0.0126   Epoch: 12   Global Step: 215400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:39,565-Speed 9461.24 samples/sec   Loss 5.0279   LearningRate 0.0126   Epoch: 12   Global Step: 215410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:40,651-Speed 9431.91 samples/sec   Loss 4.9838   LearningRate 0.0126   Epoch: 12   Global Step: 215420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:41,728-Speed 9516.07 samples/sec   Loss 5.0013   LearningRate 0.0126   Epoch: 12   Global Step: 215430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:42,841-Speed 9200.98 samples/sec   Loss 4.9764   LearningRate 0.0126   Epoch: 12   Global Step: 215440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:43,962-Speed 9151.49 samples/sec   Loss 5.1233   LearningRate 0.0126   Epoch: 12   Global Step: 215450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:45,036-Speed 9531.61 samples/sec   Loss 4.9767   LearningRate 0.0126   Epoch: 12   Global Step: 215460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:46,088-Speed 9747.68 samples/sec   Loss 5.0154   LearningRate 0.0126   Epoch: 12   Global Step: 215470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:47,197-Speed 9239.40 samples/sec   Loss 4.9670   LearningRate 0.0126   Epoch: 12   Global Step: 215480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:48,263-Speed 9611.78 samples/sec   Loss 4.9948   LearningRate 0.0126   Epoch: 12   Global Step: 215490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:49,315-Speed 9737.38 samples/sec   Loss 5.0772   LearningRate 0.0126   Epoch: 12   Global Step: 215500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:50,380-Speed 9624.16 samples/sec   Loss 5.0962   LearningRate 0.0126   Epoch: 12   Global Step: 215510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:51,454-Speed 9536.22 samples/sec   Loss 5.0230   LearningRate 0.0126   Epoch: 12   Global Step: 215520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:52,563-Speed 9236.24 samples/sec   Loss 5.0620   LearningRate 0.0126   Epoch: 12   Global Step: 215530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:53,657-Speed 9369.48 samples/sec   Loss 5.1623   LearningRate 0.0126   Epoch: 12   Global Step: 215540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:54,740-Speed 9465.61 samples/sec   Loss 5.0248   LearningRate 0.0126   Epoch: 12   Global Step: 215550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:13:55,831-Speed 9385.40 samples/sec   Loss 5.1157   LearningRate 0.0126   Epoch: 12   Global Step: 215560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:56,953-Speed 9129.60 samples/sec   Loss 5.0863   LearningRate 0.0125   Epoch: 12   Global Step: 215570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:58,038-Speed 9442.07 samples/sec   Loss 5.0550   LearningRate 0.0125   Epoch: 12   Global Step: 215580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:13:59,165-Speed 9094.60 samples/sec   Loss 5.0369   LearningRate 0.0125   Epoch: 12   Global Step: 215590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:00,275-Speed 9235.78 samples/sec   Loss 5.0430   LearningRate 0.0125   Epoch: 12   Global Step: 215600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:01,362-Speed 9426.21 samples/sec   Loss 4.9974   LearningRate 0.0125   Epoch: 12   Global Step: 215610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:02,473-Speed 9215.86 samples/sec   Loss 4.9341   LearningRate 0.0125   Epoch: 12   Global Step: 215620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:03,586-Speed 9214.96 samples/sec   Loss 4.9420   LearningRate 0.0125   Epoch: 12   Global Step: 215630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:04,686-Speed 9316.15 samples/sec   Loss 5.0858   LearningRate 0.0125   Epoch: 12   Global Step: 215640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:05,771-Speed 9437.62 samples/sec   Loss 5.0352   LearningRate 0.0125   Epoch: 12   Global Step: 215650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:06,865-Speed 9372.66 samples/sec   Loss 5.1396   LearningRate 0.0125   Epoch: 12   Global Step: 215660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:07,961-Speed 9345.81 samples/sec   Loss 4.9519   LearningRate 0.0125   Epoch: 12   Global Step: 215670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:09,022-Speed 9650.92 samples/sec   Loss 5.0059   LearningRate 0.0125   Epoch: 12   Global Step: 215680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:10,134-Speed 9213.69 samples/sec   Loss 5.0800   LearningRate 0.0125   Epoch: 12   Global Step: 215690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:11,214-Speed 9492.46 samples/sec   Loss 5.0516   LearningRate 0.0125   Epoch: 12   Global Step: 215700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:12,275-Speed 9654.93 samples/sec   Loss 5.0487   LearningRate 0.0125   Epoch: 12   Global Step: 215710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:13,339-Speed 9629.68 samples/sec   Loss 5.0324   LearningRate 0.0125   Epoch: 12   Global Step: 215720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:14,421-Speed 9471.26 samples/sec   Loss 4.9377   LearningRate 0.0125   Epoch: 12   Global Step: 215730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:15,502-Speed 9477.79 samples/sec   Loss 4.9786   LearningRate 0.0125   Epoch: 12   Global Step: 215740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:16,597-Speed 9355.16 samples/sec   Loss 5.0290   LearningRate 0.0125   Epoch: 12   Global Step: 215750   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:17,695-Speed 9328.79 samples/sec   Loss 5.0109   LearningRate 0.0125   Epoch: 12   Global Step: 215760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:18,786-Speed 9396.37 samples/sec   Loss 4.9644   LearningRate 0.0125   Epoch: 12   Global Step: 215770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:19,884-Speed 9327.29 samples/sec   Loss 4.9824   LearningRate 0.0125   Epoch: 12   Global Step: 215780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:20,916-Speed 9928.41 samples/sec   Loss 4.9370   LearningRate 0.0125   Epoch: 12   Global Step: 215790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:21,974-Speed 9683.23 samples/sec   Loss 5.0972   LearningRate 0.0125   Epoch: 12   Global Step: 215800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:23,039-Speed 9618.73 samples/sec   Loss 4.9349   LearningRate 0.0125   Epoch: 12   Global Step: 215810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:24,110-Speed 9578.03 samples/sec   Loss 5.1960   LearningRate 0.0125   Epoch: 12   Global Step: 215820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:25,160-Speed 9753.55 samples/sec   Loss 5.0873   LearningRate 0.0125   Epoch: 12   Global Step: 215830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:26,214-Speed 9727.27 samples/sec   Loss 5.1257   LearningRate 0.0125   Epoch: 12   Global Step: 215840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:27,293-Speed 9493.54 samples/sec   Loss 5.0860   LearningRate 0.0125   Epoch: 12   Global Step: 215850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:28,413-Speed 9144.97 samples/sec   Loss 5.0650   LearningRate 0.0125   Epoch: 12   Global Step: 215860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:29,466-Speed 9728.46 samples/sec   Loss 5.0055   LearningRate 0.0125   Epoch: 12   Global Step: 215870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:30,525-Speed 9678.64 samples/sec   Loss 5.0086   LearningRate 0.0125   Epoch: 12   Global Step: 215880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:31,589-Speed 9632.01 samples/sec   Loss 5.0150   LearningRate 0.0125   Epoch: 12   Global Step: 215890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:14:32,671-Speed 9473.76 samples/sec   Loss 5.0561   LearningRate 0.0125   Epoch: 12   Global Step: 215900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:33,754-Speed 9460.11 samples/sec   Loss 4.9561   LearningRate 0.0125   Epoch: 12   Global Step: 215910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:34,867-Speed 9200.42 samples/sec   Loss 5.1061   LearningRate 0.0125   Epoch: 12   Global Step: 215920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:35,932-Speed 9624.63 samples/sec   Loss 4.9454   LearningRate 0.0125   Epoch: 12   Global Step: 215930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:37,012-Speed 9481.50 samples/sec   Loss 5.1136   LearningRate 0.0125   Epoch: 12   Global Step: 215940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:38,130-Speed 9170.61 samples/sec   Loss 5.0007   LearningRate 0.0125   Epoch: 12   Global Step: 215950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:39,211-Speed 9475.42 samples/sec   Loss 4.9904   LearningRate 0.0125   Epoch: 12   Global Step: 215960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:40,260-Speed 9767.39 samples/sec   Loss 5.0823   LearningRate 0.0125   Epoch: 12   Global Step: 215970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:41,324-Speed 9628.57 samples/sec   Loss 5.0374   LearningRate 0.0125   Epoch: 12   Global Step: 215980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:42,396-Speed 9560.65 samples/sec   Loss 5.0377   LearningRate 0.0125   Epoch: 12   Global Step: 215990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:14:43,491-Speed 9374.10 samples/sec   Loss 5.0640   LearningRate 0.0125   Epoch: 12   Global Step: 216000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:15:05,388-[lfw][216000]XNorm: 8.206345
Training: 2022-04-11 20:15:05,388-[lfw][216000]Accuracy-Flip: 0.99667+-0.00258
Training: 2022-04-11 20:15:05,389-[lfw][216000]Accuracy-Highest: 0.99683
Training: 2022-04-11 20:15:30,729-[cfp_fp][216000]XNorm: 7.009395
Training: 2022-04-11 20:15:30,730-[cfp_fp][216000]Accuracy-Flip: 0.96586+-0.00936
Training: 2022-04-11 20:15:30,730-[cfp_fp][216000]Accuracy-Highest: 0.96771
Training: 2022-04-11 20:15:52,629-[agedb_30][216000]XNorm: 7.970877
Training: 2022-04-11 20:15:52,630-[agedb_30][216000]Accuracy-Flip: 0.97033+-0.00980
Training: 2022-04-11 20:15:52,630-[agedb_30][216000]Accuracy-Highest: 0.97033
Training: 2022-04-11 20:15:53,722-Speed 145.80 samples/sec   Loss 5.0488   LearningRate 0.0125   Epoch: 12   Global Step: 216010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:15:54,772-Speed 9761.15 samples/sec   Loss 4.9203   LearningRate 0.0125   Epoch: 12   Global Step: 216020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:15:55,823-Speed 9741.00 samples/sec   Loss 4.9919   LearningRate 0.0125   Epoch: 12   Global Step: 216030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:15:56,998-Speed 8723.72 samples/sec   Loss 5.0672   LearningRate 0.0124   Epoch: 12   Global Step: 216040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:15:58,094-Speed 9345.52 samples/sec   Loss 4.9774   LearningRate 0.0124   Epoch: 12   Global Step: 216050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:15:59,171-Speed 9510.98 samples/sec   Loss 5.0749   LearningRate 0.0124   Epoch: 12   Global Step: 216060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:00,238-Speed 9608.29 samples/sec   Loss 5.0936   LearningRate 0.0124   Epoch: 12   Global Step: 216070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:01,345-Speed 9250.20 samples/sec   Loss 5.0518   LearningRate 0.0124   Epoch: 12   Global Step: 216080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:02,468-Speed 9123.67 samples/sec   Loss 5.0350   LearningRate 0.0124   Epoch: 12   Global Step: 216090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:03,577-Speed 9238.80 samples/sec   Loss 4.9979   LearningRate 0.0124   Epoch: 12   Global Step: 216100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:04,637-Speed 9667.30 samples/sec   Loss 4.9895   LearningRate 0.0124   Epoch: 12   Global Step: 216110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:05,739-Speed 9302.06 samples/sec   Loss 5.1260   LearningRate 0.0124   Epoch: 12   Global Step: 216120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:06,867-Speed 9086.06 samples/sec   Loss 4.9828   LearningRate 0.0124   Epoch: 12   Global Step: 216130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:07,947-Speed 9486.48 samples/sec   Loss 4.9574   LearningRate 0.0124   Epoch: 12   Global Step: 216140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:09,028-Speed 9479.37 samples/sec   Loss 5.0536   LearningRate 0.0124   Epoch: 12   Global Step: 216150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:10,117-Speed 9404.10 samples/sec   Loss 4.9567   LearningRate 0.0124   Epoch: 12   Global Step: 216160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:11,223-Speed 9261.13 samples/sec   Loss 4.9548   LearningRate 0.0124   Epoch: 12   Global Step: 216170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:12,284-Speed 9661.21 samples/sec   Loss 5.0882   LearningRate 0.0124   Epoch: 12   Global Step: 216180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:13,369-Speed 9446.99 samples/sec   Loss 5.1502   LearningRate 0.0124   Epoch: 12   Global Step: 216190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:14,460-Speed 9395.11 samples/sec   Loss 5.0765   LearningRate 0.0124   Epoch: 12   Global Step: 216200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:15,531-Speed 9565.15 samples/sec   Loss 5.1448   LearningRate 0.0124   Epoch: 12   Global Step: 216210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:16,683-Speed 8890.99 samples/sec   Loss 5.0610   LearningRate 0.0124   Epoch: 12   Global Step: 216220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:17,777-Speed 9366.95 samples/sec   Loss 5.1514   LearningRate 0.0124   Epoch: 12   Global Step: 216230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:18,844-Speed 9605.15 samples/sec   Loss 4.9919   LearningRate 0.0124   Epoch: 12   Global Step: 216240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:19,913-Speed 9578.57 samples/sec   Loss 5.0364   LearningRate 0.0124   Epoch: 12   Global Step: 216250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:21,033-Speed 9145.91 samples/sec   Loss 5.0132   LearningRate 0.0124   Epoch: 12   Global Step: 216260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:22,122-Speed 9413.88 samples/sec   Loss 5.1229   LearningRate 0.0124   Epoch: 12   Global Step: 216270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:23,199-Speed 9510.50 samples/sec   Loss 5.1409   LearningRate 0.0124   Epoch: 12   Global Step: 216280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:24,340-Speed 8981.04 samples/sec   Loss 4.9726   LearningRate 0.0124   Epoch: 12   Global Step: 216290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:25,427-Speed 9424.76 samples/sec   Loss 5.0610   LearningRate 0.0124   Epoch: 12   Global Step: 216300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:26,564-Speed 9011.34 samples/sec   Loss 4.9234   LearningRate 0.0124   Epoch: 12   Global Step: 216310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:27,628-Speed 9633.07 samples/sec   Loss 5.1034   LearningRate 0.0124   Epoch: 12   Global Step: 216320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:28,684-Speed 9699.19 samples/sec   Loss 5.0266   LearningRate 0.0124   Epoch: 12   Global Step: 216330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:29,760-Speed 9520.02 samples/sec   Loss 4.9333   LearningRate 0.0124   Epoch: 12   Global Step: 216340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:30,838-Speed 9507.42 samples/sec   Loss 4.9823   LearningRate 0.0124   Epoch: 12   Global Step: 216350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:31,944-Speed 9268.24 samples/sec   Loss 4.9560   LearningRate 0.0124   Epoch: 12   Global Step: 216360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:33,019-Speed 9530.08 samples/sec   Loss 5.0220   LearningRate 0.0124   Epoch: 12   Global Step: 216370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:34,108-Speed 9401.51 samples/sec   Loss 5.0211   LearningRate 0.0124   Epoch: 12   Global Step: 216380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:35,216-Speed 9255.79 samples/sec   Loss 4.9533   LearningRate 0.0124   Epoch: 12   Global Step: 216390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:36,288-Speed 9560.44 samples/sec   Loss 5.0329   LearningRate 0.0124   Epoch: 12   Global Step: 216400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:37,426-Speed 9001.89 samples/sec   Loss 5.0500   LearningRate 0.0124   Epoch: 12   Global Step: 216410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:38,543-Speed 9169.75 samples/sec   Loss 5.0441   LearningRate 0.0124   Epoch: 12   Global Step: 216420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:39,613-Speed 9578.14 samples/sec   Loss 5.0301   LearningRate 0.0124   Epoch: 12   Global Step: 216430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:40,675-Speed 9645.45 samples/sec   Loss 5.0062   LearningRate 0.0124   Epoch: 12   Global Step: 216440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:41,805-Speed 9074.79 samples/sec   Loss 5.0303   LearningRate 0.0124   Epoch: 12   Global Step: 216450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:16:42,895-Speed 9396.55 samples/sec   Loss 4.9851   LearningRate 0.0124   Epoch: 12   Global Step: 216460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:44,018-Speed 9125.50 samples/sec   Loss 4.9845   LearningRate 0.0124   Epoch: 12   Global Step: 216470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:45,067-Speed 9761.14 samples/sec   Loss 5.0489   LearningRate 0.0124   Epoch: 12   Global Step: 216480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:46,145-Speed 9501.85 samples/sec   Loss 5.0657   LearningRate 0.0124   Epoch: 12   Global Step: 216490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:47,222-Speed 9521.18 samples/sec   Loss 5.0264   LearningRate 0.0124   Epoch: 12   Global Step: 216500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:48,333-Speed 9221.30 samples/sec   Loss 5.0424   LearningRate 0.0123   Epoch: 12   Global Step: 216510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:49,424-Speed 9388.53 samples/sec   Loss 4.9985   LearningRate 0.0123   Epoch: 12   Global Step: 216520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:50,499-Speed 9535.07 samples/sec   Loss 5.0783   LearningRate 0.0123   Epoch: 12   Global Step: 216530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:51,599-Speed 9315.36 samples/sec   Loss 5.0249   LearningRate 0.0123   Epoch: 12   Global Step: 216540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:52,703-Speed 9280.73 samples/sec   Loss 5.1229   LearningRate 0.0123   Epoch: 12   Global Step: 216550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:53,845-Speed 8973.93 samples/sec   Loss 4.9592   LearningRate 0.0123   Epoch: 12   Global Step: 216560   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:16:54,930-Speed 9439.32 samples/sec   Loss 5.0490   LearningRate 0.0123   Epoch: 12   Global Step: 216570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:56,028-Speed 9336.61 samples/sec   Loss 4.9288   LearningRate 0.0123   Epoch: 12   Global Step: 216580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:57,092-Speed 9629.53 samples/sec   Loss 4.9971   LearningRate 0.0123   Epoch: 12   Global Step: 216590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:58,200-Speed 9246.52 samples/sec   Loss 4.9451   LearningRate 0.0123   Epoch: 12   Global Step: 216600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:16:59,283-Speed 9460.81 samples/sec   Loss 5.0599   LearningRate 0.0123   Epoch: 12   Global Step: 216610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:00,346-Speed 9637.14 samples/sec   Loss 5.1061   LearningRate 0.0123   Epoch: 12   Global Step: 216620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:01,426-Speed 9481.98 samples/sec   Loss 5.1149   LearningRate 0.0123   Epoch: 12   Global Step: 216630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:02,540-Speed 9199.44 samples/sec   Loss 4.9693   LearningRate 0.0123   Epoch: 12   Global Step: 216640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:03,619-Speed 9498.37 samples/sec   Loss 5.0490   LearningRate 0.0123   Epoch: 12   Global Step: 216650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:04,685-Speed 9611.20 samples/sec   Loss 5.0224   LearningRate 0.0123   Epoch: 12   Global Step: 216660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:05,800-Speed 9192.36 samples/sec   Loss 5.0083   LearningRate 0.0123   Epoch: 12   Global Step: 216670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:06,896-Speed 9345.92 samples/sec   Loss 5.0771   LearningRate 0.0123   Epoch: 12   Global Step: 216680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:08,030-Speed 9041.96 samples/sec   Loss 4.9797   LearningRate 0.0123   Epoch: 12   Global Step: 216690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:09,100-Speed 9573.89 samples/sec   Loss 5.0735   LearningRate 0.0123   Epoch: 12   Global Step: 216700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:10,161-Speed 9655.72 samples/sec   Loss 4.9930   LearningRate 0.0123   Epoch: 12   Global Step: 216710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:11,270-Speed 9242.76 samples/sec   Loss 4.9339   LearningRate 0.0123   Epoch: 12   Global Step: 216720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:12,468-Speed 8553.51 samples/sec   Loss 4.9276   LearningRate 0.0123   Epoch: 12   Global Step: 216730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:13,581-Speed 9202.66 samples/sec   Loss 5.0672   LearningRate 0.0123   Epoch: 12   Global Step: 216740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:14,674-Speed 9381.82 samples/sec   Loss 5.0028   LearningRate 0.0123   Epoch: 12   Global Step: 216750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:15,758-Speed 9443.87 samples/sec   Loss 5.0146   LearningRate 0.0123   Epoch: 12   Global Step: 216760   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:16,846-Speed 9423.20 samples/sec   Loss 4.9881   LearningRate 0.0123   Epoch: 12   Global Step: 216770   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:17,923-Speed 9509.71 samples/sec   Loss 4.9722   LearningRate 0.0123   Epoch: 12   Global Step: 216780   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:18,998-Speed 9529.00 samples/sec   Loss 5.0402   LearningRate 0.0123   Epoch: 12   Global Step: 216790   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:20,086-Speed 9418.10 samples/sec   Loss 4.9145   LearningRate 0.0123   Epoch: 12   Global Step: 216800   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:21,142-Speed 9704.04 samples/sec   Loss 5.1144   LearningRate 0.0123   Epoch: 12   Global Step: 216810   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:22,210-Speed 9589.72 samples/sec   Loss 4.9954   LearningRate 0.0123   Epoch: 12   Global Step: 216820   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:23,283-Speed 9549.80 samples/sec   Loss 5.0406   LearningRate 0.0123   Epoch: 12   Global Step: 216830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:24,393-Speed 9239.69 samples/sec   Loss 5.0703   LearningRate 0.0123   Epoch: 12   Global Step: 216840   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:25,472-Speed 9502.16 samples/sec   Loss 5.0561   LearningRate 0.0123   Epoch: 12   Global Step: 216850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:17:26,542-Speed 9575.42 samples/sec   Loss 4.9845   LearningRate 0.0123   Epoch: 12   Global Step: 216860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:27,629-Speed 9421.67 samples/sec   Loss 5.1211   LearningRate 0.0123   Epoch: 12   Global Step: 216870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:28,705-Speed 9522.37 samples/sec   Loss 4.8719   LearningRate 0.0123   Epoch: 12   Global Step: 216880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:29,765-Speed 9674.18 samples/sec   Loss 4.9122   LearningRate 0.0123   Epoch: 12   Global Step: 216890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:30,861-Speed 9346.24 samples/sec   Loss 4.9357   LearningRate 0.0123   Epoch: 12   Global Step: 216900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:31,934-Speed 9548.74 samples/sec   Loss 5.0011   LearningRate 0.0123   Epoch: 12   Global Step: 216910   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:33,011-Speed 9515.91 samples/sec   Loss 5.0172   LearningRate 0.0123   Epoch: 12   Global Step: 216920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:34,106-Speed 9357.43 samples/sec   Loss 5.0484   LearningRate 0.0123   Epoch: 12   Global Step: 216930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:35,205-Speed 9323.00 samples/sec   Loss 5.0239   LearningRate 0.0123   Epoch: 12   Global Step: 216940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:36,356-Speed 8902.13 samples/sec   Loss 5.0055   LearningRate 0.0123   Epoch: 12   Global Step: 216950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:37,429-Speed 9546.53 samples/sec   Loss 4.9617   LearningRate 0.0123   Epoch: 12   Global Step: 216960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:38,492-Speed 9641.89 samples/sec   Loss 5.0616   LearningRate 0.0123   Epoch: 12   Global Step: 216970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:17:39,766-Speed 8042.26 samples/sec   Loss 5.0640   LearningRate 0.0123   Epoch: 12   Global Step: 216980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:08,502-Speed 356.35 samples/sec   Loss 4.5408   LearningRate 0.0122   Epoch: 13   Global Step: 216990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:09,592-Speed 9398.47 samples/sec   Loss 4.2905   LearningRate 0.0122   Epoch: 13   Global Step: 217000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:10,983-Speed 7367.97 samples/sec   Loss 4.3195   LearningRate 0.0122   Epoch: 13   Global Step: 217010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:12,386-Speed 7306.18 samples/sec   Loss 4.3776   LearningRate 0.0122   Epoch: 13   Global Step: 217020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:13,644-Speed 8141.70 samples/sec   Loss 4.3003   LearningRate 0.0122   Epoch: 13   Global Step: 217030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:14,971-Speed 7723.75 samples/sec   Loss 4.2562   LearningRate 0.0122   Epoch: 13   Global Step: 217040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:16,285-Speed 7797.93 samples/sec   Loss 4.3716   LearningRate 0.0122   Epoch: 13   Global Step: 217050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:17,407-Speed 9137.39 samples/sec   Loss 4.3648   LearningRate 0.0122   Epoch: 13   Global Step: 217060   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:18:18,573-Speed 8786.81 samples/sec   Loss 4.3196   LearningRate 0.0122   Epoch: 13   Global Step: 217070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:19,779-Speed 8491.79 samples/sec   Loss 4.3851   LearningRate 0.0122   Epoch: 13   Global Step: 217080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:20,877-Speed 9340.18 samples/sec   Loss 4.4009   LearningRate 0.0122   Epoch: 13   Global Step: 217090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:21,983-Speed 9256.16 samples/sec   Loss 4.3528   LearningRate 0.0122   Epoch: 13   Global Step: 217100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:23,106-Speed 9126.37 samples/sec   Loss 4.3077   LearningRate 0.0122   Epoch: 13   Global Step: 217110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:24,198-Speed 9384.94 samples/sec   Loss 4.3818   LearningRate 0.0122   Epoch: 13   Global Step: 217120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:25,413-Speed 8433.72 samples/sec   Loss 4.3432   LearningRate 0.0122   Epoch: 13   Global Step: 217130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:26,514-Speed 9307.54 samples/sec   Loss 4.3823   LearningRate 0.0122   Epoch: 13   Global Step: 217140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:27,612-Speed 9331.81 samples/sec   Loss 4.2962   LearningRate 0.0122   Epoch: 13   Global Step: 217150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:28,757-Speed 8950.98 samples/sec   Loss 4.2852   LearningRate 0.0122   Epoch: 13   Global Step: 217160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:29,877-Speed 9146.22 samples/sec   Loss 4.3170   LearningRate 0.0122   Epoch: 13   Global Step: 217170   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:18:30,978-Speed 9310.27 samples/sec   Loss 4.3794   LearningRate 0.0122   Epoch: 13   Global Step: 217180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:32,070-Speed 9379.51 samples/sec   Loss 4.2996   LearningRate 0.0122   Epoch: 13   Global Step: 217190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:33,186-Speed 9181.69 samples/sec   Loss 4.2404   LearningRate 0.0122   Epoch: 13   Global Step: 217200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:34,328-Speed 8976.11 samples/sec   Loss 4.3452   LearningRate 0.0122   Epoch: 13   Global Step: 217210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:35,463-Speed 9033.58 samples/sec   Loss 4.3240   LearningRate 0.0122   Epoch: 13   Global Step: 217220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:36,583-Speed 9145.25 samples/sec   Loss 4.3453   LearningRate 0.0122   Epoch: 13   Global Step: 217230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:37,702-Speed 9157.19 samples/sec   Loss 4.4223   LearningRate 0.0122   Epoch: 13   Global Step: 217240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:38,896-Speed 8584.42 samples/sec   Loss 4.3617   LearningRate 0.0122   Epoch: 13   Global Step: 217250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:39,963-Speed 9600.50 samples/sec   Loss 4.3685   LearningRate 0.0122   Epoch: 13   Global Step: 217260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:41,054-Speed 9387.44 samples/sec   Loss 4.3888   LearningRate 0.0122   Epoch: 13   Global Step: 217270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:42,163-Speed 9246.19 samples/sec   Loss 4.3576   LearningRate 0.0122   Epoch: 13   Global Step: 217280   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:18:43,283-Speed 9147.81 samples/sec   Loss 4.2875   LearningRate 0.0122   Epoch: 13   Global Step: 217290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:44,354-Speed 9561.37 samples/sec   Loss 4.2282   LearningRate 0.0122   Epoch: 13   Global Step: 217300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:45,464-Speed 9235.43 samples/sec   Loss 4.3612   LearningRate 0.0122   Epoch: 13   Global Step: 217310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:46,549-Speed 9446.08 samples/sec   Loss 4.4186   LearningRate 0.0122   Epoch: 13   Global Step: 217320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:47,666-Speed 9171.07 samples/sec   Loss 4.2834   LearningRate 0.0122   Epoch: 13   Global Step: 217330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:48,781-Speed 9188.80 samples/sec   Loss 4.3409   LearningRate 0.0122   Epoch: 13   Global Step: 217340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:49,889-Speed 9246.08 samples/sec   Loss 4.3461   LearningRate 0.0122   Epoch: 13   Global Step: 217350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:51,020-Speed 9060.26 samples/sec   Loss 4.4727   LearningRate 0.0122   Epoch: 13   Global Step: 217360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:52,094-Speed 9542.04 samples/sec   Loss 4.3586   LearningRate 0.0122   Epoch: 13   Global Step: 217370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:53,160-Speed 9605.10 samples/sec   Loss 4.3842   LearningRate 0.0122   Epoch: 13   Global Step: 217380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:54,225-Speed 9629.94 samples/sec   Loss 4.3932   LearningRate 0.0122   Epoch: 13   Global Step: 217390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:55,338-Speed 9203.97 samples/sec   Loss 4.3328   LearningRate 0.0122   Epoch: 13   Global Step: 217400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:18:56,470-Speed 9048.30 samples/sec   Loss 4.3430   LearningRate 0.0122   Epoch: 13   Global Step: 217410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:18:57,573-Speed 9292.78 samples/sec   Loss 4.3836   LearningRate 0.0122   Epoch: 13   Global Step: 217420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:18:58,689-Speed 9176.99 samples/sec   Loss 4.3219   LearningRate 0.0122   Epoch: 13   Global Step: 217430   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:18:59,824-Speed 9024.57 samples/sec   Loss 4.2975   LearningRate 0.0122   Epoch: 13   Global Step: 217440   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:19:00,932-Speed 9249.50 samples/sec   Loss 4.2877   LearningRate 0.0122   Epoch: 13   Global Step: 217450   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:19:02,057-Speed 9110.46 samples/sec   Loss 4.3137   LearningRate 0.0122   Epoch: 13   Global Step: 217460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:19:03,202-Speed 8946.50 samples/sec   Loss 4.3157   LearningRate 0.0121   Epoch: 13   Global Step: 217470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:19:04,310-Speed 9247.27 samples/sec   Loss 4.3655   LearningRate 0.0121   Epoch: 13   Global Step: 217480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:19:05,375-Speed 9622.53 samples/sec   Loss 4.3617   LearningRate 0.0121   Epoch: 13   Global Step: 217490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:19:06,454-Speed 9494.14 samples/sec   Loss 4.3684   LearningRate 0.0121   Epoch: 13   Global Step: 217500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:19:07,559-Speed 9274.32 samples/sec   Loss 4.2825   LearningRate 0.0121   Epoch: 13   Global Step: 217510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:08,688-Speed 9072.87 samples/sec   Loss 4.4126   LearningRate 0.0121   Epoch: 13   Global Step: 217520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:09,728-Speed 9858.41 samples/sec   Loss 4.3544   LearningRate 0.0121   Epoch: 13   Global Step: 217530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:10,826-Speed 9327.71 samples/sec   Loss 4.4034   LearningRate 0.0121   Epoch: 13   Global Step: 217540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:11,920-Speed 9367.17 samples/sec   Loss 4.3960   LearningRate 0.0121   Epoch: 13   Global Step: 217550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:12,994-Speed 9538.13 samples/sec   Loss 4.3811   LearningRate 0.0121   Epoch: 13   Global Step: 217560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:14,049-Speed 9715.80 samples/sec   Loss 4.3995   LearningRate 0.0121   Epoch: 13   Global Step: 217570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:15,515-Speed 6989.03 samples/sec   Loss 4.3349   LearningRate 0.0121   Epoch: 13   Global Step: 217580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:16,945-Speed 7162.53 samples/sec   Loss 4.3824   LearningRate 0.0121   Epoch: 13   Global Step: 217590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:18,032-Speed 9425.31 samples/sec   Loss 4.3445   LearningRate 0.0121   Epoch: 13   Global Step: 217600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:19,274-Speed 8252.12 samples/sec   Loss 4.3822   LearningRate 0.0121   Epoch: 13   Global Step: 217610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:20,532-Speed 8144.67 samples/sec   Loss 4.3494   LearningRate 0.0121   Epoch: 13   Global Step: 217620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:21,828-Speed 7904.55 samples/sec   Loss 4.3431   LearningRate 0.0121   Epoch: 13   Global Step: 217630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:22,932-Speed 9282.90 samples/sec   Loss 4.3347   LearningRate 0.0121   Epoch: 13   Global Step: 217640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:24,031-Speed 9322.42 samples/sec   Loss 4.3830   LearningRate 0.0121   Epoch: 13   Global Step: 217650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:25,102-Speed 9564.99 samples/sec   Loss 4.4408   LearningRate 0.0121   Epoch: 13   Global Step: 217660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:26,158-Speed 9709.99 samples/sec   Loss 4.3369   LearningRate 0.0121   Epoch: 13   Global Step: 217670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:27,403-Speed 8225.29 samples/sec   Loss 4.4222   LearningRate 0.0121   Epoch: 13   Global Step: 217680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:28,497-Speed 9371.84 samples/sec   Loss 4.4146   LearningRate 0.0121   Epoch: 13   Global Step: 217690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:29,584-Speed 9418.83 samples/sec   Loss 4.4441   LearningRate 0.0121   Epoch: 13   Global Step: 217700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:30,670-Speed 9437.90 samples/sec   Loss 4.4633   LearningRate 0.0121   Epoch: 13   Global Step: 217710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:31,793-Speed 9123.88 samples/sec   Loss 4.4208   LearningRate 0.0121   Epoch: 13   Global Step: 217720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:32,860-Speed 9600.82 samples/sec   Loss 4.3827   LearningRate 0.0121   Epoch: 13   Global Step: 217730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:33,961-Speed 9304.95 samples/sec   Loss 4.4109   LearningRate 0.0121   Epoch: 13   Global Step: 217740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:35,042-Speed 9475.87 samples/sec   Loss 4.4505   LearningRate 0.0121   Epoch: 13   Global Step: 217750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:36,137-Speed 9361.79 samples/sec   Loss 4.4306   LearningRate 0.0121   Epoch: 13   Global Step: 217760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:37,261-Speed 9118.63 samples/sec   Loss 4.4663   LearningRate 0.0121   Epoch: 13   Global Step: 217770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:38,387-Speed 9099.73 samples/sec   Loss 4.4026   LearningRate 0.0121   Epoch: 13   Global Step: 217780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:39,488-Speed 9302.44 samples/sec   Loss 4.3485   LearningRate 0.0121   Epoch: 13   Global Step: 217790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:40,578-Speed 9406.90 samples/sec   Loss 4.4283   LearningRate 0.0121   Epoch: 13   Global Step: 217800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:41,679-Speed 9303.14 samples/sec   Loss 4.4761   LearningRate 0.0121   Epoch: 13   Global Step: 217810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:42,734-Speed 9721.11 samples/sec   Loss 4.4146   LearningRate 0.0121   Epoch: 13   Global Step: 217820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:43,796-Speed 9649.71 samples/sec   Loss 4.5240   LearningRate 0.0121   Epoch: 13   Global Step: 217830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:44,924-Speed 9079.72 samples/sec   Loss 4.4857   LearningRate 0.0121   Epoch: 13   Global Step: 217840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:46,016-Speed 9381.54 samples/sec   Loss 4.3595   LearningRate 0.0121   Epoch: 13   Global Step: 217850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:47,150-Speed 9035.27 samples/sec   Loss 4.3844   LearningRate 0.0121   Epoch: 13   Global Step: 217860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:48,238-Speed 9422.99 samples/sec   Loss 4.3166   LearningRate 0.0121   Epoch: 13   Global Step: 217870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:49,352-Speed 9192.52 samples/sec   Loss 4.4088   LearningRate 0.0121   Epoch: 13   Global Step: 217880   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:50,459-Speed 9252.83 samples/sec   Loss 4.3423   LearningRate 0.0121   Epoch: 13   Global Step: 217890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:51,554-Speed 9357.65 samples/sec   Loss 4.4765   LearningRate 0.0121   Epoch: 13   Global Step: 217900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:52,724-Speed 8759.96 samples/sec   Loss 4.4300   LearningRate 0.0121   Epoch: 13   Global Step: 217910   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:19:53,841-Speed 9174.46 samples/sec   Loss 4.4609   LearningRate 0.0121   Epoch: 13   Global Step: 217920   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:54,927-Speed 9432.08 samples/sec   Loss 4.3841   LearningRate 0.0121   Epoch: 13   Global Step: 217930   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:56,025-Speed 9338.07 samples/sec   Loss 4.4320   LearningRate 0.0121   Epoch: 13   Global Step: 217940   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:57,175-Speed 8907.70 samples/sec   Loss 4.4769   LearningRate 0.0120   Epoch: 13   Global Step: 217950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:58,284-Speed 9239.17 samples/sec   Loss 4.3469   LearningRate 0.0120   Epoch: 13   Global Step: 217960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:19:59,389-Speed 9266.31 samples/sec   Loss 4.4220   LearningRate 0.0120   Epoch: 13   Global Step: 217970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:20:00,490-Speed 9311.75 samples/sec   Loss 4.4507   LearningRate 0.0120   Epoch: 13   Global Step: 217980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:20:01,683-Speed 8589.61 samples/sec   Loss 4.4553   LearningRate 0.0120   Epoch: 13   Global Step: 217990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:20:02,781-Speed 9332.79 samples/sec   Loss 4.3450   LearningRate 0.0120   Epoch: 13   Global Step: 218000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:20:24,528-[lfw][218000]XNorm: 8.244316
Training: 2022-04-11 20:20:24,529-[lfw][218000]Accuracy-Flip: 0.99700+-0.00314
Training: 2022-04-11 20:20:24,529-[lfw][218000]Accuracy-Highest: 0.99700
Training: 2022-04-11 20:20:49,697-[cfp_fp][218000]XNorm: 7.073361
Training: 2022-04-11 20:20:49,698-[cfp_fp][218000]Accuracy-Flip: 0.96500+-0.01145
Training: 2022-04-11 20:20:49,698-[cfp_fp][218000]Accuracy-Highest: 0.96771
Training: 2022-04-11 20:21:11,403-[agedb_30][218000]XNorm: 7.942713
Training: 2022-04-11 20:21:11,404-[agedb_30][218000]Accuracy-Flip: 0.96833+-0.00972
Training: 2022-04-11 20:21:11,404-[agedb_30][218000]Accuracy-Highest: 0.97033
Training: 2022-04-11 20:21:12,502-Speed 146.87 samples/sec   Loss 4.4353   LearningRate 0.0120   Epoch: 13   Global Step: 218010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:13,546-Speed 9813.48 samples/sec   Loss 4.5222   LearningRate 0.0120   Epoch: 13   Global Step: 218020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:14,620-Speed 9542.76 samples/sec   Loss 4.4397   LearningRate 0.0120   Epoch: 13   Global Step: 218030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:15,738-Speed 9166.06 samples/sec   Loss 4.4036   LearningRate 0.0120   Epoch: 13   Global Step: 218040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:16,811-Speed 9546.27 samples/sec   Loss 4.3977   LearningRate 0.0120   Epoch: 13   Global Step: 218050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:17,887-Speed 9520.41 samples/sec   Loss 4.3959   LearningRate 0.0120   Epoch: 13   Global Step: 218060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:18,969-Speed 9575.36 samples/sec   Loss 4.4232   LearningRate 0.0120   Epoch: 13   Global Step: 218070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:20,043-Speed 9539.86 samples/sec   Loss 4.4734   LearningRate 0.0120   Epoch: 13   Global Step: 218080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:21,139-Speed 9351.91 samples/sec   Loss 4.4179   LearningRate 0.0120   Epoch: 13   Global Step: 218090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:22,242-Speed 9290.02 samples/sec   Loss 4.4591   LearningRate 0.0120   Epoch: 13   Global Step: 218100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:23,333-Speed 9399.97 samples/sec   Loss 4.4614   LearningRate 0.0120   Epoch: 13   Global Step: 218110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:24,414-Speed 9473.74 samples/sec   Loss 4.4781   LearningRate 0.0120   Epoch: 13   Global Step: 218120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:25,497-Speed 9465.58 samples/sec   Loss 4.4025   LearningRate 0.0120   Epoch: 13   Global Step: 218130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:26,610-Speed 9205.43 samples/sec   Loss 4.5334   LearningRate 0.0120   Epoch: 13   Global Step: 218140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:27,671-Speed 9657.91 samples/sec   Loss 4.4266   LearningRate 0.0120   Epoch: 13   Global Step: 218150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:28,768-Speed 9336.84 samples/sec   Loss 4.4542   LearningRate 0.0120   Epoch: 13   Global Step: 218160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:29,847-Speed 9496.93 samples/sec   Loss 4.4354   LearningRate 0.0120   Epoch: 13   Global Step: 218170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:30,977-Speed 9068.53 samples/sec   Loss 4.4904   LearningRate 0.0120   Epoch: 13   Global Step: 218180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:32,087-Speed 9237.60 samples/sec   Loss 4.4857   LearningRate 0.0120   Epoch: 13   Global Step: 218190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:33,207-Speed 9140.46 samples/sec   Loss 4.4469   LearningRate 0.0120   Epoch: 13   Global Step: 218200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:34,303-Speed 9355.69 samples/sec   Loss 4.3888   LearningRate 0.0120   Epoch: 13   Global Step: 218210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:35,400-Speed 9337.21 samples/sec   Loss 4.4902   LearningRate 0.0120   Epoch: 13   Global Step: 218220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:36,485-Speed 9452.87 samples/sec   Loss 4.4755   LearningRate 0.0120   Epoch: 13   Global Step: 218230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:37,556-Speed 9564.80 samples/sec   Loss 4.4233   LearningRate 0.0120   Epoch: 13   Global Step: 218240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:38,682-Speed 9099.54 samples/sec   Loss 4.4288   LearningRate 0.0120   Epoch: 13   Global Step: 218250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:39,801-Speed 9159.94 samples/sec   Loss 4.4670   LearningRate 0.0120   Epoch: 13   Global Step: 218260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:40,896-Speed 9350.74 samples/sec   Loss 4.5793   LearningRate 0.0120   Epoch: 13   Global Step: 218270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:41,997-Speed 9306.20 samples/sec   Loss 4.4619   LearningRate 0.0120   Epoch: 13   Global Step: 218280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:43,083-Speed 9439.77 samples/sec   Loss 4.5622   LearningRate 0.0120   Epoch: 13   Global Step: 218290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:44,208-Speed 9102.56 samples/sec   Loss 4.4409   LearningRate 0.0120   Epoch: 13   Global Step: 218300   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:45,274-Speed 9615.87 samples/sec   Loss 4.4654   LearningRate 0.0120   Epoch: 13   Global Step: 218310   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:46,372-Speed 9334.23 samples/sec   Loss 4.5110   LearningRate 0.0120   Epoch: 13   Global Step: 218320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:47,480-Speed 9247.71 samples/sec   Loss 4.4432   LearningRate 0.0120   Epoch: 13   Global Step: 218330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:48,588-Speed 9242.90 samples/sec   Loss 4.4538   LearningRate 0.0120   Epoch: 13   Global Step: 218340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:49,702-Speed 9200.17 samples/sec   Loss 4.5361   LearningRate 0.0120   Epoch: 13   Global Step: 218350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:50,763-Speed 9659.67 samples/sec   Loss 4.4696   LearningRate 0.0120   Epoch: 13   Global Step: 218360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:51,860-Speed 9342.89 samples/sec   Loss 4.4126   LearningRate 0.0120   Epoch: 13   Global Step: 218370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:21:52,970-Speed 9234.59 samples/sec   Loss 4.4102   LearningRate 0.0120   Epoch: 13   Global Step: 218380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:54,067-Speed 9334.69 samples/sec   Loss 4.4852   LearningRate 0.0120   Epoch: 13   Global Step: 218390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:55,232-Speed 8794.78 samples/sec   Loss 4.4835   LearningRate 0.0120   Epoch: 13   Global Step: 218400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:56,322-Speed 9405.05 samples/sec   Loss 4.4978   LearningRate 0.0120   Epoch: 13   Global Step: 218410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:57,424-Speed 9293.57 samples/sec   Loss 4.4938   LearningRate 0.0120   Epoch: 13   Global Step: 218420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:58,510-Speed 9437.08 samples/sec   Loss 4.5067   LearningRate 0.0119   Epoch: 13   Global Step: 218430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:21:59,623-Speed 9205.76 samples/sec   Loss 4.3824   LearningRate 0.0119   Epoch: 13   Global Step: 218440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:00,689-Speed 9605.08 samples/sec   Loss 4.4669   LearningRate 0.0119   Epoch: 13   Global Step: 218450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:01,789-Speed 9319.88 samples/sec   Loss 4.4528   LearningRate 0.0119   Epoch: 13   Global Step: 218460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:02,876-Speed 9424.26 samples/sec   Loss 4.4906   LearningRate 0.0119   Epoch: 13   Global Step: 218470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:03,961-Speed 9441.01 samples/sec   Loss 4.4954   LearningRate 0.0119   Epoch: 13   Global Step: 218480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:05,045-Speed 9457.92 samples/sec   Loss 4.4322   LearningRate 0.0119   Epoch: 13   Global Step: 218490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:06,121-Speed 9514.97 samples/sec   Loss 4.3921   LearningRate 0.0119   Epoch: 13   Global Step: 218500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:07,236-Speed 9195.92 samples/sec   Loss 4.4762   LearningRate 0.0119   Epoch: 13   Global Step: 218510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:08,327-Speed 9388.48 samples/sec   Loss 4.5842   LearningRate 0.0119   Epoch: 13   Global Step: 218520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:09,446-Speed 9158.45 samples/sec   Loss 4.3899   LearningRate 0.0119   Epoch: 13   Global Step: 218530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:10,531-Speed 9449.31 samples/sec   Loss 4.5502   LearningRate 0.0119   Epoch: 13   Global Step: 218540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:11,619-Speed 9412.36 samples/sec   Loss 4.4747   LearningRate 0.0119   Epoch: 13   Global Step: 218550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:12,694-Speed 9536.59 samples/sec   Loss 4.4248   LearningRate 0.0119   Epoch: 13   Global Step: 218560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:13,774-Speed 9487.51 samples/sec   Loss 4.5164   LearningRate 0.0119   Epoch: 13   Global Step: 218570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:14,819-Speed 9801.19 samples/sec   Loss 4.5208   LearningRate 0.0119   Epoch: 13   Global Step: 218580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:15,866-Speed 9785.99 samples/sec   Loss 4.4608   LearningRate 0.0119   Epoch: 13   Global Step: 218590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:16,955-Speed 9415.17 samples/sec   Loss 4.4839   LearningRate 0.0119   Epoch: 13   Global Step: 218600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:18,125-Speed 8754.84 samples/sec   Loss 4.4444   LearningRate 0.0119   Epoch: 13   Global Step: 218610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:19,244-Speed 9151.88 samples/sec   Loss 4.5240   LearningRate 0.0119   Epoch: 13   Global Step: 218620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:20,331-Speed 9424.33 samples/sec   Loss 4.4380   LearningRate 0.0119   Epoch: 13   Global Step: 218630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:21,397-Speed 9613.34 samples/sec   Loss 4.4824   LearningRate 0.0119   Epoch: 13   Global Step: 218640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:22,510-Speed 9210.52 samples/sec   Loss 4.6109   LearningRate 0.0119   Epoch: 13   Global Step: 218650   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:23,647-Speed 9010.17 samples/sec   Loss 4.5433   LearningRate 0.0119   Epoch: 13   Global Step: 218660   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:24,755-Speed 9248.48 samples/sec   Loss 4.5816   LearningRate 0.0119   Epoch: 13   Global Step: 218670   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:25,865-Speed 9227.38 samples/sec   Loss 4.4346   LearningRate 0.0119   Epoch: 13   Global Step: 218680   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:26,950-Speed 9446.11 samples/sec   Loss 4.5679   LearningRate 0.0119   Epoch: 13   Global Step: 218690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:28,044-Speed 9369.07 samples/sec   Loss 4.4877   LearningRate 0.0119   Epoch: 13   Global Step: 218700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:29,157-Speed 9199.94 samples/sec   Loss 4.4800   LearningRate 0.0119   Epoch: 13   Global Step: 218710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:30,236-Speed 9502.62 samples/sec   Loss 4.5633   LearningRate 0.0119   Epoch: 13   Global Step: 218720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:31,312-Speed 9521.74 samples/sec   Loss 4.4611   LearningRate 0.0119   Epoch: 13   Global Step: 218730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:32,398-Speed 9428.48 samples/sec   Loss 4.4636   LearningRate 0.0119   Epoch: 13   Global Step: 218740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:33,452-Speed 9718.18 samples/sec   Loss 4.5458   LearningRate 0.0119   Epoch: 13   Global Step: 218750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:34,533-Speed 9480.85 samples/sec   Loss 4.5218   LearningRate 0.0119   Epoch: 13   Global Step: 218760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:35,663-Speed 9070.59 samples/sec   Loss 4.4592   LearningRate 0.0119   Epoch: 13   Global Step: 218770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:36,771-Speed 9249.90 samples/sec   Loss 4.3951   LearningRate 0.0119   Epoch: 13   Global Step: 218780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:37,845-Speed 9535.53 samples/sec   Loss 4.5150   LearningRate 0.0119   Epoch: 13   Global Step: 218790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:38,960-Speed 9190.66 samples/sec   Loss 4.5045   LearningRate 0.0119   Epoch: 13   Global Step: 218800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:40,040-Speed 9488.91 samples/sec   Loss 4.4309   LearningRate 0.0119   Epoch: 13   Global Step: 218810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:41,134-Speed 9361.17 samples/sec   Loss 4.5025   LearningRate 0.0119   Epoch: 13   Global Step: 218820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:42,205-Speed 9570.55 samples/sec   Loss 4.5395   LearningRate 0.0119   Epoch: 13   Global Step: 218830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:43,303-Speed 9326.59 samples/sec   Loss 4.5469   LearningRate 0.0119   Epoch: 13   Global Step: 218840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:44,441-Speed 9005.57 samples/sec   Loss 4.5125   LearningRate 0.0119   Epoch: 13   Global Step: 218850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:45,563-Speed 9135.94 samples/sec   Loss 4.4241   LearningRate 0.0119   Epoch: 13   Global Step: 218860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:46,642-Speed 9503.08 samples/sec   Loss 4.3530   LearningRate 0.0119   Epoch: 13   Global Step: 218870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:47,723-Speed 9476.42 samples/sec   Loss 4.5702   LearningRate 0.0119   Epoch: 13   Global Step: 218880   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:22:48,815-Speed 9381.70 samples/sec   Loss 4.4725   LearningRate 0.0119   Epoch: 13   Global Step: 218890   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:49,965-Speed 8905.38 samples/sec   Loss 4.5055   LearningRate 0.0119   Epoch: 13   Global Step: 218900   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:22:51,039-Speed 9538.87 samples/sec   Loss 4.5019   LearningRate 0.0118   Epoch: 13   Global Step: 218910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:22:52,099-Speed 9668.75 samples/sec   Loss 4.4399   LearningRate 0.0118   Epoch: 13   Global Step: 218920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:22:53,217-Speed 9170.60 samples/sec   Loss 4.4865   LearningRate 0.0118   Epoch: 13   Global Step: 218930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:22:54,342-Speed 9107.70 samples/sec   Loss 4.5703   LearningRate 0.0118   Epoch: 13   Global Step: 218940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:22:55,413-Speed 9562.96 samples/sec   Loss 4.5093   LearningRate 0.0118   Epoch: 13   Global Step: 218950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:22:56,485-Speed 9553.70 samples/sec   Loss 4.5563   LearningRate 0.0118   Epoch: 13   Global Step: 218960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:22:57,592-Speed 9256.12 samples/sec   Loss 4.5941   LearningRate 0.0118   Epoch: 13   Global Step: 218970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:22:58,686-Speed 9363.48 samples/sec   Loss 4.4746   LearningRate 0.0118   Epoch: 13   Global Step: 218980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:22:59,776-Speed 9401.31 samples/sec   Loss 4.4733   LearningRate 0.0118   Epoch: 13   Global Step: 218990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:00,908-Speed 9055.27 samples/sec   Loss 4.4977   LearningRate 0.0118   Epoch: 13   Global Step: 219000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:01,991-Speed 9459.64 samples/sec   Loss 4.4912   LearningRate 0.0118   Epoch: 13   Global Step: 219010   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:03,077-Speed 9436.07 samples/sec   Loss 4.4949   LearningRate 0.0118   Epoch: 13   Global Step: 219020   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:04,203-Speed 9098.49 samples/sec   Loss 4.5168   LearningRate 0.0118   Epoch: 13   Global Step: 219030   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:05,288-Speed 9454.39 samples/sec   Loss 4.5234   LearningRate 0.0118   Epoch: 13   Global Step: 219040   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:06,377-Speed 9403.89 samples/sec   Loss 4.5246   LearningRate 0.0118   Epoch: 13   Global Step: 219050   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:07,511-Speed 9036.09 samples/sec   Loss 4.5808   LearningRate 0.0118   Epoch: 13   Global Step: 219060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:08,573-Speed 9651.97 samples/sec   Loss 4.5264   LearningRate 0.0118   Epoch: 13   Global Step: 219070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:09,668-Speed 9356.90 samples/sec   Loss 4.4602   LearningRate 0.0118   Epoch: 13   Global Step: 219080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:10,723-Speed 9711.12 samples/sec   Loss 4.5148   LearningRate 0.0118   Epoch: 13   Global Step: 219090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:11,818-Speed 9350.58 samples/sec   Loss 4.4778   LearningRate 0.0118   Epoch: 13   Global Step: 219100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:12,913-Speed 9356.39 samples/sec   Loss 4.5813   LearningRate 0.0118   Epoch: 13   Global Step: 219110   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:23:14,011-Speed 9334.33 samples/sec   Loss 4.5488   LearningRate 0.0118   Epoch: 13   Global Step: 219120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:15,111-Speed 9318.68 samples/sec   Loss 4.5327   LearningRate 0.0118   Epoch: 13   Global Step: 219130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:16,239-Speed 9082.61 samples/sec   Loss 4.4884   LearningRate 0.0118   Epoch: 13   Global Step: 219140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:17,385-Speed 8937.99 samples/sec   Loss 4.6454   LearningRate 0.0118   Epoch: 13   Global Step: 219150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:18,552-Speed 8782.14 samples/sec   Loss 4.5499   LearningRate 0.0118   Epoch: 13   Global Step: 219160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:19,646-Speed 9366.52 samples/sec   Loss 4.4622   LearningRate 0.0118   Epoch: 13   Global Step: 219170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:20,708-Speed 9643.05 samples/sec   Loss 4.5229   LearningRate 0.0118   Epoch: 13   Global Step: 219180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:21,811-Speed 9287.97 samples/sec   Loss 4.5661   LearningRate 0.0118   Epoch: 13   Global Step: 219190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:22,911-Speed 9324.48 samples/sec   Loss 4.4918   LearningRate 0.0118   Epoch: 13   Global Step: 219200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:23,984-Speed 9548.28 samples/sec   Loss 4.5319   LearningRate 0.0118   Epoch: 13   Global Step: 219210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:25,070-Speed 9431.79 samples/sec   Loss 4.4495   LearningRate 0.0118   Epoch: 13   Global Step: 219220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:26,144-Speed 9543.25 samples/sec   Loss 4.5907   LearningRate 0.0118   Epoch: 13   Global Step: 219230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:27,225-Speed 9484.88 samples/sec   Loss 4.5226   LearningRate 0.0118   Epoch: 13   Global Step: 219240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:28,321-Speed 9340.15 samples/sec   Loss 4.5681   LearningRate 0.0118   Epoch: 13   Global Step: 219250   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:29,465-Speed 8959.62 samples/sec   Loss 4.5745   LearningRate 0.0118   Epoch: 13   Global Step: 219260   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:30,581-Speed 9183.88 samples/sec   Loss 4.4552   LearningRate 0.0118   Epoch: 13   Global Step: 219270   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:31,656-Speed 9527.96 samples/sec   Loss 4.5546   LearningRate 0.0118   Epoch: 13   Global Step: 219280   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:32,741-Speed 9444.38 samples/sec   Loss 4.4443   LearningRate 0.0118   Epoch: 13   Global Step: 219290   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:23:33,851-Speed 9223.37 samples/sec   Loss 4.6109   LearningRate 0.0118   Epoch: 13   Global Step: 219300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:34,991-Speed 8999.60 samples/sec   Loss 4.5741   LearningRate 0.0118   Epoch: 13   Global Step: 219310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:36,073-Speed 9470.65 samples/sec   Loss 4.6169   LearningRate 0.0118   Epoch: 13   Global Step: 219320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:37,152-Speed 9492.01 samples/sec   Loss 4.6597   LearningRate 0.0118   Epoch: 13   Global Step: 219330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:38,291-Speed 9001.47 samples/sec   Loss 4.6043   LearningRate 0.0118   Epoch: 13   Global Step: 219340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:39,359-Speed 9592.73 samples/sec   Loss 4.5755   LearningRate 0.0118   Epoch: 13   Global Step: 219350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:40,434-Speed 9528.39 samples/sec   Loss 4.5720   LearningRate 0.0118   Epoch: 13   Global Step: 219360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:41,472-Speed 9879.69 samples/sec   Loss 4.5315   LearningRate 0.0118   Epoch: 13   Global Step: 219370   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:42,507-Speed 9894.75 samples/sec   Loss 4.5411   LearningRate 0.0118   Epoch: 13   Global Step: 219380   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:43,588-Speed 9483.61 samples/sec   Loss 4.6150   LearningRate 0.0118   Epoch: 13   Global Step: 219390   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:44,704-Speed 9175.31 samples/sec   Loss 4.5699   LearningRate 0.0117   Epoch: 13   Global Step: 219400   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:45,796-Speed 9389.34 samples/sec   Loss 4.5619   LearningRate 0.0117   Epoch: 13   Global Step: 219410   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:46,840-Speed 9811.63 samples/sec   Loss 4.5440   LearningRate 0.0117   Epoch: 13   Global Step: 219420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:47,964-Speed 9113.25 samples/sec   Loss 4.5443   LearningRate 0.0117   Epoch: 13   Global Step: 219430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:49,080-Speed 9186.63 samples/sec   Loss 4.5661   LearningRate 0.0117   Epoch: 13   Global Step: 219440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:50,143-Speed 9632.48 samples/sec   Loss 4.4725   LearningRate 0.0117   Epoch: 13   Global Step: 219450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:51,202-Speed 9679.91 samples/sec   Loss 4.5263   LearningRate 0.0117   Epoch: 13   Global Step: 219460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:52,304-Speed 9298.15 samples/sec   Loss 4.5644   LearningRate 0.0117   Epoch: 13   Global Step: 219470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:53,440-Speed 9021.79 samples/sec   Loss 4.6131   LearningRate 0.0117   Epoch: 13   Global Step: 219480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:54,535-Speed 9356.65 samples/sec   Loss 4.6374   LearningRate 0.0117   Epoch: 13   Global Step: 219490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:55,596-Speed 9655.28 samples/sec   Loss 4.5230   LearningRate 0.0117   Epoch: 13   Global Step: 219500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:56,706-Speed 9227.95 samples/sec   Loss 4.6438   LearningRate 0.0117   Epoch: 13   Global Step: 219510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:57,849-Speed 8960.45 samples/sec   Loss 4.5214   LearningRate 0.0117   Epoch: 13   Global Step: 219520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:23:58,966-Speed 9180.29 samples/sec   Loss 4.5779   LearningRate 0.0117   Epoch: 13   Global Step: 219530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:00,070-Speed 9280.42 samples/sec   Loss 4.5977   LearningRate 0.0117   Epoch: 13   Global Step: 219540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:01,115-Speed 9808.38 samples/sec   Loss 4.5473   LearningRate 0.0117   Epoch: 13   Global Step: 219550   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:02,214-Speed 9321.47 samples/sec   Loss 4.6411   LearningRate 0.0117   Epoch: 13   Global Step: 219560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:03,288-Speed 9542.31 samples/sec   Loss 4.5450   LearningRate 0.0117   Epoch: 13   Global Step: 219570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:04,413-Speed 9107.54 samples/sec   Loss 4.5640   LearningRate 0.0117   Epoch: 13   Global Step: 219580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:05,509-Speed 9346.22 samples/sec   Loss 4.6020   LearningRate 0.0117   Epoch: 13   Global Step: 219590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:06,590-Speed 9477.17 samples/sec   Loss 4.5929   LearningRate 0.0117   Epoch: 13   Global Step: 219600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:07,679-Speed 9410.42 samples/sec   Loss 4.6217   LearningRate 0.0117   Epoch: 13   Global Step: 219610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:08,814-Speed 9028.19 samples/sec   Loss 4.5888   LearningRate 0.0117   Epoch: 13   Global Step: 219620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:09,915-Speed 9305.25 samples/sec   Loss 4.5712   LearningRate 0.0117   Epoch: 13   Global Step: 219630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:10,996-Speed 9478.67 samples/sec   Loss 4.5651   LearningRate 0.0117   Epoch: 13   Global Step: 219640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:12,097-Speed 9301.84 samples/sec   Loss 4.6129   LearningRate 0.0117   Epoch: 13   Global Step: 219650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:13,195-Speed 9329.13 samples/sec   Loss 4.5818   LearningRate 0.0117   Epoch: 13   Global Step: 219660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:14,274-Speed 9502.80 samples/sec   Loss 4.6003   LearningRate 0.0117   Epoch: 13   Global Step: 219670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:15,368-Speed 9363.70 samples/sec   Loss 4.6203   LearningRate 0.0117   Epoch: 13   Global Step: 219680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:16,461-Speed 9374.42 samples/sec   Loss 4.5713   LearningRate 0.0117   Epoch: 13   Global Step: 219690   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:17,555-Speed 9370.50 samples/sec   Loss 4.6517   LearningRate 0.0117   Epoch: 13   Global Step: 219700   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:18,676-Speed 9141.68 samples/sec   Loss 4.5913   LearningRate 0.0117   Epoch: 13   Global Step: 219710   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:19,744-Speed 9592.21 samples/sec   Loss 4.6493   LearningRate 0.0117   Epoch: 13   Global Step: 219720   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:20,808-Speed 9630.65 samples/sec   Loss 4.6184   LearningRate 0.0117   Epoch: 13   Global Step: 219730   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:21,890-Speed 9468.40 samples/sec   Loss 4.4847   LearningRate 0.0117   Epoch: 13   Global Step: 219740   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:22,988-Speed 9328.78 samples/sec   Loss 4.6231   LearningRate 0.0117   Epoch: 13   Global Step: 219750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:24,068-Speed 9495.14 samples/sec   Loss 4.6119   LearningRate 0.0117   Epoch: 13   Global Step: 219760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:25,145-Speed 9512.93 samples/sec   Loss 4.6335   LearningRate 0.0117   Epoch: 13   Global Step: 219770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:26,226-Speed 9485.41 samples/sec   Loss 4.5680   LearningRate 0.0117   Epoch: 13   Global Step: 219780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:27,291-Speed 9618.89 samples/sec   Loss 4.5049   LearningRate 0.0117   Epoch: 13   Global Step: 219790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:28,375-Speed 9448.03 samples/sec   Loss 4.5928   LearningRate 0.0117   Epoch: 13   Global Step: 219800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:29,450-Speed 9526.91 samples/sec   Loss 4.4876   LearningRate 0.0117   Epoch: 13   Global Step: 219810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:30,566-Speed 9184.82 samples/sec   Loss 4.5452   LearningRate 0.0117   Epoch: 13   Global Step: 219820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:31,634-Speed 9596.49 samples/sec   Loss 4.4803   LearningRate 0.0117   Epoch: 13   Global Step: 219830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:32,740-Speed 9257.53 samples/sec   Loss 4.5884   LearningRate 0.0117   Epoch: 13   Global Step: 219840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:33,878-Speed 9004.46 samples/sec   Loss 4.6153   LearningRate 0.0117   Epoch: 13   Global Step: 219850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:35,016-Speed 9005.74 samples/sec   Loss 4.5410   LearningRate 0.0117   Epoch: 13   Global Step: 219860   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:36,120-Speed 9279.38 samples/sec   Loss 4.6497   LearningRate 0.0117   Epoch: 13   Global Step: 219870   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:24:37,179-Speed 9678.54 samples/sec   Loss 4.5616   LearningRate 0.0117   Epoch: 13   Global Step: 219880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:38,275-Speed 9354.42 samples/sec   Loss 4.5848   LearningRate 0.0116   Epoch: 13   Global Step: 219890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:39,365-Speed 9399.76 samples/sec   Loss 4.5843   LearningRate 0.0116   Epoch: 13   Global Step: 219900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:40,426-Speed 9655.02 samples/sec   Loss 4.6322   LearningRate 0.0116   Epoch: 13   Global Step: 219910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:41,500-Speed 9533.57 samples/sec   Loss 4.6305   LearningRate 0.0116   Epoch: 13   Global Step: 219920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:42,596-Speed 9355.32 samples/sec   Loss 4.5368   LearningRate 0.0116   Epoch: 13   Global Step: 219930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:43,680-Speed 9444.78 samples/sec   Loss 4.6321   LearningRate 0.0116   Epoch: 13   Global Step: 219940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:44,746-Speed 9614.65 samples/sec   Loss 4.6479   LearningRate 0.0116   Epoch: 13   Global Step: 219950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:45,825-Speed 9498.89 samples/sec   Loss 4.5221   LearningRate 0.0116   Epoch: 13   Global Step: 219960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:46,913-Speed 9415.57 samples/sec   Loss 4.5598   LearningRate 0.0116   Epoch: 13   Global Step: 219970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:47,999-Speed 9434.33 samples/sec   Loss 4.4752   LearningRate 0.0116   Epoch: 13   Global Step: 219980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:49,154-Speed 8877.20 samples/sec   Loss 4.4904   LearningRate 0.0116   Epoch: 13   Global Step: 219990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:24:50,262-Speed 9246.77 samples/sec   Loss 4.5567   LearningRate 0.0116   Epoch: 13   Global Step: 220000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:25:12,455-[lfw][220000]XNorm: 8.136507
Training: 2022-04-11 20:25:12,455-[lfw][220000]Accuracy-Flip: 0.99567+-0.00238
Training: 2022-04-11 20:25:12,456-[lfw][220000]Accuracy-Highest: 0.99700
Training: 2022-04-11 20:25:38,090-[cfp_fp][220000]XNorm: 6.992759
Training: 2022-04-11 20:25:38,091-[cfp_fp][220000]Accuracy-Flip: 0.96543+-0.00753
Training: 2022-04-11 20:25:38,091-[cfp_fp][220000]Accuracy-Highest: 0.96771
Training: 2022-04-11 20:26:00,229-[agedb_30][220000]XNorm: 7.909859
Training: 2022-04-11 20:26:00,230-[agedb_30][220000]Accuracy-Flip: 0.96850+-0.00996
Training: 2022-04-11 20:26:00,230-[agedb_30][220000]Accuracy-Highest: 0.97033
Training: 2022-04-11 20:26:01,324-Speed 144.10 samples/sec   Loss 4.5845   LearningRate 0.0116   Epoch: 13   Global Step: 220010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:02,404-Speed 9489.37 samples/sec   Loss 4.6570   LearningRate 0.0116   Epoch: 13   Global Step: 220020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:03,458-Speed 9715.43 samples/sec   Loss 4.5880   LearningRate 0.0116   Epoch: 13   Global Step: 220030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:05,390-Speed 5301.83 samples/sec   Loss 4.6890   LearningRate 0.0116   Epoch: 13   Global Step: 220040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:06,464-Speed 9540.09 samples/sec   Loss 4.6199   LearningRate 0.0116   Epoch: 13   Global Step: 220050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:07,550-Speed 9439.49 samples/sec   Loss 4.5780   LearningRate 0.0116   Epoch: 13   Global Step: 220060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:08,693-Speed 8965.37 samples/sec   Loss 4.6834   LearningRate 0.0116   Epoch: 13   Global Step: 220070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:09,742-Speed 9766.88 samples/sec   Loss 4.5892   LearningRate 0.0116   Epoch: 13   Global Step: 220080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:10,805-Speed 9632.56 samples/sec   Loss 4.5998   LearningRate 0.0116   Epoch: 13   Global Step: 220090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:11,931-Speed 9101.90 samples/sec   Loss 4.6143   LearningRate 0.0116   Epoch: 13   Global Step: 220100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:13,023-Speed 9384.59 samples/sec   Loss 4.5450   LearningRate 0.0116   Epoch: 13   Global Step: 220110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:14,151-Speed 9079.38 samples/sec   Loss 4.6633   LearningRate 0.0116   Epoch: 13   Global Step: 220120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:15,192-Speed 9846.85 samples/sec   Loss 4.5947   LearningRate 0.0116   Epoch: 13   Global Step: 220130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:16,274-Speed 9470.10 samples/sec   Loss 4.6049   LearningRate 0.0116   Epoch: 13   Global Step: 220140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:17,376-Speed 9297.73 samples/sec   Loss 4.4895   LearningRate 0.0116   Epoch: 13   Global Step: 220150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:18,438-Speed 9645.45 samples/sec   Loss 4.6523   LearningRate 0.0116   Epoch: 13   Global Step: 220160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:19,522-Speed 9456.13 samples/sec   Loss 4.5720   LearningRate 0.0116   Epoch: 13   Global Step: 220170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:20,628-Speed 9265.78 samples/sec   Loss 4.6209   LearningRate 0.0116   Epoch: 13   Global Step: 220180   Fp16 Grad Scale: 262144   Required: 5 hours
Training: 2022-04-11 20:26:21,749-Speed 9135.24 samples/sec   Loss 4.5982   LearningRate 0.0116   Epoch: 13   Global Step: 220190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:22,856-Speed 9264.43 samples/sec   Loss 4.5177   LearningRate 0.0116   Epoch: 13   Global Step: 220200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:23,946-Speed 9397.59 samples/sec   Loss 4.6723   LearningRate 0.0116   Epoch: 13   Global Step: 220210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:25,024-Speed 9504.92 samples/sec   Loss 4.5833   LearningRate 0.0116   Epoch: 13   Global Step: 220220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:26,123-Speed 9320.12 samples/sec   Loss 4.6549   LearningRate 0.0116   Epoch: 13   Global Step: 220230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:27,267-Speed 8955.34 samples/sec   Loss 4.5953   LearningRate 0.0116   Epoch: 13   Global Step: 220240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:28,324-Speed 9693.72 samples/sec   Loss 4.5857   LearningRate 0.0116   Epoch: 13   Global Step: 220250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:29,442-Speed 9162.46 samples/sec   Loss 4.6305   LearningRate 0.0116   Epoch: 13   Global Step: 220260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:30,490-Speed 9777.20 samples/sec   Loss 4.5698   LearningRate 0.0116   Epoch: 13   Global Step: 220270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:31,575-Speed 9437.76 samples/sec   Loss 4.6330   LearningRate 0.0116   Epoch: 13   Global Step: 220280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:32,710-Speed 9028.51 samples/sec   Loss 4.5985   LearningRate 0.0116   Epoch: 13   Global Step: 220290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:33,804-Speed 9364.39 samples/sec   Loss 4.5910   LearningRate 0.0116   Epoch: 13   Global Step: 220300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:34,924-Speed 9154.11 samples/sec   Loss 4.5622   LearningRate 0.0116   Epoch: 13   Global Step: 220310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:36,062-Speed 9002.63 samples/sec   Loss 4.5766   LearningRate 0.0116   Epoch: 13   Global Step: 220320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:37,151-Speed 9406.67 samples/sec   Loss 4.6277   LearningRate 0.0116   Epoch: 13   Global Step: 220330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:38,279-Speed 9085.85 samples/sec   Loss 4.5972   LearningRate 0.0116   Epoch: 13   Global Step: 220340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:39,380-Speed 9305.71 samples/sec   Loss 4.6686   LearningRate 0.0116   Epoch: 13   Global Step: 220350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:40,484-Speed 9281.82 samples/sec   Loss 4.6949   LearningRate 0.0116   Epoch: 13   Global Step: 220360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:41,572-Speed 9414.08 samples/sec   Loss 4.5316   LearningRate 0.0116   Epoch: 13   Global Step: 220370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:42,631-Speed 9678.36 samples/sec   Loss 4.6007   LearningRate 0.0115   Epoch: 13   Global Step: 220380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:43,762-Speed 9062.51 samples/sec   Loss 4.6464   LearningRate 0.0115   Epoch: 13   Global Step: 220390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:44,853-Speed 9383.51 samples/sec   Loss 4.6200   LearningRate 0.0115   Epoch: 13   Global Step: 220400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:45,933-Speed 9489.74 samples/sec   Loss 4.6339   LearningRate 0.0115   Epoch: 13   Global Step: 220410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:47,038-Speed 9272.17 samples/sec   Loss 4.7108   LearningRate 0.0115   Epoch: 13   Global Step: 220420   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:48,137-Speed 9329.75 samples/sec   Loss 4.6476   LearningRate 0.0115   Epoch: 13   Global Step: 220430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:49,221-Speed 9446.87 samples/sec   Loss 4.5987   LearningRate 0.0115   Epoch: 13   Global Step: 220440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:50,305-Speed 9452.08 samples/sec   Loss 4.5986   LearningRate 0.0115   Epoch: 13   Global Step: 220450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:26:51,362-Speed 9697.07 samples/sec   Loss 4.5404   LearningRate 0.0115   Epoch: 13   Global Step: 220460   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:52,513-Speed 8901.14 samples/sec   Loss 4.8258   LearningRate 0.0115   Epoch: 13   Global Step: 220470   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:53,635-Speed 9149.32 samples/sec   Loss 4.5757   LearningRate 0.0115   Epoch: 13   Global Step: 220480   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:54,737-Speed 9301.86 samples/sec   Loss 4.5109   LearningRate 0.0115   Epoch: 13   Global Step: 220490   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:55,821-Speed 9446.49 samples/sec   Loss 4.6039   LearningRate 0.0115   Epoch: 13   Global Step: 220500   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:56,914-Speed 9382.40 samples/sec   Loss 4.7096   LearningRate 0.0115   Epoch: 13   Global Step: 220510   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:58,002-Speed 9410.43 samples/sec   Loss 4.6403   LearningRate 0.0115   Epoch: 13   Global Step: 220520   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:26:59,045-Speed 9826.40 samples/sec   Loss 4.6317   LearningRate 0.0115   Epoch: 13   Global Step: 220530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:00,110-Speed 9621.31 samples/sec   Loss 4.5648   LearningRate 0.0115   Epoch: 13   Global Step: 220540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:01,248-Speed 9005.29 samples/sec   Loss 4.5985   LearningRate 0.0115   Epoch: 13   Global Step: 220550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:02,392-Speed 8957.50 samples/sec   Loss 4.6597   LearningRate 0.0115   Epoch: 13   Global Step: 220560   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:03,464-Speed 9550.34 samples/sec   Loss 4.5754   LearningRate 0.0115   Epoch: 13   Global Step: 220570   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:04,556-Speed 9383.43 samples/sec   Loss 4.6107   LearningRate 0.0115   Epoch: 13   Global Step: 220580   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:05,691-Speed 9032.51 samples/sec   Loss 4.6012   LearningRate 0.0115   Epoch: 13   Global Step: 220590   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:06,779-Speed 9415.20 samples/sec   Loss 4.6499   LearningRate 0.0115   Epoch: 13   Global Step: 220600   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:07,923-Speed 8962.51 samples/sec   Loss 4.5854   LearningRate 0.0115   Epoch: 13   Global Step: 220610   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:09,010-Speed 9423.75 samples/sec   Loss 4.6265   LearningRate 0.0115   Epoch: 13   Global Step: 220620   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:10,127-Speed 9172.79 samples/sec   Loss 4.5983   LearningRate 0.0115   Epoch: 13   Global Step: 220630   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:11,254-Speed 9085.48 samples/sec   Loss 4.6492   LearningRate 0.0115   Epoch: 13   Global Step: 220640   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:12,355-Speed 9310.40 samples/sec   Loss 4.4900   LearningRate 0.0115   Epoch: 13   Global Step: 220650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:13,456-Speed 9304.76 samples/sec   Loss 4.6833   LearningRate 0.0115   Epoch: 13   Global Step: 220660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:14,562-Speed 9265.96 samples/sec   Loss 4.6485   LearningRate 0.0115   Epoch: 13   Global Step: 220670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:15,648-Speed 9428.74 samples/sec   Loss 4.6441   LearningRate 0.0115   Epoch: 13   Global Step: 220680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:16,731-Speed 9473.06 samples/sec   Loss 4.6246   LearningRate 0.0115   Epoch: 13   Global Step: 220690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:17,797-Speed 9607.66 samples/sec   Loss 4.6962   LearningRate 0.0115   Epoch: 13   Global Step: 220700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:18,916-Speed 9153.26 samples/sec   Loss 4.6237   LearningRate 0.0115   Epoch: 13   Global Step: 220710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:20,029-Speed 9211.42 samples/sec   Loss 4.6498   LearningRate 0.0115   Epoch: 13   Global Step: 220720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:21,112-Speed 9457.91 samples/sec   Loss 4.5979   LearningRate 0.0115   Epoch: 13   Global Step: 220730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:22,221-Speed 9239.77 samples/sec   Loss 4.6456   LearningRate 0.0115   Epoch: 13   Global Step: 220740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:23,303-Speed 9467.89 samples/sec   Loss 4.5894   LearningRate 0.0115   Epoch: 13   Global Step: 220750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 20:27:24,416-Speed 9209.92 samples/sec   Loss 4.6002   LearningRate 0.0115   Epoch: 13   Global Step: 220760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 20:27:25,513-Speed 9335.30 samples/sec   Loss 4.7069   LearningRate 0.0115   Epoch: 13   Global Step: 220770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 20:27:26,576-Speed 9644.43 samples/sec   Loss 4.5723   LearningRate 0.0115   Epoch: 13   Global Step: 220780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 20:27:27,709-Speed 9043.81 samples/sec   Loss 4.5585   LearningRate 0.0115   Epoch: 13   Global Step: 220790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 20:27:28,834-Speed 9106.52 samples/sec   Loss 4.6151   LearningRate 0.0115   Epoch: 13   Global Step: 220800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 20:27:29,960-Speed 9097.48 samples/sec   Loss 4.5582   LearningRate 0.0115   Epoch: 13   Global Step: 220810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 20:27:31,088-Speed 9080.11 samples/sec   Loss 4.5688   LearningRate 0.0115   Epoch: 13   Global Step: 220820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 20:27:32,138-Speed 9757.40 samples/sec   Loss 4.5754   LearningRate 0.0115   Epoch: 13   Global Step: 220830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 20:27:33,263-Speed 9110.40 samples/sec   Loss 4.6683   LearningRate 0.0115   Epoch: 13   Global Step: 220840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-11 20:27:34,355-Speed 9379.59 samples/sec   Loss 4.7314   LearningRate 0.0115   Epoch: 13   Global Step: 220850   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:35,461-Speed 9275.45 samples/sec   Loss 4.5030   LearningRate 0.0115   Epoch: 13   Global Step: 220860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:36,586-Speed 9106.72 samples/sec   Loss 4.6712   LearningRate 0.0114   Epoch: 13   Global Step: 220870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:37,663-Speed 9517.98 samples/sec   Loss 4.5258   LearningRate 0.0114   Epoch: 13   Global Step: 220880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:38,786-Speed 9119.19 samples/sec   Loss 4.6007   LearningRate 0.0114   Epoch: 13   Global Step: 220890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:39,854-Speed 9599.39 samples/sec   Loss 4.6183   LearningRate 0.0114   Epoch: 13   Global Step: 220900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:40,917-Speed 9637.59 samples/sec   Loss 4.6864   LearningRate 0.0114   Epoch: 13   Global Step: 220910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:42,031-Speed 9190.51 samples/sec   Loss 4.6515   LearningRate 0.0114   Epoch: 13   Global Step: 220920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:43,122-Speed 9395.19 samples/sec   Loss 4.5930   LearningRate 0.0114   Epoch: 13   Global Step: 220930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:44,261-Speed 8992.58 samples/sec   Loss 4.6678   LearningRate 0.0114   Epoch: 13   Global Step: 220940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:45,374-Speed 9209.45 samples/sec   Loss 4.6284   LearningRate 0.0114   Epoch: 13   Global Step: 220950   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:46,438-Speed 9629.86 samples/sec   Loss 4.7051   LearningRate 0.0114   Epoch: 13   Global Step: 220960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:47,526-Speed 9414.76 samples/sec   Loss 4.6432   LearningRate 0.0114   Epoch: 13   Global Step: 220970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:48,646-Speed 9150.65 samples/sec   Loss 4.5324   LearningRate 0.0114   Epoch: 13   Global Step: 220980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:49,727-Speed 9472.76 samples/sec   Loss 4.6827   LearningRate 0.0114   Epoch: 13   Global Step: 220990   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:50,828-Speed 9312.13 samples/sec   Loss 4.6046   LearningRate 0.0114   Epoch: 13   Global Step: 221000   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:51,916-Speed 9414.29 samples/sec   Loss 4.5940   LearningRate 0.0114   Epoch: 13   Global Step: 221010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:53,017-Speed 9317.11 samples/sec   Loss 4.6575   LearningRate 0.0114   Epoch: 13   Global Step: 221020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:54,104-Speed 9426.98 samples/sec   Loss 4.6290   LearningRate 0.0114   Epoch: 13   Global Step: 221030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:55,206-Speed 9301.54 samples/sec   Loss 4.6861   LearningRate 0.0114   Epoch: 13   Global Step: 221040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:56,302-Speed 9343.45 samples/sec   Loss 4.5839   LearningRate 0.0114   Epoch: 13   Global Step: 221050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:27:57,361-Speed 9675.18 samples/sec   Loss 4.6731   LearningRate 0.0114   Epoch: 13   Global Step: 221060   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:58,470-Speed 9239.96 samples/sec   Loss 4.5908   LearningRate 0.0114   Epoch: 13   Global Step: 221070   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:27:59,536-Speed 9608.58 samples/sec   Loss 4.6781   LearningRate 0.0114   Epoch: 13   Global Step: 221080   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:00,599-Speed 9638.02 samples/sec   Loss 4.7107   LearningRate 0.0114   Epoch: 13   Global Step: 221090   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:01,698-Speed 9326.10 samples/sec   Loss 4.6335   LearningRate 0.0114   Epoch: 13   Global Step: 221100   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:02,869-Speed 8748.76 samples/sec   Loss 4.7191   LearningRate 0.0114   Epoch: 13   Global Step: 221110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:04,015-Speed 8941.00 samples/sec   Loss 4.6332   LearningRate 0.0114   Epoch: 13   Global Step: 221120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:05,108-Speed 9377.92 samples/sec   Loss 4.6438   LearningRate 0.0114   Epoch: 13   Global Step: 221130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:06,207-Speed 9323.89 samples/sec   Loss 4.6544   LearningRate 0.0114   Epoch: 13   Global Step: 221140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:07,293-Speed 9427.41 samples/sec   Loss 4.6714   LearningRate 0.0114   Epoch: 13   Global Step: 221150   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:08,396-Speed 9294.95 samples/sec   Loss 4.6611   LearningRate 0.0114   Epoch: 13   Global Step: 221160   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:09,500-Speed 9277.32 samples/sec   Loss 4.6123   LearningRate 0.0114   Epoch: 13   Global Step: 221170   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:10,555-Speed 9713.84 samples/sec   Loss 4.6723   LearningRate 0.0114   Epoch: 13   Global Step: 221180   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:11,667-Speed 9213.33 samples/sec   Loss 4.6745   LearningRate 0.0114   Epoch: 13   Global Step: 221190   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:12,744-Speed 9513.63 samples/sec   Loss 4.5585   LearningRate 0.0114   Epoch: 13   Global Step: 221200   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:13,874-Speed 9072.28 samples/sec   Loss 4.6544   LearningRate 0.0114   Epoch: 13   Global Step: 221210   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:15,005-Speed 9058.88 samples/sec   Loss 4.6537   LearningRate 0.0114   Epoch: 13   Global Step: 221220   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:16,079-Speed 9534.39 samples/sec   Loss 4.6599   LearningRate 0.0114   Epoch: 13   Global Step: 221230   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:17,129-Speed 9769.18 samples/sec   Loss 4.5573   LearningRate 0.0114   Epoch: 13   Global Step: 221240   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:18,252-Speed 9119.51 samples/sec   Loss 4.6241   LearningRate 0.0114   Epoch: 13   Global Step: 221250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:19,355-Speed 9287.34 samples/sec   Loss 4.6645   LearningRate 0.0114   Epoch: 13   Global Step: 221260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:20,426-Speed 9564.40 samples/sec   Loss 4.7039   LearningRate 0.0114   Epoch: 13   Global Step: 221270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:21,530-Speed 9287.92 samples/sec   Loss 4.7186   LearningRate 0.0114   Epoch: 13   Global Step: 221280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:22,629-Speed 9319.80 samples/sec   Loss 4.6755   LearningRate 0.0114   Epoch: 13   Global Step: 221290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:23,817-Speed 8628.79 samples/sec   Loss 4.7429   LearningRate 0.0114   Epoch: 13   Global Step: 221300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:24,898-Speed 9473.85 samples/sec   Loss 4.6348   LearningRate 0.0114   Epoch: 13   Global Step: 221310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:26,017-Speed 9157.43 samples/sec   Loss 4.6182   LearningRate 0.0114   Epoch: 13   Global Step: 221320   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:27,086-Speed 9579.40 samples/sec   Loss 4.6110   LearningRate 0.0114   Epoch: 13   Global Step: 221330   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:28,205-Speed 9162.97 samples/sec   Loss 4.6084   LearningRate 0.0114   Epoch: 13   Global Step: 221340   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:29,372-Speed 8776.79 samples/sec   Loss 4.6194   LearningRate 0.0114   Epoch: 13   Global Step: 221350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:30,463-Speed 9394.10 samples/sec   Loss 4.6786   LearningRate 0.0113   Epoch: 13   Global Step: 221360   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:31,549-Speed 9436.18 samples/sec   Loss 4.7142   LearningRate 0.0113   Epoch: 13   Global Step: 221370   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:32,725-Speed 8713.04 samples/sec   Loss 4.6216   LearningRate 0.0113   Epoch: 13   Global Step: 221380   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:33,804-Speed 9496.27 samples/sec   Loss 4.6669   LearningRate 0.0113   Epoch: 13   Global Step: 221390   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:34,889-Speed 9460.86 samples/sec   Loss 4.6750   LearningRate 0.0113   Epoch: 13   Global Step: 221400   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:35,961-Speed 9554.18 samples/sec   Loss 4.5601   LearningRate 0.0113   Epoch: 13   Global Step: 221410   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:37,007-Speed 9794.77 samples/sec   Loss 4.6553   LearningRate 0.0113   Epoch: 13   Global Step: 221420   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:38,087-Speed 9485.39 samples/sec   Loss 4.6558   LearningRate 0.0113   Epoch: 13   Global Step: 221430   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:39,193-Speed 9266.49 samples/sec   Loss 4.6848   LearningRate 0.0113   Epoch: 13   Global Step: 221440   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:40,339-Speed 8941.92 samples/sec   Loss 4.6388   LearningRate 0.0113   Epoch: 13   Global Step: 221450   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:41,415-Speed 9527.32 samples/sec   Loss 4.7367   LearningRate 0.0113   Epoch: 13   Global Step: 221460   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:42,539-Speed 9110.91 samples/sec   Loss 4.6359   LearningRate 0.0113   Epoch: 13   Global Step: 221470   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:43,696-Speed 8855.67 samples/sec   Loss 4.6439   LearningRate 0.0113   Epoch: 13   Global Step: 221480   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:44,770-Speed 9546.54 samples/sec   Loss 4.6566   LearningRate 0.0113   Epoch: 13   Global Step: 221490   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:45,870-Speed 9309.93 samples/sec   Loss 4.6571   LearningRate 0.0113   Epoch: 13   Global Step: 221500   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:46,954-Speed 9449.30 samples/sec   Loss 4.5637   LearningRate 0.0113   Epoch: 13   Global Step: 221510   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:48,059-Speed 9278.45 samples/sec   Loss 4.6189   LearningRate 0.0113   Epoch: 13   Global Step: 221520   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:49,146-Speed 9424.84 samples/sec   Loss 4.6171   LearningRate 0.0113   Epoch: 13   Global Step: 221530   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:50,228-Speed 9473.79 samples/sec   Loss 4.6731   LearningRate 0.0113   Epoch: 13   Global Step: 221540   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:28:51,327-Speed 9330.81 samples/sec   Loss 4.6765   LearningRate 0.0113   Epoch: 13   Global Step: 221550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:52,422-Speed 9352.99 samples/sec   Loss 4.6470   LearningRate 0.0113   Epoch: 13   Global Step: 221560   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:53,521-Speed 9327.39 samples/sec   Loss 4.6327   LearningRate 0.0113   Epoch: 13   Global Step: 221570   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:54,601-Speed 9486.15 samples/sec   Loss 4.6671   LearningRate 0.0113   Epoch: 13   Global Step: 221580   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:55,724-Speed 9122.12 samples/sec   Loss 4.5925   LearningRate 0.0113   Epoch: 13   Global Step: 221590   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:56,830-Speed 9260.32 samples/sec   Loss 4.7398   LearningRate 0.0113   Epoch: 13   Global Step: 221600   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:57,908-Speed 9515.12 samples/sec   Loss 4.6945   LearningRate 0.0113   Epoch: 13   Global Step: 221610   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:28:59,000-Speed 9385.98 samples/sec   Loss 4.6074   LearningRate 0.0113   Epoch: 13   Global Step: 221620   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:00,098-Speed 9328.47 samples/sec   Loss 4.7857   LearningRate 0.0113   Epoch: 13   Global Step: 221630   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:01,155-Speed 9690.66 samples/sec   Loss 4.6613   LearningRate 0.0113   Epoch: 13   Global Step: 221640   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:02,246-Speed 9395.46 samples/sec   Loss 4.6479   LearningRate 0.0113   Epoch: 13   Global Step: 221650   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:03,349-Speed 9284.07 samples/sec   Loss 4.6665   LearningRate 0.0113   Epoch: 13   Global Step: 221660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:04,426-Speed 9514.22 samples/sec   Loss 4.6605   LearningRate 0.0113   Epoch: 13   Global Step: 221670   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:05,475-Speed 9770.08 samples/sec   Loss 4.6518   LearningRate 0.0113   Epoch: 13   Global Step: 221680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:06,549-Speed 9541.03 samples/sec   Loss 4.5916   LearningRate 0.0113   Epoch: 13   Global Step: 221690   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:07,658-Speed 9232.66 samples/sec   Loss 4.6379   LearningRate 0.0113   Epoch: 13   Global Step: 221700   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:08,733-Speed 9535.30 samples/sec   Loss 4.6717   LearningRate 0.0113   Epoch: 13   Global Step: 221710   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:09,795-Speed 9641.58 samples/sec   Loss 4.6374   LearningRate 0.0113   Epoch: 13   Global Step: 221720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:10,882-Speed 9428.70 samples/sec   Loss 4.6947   LearningRate 0.0113   Epoch: 13   Global Step: 221730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:11,993-Speed 9224.35 samples/sec   Loss 4.6669   LearningRate 0.0113   Epoch: 13   Global Step: 221740   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:13,092-Speed 9329.55 samples/sec   Loss 4.5835   LearningRate 0.0113   Epoch: 13   Global Step: 221750   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:14,219-Speed 9087.40 samples/sec   Loss 4.6194   LearningRate 0.0113   Epoch: 13   Global Step: 221760   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:15,327-Speed 9250.97 samples/sec   Loss 4.6592   LearningRate 0.0113   Epoch: 13   Global Step: 221770   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:16,426-Speed 9320.87 samples/sec   Loss 4.6014   LearningRate 0.0113   Epoch: 13   Global Step: 221780   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:17,552-Speed 9100.25 samples/sec   Loss 4.5678   LearningRate 0.0113   Epoch: 13   Global Step: 221790   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:18,659-Speed 9254.96 samples/sec   Loss 4.5870   LearningRate 0.0113   Epoch: 13   Global Step: 221800   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:19,745-Speed 9438.53 samples/sec   Loss 4.5876   LearningRate 0.0113   Epoch: 13   Global Step: 221810   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:20,841-Speed 9342.16 samples/sec   Loss 4.7051   LearningRate 0.0113   Epoch: 13   Global Step: 221820   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:21,918-Speed 9523.31 samples/sec   Loss 4.6362   LearningRate 0.0113   Epoch: 13   Global Step: 221830   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:23,013-Speed 9350.57 samples/sec   Loss 4.6646   LearningRate 0.0113   Epoch: 13   Global Step: 221840   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:24,145-Speed 9056.81 samples/sec   Loss 4.6167   LearningRate 0.0113   Epoch: 13   Global Step: 221850   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:25,209-Speed 9628.17 samples/sec   Loss 4.7234   LearningRate 0.0112   Epoch: 13   Global Step: 221860   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:26,304-Speed 9351.68 samples/sec   Loss 4.6367   LearningRate 0.0112   Epoch: 13   Global Step: 221870   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:27,369-Speed 9625.03 samples/sec   Loss 4.7513   LearningRate 0.0112   Epoch: 13   Global Step: 221880   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:28,488-Speed 9160.31 samples/sec   Loss 4.7321   LearningRate 0.0112   Epoch: 13   Global Step: 221890   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:29,653-Speed 8793.16 samples/sec   Loss 4.6876   LearningRate 0.0112   Epoch: 13   Global Step: 221900   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:30,732-Speed 9493.44 samples/sec   Loss 4.7924   LearningRate 0.0112   Epoch: 13   Global Step: 221910   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:31,794-Speed 9650.59 samples/sec   Loss 4.6301   LearningRate 0.0112   Epoch: 13   Global Step: 221920   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:32,916-Speed 9135.23 samples/sec   Loss 4.7504   LearningRate 0.0112   Epoch: 13   Global Step: 221930   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:34,061-Speed 8949.26 samples/sec   Loss 4.6103   LearningRate 0.0112   Epoch: 13   Global Step: 221940   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:35,191-Speed 9059.98 samples/sec   Loss 4.7229   LearningRate 0.0112   Epoch: 13   Global Step: 221950   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:29:36,260-Speed 9591.63 samples/sec   Loss 4.6626   LearningRate 0.0112   Epoch: 13   Global Step: 221960   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:37,339-Speed 9492.56 samples/sec   Loss 4.6798   LearningRate 0.0112   Epoch: 13   Global Step: 221970   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:38,423-Speed 9450.84 samples/sec   Loss 4.7227   LearningRate 0.0112   Epoch: 13   Global Step: 221980   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:39,482-Speed 9676.25 samples/sec   Loss 4.7522   LearningRate 0.0112   Epoch: 13   Global Step: 221990   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:29:40,586-Speed 9285.05 samples/sec   Loss 4.6769   LearningRate 0.0112   Epoch: 13   Global Step: 222000   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:30:02,521-[lfw][222000]XNorm: 7.941319
Training: 2022-04-11 20:30:02,522-[lfw][222000]Accuracy-Flip: 0.99583+-0.00291
Training: 2022-04-11 20:30:02,522-[lfw][222000]Accuracy-Highest: 0.99700
Training: 2022-04-11 20:30:27,847-[cfp_fp][222000]XNorm: 6.859523
Training: 2022-04-11 20:30:27,848-[cfp_fp][222000]Accuracy-Flip: 0.96757+-0.00864
Training: 2022-04-11 20:30:27,848-[cfp_fp][222000]Accuracy-Highest: 0.96771
Training: 2022-04-11 20:30:49,707-[agedb_30][222000]XNorm: 7.695806
Training: 2022-04-11 20:30:49,707-[agedb_30][222000]Accuracy-Flip: 0.96867+-0.00951
Training: 2022-04-11 20:30:49,707-[agedb_30][222000]Accuracy-Highest: 0.97033
Training: 2022-04-11 20:30:50,799-Speed 145.84 samples/sec   Loss 4.6504   LearningRate 0.0112   Epoch: 13   Global Step: 222010   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:30:51,862-Speed 9644.51 samples/sec   Loss 4.6524   LearningRate 0.0112   Epoch: 13   Global Step: 222020   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:30:52,963-Speed 9308.25 samples/sec   Loss 4.6034   LearningRate 0.0112   Epoch: 13   Global Step: 222030   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:30:54,076-Speed 9203.49 samples/sec   Loss 4.5794   LearningRate 0.0112   Epoch: 13   Global Step: 222040   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:30:55,149-Speed 9552.49 samples/sec   Loss 4.7122   LearningRate 0.0112   Epoch: 13   Global Step: 222050   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:30:56,270-Speed 9142.74 samples/sec   Loss 4.7014   LearningRate 0.0112   Epoch: 13   Global Step: 222060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:30:57,423-Speed 8884.84 samples/sec   Loss 4.7026   LearningRate 0.0112   Epoch: 13   Global Step: 222070   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:30:58,497-Speed 9544.53 samples/sec   Loss 4.5587   LearningRate 0.0112   Epoch: 13   Global Step: 222080   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:30:59,587-Speed 9395.93 samples/sec   Loss 4.6813   LearningRate 0.0112   Epoch: 13   Global Step: 222090   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:00,696-Speed 9243.55 samples/sec   Loss 4.6275   LearningRate 0.0112   Epoch: 13   Global Step: 222100   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:01,778-Speed 9468.93 samples/sec   Loss 4.7020   LearningRate 0.0112   Epoch: 13   Global Step: 222110   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:02,887-Speed 9238.40 samples/sec   Loss 4.7710   LearningRate 0.0112   Epoch: 13   Global Step: 222120   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:04,016-Speed 9078.64 samples/sec   Loss 4.7219   LearningRate 0.0112   Epoch: 13   Global Step: 222130   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:05,137-Speed 9132.76 samples/sec   Loss 4.6930   LearningRate 0.0112   Epoch: 13   Global Step: 222140   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:06,247-Speed 9230.89 samples/sec   Loss 4.6522   LearningRate 0.0112   Epoch: 13   Global Step: 222150   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:07,333-Speed 9438.43 samples/sec   Loss 4.7081   LearningRate 0.0112   Epoch: 13   Global Step: 222160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:08,448-Speed 9186.82 samples/sec   Loss 4.7262   LearningRate 0.0112   Epoch: 13   Global Step: 222170   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:09,547-Speed 9324.87 samples/sec   Loss 4.6243   LearningRate 0.0112   Epoch: 13   Global Step: 222180   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:10,626-Speed 9497.83 samples/sec   Loss 4.6587   LearningRate 0.0112   Epoch: 13   Global Step: 222190   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:11,684-Speed 9679.70 samples/sec   Loss 4.6319   LearningRate 0.0112   Epoch: 13   Global Step: 222200   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:12,772-Speed 9415.97 samples/sec   Loss 4.6208   LearningRate 0.0112   Epoch: 13   Global Step: 222210   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:14,176-Speed 7299.25 samples/sec   Loss 4.6701   LearningRate 0.0112   Epoch: 13   Global Step: 222220   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:15,255-Speed 9495.34 samples/sec   Loss 4.6144   LearningRate 0.0112   Epoch: 13   Global Step: 222230   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:16,393-Speed 9005.67 samples/sec   Loss 4.6273   LearningRate 0.0112   Epoch: 13   Global Step: 222240   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-11 20:31:17,483-Speed 9396.29 samples/sec   Loss 4.6470   LearningRate 0.0112   Epoch: 13   Global Step: 222250   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:18,618-Speed 9030.82 samples/sec   Loss 4.6804   LearningRate 0.0112   Epoch: 13   Global Step: 222260   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:19,665-Speed 9785.97 samples/sec   Loss 4.6268   LearningRate 0.0112   Epoch: 13   Global Step: 222270   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:20,780-Speed 9191.24 samples/sec   Loss 4.6004   LearningRate 0.0112   Epoch: 13   Global Step: 222280   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:21,913-Speed 9039.59 samples/sec   Loss 4.6631   LearningRate 0.0112   Epoch: 13   Global Step: 222290   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:23,007-Speed 9370.26 samples/sec   Loss 4.7193   LearningRate 0.0112   Epoch: 13   Global Step: 222300   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:24,076-Speed 9581.01 samples/sec   Loss 4.6623   LearningRate 0.0112   Epoch: 13   Global Step: 222310   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:25,176-Speed 9317.15 samples/sec   Loss 4.7275   LearningRate 0.0112   Epoch: 13   Global Step: 222320   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:26,268-Speed 9374.75 samples/sec   Loss 4.6121   LearningRate 0.0112   Epoch: 13   Global Step: 222330   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:27,338-Speed 9582.91 samples/sec   Loss 4.5667   LearningRate 0.0112   Epoch: 13   Global Step: 222340   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:28,412-Speed 9541.55 samples/sec   Loss 4.6689   LearningRate 0.0112   Epoch: 13   Global Step: 222350   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:29,490-Speed 9501.11 samples/sec   Loss 4.7737   LearningRate 0.0111   Epoch: 13   Global Step: 222360   Fp16 Grad Scale: 131072   Required: 5 hours
Training: 2022-04-11 20:31:30,563-Speed 9547.89 samples/sec   Loss 4.5572   LearningRate 0.0111   Epoch: 13   Global Step: 222370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:31,647-Speed 9454.90 samples/sec   Loss 4.7353   LearningRate 0.0111   Epoch: 13   Global Step: 222380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:32,715-Speed 9597.97 samples/sec   Loss 4.7152   LearningRate 0.0111   Epoch: 13   Global Step: 222390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:33,765-Speed 9754.25 samples/sec   Loss 4.6727   LearningRate 0.0111   Epoch: 13   Global Step: 222400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:34,844-Speed 9495.88 samples/sec   Loss 4.6569   LearningRate 0.0111   Epoch: 13   Global Step: 222410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:35,932-Speed 9416.32 samples/sec   Loss 4.6965   LearningRate 0.0111   Epoch: 13   Global Step: 222420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:37,047-Speed 9192.74 samples/sec   Loss 4.7600   LearningRate 0.0111   Epoch: 13   Global Step: 222430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:38,152-Speed 9272.13 samples/sec   Loss 4.6529   LearningRate 0.0111   Epoch: 13   Global Step: 222440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:39,255-Speed 9296.84 samples/sec   Loss 4.7369   LearningRate 0.0111   Epoch: 13   Global Step: 222450   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:31:40,330-Speed 9529.05 samples/sec   Loss 4.7891   LearningRate 0.0111   Epoch: 13   Global Step: 222460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:41,398-Speed 9594.66 samples/sec   Loss 4.8462   LearningRate 0.0111   Epoch: 13   Global Step: 222470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:31:42,508-Speed 9229.19 samples/sec   Loss 4.6565   LearningRate 0.0111   Epoch: 13   Global Step: 222480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:31:43,637-Speed 9074.38 samples/sec   Loss 4.7906   LearningRate 0.0111   Epoch: 13   Global Step: 222490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:31:44,704-Speed 9604.05 samples/sec   Loss 4.6987   LearningRate 0.0111   Epoch: 13   Global Step: 222500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:31:45,781-Speed 9509.81 samples/sec   Loss 4.6343   LearningRate 0.0111   Epoch: 13   Global Step: 222510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:31:46,890-Speed 9236.93 samples/sec   Loss 4.7195   LearningRate 0.0111   Epoch: 13   Global Step: 222520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:31:47,987-Speed 9342.23 samples/sec   Loss 4.6947   LearningRate 0.0111   Epoch: 13   Global Step: 222530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:31:49,080-Speed 9375.79 samples/sec   Loss 4.6033   LearningRate 0.0111   Epoch: 13   Global Step: 222540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:31:50,173-Speed 9372.99 samples/sec   Loss 4.6444   LearningRate 0.0111   Epoch: 13   Global Step: 222550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:31:51,306-Speed 9049.47 samples/sec   Loss 4.7493   LearningRate 0.0111   Epoch: 13   Global Step: 222560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:31:52,388-Speed 9474.30 samples/sec   Loss 4.7128   LearningRate 0.0111   Epoch: 13   Global Step: 222570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:53,492-Speed 9279.07 samples/sec   Loss 4.6645   LearningRate 0.0111   Epoch: 13   Global Step: 222580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:54,660-Speed 8772.28 samples/sec   Loss 4.6817   LearningRate 0.0111   Epoch: 13   Global Step: 222590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:55,732-Speed 9556.97 samples/sec   Loss 4.6655   LearningRate 0.0111   Epoch: 13   Global Step: 222600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:56,775-Speed 9824.31 samples/sec   Loss 4.6720   LearningRate 0.0111   Epoch: 13   Global Step: 222610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:57,853-Speed 9503.11 samples/sec   Loss 4.6786   LearningRate 0.0111   Epoch: 13   Global Step: 222620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:31:58,928-Speed 9527.71 samples/sec   Loss 4.7278   LearningRate 0.0111   Epoch: 13   Global Step: 222630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:00,024-Speed 9350.26 samples/sec   Loss 4.6999   LearningRate 0.0111   Epoch: 13   Global Step: 222640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:01,134-Speed 9234.16 samples/sec   Loss 4.7140   LearningRate 0.0111   Epoch: 13   Global Step: 222650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:02,230-Speed 9350.83 samples/sec   Loss 4.8009   LearningRate 0.0111   Epoch: 13   Global Step: 222660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:03,333-Speed 9287.36 samples/sec   Loss 4.7656   LearningRate 0.0111   Epoch: 13   Global Step: 222670   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:32:04,395-Speed 9649.02 samples/sec   Loss 4.6879   LearningRate 0.0111   Epoch: 13   Global Step: 222680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:05,487-Speed 9382.32 samples/sec   Loss 4.7091   LearningRate 0.0111   Epoch: 13   Global Step: 222690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:06,586-Speed 9327.13 samples/sec   Loss 4.7695   LearningRate 0.0111   Epoch: 13   Global Step: 222700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:07,658-Speed 9558.49 samples/sec   Loss 4.8514   LearningRate 0.0111   Epoch: 13   Global Step: 222710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:08,777-Speed 9155.85 samples/sec   Loss 4.7602   LearningRate 0.0111   Epoch: 13   Global Step: 222720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:09,878-Speed 9306.84 samples/sec   Loss 4.6883   LearningRate 0.0111   Epoch: 13   Global Step: 222730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:10,950-Speed 9556.53 samples/sec   Loss 4.7315   LearningRate 0.0111   Epoch: 13   Global Step: 222740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:12,048-Speed 9328.95 samples/sec   Loss 4.6575   LearningRate 0.0111   Epoch: 13   Global Step: 222750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:13,161-Speed 9207.36 samples/sec   Loss 4.6914   LearningRate 0.0111   Epoch: 13   Global Step: 222760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:14,278-Speed 9176.98 samples/sec   Loss 4.7405   LearningRate 0.0111   Epoch: 13   Global Step: 222770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:15,320-Speed 9825.31 samples/sec   Loss 4.6517   LearningRate 0.0111   Epoch: 13   Global Step: 222780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:16,403-Speed 9467.76 samples/sec   Loss 4.7087   LearningRate 0.0111   Epoch: 13   Global Step: 222790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:17,471-Speed 9593.40 samples/sec   Loss 4.6491   LearningRate 0.0111   Epoch: 13   Global Step: 222800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:18,613-Speed 8972.53 samples/sec   Loss 4.7812   LearningRate 0.0111   Epoch: 13   Global Step: 222810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:19,714-Speed 9305.60 samples/sec   Loss 4.6604   LearningRate 0.0111   Epoch: 13   Global Step: 222820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:20,850-Speed 9018.50 samples/sec   Loss 4.6273   LearningRate 0.0111   Epoch: 13   Global Step: 222830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:21,947-Speed 9337.08 samples/sec   Loss 4.6891   LearningRate 0.0111   Epoch: 13   Global Step: 222840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:23,053-Speed 9264.76 samples/sec   Loss 4.6126   LearningRate 0.0111   Epoch: 13   Global Step: 222850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:24,114-Speed 9663.44 samples/sec   Loss 4.7214   LearningRate 0.0110   Epoch: 13   Global Step: 222860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:25,183-Speed 9580.01 samples/sec   Loss 4.7470   LearningRate 0.0110   Epoch: 13   Global Step: 222870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:26,297-Speed 9200.93 samples/sec   Loss 4.7325   LearningRate 0.0110   Epoch: 13   Global Step: 222880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:27,356-Speed 9677.81 samples/sec   Loss 4.6072   LearningRate 0.0110   Epoch: 13   Global Step: 222890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:28,410-Speed 9719.65 samples/sec   Loss 4.6836   LearningRate 0.0110   Epoch: 13   Global Step: 222900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:29,492-Speed 9465.57 samples/sec   Loss 4.7232   LearningRate 0.0110   Epoch: 13   Global Step: 222910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:30,602-Speed 9231.57 samples/sec   Loss 4.7059   LearningRate 0.0110   Epoch: 13   Global Step: 222920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:31,733-Speed 9063.48 samples/sec   Loss 4.6469   LearningRate 0.0110   Epoch: 13   Global Step: 222930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:32,821-Speed 9419.16 samples/sec   Loss 4.6663   LearningRate 0.0110   Epoch: 13   Global Step: 222940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:33,904-Speed 9454.48 samples/sec   Loss 4.7567   LearningRate 0.0110   Epoch: 13   Global Step: 222950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:35,027-Speed 9125.58 samples/sec   Loss 4.6350   LearningRate 0.0110   Epoch: 13   Global Step: 222960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:36,135-Speed 9246.45 samples/sec   Loss 4.7420   LearningRate 0.0110   Epoch: 13   Global Step: 222970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:37,198-Speed 9642.07 samples/sec   Loss 4.7115   LearningRate 0.0110   Epoch: 13   Global Step: 222980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:32:38,292-Speed 9372.21 samples/sec   Loss 4.7173   LearningRate 0.0110   Epoch: 13   Global Step: 222990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:32:39,405-Speed 9204.63 samples/sec   Loss 4.6770   LearningRate 0.0110   Epoch: 13   Global Step: 223000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:32:40,458-Speed 9735.97 samples/sec   Loss 4.7223   LearningRate 0.0110   Epoch: 13   Global Step: 223010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:32:41,539-Speed 9474.73 samples/sec   Loss 4.7892   LearningRate 0.0110   Epoch: 13   Global Step: 223020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:32:43,448-Speed 5367.87 samples/sec   Loss 4.7228   LearningRate 0.0110   Epoch: 13   Global Step: 223030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:32:44,536-Speed 9415.51 samples/sec   Loss 4.7557   LearningRate 0.0110   Epoch: 13   Global Step: 223040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:32:45,642-Speed 9266.88 samples/sec   Loss 4.5952   LearningRate 0.0110   Epoch: 13   Global Step: 223050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:32:46,767-Speed 9107.17 samples/sec   Loss 4.8346   LearningRate 0.0110   Epoch: 13   Global Step: 223060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:32:47,847-Speed 9484.92 samples/sec   Loss 4.7509   LearningRate 0.0110   Epoch: 13   Global Step: 223070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:32:48,942-Speed 9361.32 samples/sec   Loss 4.7397   LearningRate 0.0110   Epoch: 13   Global Step: 223080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:50,050-Speed 9245.04 samples/sec   Loss 4.7949   LearningRate 0.0110   Epoch: 13   Global Step: 223090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:51,133-Speed 9463.30 samples/sec   Loss 4.6926   LearningRate 0.0110   Epoch: 13   Global Step: 223100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:52,248-Speed 9191.51 samples/sec   Loss 4.6371   LearningRate 0.0110   Epoch: 13   Global Step: 223110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:53,333-Speed 9437.49 samples/sec   Loss 4.6235   LearningRate 0.0110   Epoch: 13   Global Step: 223120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:54,390-Speed 9697.47 samples/sec   Loss 4.6944   LearningRate 0.0110   Epoch: 13   Global Step: 223130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:55,448-Speed 9679.70 samples/sec   Loss 4.7570   LearningRate 0.0110   Epoch: 13   Global Step: 223140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:56,584-Speed 9022.89 samples/sec   Loss 4.7230   LearningRate 0.0110   Epoch: 13   Global Step: 223150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:57,637-Speed 9731.80 samples/sec   Loss 4.6474   LearningRate 0.0110   Epoch: 13   Global Step: 223160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:58,751-Speed 9194.53 samples/sec   Loss 4.6888   LearningRate 0.0110   Epoch: 13   Global Step: 223170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:32:59,822-Speed 9560.93 samples/sec   Loss 4.6842   LearningRate 0.0110   Epoch: 13   Global Step: 223180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:00,914-Speed 9388.94 samples/sec   Loss 4.7251   LearningRate 0.0110   Epoch: 13   Global Step: 223190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:02,049-Speed 9026.50 samples/sec   Loss 4.6637   LearningRate 0.0110   Epoch: 13   Global Step: 223200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:03,106-Speed 9700.91 samples/sec   Loss 4.7002   LearningRate 0.0110   Epoch: 13   Global Step: 223210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:04,170-Speed 9622.71 samples/sec   Loss 4.7057   LearningRate 0.0110   Epoch: 13   Global Step: 223220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:05,243-Speed 9553.65 samples/sec   Loss 4.6934   LearningRate 0.0110   Epoch: 13   Global Step: 223230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:33:06,334-Speed 9385.01 samples/sec   Loss 4.7369   LearningRate 0.0110   Epoch: 13   Global Step: 223240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:33:07,429-Speed 9361.46 samples/sec   Loss 4.5658   LearningRate 0.0110   Epoch: 13   Global Step: 223250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:33:08,509-Speed 9493.01 samples/sec   Loss 4.6495   LearningRate 0.0110   Epoch: 13   Global Step: 223260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:33:09,626-Speed 9165.99 samples/sec   Loss 4.6924   LearningRate 0.0110   Epoch: 13   Global Step: 223270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:33:10,715-Speed 9410.43 samples/sec   Loss 4.6116   LearningRate 0.0110   Epoch: 13   Global Step: 223280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:33:11,819-Speed 9279.97 samples/sec   Loss 4.7478   LearningRate 0.0110   Epoch: 13   Global Step: 223290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:33:12,887-Speed 9591.67 samples/sec   Loss 4.7988   LearningRate 0.0110   Epoch: 13   Global Step: 223300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:33:13,972-Speed 9441.63 samples/sec   Loss 4.7197   LearningRate 0.0110   Epoch: 13   Global Step: 223310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:33:15,096-Speed 9206.23 samples/sec   Loss 4.8092   LearningRate 0.0110   Epoch: 13   Global Step: 223320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:33:16,204-Speed 9253.80 samples/sec   Loss 4.6616   LearningRate 0.0110   Epoch: 13   Global Step: 223330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:17,281-Speed 9506.64 samples/sec   Loss 4.7054   LearningRate 0.0110   Epoch: 13   Global Step: 223340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:18,398-Speed 9172.46 samples/sec   Loss 4.7728   LearningRate 0.0110   Epoch: 13   Global Step: 223350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:19,507-Speed 9242.27 samples/sec   Loss 4.7089   LearningRate 0.0109   Epoch: 13   Global Step: 223360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:20,589-Speed 9470.16 samples/sec   Loss 4.7293   LearningRate 0.0109   Epoch: 13   Global Step: 223370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:21,663-Speed 9547.01 samples/sec   Loss 4.7130   LearningRate 0.0109   Epoch: 13   Global Step: 223380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:22,772-Speed 9237.48 samples/sec   Loss 4.7782   LearningRate 0.0109   Epoch: 13   Global Step: 223390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:23,856-Speed 9451.83 samples/sec   Loss 4.7514   LearningRate 0.0109   Epoch: 13   Global Step: 223400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:24,979-Speed 9125.82 samples/sec   Loss 4.6329   LearningRate 0.0109   Epoch: 13   Global Step: 223410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:26,073-Speed 9361.78 samples/sec   Loss 4.7415   LearningRate 0.0109   Epoch: 13   Global Step: 223420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:27,176-Speed 9283.30 samples/sec   Loss 4.5718   LearningRate 0.0109   Epoch: 13   Global Step: 223430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:28,256-Speed 9492.01 samples/sec   Loss 4.6869   LearningRate 0.0109   Epoch: 13   Global Step: 223440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:29,348-Speed 9378.85 samples/sec   Loss 4.7300   LearningRate 0.0109   Epoch: 13   Global Step: 223450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:30,439-Speed 9390.71 samples/sec   Loss 4.6089   LearningRate 0.0109   Epoch: 13   Global Step: 223460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:31,522-Speed 9464.88 samples/sec   Loss 4.8520   LearningRate 0.0109   Epoch: 13   Global Step: 223470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:32,655-Speed 9044.33 samples/sec   Loss 4.6917   LearningRate 0.0109   Epoch: 13   Global Step: 223480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:33,730-Speed 9535.51 samples/sec   Loss 4.6680   LearningRate 0.0109   Epoch: 13   Global Step: 223490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:34,835-Speed 9271.55 samples/sec   Loss 4.6985   LearningRate 0.0109   Epoch: 13   Global Step: 223500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:35,928-Speed 9368.56 samples/sec   Loss 4.6747   LearningRate 0.0109   Epoch: 13   Global Step: 223510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:37,065-Speed 9012.94 samples/sec   Loss 4.7699   LearningRate 0.0109   Epoch: 13   Global Step: 223520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:38,158-Speed 9387.27 samples/sec   Loss 4.7062   LearningRate 0.0109   Epoch: 13   Global Step: 223530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:39,287-Speed 9074.20 samples/sec   Loss 4.6690   LearningRate 0.0109   Epoch: 13   Global Step: 223540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:40,354-Speed 9605.01 samples/sec   Loss 4.6186   LearningRate 0.0109   Epoch: 13   Global Step: 223550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:41,421-Speed 9605.31 samples/sec   Loss 4.7026   LearningRate 0.0109   Epoch: 13   Global Step: 223560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:42,574-Speed 8884.79 samples/sec   Loss 4.7480   LearningRate 0.0109   Epoch: 13   Global Step: 223570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:43,682-Speed 9248.09 samples/sec   Loss 4.6905   LearningRate 0.0109   Epoch: 13   Global Step: 223580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:44,756-Speed 9541.52 samples/sec   Loss 4.6053   LearningRate 0.0109   Epoch: 13   Global Step: 223590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:45,861-Speed 9267.99 samples/sec   Loss 4.6256   LearningRate 0.0109   Epoch: 13   Global Step: 223600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:46,973-Speed 9209.69 samples/sec   Loss 4.7793   LearningRate 0.0109   Epoch: 13   Global Step: 223610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:48,055-Speed 9470.46 samples/sec   Loss 4.6407   LearningRate 0.0109   Epoch: 13   Global Step: 223620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:49,107-Speed 9744.60 samples/sec   Loss 4.6965   LearningRate 0.0109   Epoch: 13   Global Step: 223630   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:33:50,184-Speed 9510.63 samples/sec   Loss 4.6027   LearningRate 0.0109   Epoch: 13   Global Step: 223640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:51,282-Speed 9330.71 samples/sec   Loss 4.7430   LearningRate 0.0109   Epoch: 13   Global Step: 223650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:52,368-Speed 9440.00 samples/sec   Loss 4.6281   LearningRate 0.0109   Epoch: 13   Global Step: 223660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:53,445-Speed 9508.93 samples/sec   Loss 4.6986   LearningRate 0.0109   Epoch: 13   Global Step: 223670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:54,531-Speed 9434.70 samples/sec   Loss 4.7618   LearningRate 0.0109   Epoch: 13   Global Step: 223680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:55,597-Speed 9611.77 samples/sec   Loss 4.7463   LearningRate 0.0109   Epoch: 13   Global Step: 223690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:56,666-Speed 9584.77 samples/sec   Loss 4.6997   LearningRate 0.0109   Epoch: 13   Global Step: 223700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:57,721-Speed 9712.47 samples/sec   Loss 4.6826   LearningRate 0.0109   Epoch: 13   Global Step: 223710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:58,859-Speed 9003.64 samples/sec   Loss 4.8064   LearningRate 0.0109   Epoch: 13   Global Step: 223720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:33:59,983-Speed 9113.86 samples/sec   Loss 4.7021   LearningRate 0.0109   Epoch: 13   Global Step: 223730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:01,077-Speed 9372.31 samples/sec   Loss 4.7800   LearningRate 0.0109   Epoch: 13   Global Step: 223740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:02,169-Speed 9383.70 samples/sec   Loss 4.7830   LearningRate 0.0109   Epoch: 13   Global Step: 223750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:03,301-Speed 9047.63 samples/sec   Loss 4.6655   LearningRate 0.0109   Epoch: 13   Global Step: 223760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:05,364-Speed 4965.25 samples/sec   Loss 4.7272   LearningRate 0.0109   Epoch: 13   Global Step: 223770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:06,454-Speed 9401.24 samples/sec   Loss 4.6710   LearningRate 0.0109   Epoch: 13   Global Step: 223780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:08,394-Speed 5279.61 samples/sec   Loss 4.7919   LearningRate 0.0109   Epoch: 13   Global Step: 223790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:09,490-Speed 9355.87 samples/sec   Loss 4.7430   LearningRate 0.0109   Epoch: 13   Global Step: 223800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:10,580-Speed 9391.05 samples/sec   Loss 4.7353   LearningRate 0.0109   Epoch: 13   Global Step: 223810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:11,656-Speed 9523.89 samples/sec   Loss 4.6150   LearningRate 0.0109   Epoch: 13   Global Step: 223820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:12,679-Speed 10020.57 samples/sec   Loss 4.7190   LearningRate 0.0109   Epoch: 13   Global Step: 223830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:13,780-Speed 9303.61 samples/sec   Loss 4.6873   LearningRate 0.0109   Epoch: 13   Global Step: 223840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:14,882-Speed 9301.37 samples/sec   Loss 4.8202   LearningRate 0.0109   Epoch: 13   Global Step: 223850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:15,968-Speed 9431.21 samples/sec   Loss 4.7277   LearningRate 0.0109   Epoch: 13   Global Step: 223860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:17,042-Speed 9544.29 samples/sec   Loss 4.6425   LearningRate 0.0108   Epoch: 13   Global Step: 223870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:18,077-Speed 9892.94 samples/sec   Loss 4.7858   LearningRate 0.0108   Epoch: 13   Global Step: 223880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:19,165-Speed 9417.31 samples/sec   Loss 4.7330   LearningRate 0.0108   Epoch: 13   Global Step: 223890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:20,238-Speed 9554.87 samples/sec   Loss 4.7032   LearningRate 0.0108   Epoch: 13   Global Step: 223900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:21,345-Speed 9252.95 samples/sec   Loss 4.8242   LearningRate 0.0108   Epoch: 13   Global Step: 223910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:22,442-Speed 9337.45 samples/sec   Loss 4.5918   LearningRate 0.0108   Epoch: 13   Global Step: 223920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:23,498-Speed 9705.19 samples/sec   Loss 4.7409   LearningRate 0.0108   Epoch: 13   Global Step: 223930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:24,583-Speed 9445.53 samples/sec   Loss 4.7551   LearningRate 0.0108   Epoch: 13   Global Step: 223940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:34:25,657-Speed 9536.52 samples/sec   Loss 4.7576   LearningRate 0.0108   Epoch: 13   Global Step: 223950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:26,790-Speed 9042.29 samples/sec   Loss 4.6702   LearningRate 0.0108   Epoch: 13   Global Step: 223960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:27,882-Speed 9387.10 samples/sec   Loss 4.7463   LearningRate 0.0108   Epoch: 13   Global Step: 223970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:29,008-Speed 9093.49 samples/sec   Loss 4.6359   LearningRate 0.0108   Epoch: 13   Global Step: 223980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:30,108-Speed 9314.28 samples/sec   Loss 4.6279   LearningRate 0.0108   Epoch: 13   Global Step: 223990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:31,197-Speed 9409.21 samples/sec   Loss 4.7833   LearningRate 0.0108   Epoch: 13   Global Step: 224000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:34:53,347-[lfw][224000]XNorm: 7.955453
Training: 2022-04-11 20:34:53,348-[lfw][224000]Accuracy-Flip: 0.99733+-0.00249
Training: 2022-04-11 20:34:53,348-[lfw][224000]Accuracy-Highest: 0.99733
Training: 2022-04-11 20:35:18,935-[cfp_fp][224000]XNorm: 6.844160
Training: 2022-04-11 20:35:18,935-[cfp_fp][224000]Accuracy-Flip: 0.96686+-0.00857
Training: 2022-04-11 20:35:18,936-[cfp_fp][224000]Accuracy-Highest: 0.96771
Training: 2022-04-11 20:35:40,962-[agedb_30][224000]XNorm: 7.721459
Training: 2022-04-11 20:35:40,963-[agedb_30][224000]Accuracy-Flip: 0.97033+-0.00865
Training: 2022-04-11 20:35:40,963-[agedb_30][224000]Accuracy-Highest: 0.97033
Training: 2022-04-11 20:35:42,073-Speed 144.48 samples/sec   Loss 4.7132   LearningRate 0.0108   Epoch: 13   Global Step: 224010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:35:43,164-Speed 9389.31 samples/sec   Loss 4.7341   LearningRate 0.0108   Epoch: 13   Global Step: 224020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:35:44,261-Speed 9341.29 samples/sec   Loss 4.7417   LearningRate 0.0108   Epoch: 13   Global Step: 224030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:35:45,328-Speed 9598.76 samples/sec   Loss 4.7974   LearningRate 0.0108   Epoch: 13   Global Step: 224040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:35:46,459-Speed 9061.55 samples/sec   Loss 4.6594   LearningRate 0.0108   Epoch: 13   Global Step: 224050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:35:47,614-Speed 8871.16 samples/sec   Loss 4.7517   LearningRate 0.0108   Epoch: 13   Global Step: 224060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:35:48,713-Speed 9326.01 samples/sec   Loss 4.7299   LearningRate 0.0108   Epoch: 13   Global Step: 224070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:35:49,813-Speed 9310.17 samples/sec   Loss 4.6914   LearningRate 0.0108   Epoch: 13   Global Step: 224080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:35:50,878-Speed 9622.13 samples/sec   Loss 4.7777   LearningRate 0.0108   Epoch: 13   Global Step: 224090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:35:51,980-Speed 9295.50 samples/sec   Loss 4.7948   LearningRate 0.0108   Epoch: 13   Global Step: 224100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:35:53,063-Speed 9468.01 samples/sec   Loss 4.7934   LearningRate 0.0108   Epoch: 13   Global Step: 224110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:35:54,179-Speed 9172.95 samples/sec   Loss 4.7639   LearningRate 0.0108   Epoch: 13   Global Step: 224120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:35:55,260-Speed 9484.98 samples/sec   Loss 4.6493   LearningRate 0.0108   Epoch: 13   Global Step: 224130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:35:56,347-Speed 9422.96 samples/sec   Loss 4.7634   LearningRate 0.0108   Epoch: 13   Global Step: 224140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:35:57,467-Speed 9146.13 samples/sec   Loss 4.7315   LearningRate 0.0108   Epoch: 13   Global Step: 224150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:35:58,584-Speed 9175.59 samples/sec   Loss 4.8527   LearningRate 0.0108   Epoch: 13   Global Step: 224160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:35:59,665-Speed 9475.46 samples/sec   Loss 4.6618   LearningRate 0.0108   Epoch: 13   Global Step: 224170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:00,782-Speed 9175.12 samples/sec   Loss 4.7576   LearningRate 0.0108   Epoch: 13   Global Step: 224180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:01,844-Speed 9646.15 samples/sec   Loss 4.6563   LearningRate 0.0108   Epoch: 13   Global Step: 224190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:03,002-Speed 8846.83 samples/sec   Loss 4.8032   LearningRate 0.0108   Epoch: 13   Global Step: 224200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:04,080-Speed 9506.65 samples/sec   Loss 4.7026   LearningRate 0.0108   Epoch: 13   Global Step: 224210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:05,173-Speed 9379.18 samples/sec   Loss 4.7995   LearningRate 0.0108   Epoch: 13   Global Step: 224220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:06,217-Speed 9814.32 samples/sec   Loss 4.6783   LearningRate 0.0108   Epoch: 13   Global Step: 224230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:07,258-Speed 9837.78 samples/sec   Loss 4.7422   LearningRate 0.0108   Epoch: 13   Global Step: 224240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:08,409-Speed 8897.94 samples/sec   Loss 4.6973   LearningRate 0.0108   Epoch: 13   Global Step: 224250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:09,468-Speed 9681.22 samples/sec   Loss 4.6745   LearningRate 0.0108   Epoch: 13   Global Step: 224260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:10,525-Speed 9688.80 samples/sec   Loss 4.7056   LearningRate 0.0108   Epoch: 13   Global Step: 224270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:11,605-Speed 9490.25 samples/sec   Loss 4.6690   LearningRate 0.0108   Epoch: 13   Global Step: 224280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:12,710-Speed 9267.52 samples/sec   Loss 4.7528   LearningRate 0.0108   Epoch: 13   Global Step: 224290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:13,803-Speed 9378.16 samples/sec   Loss 4.7437   LearningRate 0.0108   Epoch: 13   Global Step: 224300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:14,904-Speed 9300.26 samples/sec   Loss 4.6528   LearningRate 0.0108   Epoch: 13   Global Step: 224310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:15,971-Speed 9606.57 samples/sec   Loss 4.7053   LearningRate 0.0108   Epoch: 13   Global Step: 224320   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:36:17,086-Speed 9195.07 samples/sec   Loss 4.7587   LearningRate 0.0108   Epoch: 13   Global Step: 224330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:18,155-Speed 9581.53 samples/sec   Loss 4.7514   LearningRate 0.0108   Epoch: 13   Global Step: 224340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:19,253-Speed 9334.65 samples/sec   Loss 4.7371   LearningRate 0.0108   Epoch: 13   Global Step: 224350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:20,336-Speed 9463.60 samples/sec   Loss 4.7043   LearningRate 0.0108   Epoch: 13   Global Step: 224360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:21,438-Speed 9290.71 samples/sec   Loss 4.6837   LearningRate 0.0107   Epoch: 13   Global Step: 224370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:22,588-Speed 8914.36 samples/sec   Loss 4.7172   LearningRate 0.0107   Epoch: 13   Global Step: 224380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:23,664-Speed 9521.34 samples/sec   Loss 4.7694   LearningRate 0.0107   Epoch: 13   Global Step: 224390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:24,724-Speed 9664.59 samples/sec   Loss 4.7066   LearningRate 0.0107   Epoch: 13   Global Step: 224400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:25,819-Speed 9359.89 samples/sec   Loss 4.7327   LearningRate 0.0107   Epoch: 13   Global Step: 224410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:26,876-Speed 9690.77 samples/sec   Loss 4.6845   LearningRate 0.0107   Epoch: 13   Global Step: 224420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:27,988-Speed 9214.85 samples/sec   Loss 4.8004   LearningRate 0.0107   Epoch: 13   Global Step: 224430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:29,080-Speed 9383.90 samples/sec   Loss 4.6722   LearningRate 0.0107   Epoch: 13   Global Step: 224440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:30,168-Speed 9411.08 samples/sec   Loss 4.7121   LearningRate 0.0107   Epoch: 13   Global Step: 224450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:31,227-Speed 9674.82 samples/sec   Loss 4.6721   LearningRate 0.0107   Epoch: 13   Global Step: 224460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:32,270-Speed 9831.63 samples/sec   Loss 4.7743   LearningRate 0.0107   Epoch: 13   Global Step: 224470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:33,364-Speed 9364.11 samples/sec   Loss 4.6949   LearningRate 0.0107   Epoch: 13   Global Step: 224480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:34,475-Speed 9221.77 samples/sec   Loss 4.6099   LearningRate 0.0107   Epoch: 13   Global Step: 224490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:35,566-Speed 9398.21 samples/sec   Loss 4.6869   LearningRate 0.0107   Epoch: 13   Global Step: 224500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:36,656-Speed 9398.07 samples/sec   Loss 4.7588   LearningRate 0.0107   Epoch: 13   Global Step: 224510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:37,788-Speed 9049.97 samples/sec   Loss 4.8121   LearningRate 0.0107   Epoch: 13   Global Step: 224520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:38,873-Speed 9440.60 samples/sec   Loss 4.7175   LearningRate 0.0107   Epoch: 13   Global Step: 224530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:41,987-Speed 3289.32 samples/sec   Loss 4.7960   LearningRate 0.0107   Epoch: 13   Global Step: 224540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:43,044-Speed 9689.22 samples/sec   Loss 4.7098   LearningRate 0.0107   Epoch: 13   Global Step: 224550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:44,192-Speed 8926.96 samples/sec   Loss 4.6733   LearningRate 0.0107   Epoch: 13   Global Step: 224560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:46,100-Speed 5370.15 samples/sec   Loss 4.6955   LearningRate 0.0107   Epoch: 13   Global Step: 224570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:48,143-Speed 5014.52 samples/sec   Loss 4.6196   LearningRate 0.0107   Epoch: 13   Global Step: 224580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:49,192-Speed 9762.35 samples/sec   Loss 4.7581   LearningRate 0.0107   Epoch: 13   Global Step: 224590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:51,222-Speed 5046.30 samples/sec   Loss 4.6276   LearningRate 0.0107   Epoch: 13   Global Step: 224600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:52,354-Speed 9057.57 samples/sec   Loss 4.7170   LearningRate 0.0107   Epoch: 13   Global Step: 224610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:36:53,414-Speed 9667.51 samples/sec   Loss 4.6817   LearningRate 0.0107   Epoch: 13   Global Step: 224620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:54,473-Speed 9674.40 samples/sec   Loss 4.6961   LearningRate 0.0107   Epoch: 13   Global Step: 224630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:55,592-Speed 9157.55 samples/sec   Loss 4.8314   LearningRate 0.0107   Epoch: 13   Global Step: 224640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:56,715-Speed 9121.21 samples/sec   Loss 4.7274   LearningRate 0.0107   Epoch: 13   Global Step: 224650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:57,802-Speed 9423.89 samples/sec   Loss 4.7093   LearningRate 0.0107   Epoch: 13   Global Step: 224660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:36:58,905-Speed 9295.55 samples/sec   Loss 4.7725   LearningRate 0.0107   Epoch: 13   Global Step: 224670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:00,017-Speed 9213.46 samples/sec   Loss 4.6529   LearningRate 0.0107   Epoch: 13   Global Step: 224680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:01,117-Speed 9315.35 samples/sec   Loss 4.7753   LearningRate 0.0107   Epoch: 13   Global Step: 224690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:02,177-Speed 9659.97 samples/sec   Loss 4.8247   LearningRate 0.0107   Epoch: 13   Global Step: 224700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:03,270-Speed 9374.19 samples/sec   Loss 4.6418   LearningRate 0.0107   Epoch: 13   Global Step: 224710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:37:04,408-Speed 9008.86 samples/sec   Loss 4.7145   LearningRate 0.0107   Epoch: 13   Global Step: 224720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:37:05,527-Speed 9154.90 samples/sec   Loss 4.6090   LearningRate 0.0107   Epoch: 13   Global Step: 224730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:37:06,627-Speed 9318.75 samples/sec   Loss 4.7766   LearningRate 0.0107   Epoch: 13   Global Step: 224740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:37:07,725-Speed 9326.60 samples/sec   Loss 4.7562   LearningRate 0.0107   Epoch: 13   Global Step: 224750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:37:08,822-Speed 9338.46 samples/sec   Loss 4.6318   LearningRate 0.0107   Epoch: 13   Global Step: 224760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:37:09,917-Speed 9360.72 samples/sec   Loss 4.7987   LearningRate 0.0107   Epoch: 13   Global Step: 224770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:37:11,042-Speed 9105.92 samples/sec   Loss 4.7119   LearningRate 0.0107   Epoch: 13   Global Step: 224780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:37:12,155-Speed 9206.31 samples/sec   Loss 4.7031   LearningRate 0.0107   Epoch: 13   Global Step: 224790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:37:13,249-Speed 9368.21 samples/sec   Loss 4.7164   LearningRate 0.0107   Epoch: 13   Global Step: 224800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:37:14,318-Speed 9588.28 samples/sec   Loss 4.7739   LearningRate 0.0107   Epoch: 13   Global Step: 224810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:15,401-Speed 9461.89 samples/sec   Loss 4.7302   LearningRate 0.0107   Epoch: 13   Global Step: 224820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:16,524-Speed 9124.91 samples/sec   Loss 4.6295   LearningRate 0.0107   Epoch: 13   Global Step: 224830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:17,601-Speed 9509.61 samples/sec   Loss 4.6673   LearningRate 0.0107   Epoch: 13   Global Step: 224840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:18,678-Speed 9516.27 samples/sec   Loss 4.8240   LearningRate 0.0107   Epoch: 13   Global Step: 224850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:19,742-Speed 9627.25 samples/sec   Loss 4.7975   LearningRate 0.0107   Epoch: 13   Global Step: 224860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:20,841-Speed 9324.73 samples/sec   Loss 4.6581   LearningRate 0.0107   Epoch: 13   Global Step: 224870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:21,902-Speed 9661.23 samples/sec   Loss 4.7649   LearningRate 0.0107   Epoch: 13   Global Step: 224880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:23,008-Speed 9265.13 samples/sec   Loss 4.6961   LearningRate 0.0106   Epoch: 13   Global Step: 224890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:24,108-Speed 9312.41 samples/sec   Loss 4.7441   LearningRate 0.0106   Epoch: 13   Global Step: 224900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:37:25,195-Speed 9430.57 samples/sec   Loss 4.7571   LearningRate 0.0106   Epoch: 13   Global Step: 224910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:26,301-Speed 9259.50 samples/sec   Loss 4.6628   LearningRate 0.0106   Epoch: 13   Global Step: 224920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:27,374-Speed 9550.73 samples/sec   Loss 4.7658   LearningRate 0.0106   Epoch: 13   Global Step: 224930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:28,427-Speed 9729.46 samples/sec   Loss 4.7176   LearningRate 0.0106   Epoch: 13   Global Step: 224940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:29,601-Speed 8727.32 samples/sec   Loss 4.7339   LearningRate 0.0106   Epoch: 13   Global Step: 224950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:30,693-Speed 9384.13 samples/sec   Loss 4.7200   LearningRate 0.0106   Epoch: 13   Global Step: 224960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:31,770-Speed 9517.37 samples/sec   Loss 4.6078   LearningRate 0.0106   Epoch: 13   Global Step: 224970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:32,887-Speed 9172.86 samples/sec   Loss 4.7895   LearningRate 0.0106   Epoch: 13   Global Step: 224980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:34,045-Speed 8854.70 samples/sec   Loss 4.7447   LearningRate 0.0106   Epoch: 13   Global Step: 224990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:35,132-Speed 9424.45 samples/sec   Loss 4.7559   LearningRate 0.0106   Epoch: 13   Global Step: 225000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:36,220-Speed 9417.52 samples/sec   Loss 4.6526   LearningRate 0.0106   Epoch: 13   Global Step: 225010   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:37:37,285-Speed 9623.78 samples/sec   Loss 4.6809   LearningRate 0.0106   Epoch: 13   Global Step: 225020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:38,386-Speed 9300.43 samples/sec   Loss 4.7749   LearningRate 0.0106   Epoch: 13   Global Step: 225030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:39,484-Speed 9335.99 samples/sec   Loss 4.6515   LearningRate 0.0106   Epoch: 13   Global Step: 225040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:40,591-Speed 9255.27 samples/sec   Loss 4.7458   LearningRate 0.0106   Epoch: 13   Global Step: 225050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:41,719-Speed 9077.22 samples/sec   Loss 4.8076   LearningRate 0.0106   Epoch: 13   Global Step: 225060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:42,823-Speed 9280.82 samples/sec   Loss 4.7698   LearningRate 0.0106   Epoch: 13   Global Step: 225070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:43,912-Speed 9407.32 samples/sec   Loss 4.7069   LearningRate 0.0106   Epoch: 13   Global Step: 225080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:45,013-Speed 9308.37 samples/sec   Loss 4.8078   LearningRate 0.0106   Epoch: 13   Global Step: 225090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:46,090-Speed 9515.27 samples/sec   Loss 4.7501   LearningRate 0.0106   Epoch: 13   Global Step: 225100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:47,176-Speed 9435.78 samples/sec   Loss 4.7745   LearningRate 0.0106   Epoch: 13   Global Step: 225110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:48,289-Speed 9204.95 samples/sec   Loss 4.7592   LearningRate 0.0106   Epoch: 13   Global Step: 225120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:49,393-Speed 9281.92 samples/sec   Loss 4.7458   LearningRate 0.0106   Epoch: 13   Global Step: 225130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:50,479-Speed 9437.33 samples/sec   Loss 4.7825   LearningRate 0.0106   Epoch: 13   Global Step: 225140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:51,606-Speed 9092.68 samples/sec   Loss 4.7785   LearningRate 0.0106   Epoch: 13   Global Step: 225150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:52,706-Speed 9316.85 samples/sec   Loss 4.7815   LearningRate 0.0106   Epoch: 13   Global Step: 225160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:53,790-Speed 9452.67 samples/sec   Loss 4.7009   LearningRate 0.0106   Epoch: 13   Global Step: 225170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:54,904-Speed 9197.68 samples/sec   Loss 4.7423   LearningRate 0.0106   Epoch: 13   Global Step: 225180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:55,994-Speed 9394.37 samples/sec   Loss 4.7410   LearningRate 0.0106   Epoch: 13   Global Step: 225190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:57,098-Speed 9286.69 samples/sec   Loss 4.7045   LearningRate 0.0106   Epoch: 13   Global Step: 225200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:58,239-Speed 8980.45 samples/sec   Loss 4.7642   LearningRate 0.0106   Epoch: 13   Global Step: 225210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:37:59,355-Speed 9172.77 samples/sec   Loss 4.6280   LearningRate 0.0106   Epoch: 13   Global Step: 225220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:00,469-Speed 9199.15 samples/sec   Loss 4.6941   LearningRate 0.0106   Epoch: 13   Global Step: 225230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:01,556-Speed 9424.05 samples/sec   Loss 4.6824   LearningRate 0.0106   Epoch: 13   Global Step: 225240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:02,665-Speed 9244.42 samples/sec   Loss 4.6623   LearningRate 0.0106   Epoch: 13   Global Step: 225250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:03,771-Speed 9256.17 samples/sec   Loss 4.6652   LearningRate 0.0106   Epoch: 13   Global Step: 225260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:04,863-Speed 9391.17 samples/sec   Loss 4.6937   LearningRate 0.0106   Epoch: 13   Global Step: 225270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:05,937-Speed 9539.76 samples/sec   Loss 4.7307   LearningRate 0.0106   Epoch: 13   Global Step: 225280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:07,014-Speed 9512.96 samples/sec   Loss 4.7278   LearningRate 0.0106   Epoch: 13   Global Step: 225290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:08,125-Speed 9227.14 samples/sec   Loss 4.7917   LearningRate 0.0106   Epoch: 13   Global Step: 225300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:09,184-Speed 9672.53 samples/sec   Loss 4.7949   LearningRate 0.0106   Epoch: 13   Global Step: 225310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:10,255-Speed 9566.52 samples/sec   Loss 4.7010   LearningRate 0.0106   Epoch: 13   Global Step: 225320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:11,372-Speed 9173.20 samples/sec   Loss 4.7641   LearningRate 0.0106   Epoch: 13   Global Step: 225330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:12,484-Speed 9208.08 samples/sec   Loss 4.7560   LearningRate 0.0106   Epoch: 13   Global Step: 225340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:13,588-Speed 9279.51 samples/sec   Loss 4.6839   LearningRate 0.0106   Epoch: 13   Global Step: 225350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:14,668-Speed 9490.05 samples/sec   Loss 4.7327   LearningRate 0.0106   Epoch: 13   Global Step: 225360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:15,733-Speed 9626.64 samples/sec   Loss 4.6867   LearningRate 0.0106   Epoch: 13   Global Step: 225370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:16,850-Speed 9166.73 samples/sec   Loss 4.6935   LearningRate 0.0106   Epoch: 13   Global Step: 225380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:17,942-Speed 9390.33 samples/sec   Loss 4.7344   LearningRate 0.0106   Epoch: 13   Global Step: 225390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:19,076-Speed 9031.36 samples/sec   Loss 4.6977   LearningRate 0.0105   Epoch: 13   Global Step: 225400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:20,187-Speed 9217.47 samples/sec   Loss 4.8016   LearningRate 0.0105   Epoch: 13   Global Step: 225410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:21,253-Speed 9613.60 samples/sec   Loss 4.7524   LearningRate 0.0105   Epoch: 13   Global Step: 225420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:22,341-Speed 9421.36 samples/sec   Loss 4.7198   LearningRate 0.0105   Epoch: 13   Global Step: 225430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:23,420-Speed 9500.02 samples/sec   Loss 4.7826   LearningRate 0.0105   Epoch: 13   Global Step: 225440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:24,479-Speed 9670.39 samples/sec   Loss 4.7322   LearningRate 0.0105   Epoch: 13   Global Step: 225450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:25,527-Speed 9774.78 samples/sec   Loss 4.6361   LearningRate 0.0105   Epoch: 13   Global Step: 225460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:26,592-Speed 9629.97 samples/sec   Loss 4.7342   LearningRate 0.0105   Epoch: 13   Global Step: 225470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:27,689-Speed 9335.88 samples/sec   Loss 4.7882   LearningRate 0.0105   Epoch: 13   Global Step: 225480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:28,766-Speed 9511.79 samples/sec   Loss 4.7635   LearningRate 0.0105   Epoch: 13   Global Step: 225490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:29,829-Speed 9638.27 samples/sec   Loss 4.8191   LearningRate 0.0105   Epoch: 13   Global Step: 225500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:30,877-Speed 9777.62 samples/sec   Loss 4.7871   LearningRate 0.0105   Epoch: 13   Global Step: 225510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:31,956-Speed 9497.71 samples/sec   Loss 4.7168   LearningRate 0.0105   Epoch: 13   Global Step: 225520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:33,046-Speed 9393.50 samples/sec   Loss 4.7219   LearningRate 0.0105   Epoch: 13   Global Step: 225530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:34,133-Speed 9430.02 samples/sec   Loss 4.7171   LearningRate 0.0105   Epoch: 13   Global Step: 225540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:35,236-Speed 9292.86 samples/sec   Loss 4.7107   LearningRate 0.0105   Epoch: 13   Global Step: 225550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:36,268-Speed 9921.42 samples/sec   Loss 4.7411   LearningRate 0.0105   Epoch: 13   Global Step: 225560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:37,362-Speed 9372.21 samples/sec   Loss 4.7027   LearningRate 0.0105   Epoch: 13   Global Step: 225570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:38,441-Speed 9489.00 samples/sec   Loss 4.8281   LearningRate 0.0105   Epoch: 13   Global Step: 225580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:39,583-Speed 8972.02 samples/sec   Loss 4.7316   LearningRate 0.0105   Epoch: 13   Global Step: 225590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:40,661-Speed 9506.44 samples/sec   Loss 4.5895   LearningRate 0.0105   Epoch: 13   Global Step: 225600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:41,742-Speed 9481.61 samples/sec   Loss 4.7584   LearningRate 0.0105   Epoch: 13   Global Step: 225610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:42,862-Speed 9148.65 samples/sec   Loss 4.7554   LearningRate 0.0105   Epoch: 13   Global Step: 225620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:43,981-Speed 9158.20 samples/sec   Loss 4.7813   LearningRate 0.0105   Epoch: 13   Global Step: 225630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:45,058-Speed 9515.04 samples/sec   Loss 4.6917   LearningRate 0.0105   Epoch: 13   Global Step: 225640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:46,109-Speed 9750.09 samples/sec   Loss 4.6993   LearningRate 0.0105   Epoch: 13   Global Step: 225650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:47,192-Speed 9458.08 samples/sec   Loss 4.6170   LearningRate 0.0105   Epoch: 13   Global Step: 225660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:48,268-Speed 9527.03 samples/sec   Loss 4.7835   LearningRate 0.0105   Epoch: 13   Global Step: 225670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:49,386-Speed 9158.31 samples/sec   Loss 4.6361   LearningRate 0.0105   Epoch: 13   Global Step: 225680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:50,469-Speed 9467.19 samples/sec   Loss 4.8166   LearningRate 0.0105   Epoch: 13   Global Step: 225690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:51,551-Speed 9465.56 samples/sec   Loss 4.6625   LearningRate 0.0105   Epoch: 13   Global Step: 225700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:38:52,666-Speed 9199.09 samples/sec   Loss 4.7085   LearningRate 0.0105   Epoch: 13   Global Step: 225710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:53,781-Speed 9183.60 samples/sec   Loss 4.7685   LearningRate 0.0105   Epoch: 13   Global Step: 225720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:54,880-Speed 9324.92 samples/sec   Loss 4.7703   LearningRate 0.0105   Epoch: 13   Global Step: 225730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:55,946-Speed 9605.73 samples/sec   Loss 4.6870   LearningRate 0.0105   Epoch: 13   Global Step: 225740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:57,023-Speed 9518.15 samples/sec   Loss 4.7383   LearningRate 0.0105   Epoch: 13   Global Step: 225750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:58,125-Speed 9294.75 samples/sec   Loss 4.7506   LearningRate 0.0105   Epoch: 13   Global Step: 225760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:38:59,188-Speed 9642.89 samples/sec   Loss 4.7822   LearningRate 0.0105   Epoch: 13   Global Step: 225770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:00,309-Speed 9135.29 samples/sec   Loss 4.7085   LearningRate 0.0105   Epoch: 13   Global Step: 225780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:01,395-Speed 9438.33 samples/sec   Loss 4.7795   LearningRate 0.0105   Epoch: 13   Global Step: 225790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:02,464-Speed 9578.95 samples/sec   Loss 4.6735   LearningRate 0.0105   Epoch: 13   Global Step: 225800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:03,562-Speed 9335.28 samples/sec   Loss 4.8094   LearningRate 0.0105   Epoch: 13   Global Step: 225810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:04,671-Speed 9242.99 samples/sec   Loss 4.6964   LearningRate 0.0105   Epoch: 13   Global Step: 225820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:05,764-Speed 9369.33 samples/sec   Loss 4.7987   LearningRate 0.0105   Epoch: 13   Global Step: 225830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:06,865-Speed 9306.46 samples/sec   Loss 4.7637   LearningRate 0.0105   Epoch: 13   Global Step: 225840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:07,916-Speed 9754.67 samples/sec   Loss 4.8009   LearningRate 0.0105   Epoch: 13   Global Step: 225850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:39:08,999-Speed 9455.03 samples/sec   Loss 4.7401   LearningRate 0.0105   Epoch: 13   Global Step: 225860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:39:10,119-Speed 9149.81 samples/sec   Loss 4.7595   LearningRate 0.0105   Epoch: 13   Global Step: 225870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:39:11,207-Speed 9420.95 samples/sec   Loss 4.6928   LearningRate 0.0105   Epoch: 13   Global Step: 225880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:39:12,238-Speed 9930.68 samples/sec   Loss 4.7238   LearningRate 0.0105   Epoch: 13   Global Step: 225890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:39:13,308-Speed 9577.82 samples/sec   Loss 4.7953   LearningRate 0.0105   Epoch: 13   Global Step: 225900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:39:14,404-Speed 9345.24 samples/sec   Loss 4.8092   LearningRate 0.0104   Epoch: 13   Global Step: 225910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:39:15,464-Speed 9675.01 samples/sec   Loss 4.6517   LearningRate 0.0104   Epoch: 13   Global Step: 225920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:39:16,539-Speed 9527.55 samples/sec   Loss 4.7908   LearningRate 0.0104   Epoch: 13   Global Step: 225930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:39:17,616-Speed 9511.52 samples/sec   Loss 4.6858   LearningRate 0.0104   Epoch: 13   Global Step: 225940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:39:18,706-Speed 9403.71 samples/sec   Loss 4.6808   LearningRate 0.0104   Epoch: 13   Global Step: 225950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:19,789-Speed 9461.08 samples/sec   Loss 4.7817   LearningRate 0.0104   Epoch: 13   Global Step: 225960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:20,862-Speed 9542.44 samples/sec   Loss 4.7600   LearningRate 0.0104   Epoch: 13   Global Step: 225970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:21,947-Speed 9449.20 samples/sec   Loss 4.6986   LearningRate 0.0104   Epoch: 13   Global Step: 225980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:23,040-Speed 9382.57 samples/sec   Loss 4.7630   LearningRate 0.0104   Epoch: 13   Global Step: 225990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:24,108-Speed 9589.88 samples/sec   Loss 4.7866   LearningRate 0.0104   Epoch: 13   Global Step: 226000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:39:46,181-[lfw][226000]XNorm: 7.981218
Training: 2022-04-11 20:39:46,181-[lfw][226000]Accuracy-Flip: 0.99700+-0.00287
Training: 2022-04-11 20:39:46,182-[lfw][226000]Accuracy-Highest: 0.99733
Training: 2022-04-11 20:40:11,357-[cfp_fp][226000]XNorm: 6.844564
Training: 2022-04-11 20:40:11,357-[cfp_fp][226000]Accuracy-Flip: 0.96771+-0.01066
Training: 2022-04-11 20:40:11,357-[cfp_fp][226000]Accuracy-Highest: 0.96771
Training: 2022-04-11 20:40:33,057-[agedb_30][226000]XNorm: 7.700978
Training: 2022-04-11 20:40:33,058-[agedb_30][226000]Accuracy-Flip: 0.96900+-0.00920
Training: 2022-04-11 20:40:33,058-[agedb_30][226000]Accuracy-Highest: 0.97033
Training: 2022-04-11 20:40:34,139-Speed 146.22 samples/sec   Loss 4.7565   LearningRate 0.0104   Epoch: 13   Global Step: 226010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:35,208-Speed 9587.76 samples/sec   Loss 4.7886   LearningRate 0.0104   Epoch: 13   Global Step: 226020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:36,270-Speed 9641.48 samples/sec   Loss 4.7578   LearningRate 0.0104   Epoch: 13   Global Step: 226030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:37,339-Speed 9584.03 samples/sec   Loss 4.6880   LearningRate 0.0104   Epoch: 13   Global Step: 226040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:38,453-Speed 9203.50 samples/sec   Loss 4.7353   LearningRate 0.0104   Epoch: 13   Global Step: 226050   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:40:39,534-Speed 9471.42 samples/sec   Loss 4.7351   LearningRate 0.0104   Epoch: 13   Global Step: 226060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:40,567-Speed 9927.45 samples/sec   Loss 4.8044   LearningRate 0.0104   Epoch: 13   Global Step: 226070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:41,652-Speed 9439.60 samples/sec   Loss 4.6888   LearningRate 0.0104   Epoch: 13   Global Step: 226080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:42,719-Speed 9600.05 samples/sec   Loss 4.7141   LearningRate 0.0104   Epoch: 13   Global Step: 226090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:43,810-Speed 9393.65 samples/sec   Loss 4.7592   LearningRate 0.0104   Epoch: 13   Global Step: 226100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:44,891-Speed 9478.87 samples/sec   Loss 4.7592   LearningRate 0.0104   Epoch: 13   Global Step: 226110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:45,997-Speed 9265.50 samples/sec   Loss 4.6525   LearningRate 0.0104   Epoch: 13   Global Step: 226120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:47,082-Speed 9445.92 samples/sec   Loss 4.7834   LearningRate 0.0104   Epoch: 13   Global Step: 226130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:48,218-Speed 9017.18 samples/sec   Loss 4.7439   LearningRate 0.0104   Epoch: 13   Global Step: 226140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:49,293-Speed 9534.86 samples/sec   Loss 4.7345   LearningRate 0.0104   Epoch: 13   Global Step: 226150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:50,372-Speed 9492.99 samples/sec   Loss 4.7625   LearningRate 0.0104   Epoch: 13   Global Step: 226160   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:40:51,463-Speed 9392.16 samples/sec   Loss 4.7819   LearningRate 0.0104   Epoch: 13   Global Step: 226170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:52,565-Speed 9305.18 samples/sec   Loss 4.7532   LearningRate 0.0104   Epoch: 13   Global Step: 226180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:53,657-Speed 9377.21 samples/sec   Loss 4.7304   LearningRate 0.0104   Epoch: 13   Global Step: 226190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:54,761-Speed 9283.02 samples/sec   Loss 4.6147   LearningRate 0.0104   Epoch: 13   Global Step: 226200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:55,823-Speed 9649.62 samples/sec   Loss 4.6309   LearningRate 0.0104   Epoch: 13   Global Step: 226210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:56,886-Speed 9639.45 samples/sec   Loss 4.7350   LearningRate 0.0104   Epoch: 13   Global Step: 226220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:57,977-Speed 9387.50 samples/sec   Loss 4.7426   LearningRate 0.0104   Epoch: 13   Global Step: 226230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:40:59,039-Speed 9647.46 samples/sec   Loss 4.8447   LearningRate 0.0104   Epoch: 13   Global Step: 226240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:00,088-Speed 9764.33 samples/sec   Loss 4.7219   LearningRate 0.0104   Epoch: 13   Global Step: 226250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:01,181-Speed 9380.30 samples/sec   Loss 4.8541   LearningRate 0.0104   Epoch: 13   Global Step: 226260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:02,297-Speed 9174.08 samples/sec   Loss 4.8452   LearningRate 0.0104   Epoch: 13   Global Step: 226270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:03,360-Speed 9643.53 samples/sec   Loss 4.7381   LearningRate 0.0104   Epoch: 13   Global Step: 226280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:04,406-Speed 9790.61 samples/sec   Loss 4.7358   LearningRate 0.0104   Epoch: 13   Global Step: 226290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:05,508-Speed 9303.92 samples/sec   Loss 4.9293   LearningRate 0.0104   Epoch: 13   Global Step: 226300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:06,614-Speed 9261.58 samples/sec   Loss 4.7185   LearningRate 0.0104   Epoch: 13   Global Step: 226310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:07,698-Speed 9449.98 samples/sec   Loss 4.7272   LearningRate 0.0104   Epoch: 13   Global Step: 226320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:08,760-Speed 9650.82 samples/sec   Loss 4.7353   LearningRate 0.0104   Epoch: 13   Global Step: 226330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:09,838-Speed 9501.63 samples/sec   Loss 4.7272   LearningRate 0.0104   Epoch: 13   Global Step: 226340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:10,892-Speed 9719.70 samples/sec   Loss 4.6622   LearningRate 0.0104   Epoch: 13   Global Step: 226350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:11,934-Speed 9838.45 samples/sec   Loss 4.7780   LearningRate 0.0104   Epoch: 13   Global Step: 226360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:13,016-Speed 9467.16 samples/sec   Loss 4.6873   LearningRate 0.0104   Epoch: 13   Global Step: 226370   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:41:14,108-Speed 9390.83 samples/sec   Loss 4.7271   LearningRate 0.0104   Epoch: 13   Global Step: 226380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:15,206-Speed 9329.79 samples/sec   Loss 4.7475   LearningRate 0.0104   Epoch: 13   Global Step: 226390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:41:16,311-Speed 9274.92 samples/sec   Loss 4.7314   LearningRate 0.0104   Epoch: 13   Global Step: 226400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:41:17,398-Speed 9422.53 samples/sec   Loss 4.6607   LearningRate 0.0104   Epoch: 13   Global Step: 226410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:41:18,475-Speed 9520.86 samples/sec   Loss 4.7265   LearningRate 0.0104   Epoch: 13   Global Step: 226420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:41:19,568-Speed 9366.78 samples/sec   Loss 4.7815   LearningRate 0.0103   Epoch: 13   Global Step: 226430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:41:20,635-Speed 9605.15 samples/sec   Loss 4.7567   LearningRate 0.0103   Epoch: 13   Global Step: 226440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:41:21,752-Speed 9168.59 samples/sec   Loss 4.6261   LearningRate 0.0103   Epoch: 13   Global Step: 226450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:41:22,822-Speed 9581.71 samples/sec   Loss 4.7930   LearningRate 0.0103   Epoch: 13   Global Step: 226460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:41:23,916-Speed 9367.13 samples/sec   Loss 4.6912   LearningRate 0.0103   Epoch: 13   Global Step: 226470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:41:25,042-Speed 9099.44 samples/sec   Loss 4.7539   LearningRate 0.0103   Epoch: 13   Global Step: 226480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:41:26,151-Speed 9237.72 samples/sec   Loss 4.6509   LearningRate 0.0103   Epoch: 13   Global Step: 226490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:27,265-Speed 9199.75 samples/sec   Loss 4.6876   LearningRate 0.0103   Epoch: 13   Global Step: 226500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:28,352-Speed 9427.80 samples/sec   Loss 4.6882   LearningRate 0.0103   Epoch: 13   Global Step: 226510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:29,415-Speed 9635.61 samples/sec   Loss 4.7724   LearningRate 0.0103   Epoch: 13   Global Step: 226520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:30,518-Speed 9286.86 samples/sec   Loss 4.6902   LearningRate 0.0103   Epoch: 13   Global Step: 226530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:31,612-Speed 9368.02 samples/sec   Loss 4.7585   LearningRate 0.0103   Epoch: 13   Global Step: 226540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:32,700-Speed 9422.67 samples/sec   Loss 4.8012   LearningRate 0.0103   Epoch: 13   Global Step: 226550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:33,755-Speed 9705.99 samples/sec   Loss 4.6949   LearningRate 0.0103   Epoch: 13   Global Step: 226560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:34,818-Speed 9649.05 samples/sec   Loss 4.6609   LearningRate 0.0103   Epoch: 13   Global Step: 226570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:35,911-Speed 9370.92 samples/sec   Loss 4.7915   LearningRate 0.0103   Epoch: 13   Global Step: 226580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:36,969-Speed 9686.15 samples/sec   Loss 4.8066   LearningRate 0.0103   Epoch: 13   Global Step: 226590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:38,032-Speed 9637.85 samples/sec   Loss 4.7597   LearningRate 0.0103   Epoch: 13   Global Step: 226600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:39,143-Speed 9217.78 samples/sec   Loss 4.7477   LearningRate 0.0103   Epoch: 13   Global Step: 226610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:40,210-Speed 9600.73 samples/sec   Loss 4.7049   LearningRate 0.0103   Epoch: 13   Global Step: 226620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:41,306-Speed 9354.31 samples/sec   Loss 4.7492   LearningRate 0.0103   Epoch: 13   Global Step: 226630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:42,365-Speed 9666.20 samples/sec   Loss 4.8192   LearningRate 0.0103   Epoch: 13   Global Step: 226640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:43,439-Speed 9544.50 samples/sec   Loss 4.7424   LearningRate 0.0103   Epoch: 13   Global Step: 226650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:44,557-Speed 9159.58 samples/sec   Loss 4.7408   LearningRate 0.0103   Epoch: 13   Global Step: 226660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:45,651-Speed 9372.24 samples/sec   Loss 4.8030   LearningRate 0.0103   Epoch: 13   Global Step: 226670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:46,714-Speed 9634.20 samples/sec   Loss 4.7090   LearningRate 0.0103   Epoch: 13   Global Step: 226680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:47,829-Speed 9190.46 samples/sec   Loss 4.6949   LearningRate 0.0103   Epoch: 13   Global Step: 226690   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:41:48,891-Speed 9650.90 samples/sec   Loss 4.8679   LearningRate 0.0103   Epoch: 13   Global Step: 226700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:49,972-Speed 9475.87 samples/sec   Loss 4.7999   LearningRate 0.0103   Epoch: 13   Global Step: 226710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:51,085-Speed 9211.67 samples/sec   Loss 4.7499   LearningRate 0.0103   Epoch: 13   Global Step: 226720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:52,147-Speed 9646.60 samples/sec   Loss 4.6523   LearningRate 0.0103   Epoch: 13   Global Step: 226730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:53,181-Speed 9910.44 samples/sec   Loss 4.7159   LearningRate 0.0103   Epoch: 13   Global Step: 226740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:54,260-Speed 9499.33 samples/sec   Loss 4.6995   LearningRate 0.0103   Epoch: 13   Global Step: 226750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:55,328-Speed 9589.22 samples/sec   Loss 4.7139   LearningRate 0.0103   Epoch: 13   Global Step: 226760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:56,381-Speed 9738.38 samples/sec   Loss 4.8058   LearningRate 0.0103   Epoch: 13   Global Step: 226770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:57,468-Speed 9423.24 samples/sec   Loss 4.7231   LearningRate 0.0103   Epoch: 13   Global Step: 226780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:58,579-Speed 9223.17 samples/sec   Loss 4.7769   LearningRate 0.0103   Epoch: 13   Global Step: 226790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:41:59,650-Speed 9565.48 samples/sec   Loss 4.7578   LearningRate 0.0103   Epoch: 13   Global Step: 226800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:00,764-Speed 9194.29 samples/sec   Loss 4.7622   LearningRate 0.0103   Epoch: 13   Global Step: 226810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:01,797-Speed 9926.17 samples/sec   Loss 4.8364   LearningRate 0.0103   Epoch: 13   Global Step: 226820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:02,912-Speed 9186.92 samples/sec   Loss 4.7518   LearningRate 0.0103   Epoch: 13   Global Step: 226830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:03,973-Speed 9659.06 samples/sec   Loss 4.8018   LearningRate 0.0103   Epoch: 13   Global Step: 226840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:05,021-Speed 9777.07 samples/sec   Loss 4.6722   LearningRate 0.0103   Epoch: 13   Global Step: 226850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:06,096-Speed 9524.38 samples/sec   Loss 4.7708   LearningRate 0.0103   Epoch: 13   Global Step: 226860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:07,175-Speed 9494.31 samples/sec   Loss 4.6602   LearningRate 0.0103   Epoch: 13   Global Step: 226870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:08,265-Speed 9407.20 samples/sec   Loss 4.7120   LearningRate 0.0103   Epoch: 13   Global Step: 226880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:09,334-Speed 9580.97 samples/sec   Loss 4.7293   LearningRate 0.0103   Epoch: 13   Global Step: 226890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:10,427-Speed 9383.73 samples/sec   Loss 4.7149   LearningRate 0.0103   Epoch: 13   Global Step: 226900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:11,547-Speed 9149.40 samples/sec   Loss 4.7799   LearningRate 0.0103   Epoch: 13   Global Step: 226910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:12,618-Speed 9564.25 samples/sec   Loss 4.7296   LearningRate 0.0103   Epoch: 13   Global Step: 226920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:13,674-Speed 9704.31 samples/sec   Loss 4.7358   LearningRate 0.0103   Epoch: 13   Global Step: 226930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:14,769-Speed 9348.89 samples/sec   Loss 4.6641   LearningRate 0.0103   Epoch: 13   Global Step: 226940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:15,852-Speed 9461.62 samples/sec   Loss 4.8641   LearningRate 0.0102   Epoch: 13   Global Step: 226950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:16,928-Speed 9524.07 samples/sec   Loss 4.7258   LearningRate 0.0102   Epoch: 13   Global Step: 226960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:18,008-Speed 9490.10 samples/sec   Loss 4.7794   LearningRate 0.0102   Epoch: 13   Global Step: 226970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:19,088-Speed 9487.89 samples/sec   Loss 4.7641   LearningRate 0.0102   Epoch: 13   Global Step: 226980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:20,155-Speed 9599.12 samples/sec   Loss 4.8699   LearningRate 0.0102   Epoch: 13   Global Step: 226990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:21,233-Speed 9505.94 samples/sec   Loss 4.7165   LearningRate 0.0102   Epoch: 13   Global Step: 227000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:22,285-Speed 9742.77 samples/sec   Loss 4.7794   LearningRate 0.0102   Epoch: 13   Global Step: 227010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:23,368-Speed 9460.44 samples/sec   Loss 4.6980   LearningRate 0.0102   Epoch: 13   Global Step: 227020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:42:24,465-Speed 9340.19 samples/sec   Loss 4.7780   LearningRate 0.0102   Epoch: 13   Global Step: 227030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:42:25,544-Speed 9493.60 samples/sec   Loss 4.7392   LearningRate 0.0102   Epoch: 13   Global Step: 227040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:42:26,651-Speed 9257.80 samples/sec   Loss 4.7912   LearningRate 0.0102   Epoch: 13   Global Step: 227050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:42:27,734-Speed 9470.69 samples/sec   Loss 4.7383   LearningRate 0.0102   Epoch: 13   Global Step: 227060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:42:28,827-Speed 9373.88 samples/sec   Loss 4.7339   LearningRate 0.0102   Epoch: 13   Global Step: 227070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:42:29,883-Speed 9701.00 samples/sec   Loss 4.6700   LearningRate 0.0102   Epoch: 13   Global Step: 227080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:42:30,929-Speed 9795.64 samples/sec   Loss 4.7784   LearningRate 0.0102   Epoch: 13   Global Step: 227090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:42:32,018-Speed 9410.42 samples/sec   Loss 4.6847   LearningRate 0.0102   Epoch: 13   Global Step: 227100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:42:33,115-Speed 9337.10 samples/sec   Loss 4.7662   LearningRate 0.0102   Epoch: 13   Global Step: 227110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:42:34,184-Speed 9581.66 samples/sec   Loss 4.7018   LearningRate 0.0102   Epoch: 13   Global Step: 227120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:35,274-Speed 9400.39 samples/sec   Loss 4.8421   LearningRate 0.0102   Epoch: 13   Global Step: 227130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:36,370-Speed 9354.26 samples/sec   Loss 4.7835   LearningRate 0.0102   Epoch: 13   Global Step: 227140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:37,522-Speed 8886.99 samples/sec   Loss 4.7908   LearningRate 0.0102   Epoch: 13   Global Step: 227150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:38,678-Speed 8861.85 samples/sec   Loss 4.7505   LearningRate 0.0102   Epoch: 13   Global Step: 227160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:39,759-Speed 9482.86 samples/sec   Loss 4.7875   LearningRate 0.0102   Epoch: 13   Global Step: 227170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:40,823-Speed 9629.07 samples/sec   Loss 4.7592   LearningRate 0.0102   Epoch: 13   Global Step: 227180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:41,922-Speed 9325.70 samples/sec   Loss 4.7477   LearningRate 0.0102   Epoch: 13   Global Step: 227190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:42,972-Speed 9755.58 samples/sec   Loss 4.7022   LearningRate 0.0102   Epoch: 13   Global Step: 227200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:44,044-Speed 9560.12 samples/sec   Loss 4.6974   LearningRate 0.0102   Epoch: 13   Global Step: 227210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:45,165-Speed 9136.13 samples/sec   Loss 4.7254   LearningRate 0.0102   Epoch: 13   Global Step: 227220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:46,249-Speed 9455.27 samples/sec   Loss 4.8188   LearningRate 0.0102   Epoch: 13   Global Step: 227230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:47,358-Speed 9241.67 samples/sec   Loss 4.7091   LearningRate 0.0102   Epoch: 13   Global Step: 227240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:48,440-Speed 9470.93 samples/sec   Loss 4.7148   LearningRate 0.0102   Epoch: 13   Global Step: 227250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:49,513-Speed 9549.26 samples/sec   Loss 4.7729   LearningRate 0.0102   Epoch: 13   Global Step: 227260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:50,595-Speed 9464.93 samples/sec   Loss 4.7520   LearningRate 0.0102   Epoch: 13   Global Step: 227270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:51,674-Speed 9504.00 samples/sec   Loss 4.7910   LearningRate 0.0102   Epoch: 13   Global Step: 227280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:52,742-Speed 9591.06 samples/sec   Loss 4.7713   LearningRate 0.0102   Epoch: 13   Global Step: 227290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:53,802-Speed 9664.53 samples/sec   Loss 4.7039   LearningRate 0.0102   Epoch: 13   Global Step: 227300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:54,875-Speed 9548.32 samples/sec   Loss 4.7641   LearningRate 0.0102   Epoch: 13   Global Step: 227310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:55,936-Speed 9651.39 samples/sec   Loss 4.7657   LearningRate 0.0102   Epoch: 13   Global Step: 227320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:57,034-Speed 9334.13 samples/sec   Loss 4.7401   LearningRate 0.0102   Epoch: 13   Global Step: 227330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:58,079-Speed 9808.90 samples/sec   Loss 4.7465   LearningRate 0.0102   Epoch: 13   Global Step: 227340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:42:59,157-Speed 9505.14 samples/sec   Loss 4.6523   LearningRate 0.0102   Epoch: 13   Global Step: 227350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:00,254-Speed 9338.18 samples/sec   Loss 4.7261   LearningRate 0.0102   Epoch: 13   Global Step: 227360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:01,325-Speed 9570.83 samples/sec   Loss 4.6140   LearningRate 0.0102   Epoch: 13   Global Step: 227370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:02,393-Speed 9592.75 samples/sec   Loss 4.7106   LearningRate 0.0102   Epoch: 13   Global Step: 227380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:03,456-Speed 9636.53 samples/sec   Loss 4.6978   LearningRate 0.0102   Epoch: 13   Global Step: 227390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:04,528-Speed 9560.67 samples/sec   Loss 4.7685   LearningRate 0.0102   Epoch: 13   Global Step: 227400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:05,594-Speed 9612.40 samples/sec   Loss 4.7750   LearningRate 0.0102   Epoch: 13   Global Step: 227410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:06,702-Speed 9250.97 samples/sec   Loss 4.8191   LearningRate 0.0102   Epoch: 13   Global Step: 227420   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:43:07,776-Speed 9536.08 samples/sec   Loss 4.7012   LearningRate 0.0102   Epoch: 13   Global Step: 227430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:08,848-Speed 9557.75 samples/sec   Loss 4.6806   LearningRate 0.0102   Epoch: 13   Global Step: 227440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:09,890-Speed 9830.48 samples/sec   Loss 4.7773   LearningRate 0.0102   Epoch: 13   Global Step: 227450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:10,983-Speed 9377.67 samples/sec   Loss 4.7474   LearningRate 0.0102   Epoch: 13   Global Step: 227460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:12,095-Speed 9218.91 samples/sec   Loss 4.6619   LearningRate 0.0101   Epoch: 13   Global Step: 227470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:13,191-Speed 9345.61 samples/sec   Loss 4.6793   LearningRate 0.0101   Epoch: 13   Global Step: 227480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:14,275-Speed 9451.51 samples/sec   Loss 4.7702   LearningRate 0.0101   Epoch: 13   Global Step: 227490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:15,397-Speed 9131.76 samples/sec   Loss 4.7295   LearningRate 0.0101   Epoch: 13   Global Step: 227500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:16,468-Speed 9567.99 samples/sec   Loss 4.7064   LearningRate 0.0101   Epoch: 13   Global Step: 227510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:17,580-Speed 9215.87 samples/sec   Loss 4.6530   LearningRate 0.0101   Epoch: 13   Global Step: 227520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:18,723-Speed 8962.69 samples/sec   Loss 4.8098   LearningRate 0.0101   Epoch: 13   Global Step: 227530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:19,855-Speed 9049.76 samples/sec   Loss 4.7345   LearningRate 0.0101   Epoch: 13   Global Step: 227540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:20,995-Speed 8985.65 samples/sec   Loss 4.7278   LearningRate 0.0101   Epoch: 13   Global Step: 227550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:22,097-Speed 9300.93 samples/sec   Loss 4.8098   LearningRate 0.0101   Epoch: 13   Global Step: 227560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:23,192-Speed 9363.30 samples/sec   Loss 4.7817   LearningRate 0.0101   Epoch: 13   Global Step: 227570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:24,273-Speed 9475.28 samples/sec   Loss 4.7474   LearningRate 0.0101   Epoch: 13   Global Step: 227580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:25,354-Speed 9481.47 samples/sec   Loss 4.7329   LearningRate 0.0101   Epoch: 13   Global Step: 227590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:26,447-Speed 9369.54 samples/sec   Loss 4.7222   LearningRate 0.0101   Epoch: 13   Global Step: 227600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:27,511-Speed 9632.20 samples/sec   Loss 4.7395   LearningRate 0.0101   Epoch: 13   Global Step: 227610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:28,616-Speed 9273.07 samples/sec   Loss 4.7808   LearningRate 0.0101   Epoch: 13   Global Step: 227620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:29,748-Speed 9049.68 samples/sec   Loss 4.8210   LearningRate 0.0101   Epoch: 13   Global Step: 227630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:30,837-Speed 9414.43 samples/sec   Loss 4.7174   LearningRate 0.0101   Epoch: 13   Global Step: 227640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:31,892-Speed 9710.25 samples/sec   Loss 4.8473   LearningRate 0.0101   Epoch: 13   Global Step: 227650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:32,988-Speed 9346.41 samples/sec   Loss 4.7449   LearningRate 0.0101   Epoch: 13   Global Step: 227660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:34,109-Speed 9142.12 samples/sec   Loss 4.6935   LearningRate 0.0101   Epoch: 13   Global Step: 227670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:35,177-Speed 9585.10 samples/sec   Loss 4.7044   LearningRate 0.0101   Epoch: 13   Global Step: 227680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:36,265-Speed 9423.26 samples/sec   Loss 4.7852   LearningRate 0.0101   Epoch: 13   Global Step: 227690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:37,355-Speed 9396.32 samples/sec   Loss 4.7804   LearningRate 0.0101   Epoch: 13   Global Step: 227700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:38,469-Speed 9201.16 samples/sec   Loss 4.6472   LearningRate 0.0101   Epoch: 13   Global Step: 227710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:39,530-Speed 9661.93 samples/sec   Loss 4.7845   LearningRate 0.0101   Epoch: 13   Global Step: 227720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:40,611-Speed 9476.93 samples/sec   Loss 4.7297   LearningRate 0.0101   Epoch: 13   Global Step: 227730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:41,732-Speed 9139.77 samples/sec   Loss 4.7862   LearningRate 0.0101   Epoch: 13   Global Step: 227740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:42,832-Speed 9310.66 samples/sec   Loss 4.7124   LearningRate 0.0101   Epoch: 13   Global Step: 227750   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:43:43,888-Speed 9704.94 samples/sec   Loss 4.6929   LearningRate 0.0101   Epoch: 13   Global Step: 227760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:44,979-Speed 9392.02 samples/sec   Loss 4.8063   LearningRate 0.0101   Epoch: 13   Global Step: 227770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:46,075-Speed 9348.29 samples/sec   Loss 4.8183   LearningRate 0.0101   Epoch: 13   Global Step: 227780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:47,177-Speed 9296.91 samples/sec   Loss 4.7236   LearningRate 0.0101   Epoch: 13   Global Step: 227790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:48,267-Speed 9397.41 samples/sec   Loss 4.8288   LearningRate 0.0101   Epoch: 13   Global Step: 227800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:49,379-Speed 9213.19 samples/sec   Loss 4.8553   LearningRate 0.0101   Epoch: 13   Global Step: 227810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:50,446-Speed 9604.38 samples/sec   Loss 4.8442   LearningRate 0.0101   Epoch: 13   Global Step: 227820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:51,503-Speed 9698.86 samples/sec   Loss 4.6812   LearningRate 0.0101   Epoch: 13   Global Step: 227830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:52,560-Speed 9695.26 samples/sec   Loss 4.8007   LearningRate 0.0101   Epoch: 13   Global Step: 227840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:53,669-Speed 9233.96 samples/sec   Loss 4.8090   LearningRate 0.0101   Epoch: 13   Global Step: 227850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:54,754-Speed 9442.40 samples/sec   Loss 4.6918   LearningRate 0.0101   Epoch: 13   Global Step: 227860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:43:55,868-Speed 9200.18 samples/sec   Loss 4.6523   LearningRate 0.0101   Epoch: 13   Global Step: 227870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:56,933-Speed 9617.95 samples/sec   Loss 4.8214   LearningRate 0.0101   Epoch: 13   Global Step: 227880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:58,014-Speed 9480.00 samples/sec   Loss 4.7216   LearningRate 0.0101   Epoch: 13   Global Step: 227890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:43:59,135-Speed 9139.07 samples/sec   Loss 4.8194   LearningRate 0.0101   Epoch: 13   Global Step: 227900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:00,276-Speed 8980.20 samples/sec   Loss 4.7534   LearningRate 0.0101   Epoch: 13   Global Step: 227910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:01,378-Speed 9299.71 samples/sec   Loss 4.7936   LearningRate 0.0101   Epoch: 13   Global Step: 227920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:02,452-Speed 9538.84 samples/sec   Loss 4.7484   LearningRate 0.0101   Epoch: 13   Global Step: 227930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:03,520-Speed 9599.01 samples/sec   Loss 4.7759   LearningRate 0.0101   Epoch: 13   Global Step: 227940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:04,645-Speed 9104.74 samples/sec   Loss 4.7352   LearningRate 0.0101   Epoch: 13   Global Step: 227950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:05,725-Speed 9486.87 samples/sec   Loss 4.7544   LearningRate 0.0101   Epoch: 13   Global Step: 227960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:06,793-Speed 9596.56 samples/sec   Loss 4.6500   LearningRate 0.0101   Epoch: 13   Global Step: 227970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:07,839-Speed 9797.96 samples/sec   Loss 4.6951   LearningRate 0.0101   Epoch: 13   Global Step: 227980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:08,953-Speed 9198.70 samples/sec   Loss 4.8212   LearningRate 0.0101   Epoch: 13   Global Step: 227990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:10,026-Speed 9549.12 samples/sec   Loss 4.7416   LearningRate 0.0100   Epoch: 13   Global Step: 228000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:44:32,085-[lfw][228000]XNorm: 7.984275
Training: 2022-04-11 20:44:32,086-[lfw][228000]Accuracy-Flip: 0.99667+-0.00307
Training: 2022-04-11 20:44:32,086-[lfw][228000]Accuracy-Highest: 0.99733
Training: 2022-04-11 20:44:57,605-[cfp_fp][228000]XNorm: 6.898904
Training: 2022-04-11 20:44:57,606-[cfp_fp][228000]Accuracy-Flip: 0.96586+-0.01056
Training: 2022-04-11 20:44:57,606-[cfp_fp][228000]Accuracy-Highest: 0.96771
Training: 2022-04-11 20:45:19,656-[agedb_30][228000]XNorm: 7.723843
Training: 2022-04-11 20:45:19,656-[agedb_30][228000]Accuracy-Flip: 0.96867+-0.01059
Training: 2022-04-11 20:45:19,656-[agedb_30][228000]Accuracy-Highest: 0.97033
Training: 2022-04-11 20:45:20,770-Speed 144.75 samples/sec   Loss 4.7532   LearningRate 0.0100   Epoch: 13   Global Step: 228010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:21,847-Speed 9520.86 samples/sec   Loss 4.7082   LearningRate 0.0100   Epoch: 13   Global Step: 228020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:22,930-Speed 9461.39 samples/sec   Loss 4.7694   LearningRate 0.0100   Epoch: 13   Global Step: 228030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:24,050-Speed 9146.13 samples/sec   Loss 4.8589   LearningRate 0.0100   Epoch: 13   Global Step: 228040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:25,166-Speed 9180.62 samples/sec   Loss 4.7603   LearningRate 0.0100   Epoch: 13   Global Step: 228050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:26,250-Speed 9453.17 samples/sec   Loss 4.8578   LearningRate 0.0100   Epoch: 13   Global Step: 228060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:27,326-Speed 9523.68 samples/sec   Loss 4.7554   LearningRate 0.0100   Epoch: 13   Global Step: 228070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:28,402-Speed 9520.36 samples/sec   Loss 4.7167   LearningRate 0.0100   Epoch: 13   Global Step: 228080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:29,483-Speed 9476.21 samples/sec   Loss 4.8312   LearningRate 0.0100   Epoch: 13   Global Step: 228090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:30,562-Speed 9501.24 samples/sec   Loss 4.8005   LearningRate 0.0100   Epoch: 13   Global Step: 228100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:31,663-Speed 9302.19 samples/sec   Loss 4.7114   LearningRate 0.0100   Epoch: 13   Global Step: 228110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:32,761-Speed 9332.40 samples/sec   Loss 4.8250   LearningRate 0.0100   Epoch: 13   Global Step: 228120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:33,827-Speed 9610.40 samples/sec   Loss 4.7991   LearningRate 0.0100   Epoch: 13   Global Step: 228130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:34,903-Speed 9520.11 samples/sec   Loss 4.7402   LearningRate 0.0100   Epoch: 13   Global Step: 228140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:36,011-Speed 9249.68 samples/sec   Loss 4.8598   LearningRate 0.0100   Epoch: 13   Global Step: 228150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:37,116-Speed 9273.14 samples/sec   Loss 4.7173   LearningRate 0.0100   Epoch: 13   Global Step: 228160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:38,224-Speed 9247.61 samples/sec   Loss 4.7295   LearningRate 0.0100   Epoch: 13   Global Step: 228170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:39,342-Speed 9164.15 samples/sec   Loss 4.7636   LearningRate 0.0100   Epoch: 13   Global Step: 228180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:40,433-Speed 9389.72 samples/sec   Loss 4.7694   LearningRate 0.0100   Epoch: 13   Global Step: 228190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:41,523-Speed 9406.00 samples/sec   Loss 4.8462   LearningRate 0.0100   Epoch: 13   Global Step: 228200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:42,618-Speed 9353.54 samples/sec   Loss 4.8371   LearningRate 0.0100   Epoch: 13   Global Step: 228210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:43,672-Speed 9721.67 samples/sec   Loss 4.6229   LearningRate 0.0100   Epoch: 13   Global Step: 228220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:45:44,759-Speed 9430.97 samples/sec   Loss 4.8764   LearningRate 0.0100   Epoch: 13   Global Step: 228230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:45:45,809-Speed 9755.31 samples/sec   Loss 4.7267   LearningRate 0.0100   Epoch: 13   Global Step: 228240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:45:46,882-Speed 9553.06 samples/sec   Loss 4.7823   LearningRate 0.0100   Epoch: 13   Global Step: 228250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:45:48,008-Speed 9095.21 samples/sec   Loss 4.8108   LearningRate 0.0100   Epoch: 13   Global Step: 228260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:45:49,103-Speed 9357.64 samples/sec   Loss 4.7828   LearningRate 0.0100   Epoch: 13   Global Step: 228270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:45:50,181-Speed 9507.53 samples/sec   Loss 4.7313   LearningRate 0.0100   Epoch: 13   Global Step: 228280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:45:51,270-Speed 9408.98 samples/sec   Loss 4.6813   LearningRate 0.0100   Epoch: 13   Global Step: 228290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:45:52,352-Speed 9471.26 samples/sec   Loss 4.6828   LearningRate 0.0100   Epoch: 13   Global Step: 228300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:45:53,402-Speed 9759.45 samples/sec   Loss 4.6830   LearningRate 0.0100   Epoch: 13   Global Step: 228310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:45:54,495-Speed 9369.16 samples/sec   Loss 4.7945   LearningRate 0.0100   Epoch: 13   Global Step: 228320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:55,560-Speed 9624.75 samples/sec   Loss 4.8282   LearningRate 0.0100   Epoch: 13   Global Step: 228330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:56,628-Speed 9596.91 samples/sec   Loss 4.8057   LearningRate 0.0100   Epoch: 13   Global Step: 228340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:57,730-Speed 9297.06 samples/sec   Loss 4.7135   LearningRate 0.0100   Epoch: 13   Global Step: 228350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:58,828-Speed 9335.21 samples/sec   Loss 4.6848   LearningRate 0.0100   Epoch: 13   Global Step: 228360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:45:59,934-Speed 9262.32 samples/sec   Loss 4.7236   LearningRate 0.0100   Epoch: 13   Global Step: 228370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:01,026-Speed 9386.56 samples/sec   Loss 4.6499   LearningRate 0.0100   Epoch: 13   Global Step: 228380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:02,106-Speed 9485.35 samples/sec   Loss 4.8026   LearningRate 0.0100   Epoch: 13   Global Step: 228390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:03,181-Speed 9532.64 samples/sec   Loss 4.7030   LearningRate 0.0100   Epoch: 13   Global Step: 228400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:04,288-Speed 9255.86 samples/sec   Loss 4.7149   LearningRate 0.0100   Epoch: 13   Global Step: 228410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:05,371-Speed 9456.47 samples/sec   Loss 4.8157   LearningRate 0.0100   Epoch: 13   Global Step: 228420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:06,512-Speed 8981.32 samples/sec   Loss 4.7177   LearningRate 0.0100   Epoch: 13   Global Step: 228430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:07,600-Speed 9411.68 samples/sec   Loss 4.7761   LearningRate 0.0100   Epoch: 13   Global Step: 228440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:08,673-Speed 9559.80 samples/sec   Loss 4.6723   LearningRate 0.0100   Epoch: 13   Global Step: 228450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:09,777-Speed 9277.16 samples/sec   Loss 4.6552   LearningRate 0.0100   Epoch: 13   Global Step: 228460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:10,857-Speed 9488.31 samples/sec   Loss 4.7720   LearningRate 0.0100   Epoch: 13   Global Step: 228470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:11,951-Speed 9368.38 samples/sec   Loss 4.8655   LearningRate 0.0100   Epoch: 13   Global Step: 228480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:13,015-Speed 9627.72 samples/sec   Loss 4.8182   LearningRate 0.0100   Epoch: 13   Global Step: 228490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:14,077-Speed 9649.11 samples/sec   Loss 4.7581   LearningRate 0.0100   Epoch: 13   Global Step: 228500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:15,130-Speed 9727.55 samples/sec   Loss 4.6274   LearningRate 0.0100   Epoch: 13   Global Step: 228510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:16,223-Speed 9371.16 samples/sec   Loss 4.8411   LearningRate 0.0100   Epoch: 13   Global Step: 228520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:17,256-Speed 9926.22 samples/sec   Loss 4.7366   LearningRate 0.0099   Epoch: 13   Global Step: 228530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:18,346-Speed 9396.44 samples/sec   Loss 4.6860   LearningRate 0.0099   Epoch: 13   Global Step: 228540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:19,459-Speed 9206.93 samples/sec   Loss 4.6953   LearningRate 0.0099   Epoch: 13   Global Step: 228550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:20,522-Speed 9634.23 samples/sec   Loss 4.7310   LearningRate 0.0099   Epoch: 13   Global Step: 228560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:21,631-Speed 9243.91 samples/sec   Loss 4.7476   LearningRate 0.0099   Epoch: 13   Global Step: 228570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:22,729-Speed 9332.76 samples/sec   Loss 4.7526   LearningRate 0.0099   Epoch: 13   Global Step: 228580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:23,826-Speed 9340.31 samples/sec   Loss 4.6873   LearningRate 0.0099   Epoch: 13   Global Step: 228590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:24,917-Speed 9391.32 samples/sec   Loss 4.7455   LearningRate 0.0099   Epoch: 13   Global Step: 228600   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:46:26,000-Speed 9460.28 samples/sec   Loss 4.7411   LearningRate 0.0099   Epoch: 13   Global Step: 228610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:27,048-Speed 9773.81 samples/sec   Loss 4.8110   LearningRate 0.0099   Epoch: 13   Global Step: 228620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:28,144-Speed 9350.34 samples/sec   Loss 4.7224   LearningRate 0.0099   Epoch: 13   Global Step: 228630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:29,193-Speed 9777.46 samples/sec   Loss 4.7782   LearningRate 0.0099   Epoch: 13   Global Step: 228640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:30,251-Speed 9682.07 samples/sec   Loss 4.6558   LearningRate 0.0099   Epoch: 13   Global Step: 228650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:31,334-Speed 9455.88 samples/sec   Loss 4.7369   LearningRate 0.0099   Epoch: 13   Global Step: 228660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:32,397-Speed 9644.15 samples/sec   Loss 4.7039   LearningRate 0.0099   Epoch: 13   Global Step: 228670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:33,502-Speed 9273.04 samples/sec   Loss 4.6194   LearningRate 0.0099   Epoch: 13   Global Step: 228680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:34,574-Speed 9557.48 samples/sec   Loss 4.7271   LearningRate 0.0099   Epoch: 13   Global Step: 228690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:35,674-Speed 9313.07 samples/sec   Loss 4.7503   LearningRate 0.0099   Epoch: 13   Global Step: 228700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:36,748-Speed 9536.20 samples/sec   Loss 4.6908   LearningRate 0.0099   Epoch: 13   Global Step: 228710   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:46:37,836-Speed 9420.17 samples/sec   Loss 4.7403   LearningRate 0.0099   Epoch: 13   Global Step: 228720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:38,905-Speed 9579.22 samples/sec   Loss 4.7530   LearningRate 0.0099   Epoch: 13   Global Step: 228730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:39,992-Speed 9430.53 samples/sec   Loss 4.7582   LearningRate 0.0099   Epoch: 13   Global Step: 228740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:41,042-Speed 9757.60 samples/sec   Loss 4.8404   LearningRate 0.0099   Epoch: 13   Global Step: 228750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:42,133-Speed 9391.08 samples/sec   Loss 4.7606   LearningRate 0.0099   Epoch: 13   Global Step: 228760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:43,234-Speed 9307.50 samples/sec   Loss 4.7854   LearningRate 0.0099   Epoch: 13   Global Step: 228770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:44,300-Speed 9613.82 samples/sec   Loss 4.6864   LearningRate 0.0099   Epoch: 13   Global Step: 228780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:45,367-Speed 9597.22 samples/sec   Loss 4.7426   LearningRate 0.0099   Epoch: 13   Global Step: 228790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:46,472-Speed 9271.27 samples/sec   Loss 4.7872   LearningRate 0.0099   Epoch: 13   Global Step: 228800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:47,587-Speed 9196.03 samples/sec   Loss 4.8111   LearningRate 0.0099   Epoch: 13   Global Step: 228810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:48,686-Speed 9329.57 samples/sec   Loss 4.7510   LearningRate 0.0099   Epoch: 13   Global Step: 228820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:49,778-Speed 9378.15 samples/sec   Loss 4.6383   LearningRate 0.0099   Epoch: 13   Global Step: 228830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:50,915-Speed 9008.17 samples/sec   Loss 4.7676   LearningRate 0.0099   Epoch: 13   Global Step: 228840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:52,013-Speed 9340.02 samples/sec   Loss 4.6816   LearningRate 0.0099   Epoch: 13   Global Step: 228850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:46:53,096-Speed 9465.02 samples/sec   Loss 4.8656   LearningRate 0.0099   Epoch: 13   Global Step: 228860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:54,143-Speed 9783.24 samples/sec   Loss 4.7468   LearningRate 0.0099   Epoch: 13   Global Step: 228870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:55,239-Speed 9350.92 samples/sec   Loss 4.7818   LearningRate 0.0099   Epoch: 13   Global Step: 228880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:56,326-Speed 9422.99 samples/sec   Loss 4.7441   LearningRate 0.0099   Epoch: 13   Global Step: 228890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:57,374-Speed 9774.06 samples/sec   Loss 4.6429   LearningRate 0.0099   Epoch: 13   Global Step: 228900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:58,438-Speed 9635.24 samples/sec   Loss 4.7427   LearningRate 0.0099   Epoch: 13   Global Step: 228910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:46:59,557-Speed 9157.16 samples/sec   Loss 4.7310   LearningRate 0.0099   Epoch: 13   Global Step: 228920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:00,656-Speed 9324.14 samples/sec   Loss 4.7325   LearningRate 0.0099   Epoch: 13   Global Step: 228930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:01,734-Speed 9498.03 samples/sec   Loss 4.7329   LearningRate 0.0099   Epoch: 13   Global Step: 228940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:02,793-Speed 9673.85 samples/sec   Loss 4.7384   LearningRate 0.0099   Epoch: 13   Global Step: 228950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:03,837-Speed 9814.31 samples/sec   Loss 4.7160   LearningRate 0.0099   Epoch: 13   Global Step: 228960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:04,924-Speed 9425.40 samples/sec   Loss 4.7566   LearningRate 0.0099   Epoch: 13   Global Step: 228970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:05,982-Speed 9693.31 samples/sec   Loss 4.7489   LearningRate 0.0099   Epoch: 13   Global Step: 228980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:07,050-Speed 9594.24 samples/sec   Loss 4.8390   LearningRate 0.0099   Epoch: 13   Global Step: 228990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:08,101-Speed 9747.52 samples/sec   Loss 4.6623   LearningRate 0.0099   Epoch: 13   Global Step: 229000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:09,165-Speed 9633.23 samples/sec   Loss 4.6880   LearningRate 0.0099   Epoch: 13   Global Step: 229010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:10,256-Speed 9390.20 samples/sec   Loss 4.7540   LearningRate 0.0099   Epoch: 13   Global Step: 229020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:11,345-Speed 9413.71 samples/sec   Loss 4.7327   LearningRate 0.0099   Epoch: 13   Global Step: 229030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:12,457-Speed 9213.93 samples/sec   Loss 4.7066   LearningRate 0.0099   Epoch: 13   Global Step: 229040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:13,530-Speed 9542.45 samples/sec   Loss 4.7067   LearningRate 0.0099   Epoch: 13   Global Step: 229050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:14,606-Speed 9525.44 samples/sec   Loss 4.8090   LearningRate 0.0098   Epoch: 13   Global Step: 229060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:15,718-Speed 9210.27 samples/sec   Loss 4.7178   LearningRate 0.0098   Epoch: 13   Global Step: 229070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:16,782-Speed 9630.17 samples/sec   Loss 4.8108   LearningRate 0.0098   Epoch: 13   Global Step: 229080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:17,908-Speed 9098.54 samples/sec   Loss 4.7461   LearningRate 0.0098   Epoch: 13   Global Step: 229090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:19,027-Speed 9156.75 samples/sec   Loss 4.7160   LearningRate 0.0098   Epoch: 13   Global Step: 229100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:20,155-Speed 9084.60 samples/sec   Loss 4.8129   LearningRate 0.0098   Epoch: 13   Global Step: 229110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:21,242-Speed 9429.28 samples/sec   Loss 4.7960   LearningRate 0.0098   Epoch: 13   Global Step: 229120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:22,358-Speed 9176.34 samples/sec   Loss 4.8145   LearningRate 0.0098   Epoch: 13   Global Step: 229130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:23,468-Speed 9234.39 samples/sec   Loss 4.7545   LearningRate 0.0098   Epoch: 13   Global Step: 229140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:24,572-Speed 9281.53 samples/sec   Loss 4.7198   LearningRate 0.0098   Epoch: 13   Global Step: 229150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:25,665-Speed 9377.81 samples/sec   Loss 4.7081   LearningRate 0.0098   Epoch: 13   Global Step: 229160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:26,683-Speed 10066.11 samples/sec   Loss 4.8402   LearningRate 0.0098   Epoch: 13   Global Step: 229170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:27,777-Speed 9365.63 samples/sec   Loss 4.6767   LearningRate 0.0098   Epoch: 13   Global Step: 229180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:28,848-Speed 9563.32 samples/sec   Loss 4.7508   LearningRate 0.0098   Epoch: 13   Global Step: 229190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:29,944-Speed 9350.97 samples/sec   Loss 4.7049   LearningRate 0.0098   Epoch: 13   Global Step: 229200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:31,010-Speed 9609.07 samples/sec   Loss 4.7630   LearningRate 0.0098   Epoch: 13   Global Step: 229210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:32,079-Speed 9590.77 samples/sec   Loss 4.7412   LearningRate 0.0098   Epoch: 13   Global Step: 229220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:33,198-Speed 9154.34 samples/sec   Loss 4.7499   LearningRate 0.0098   Epoch: 13   Global Step: 229230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:34,267-Speed 9582.04 samples/sec   Loss 4.6657   LearningRate 0.0098   Epoch: 13   Global Step: 229240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:35,299-Speed 9928.87 samples/sec   Loss 4.7366   LearningRate 0.0098   Epoch: 13   Global Step: 229250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:36,356-Speed 9691.35 samples/sec   Loss 4.7626   LearningRate 0.0098   Epoch: 13   Global Step: 229260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:37,435-Speed 9493.21 samples/sec   Loss 4.7236   LearningRate 0.0098   Epoch: 13   Global Step: 229270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:38,521-Speed 9438.59 samples/sec   Loss 4.8221   LearningRate 0.0098   Epoch: 13   Global Step: 229280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:39,631-Speed 9229.81 samples/sec   Loss 4.7388   LearningRate 0.0098   Epoch: 13   Global Step: 229290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:40,688-Speed 9697.58 samples/sec   Loss 4.6944   LearningRate 0.0098   Epoch: 13   Global Step: 229300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:41,777-Speed 9407.95 samples/sec   Loss 4.6746   LearningRate 0.0098   Epoch: 13   Global Step: 229310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:42,853-Speed 9523.47 samples/sec   Loss 4.7826   LearningRate 0.0098   Epoch: 13   Global Step: 229320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:43,940-Speed 9423.23 samples/sec   Loss 4.6801   LearningRate 0.0098   Epoch: 13   Global Step: 229330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:45,008-Speed 9592.53 samples/sec   Loss 4.8171   LearningRate 0.0098   Epoch: 13   Global Step: 229340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:46,043-Speed 9906.37 samples/sec   Loss 4.8292   LearningRate 0.0098   Epoch: 13   Global Step: 229350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:47,087-Speed 9811.60 samples/sec   Loss 4.7233   LearningRate 0.0098   Epoch: 13   Global Step: 229360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:48,157-Speed 9571.35 samples/sec   Loss 4.7102   LearningRate 0.0098   Epoch: 13   Global Step: 229370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:49,242-Speed 9448.78 samples/sec   Loss 4.7777   LearningRate 0.0098   Epoch: 13   Global Step: 229380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:50,336-Speed 9360.33 samples/sec   Loss 4.7141   LearningRate 0.0098   Epoch: 13   Global Step: 229390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:51,427-Speed 9396.76 samples/sec   Loss 4.8511   LearningRate 0.0098   Epoch: 13   Global Step: 229400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:52,486-Speed 9674.90 samples/sec   Loss 4.7407   LearningRate 0.0098   Epoch: 13   Global Step: 229410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:53,557-Speed 9561.58 samples/sec   Loss 4.6529   LearningRate 0.0098   Epoch: 13   Global Step: 229420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:47:54,679-Speed 9134.50 samples/sec   Loss 4.7370   LearningRate 0.0098   Epoch: 13   Global Step: 229430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:55,753-Speed 9538.63 samples/sec   Loss 4.7729   LearningRate 0.0098   Epoch: 13   Global Step: 229440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:56,841-Speed 9423.98 samples/sec   Loss 4.6939   LearningRate 0.0098   Epoch: 13   Global Step: 229450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:57,895-Speed 9725.00 samples/sec   Loss 4.7493   LearningRate 0.0098   Epoch: 13   Global Step: 229460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:47:58,968-Speed 9545.77 samples/sec   Loss 4.7447   LearningRate 0.0098   Epoch: 13   Global Step: 229470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:00,025-Speed 9692.47 samples/sec   Loss 4.8285   LearningRate 0.0098   Epoch: 13   Global Step: 229480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:01,101-Speed 9526.76 samples/sec   Loss 4.7489   LearningRate 0.0098   Epoch: 13   Global Step: 229490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:02,189-Speed 9418.73 samples/sec   Loss 4.7452   LearningRate 0.0098   Epoch: 13   Global Step: 229500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:03,259-Speed 9577.12 samples/sec   Loss 4.8357   LearningRate 0.0098   Epoch: 13   Global Step: 229510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:04,291-Speed 9924.25 samples/sec   Loss 4.7478   LearningRate 0.0098   Epoch: 13   Global Step: 229520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:05,373-Speed 9471.94 samples/sec   Loss 4.7340   LearningRate 0.0098   Epoch: 13   Global Step: 229530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:06,444-Speed 9562.97 samples/sec   Loss 4.8034   LearningRate 0.0098   Epoch: 13   Global Step: 229540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:07,535-Speed 9389.40 samples/sec   Loss 4.7839   LearningRate 0.0098   Epoch: 13   Global Step: 229550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:08,622-Speed 9428.89 samples/sec   Loss 4.7569   LearningRate 0.0098   Epoch: 13   Global Step: 229560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:09,709-Speed 9424.56 samples/sec   Loss 4.7972   LearningRate 0.0098   Epoch: 13   Global Step: 229570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:10,837-Speed 9087.99 samples/sec   Loss 4.7068   LearningRate 0.0098   Epoch: 13   Global Step: 229580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:11,900-Speed 9636.13 samples/sec   Loss 4.7235   LearningRate 0.0097   Epoch: 13   Global Step: 229590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:12,986-Speed 9430.78 samples/sec   Loss 4.7116   LearningRate 0.0097   Epoch: 13   Global Step: 229600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:14,039-Speed 9730.18 samples/sec   Loss 4.7555   LearningRate 0.0097   Epoch: 13   Global Step: 229610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:15,139-Speed 9316.66 samples/sec   Loss 4.7543   LearningRate 0.0097   Epoch: 13   Global Step: 229620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:16,202-Speed 9635.50 samples/sec   Loss 4.6645   LearningRate 0.0097   Epoch: 13   Global Step: 229630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:17,284-Speed 9472.06 samples/sec   Loss 4.7150   LearningRate 0.0097   Epoch: 13   Global Step: 229640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:18,424-Speed 8983.39 samples/sec   Loss 4.7398   LearningRate 0.0097   Epoch: 13   Global Step: 229650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:19,506-Speed 9470.65 samples/sec   Loss 4.7573   LearningRate 0.0097   Epoch: 13   Global Step: 229660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:20,579-Speed 9552.97 samples/sec   Loss 4.8490   LearningRate 0.0097   Epoch: 13   Global Step: 229670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:21,644-Speed 9628.13 samples/sec   Loss 4.6756   LearningRate 0.0097   Epoch: 13   Global Step: 229680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:22,768-Speed 9119.32 samples/sec   Loss 4.8136   LearningRate 0.0097   Epoch: 13   Global Step: 229690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:23,870-Speed 9297.35 samples/sec   Loss 4.7187   LearningRate 0.0097   Epoch: 13   Global Step: 229700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:24,926-Speed 9697.35 samples/sec   Loss 4.6798   LearningRate 0.0097   Epoch: 13   Global Step: 229710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:26,016-Speed 9405.44 samples/sec   Loss 4.7595   LearningRate 0.0097   Epoch: 13   Global Step: 229720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:27,090-Speed 9538.56 samples/sec   Loss 4.8223   LearningRate 0.0097   Epoch: 13   Global Step: 229730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:28,183-Speed 9383.51 samples/sec   Loss 4.8520   LearningRate 0.0097   Epoch: 13   Global Step: 229740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:29,281-Speed 9334.68 samples/sec   Loss 4.7189   LearningRate 0.0097   Epoch: 13   Global Step: 229750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:30,352-Speed 9566.70 samples/sec   Loss 4.7981   LearningRate 0.0097   Epoch: 13   Global Step: 229760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:31,410-Speed 9682.05 samples/sec   Loss 4.8703   LearningRate 0.0097   Epoch: 13   Global Step: 229770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:32,491-Speed 9479.40 samples/sec   Loss 4.7483   LearningRate 0.0097   Epoch: 13   Global Step: 229780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:33,555-Speed 9634.61 samples/sec   Loss 4.7550   LearningRate 0.0097   Epoch: 13   Global Step: 229790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:34,595-Speed 9847.35 samples/sec   Loss 4.7628   LearningRate 0.0097   Epoch: 13   Global Step: 229800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:35,671-Speed 9526.03 samples/sec   Loss 4.7391   LearningRate 0.0097   Epoch: 13   Global Step: 229810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:36,752-Speed 9475.01 samples/sec   Loss 4.6883   LearningRate 0.0097   Epoch: 13   Global Step: 229820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:37,814-Speed 9644.77 samples/sec   Loss 4.7504   LearningRate 0.0097   Epoch: 13   Global Step: 229830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:38,926-Speed 9220.51 samples/sec   Loss 4.6585   LearningRate 0.0097   Epoch: 13   Global Step: 229840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:39,991-Speed 9617.85 samples/sec   Loss 4.7650   LearningRate 0.0097   Epoch: 13   Global Step: 229850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:41,082-Speed 9398.96 samples/sec   Loss 4.6653   LearningRate 0.0097   Epoch: 13   Global Step: 229860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:42,160-Speed 9496.21 samples/sec   Loss 4.7702   LearningRate 0.0097   Epoch: 13   Global Step: 229870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:43,213-Speed 9737.51 samples/sec   Loss 4.7692   LearningRate 0.0097   Epoch: 13   Global Step: 229880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:44,246-Speed 9916.59 samples/sec   Loss 4.7146   LearningRate 0.0097   Epoch: 13   Global Step: 229890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:45,284-Speed 9871.93 samples/sec   Loss 4.7184   LearningRate 0.0097   Epoch: 13   Global Step: 229900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:46,366-Speed 9467.86 samples/sec   Loss 4.7491   LearningRate 0.0097   Epoch: 13   Global Step: 229910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:48:47,435-Speed 9581.48 samples/sec   Loss 4.7853   LearningRate 0.0097   Epoch: 13   Global Step: 229920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:48,463-Speed 9968.29 samples/sec   Loss 4.7684   LearningRate 0.0097   Epoch: 13   Global Step: 229930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:49,538-Speed 9532.96 samples/sec   Loss 4.7206   LearningRate 0.0097   Epoch: 13   Global Step: 229940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:50,611-Speed 9545.64 samples/sec   Loss 4.6864   LearningRate 0.0097   Epoch: 13   Global Step: 229950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:51,692-Speed 9480.01 samples/sec   Loss 4.7463   LearningRate 0.0097   Epoch: 13   Global Step: 229960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:52,791-Speed 9326.73 samples/sec   Loss 4.7105   LearningRate 0.0097   Epoch: 13   Global Step: 229970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:53,896-Speed 9264.77 samples/sec   Loss 4.7526   LearningRate 0.0097   Epoch: 13   Global Step: 229980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:55,025-Speed 9080.50 samples/sec   Loss 4.7512   LearningRate 0.0097   Epoch: 13   Global Step: 229990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:48:56,072-Speed 9778.86 samples/sec   Loss 4.7257   LearningRate 0.0097   Epoch: 13   Global Step: 230000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:49:17,929-[lfw][230000]XNorm: 7.864795
Training: 2022-04-11 20:49:17,930-[lfw][230000]Accuracy-Flip: 0.99617+-0.00236
Training: 2022-04-11 20:49:17,930-[lfw][230000]Accuracy-Highest: 0.99733
Training: 2022-04-11 20:49:43,158-[cfp_fp][230000]XNorm: 6.765681
Training: 2022-04-11 20:49:43,159-[cfp_fp][230000]Accuracy-Flip: 0.96914+-0.00767
Training: 2022-04-11 20:49:43,160-[cfp_fp][230000]Accuracy-Highest: 0.96914
Training: 2022-04-11 20:50:04,909-[agedb_30][230000]XNorm: 7.607323
Training: 2022-04-11 20:50:04,910-[agedb_30][230000]Accuracy-Flip: 0.96900+-0.00955
Training: 2022-04-11 20:50:04,910-[agedb_30][230000]Accuracy-Highest: 0.97033
Training: 2022-04-11 20:50:06,018-Speed 146.40 samples/sec   Loss 4.6567   LearningRate 0.0097   Epoch: 13   Global Step: 230010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:07,141-Speed 9124.00 samples/sec   Loss 4.7554   LearningRate 0.0097   Epoch: 13   Global Step: 230020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:08,246-Speed 9275.40 samples/sec   Loss 4.8190   LearningRate 0.0097   Epoch: 13   Global Step: 230030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:09,337-Speed 9393.31 samples/sec   Loss 4.6925   LearningRate 0.0097   Epoch: 13   Global Step: 230040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:10,410-Speed 9546.79 samples/sec   Loss 4.7112   LearningRate 0.0097   Epoch: 13   Global Step: 230050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:11,479-Speed 9584.60 samples/sec   Loss 4.8027   LearningRate 0.0097   Epoch: 13   Global Step: 230060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:12,585-Speed 9261.88 samples/sec   Loss 4.6949   LearningRate 0.0097   Epoch: 13   Global Step: 230070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:13,692-Speed 9255.29 samples/sec   Loss 4.6824   LearningRate 0.0097   Epoch: 13   Global Step: 230080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:14,756-Speed 9631.34 samples/sec   Loss 4.8113   LearningRate 0.0097   Epoch: 13   Global Step: 230090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:15,783-Speed 9980.32 samples/sec   Loss 4.6157   LearningRate 0.0097   Epoch: 13   Global Step: 230100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:16,851-Speed 9592.94 samples/sec   Loss 4.6817   LearningRate 0.0097   Epoch: 13   Global Step: 230110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:17,927-Speed 9516.86 samples/sec   Loss 4.7273   LearningRate 0.0097   Epoch: 13   Global Step: 230120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:18,974-Speed 9788.73 samples/sec   Loss 4.6843   LearningRate 0.0096   Epoch: 13   Global Step: 230130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:20,028-Speed 9720.21 samples/sec   Loss 4.8367   LearningRate 0.0096   Epoch: 13   Global Step: 230140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:21,080-Speed 9737.21 samples/sec   Loss 4.7656   LearningRate 0.0096   Epoch: 13   Global Step: 230150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:22,157-Speed 9518.79 samples/sec   Loss 4.7337   LearningRate 0.0096   Epoch: 13   Global Step: 230160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:23,276-Speed 9152.65 samples/sec   Loss 4.7326   LearningRate 0.0096   Epoch: 13   Global Step: 230170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:24,376-Speed 9322.88 samples/sec   Loss 4.8235   LearningRate 0.0096   Epoch: 13   Global Step: 230180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:25,470-Speed 9358.92 samples/sec   Loss 4.7614   LearningRate 0.0096   Epoch: 13   Global Step: 230190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:26,568-Speed 9336.66 samples/sec   Loss 4.7101   LearningRate 0.0096   Epoch: 13   Global Step: 230200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:27,664-Speed 9345.23 samples/sec   Loss 4.6346   LearningRate 0.0096   Epoch: 13   Global Step: 230210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:28,725-Speed 9659.38 samples/sec   Loss 4.7660   LearningRate 0.0096   Epoch: 13   Global Step: 230220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:29,764-Speed 9865.65 samples/sec   Loss 4.6806   LearningRate 0.0096   Epoch: 13   Global Step: 230230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:30,853-Speed 9409.39 samples/sec   Loss 4.8306   LearningRate 0.0096   Epoch: 13   Global Step: 230240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:31,921-Speed 9588.68 samples/sec   Loss 4.8035   LearningRate 0.0096   Epoch: 13   Global Step: 230250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:33,001-Speed 9486.86 samples/sec   Loss 4.8218   LearningRate 0.0096   Epoch: 13   Global Step: 230260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:34,110-Speed 9239.70 samples/sec   Loss 4.6840   LearningRate 0.0096   Epoch: 13   Global Step: 230270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:35,214-Speed 9277.55 samples/sec   Loss 4.8002   LearningRate 0.0096   Epoch: 13   Global Step: 230280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:36,321-Speed 9255.81 samples/sec   Loss 4.6970   LearningRate 0.0096   Epoch: 13   Global Step: 230290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:37,395-Speed 9547.87 samples/sec   Loss 4.7826   LearningRate 0.0096   Epoch: 13   Global Step: 230300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:38,451-Speed 9700.63 samples/sec   Loss 4.7190   LearningRate 0.0096   Epoch: 13   Global Step: 230310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:39,552-Speed 9304.12 samples/sec   Loss 4.6748   LearningRate 0.0096   Epoch: 13   Global Step: 230320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:50:40,642-Speed 9401.39 samples/sec   Loss 4.8019   LearningRate 0.0096   Epoch: 13   Global Step: 230330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:41,737-Speed 9363.74 samples/sec   Loss 4.7195   LearningRate 0.0096   Epoch: 13   Global Step: 230340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:42,806-Speed 9582.52 samples/sec   Loss 4.7731   LearningRate 0.0096   Epoch: 13   Global Step: 230350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:43,873-Speed 9601.44 samples/sec   Loss 4.7510   LearningRate 0.0096   Epoch: 13   Global Step: 230360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:44,946-Speed 9547.09 samples/sec   Loss 4.7554   LearningRate 0.0096   Epoch: 13   Global Step: 230370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:46,043-Speed 9348.84 samples/sec   Loss 4.7685   LearningRate 0.0096   Epoch: 13   Global Step: 230380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:47,132-Speed 9404.58 samples/sec   Loss 4.7198   LearningRate 0.0096   Epoch: 13   Global Step: 230390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:48,206-Speed 9538.84 samples/sec   Loss 4.6683   LearningRate 0.0096   Epoch: 13   Global Step: 230400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:49,364-Speed 8847.62 samples/sec   Loss 4.6873   LearningRate 0.0096   Epoch: 13   Global Step: 230410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:50,436-Speed 9556.48 samples/sec   Loss 4.6919   LearningRate 0.0096   Epoch: 13   Global Step: 230420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:51,516-Speed 9496.51 samples/sec   Loss 4.7268   LearningRate 0.0096   Epoch: 13   Global Step: 230430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:52,549-Speed 9920.45 samples/sec   Loss 4.7413   LearningRate 0.0096   Epoch: 13   Global Step: 230440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:53,645-Speed 9347.39 samples/sec   Loss 4.8011   LearningRate 0.0096   Epoch: 13   Global Step: 230450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:54,746-Speed 9300.37 samples/sec   Loss 4.8112   LearningRate 0.0096   Epoch: 13   Global Step: 230460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:55,776-Speed 9948.85 samples/sec   Loss 4.8025   LearningRate 0.0096   Epoch: 13   Global Step: 230470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:56,861-Speed 9446.44 samples/sec   Loss 4.7383   LearningRate 0.0096   Epoch: 13   Global Step: 230480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:57,971-Speed 9226.49 samples/sec   Loss 4.7095   LearningRate 0.0096   Epoch: 13   Global Step: 230490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:50:59,046-Speed 9533.80 samples/sec   Loss 4.6814   LearningRate 0.0096   Epoch: 13   Global Step: 230500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:00,092-Speed 9798.40 samples/sec   Loss 4.7870   LearningRate 0.0096   Epoch: 13   Global Step: 230510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:01,157-Speed 9618.27 samples/sec   Loss 4.8458   LearningRate 0.0096   Epoch: 13   Global Step: 230520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:02,232-Speed 9530.04 samples/sec   Loss 4.7438   LearningRate 0.0096   Epoch: 13   Global Step: 230530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:03,335-Speed 9286.33 samples/sec   Loss 4.6752   LearningRate 0.0096   Epoch: 13   Global Step: 230540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:04,468-Speed 9047.33 samples/sec   Loss 4.7001   LearningRate 0.0096   Epoch: 13   Global Step: 230550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:05,551-Speed 9458.69 samples/sec   Loss 4.6855   LearningRate 0.0096   Epoch: 13   Global Step: 230560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:06,677-Speed 9100.12 samples/sec   Loss 4.7807   LearningRate 0.0096   Epoch: 13   Global Step: 230570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:07,805-Speed 9082.53 samples/sec   Loss 4.7856   LearningRate 0.0096   Epoch: 13   Global Step: 230580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:08,932-Speed 9095.49 samples/sec   Loss 4.7882   LearningRate 0.0096   Epoch: 13   Global Step: 230590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:09,995-Speed 9644.46 samples/sec   Loss 4.7600   LearningRate 0.0096   Epoch: 13   Global Step: 230600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:11,052-Speed 9699.34 samples/sec   Loss 4.7763   LearningRate 0.0096   Epoch: 13   Global Step: 230610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:12,133-Speed 9472.31 samples/sec   Loss 4.6347   LearningRate 0.0096   Epoch: 13   Global Step: 230620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:13,211-Speed 9503.88 samples/sec   Loss 4.6714   LearningRate 0.0096   Epoch: 13   Global Step: 230630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:14,343-Speed 9051.59 samples/sec   Loss 4.7835   LearningRate 0.0096   Epoch: 13   Global Step: 230640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:15,428-Speed 9446.29 samples/sec   Loss 4.7182   LearningRate 0.0096   Epoch: 13   Global Step: 230650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:16,497-Speed 9583.66 samples/sec   Loss 4.7089   LearningRate 0.0095   Epoch: 13   Global Step: 230660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:17,615-Speed 9167.95 samples/sec   Loss 4.8119   LearningRate 0.0095   Epoch: 13   Global Step: 230670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:18,715-Speed 9307.06 samples/sec   Loss 4.7366   LearningRate 0.0095   Epoch: 13   Global Step: 230680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:19,816-Speed 9305.62 samples/sec   Loss 4.7875   LearningRate 0.0095   Epoch: 13   Global Step: 230690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:20,928-Speed 9215.52 samples/sec   Loss 4.7192   LearningRate 0.0095   Epoch: 13   Global Step: 230700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:22,002-Speed 9546.78 samples/sec   Loss 4.7540   LearningRate 0.0095   Epoch: 13   Global Step: 230710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:23,097-Speed 9358.70 samples/sec   Loss 4.8298   LearningRate 0.0095   Epoch: 13   Global Step: 230720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:24,193-Speed 9354.35 samples/sec   Loss 4.7747   LearningRate 0.0095   Epoch: 13   Global Step: 230730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:25,274-Speed 9479.36 samples/sec   Loss 4.7401   LearningRate 0.0095   Epoch: 13   Global Step: 230740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:26,303-Speed 9954.30 samples/sec   Loss 4.7763   LearningRate 0.0095   Epoch: 13   Global Step: 230750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:27,383-Speed 9488.31 samples/sec   Loss 4.6754   LearningRate 0.0095   Epoch: 13   Global Step: 230760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:28,483-Speed 9313.22 samples/sec   Loss 4.8248   LearningRate 0.0095   Epoch: 13   Global Step: 230770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:29,588-Speed 9273.84 samples/sec   Loss 4.8142   LearningRate 0.0095   Epoch: 13   Global Step: 230780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:30,697-Speed 9240.20 samples/sec   Loss 4.8517   LearningRate 0.0095   Epoch: 13   Global Step: 230790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:31,791-Speed 9359.32 samples/sec   Loss 4.7660   LearningRate 0.0095   Epoch: 13   Global Step: 230800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:32,861-Speed 9581.20 samples/sec   Loss 4.8021   LearningRate 0.0095   Epoch: 13   Global Step: 230810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:33,914-Speed 9729.05 samples/sec   Loss 4.7484   LearningRate 0.0095   Epoch: 13   Global Step: 230820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:35,036-Speed 9127.38 samples/sec   Loss 4.7953   LearningRate 0.0095   Epoch: 13   Global Step: 230830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:36,140-Speed 9282.78 samples/sec   Loss 4.7389   LearningRate 0.0095   Epoch: 13   Global Step: 230840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:37,237-Speed 9339.28 samples/sec   Loss 4.7148   LearningRate 0.0095   Epoch: 13   Global Step: 230850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:38,405-Speed 8875.77 samples/sec   Loss 4.7514   LearningRate 0.0095   Epoch: 13   Global Step: 230860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:39,495-Speed 9395.69 samples/sec   Loss 4.7910   LearningRate 0.0095   Epoch: 13   Global Step: 230870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:40,589-Speed 9369.36 samples/sec   Loss 4.8077   LearningRate 0.0095   Epoch: 13   Global Step: 230880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:41,713-Speed 9119.47 samples/sec   Loss 4.7335   LearningRate 0.0095   Epoch: 13   Global Step: 230890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:42,801-Speed 9417.83 samples/sec   Loss 4.7212   LearningRate 0.0095   Epoch: 13   Global Step: 230900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:43,852-Speed 9752.55 samples/sec   Loss 4.8247   LearningRate 0.0095   Epoch: 13   Global Step: 230910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:44,987-Speed 9023.54 samples/sec   Loss 4.7223   LearningRate 0.0095   Epoch: 13   Global Step: 230920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:46,057-Speed 9579.24 samples/sec   Loss 4.7966   LearningRate 0.0095   Epoch: 13   Global Step: 230930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:47,165-Speed 9241.63 samples/sec   Loss 4.8558   LearningRate 0.0095   Epoch: 13   Global Step: 230940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:48,254-Speed 9414.83 samples/sec   Loss 4.7117   LearningRate 0.0095   Epoch: 13   Global Step: 230950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:49,335-Speed 9476.86 samples/sec   Loss 4.8310   LearningRate 0.0095   Epoch: 13   Global Step: 230960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:50,399-Speed 9627.14 samples/sec   Loss 4.7264   LearningRate 0.0095   Epoch: 13   Global Step: 230970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:51,456-Speed 9690.99 samples/sec   Loss 4.6901   LearningRate 0.0095   Epoch: 13   Global Step: 230980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:52,550-Speed 9371.77 samples/sec   Loss 4.7438   LearningRate 0.0095   Epoch: 13   Global Step: 230990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:53,649-Speed 9322.32 samples/sec   Loss 4.7711   LearningRate 0.0095   Epoch: 13   Global Step: 231000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:54,735-Speed 9430.11 samples/sec   Loss 4.7643   LearningRate 0.0095   Epoch: 13   Global Step: 231010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:51:55,820-Speed 9441.96 samples/sec   Loss 4.8102   LearningRate 0.0095   Epoch: 13   Global Step: 231020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:51:56,909-Speed 9413.47 samples/sec   Loss 4.7160   LearningRate 0.0095   Epoch: 13   Global Step: 231030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:51:57,975-Speed 9611.52 samples/sec   Loss 4.7338   LearningRate 0.0095   Epoch: 13   Global Step: 231040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:51:59,093-Speed 9169.21 samples/sec   Loss 4.6466   LearningRate 0.0095   Epoch: 13   Global Step: 231050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:00,193-Speed 9318.07 samples/sec   Loss 4.7721   LearningRate 0.0095   Epoch: 13   Global Step: 231060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:01,256-Speed 9632.76 samples/sec   Loss 4.7447   LearningRate 0.0095   Epoch: 13   Global Step: 231070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:02,334-Speed 9504.31 samples/sec   Loss 4.6996   LearningRate 0.0095   Epoch: 13   Global Step: 231080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:03,430-Speed 9353.79 samples/sec   Loss 4.7345   LearningRate 0.0095   Epoch: 13   Global Step: 231090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:04,561-Speed 9052.32 samples/sec   Loss 4.6927   LearningRate 0.0095   Epoch: 13   Global Step: 231100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:05,611-Speed 9765.64 samples/sec   Loss 4.8582   LearningRate 0.0095   Epoch: 13   Global Step: 231110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:06,688-Speed 9510.85 samples/sec   Loss 4.8696   LearningRate 0.0095   Epoch: 13   Global Step: 231120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:07,743-Speed 9711.44 samples/sec   Loss 4.6904   LearningRate 0.0095   Epoch: 13   Global Step: 231130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:08,800-Speed 9696.75 samples/sec   Loss 4.6480   LearningRate 0.0095   Epoch: 13   Global Step: 231140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:09,853-Speed 9727.27 samples/sec   Loss 4.8141   LearningRate 0.0095   Epoch: 13   Global Step: 231150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:10,932-Speed 9492.18 samples/sec   Loss 4.6530   LearningRate 0.0095   Epoch: 13   Global Step: 231160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:12,039-Speed 9261.12 samples/sec   Loss 4.7721   LearningRate 0.0095   Epoch: 13   Global Step: 231170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:13,134-Speed 9354.59 samples/sec   Loss 4.7022   LearningRate 0.0095   Epoch: 13   Global Step: 231180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:14,230-Speed 9347.35 samples/sec   Loss 4.7583   LearningRate 0.0095   Epoch: 13   Global Step: 231190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:15,327-Speed 9337.01 samples/sec   Loss 4.7112   LearningRate 0.0095   Epoch: 13   Global Step: 231200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:16,448-Speed 9142.63 samples/sec   Loss 4.7159   LearningRate 0.0094   Epoch: 13   Global Step: 231210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:17,553-Speed 9272.08 samples/sec   Loss 4.6033   LearningRate 0.0094   Epoch: 13   Global Step: 231220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:18,625-Speed 9556.32 samples/sec   Loss 4.7151   LearningRate 0.0094   Epoch: 13   Global Step: 231230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:52:19,718-Speed 9381.47 samples/sec   Loss 4.7327   LearningRate 0.0094   Epoch: 13   Global Step: 231240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:20,773-Speed 9704.27 samples/sec   Loss 4.7293   LearningRate 0.0094   Epoch: 13   Global Step: 231250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:21,855-Speed 9476.28 samples/sec   Loss 4.7596   LearningRate 0.0094   Epoch: 13   Global Step: 231260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:22,924-Speed 9589.31 samples/sec   Loss 4.7369   LearningRate 0.0094   Epoch: 13   Global Step: 231270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:24,013-Speed 9406.25 samples/sec   Loss 4.8462   LearningRate 0.0094   Epoch: 13   Global Step: 231280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:25,116-Speed 9284.94 samples/sec   Loss 4.7619   LearningRate 0.0094   Epoch: 13   Global Step: 231290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:26,208-Speed 9379.22 samples/sec   Loss 4.8537   LearningRate 0.0094   Epoch: 13   Global Step: 231300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:27,319-Speed 9227.99 samples/sec   Loss 4.6926   LearningRate 0.0094   Epoch: 13   Global Step: 231310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:28,411-Speed 9398.62 samples/sec   Loss 4.6486   LearningRate 0.0094   Epoch: 13   Global Step: 231320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:29,508-Speed 9335.86 samples/sec   Loss 4.7066   LearningRate 0.0094   Epoch: 13   Global Step: 231330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:30,593-Speed 9445.11 samples/sec   Loss 4.6720   LearningRate 0.0094   Epoch: 13   Global Step: 231340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:31,722-Speed 9078.37 samples/sec   Loss 4.6779   LearningRate 0.0094   Epoch: 13   Global Step: 231350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:32,810-Speed 9410.89 samples/sec   Loss 4.6992   LearningRate 0.0094   Epoch: 13   Global Step: 231360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:33,890-Speed 9490.64 samples/sec   Loss 4.7142   LearningRate 0.0094   Epoch: 13   Global Step: 231370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:34,962-Speed 9559.08 samples/sec   Loss 4.7472   LearningRate 0.0094   Epoch: 13   Global Step: 231380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:36,056-Speed 9362.10 samples/sec   Loss 4.7674   LearningRate 0.0094   Epoch: 13   Global Step: 231390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:37,153-Speed 9339.13 samples/sec   Loss 4.6558   LearningRate 0.0094   Epoch: 13   Global Step: 231400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:52:38,265-Speed 9211.37 samples/sec   Loss 4.7505   LearningRate 0.0094   Epoch: 13   Global Step: 231410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:39,386-Speed 9143.49 samples/sec   Loss 4.7778   LearningRate 0.0094   Epoch: 13   Global Step: 231420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:40,520-Speed 9039.65 samples/sec   Loss 4.6706   LearningRate 0.0094   Epoch: 13   Global Step: 231430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:41,618-Speed 9335.10 samples/sec   Loss 4.6793   LearningRate 0.0094   Epoch: 13   Global Step: 231440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:42,683-Speed 9618.37 samples/sec   Loss 4.6645   LearningRate 0.0094   Epoch: 13   Global Step: 231450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:43,777-Speed 9363.18 samples/sec   Loss 4.8098   LearningRate 0.0094   Epoch: 13   Global Step: 231460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:44,826-Speed 9768.30 samples/sec   Loss 4.7614   LearningRate 0.0094   Epoch: 13   Global Step: 231470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:45,925-Speed 9326.80 samples/sec   Loss 4.6377   LearningRate 0.0094   Epoch: 13   Global Step: 231480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:46,999-Speed 9540.20 samples/sec   Loss 4.7752   LearningRate 0.0094   Epoch: 13   Global Step: 231490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:48,082-Speed 9460.36 samples/sec   Loss 4.6934   LearningRate 0.0094   Epoch: 13   Global Step: 231500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:49,161-Speed 9488.53 samples/sec   Loss 4.7000   LearningRate 0.0094   Epoch: 13   Global Step: 231510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:52:50,209-Speed 9779.51 samples/sec   Loss 4.7780   LearningRate 0.0094   Epoch: 13   Global Step: 231520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:51,289-Speed 9490.95 samples/sec   Loss 4.7764   LearningRate 0.0094   Epoch: 13   Global Step: 231530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:52,399-Speed 9226.86 samples/sec   Loss 4.7432   LearningRate 0.0094   Epoch: 13   Global Step: 231540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:53,518-Speed 9157.90 samples/sec   Loss 4.6744   LearningRate 0.0094   Epoch: 13   Global Step: 231550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:54,554-Speed 9886.82 samples/sec   Loss 4.7010   LearningRate 0.0094   Epoch: 13   Global Step: 231560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:55,686-Speed 9054.91 samples/sec   Loss 4.7312   LearningRate 0.0094   Epoch: 13   Global Step: 231570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:56,746-Speed 9661.19 samples/sec   Loss 4.6631   LearningRate 0.0094   Epoch: 13   Global Step: 231580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:57,858-Speed 9217.83 samples/sec   Loss 4.7200   LearningRate 0.0094   Epoch: 13   Global Step: 231590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:52:58,963-Speed 9275.13 samples/sec   Loss 4.6471   LearningRate 0.0094   Epoch: 13   Global Step: 231600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:00,021-Speed 9685.25 samples/sec   Loss 4.8828   LearningRate 0.0094   Epoch: 13   Global Step: 231610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:01,116-Speed 9357.34 samples/sec   Loss 4.6721   LearningRate 0.0094   Epoch: 13   Global Step: 231620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:02,201-Speed 9443.61 samples/sec   Loss 4.7151   LearningRate 0.0094   Epoch: 13   Global Step: 231630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:03,277-Speed 9526.59 samples/sec   Loss 4.7090   LearningRate 0.0094   Epoch: 13   Global Step: 231640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:04,338-Speed 9652.58 samples/sec   Loss 4.8411   LearningRate 0.0094   Epoch: 13   Global Step: 231650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:05,427-Speed 9407.42 samples/sec   Loss 4.7008   LearningRate 0.0094   Epoch: 13   Global Step: 231660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:06,500-Speed 9550.53 samples/sec   Loss 4.6984   LearningRate 0.0094   Epoch: 13   Global Step: 231670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:07,591-Speed 9393.20 samples/sec   Loss 4.7032   LearningRate 0.0094   Epoch: 13   Global Step: 231680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:08,718-Speed 9086.05 samples/sec   Loss 4.8549   LearningRate 0.0094   Epoch: 13   Global Step: 231690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:09,791-Speed 9548.89 samples/sec   Loss 4.7308   LearningRate 0.0094   Epoch: 13   Global Step: 231700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:10,875-Speed 9458.97 samples/sec   Loss 4.7025   LearningRate 0.0094   Epoch: 13   Global Step: 231710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:11,951-Speed 9526.90 samples/sec   Loss 4.7092   LearningRate 0.0094   Epoch: 13   Global Step: 231720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:13,045-Speed 9365.80 samples/sec   Loss 4.8040   LearningRate 0.0094   Epoch: 13   Global Step: 231730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:14,149-Speed 9279.63 samples/sec   Loss 4.7005   LearningRate 0.0094   Epoch: 13   Global Step: 231740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:15,227-Speed 9500.73 samples/sec   Loss 4.7466   LearningRate 0.0093   Epoch: 13   Global Step: 231750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:16,321-Speed 9375.23 samples/sec   Loss 4.7609   LearningRate 0.0093   Epoch: 13   Global Step: 231760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:17,439-Speed 9167.43 samples/sec   Loss 4.7633   LearningRate 0.0093   Epoch: 13   Global Step: 231770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:18,518-Speed 9494.75 samples/sec   Loss 4.7431   LearningRate 0.0093   Epoch: 13   Global Step: 231780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:19,636-Speed 9163.26 samples/sec   Loss 4.7564   LearningRate 0.0093   Epoch: 13   Global Step: 231790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:20,755-Speed 9152.20 samples/sec   Loss 4.7728   LearningRate 0.0093   Epoch: 13   Global Step: 231800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:21,873-Speed 9174.70 samples/sec   Loss 4.7425   LearningRate 0.0093   Epoch: 13   Global Step: 231810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:22,936-Speed 9637.90 samples/sec   Loss 4.7733   LearningRate 0.0093   Epoch: 13   Global Step: 231820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:24,024-Speed 9412.85 samples/sec   Loss 4.7592   LearningRate 0.0093   Epoch: 13   Global Step: 231830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:25,145-Speed 9141.71 samples/sec   Loss 4.7372   LearningRate 0.0093   Epoch: 13   Global Step: 231840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:26,248-Speed 9289.79 samples/sec   Loss 4.7023   LearningRate 0.0093   Epoch: 13   Global Step: 231850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:27,343-Speed 9356.79 samples/sec   Loss 4.7290   LearningRate 0.0093   Epoch: 13   Global Step: 231860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:28,439-Speed 9352.23 samples/sec   Loss 4.6751   LearningRate 0.0093   Epoch: 13   Global Step: 231870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:29,557-Speed 9164.62 samples/sec   Loss 4.8847   LearningRate 0.0093   Epoch: 13   Global Step: 231880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:30,642-Speed 9438.61 samples/sec   Loss 4.7933   LearningRate 0.0093   Epoch: 13   Global Step: 231890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:31,748-Speed 9264.59 samples/sec   Loss 4.6827   LearningRate 0.0093   Epoch: 13   Global Step: 231900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:32,824-Speed 9525.30 samples/sec   Loss 4.7739   LearningRate 0.0093   Epoch: 13   Global Step: 231910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:33,934-Speed 9228.87 samples/sec   Loss 4.7038   LearningRate 0.0093   Epoch: 13   Global Step: 231920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:34,999-Speed 9624.82 samples/sec   Loss 4.7413   LearningRate 0.0093   Epoch: 13   Global Step: 231930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:36,069-Speed 9573.85 samples/sec   Loss 4.6884   LearningRate 0.0093   Epoch: 13   Global Step: 231940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:37,136-Speed 9597.16 samples/sec   Loss 4.7464   LearningRate 0.0093   Epoch: 13   Global Step: 231950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:38,213-Speed 9516.56 samples/sec   Loss 4.7304   LearningRate 0.0093   Epoch: 13   Global Step: 231960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:39,280-Speed 9601.28 samples/sec   Loss 4.6514   LearningRate 0.0093   Epoch: 13   Global Step: 231970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:53:40,344-Speed 9628.38 samples/sec   Loss 4.8185   LearningRate 0.0093   Epoch: 13   Global Step: 231980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:41,445-Speed 9315.59 samples/sec   Loss 4.7340   LearningRate 0.0093   Epoch: 13   Global Step: 231990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:53:42,507-Speed 9645.29 samples/sec   Loss 4.7864   LearningRate 0.0093   Epoch: 13   Global Step: 232000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:54:04,422-[lfw][232000]XNorm: 7.786779
Training: 2022-04-11 20:54:04,423-[lfw][232000]Accuracy-Flip: 0.99617+-0.00248
Training: 2022-04-11 20:54:04,423-[lfw][232000]Accuracy-Highest: 0.99733
Training: 2022-04-11 20:54:29,695-[cfp_fp][232000]XNorm: 6.727420
Training: 2022-04-11 20:54:29,696-[cfp_fp][232000]Accuracy-Flip: 0.96700+-0.00920
Training: 2022-04-11 20:54:29,696-[cfp_fp][232000]Accuracy-Highest: 0.96914
Training: 2022-04-11 20:54:51,491-[agedb_30][232000]XNorm: 7.541200
Training: 2022-04-11 20:54:51,491-[agedb_30][232000]Accuracy-Flip: 0.97250+-0.00817
Training: 2022-04-11 20:54:51,492-[agedb_30][232000]Accuracy-Highest: 0.97250
Training: 2022-04-11 20:54:52,613-Speed 146.07 samples/sec   Loss 4.7096   LearningRate 0.0093   Epoch: 13   Global Step: 232010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:54:53,679-Speed 9611.58 samples/sec   Loss 4.7657   LearningRate 0.0093   Epoch: 13   Global Step: 232020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:54:54,752-Speed 9551.96 samples/sec   Loss 4.8248   LearningRate 0.0093   Epoch: 13   Global Step: 232030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:54:55,830-Speed 9508.58 samples/sec   Loss 4.8786   LearningRate 0.0093   Epoch: 13   Global Step: 232040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:54:56,900-Speed 9571.66 samples/sec   Loss 4.7440   LearningRate 0.0093   Epoch: 13   Global Step: 232050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:54:57,985-Speed 9448.82 samples/sec   Loss 4.7345   LearningRate 0.0093   Epoch: 13   Global Step: 232060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:54:59,064-Speed 9499.38 samples/sec   Loss 4.6987   LearningRate 0.0093   Epoch: 13   Global Step: 232070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:00,171-Speed 9251.45 samples/sec   Loss 4.6545   LearningRate 0.0093   Epoch: 13   Global Step: 232080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:01,267-Speed 9347.22 samples/sec   Loss 4.7711   LearningRate 0.0093   Epoch: 13   Global Step: 232090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:02,352-Speed 9446.49 samples/sec   Loss 4.6853   LearningRate 0.0093   Epoch: 13   Global Step: 232100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:03,431-Speed 9493.52 samples/sec   Loss 4.7579   LearningRate 0.0093   Epoch: 13   Global Step: 232110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:04,503-Speed 9559.97 samples/sec   Loss 4.7666   LearningRate 0.0093   Epoch: 13   Global Step: 232120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:05,596-Speed 9375.20 samples/sec   Loss 4.8746   LearningRate 0.0093   Epoch: 13   Global Step: 232130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:06,680-Speed 9446.72 samples/sec   Loss 4.7505   LearningRate 0.0093   Epoch: 13   Global Step: 232140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:07,782-Speed 9298.66 samples/sec   Loss 4.7111   LearningRate 0.0093   Epoch: 13   Global Step: 232150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:08,872-Speed 9406.12 samples/sec   Loss 4.6942   LearningRate 0.0093   Epoch: 13   Global Step: 232160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:09,937-Speed 9620.54 samples/sec   Loss 4.7784   LearningRate 0.0093   Epoch: 13   Global Step: 232170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:11,004-Speed 9598.09 samples/sec   Loss 4.7640   LearningRate 0.0093   Epoch: 13   Global Step: 232180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:12,081-Speed 9516.25 samples/sec   Loss 4.8174   LearningRate 0.0093   Epoch: 13   Global Step: 232190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:13,162-Speed 9479.11 samples/sec   Loss 4.6905   LearningRate 0.0093   Epoch: 13   Global Step: 232200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:14,195-Speed 9920.02 samples/sec   Loss 4.6584   LearningRate 0.0093   Epoch: 13   Global Step: 232210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:15,240-Speed 9803.97 samples/sec   Loss 4.7374   LearningRate 0.0093   Epoch: 13   Global Step: 232220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:16,340-Speed 9314.01 samples/sec   Loss 4.7630   LearningRate 0.0093   Epoch: 13   Global Step: 232230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:17,422-Speed 9469.95 samples/sec   Loss 4.7036   LearningRate 0.0093   Epoch: 13   Global Step: 232240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:18,542-Speed 9153.33 samples/sec   Loss 4.6760   LearningRate 0.0093   Epoch: 13   Global Step: 232250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:19,646-Speed 9279.07 samples/sec   Loss 4.7296   LearningRate 0.0093   Epoch: 13   Global Step: 232260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:20,717-Speed 9563.67 samples/sec   Loss 4.7030   LearningRate 0.0093   Epoch: 13   Global Step: 232270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:21,847-Speed 9069.81 samples/sec   Loss 4.6899   LearningRate 0.0093   Epoch: 13   Global Step: 232280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:22,922-Speed 9534.52 samples/sec   Loss 4.7495   LearningRate 0.0093   Epoch: 13   Global Step: 232290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:24,004-Speed 9469.65 samples/sec   Loss 4.8230   LearningRate 0.0092   Epoch: 13   Global Step: 232300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:25,086-Speed 9464.18 samples/sec   Loss 4.7918   LearningRate 0.0092   Epoch: 13   Global Step: 232310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:26,188-Speed 9295.82 samples/sec   Loss 4.7133   LearningRate 0.0092   Epoch: 13   Global Step: 232320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:27,295-Speed 9256.85 samples/sec   Loss 4.6930   LearningRate 0.0092   Epoch: 13   Global Step: 232330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:28,407-Speed 9220.81 samples/sec   Loss 4.7129   LearningRate 0.0092   Epoch: 13   Global Step: 232340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:29,530-Speed 9119.05 samples/sec   Loss 4.7600   LearningRate 0.0092   Epoch: 13   Global Step: 232350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:30,638-Speed 9251.52 samples/sec   Loss 4.7846   LearningRate 0.0092   Epoch: 13   Global Step: 232360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:31,729-Speed 9394.02 samples/sec   Loss 4.6291   LearningRate 0.0092   Epoch: 13   Global Step: 232370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:32,806-Speed 9509.97 samples/sec   Loss 4.7582   LearningRate 0.0092   Epoch: 13   Global Step: 232380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:33,865-Speed 9673.77 samples/sec   Loss 4.8012   LearningRate 0.0092   Epoch: 13   Global Step: 232390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:34,945-Speed 9485.48 samples/sec   Loss 4.7924   LearningRate 0.0092   Epoch: 13   Global Step: 232400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:36,078-Speed 9040.96 samples/sec   Loss 4.8435   LearningRate 0.0092   Epoch: 13   Global Step: 232410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:37,143-Speed 9625.14 samples/sec   Loss 4.9045   LearningRate 0.0092   Epoch: 13   Global Step: 232420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:38,222-Speed 9496.81 samples/sec   Loss 4.7833   LearningRate 0.0092   Epoch: 13   Global Step: 232430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:39,344-Speed 9136.35 samples/sec   Loss 4.7562   LearningRate 0.0092   Epoch: 13   Global Step: 232440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:40,448-Speed 9275.52 samples/sec   Loss 4.7188   LearningRate 0.0092   Epoch: 13   Global Step: 232450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:41,523-Speed 9534.12 samples/sec   Loss 4.7311   LearningRate 0.0092   Epoch: 13   Global Step: 232460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:42,652-Speed 9077.41 samples/sec   Loss 4.7738   LearningRate 0.0092   Epoch: 13   Global Step: 232470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:43,761-Speed 9240.07 samples/sec   Loss 4.8157   LearningRate 0.0092   Epoch: 13   Global Step: 232480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:44,828-Speed 9600.99 samples/sec   Loss 4.7286   LearningRate 0.0092   Epoch: 13   Global Step: 232490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:45,904-Speed 9528.04 samples/sec   Loss 4.7653   LearningRate 0.0092   Epoch: 13   Global Step: 232500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:46,988-Speed 9446.97 samples/sec   Loss 4.6753   LearningRate 0.0092   Epoch: 13   Global Step: 232510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:48,085-Speed 9343.17 samples/sec   Loss 4.6463   LearningRate 0.0092   Epoch: 13   Global Step: 232520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:49,184-Speed 9322.80 samples/sec   Loss 4.7237   LearningRate 0.0092   Epoch: 13   Global Step: 232530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:55:50,269-Speed 9442.46 samples/sec   Loss 4.7411   LearningRate 0.0092   Epoch: 13   Global Step: 232540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:51,366-Speed 9335.26 samples/sec   Loss 4.7894   LearningRate 0.0092   Epoch: 13   Global Step: 232550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:52,541-Speed 8718.40 samples/sec   Loss 4.7307   LearningRate 0.0092   Epoch: 13   Global Step: 232560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:53,604-Speed 9639.23 samples/sec   Loss 4.6559   LearningRate 0.0092   Epoch: 13   Global Step: 232570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:54,689-Speed 9445.16 samples/sec   Loss 4.7227   LearningRate 0.0092   Epoch: 13   Global Step: 232580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:55,792-Speed 9289.41 samples/sec   Loss 4.7193   LearningRate 0.0092   Epoch: 13   Global Step: 232590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:56,856-Speed 9623.45 samples/sec   Loss 4.7857   LearningRate 0.0092   Epoch: 13   Global Step: 232600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:57,974-Speed 9173.23 samples/sec   Loss 4.6847   LearningRate 0.0092   Epoch: 13   Global Step: 232610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:55:59,052-Speed 9500.86 samples/sec   Loss 4.7249   LearningRate 0.0092   Epoch: 13   Global Step: 232620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:00,146-Speed 9365.51 samples/sec   Loss 4.7355   LearningRate 0.0092   Epoch: 13   Global Step: 232630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:01,203-Speed 9702.83 samples/sec   Loss 4.7671   LearningRate 0.0092   Epoch: 13   Global Step: 232640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:02,304-Speed 9306.19 samples/sec   Loss 4.7495   LearningRate 0.0092   Epoch: 13   Global Step: 232650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:03,387-Speed 9460.22 samples/sec   Loss 4.7789   LearningRate 0.0092   Epoch: 13   Global Step: 232660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:04,470-Speed 9455.71 samples/sec   Loss 4.6984   LearningRate 0.0092   Epoch: 13   Global Step: 232670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:05,531-Speed 9662.61 samples/sec   Loss 4.7318   LearningRate 0.0092   Epoch: 13   Global Step: 232680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:06,620-Speed 9406.94 samples/sec   Loss 4.7344   LearningRate 0.0092   Epoch: 13   Global Step: 232690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:07,658-Speed 9866.15 samples/sec   Loss 4.7710   LearningRate 0.0092   Epoch: 13   Global Step: 232700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:08,769-Speed 9227.36 samples/sec   Loss 4.7415   LearningRate 0.0092   Epoch: 13   Global Step: 232710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:09,891-Speed 9133.25 samples/sec   Loss 4.6977   LearningRate 0.0092   Epoch: 13   Global Step: 232720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:10,986-Speed 9351.89 samples/sec   Loss 4.7819   LearningRate 0.0092   Epoch: 13   Global Step: 232730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:12,055-Speed 9585.41 samples/sec   Loss 4.7279   LearningRate 0.0092   Epoch: 13   Global Step: 232740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:13,139-Speed 9454.12 samples/sec   Loss 4.8167   LearningRate 0.0092   Epoch: 13   Global Step: 232750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:14,233-Speed 9363.33 samples/sec   Loss 4.7240   LearningRate 0.0092   Epoch: 13   Global Step: 232760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:15,335-Speed 9303.81 samples/sec   Loss 4.6838   LearningRate 0.0092   Epoch: 13   Global Step: 232770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:16,438-Speed 9284.94 samples/sec   Loss 4.8024   LearningRate 0.0092   Epoch: 13   Global Step: 232780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:17,536-Speed 9331.07 samples/sec   Loss 4.7591   LearningRate 0.0092   Epoch: 13   Global Step: 232790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:18,628-Speed 9385.22 samples/sec   Loss 4.7486   LearningRate 0.0092   Epoch: 13   Global Step: 232800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:19,711-Speed 9461.48 samples/sec   Loss 4.7584   LearningRate 0.0092   Epoch: 13   Global Step: 232810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:20,808-Speed 9342.67 samples/sec   Loss 4.7365   LearningRate 0.0092   Epoch: 13   Global Step: 232820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:21,908-Speed 9310.47 samples/sec   Loss 4.6714   LearningRate 0.0092   Epoch: 13   Global Step: 232830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:23,024-Speed 9185.37 samples/sec   Loss 4.7706   LearningRate 0.0092   Epoch: 13   Global Step: 232840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:24,135-Speed 9216.90 samples/sec   Loss 4.8235   LearningRate 0.0091   Epoch: 13   Global Step: 232850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:25,233-Speed 9337.79 samples/sec   Loss 4.7272   LearningRate 0.0091   Epoch: 13   Global Step: 232860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:26,315-Speed 9464.86 samples/sec   Loss 4.7024   LearningRate 0.0091   Epoch: 13   Global Step: 232870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:27,447-Speed 9053.53 samples/sec   Loss 4.7328   LearningRate 0.0091   Epoch: 13   Global Step: 232880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:28,531-Speed 9461.62 samples/sec   Loss 4.7185   LearningRate 0.0091   Epoch: 13   Global Step: 232890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:29,596-Speed 9612.33 samples/sec   Loss 4.7645   LearningRate 0.0091   Epoch: 13   Global Step: 232900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:30,699-Speed 9296.20 samples/sec   Loss 4.7850   LearningRate 0.0091   Epoch: 13   Global Step: 232910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:31,816-Speed 9171.03 samples/sec   Loss 4.7438   LearningRate 0.0091   Epoch: 13   Global Step: 232920   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 20:56:32,886-Speed 9574.48 samples/sec   Loss 4.7132   LearningRate 0.0091   Epoch: 13   Global Step: 232930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:34,001-Speed 9189.30 samples/sec   Loss 4.7061   LearningRate 0.0091   Epoch: 13   Global Step: 232940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:35,085-Speed 9454.41 samples/sec   Loss 4.6173   LearningRate 0.0091   Epoch: 13   Global Step: 232950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:36,178-Speed 9372.22 samples/sec   Loss 4.7174   LearningRate 0.0091   Epoch: 13   Global Step: 232960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:37,258-Speed 9485.62 samples/sec   Loss 4.7322   LearningRate 0.0091   Epoch: 13   Global Step: 232970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:38,309-Speed 9757.51 samples/sec   Loss 4.6866   LearningRate 0.0091   Epoch: 13   Global Step: 232980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:39,352-Speed 9818.85 samples/sec   Loss 4.7516   LearningRate 0.0091   Epoch: 13   Global Step: 232990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:40,428-Speed 9523.95 samples/sec   Loss 4.7059   LearningRate 0.0091   Epoch: 13   Global Step: 233000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:41,560-Speed 9052.10 samples/sec   Loss 4.6988   LearningRate 0.0091   Epoch: 13   Global Step: 233010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:42,669-Speed 9240.62 samples/sec   Loss 4.6854   LearningRate 0.0091   Epoch: 13   Global Step: 233020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:43,768-Speed 9335.31 samples/sec   Loss 4.8318   LearningRate 0.0091   Epoch: 13   Global Step: 233030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:44,854-Speed 9431.95 samples/sec   Loss 4.6811   LearningRate 0.0091   Epoch: 13   Global Step: 233040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:45,949-Speed 9359.14 samples/sec   Loss 4.7758   LearningRate 0.0091   Epoch: 13   Global Step: 233050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:47,021-Speed 9555.18 samples/sec   Loss 4.7892   LearningRate 0.0091   Epoch: 13   Global Step: 233060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:56:48,087-Speed 9608.54 samples/sec   Loss 4.7473   LearningRate 0.0091   Epoch: 13   Global Step: 233070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:49,131-Speed 9815.22 samples/sec   Loss 4.7443   LearningRate 0.0091   Epoch: 13   Global Step: 233080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:50,215-Speed 9458.16 samples/sec   Loss 4.6863   LearningRate 0.0091   Epoch: 13   Global Step: 233090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:51,284-Speed 9584.89 samples/sec   Loss 4.8152   LearningRate 0.0091   Epoch: 13   Global Step: 233100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:52,343-Speed 9675.20 samples/sec   Loss 4.7753   LearningRate 0.0091   Epoch: 13   Global Step: 233110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:53,472-Speed 9069.13 samples/sec   Loss 4.7445   LearningRate 0.0091   Epoch: 13   Global Step: 233120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:54,575-Speed 9290.06 samples/sec   Loss 4.7563   LearningRate 0.0091   Epoch: 13   Global Step: 233130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:55,717-Speed 8970.62 samples/sec   Loss 4.7663   LearningRate 0.0091   Epoch: 13   Global Step: 233140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:56,772-Speed 9713.05 samples/sec   Loss 4.6792   LearningRate 0.0091   Epoch: 13   Global Step: 233150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:57,820-Speed 9782.60 samples/sec   Loss 4.7082   LearningRate 0.0091   Epoch: 13   Global Step: 233160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:56:58,887-Speed 9595.69 samples/sec   Loss 4.7867   LearningRate 0.0091   Epoch: 13   Global Step: 233170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:00,002-Speed 9196.69 samples/sec   Loss 4.7396   LearningRate 0.0091   Epoch: 13   Global Step: 233180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:01,089-Speed 9425.00 samples/sec   Loss 4.6046   LearningRate 0.0091   Epoch: 13   Global Step: 233190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:02,211-Speed 9124.51 samples/sec   Loss 4.7201   LearningRate 0.0091   Epoch: 13   Global Step: 233200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:03,335-Speed 9124.60 samples/sec   Loss 4.7911   LearningRate 0.0091   Epoch: 13   Global Step: 233210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:04,430-Speed 9351.24 samples/sec   Loss 4.7579   LearningRate 0.0091   Epoch: 13   Global Step: 233220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:05,520-Speed 9403.31 samples/sec   Loss 4.6937   LearningRate 0.0091   Epoch: 13   Global Step: 233230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:06,594-Speed 9542.16 samples/sec   Loss 4.8045   LearningRate 0.0091   Epoch: 13   Global Step: 233240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:07,678-Speed 9445.01 samples/sec   Loss 4.6646   LearningRate 0.0091   Epoch: 13   Global Step: 233250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:08,748-Speed 9577.94 samples/sec   Loss 4.6975   LearningRate 0.0091   Epoch: 13   Global Step: 233260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:09,836-Speed 9413.41 samples/sec   Loss 4.6202   LearningRate 0.0091   Epoch: 13   Global Step: 233270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:10,921-Speed 9444.61 samples/sec   Loss 4.7679   LearningRate 0.0091   Epoch: 13   Global Step: 233280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:11,992-Speed 9564.50 samples/sec   Loss 4.6967   LearningRate 0.0091   Epoch: 13   Global Step: 233290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:13,063-Speed 9569.62 samples/sec   Loss 4.7784   LearningRate 0.0091   Epoch: 13   Global Step: 233300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:14,149-Speed 9442.74 samples/sec   Loss 4.7747   LearningRate 0.0091   Epoch: 13   Global Step: 233310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:15,228-Speed 9496.68 samples/sec   Loss 4.7406   LearningRate 0.0091   Epoch: 13   Global Step: 233320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:16,323-Speed 9353.25 samples/sec   Loss 4.7007   LearningRate 0.0091   Epoch: 13   Global Step: 233330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:17,442-Speed 9157.22 samples/sec   Loss 4.6916   LearningRate 0.0091   Epoch: 13   Global Step: 233340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:18,529-Speed 9424.06 samples/sec   Loss 4.7056   LearningRate 0.0091   Epoch: 13   Global Step: 233350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:19,588-Speed 9679.04 samples/sec   Loss 4.7117   LearningRate 0.0091   Epoch: 13   Global Step: 233360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:20,651-Speed 9639.82 samples/sec   Loss 4.7255   LearningRate 0.0091   Epoch: 13   Global Step: 233370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:21,737-Speed 9439.13 samples/sec   Loss 4.6507   LearningRate 0.0091   Epoch: 13   Global Step: 233380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:22,816-Speed 9491.05 samples/sec   Loss 4.7453   LearningRate 0.0091   Epoch: 13   Global Step: 233390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:23,914-Speed 9332.13 samples/sec   Loss 4.6941   LearningRate 0.0090   Epoch: 13   Global Step: 233400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:25,038-Speed 9121.54 samples/sec   Loss 4.8071   LearningRate 0.0090   Epoch: 13   Global Step: 233410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:26,151-Speed 9202.97 samples/sec   Loss 4.7554   LearningRate 0.0090   Epoch: 13   Global Step: 233420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:27,226-Speed 9526.95 samples/sec   Loss 4.7838   LearningRate 0.0090   Epoch: 13   Global Step: 233430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:28,301-Speed 9536.90 samples/sec   Loss 4.7076   LearningRate 0.0090   Epoch: 13   Global Step: 233440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:29,429-Speed 9082.42 samples/sec   Loss 4.7949   LearningRate 0.0090   Epoch: 13   Global Step: 233450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:30,546-Speed 9165.68 samples/sec   Loss 4.6672   LearningRate 0.0090   Epoch: 13   Global Step: 233460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:31,647-Speed 9305.94 samples/sec   Loss 4.6838   LearningRate 0.0090   Epoch: 13   Global Step: 233470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:32,716-Speed 9596.59 samples/sec   Loss 4.7846   LearningRate 0.0090   Epoch: 13   Global Step: 233480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:33,831-Speed 9190.10 samples/sec   Loss 4.6907   LearningRate 0.0090   Epoch: 13   Global Step: 233490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:34,914-Speed 9462.36 samples/sec   Loss 4.7739   LearningRate 0.0090   Epoch: 13   Global Step: 233500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:35,995-Speed 9477.61 samples/sec   Loss 4.7665   LearningRate 0.0090   Epoch: 13   Global Step: 233510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:37,073-Speed 9506.45 samples/sec   Loss 4.6982   LearningRate 0.0090   Epoch: 13   Global Step: 233520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:38,133-Speed 9661.37 samples/sec   Loss 4.7250   LearningRate 0.0090   Epoch: 13   Global Step: 233530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:39,272-Speed 8996.73 samples/sec   Loss 4.7071   LearningRate 0.0090   Epoch: 13   Global Step: 233540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:40,367-Speed 9358.60 samples/sec   Loss 4.7473   LearningRate 0.0090   Epoch: 13   Global Step: 233550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:41,480-Speed 9208.90 samples/sec   Loss 4.7338   LearningRate 0.0090   Epoch: 13   Global Step: 233560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:42,551-Speed 9562.59 samples/sec   Loss 4.6907   LearningRate 0.0090   Epoch: 13   Global Step: 233570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:43,641-Speed 9395.78 samples/sec   Loss 4.6935   LearningRate 0.0090   Epoch: 13   Global Step: 233580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:44,732-Speed 9398.43 samples/sec   Loss 4.7216   LearningRate 0.0090   Epoch: 13   Global Step: 233590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:57:45,814-Speed 9473.13 samples/sec   Loss 4.8107   LearningRate 0.0090   Epoch: 13   Global Step: 233600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:46,929-Speed 9182.08 samples/sec   Loss 4.7602   LearningRate 0.0090   Epoch: 13   Global Step: 233610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:48,043-Speed 9200.66 samples/sec   Loss 4.6805   LearningRate 0.0090   Epoch: 13   Global Step: 233620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:49,151-Speed 9244.92 samples/sec   Loss 4.7152   LearningRate 0.0090   Epoch: 13   Global Step: 233630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:50,259-Speed 9247.63 samples/sec   Loss 4.6513   LearningRate 0.0090   Epoch: 13   Global Step: 233640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:51,348-Speed 9414.82 samples/sec   Loss 4.6725   LearningRate 0.0090   Epoch: 13   Global Step: 233650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:52,446-Speed 9332.37 samples/sec   Loss 4.7278   LearningRate 0.0090   Epoch: 13   Global Step: 233660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:57:53,994-Speed 6615.96 samples/sec   Loss 4.7626   LearningRate 0.0090   Epoch: 13   Global Step: 233670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:37,135-Speed 237.37 samples/sec   Loss 4.3075   LearningRate 0.0090   Epoch: 14   Global Step: 233680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:38,787-Speed 6207.45 samples/sec   Loss 4.0919   LearningRate 0.0090   Epoch: 14   Global Step: 233690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:39,866-Speed 9497.80 samples/sec   Loss 3.9814   LearningRate 0.0090   Epoch: 14   Global Step: 233700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:40,947-Speed 9476.81 samples/sec   Loss 4.0864   LearningRate 0.0090   Epoch: 14   Global Step: 233710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:42,656-Speed 5991.48 samples/sec   Loss 4.0313   LearningRate 0.0090   Epoch: 14   Global Step: 233720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:43,938-Speed 7996.05 samples/sec   Loss 4.0485   LearningRate 0.0090   Epoch: 14   Global Step: 233730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:45,042-Speed 9284.75 samples/sec   Loss 4.0234   LearningRate 0.0090   Epoch: 14   Global Step: 233740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:46,131-Speed 9408.45 samples/sec   Loss 4.0563   LearningRate 0.0090   Epoch: 14   Global Step: 233750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:47,392-Speed 8122.69 samples/sec   Loss 3.9804   LearningRate 0.0090   Epoch: 14   Global Step: 233760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:48,491-Speed 9320.59 samples/sec   Loss 4.1438   LearningRate 0.0090   Epoch: 14   Global Step: 233770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:58:49,587-Speed 9346.04 samples/sec   Loss 4.0933   LearningRate 0.0090   Epoch: 14   Global Step: 233780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:58:50,696-Speed 9238.46 samples/sec   Loss 4.1275   LearningRate 0.0090   Epoch: 14   Global Step: 233790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:58:51,837-Speed 8987.64 samples/sec   Loss 3.9810   LearningRate 0.0090   Epoch: 14   Global Step: 233800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:58:53,023-Speed 8641.11 samples/sec   Loss 4.0523   LearningRate 0.0090   Epoch: 14   Global Step: 233810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:58:54,116-Speed 9367.77 samples/sec   Loss 4.0375   LearningRate 0.0090   Epoch: 14   Global Step: 233820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:58:55,230-Speed 9198.82 samples/sec   Loss 4.0902   LearningRate 0.0090   Epoch: 14   Global Step: 233830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:58:56,325-Speed 9362.27 samples/sec   Loss 4.0923   LearningRate 0.0090   Epoch: 14   Global Step: 233840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:58:57,426-Speed 9307.21 samples/sec   Loss 4.1626   LearningRate 0.0090   Epoch: 14   Global Step: 233850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:58:58,519-Speed 9374.06 samples/sec   Loss 4.0813   LearningRate 0.0090   Epoch: 14   Global Step: 233860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:58:59,668-Speed 8918.74 samples/sec   Loss 4.0485   LearningRate 0.0090   Epoch: 14   Global Step: 233870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-11 20:59:00,704-Speed 9883.97 samples/sec   Loss 4.1670   LearningRate 0.0090   Epoch: 14   Global Step: 233880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:59:01,809-Speed 9273.44 samples/sec   Loss 4.0877   LearningRate 0.0090   Epoch: 14   Global Step: 233890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:59:02,882-Speed 9544.37 samples/sec   Loss 4.1224   LearningRate 0.0090   Epoch: 14   Global Step: 233900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:59:03,943-Speed 9667.22 samples/sec   Loss 4.1663   LearningRate 0.0090   Epoch: 14   Global Step: 233910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:59:04,988-Speed 9798.54 samples/sec   Loss 4.0805   LearningRate 0.0090   Epoch: 14   Global Step: 233920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:59:06,041-Speed 9735.12 samples/sec   Loss 4.1432   LearningRate 0.0090   Epoch: 14   Global Step: 233930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:59:07,119-Speed 9500.89 samples/sec   Loss 4.0464   LearningRate 0.0090   Epoch: 14   Global Step: 233940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:59:08,225-Speed 9263.09 samples/sec   Loss 4.1079   LearningRate 0.0090   Epoch: 14   Global Step: 233950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:59:09,339-Speed 9195.19 samples/sec   Loss 4.1860   LearningRate 0.0089   Epoch: 14   Global Step: 233960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:59:10,440-Speed 9313.00 samples/sec   Loss 4.1275   LearningRate 0.0089   Epoch: 14   Global Step: 233970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 20:59:11,505-Speed 9616.13 samples/sec   Loss 4.1991   LearningRate 0.0089   Epoch: 14   Global Step: 233980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:59:12,604-Speed 9320.58 samples/sec   Loss 4.1407   LearningRate 0.0089   Epoch: 14   Global Step: 233990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:59:13,677-Speed 9551.57 samples/sec   Loss 4.1166   LearningRate 0.0089   Epoch: 14   Global Step: 234000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 20:59:35,439-[lfw][234000]XNorm: 7.682958
Training: 2022-04-11 20:59:35,440-[lfw][234000]Accuracy-Flip: 0.99533+-0.00287
Training: 2022-04-11 20:59:35,440-[lfw][234000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:00:00,604-[cfp_fp][234000]XNorm: 6.673557
Training: 2022-04-11 21:00:00,604-[cfp_fp][234000]Accuracy-Flip: 0.97014+-0.01015
Training: 2022-04-11 21:00:00,605-[cfp_fp][234000]Accuracy-Highest: 0.97014
Training: 2022-04-11 21:00:22,303-[agedb_30][234000]XNorm: 7.464890
Training: 2022-04-11 21:00:22,304-[agedb_30][234000]Accuracy-Flip: 0.97083+-0.00946
Training: 2022-04-11 21:00:22,304-[agedb_30][234000]Accuracy-Highest: 0.97250
Training: 2022-04-11 21:00:23,404-Speed 146.86 samples/sec   Loss 4.1683   LearningRate 0.0089   Epoch: 14   Global Step: 234010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:24,451-Speed 9782.74 samples/sec   Loss 4.0868   LearningRate 0.0089   Epoch: 14   Global Step: 234020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:25,513-Speed 9645.01 samples/sec   Loss 4.1081   LearningRate 0.0089   Epoch: 14   Global Step: 234030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:26,718-Speed 8505.69 samples/sec   Loss 4.0329   LearningRate 0.0089   Epoch: 14   Global Step: 234040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:27,961-Speed 8241.14 samples/sec   Loss 4.0111   LearningRate 0.0089   Epoch: 14   Global Step: 234050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:29,066-Speed 9272.20 samples/sec   Loss 4.1344   LearningRate 0.0089   Epoch: 14   Global Step: 234060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:30,134-Speed 9592.85 samples/sec   Loss 4.0457   LearningRate 0.0089   Epoch: 14   Global Step: 234070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:31,198-Speed 9632.70 samples/sec   Loss 4.0812   LearningRate 0.0089   Epoch: 14   Global Step: 234080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:32,284-Speed 9434.35 samples/sec   Loss 4.1727   LearningRate 0.0089   Epoch: 14   Global Step: 234090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:33,377-Speed 9370.07 samples/sec   Loss 4.0981   LearningRate 0.0089   Epoch: 14   Global Step: 234100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:34,464-Speed 9425.94 samples/sec   Loss 4.1739   LearningRate 0.0089   Epoch: 14   Global Step: 234110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:35,512-Speed 9776.13 samples/sec   Loss 4.1463   LearningRate 0.0089   Epoch: 14   Global Step: 234120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:36,611-Speed 9325.04 samples/sec   Loss 4.0867   LearningRate 0.0089   Epoch: 14   Global Step: 234130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:37,676-Speed 9623.63 samples/sec   Loss 4.0758   LearningRate 0.0089   Epoch: 14   Global Step: 234140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:38,756-Speed 9488.28 samples/sec   Loss 4.0861   LearningRate 0.0089   Epoch: 14   Global Step: 234150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:39,841-Speed 9440.23 samples/sec   Loss 4.0990   LearningRate 0.0089   Epoch: 14   Global Step: 234160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:40,950-Speed 9233.43 samples/sec   Loss 4.1359   LearningRate 0.0089   Epoch: 14   Global Step: 234170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:42,088-Speed 9004.29 samples/sec   Loss 4.0943   LearningRate 0.0089   Epoch: 14   Global Step: 234180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:43,194-Speed 9269.90 samples/sec   Loss 4.1615   LearningRate 0.0089   Epoch: 14   Global Step: 234190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:44,263-Speed 9587.39 samples/sec   Loss 4.0268   LearningRate 0.0089   Epoch: 14   Global Step: 234200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:45,348-Speed 9443.47 samples/sec   Loss 4.0323   LearningRate 0.0089   Epoch: 14   Global Step: 234210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:46,419-Speed 9566.68 samples/sec   Loss 4.1613   LearningRate 0.0089   Epoch: 14   Global Step: 234220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:47,475-Speed 9702.22 samples/sec   Loss 4.1685   LearningRate 0.0089   Epoch: 14   Global Step: 234230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:48,547-Speed 9560.19 samples/sec   Loss 4.1253   LearningRate 0.0089   Epoch: 14   Global Step: 234240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:49,603-Speed 9698.67 samples/sec   Loss 4.1277   LearningRate 0.0089   Epoch: 14   Global Step: 234250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:50,682-Speed 9502.56 samples/sec   Loss 4.0977   LearningRate 0.0089   Epoch: 14   Global Step: 234260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:51,780-Speed 9330.78 samples/sec   Loss 4.0992   LearningRate 0.0089   Epoch: 14   Global Step: 234270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:00:52,812-Speed 9929.56 samples/sec   Loss 4.1863   LearningRate 0.0089   Epoch: 14   Global Step: 234280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:00:53,918-Speed 9264.10 samples/sec   Loss 4.1577   LearningRate 0.0089   Epoch: 14   Global Step: 234290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:00:55,015-Speed 9333.59 samples/sec   Loss 4.2415   LearningRate 0.0089   Epoch: 14   Global Step: 234300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:00:56,072-Speed 9699.57 samples/sec   Loss 4.0502   LearningRate 0.0089   Epoch: 14   Global Step: 234310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:00:57,129-Speed 9691.31 samples/sec   Loss 4.2413   LearningRate 0.0089   Epoch: 14   Global Step: 234320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:00:58,193-Speed 9629.94 samples/sec   Loss 4.0628   LearningRate 0.0089   Epoch: 14   Global Step: 234330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:00:59,243-Speed 9755.81 samples/sec   Loss 4.0820   LearningRate 0.0089   Epoch: 14   Global Step: 234340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:00,308-Speed 9622.20 samples/sec   Loss 4.1041   LearningRate 0.0089   Epoch: 14   Global Step: 234350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:01,390-Speed 9464.26 samples/sec   Loss 4.1641   LearningRate 0.0089   Epoch: 14   Global Step: 234360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:02,663-Speed 8054.37 samples/sec   Loss 4.2113   LearningRate 0.0089   Epoch: 14   Global Step: 234370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:03,752-Speed 9408.13 samples/sec   Loss 4.1636   LearningRate 0.0089   Epoch: 14   Global Step: 234380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:04,834-Speed 9472.35 samples/sec   Loss 4.1387   LearningRate 0.0089   Epoch: 14   Global Step: 234390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:05,887-Speed 9726.42 samples/sec   Loss 4.0697   LearningRate 0.0089   Epoch: 14   Global Step: 234400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:06,997-Speed 9228.89 samples/sec   Loss 4.0967   LearningRate 0.0089   Epoch: 14   Global Step: 234410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:08,050-Speed 9737.78 samples/sec   Loss 4.1182   LearningRate 0.0089   Epoch: 14   Global Step: 234420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:09,109-Speed 9670.61 samples/sec   Loss 4.1512   LearningRate 0.0089   Epoch: 14   Global Step: 234430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:10,159-Speed 9755.79 samples/sec   Loss 4.1987   LearningRate 0.0089   Epoch: 14   Global Step: 234440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:11,247-Speed 9415.99 samples/sec   Loss 4.0999   LearningRate 0.0089   Epoch: 14   Global Step: 234450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:12,369-Speed 9139.41 samples/sec   Loss 4.1206   LearningRate 0.0089   Epoch: 14   Global Step: 234460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:13,429-Speed 9661.14 samples/sec   Loss 4.2103   LearningRate 0.0089   Epoch: 14   Global Step: 234470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:14,534-Speed 9280.92 samples/sec   Loss 4.2162   LearningRate 0.0089   Epoch: 14   Global Step: 234480   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 21:01:15,631-Speed 9341.63 samples/sec   Loss 4.1625   LearningRate 0.0089   Epoch: 14   Global Step: 234490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:16,704-Speed 9542.40 samples/sec   Loss 4.0836   LearningRate 0.0089   Epoch: 14   Global Step: 234500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:17,799-Speed 9356.14 samples/sec   Loss 4.1270   LearningRate 0.0089   Epoch: 14   Global Step: 234510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:18,887-Speed 9418.30 samples/sec   Loss 4.1216   LearningRate 0.0088   Epoch: 14   Global Step: 234520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:19,943-Speed 9707.87 samples/sec   Loss 4.1183   LearningRate 0.0088   Epoch: 14   Global Step: 234530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:21,046-Speed 9293.27 samples/sec   Loss 4.2369   LearningRate 0.0088   Epoch: 14   Global Step: 234540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:22,159-Speed 9204.76 samples/sec   Loss 4.1596   LearningRate 0.0088   Epoch: 14   Global Step: 234550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:23,261-Speed 9296.55 samples/sec   Loss 4.0696   LearningRate 0.0088   Epoch: 14   Global Step: 234560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:24,356-Speed 9354.52 samples/sec   Loss 4.1200   LearningRate 0.0088   Epoch: 14   Global Step: 234570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:25,426-Speed 9578.85 samples/sec   Loss 4.0541   LearningRate 0.0088   Epoch: 14   Global Step: 234580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:26,514-Speed 9421.25 samples/sec   Loss 4.1073   LearningRate 0.0088   Epoch: 14   Global Step: 234590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:27,629-Speed 9185.19 samples/sec   Loss 4.1745   LearningRate 0.0088   Epoch: 14   Global Step: 234600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:28,683-Speed 9721.96 samples/sec   Loss 4.1201   LearningRate 0.0088   Epoch: 14   Global Step: 234610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:29,757-Speed 9539.57 samples/sec   Loss 4.0447   LearningRate 0.0088   Epoch: 14   Global Step: 234620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:30,829-Speed 9554.44 samples/sec   Loss 4.1374   LearningRate 0.0088   Epoch: 14   Global Step: 234630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:31,891-Speed 9651.18 samples/sec   Loss 4.0753   LearningRate 0.0088   Epoch: 14   Global Step: 234640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:32,956-Speed 9615.64 samples/sec   Loss 4.1504   LearningRate 0.0088   Epoch: 14   Global Step: 234650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:34,023-Speed 9608.70 samples/sec   Loss 4.1304   LearningRate 0.0088   Epoch: 14   Global Step: 234660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:35,094-Speed 9563.31 samples/sec   Loss 4.1044   LearningRate 0.0088   Epoch: 14   Global Step: 234670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:36,214-Speed 9149.24 samples/sec   Loss 4.0877   LearningRate 0.0088   Epoch: 14   Global Step: 234680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:37,285-Speed 9561.60 samples/sec   Loss 4.1235   LearningRate 0.0088   Epoch: 14   Global Step: 234690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:38,339-Speed 9726.79 samples/sec   Loss 4.2077   LearningRate 0.0088   Epoch: 14   Global Step: 234700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:01:39,424-Speed 9440.03 samples/sec   Loss 4.1496   LearningRate 0.0088   Epoch: 14   Global Step: 234710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:40,497-Speed 9555.40 samples/sec   Loss 4.1940   LearningRate 0.0088   Epoch: 14   Global Step: 234720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:41,595-Speed 9332.51 samples/sec   Loss 4.1077   LearningRate 0.0088   Epoch: 14   Global Step: 234730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:42,682-Speed 9427.14 samples/sec   Loss 4.2265   LearningRate 0.0088   Epoch: 14   Global Step: 234740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:43,786-Speed 9281.76 samples/sec   Loss 4.1438   LearningRate 0.0088   Epoch: 14   Global Step: 234750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:44,899-Speed 9203.73 samples/sec   Loss 4.2519   LearningRate 0.0088   Epoch: 14   Global Step: 234760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:45,950-Speed 9749.55 samples/sec   Loss 4.1612   LearningRate 0.0088   Epoch: 14   Global Step: 234770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:47,028-Speed 9497.85 samples/sec   Loss 4.1307   LearningRate 0.0088   Epoch: 14   Global Step: 234780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:48,081-Speed 9735.38 samples/sec   Loss 4.2104   LearningRate 0.0088   Epoch: 14   Global Step: 234790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:49,162-Speed 9473.12 samples/sec   Loss 4.2189   LearningRate 0.0088   Epoch: 14   Global Step: 234800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:50,195-Speed 9923.05 samples/sec   Loss 4.2500   LearningRate 0.0088   Epoch: 14   Global Step: 234810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:51,262-Speed 9602.71 samples/sec   Loss 4.3467   LearningRate 0.0088   Epoch: 14   Global Step: 234820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:52,345-Speed 9463.93 samples/sec   Loss 4.1369   LearningRate 0.0088   Epoch: 14   Global Step: 234830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:53,424-Speed 9496.99 samples/sec   Loss 4.1983   LearningRate 0.0088   Epoch: 14   Global Step: 234840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:54,543-Speed 9156.59 samples/sec   Loss 4.2008   LearningRate 0.0088   Epoch: 14   Global Step: 234850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:55,616-Speed 9548.94 samples/sec   Loss 4.1419   LearningRate 0.0088   Epoch: 14   Global Step: 234860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:56,702-Speed 9439.16 samples/sec   Loss 4.2345   LearningRate 0.0088   Epoch: 14   Global Step: 234870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:57,833-Speed 9052.47 samples/sec   Loss 4.1897   LearningRate 0.0088   Epoch: 14   Global Step: 234880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:58,931-Speed 9330.82 samples/sec   Loss 4.1487   LearningRate 0.0088   Epoch: 14   Global Step: 234890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:01:59,982-Speed 9754.87 samples/sec   Loss 4.1650   LearningRate 0.0088   Epoch: 14   Global Step: 234900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:01,070-Speed 9417.16 samples/sec   Loss 4.1309   LearningRate 0.0088   Epoch: 14   Global Step: 234910   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 21:02:02,164-Speed 9362.69 samples/sec   Loss 4.1352   LearningRate 0.0088   Epoch: 14   Global Step: 234920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:03,273-Speed 9244.43 samples/sec   Loss 4.1083   LearningRate 0.0088   Epoch: 14   Global Step: 234930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:04,370-Speed 9340.58 samples/sec   Loss 4.1336   LearningRate 0.0088   Epoch: 14   Global Step: 234940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:05,445-Speed 9526.32 samples/sec   Loss 4.1701   LearningRate 0.0088   Epoch: 14   Global Step: 234950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:06,524-Speed 9492.22 samples/sec   Loss 4.2334   LearningRate 0.0088   Epoch: 14   Global Step: 234960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:07,605-Speed 9485.55 samples/sec   Loss 4.2244   LearningRate 0.0088   Epoch: 14   Global Step: 234970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:08,636-Speed 9928.57 samples/sec   Loss 4.1620   LearningRate 0.0088   Epoch: 14   Global Step: 234980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:09,751-Speed 9198.74 samples/sec   Loss 4.1820   LearningRate 0.0088   Epoch: 14   Global Step: 234990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:10,819-Speed 9594.13 samples/sec   Loss 4.1728   LearningRate 0.0088   Epoch: 14   Global Step: 235000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:11,930-Speed 9216.18 samples/sec   Loss 4.1715   LearningRate 0.0088   Epoch: 14   Global Step: 235010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:12,975-Speed 9806.96 samples/sec   Loss 4.2300   LearningRate 0.0088   Epoch: 14   Global Step: 235020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:14,038-Speed 9635.57 samples/sec   Loss 4.2566   LearningRate 0.0088   Epoch: 14   Global Step: 235030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:15,094-Speed 9706.43 samples/sec   Loss 4.1772   LearningRate 0.0088   Epoch: 14   Global Step: 235040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:16,186-Speed 9385.76 samples/sec   Loss 4.2105   LearningRate 0.0088   Epoch: 14   Global Step: 235050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:17,287-Speed 9301.26 samples/sec   Loss 4.0865   LearningRate 0.0088   Epoch: 14   Global Step: 235060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:18,366-Speed 9493.13 samples/sec   Loss 4.2100   LearningRate 0.0088   Epoch: 14   Global Step: 235070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:19,430-Speed 9640.47 samples/sec   Loss 4.0809   LearningRate 0.0087   Epoch: 14   Global Step: 235080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:20,501-Speed 9564.33 samples/sec   Loss 4.2081   LearningRate 0.0087   Epoch: 14   Global Step: 235090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:21,579-Speed 9507.39 samples/sec   Loss 4.1628   LearningRate 0.0087   Epoch: 14   Global Step: 235100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:22,673-Speed 9361.36 samples/sec   Loss 4.1529   LearningRate 0.0087   Epoch: 14   Global Step: 235110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:23,776-Speed 9288.25 samples/sec   Loss 4.2362   LearningRate 0.0087   Epoch: 14   Global Step: 235120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:24,849-Speed 9549.15 samples/sec   Loss 4.3138   LearningRate 0.0087   Epoch: 14   Global Step: 235130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:25,920-Speed 9568.78 samples/sec   Loss 4.1922   LearningRate 0.0087   Epoch: 14   Global Step: 235140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:27,034-Speed 9203.00 samples/sec   Loss 4.2051   LearningRate 0.0087   Epoch: 14   Global Step: 235150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:28,119-Speed 9438.75 samples/sec   Loss 4.2163   LearningRate 0.0087   Epoch: 14   Global Step: 235160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:29,252-Speed 9046.49 samples/sec   Loss 4.1619   LearningRate 0.0087   Epoch: 14   Global Step: 235170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:30,350-Speed 9326.78 samples/sec   Loss 4.1114   LearningRate 0.0087   Epoch: 14   Global Step: 235180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:31,428-Speed 9501.81 samples/sec   Loss 4.1329   LearningRate 0.0087   Epoch: 14   Global Step: 235190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:32,481-Speed 9733.06 samples/sec   Loss 4.1023   LearningRate 0.0087   Epoch: 14   Global Step: 235200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:33,550-Speed 9582.02 samples/sec   Loss 4.1734   LearningRate 0.0087   Epoch: 14   Global Step: 235210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:34,594-Speed 9814.91 samples/sec   Loss 4.1604   LearningRate 0.0087   Epoch: 14   Global Step: 235220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:35,688-Speed 9366.94 samples/sec   Loss 4.1692   LearningRate 0.0087   Epoch: 14   Global Step: 235230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:36,756-Speed 9589.32 samples/sec   Loss 4.1919   LearningRate 0.0087   Epoch: 14   Global Step: 235240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:37,856-Speed 9324.68 samples/sec   Loss 4.3314   LearningRate 0.0087   Epoch: 14   Global Step: 235250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:38,956-Speed 9316.66 samples/sec   Loss 4.2335   LearningRate 0.0087   Epoch: 14   Global Step: 235260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:40,042-Speed 9437.50 samples/sec   Loss 4.2665   LearningRate 0.0087   Epoch: 14   Global Step: 235270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:41,104-Speed 9644.21 samples/sec   Loss 4.3095   LearningRate 0.0087   Epoch: 14   Global Step: 235280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:42,158-Speed 9721.75 samples/sec   Loss 4.1862   LearningRate 0.0087   Epoch: 14   Global Step: 235290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:43,270-Speed 9217.94 samples/sec   Loss 4.2517   LearningRate 0.0087   Epoch: 14   Global Step: 235300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:44,349-Speed 9493.37 samples/sec   Loss 4.2072   LearningRate 0.0087   Epoch: 14   Global Step: 235310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:45,400-Speed 9749.20 samples/sec   Loss 4.2345   LearningRate 0.0087   Epoch: 14   Global Step: 235320   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 21:02:46,458-Speed 9686.85 samples/sec   Loss 4.2853   LearningRate 0.0087   Epoch: 14   Global Step: 235330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:47,545-Speed 9425.29 samples/sec   Loss 4.2500   LearningRate 0.0087   Epoch: 14   Global Step: 235340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:48,681-Speed 9014.65 samples/sec   Loss 4.2210   LearningRate 0.0087   Epoch: 14   Global Step: 235350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:49,792-Speed 9229.49 samples/sec   Loss 4.2225   LearningRate 0.0087   Epoch: 14   Global Step: 235360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:50,858-Speed 9610.62 samples/sec   Loss 4.2730   LearningRate 0.0087   Epoch: 14   Global Step: 235370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:51,959-Speed 9299.98 samples/sec   Loss 4.1985   LearningRate 0.0087   Epoch: 14   Global Step: 235380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:53,011-Speed 9742.27 samples/sec   Loss 4.1755   LearningRate 0.0087   Epoch: 14   Global Step: 235390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:54,102-Speed 9394.66 samples/sec   Loss 4.2055   LearningRate 0.0087   Epoch: 14   Global Step: 235400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:55,176-Speed 9535.72 samples/sec   Loss 4.2763   LearningRate 0.0087   Epoch: 14   Global Step: 235410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:56,349-Speed 8743.03 samples/sec   Loss 4.1977   LearningRate 0.0087   Epoch: 14   Global Step: 235420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:57,444-Speed 9358.02 samples/sec   Loss 4.2858   LearningRate 0.0087   Epoch: 14   Global Step: 235430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:58,562-Speed 9161.81 samples/sec   Loss 4.2306   LearningRate 0.0087   Epoch: 14   Global Step: 235440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:02:59,660-Speed 9327.42 samples/sec   Loss 4.2426   LearningRate 0.0087   Epoch: 14   Global Step: 235450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:00,711-Speed 9752.15 samples/sec   Loss 4.1906   LearningRate 0.0087   Epoch: 14   Global Step: 235460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:01,795-Speed 9450.31 samples/sec   Loss 4.2033   LearningRate 0.0087   Epoch: 14   Global Step: 235470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:02,834-Speed 9864.35 samples/sec   Loss 4.3301   LearningRate 0.0087   Epoch: 14   Global Step: 235480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:03,877-Speed 9816.29 samples/sec   Loss 4.1926   LearningRate 0.0087   Epoch: 14   Global Step: 235490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:04,935-Speed 9692.26 samples/sec   Loss 4.1807   LearningRate 0.0087   Epoch: 14   Global Step: 235500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:06,014-Speed 9493.85 samples/sec   Loss 4.2301   LearningRate 0.0087   Epoch: 14   Global Step: 235510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:07,129-Speed 9184.58 samples/sec   Loss 4.2473   LearningRate 0.0087   Epoch: 14   Global Step: 235520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:08,225-Speed 9349.29 samples/sec   Loss 4.2311   LearningRate 0.0087   Epoch: 14   Global Step: 235530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:09,368-Speed 8964.19 samples/sec   Loss 4.2699   LearningRate 0.0087   Epoch: 14   Global Step: 235540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:10,464-Speed 9346.69 samples/sec   Loss 4.1886   LearningRate 0.0087   Epoch: 14   Global Step: 235550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:11,555-Speed 9391.09 samples/sec   Loss 4.2360   LearningRate 0.0087   Epoch: 14   Global Step: 235560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:12,631-Speed 9522.49 samples/sec   Loss 4.1807   LearningRate 0.0087   Epoch: 14   Global Step: 235570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:13,688-Speed 9697.90 samples/sec   Loss 4.3161   LearningRate 0.0087   Epoch: 14   Global Step: 235580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:14,753-Speed 9619.28 samples/sec   Loss 4.1838   LearningRate 0.0087   Epoch: 14   Global Step: 235590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:15,834-Speed 9476.62 samples/sec   Loss 4.2950   LearningRate 0.0087   Epoch: 14   Global Step: 235600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:16,923-Speed 9413.10 samples/sec   Loss 4.2791   LearningRate 0.0087   Epoch: 14   Global Step: 235610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:18,006-Speed 9460.22 samples/sec   Loss 4.2958   LearningRate 0.0087   Epoch: 14   Global Step: 235620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:19,142-Speed 9013.70 samples/sec   Loss 4.1810   LearningRate 0.0087   Epoch: 14   Global Step: 235630   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 21:03:20,211-Speed 9591.84 samples/sec   Loss 4.2144   LearningRate 0.0087   Epoch: 14   Global Step: 235640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:21,301-Speed 9397.79 samples/sec   Loss 4.2243   LearningRate 0.0086   Epoch: 14   Global Step: 235650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:22,357-Speed 9703.57 samples/sec   Loss 4.2629   LearningRate 0.0086   Epoch: 14   Global Step: 235660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:23,443-Speed 9432.50 samples/sec   Loss 4.2868   LearningRate 0.0086   Epoch: 14   Global Step: 235670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:24,557-Speed 9196.39 samples/sec   Loss 4.2541   LearningRate 0.0086   Epoch: 14   Global Step: 235680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:25,661-Speed 9284.69 samples/sec   Loss 4.2677   LearningRate 0.0086   Epoch: 14   Global Step: 235690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:26,750-Speed 9412.74 samples/sec   Loss 4.3631   LearningRate 0.0086   Epoch: 14   Global Step: 235700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:27,827-Speed 9508.29 samples/sec   Loss 4.2405   LearningRate 0.0086   Epoch: 14   Global Step: 235710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:28,881-Speed 9725.16 samples/sec   Loss 4.2660   LearningRate 0.0086   Epoch: 14   Global Step: 235720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:29,988-Speed 9256.73 samples/sec   Loss 4.2288   LearningRate 0.0086   Epoch: 14   Global Step: 235730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:31,055-Speed 9598.25 samples/sec   Loss 4.2732   LearningRate 0.0086   Epoch: 14   Global Step: 235740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:32,177-Speed 9133.89 samples/sec   Loss 4.2329   LearningRate 0.0086   Epoch: 14   Global Step: 235750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:33,260-Speed 9460.78 samples/sec   Loss 4.2906   LearningRate 0.0086   Epoch: 14   Global Step: 235760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:34,310-Speed 9762.63 samples/sec   Loss 4.2496   LearningRate 0.0086   Epoch: 14   Global Step: 235770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:35,449-Speed 8995.66 samples/sec   Loss 4.3224   LearningRate 0.0086   Epoch: 14   Global Step: 235780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:36,539-Speed 9402.56 samples/sec   Loss 4.3769   LearningRate 0.0086   Epoch: 14   Global Step: 235790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:37,615-Speed 9514.43 samples/sec   Loss 4.2387   LearningRate 0.0086   Epoch: 14   Global Step: 235800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:38,679-Speed 9634.04 samples/sec   Loss 4.2730   LearningRate 0.0086   Epoch: 14   Global Step: 235810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:39,756-Speed 9517.01 samples/sec   Loss 4.2281   LearningRate 0.0086   Epoch: 14   Global Step: 235820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:40,845-Speed 9407.95 samples/sec   Loss 4.2387   LearningRate 0.0086   Epoch: 14   Global Step: 235830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:41,939-Speed 9360.14 samples/sec   Loss 4.2008   LearningRate 0.0086   Epoch: 14   Global Step: 235840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:42,983-Speed 9817.72 samples/sec   Loss 4.1965   LearningRate 0.0086   Epoch: 14   Global Step: 235850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:44,086-Speed 9291.29 samples/sec   Loss 4.2937   LearningRate 0.0086   Epoch: 14   Global Step: 235860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:45,165-Speed 9493.34 samples/sec   Loss 4.1940   LearningRate 0.0086   Epoch: 14   Global Step: 235870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:46,269-Speed 9276.51 samples/sec   Loss 4.1677   LearningRate 0.0086   Epoch: 14   Global Step: 235880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:47,310-Speed 9848.99 samples/sec   Loss 4.3647   LearningRate 0.0086   Epoch: 14   Global Step: 235890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:48,404-Speed 9366.38 samples/sec   Loss 4.2532   LearningRate 0.0086   Epoch: 14   Global Step: 235900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:49,462-Speed 9684.18 samples/sec   Loss 4.4075   LearningRate 0.0086   Epoch: 14   Global Step: 235910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:50,551-Speed 9409.01 samples/sec   Loss 4.2428   LearningRate 0.0086   Epoch: 14   Global Step: 235920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:51,630-Speed 9497.10 samples/sec   Loss 4.1760   LearningRate 0.0086   Epoch: 14   Global Step: 235930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:52,757-Speed 9089.40 samples/sec   Loss 4.2266   LearningRate 0.0086   Epoch: 14   Global Step: 235940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:53,852-Speed 9356.83 samples/sec   Loss 4.1637   LearningRate 0.0086   Epoch: 14   Global Step: 235950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:03:54,939-Speed 9424.65 samples/sec   Loss 4.2310   LearningRate 0.0086   Epoch: 14   Global Step: 235960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:55,993-Speed 9726.19 samples/sec   Loss 4.2350   LearningRate 0.0086   Epoch: 14   Global Step: 235970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:57,025-Speed 9925.74 samples/sec   Loss 4.2539   LearningRate 0.0086   Epoch: 14   Global Step: 235980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:58,093-Speed 9597.43 samples/sec   Loss 4.2909   LearningRate 0.0086   Epoch: 14   Global Step: 235990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:03:59,161-Speed 9589.69 samples/sec   Loss 4.2874   LearningRate 0.0086   Epoch: 14   Global Step: 236000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:04:21,666-[lfw][236000]XNorm: 7.708942
Training: 2022-04-11 21:04:21,667-[lfw][236000]Accuracy-Flip: 0.99667+-0.00269
Training: 2022-04-11 21:04:21,667-[lfw][236000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:04:47,978-[cfp_fp][236000]XNorm: 6.633025
Training: 2022-04-11 21:04:47,979-[cfp_fp][236000]Accuracy-Flip: 0.97143+-0.00780
Training: 2022-04-11 21:04:47,979-[cfp_fp][236000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:05:10,337-[agedb_30][236000]XNorm: 7.491549
Training: 2022-04-11 21:05:10,338-[agedb_30][236000]Accuracy-Flip: 0.96933+-0.00952
Training: 2022-04-11 21:05:10,339-[agedb_30][236000]Accuracy-Highest: 0.97250
Training: 2022-04-11 21:05:11,430-Speed 141.70 samples/sec   Loss 4.2550   LearningRate 0.0086   Epoch: 14   Global Step: 236010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:12,502-Speed 9549.93 samples/sec   Loss 4.2933   LearningRate 0.0086   Epoch: 14   Global Step: 236020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:13,549-Speed 9786.20 samples/sec   Loss 4.3304   LearningRate 0.0086   Epoch: 14   Global Step: 236030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:14,626-Speed 9517.47 samples/sec   Loss 4.1797   LearningRate 0.0086   Epoch: 14   Global Step: 236040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:15,697-Speed 9561.42 samples/sec   Loss 4.3039   LearningRate 0.0086   Epoch: 14   Global Step: 236050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:16,736-Speed 9869.09 samples/sec   Loss 4.2664   LearningRate 0.0086   Epoch: 14   Global Step: 236060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:17,782-Speed 9789.70 samples/sec   Loss 4.2642   LearningRate 0.0086   Epoch: 14   Global Step: 236070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:18,881-Speed 9327.21 samples/sec   Loss 4.2090   LearningRate 0.0086   Epoch: 14   Global Step: 236080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:19,952-Speed 9571.22 samples/sec   Loss 4.2884   LearningRate 0.0086   Epoch: 14   Global Step: 236090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:21,050-Speed 9330.17 samples/sec   Loss 4.1626   LearningRate 0.0086   Epoch: 14   Global Step: 236100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:22,136-Speed 9432.78 samples/sec   Loss 4.4476   LearningRate 0.0086   Epoch: 14   Global Step: 236110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:23,175-Speed 9864.89 samples/sec   Loss 4.2564   LearningRate 0.0086   Epoch: 14   Global Step: 236120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:24,267-Speed 9375.02 samples/sec   Loss 4.2454   LearningRate 0.0086   Epoch: 14   Global Step: 236130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:25,343-Speed 9526.86 samples/sec   Loss 4.2522   LearningRate 0.0086   Epoch: 14   Global Step: 236140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:26,432-Speed 9412.68 samples/sec   Loss 4.3129   LearningRate 0.0086   Epoch: 14   Global Step: 236150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:27,533-Speed 9303.44 samples/sec   Loss 4.3435   LearningRate 0.0086   Epoch: 14   Global Step: 236160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:28,578-Speed 9803.57 samples/sec   Loss 4.2613   LearningRate 0.0086   Epoch: 14   Global Step: 236170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:29,691-Speed 9205.66 samples/sec   Loss 4.2204   LearningRate 0.0086   Epoch: 14   Global Step: 236180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:30,764-Speed 9551.86 samples/sec   Loss 4.3413   LearningRate 0.0086   Epoch: 14   Global Step: 236190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:31,854-Speed 9398.32 samples/sec   Loss 4.2807   LearningRate 0.0086   Epoch: 14   Global Step: 236200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:32,952-Speed 9328.44 samples/sec   Loss 4.2891   LearningRate 0.0085   Epoch: 14   Global Step: 236210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:34,044-Speed 9386.55 samples/sec   Loss 4.2981   LearningRate 0.0085   Epoch: 14   Global Step: 236220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:35,138-Speed 9369.74 samples/sec   Loss 4.2547   LearningRate 0.0085   Epoch: 14   Global Step: 236230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:36,260-Speed 9126.17 samples/sec   Loss 4.3726   LearningRate 0.0085   Epoch: 14   Global Step: 236240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:37,339-Speed 9498.32 samples/sec   Loss 4.3124   LearningRate 0.0085   Epoch: 14   Global Step: 236250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:38,419-Speed 9489.52 samples/sec   Loss 4.2927   LearningRate 0.0085   Epoch: 14   Global Step: 236260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:39,493-Speed 9547.85 samples/sec   Loss 4.1901   LearningRate 0.0085   Epoch: 14   Global Step: 236270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:40,584-Speed 9386.41 samples/sec   Loss 4.2656   LearningRate 0.0085   Epoch: 14   Global Step: 236280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:41,657-Speed 9546.98 samples/sec   Loss 4.2458   LearningRate 0.0085   Epoch: 14   Global Step: 236290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:42,763-Speed 9264.80 samples/sec   Loss 4.3296   LearningRate 0.0085   Epoch: 14   Global Step: 236300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:43,862-Speed 9326.44 samples/sec   Loss 4.2694   LearningRate 0.0085   Epoch: 14   Global Step: 236310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:44,967-Speed 9265.43 samples/sec   Loss 4.2658   LearningRate 0.0085   Epoch: 14   Global Step: 236320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:46,066-Speed 9323.83 samples/sec   Loss 4.2764   LearningRate 0.0085   Epoch: 14   Global Step: 236330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:47,140-Speed 9542.93 samples/sec   Loss 4.2269   LearningRate 0.0085   Epoch: 14   Global Step: 236340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:48,238-Speed 9329.13 samples/sec   Loss 4.3282   LearningRate 0.0085   Epoch: 14   Global Step: 236350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:49,367-Speed 9075.17 samples/sec   Loss 4.2428   LearningRate 0.0085   Epoch: 14   Global Step: 236360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:50,470-Speed 9290.57 samples/sec   Loss 4.2885   LearningRate 0.0085   Epoch: 14   Global Step: 236370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:51,540-Speed 9575.28 samples/sec   Loss 4.3058   LearningRate 0.0085   Epoch: 14   Global Step: 236380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:52,579-Speed 9860.11 samples/sec   Loss 4.2543   LearningRate 0.0085   Epoch: 14   Global Step: 236390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:53,689-Speed 9230.85 samples/sec   Loss 4.2965   LearningRate 0.0085   Epoch: 14   Global Step: 236400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:54,783-Speed 9367.43 samples/sec   Loss 4.3497   LearningRate 0.0085   Epoch: 14   Global Step: 236410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:05:55,868-Speed 9446.25 samples/sec   Loss 4.2534   LearningRate 0.0085   Epoch: 14   Global Step: 236420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:56,976-Speed 9249.21 samples/sec   Loss 4.1808   LearningRate 0.0085   Epoch: 14   Global Step: 236430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:58,047-Speed 9570.20 samples/sec   Loss 4.1679   LearningRate 0.0085   Epoch: 14   Global Step: 236440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:05:59,124-Speed 9514.77 samples/sec   Loss 4.2509   LearningRate 0.0085   Epoch: 14   Global Step: 236450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:00,182-Speed 9680.32 samples/sec   Loss 4.2760   LearningRate 0.0085   Epoch: 14   Global Step: 236460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:01,231-Speed 9767.23 samples/sec   Loss 4.2916   LearningRate 0.0085   Epoch: 14   Global Step: 236470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:02,342-Speed 9220.81 samples/sec   Loss 4.2498   LearningRate 0.0085   Epoch: 14   Global Step: 236480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:03,406-Speed 9633.66 samples/sec   Loss 4.2488   LearningRate 0.0085   Epoch: 14   Global Step: 236490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:04,454-Speed 9774.23 samples/sec   Loss 4.2474   LearningRate 0.0085   Epoch: 14   Global Step: 236500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:05,529-Speed 9528.28 samples/sec   Loss 4.2602   LearningRate 0.0085   Epoch: 14   Global Step: 236510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:06,596-Speed 9608.50 samples/sec   Loss 4.3179   LearningRate 0.0085   Epoch: 14   Global Step: 236520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:07,666-Speed 9570.35 samples/sec   Loss 4.3320   LearningRate 0.0085   Epoch: 14   Global Step: 236530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:08,760-Speed 9365.94 samples/sec   Loss 4.2117   LearningRate 0.0085   Epoch: 14   Global Step: 236540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:09,869-Speed 9245.55 samples/sec   Loss 4.2791   LearningRate 0.0085   Epoch: 14   Global Step: 236550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:10,968-Speed 9321.88 samples/sec   Loss 4.3213   LearningRate 0.0085   Epoch: 14   Global Step: 236560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:12,052-Speed 9452.20 samples/sec   Loss 4.1998   LearningRate 0.0085   Epoch: 14   Global Step: 236570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:13,131-Speed 9494.38 samples/sec   Loss 4.2395   LearningRate 0.0085   Epoch: 14   Global Step: 236580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:14,195-Speed 9626.74 samples/sec   Loss 4.2627   LearningRate 0.0085   Epoch: 14   Global Step: 236590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:15,311-Speed 9186.33 samples/sec   Loss 4.3194   LearningRate 0.0085   Epoch: 14   Global Step: 236600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:16,414-Speed 9283.34 samples/sec   Loss 4.3059   LearningRate 0.0085   Epoch: 14   Global Step: 236610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:17,524-Speed 9232.53 samples/sec   Loss 4.2975   LearningRate 0.0085   Epoch: 14   Global Step: 236620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:18,615-Speed 9404.22 samples/sec   Loss 4.2417   LearningRate 0.0085   Epoch: 14   Global Step: 236630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:19,749-Speed 9041.54 samples/sec   Loss 4.2816   LearningRate 0.0085   Epoch: 14   Global Step: 236640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:20,833-Speed 9451.84 samples/sec   Loss 4.2523   LearningRate 0.0085   Epoch: 14   Global Step: 236650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:21,950-Speed 9171.96 samples/sec   Loss 4.3101   LearningRate 0.0085   Epoch: 14   Global Step: 236660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:23,027-Speed 9515.39 samples/sec   Loss 4.3475   LearningRate 0.0085   Epoch: 14   Global Step: 236670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:24,088-Speed 9656.56 samples/sec   Loss 4.2909   LearningRate 0.0085   Epoch: 14   Global Step: 236680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:25,149-Speed 9655.40 samples/sec   Loss 4.2134   LearningRate 0.0085   Epoch: 14   Global Step: 236690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:26,217-Speed 9600.94 samples/sec   Loss 4.2961   LearningRate 0.0085   Epoch: 14   Global Step: 236700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:27,302-Speed 9441.53 samples/sec   Loss 4.2909   LearningRate 0.0085   Epoch: 14   Global Step: 236710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:28,390-Speed 9413.54 samples/sec   Loss 4.3477   LearningRate 0.0085   Epoch: 14   Global Step: 236720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:29,503-Speed 9201.96 samples/sec   Loss 4.3667   LearningRate 0.0085   Epoch: 14   Global Step: 236730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:30,592-Speed 9414.23 samples/sec   Loss 4.3294   LearningRate 0.0085   Epoch: 14   Global Step: 236740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:31,673-Speed 9477.96 samples/sec   Loss 4.2161   LearningRate 0.0085   Epoch: 14   Global Step: 236750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:32,751-Speed 9501.56 samples/sec   Loss 4.2685   LearningRate 0.0085   Epoch: 14   Global Step: 236760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:33,829-Speed 9506.90 samples/sec   Loss 4.2679   LearningRate 0.0085   Epoch: 14   Global Step: 236770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:34,896-Speed 9609.72 samples/sec   Loss 4.3693   LearningRate 0.0085   Epoch: 14   Global Step: 236780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:35,976-Speed 9487.16 samples/sec   Loss 4.2464   LearningRate 0.0084   Epoch: 14   Global Step: 236790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:37,026-Speed 9756.56 samples/sec   Loss 4.2915   LearningRate 0.0084   Epoch: 14   Global Step: 236800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:38,073-Speed 9786.50 samples/sec   Loss 4.3023   LearningRate 0.0084   Epoch: 14   Global Step: 236810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:39,111-Speed 9866.85 samples/sec   Loss 4.2859   LearningRate 0.0084   Epoch: 14   Global Step: 236820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:40,160-Speed 9770.09 samples/sec   Loss 4.3388   LearningRate 0.0084   Epoch: 14   Global Step: 236830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:41,239-Speed 9496.54 samples/sec   Loss 4.2527   LearningRate 0.0084   Epoch: 14   Global Step: 236840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:42,287-Speed 9771.42 samples/sec   Loss 4.3592   LearningRate 0.0084   Epoch: 14   Global Step: 236850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:43,333-Speed 9798.84 samples/sec   Loss 4.3109   LearningRate 0.0084   Epoch: 14   Global Step: 236860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:44,405-Speed 9558.99 samples/sec   Loss 4.3388   LearningRate 0.0084   Epoch: 14   Global Step: 236870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:45,488-Speed 9459.44 samples/sec   Loss 4.2547   LearningRate 0.0084   Epoch: 14   Global Step: 236880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:46,558-Speed 9574.95 samples/sec   Loss 4.3063   LearningRate 0.0084   Epoch: 14   Global Step: 236890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:47,672-Speed 9201.07 samples/sec   Loss 4.2492   LearningRate 0.0084   Epoch: 14   Global Step: 236900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:48,739-Speed 9601.38 samples/sec   Loss 4.3293   LearningRate 0.0084   Epoch: 14   Global Step: 236910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:06:49,849-Speed 9232.48 samples/sec   Loss 4.3761   LearningRate 0.0084   Epoch: 14   Global Step: 236920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:50,903-Speed 9718.61 samples/sec   Loss 4.2626   LearningRate 0.0084   Epoch: 14   Global Step: 236930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:51,962-Speed 9673.59 samples/sec   Loss 4.2854   LearningRate 0.0084   Epoch: 14   Global Step: 236940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:53,039-Speed 9515.17 samples/sec   Loss 4.3104   LearningRate 0.0084   Epoch: 14   Global Step: 236950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:54,088-Speed 9771.10 samples/sec   Loss 4.3303   LearningRate 0.0084   Epoch: 14   Global Step: 236960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:55,155-Speed 9606.56 samples/sec   Loss 4.2248   LearningRate 0.0084   Epoch: 14   Global Step: 236970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:56,240-Speed 9443.83 samples/sec   Loss 4.2765   LearningRate 0.0084   Epoch: 14   Global Step: 236980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:57,331-Speed 9389.78 samples/sec   Loss 4.3330   LearningRate 0.0084   Epoch: 14   Global Step: 236990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:58,428-Speed 9338.10 samples/sec   Loss 4.3126   LearningRate 0.0084   Epoch: 14   Global Step: 237000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:06:59,479-Speed 9743.34 samples/sec   Loss 4.3054   LearningRate 0.0084   Epoch: 14   Global Step: 237010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:00,544-Speed 9625.52 samples/sec   Loss 4.3038   LearningRate 0.0084   Epoch: 14   Global Step: 237020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:01,613-Speed 9584.79 samples/sec   Loss 4.2690   LearningRate 0.0084   Epoch: 14   Global Step: 237030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:02,664-Speed 9756.23 samples/sec   Loss 4.3103   LearningRate 0.0084   Epoch: 14   Global Step: 237040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:03,783-Speed 9154.97 samples/sec   Loss 4.2787   LearningRate 0.0084   Epoch: 14   Global Step: 237050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:04,841-Speed 9677.92 samples/sec   Loss 4.3327   LearningRate 0.0084   Epoch: 14   Global Step: 237060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:05,914-Speed 9550.08 samples/sec   Loss 4.2410   LearningRate 0.0084   Epoch: 14   Global Step: 237070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:07,011-Speed 9347.20 samples/sec   Loss 4.2937   LearningRate 0.0084   Epoch: 14   Global Step: 237080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:08,109-Speed 9333.36 samples/sec   Loss 4.3691   LearningRate 0.0084   Epoch: 14   Global Step: 237090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:09,132-Speed 10014.44 samples/sec   Loss 4.3283   LearningRate 0.0084   Epoch: 14   Global Step: 237100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:10,193-Speed 9651.56 samples/sec   Loss 4.3650   LearningRate 0.0084   Epoch: 14   Global Step: 237110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:11,286-Speed 9376.25 samples/sec   Loss 4.2853   LearningRate 0.0084   Epoch: 14   Global Step: 237120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:12,340-Speed 9725.74 samples/sec   Loss 4.3237   LearningRate 0.0084   Epoch: 14   Global Step: 237130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:13,401-Speed 9654.91 samples/sec   Loss 4.3973   LearningRate 0.0084   Epoch: 14   Global Step: 237140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:14,488-Speed 9426.08 samples/sec   Loss 4.2956   LearningRate 0.0084   Epoch: 14   Global Step: 237150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:15,564-Speed 9527.67 samples/sec   Loss 4.3937   LearningRate 0.0084   Epoch: 14   Global Step: 237160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:16,674-Speed 9230.65 samples/sec   Loss 4.2840   LearningRate 0.0084   Epoch: 14   Global Step: 237170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:17,760-Speed 9431.77 samples/sec   Loss 4.3491   LearningRate 0.0084   Epoch: 14   Global Step: 237180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:18,862-Speed 9299.55 samples/sec   Loss 4.2927   LearningRate 0.0084   Epoch: 14   Global Step: 237190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:19,953-Speed 9390.91 samples/sec   Loss 4.3911   LearningRate 0.0084   Epoch: 14   Global Step: 237200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:21,052-Speed 9317.94 samples/sec   Loss 4.3064   LearningRate 0.0084   Epoch: 14   Global Step: 237210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:22,152-Speed 9315.39 samples/sec   Loss 4.2600   LearningRate 0.0084   Epoch: 14   Global Step: 237220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:23,242-Speed 9402.41 samples/sec   Loss 4.3439   LearningRate 0.0084   Epoch: 14   Global Step: 237230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:24,311-Speed 9589.88 samples/sec   Loss 4.3472   LearningRate 0.0084   Epoch: 14   Global Step: 237240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:25,378-Speed 9603.71 samples/sec   Loss 4.3408   LearningRate 0.0084   Epoch: 14   Global Step: 237250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:26,468-Speed 9400.70 samples/sec   Loss 4.3785   LearningRate 0.0084   Epoch: 14   Global Step: 237260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:27,515-Speed 9785.68 samples/sec   Loss 4.2535   LearningRate 0.0084   Epoch: 14   Global Step: 237270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:28,573-Speed 9686.64 samples/sec   Loss 4.3324   LearningRate 0.0084   Epoch: 14   Global Step: 237280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:29,609-Speed 9883.67 samples/sec   Loss 4.1982   LearningRate 0.0084   Epoch: 14   Global Step: 237290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:30,694-Speed 9445.18 samples/sec   Loss 4.3211   LearningRate 0.0084   Epoch: 14   Global Step: 237300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:31,753-Speed 9676.57 samples/sec   Loss 4.3284   LearningRate 0.0084   Epoch: 14   Global Step: 237310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:32,836-Speed 9462.76 samples/sec   Loss 4.2757   LearningRate 0.0084   Epoch: 14   Global Step: 237320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:33,950-Speed 9193.04 samples/sec   Loss 4.3349   LearningRate 0.0084   Epoch: 14   Global Step: 237330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:35,009-Speed 9679.26 samples/sec   Loss 4.3033   LearningRate 0.0084   Epoch: 14   Global Step: 237340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:36,112-Speed 9285.77 samples/sec   Loss 4.4320   LearningRate 0.0084   Epoch: 14   Global Step: 237350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:37,200-Speed 9418.40 samples/sec   Loss 4.3155   LearningRate 0.0083   Epoch: 14   Global Step: 237360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:38,291-Speed 9387.03 samples/sec   Loss 4.3194   LearningRate 0.0083   Epoch: 14   Global Step: 237370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:39,373-Speed 9470.87 samples/sec   Loss 4.2621   LearningRate 0.0083   Epoch: 14   Global Step: 237380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:40,463-Speed 9400.46 samples/sec   Loss 4.4023   LearningRate 0.0083   Epoch: 14   Global Step: 237390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:41,556-Speed 9376.32 samples/sec   Loss 4.3412   LearningRate 0.0083   Epoch: 14   Global Step: 237400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:42,585-Speed 9957.81 samples/sec   Loss 4.4565   LearningRate 0.0083   Epoch: 14   Global Step: 237410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:43,652-Speed 9603.64 samples/sec   Loss 4.3624   LearningRate 0.0083   Epoch: 14   Global Step: 237420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:44,732-Speed 9493.50 samples/sec   Loss 4.3408   LearningRate 0.0083   Epoch: 14   Global Step: 237430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:45,813-Speed 9476.52 samples/sec   Loss 4.3928   LearningRate 0.0083   Epoch: 14   Global Step: 237440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:46,911-Speed 9332.81 samples/sec   Loss 4.4298   LearningRate 0.0083   Epoch: 14   Global Step: 237450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:48,030-Speed 9156.47 samples/sec   Loss 4.2783   LearningRate 0.0083   Epoch: 14   Global Step: 237460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:49,141-Speed 9217.90 samples/sec   Loss 4.3198   LearningRate 0.0083   Epoch: 14   Global Step: 237470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:50,280-Speed 8995.82 samples/sec   Loss 4.3833   LearningRate 0.0083   Epoch: 14   Global Step: 237480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:51,381-Speed 9305.14 samples/sec   Loss 4.4160   LearningRate 0.0083   Epoch: 14   Global Step: 237490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:52,463-Speed 9472.50 samples/sec   Loss 4.3355   LearningRate 0.0083   Epoch: 14   Global Step: 237500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:53,564-Speed 9302.71 samples/sec   Loss 4.2739   LearningRate 0.0083   Epoch: 14   Global Step: 237510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:07:54,664-Speed 9321.00 samples/sec   Loss 4.3468   LearningRate 0.0083   Epoch: 14   Global Step: 237520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:55,723-Speed 9678.75 samples/sec   Loss 4.3973   LearningRate 0.0083   Epoch: 14   Global Step: 237530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:56,844-Speed 9135.42 samples/sec   Loss 4.3227   LearningRate 0.0083   Epoch: 14   Global Step: 237540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:57,994-Speed 8913.67 samples/sec   Loss 4.2970   LearningRate 0.0083   Epoch: 14   Global Step: 237550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:07:59,080-Speed 9431.53 samples/sec   Loss 4.3660   LearningRate 0.0083   Epoch: 14   Global Step: 237560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:00,155-Speed 9537.26 samples/sec   Loss 4.3273   LearningRate 0.0083   Epoch: 14   Global Step: 237570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:01,220-Speed 9617.10 samples/sec   Loss 4.2877   LearningRate 0.0083   Epoch: 14   Global Step: 237580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:02,299-Speed 9500.06 samples/sec   Loss 4.3332   LearningRate 0.0083   Epoch: 14   Global Step: 237590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:03,405-Speed 9262.82 samples/sec   Loss 4.1983   LearningRate 0.0083   Epoch: 14   Global Step: 237600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:04,470-Speed 9621.68 samples/sec   Loss 4.3807   LearningRate 0.0083   Epoch: 14   Global Step: 237610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:05,551-Speed 9478.01 samples/sec   Loss 4.3723   LearningRate 0.0083   Epoch: 14   Global Step: 237620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:06,598-Speed 9782.81 samples/sec   Loss 4.3508   LearningRate 0.0083   Epoch: 14   Global Step: 237630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:07,663-Speed 9619.69 samples/sec   Loss 4.3077   LearningRate 0.0083   Epoch: 14   Global Step: 237640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:08,766-Speed 9295.61 samples/sec   Loss 4.4144   LearningRate 0.0083   Epoch: 14   Global Step: 237650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:09,861-Speed 9352.16 samples/sec   Loss 4.3449   LearningRate 0.0083   Epoch: 14   Global Step: 237660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:10,957-Speed 9345.34 samples/sec   Loss 4.3293   LearningRate 0.0083   Epoch: 14   Global Step: 237670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:12,026-Speed 9588.01 samples/sec   Loss 4.3204   LearningRate 0.0083   Epoch: 14   Global Step: 237680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:13,114-Speed 9416.84 samples/sec   Loss 4.2823   LearningRate 0.0083   Epoch: 14   Global Step: 237690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:14,209-Speed 9353.96 samples/sec   Loss 4.3136   LearningRate 0.0083   Epoch: 14   Global Step: 237700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:15,309-Speed 9322.68 samples/sec   Loss 4.3633   LearningRate 0.0083   Epoch: 14   Global Step: 237710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:16,389-Speed 9485.07 samples/sec   Loss 4.3310   LearningRate 0.0083   Epoch: 14   Global Step: 237720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:17,475-Speed 9432.90 samples/sec   Loss 4.3783   LearningRate 0.0083   Epoch: 14   Global Step: 237730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:18,567-Speed 9378.70 samples/sec   Loss 4.3450   LearningRate 0.0083   Epoch: 14   Global Step: 237740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:19,655-Speed 9423.23 samples/sec   Loss 4.2856   LearningRate 0.0083   Epoch: 14   Global Step: 237750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:20,766-Speed 9223.10 samples/sec   Loss 4.3549   LearningRate 0.0083   Epoch: 14   Global Step: 237760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:21,845-Speed 9500.54 samples/sec   Loss 4.3031   LearningRate 0.0083   Epoch: 14   Global Step: 237770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:22,914-Speed 9578.66 samples/sec   Loss 4.3454   LearningRate 0.0083   Epoch: 14   Global Step: 237780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:24,001-Speed 9423.16 samples/sec   Loss 4.4091   LearningRate 0.0083   Epoch: 14   Global Step: 237790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:25,087-Speed 9437.74 samples/sec   Loss 4.3483   LearningRate 0.0083   Epoch: 14   Global Step: 237800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:26,211-Speed 9118.15 samples/sec   Loss 4.4691   LearningRate 0.0083   Epoch: 14   Global Step: 237810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:27,278-Speed 9602.30 samples/sec   Loss 4.2848   LearningRate 0.0083   Epoch: 14   Global Step: 237820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:28,363-Speed 9443.00 samples/sec   Loss 4.3389   LearningRate 0.0083   Epoch: 14   Global Step: 237830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:29,473-Speed 9234.47 samples/sec   Loss 4.3186   LearningRate 0.0083   Epoch: 14   Global Step: 237840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:30,613-Speed 8986.40 samples/sec   Loss 4.3814   LearningRate 0.0083   Epoch: 14   Global Step: 237850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:31,738-Speed 9107.66 samples/sec   Loss 4.3062   LearningRate 0.0083   Epoch: 14   Global Step: 237860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:32,820-Speed 9468.75 samples/sec   Loss 4.2526   LearningRate 0.0083   Epoch: 14   Global Step: 237870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:33,879-Speed 9675.29 samples/sec   Loss 4.4078   LearningRate 0.0083   Epoch: 14   Global Step: 237880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:34,971-Speed 9386.58 samples/sec   Loss 4.3087   LearningRate 0.0083   Epoch: 14   Global Step: 237890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:36,031-Speed 9662.94 samples/sec   Loss 4.3320   LearningRate 0.0083   Epoch: 14   Global Step: 237900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:37,058-Speed 9969.72 samples/sec   Loss 4.4039   LearningRate 0.0083   Epoch: 14   Global Step: 237910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:38,132-Speed 9545.00 samples/sec   Loss 4.3406   LearningRate 0.0083   Epoch: 14   Global Step: 237920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:39,185-Speed 9729.51 samples/sec   Loss 4.3420   LearningRate 0.0083   Epoch: 14   Global Step: 237930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:40,288-Speed 9291.39 samples/sec   Loss 4.2745   LearningRate 0.0082   Epoch: 14   Global Step: 237940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:41,353-Speed 9625.00 samples/sec   Loss 4.3610   LearningRate 0.0082   Epoch: 14   Global Step: 237950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:42,440-Speed 9426.00 samples/sec   Loss 4.3300   LearningRate 0.0082   Epoch: 14   Global Step: 237960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:43,519-Speed 9494.18 samples/sec   Loss 4.3077   LearningRate 0.0082   Epoch: 14   Global Step: 237970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:08:44,578-Speed 9676.45 samples/sec   Loss 4.4674   LearningRate 0.0082   Epoch: 14   Global Step: 237980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:45,642-Speed 9632.39 samples/sec   Loss 4.2571   LearningRate 0.0082   Epoch: 14   Global Step: 237990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:08:46,716-Speed 9531.79 samples/sec   Loss 4.3980   LearningRate 0.0082   Epoch: 14   Global Step: 238000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:09:08,874-[lfw][238000]XNorm: 7.709809
Training: 2022-04-11 21:09:08,875-[lfw][238000]Accuracy-Flip: 0.99550+-0.00289
Training: 2022-04-11 21:09:08,875-[lfw][238000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:09:34,458-[cfp_fp][238000]XNorm: 6.654449
Training: 2022-04-11 21:09:34,458-[cfp_fp][238000]Accuracy-Flip: 0.96886+-0.00717
Training: 2022-04-11 21:09:34,459-[cfp_fp][238000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:09:56,498-[agedb_30][238000]XNorm: 7.481106
Training: 2022-04-11 21:09:56,499-[agedb_30][238000]Accuracy-Flip: 0.96683+-0.00917
Training: 2022-04-11 21:09:56,499-[agedb_30][238000]Accuracy-Highest: 0.97250
Training: 2022-04-11 21:09:57,602-Speed 144.46 samples/sec   Loss 4.3896   LearningRate 0.0082   Epoch: 14   Global Step: 238010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:09:58,679-Speed 9516.05 samples/sec   Loss 4.3070   LearningRate 0.0082   Epoch: 14   Global Step: 238020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:09:59,737-Speed 9680.83 samples/sec   Loss 4.4323   LearningRate 0.0082   Epoch: 14   Global Step: 238030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:00,782-Speed 9810.25 samples/sec   Loss 4.3093   LearningRate 0.0082   Epoch: 14   Global Step: 238040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:01,872-Speed 9396.11 samples/sec   Loss 4.3391   LearningRate 0.0082   Epoch: 14   Global Step: 238050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:02,923-Speed 9747.76 samples/sec   Loss 4.4112   LearningRate 0.0082   Epoch: 14   Global Step: 238060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:04,020-Speed 9343.70 samples/sec   Loss 4.2898   LearningRate 0.0082   Epoch: 14   Global Step: 238070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:05,054-Speed 9902.42 samples/sec   Loss 4.3958   LearningRate 0.0082   Epoch: 14   Global Step: 238080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:06,163-Speed 9244.66 samples/sec   Loss 4.3192   LearningRate 0.0082   Epoch: 14   Global Step: 238090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:07,231-Speed 9591.86 samples/sec   Loss 4.3054   LearningRate 0.0082   Epoch: 14   Global Step: 238100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:08,286-Speed 9712.84 samples/sec   Loss 4.3437   LearningRate 0.0082   Epoch: 14   Global Step: 238110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:09,375-Speed 9410.76 samples/sec   Loss 4.3650   LearningRate 0.0082   Epoch: 14   Global Step: 238120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:10,462-Speed 9426.98 samples/sec   Loss 4.3426   LearningRate 0.0082   Epoch: 14   Global Step: 238130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:11,579-Speed 9170.87 samples/sec   Loss 4.4116   LearningRate 0.0082   Epoch: 14   Global Step: 238140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:12,717-Speed 8998.01 samples/sec   Loss 4.3496   LearningRate 0.0082   Epoch: 14   Global Step: 238150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:13,743-Speed 9985.08 samples/sec   Loss 4.3782   LearningRate 0.0082   Epoch: 14   Global Step: 238160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:14,789-Speed 9796.13 samples/sec   Loss 4.3748   LearningRate 0.0082   Epoch: 14   Global Step: 238170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:15,833-Speed 9818.46 samples/sec   Loss 4.3880   LearningRate 0.0082   Epoch: 14   Global Step: 238180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:16,906-Speed 9546.55 samples/sec   Loss 4.3571   LearningRate 0.0082   Epoch: 14   Global Step: 238190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:17,963-Speed 9692.94 samples/sec   Loss 4.3414   LearningRate 0.0082   Epoch: 14   Global Step: 238200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:19,073-Speed 9236.81 samples/sec   Loss 4.2968   LearningRate 0.0082   Epoch: 14   Global Step: 238210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:20,170-Speed 9333.67 samples/sec   Loss 4.4440   LearningRate 0.0082   Epoch: 14   Global Step: 238220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:21,273-Speed 9288.95 samples/sec   Loss 4.3909   LearningRate 0.0082   Epoch: 14   Global Step: 238230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:22,394-Speed 9137.86 samples/sec   Loss 4.4597   LearningRate 0.0082   Epoch: 14   Global Step: 238240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:23,456-Speed 9651.41 samples/sec   Loss 4.2913   LearningRate 0.0082   Epoch: 14   Global Step: 238250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:24,499-Speed 9821.15 samples/sec   Loss 4.3532   LearningRate 0.0082   Epoch: 14   Global Step: 238260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:25,597-Speed 9334.21 samples/sec   Loss 4.4050   LearningRate 0.0082   Epoch: 14   Global Step: 238270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:26,727-Speed 9069.81 samples/sec   Loss 4.3047   LearningRate 0.0082   Epoch: 14   Global Step: 238280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:27,839-Speed 9215.96 samples/sec   Loss 4.3141   LearningRate 0.0082   Epoch: 14   Global Step: 238290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:28,959-Speed 9147.06 samples/sec   Loss 4.3652   LearningRate 0.0082   Epoch: 14   Global Step: 238300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:30,043-Speed 9454.92 samples/sec   Loss 4.3134   LearningRate 0.0082   Epoch: 14   Global Step: 238310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:31,100-Speed 9692.39 samples/sec   Loss 4.3428   LearningRate 0.0082   Epoch: 14   Global Step: 238320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:32,162-Speed 9646.45 samples/sec   Loss 4.4426   LearningRate 0.0082   Epoch: 14   Global Step: 238330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:33,259-Speed 9339.49 samples/sec   Loss 4.4246   LearningRate 0.0082   Epoch: 14   Global Step: 238340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:34,284-Speed 9995.37 samples/sec   Loss 4.3714   LearningRate 0.0082   Epoch: 14   Global Step: 238350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:35,369-Speed 9445.63 samples/sec   Loss 4.3376   LearningRate 0.0082   Epoch: 14   Global Step: 238360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:36,430-Speed 9656.08 samples/sec   Loss 4.3487   LearningRate 0.0082   Epoch: 14   Global Step: 238370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:37,524-Speed 9366.04 samples/sec   Loss 4.3257   LearningRate 0.0082   Epoch: 14   Global Step: 238380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:38,606-Speed 9467.54 samples/sec   Loss 4.3428   LearningRate 0.0082   Epoch: 14   Global Step: 238390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:39,716-Speed 9233.91 samples/sec   Loss 4.2963   LearningRate 0.0082   Epoch: 14   Global Step: 238400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:40,800-Speed 9448.61 samples/sec   Loss 4.3185   LearningRate 0.0082   Epoch: 14   Global Step: 238410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:41,875-Speed 9530.50 samples/sec   Loss 4.3701   LearningRate 0.0082   Epoch: 14   Global Step: 238420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:42,988-Speed 9206.70 samples/sec   Loss 4.3282   LearningRate 0.0082   Epoch: 14   Global Step: 238430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:44,082-Speed 9365.12 samples/sec   Loss 4.3436   LearningRate 0.0082   Epoch: 14   Global Step: 238440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:45,191-Speed 9244.90 samples/sec   Loss 4.4660   LearningRate 0.0082   Epoch: 14   Global Step: 238450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:46,271-Speed 9488.66 samples/sec   Loss 4.3441   LearningRate 0.0082   Epoch: 14   Global Step: 238460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:47,354-Speed 9464.93 samples/sec   Loss 4.3221   LearningRate 0.0082   Epoch: 14   Global Step: 238470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:10:48,418-Speed 9623.52 samples/sec   Loss 4.4655   LearningRate 0.0082   Epoch: 14   Global Step: 238480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:49,448-Speed 9949.15 samples/sec   Loss 4.3802   LearningRate 0.0082   Epoch: 14   Global Step: 238490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:50,513-Speed 9621.29 samples/sec   Loss 4.3806   LearningRate 0.0082   Epoch: 14   Global Step: 238500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:51,586-Speed 9544.12 samples/sec   Loss 4.2902   LearningRate 0.0082   Epoch: 14   Global Step: 238510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:52,701-Speed 9189.98 samples/sec   Loss 4.4443   LearningRate 0.0082   Epoch: 14   Global Step: 238520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:53,808-Speed 9258.52 samples/sec   Loss 4.3639   LearningRate 0.0081   Epoch: 14   Global Step: 238530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:54,867-Speed 9669.02 samples/sec   Loss 4.4810   LearningRate 0.0081   Epoch: 14   Global Step: 238540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:55,943-Speed 9521.88 samples/sec   Loss 4.4228   LearningRate 0.0081   Epoch: 14   Global Step: 238550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:57,031-Speed 9418.86 samples/sec   Loss 4.2909   LearningRate 0.0081   Epoch: 14   Global Step: 238560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:58,101-Speed 9582.48 samples/sec   Loss 4.3545   LearningRate 0.0081   Epoch: 14   Global Step: 238570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:10:59,192-Speed 9394.45 samples/sec   Loss 4.4234   LearningRate 0.0081   Epoch: 14   Global Step: 238580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:00,235-Speed 9822.65 samples/sec   Loss 4.3587   LearningRate 0.0081   Epoch: 14   Global Step: 238590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:01,301-Speed 9605.14 samples/sec   Loss 4.3228   LearningRate 0.0081   Epoch: 14   Global Step: 238600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:02,368-Speed 9602.18 samples/sec   Loss 4.2986   LearningRate 0.0081   Epoch: 14   Global Step: 238610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:03,444-Speed 9527.06 samples/sec   Loss 4.3631   LearningRate 0.0081   Epoch: 14   Global Step: 238620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:04,504-Speed 9662.01 samples/sec   Loss 4.2946   LearningRate 0.0081   Epoch: 14   Global Step: 238630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:05,580-Speed 9527.00 samples/sec   Loss 4.3719   LearningRate 0.0081   Epoch: 14   Global Step: 238640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:06,689-Speed 9239.51 samples/sec   Loss 4.3756   LearningRate 0.0081   Epoch: 14   Global Step: 238650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:07,781-Speed 9382.75 samples/sec   Loss 4.3506   LearningRate 0.0081   Epoch: 14   Global Step: 238660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:08,887-Speed 9268.75 samples/sec   Loss 4.3098   LearningRate 0.0081   Epoch: 14   Global Step: 238670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:09,962-Speed 9528.98 samples/sec   Loss 4.4306   LearningRate 0.0081   Epoch: 14   Global Step: 238680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:11,061-Speed 9325.90 samples/sec   Loss 4.4245   LearningRate 0.0081   Epoch: 14   Global Step: 238690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:12,107-Speed 9789.98 samples/sec   Loss 4.3196   LearningRate 0.0081   Epoch: 14   Global Step: 238700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:13,144-Speed 9880.52 samples/sec   Loss 4.4561   LearningRate 0.0081   Epoch: 14   Global Step: 238710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:14,239-Speed 9361.87 samples/sec   Loss 4.4685   LearningRate 0.0081   Epoch: 14   Global Step: 238720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:15,330-Speed 9398.89 samples/sec   Loss 4.3719   LearningRate 0.0081   Epoch: 14   Global Step: 238730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:16,481-Speed 8899.73 samples/sec   Loss 4.3858   LearningRate 0.0081   Epoch: 14   Global Step: 238740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:17,614-Speed 9045.67 samples/sec   Loss 4.3853   LearningRate 0.0081   Epoch: 14   Global Step: 238750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:18,741-Speed 9085.18 samples/sec   Loss 4.3647   LearningRate 0.0081   Epoch: 14   Global Step: 238760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:19,862-Speed 9140.52 samples/sec   Loss 4.3972   LearningRate 0.0081   Epoch: 14   Global Step: 238770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:20,930-Speed 9594.53 samples/sec   Loss 4.3847   LearningRate 0.0081   Epoch: 14   Global Step: 238780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:21,976-Speed 9789.95 samples/sec   Loss 4.3362   LearningRate 0.0081   Epoch: 14   Global Step: 238790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:23,046-Speed 9575.58 samples/sec   Loss 4.3859   LearningRate 0.0081   Epoch: 14   Global Step: 238800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:24,087-Speed 9844.84 samples/sec   Loss 4.4425   LearningRate 0.0081   Epoch: 14   Global Step: 238810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:25,135-Speed 9785.62 samples/sec   Loss 4.3657   LearningRate 0.0081   Epoch: 14   Global Step: 238820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:26,176-Speed 9838.97 samples/sec   Loss 4.3707   LearningRate 0.0081   Epoch: 14   Global Step: 238830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:27,258-Speed 9468.06 samples/sec   Loss 4.2933   LearningRate 0.0081   Epoch: 14   Global Step: 238840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:28,348-Speed 9403.26 samples/sec   Loss 4.4253   LearningRate 0.0081   Epoch: 14   Global Step: 238850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:29,407-Speed 9672.86 samples/sec   Loss 4.3683   LearningRate 0.0081   Epoch: 14   Global Step: 238860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:30,502-Speed 9360.30 samples/sec   Loss 4.3323   LearningRate 0.0081   Epoch: 14   Global Step: 238870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:31,593-Speed 9392.52 samples/sec   Loss 4.3862   LearningRate 0.0081   Epoch: 14   Global Step: 238880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:32,681-Speed 9410.97 samples/sec   Loss 4.3707   LearningRate 0.0081   Epoch: 14   Global Step: 238890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:33,767-Speed 9434.22 samples/sec   Loss 4.3819   LearningRate 0.0081   Epoch: 14   Global Step: 238900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:34,816-Speed 9772.67 samples/sec   Loss 4.3584   LearningRate 0.0081   Epoch: 14   Global Step: 238910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:35,892-Speed 9519.84 samples/sec   Loss 4.3989   LearningRate 0.0081   Epoch: 14   Global Step: 238920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:36,959-Speed 9603.14 samples/sec   Loss 4.3682   LearningRate 0.0081   Epoch: 14   Global Step: 238930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:38,088-Speed 9076.71 samples/sec   Loss 4.4413   LearningRate 0.0081   Epoch: 14   Global Step: 238940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:39,126-Speed 9875.81 samples/sec   Loss 4.4161   LearningRate 0.0081   Epoch: 14   Global Step: 238950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:40,193-Speed 9599.76 samples/sec   Loss 4.3902   LearningRate 0.0081   Epoch: 14   Global Step: 238960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:41,271-Speed 9501.18 samples/sec   Loss 4.4544   LearningRate 0.0081   Epoch: 14   Global Step: 238970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:42,327-Speed 9703.57 samples/sec   Loss 4.3959   LearningRate 0.0081   Epoch: 14   Global Step: 238980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:43,416-Speed 9409.59 samples/sec   Loss 4.3381   LearningRate 0.0081   Epoch: 14   Global Step: 238990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:44,438-Speed 10031.10 samples/sec   Loss 4.3187   LearningRate 0.0081   Epoch: 14   Global Step: 239000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:45,475-Speed 9887.91 samples/sec   Loss 4.5339   LearningRate 0.0081   Epoch: 14   Global Step: 239010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:46,574-Speed 9318.01 samples/sec   Loss 4.4027   LearningRate 0.0081   Epoch: 14   Global Step: 239020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:47,673-Speed 9320.53 samples/sec   Loss 4.4370   LearningRate 0.0081   Epoch: 14   Global Step: 239030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:48,729-Speed 9701.29 samples/sec   Loss 4.4218   LearningRate 0.0081   Epoch: 14   Global Step: 239040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:49,802-Speed 9556.32 samples/sec   Loss 4.3769   LearningRate 0.0081   Epoch: 14   Global Step: 239050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:50,857-Speed 9702.70 samples/sec   Loss 4.4152   LearningRate 0.0081   Epoch: 14   Global Step: 239060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:51,935-Speed 9507.82 samples/sec   Loss 4.4076   LearningRate 0.0081   Epoch: 14   Global Step: 239070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:53,046-Speed 9223.37 samples/sec   Loss 4.4709   LearningRate 0.0081   Epoch: 14   Global Step: 239080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:54,152-Speed 9261.49 samples/sec   Loss 4.3595   LearningRate 0.0081   Epoch: 14   Global Step: 239090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:55,241-Speed 9408.75 samples/sec   Loss 4.4028   LearningRate 0.0081   Epoch: 14   Global Step: 239100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:11:56,372-Speed 9061.68 samples/sec   Loss 4.4015   LearningRate 0.0080   Epoch: 14   Global Step: 239110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:57,439-Speed 9598.65 samples/sec   Loss 4.3798   LearningRate 0.0080   Epoch: 14   Global Step: 239120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:58,537-Speed 9331.88 samples/sec   Loss 4.4133   LearningRate 0.0080   Epoch: 14   Global Step: 239130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:11:59,649-Speed 9217.97 samples/sec   Loss 4.3507   LearningRate 0.0080   Epoch: 14   Global Step: 239140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:00,745-Speed 9352.06 samples/sec   Loss 4.3725   LearningRate 0.0080   Epoch: 14   Global Step: 239150   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:01,849-Speed 9286.39 samples/sec   Loss 4.4053   LearningRate 0.0080   Epoch: 14   Global Step: 239160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:02,965-Speed 9175.62 samples/sec   Loss 4.3392   LearningRate 0.0080   Epoch: 14   Global Step: 239170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:04,012-Speed 9785.49 samples/sec   Loss 4.2991   LearningRate 0.0080   Epoch: 14   Global Step: 239180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:05,132-Speed 9150.02 samples/sec   Loss 4.4174   LearningRate 0.0080   Epoch: 14   Global Step: 239190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:06,226-Speed 9361.22 samples/sec   Loss 4.4170   LearningRate 0.0080   Epoch: 14   Global Step: 239200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:07,286-Speed 9669.11 samples/sec   Loss 4.4449   LearningRate 0.0080   Epoch: 14   Global Step: 239210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:08,340-Speed 9720.84 samples/sec   Loss 4.2810   LearningRate 0.0080   Epoch: 14   Global Step: 239220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:09,440-Speed 9321.40 samples/sec   Loss 4.3206   LearningRate 0.0080   Epoch: 14   Global Step: 239230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:10,512-Speed 9552.42 samples/sec   Loss 4.3917   LearningRate 0.0080   Epoch: 14   Global Step: 239240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:11,608-Speed 9350.60 samples/sec   Loss 4.3882   LearningRate 0.0080   Epoch: 14   Global Step: 239250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:12,724-Speed 9175.51 samples/sec   Loss 4.4650   LearningRate 0.0080   Epoch: 14   Global Step: 239260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:13,885-Speed 8824.02 samples/sec   Loss 4.4090   LearningRate 0.0080   Epoch: 14   Global Step: 239270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:14,970-Speed 9446.00 samples/sec   Loss 4.3624   LearningRate 0.0080   Epoch: 14   Global Step: 239280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:16,059-Speed 9416.89 samples/sec   Loss 4.4838   LearningRate 0.0080   Epoch: 14   Global Step: 239290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:17,182-Speed 9118.71 samples/sec   Loss 4.3890   LearningRate 0.0080   Epoch: 14   Global Step: 239300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:18,242-Speed 9668.81 samples/sec   Loss 4.4584   LearningRate 0.0080   Epoch: 14   Global Step: 239310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:19,315-Speed 9549.03 samples/sec   Loss 4.4485   LearningRate 0.0080   Epoch: 14   Global Step: 239320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:12:20,396-Speed 9475.59 samples/sec   Loss 4.3997   LearningRate 0.0080   Epoch: 14   Global Step: 239330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:21,504-Speed 9252.67 samples/sec   Loss 4.3949   LearningRate 0.0080   Epoch: 14   Global Step: 239340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:22,588-Speed 9449.88 samples/sec   Loss 4.3288   LearningRate 0.0080   Epoch: 14   Global Step: 239350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:23,667-Speed 9498.47 samples/sec   Loss 4.4191   LearningRate 0.0080   Epoch: 14   Global Step: 239360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:24,741-Speed 9541.90 samples/sec   Loss 4.4084   LearningRate 0.0080   Epoch: 14   Global Step: 239370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:25,805-Speed 9624.98 samples/sec   Loss 4.3690   LearningRate 0.0080   Epoch: 14   Global Step: 239380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:26,890-Speed 9440.24 samples/sec   Loss 4.4262   LearningRate 0.0080   Epoch: 14   Global Step: 239390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:27,959-Speed 9590.81 samples/sec   Loss 4.3868   LearningRate 0.0080   Epoch: 14   Global Step: 239400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:29,067-Speed 9253.07 samples/sec   Loss 4.3703   LearningRate 0.0080   Epoch: 14   Global Step: 239410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:30,143-Speed 9522.85 samples/sec   Loss 4.3846   LearningRate 0.0080   Epoch: 14   Global Step: 239420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:31,244-Speed 9301.41 samples/sec   Loss 4.3794   LearningRate 0.0080   Epoch: 14   Global Step: 239430   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 21:12:32,321-Speed 9517.05 samples/sec   Loss 4.3358   LearningRate 0.0080   Epoch: 14   Global Step: 239440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:33,409-Speed 9415.55 samples/sec   Loss 4.4177   LearningRate 0.0080   Epoch: 14   Global Step: 239450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:34,460-Speed 9747.88 samples/sec   Loss 4.2885   LearningRate 0.0080   Epoch: 14   Global Step: 239460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:35,491-Speed 9942.36 samples/sec   Loss 4.3566   LearningRate 0.0080   Epoch: 14   Global Step: 239470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:36,576-Speed 9442.03 samples/sec   Loss 4.4360   LearningRate 0.0080   Epoch: 14   Global Step: 239480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:37,666-Speed 9400.01 samples/sec   Loss 4.3280   LearningRate 0.0080   Epoch: 14   Global Step: 239490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:38,752-Speed 9428.75 samples/sec   Loss 4.3833   LearningRate 0.0080   Epoch: 14   Global Step: 239500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:39,810-Speed 9692.76 samples/sec   Loss 4.4123   LearningRate 0.0080   Epoch: 14   Global Step: 239510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:40,861-Speed 9750.29 samples/sec   Loss 4.4040   LearningRate 0.0080   Epoch: 14   Global Step: 239520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:41,892-Speed 9940.36 samples/sec   Loss 4.4343   LearningRate 0.0080   Epoch: 14   Global Step: 239530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:42,949-Speed 9693.78 samples/sec   Loss 4.4104   LearningRate 0.0080   Epoch: 14   Global Step: 239540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:44,073-Speed 9113.66 samples/sec   Loss 4.4328   LearningRate 0.0080   Epoch: 14   Global Step: 239550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:45,130-Speed 9693.70 samples/sec   Loss 4.3957   LearningRate 0.0080   Epoch: 14   Global Step: 239560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:46,184-Speed 9724.02 samples/sec   Loss 4.4370   LearningRate 0.0080   Epoch: 14   Global Step: 239570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:47,284-Speed 9307.73 samples/sec   Loss 4.4288   LearningRate 0.0080   Epoch: 14   Global Step: 239580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:48,370-Speed 9435.35 samples/sec   Loss 4.4642   LearningRate 0.0080   Epoch: 14   Global Step: 239590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:49,471-Speed 9307.29 samples/sec   Loss 4.5233   LearningRate 0.0080   Epoch: 14   Global Step: 239600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:50,552-Speed 9482.14 samples/sec   Loss 4.3964   LearningRate 0.0080   Epoch: 14   Global Step: 239610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:51,635-Speed 9461.41 samples/sec   Loss 4.4084   LearningRate 0.0080   Epoch: 14   Global Step: 239620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:52,714-Speed 9488.58 samples/sec   Loss 4.3873   LearningRate 0.0080   Epoch: 14   Global Step: 239630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:53,833-Speed 9161.14 samples/sec   Loss 4.4924   LearningRate 0.0080   Epoch: 14   Global Step: 239640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:54,891-Speed 9683.43 samples/sec   Loss 4.3350   LearningRate 0.0080   Epoch: 14   Global Step: 239650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:55,969-Speed 9504.35 samples/sec   Loss 4.3272   LearningRate 0.0080   Epoch: 14   Global Step: 239660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:57,092-Speed 9126.99 samples/sec   Loss 4.3814   LearningRate 0.0080   Epoch: 14   Global Step: 239670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:58,213-Speed 9140.96 samples/sec   Loss 4.3724   LearningRate 0.0080   Epoch: 14   Global Step: 239680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:12:59,286-Speed 9550.33 samples/sec   Loss 4.3751   LearningRate 0.0080   Epoch: 14   Global Step: 239690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:00,351-Speed 9618.21 samples/sec   Loss 4.4377   LearningRate 0.0079   Epoch: 14   Global Step: 239700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:01,403-Speed 9741.29 samples/sec   Loss 4.4261   LearningRate 0.0079   Epoch: 14   Global Step: 239710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:02,522-Speed 9157.75 samples/sec   Loss 4.3430   LearningRate 0.0079   Epoch: 14   Global Step: 239720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:03,587-Speed 9623.36 samples/sec   Loss 4.4476   LearningRate 0.0079   Epoch: 14   Global Step: 239730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:04,648-Speed 9659.19 samples/sec   Loss 4.3622   LearningRate 0.0079   Epoch: 14   Global Step: 239740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:05,734-Speed 9428.64 samples/sec   Loss 4.3819   LearningRate 0.0079   Epoch: 14   Global Step: 239750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:06,813-Speed 9495.51 samples/sec   Loss 4.3986   LearningRate 0.0079   Epoch: 14   Global Step: 239760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:07,872-Speed 9679.70 samples/sec   Loss 4.3710   LearningRate 0.0079   Epoch: 14   Global Step: 239770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:08,920-Speed 9781.70 samples/sec   Loss 4.2489   LearningRate 0.0079   Epoch: 14   Global Step: 239780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:09,990-Speed 9573.63 samples/sec   Loss 4.4726   LearningRate 0.0079   Epoch: 14   Global Step: 239790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:11,070-Speed 9486.55 samples/sec   Loss 4.3418   LearningRate 0.0079   Epoch: 14   Global Step: 239800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:12,159-Speed 9404.53 samples/sec   Loss 4.3850   LearningRate 0.0079   Epoch: 14   Global Step: 239810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:13,268-Speed 9239.64 samples/sec   Loss 4.3414   LearningRate 0.0079   Epoch: 14   Global Step: 239820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:14,352-Speed 9458.03 samples/sec   Loss 4.4597   LearningRate 0.0079   Epoch: 14   Global Step: 239830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:15,468-Speed 9179.34 samples/sec   Loss 4.3275   LearningRate 0.0079   Epoch: 14   Global Step: 239840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:16,581-Speed 9209.55 samples/sec   Loss 4.4138   LearningRate 0.0079   Epoch: 14   Global Step: 239850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:17,653-Speed 9562.86 samples/sec   Loss 4.4813   LearningRate 0.0079   Epoch: 14   Global Step: 239860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:18,727-Speed 9537.79 samples/sec   Loss 4.4854   LearningRate 0.0079   Epoch: 14   Global Step: 239870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:19,802-Speed 9528.93 samples/sec   Loss 4.3414   LearningRate 0.0079   Epoch: 14   Global Step: 239880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:20,904-Speed 9302.05 samples/sec   Loss 4.4257   LearningRate 0.0079   Epoch: 14   Global Step: 239890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:22,031-Speed 9087.93 samples/sec   Loss 4.3829   LearningRate 0.0079   Epoch: 14   Global Step: 239900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:23,115-Speed 9450.48 samples/sec   Loss 4.4059   LearningRate 0.0079   Epoch: 14   Global Step: 239910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:24,230-Speed 9190.76 samples/sec   Loss 4.4383   LearningRate 0.0079   Epoch: 14   Global Step: 239920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:25,308-Speed 9502.10 samples/sec   Loss 4.4436   LearningRate 0.0079   Epoch: 14   Global Step: 239930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:26,397-Speed 9411.41 samples/sec   Loss 4.3556   LearningRate 0.0079   Epoch: 14   Global Step: 239940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:27,460-Speed 9644.84 samples/sec   Loss 4.2976   LearningRate 0.0079   Epoch: 14   Global Step: 239950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:28,560-Speed 9315.03 samples/sec   Loss 4.4591   LearningRate 0.0079   Epoch: 14   Global Step: 239960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:29,645-Speed 9435.25 samples/sec   Loss 4.4249   LearningRate 0.0079   Epoch: 14   Global Step: 239970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:13:30,733-Speed 9424.23 samples/sec   Loss 4.4070   LearningRate 0.0079   Epoch: 14   Global Step: 239980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:31,792-Speed 9674.62 samples/sec   Loss 4.4104   LearningRate 0.0079   Epoch: 14   Global Step: 239990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:32,861-Speed 9578.68 samples/sec   Loss 4.3986   LearningRate 0.0079   Epoch: 14   Global Step: 240000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:13:54,594-[lfw][240000]XNorm: 7.469457
Training: 2022-04-11 21:13:54,594-[lfw][240000]Accuracy-Flip: 0.99683+-0.00302
Training: 2022-04-11 21:13:54,595-[lfw][240000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:14:19,743-[cfp_fp][240000]XNorm: 6.468526
Training: 2022-04-11 21:14:19,744-[cfp_fp][240000]Accuracy-Flip: 0.96857+-0.01026
Training: 2022-04-11 21:14:19,744-[cfp_fp][240000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:14:41,446-[agedb_30][240000]XNorm: 7.240392
Training: 2022-04-11 21:14:41,447-[agedb_30][240000]Accuracy-Flip: 0.97150+-0.00908
Training: 2022-04-11 21:14:41,447-[agedb_30][240000]Accuracy-Highest: 0.97250
Training: 2022-04-11 21:14:42,551-Speed 146.94 samples/sec   Loss 4.4464   LearningRate 0.0079   Epoch: 14   Global Step: 240010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:14:43,592-Speed 9845.75 samples/sec   Loss 4.3968   LearningRate 0.0079   Epoch: 14   Global Step: 240020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:14:44,648-Speed 9705.52 samples/sec   Loss 4.3953   LearningRate 0.0079   Epoch: 14   Global Step: 240030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:14:45,765-Speed 9170.46 samples/sec   Loss 4.4535   LearningRate 0.0079   Epoch: 14   Global Step: 240040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:14:46,819-Speed 9716.07 samples/sec   Loss 4.2955   LearningRate 0.0079   Epoch: 14   Global Step: 240050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:14:47,970-Speed 8905.55 samples/sec   Loss 4.3955   LearningRate 0.0079   Epoch: 14   Global Step: 240060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:14:49,079-Speed 9241.17 samples/sec   Loss 4.4369   LearningRate 0.0079   Epoch: 14   Global Step: 240070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:14:50,161-Speed 9466.63 samples/sec   Loss 4.3814   LearningRate 0.0079   Epoch: 14   Global Step: 240080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:14:51,244-Speed 9463.64 samples/sec   Loss 4.3827   LearningRate 0.0079   Epoch: 14   Global Step: 240090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:14:52,333-Speed 9403.82 samples/sec   Loss 4.3772   LearningRate 0.0079   Epoch: 14   Global Step: 240100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:14:53,414-Speed 9479.24 samples/sec   Loss 4.4917   LearningRate 0.0079   Epoch: 14   Global Step: 240110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:14:54,498-Speed 9456.21 samples/sec   Loss 4.3811   LearningRate 0.0079   Epoch: 14   Global Step: 240120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:14:55,566-Speed 9595.90 samples/sec   Loss 4.3743   LearningRate 0.0079   Epoch: 14   Global Step: 240130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:14:56,614-Speed 9775.92 samples/sec   Loss 4.4033   LearningRate 0.0079   Epoch: 14   Global Step: 240140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:14:57,685-Speed 9566.26 samples/sec   Loss 4.3900   LearningRate 0.0079   Epoch: 14   Global Step: 240150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:14:58,747-Speed 9647.19 samples/sec   Loss 4.3973   LearningRate 0.0079   Epoch: 14   Global Step: 240160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:14:59,841-Speed 9360.56 samples/sec   Loss 4.3358   LearningRate 0.0079   Epoch: 14   Global Step: 240170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:00,913-Speed 9565.90 samples/sec   Loss 4.3874   LearningRate 0.0079   Epoch: 14   Global Step: 240180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:01,955-Speed 9828.72 samples/sec   Loss 4.4206   LearningRate 0.0079   Epoch: 14   Global Step: 240190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:03,020-Speed 9616.76 samples/sec   Loss 4.4025   LearningRate 0.0079   Epoch: 14   Global Step: 240200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:04,082-Speed 9653.11 samples/sec   Loss 4.4447   LearningRate 0.0079   Epoch: 14   Global Step: 240210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:05,169-Speed 9425.89 samples/sec   Loss 4.4345   LearningRate 0.0079   Epoch: 14   Global Step: 240220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:06,238-Speed 9585.74 samples/sec   Loss 4.5363   LearningRate 0.0079   Epoch: 14   Global Step: 240230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:07,289-Speed 9752.03 samples/sec   Loss 4.4560   LearningRate 0.0079   Epoch: 14   Global Step: 240240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:08,381-Speed 9377.23 samples/sec   Loss 4.3559   LearningRate 0.0079   Epoch: 14   Global Step: 240250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:09,488-Speed 9259.80 samples/sec   Loss 4.4575   LearningRate 0.0079   Epoch: 14   Global Step: 240260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:10,565-Speed 9511.46 samples/sec   Loss 4.4346   LearningRate 0.0079   Epoch: 14   Global Step: 240270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:11,675-Speed 9226.25 samples/sec   Loss 4.3247   LearningRate 0.0079   Epoch: 14   Global Step: 240280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:12,785-Speed 9233.67 samples/sec   Loss 4.3593   LearningRate 0.0079   Epoch: 14   Global Step: 240290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:13,865-Speed 9492.83 samples/sec   Loss 4.3966   LearningRate 0.0078   Epoch: 14   Global Step: 240300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:14,948-Speed 9460.10 samples/sec   Loss 4.4378   LearningRate 0.0078   Epoch: 14   Global Step: 240310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:16,029-Speed 9479.67 samples/sec   Loss 4.4614   LearningRate 0.0078   Epoch: 14   Global Step: 240320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:17,117-Speed 9413.02 samples/sec   Loss 4.4286   LearningRate 0.0078   Epoch: 14   Global Step: 240330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:18,230-Speed 9208.01 samples/sec   Loss 4.4156   LearningRate 0.0078   Epoch: 14   Global Step: 240340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:19,277-Speed 9781.71 samples/sec   Loss 4.3468   LearningRate 0.0078   Epoch: 14   Global Step: 240350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:20,370-Speed 9370.85 samples/sec   Loss 4.3971   LearningRate 0.0078   Epoch: 14   Global Step: 240360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:21,484-Speed 9204.64 samples/sec   Loss 4.3696   LearningRate 0.0078   Epoch: 14   Global Step: 240370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:22,591-Speed 9253.44 samples/sec   Loss 4.3955   LearningRate 0.0078   Epoch: 14   Global Step: 240380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:23,681-Speed 9397.45 samples/sec   Loss 4.4430   LearningRate 0.0078   Epoch: 14   Global Step: 240390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:24,790-Speed 9242.22 samples/sec   Loss 4.3475   LearningRate 0.0078   Epoch: 14   Global Step: 240400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:25,876-Speed 9434.42 samples/sec   Loss 4.3776   LearningRate 0.0078   Epoch: 14   Global Step: 240410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:26,947-Speed 9569.05 samples/sec   Loss 4.3954   LearningRate 0.0078   Epoch: 14   Global Step: 240420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:28,013-Speed 9610.72 samples/sec   Loss 4.3566   LearningRate 0.0078   Epoch: 14   Global Step: 240430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:29,085-Speed 9562.68 samples/sec   Loss 4.4958   LearningRate 0.0078   Epoch: 14   Global Step: 240440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:30,163-Speed 9506.73 samples/sec   Loss 4.4525   LearningRate 0.0078   Epoch: 14   Global Step: 240450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:31,225-Speed 9644.02 samples/sec   Loss 4.4108   LearningRate 0.0078   Epoch: 14   Global Step: 240460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:32,279-Speed 9728.18 samples/sec   Loss 4.4623   LearningRate 0.0078   Epoch: 14   Global Step: 240470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:33,371-Speed 9375.68 samples/sec   Loss 4.4200   LearningRate 0.0078   Epoch: 14   Global Step: 240480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:34,469-Speed 9337.34 samples/sec   Loss 4.4459   LearningRate 0.0078   Epoch: 14   Global Step: 240490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:35,560-Speed 9386.46 samples/sec   Loss 4.3748   LearningRate 0.0078   Epoch: 14   Global Step: 240500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:36,645-Speed 9447.90 samples/sec   Loss 4.4820   LearningRate 0.0078   Epoch: 14   Global Step: 240510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:37,687-Speed 9825.71 samples/sec   Loss 4.4329   LearningRate 0.0078   Epoch: 14   Global Step: 240520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:38,752-Speed 9623.20 samples/sec   Loss 4.3559   LearningRate 0.0078   Epoch: 14   Global Step: 240530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:39,781-Speed 9955.74 samples/sec   Loss 4.4862   LearningRate 0.0078   Epoch: 14   Global Step: 240540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:40,895-Speed 9194.37 samples/sec   Loss 4.3281   LearningRate 0.0078   Epoch: 14   Global Step: 240550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:41,970-Speed 9536.09 samples/sec   Loss 4.4658   LearningRate 0.0078   Epoch: 14   Global Step: 240560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:43,079-Speed 9235.22 samples/sec   Loss 4.3843   LearningRate 0.0078   Epoch: 14   Global Step: 240570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:44,189-Speed 9237.28 samples/sec   Loss 4.3764   LearningRate 0.0078   Epoch: 14   Global Step: 240580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:45,237-Speed 9774.40 samples/sec   Loss 4.4216   LearningRate 0.0078   Epoch: 14   Global Step: 240590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:46,312-Speed 9530.44 samples/sec   Loss 4.4194   LearningRate 0.0078   Epoch: 14   Global Step: 240600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:47,398-Speed 9435.39 samples/sec   Loss 4.4229   LearningRate 0.0078   Epoch: 14   Global Step: 240610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:48,489-Speed 9397.43 samples/sec   Loss 4.4400   LearningRate 0.0078   Epoch: 14   Global Step: 240620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:49,579-Speed 9396.56 samples/sec   Loss 4.5387   LearningRate 0.0078   Epoch: 14   Global Step: 240630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:50,633-Speed 9725.02 samples/sec   Loss 4.4796   LearningRate 0.0078   Epoch: 14   Global Step: 240640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:51,713-Speed 9479.69 samples/sec   Loss 4.4156   LearningRate 0.0078   Epoch: 14   Global Step: 240650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:52,823-Speed 9235.72 samples/sec   Loss 4.3549   LearningRate 0.0078   Epoch: 14   Global Step: 240660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:53,949-Speed 9097.80 samples/sec   Loss 4.3972   LearningRate 0.0078   Epoch: 14   Global Step: 240670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:55,039-Speed 9404.26 samples/sec   Loss 4.4203   LearningRate 0.0078   Epoch: 14   Global Step: 240680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:56,138-Speed 9317.56 samples/sec   Loss 4.3970   LearningRate 0.0078   Epoch: 14   Global Step: 240690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:57,219-Speed 9480.43 samples/sec   Loss 4.3118   LearningRate 0.0078   Epoch: 14   Global Step: 240700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:15:58,301-Speed 9471.41 samples/sec   Loss 4.3718   LearningRate 0.0078   Epoch: 14   Global Step: 240710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:15:59,366-Speed 9622.93 samples/sec   Loss 4.3719   LearningRate 0.0078   Epoch: 14   Global Step: 240720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:00,472-Speed 9259.67 samples/sec   Loss 4.3434   LearningRate 0.0078   Epoch: 14   Global Step: 240730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:01,566-Speed 9370.82 samples/sec   Loss 4.3632   LearningRate 0.0078   Epoch: 14   Global Step: 240740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:02,648-Speed 9467.21 samples/sec   Loss 4.4008   LearningRate 0.0078   Epoch: 14   Global Step: 240750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:03,743-Speed 9358.50 samples/sec   Loss 4.3918   LearningRate 0.0078   Epoch: 14   Global Step: 240760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:04,812-Speed 9588.59 samples/sec   Loss 4.4019   LearningRate 0.0078   Epoch: 14   Global Step: 240770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:05,886-Speed 9537.02 samples/sec   Loss 4.4272   LearningRate 0.0078   Epoch: 14   Global Step: 240780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:06,964-Speed 9500.42 samples/sec   Loss 4.4069   LearningRate 0.0078   Epoch: 14   Global Step: 240790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:07,999-Speed 9908.11 samples/sec   Loss 4.3993   LearningRate 0.0078   Epoch: 14   Global Step: 240800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:09,048-Speed 9762.97 samples/sec   Loss 4.4828   LearningRate 0.0078   Epoch: 14   Global Step: 240810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:10,115-Speed 9599.99 samples/sec   Loss 4.3806   LearningRate 0.0078   Epoch: 14   Global Step: 240820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:11,144-Speed 9959.77 samples/sec   Loss 4.4035   LearningRate 0.0078   Epoch: 14   Global Step: 240830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:12,281-Speed 9013.37 samples/sec   Loss 4.4129   LearningRate 0.0078   Epoch: 14   Global Step: 240840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:13,364-Speed 9452.35 samples/sec   Loss 4.5425   LearningRate 0.0078   Epoch: 14   Global Step: 240850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:14,459-Speed 9368.47 samples/sec   Loss 4.4268   LearningRate 0.0078   Epoch: 14   Global Step: 240860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:15,490-Speed 9930.32 samples/sec   Loss 4.3691   LearningRate 0.0078   Epoch: 14   Global Step: 240870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:16,579-Speed 9407.18 samples/sec   Loss 4.4160   LearningRate 0.0078   Epoch: 14   Global Step: 240880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:17,688-Speed 9246.33 samples/sec   Loss 4.4275   LearningRate 0.0077   Epoch: 14   Global Step: 240890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:18,818-Speed 9066.05 samples/sec   Loss 4.4320   LearningRate 0.0077   Epoch: 14   Global Step: 240900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:19,884-Speed 9604.70 samples/sec   Loss 4.3921   LearningRate 0.0077   Epoch: 14   Global Step: 240910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:20,976-Speed 9389.40 samples/sec   Loss 4.4423   LearningRate 0.0077   Epoch: 14   Global Step: 240920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:22,102-Speed 9100.03 samples/sec   Loss 4.3587   LearningRate 0.0077   Epoch: 14   Global Step: 240930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:23,175-Speed 9548.34 samples/sec   Loss 4.4404   LearningRate 0.0077   Epoch: 14   Global Step: 240940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:24,287-Speed 9221.83 samples/sec   Loss 4.4251   LearningRate 0.0077   Epoch: 14   Global Step: 240950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:25,367-Speed 9481.08 samples/sec   Loss 4.3817   LearningRate 0.0077   Epoch: 14   Global Step: 240960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:26,424-Speed 9699.78 samples/sec   Loss 4.3925   LearningRate 0.0077   Epoch: 14   Global Step: 240970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:27,509-Speed 9438.89 samples/sec   Loss 4.4314   LearningRate 0.0077   Epoch: 14   Global Step: 240980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:28,634-Speed 9104.33 samples/sec   Loss 4.4145   LearningRate 0.0077   Epoch: 14   Global Step: 240990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:29,765-Speed 9066.65 samples/sec   Loss 4.3579   LearningRate 0.0077   Epoch: 14   Global Step: 241000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:30,879-Speed 9200.02 samples/sec   Loss 4.3772   LearningRate 0.0077   Epoch: 14   Global Step: 241010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:31,954-Speed 9530.54 samples/sec   Loss 4.4399   LearningRate 0.0077   Epoch: 14   Global Step: 241020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:33,040-Speed 9429.66 samples/sec   Loss 4.4460   LearningRate 0.0077   Epoch: 14   Global Step: 241030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:34,124-Speed 9452.63 samples/sec   Loss 4.4505   LearningRate 0.0077   Epoch: 14   Global Step: 241040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:35,209-Speed 9443.41 samples/sec   Loss 4.4666   LearningRate 0.0077   Epoch: 14   Global Step: 241050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:36,291-Speed 9466.09 samples/sec   Loss 4.3019   LearningRate 0.0077   Epoch: 14   Global Step: 241060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:37,348-Speed 9697.16 samples/sec   Loss 4.3730   LearningRate 0.0077   Epoch: 14   Global Step: 241070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:38,427-Speed 9499.19 samples/sec   Loss 4.5167   LearningRate 0.0077   Epoch: 14   Global Step: 241080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:39,512-Speed 9436.35 samples/sec   Loss 4.3236   LearningRate 0.0077   Epoch: 14   Global Step: 241090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:40,632-Speed 9148.58 samples/sec   Loss 4.3988   LearningRate 0.0077   Epoch: 14   Global Step: 241100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:41,745-Speed 9208.18 samples/sec   Loss 4.3693   LearningRate 0.0077   Epoch: 14   Global Step: 241110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:42,835-Speed 9401.75 samples/sec   Loss 4.4150   LearningRate 0.0077   Epoch: 14   Global Step: 241120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:43,898-Speed 9643.74 samples/sec   Loss 4.3358   LearningRate 0.0077   Epoch: 14   Global Step: 241130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:44,984-Speed 9438.43 samples/sec   Loss 4.3279   LearningRate 0.0077   Epoch: 14   Global Step: 241140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:46,068-Speed 9454.77 samples/sec   Loss 4.4989   LearningRate 0.0077   Epoch: 14   Global Step: 241150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:47,109-Speed 9839.14 samples/sec   Loss 4.3106   LearningRate 0.0077   Epoch: 14   Global Step: 241160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:48,166-Speed 9693.58 samples/sec   Loss 4.4780   LearningRate 0.0077   Epoch: 14   Global Step: 241170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:49,257-Speed 9393.53 samples/sec   Loss 4.4650   LearningRate 0.0077   Epoch: 14   Global Step: 241180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:50,302-Speed 9799.06 samples/sec   Loss 4.3464   LearningRate 0.0077   Epoch: 14   Global Step: 241190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:51,355-Speed 9730.29 samples/sec   Loss 4.4208   LearningRate 0.0077   Epoch: 14   Global Step: 241200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:52,422-Speed 9604.53 samples/sec   Loss 4.4372   LearningRate 0.0077   Epoch: 14   Global Step: 241210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:53,495-Speed 9545.96 samples/sec   Loss 4.5553   LearningRate 0.0077   Epoch: 14   Global Step: 241220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:54,594-Speed 9331.23 samples/sec   Loss 4.4147   LearningRate 0.0077   Epoch: 14   Global Step: 241230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:55,625-Speed 9934.82 samples/sec   Loss 4.4389   LearningRate 0.0077   Epoch: 14   Global Step: 241240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:16:56,681-Speed 9699.38 samples/sec   Loss 4.3351   LearningRate 0.0077   Epoch: 14   Global Step: 241250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:57,790-Speed 9236.49 samples/sec   Loss 4.3890   LearningRate 0.0077   Epoch: 14   Global Step: 241260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:16:58,901-Speed 9224.42 samples/sec   Loss 4.3145   LearningRate 0.0077   Epoch: 14   Global Step: 241270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:00,025-Speed 9124.47 samples/sec   Loss 4.4555   LearningRate 0.0077   Epoch: 14   Global Step: 241280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:01,146-Speed 9149.25 samples/sec   Loss 4.4213   LearningRate 0.0077   Epoch: 14   Global Step: 241290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:02,247-Speed 9300.42 samples/sec   Loss 4.4916   LearningRate 0.0077   Epoch: 14   Global Step: 241300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:03,325-Speed 9506.98 samples/sec   Loss 4.3392   LearningRate 0.0077   Epoch: 14   Global Step: 241310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:04,393-Speed 9590.23 samples/sec   Loss 4.5262   LearningRate 0.0077   Epoch: 14   Global Step: 241320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:05,496-Speed 9291.45 samples/sec   Loss 4.4285   LearningRate 0.0077   Epoch: 14   Global Step: 241330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:06,587-Speed 9390.57 samples/sec   Loss 4.3355   LearningRate 0.0077   Epoch: 14   Global Step: 241340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:07,691-Speed 9277.92 samples/sec   Loss 4.4699   LearningRate 0.0077   Epoch: 14   Global Step: 241350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:08,751-Speed 9669.77 samples/sec   Loss 4.4961   LearningRate 0.0077   Epoch: 14   Global Step: 241360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:09,786-Speed 9899.60 samples/sec   Loss 4.4544   LearningRate 0.0077   Epoch: 14   Global Step: 241370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:10,884-Speed 9329.69 samples/sec   Loss 4.3407   LearningRate 0.0077   Epoch: 14   Global Step: 241380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:11,968-Speed 9454.80 samples/sec   Loss 4.4763   LearningRate 0.0077   Epoch: 14   Global Step: 241390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:13,060-Speed 9382.45 samples/sec   Loss 4.5362   LearningRate 0.0077   Epoch: 14   Global Step: 241400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:14,160-Speed 9315.00 samples/sec   Loss 4.4977   LearningRate 0.0077   Epoch: 14   Global Step: 241410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:15,244-Speed 9452.09 samples/sec   Loss 4.5040   LearningRate 0.0077   Epoch: 14   Global Step: 241420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:16,332-Speed 9414.49 samples/sec   Loss 4.4439   LearningRate 0.0077   Epoch: 14   Global Step: 241430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:17,414-Speed 9469.13 samples/sec   Loss 4.4856   LearningRate 0.0077   Epoch: 14   Global Step: 241440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:18,494-Speed 9493.79 samples/sec   Loss 4.4676   LearningRate 0.0077   Epoch: 14   Global Step: 241450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:19,584-Speed 9397.02 samples/sec   Loss 4.3479   LearningRate 0.0077   Epoch: 14   Global Step: 241460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:20,655-Speed 9567.54 samples/sec   Loss 4.4552   LearningRate 0.0077   Epoch: 14   Global Step: 241470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:21,733-Speed 9501.25 samples/sec   Loss 4.3708   LearningRate 0.0077   Epoch: 14   Global Step: 241480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:22,795-Speed 9651.98 samples/sec   Loss 4.4138   LearningRate 0.0076   Epoch: 14   Global Step: 241490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:23,872-Speed 9513.79 samples/sec   Loss 4.4502   LearningRate 0.0076   Epoch: 14   Global Step: 241500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:24,963-Speed 9389.71 samples/sec   Loss 4.4024   LearningRate 0.0076   Epoch: 14   Global Step: 241510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:26,056-Speed 9380.65 samples/sec   Loss 4.3665   LearningRate 0.0076   Epoch: 14   Global Step: 241520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:27,131-Speed 9523.82 samples/sec   Loss 4.4440   LearningRate 0.0076   Epoch: 14   Global Step: 241530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:28,174-Speed 9826.50 samples/sec   Loss 4.3728   LearningRate 0.0076   Epoch: 14   Global Step: 241540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:29,261-Speed 9428.43 samples/sec   Loss 4.4938   LearningRate 0.0076   Epoch: 14   Global Step: 241550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:30,318-Speed 9688.84 samples/sec   Loss 4.3796   LearningRate 0.0076   Epoch: 14   Global Step: 241560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:31,400-Speed 9476.27 samples/sec   Loss 4.2956   LearningRate 0.0076   Epoch: 14   Global Step: 241570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:32,491-Speed 9392.40 samples/sec   Loss 4.3659   LearningRate 0.0076   Epoch: 14   Global Step: 241580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:33,638-Speed 8927.39 samples/sec   Loss 4.4217   LearningRate 0.0076   Epoch: 14   Global Step: 241590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:34,692-Speed 9722.29 samples/sec   Loss 4.4439   LearningRate 0.0076   Epoch: 14   Global Step: 241600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:35,736-Speed 9817.01 samples/sec   Loss 4.4642   LearningRate 0.0076   Epoch: 14   Global Step: 241610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:36,820-Speed 9448.84 samples/sec   Loss 4.4491   LearningRate 0.0076   Epoch: 14   Global Step: 241620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:37,872-Speed 9740.22 samples/sec   Loss 4.4912   LearningRate 0.0076   Epoch: 14   Global Step: 241630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:38,944-Speed 9561.40 samples/sec   Loss 4.4742   LearningRate 0.0076   Epoch: 14   Global Step: 241640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:40,018-Speed 9540.59 samples/sec   Loss 4.4288   LearningRate 0.0076   Epoch: 14   Global Step: 241650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:41,119-Speed 9305.95 samples/sec   Loss 4.4425   LearningRate 0.0076   Epoch: 14   Global Step: 241660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:42,206-Speed 9427.64 samples/sec   Loss 4.3515   LearningRate 0.0076   Epoch: 14   Global Step: 241670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:43,278-Speed 9555.91 samples/sec   Loss 4.4007   LearningRate 0.0076   Epoch: 14   Global Step: 241680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:44,351-Speed 9554.32 samples/sec   Loss 4.5042   LearningRate 0.0076   Epoch: 14   Global Step: 241690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:45,403-Speed 9738.82 samples/sec   Loss 4.3873   LearningRate 0.0076   Epoch: 14   Global Step: 241700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:46,494-Speed 9387.80 samples/sec   Loss 4.3931   LearningRate 0.0076   Epoch: 14   Global Step: 241710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:47,539-Speed 9803.67 samples/sec   Loss 4.5124   LearningRate 0.0076   Epoch: 14   Global Step: 241720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:48,625-Speed 9441.38 samples/sec   Loss 4.3612   LearningRate 0.0076   Epoch: 14   Global Step: 241730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:49,721-Speed 9342.48 samples/sec   Loss 4.3850   LearningRate 0.0076   Epoch: 14   Global Step: 241740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:50,822-Speed 9305.06 samples/sec   Loss 4.4673   LearningRate 0.0076   Epoch: 14   Global Step: 241750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:51,949-Speed 9096.15 samples/sec   Loss 4.4323   LearningRate 0.0076   Epoch: 14   Global Step: 241760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:53,019-Speed 9574.58 samples/sec   Loss 4.4136   LearningRate 0.0076   Epoch: 14   Global Step: 241770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:54,141-Speed 9128.54 samples/sec   Loss 4.4444   LearningRate 0.0076   Epoch: 14   Global Step: 241780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:55,243-Speed 9306.51 samples/sec   Loss 4.4158   LearningRate 0.0076   Epoch: 14   Global Step: 241790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:56,324-Speed 9472.88 samples/sec   Loss 4.4193   LearningRate 0.0076   Epoch: 14   Global Step: 241800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:57,417-Speed 9382.85 samples/sec   Loss 4.5000   LearningRate 0.0076   Epoch: 14   Global Step: 241810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:17:58,482-Speed 9615.00 samples/sec   Loss 4.4084   LearningRate 0.0076   Epoch: 14   Global Step: 241820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:17:59,544-Speed 9651.74 samples/sec   Loss 4.4855   LearningRate 0.0076   Epoch: 14   Global Step: 241830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:18:00,634-Speed 9399.66 samples/sec   Loss 4.5434   LearningRate 0.0076   Epoch: 14   Global Step: 241840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:18:01,726-Speed 9382.76 samples/sec   Loss 4.4097   LearningRate 0.0076   Epoch: 14   Global Step: 241850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:18:02,823-Speed 9343.62 samples/sec   Loss 4.4250   LearningRate 0.0076   Epoch: 14   Global Step: 241860   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:18:03,880-Speed 9694.49 samples/sec   Loss 4.4804   LearningRate 0.0076   Epoch: 14   Global Step: 241870   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:18:04,970-Speed 9392.61 samples/sec   Loss 4.5278   LearningRate 0.0076   Epoch: 14   Global Step: 241880   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:18:06,064-Speed 9371.77 samples/sec   Loss 4.4433   LearningRate 0.0076   Epoch: 14   Global Step: 241890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:18:07,136-Speed 9550.14 samples/sec   Loss 4.4226   LearningRate 0.0076   Epoch: 14   Global Step: 241900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:18:08,225-Speed 9409.38 samples/sec   Loss 4.4558   LearningRate 0.0076   Epoch: 14   Global Step: 241910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:18:09,325-Speed 9314.44 samples/sec   Loss 4.4661   LearningRate 0.0076   Epoch: 14   Global Step: 241920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:18:10,386-Speed 9661.91 samples/sec   Loss 4.4640   LearningRate 0.0076   Epoch: 14   Global Step: 241930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:18:11,479-Speed 9369.48 samples/sec   Loss 4.4553   LearningRate 0.0076   Epoch: 14   Global Step: 241940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:18:12,564-Speed 9445.48 samples/sec   Loss 4.3648   LearningRate 0.0076   Epoch: 14   Global Step: 241950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:18:13,670-Speed 9260.74 samples/sec   Loss 4.4183   LearningRate 0.0076   Epoch: 14   Global Step: 241960   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:18:14,735-Speed 9620.08 samples/sec   Loss 4.3640   LearningRate 0.0076   Epoch: 14   Global Step: 241970   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:18:15,820-Speed 9539.04 samples/sec   Loss 4.5184   LearningRate 0.0076   Epoch: 14   Global Step: 241980   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:18:16,902-Speed 9472.41 samples/sec   Loss 4.4235   LearningRate 0.0076   Epoch: 14   Global Step: 241990   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:18:18,037-Speed 9032.18 samples/sec   Loss 4.4616   LearningRate 0.0076   Epoch: 14   Global Step: 242000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:18:40,165-[lfw][242000]XNorm: 7.533228
Training: 2022-04-11 21:18:40,165-[lfw][242000]Accuracy-Flip: 0.99583+-0.00261
Training: 2022-04-11 21:18:40,166-[lfw][242000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:19:05,632-[cfp_fp][242000]XNorm: 6.510685
Training: 2022-04-11 21:19:05,633-[cfp_fp][242000]Accuracy-Flip: 0.96786+-0.00874
Training: 2022-04-11 21:19:05,633-[cfp_fp][242000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:19:27,513-[agedb_30][242000]XNorm: 7.333325
Training: 2022-04-11 21:19:27,514-[agedb_30][242000]Accuracy-Flip: 0.97233+-0.00879
Training: 2022-04-11 21:19:27,514-[agedb_30][242000]Accuracy-Highest: 0.97250
Training: 2022-04-11 21:19:28,562-Speed 145.20 samples/sec   Loss 4.3943   LearningRate 0.0076   Epoch: 14   Global Step: 242010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:29,628-Speed 9611.64 samples/sec   Loss 4.4348   LearningRate 0.0076   Epoch: 14   Global Step: 242020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:19:30,724-Speed 9352.32 samples/sec   Loss 4.4556   LearningRate 0.0076   Epoch: 14   Global Step: 242030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:19:31,806-Speed 9461.96 samples/sec   Loss 4.4900   LearningRate 0.0076   Epoch: 14   Global Step: 242040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:32,909-Speed 9291.55 samples/sec   Loss 4.4181   LearningRate 0.0076   Epoch: 14   Global Step: 242050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:33,974-Speed 9620.97 samples/sec   Loss 4.3913   LearningRate 0.0076   Epoch: 14   Global Step: 242060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:35,047-Speed 9554.13 samples/sec   Loss 4.5014   LearningRate 0.0076   Epoch: 14   Global Step: 242070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:36,154-Speed 9252.78 samples/sec   Loss 4.3996   LearningRate 0.0076   Epoch: 14   Global Step: 242080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:37,235-Speed 9478.57 samples/sec   Loss 4.4059   LearningRate 0.0076   Epoch: 14   Global Step: 242090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:38,312-Speed 9507.70 samples/sec   Loss 4.4306   LearningRate 0.0075   Epoch: 14   Global Step: 242100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:39,413-Speed 9311.33 samples/sec   Loss 4.4025   LearningRate 0.0075   Epoch: 14   Global Step: 242110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:40,489-Speed 9518.73 samples/sec   Loss 4.3773   LearningRate 0.0075   Epoch: 14   Global Step: 242120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:41,559-Speed 9573.09 samples/sec   Loss 4.4327   LearningRate 0.0075   Epoch: 14   Global Step: 242130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:42,613-Speed 9718.22 samples/sec   Loss 4.4687   LearningRate 0.0075   Epoch: 14   Global Step: 242140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:19:43,682-Speed 9586.13 samples/sec   Loss 4.4441   LearningRate 0.0075   Epoch: 14   Global Step: 242150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:19:44,748-Speed 9617.87 samples/sec   Loss 4.5702   LearningRate 0.0075   Epoch: 14   Global Step: 242160   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:45,849-Speed 9310.91 samples/sec   Loss 4.4166   LearningRate 0.0075   Epoch: 14   Global Step: 242170   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:46,924-Speed 9528.57 samples/sec   Loss 4.4307   LearningRate 0.0075   Epoch: 14   Global Step: 242180   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:48,079-Speed 8868.95 samples/sec   Loss 4.3975   LearningRate 0.0075   Epoch: 14   Global Step: 242190   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:49,213-Speed 9032.33 samples/sec   Loss 4.5695   LearningRate 0.0075   Epoch: 14   Global Step: 242200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:50,245-Speed 9931.32 samples/sec   Loss 4.3573   LearningRate 0.0075   Epoch: 14   Global Step: 242210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:51,285-Speed 9855.88 samples/sec   Loss 4.3906   LearningRate 0.0075   Epoch: 14   Global Step: 242220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:52,357-Speed 9559.24 samples/sec   Loss 4.4117   LearningRate 0.0075   Epoch: 14   Global Step: 242230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:53,422-Speed 9614.34 samples/sec   Loss 4.4073   LearningRate 0.0075   Epoch: 14   Global Step: 242240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:54,527-Speed 9275.55 samples/sec   Loss 4.4252   LearningRate 0.0075   Epoch: 14   Global Step: 242250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:19:55,584-Speed 9693.08 samples/sec   Loss 4.4416   LearningRate 0.0075   Epoch: 14   Global Step: 242260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:19:56,640-Speed 9699.58 samples/sec   Loss 4.4088   LearningRate 0.0075   Epoch: 14   Global Step: 242270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:19:57,735-Speed 9356.22 samples/sec   Loss 4.4164   LearningRate 0.0075   Epoch: 14   Global Step: 242280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:19:58,769-Speed 9914.49 samples/sec   Loss 4.3067   LearningRate 0.0075   Epoch: 14   Global Step: 242290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:19:59,870-Speed 9305.69 samples/sec   Loss 4.4884   LearningRate 0.0075   Epoch: 14   Global Step: 242300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:00,937-Speed 9600.63 samples/sec   Loss 4.4315   LearningRate 0.0075   Epoch: 14   Global Step: 242310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:02,036-Speed 9324.67 samples/sec   Loss 4.4518   LearningRate 0.0075   Epoch: 14   Global Step: 242320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:03,128-Speed 9380.21 samples/sec   Loss 4.4821   LearningRate 0.0075   Epoch: 14   Global Step: 242330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:04,173-Speed 9805.83 samples/sec   Loss 4.3732   LearningRate 0.0075   Epoch: 14   Global Step: 242340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:05,249-Speed 9525.29 samples/sec   Loss 4.5130   LearningRate 0.0075   Epoch: 14   Global Step: 242350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:06,283-Speed 9903.58 samples/sec   Loss 4.4575   LearningRate 0.0075   Epoch: 14   Global Step: 242360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:07,359-Speed 9524.25 samples/sec   Loss 4.4585   LearningRate 0.0075   Epoch: 14   Global Step: 242370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:08,447-Speed 9420.31 samples/sec   Loss 4.3572   LearningRate 0.0075   Epoch: 14   Global Step: 242380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:09,538-Speed 9390.26 samples/sec   Loss 4.4823   LearningRate 0.0075   Epoch: 14   Global Step: 242390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:10,604-Speed 9610.25 samples/sec   Loss 4.4384   LearningRate 0.0075   Epoch: 14   Global Step: 242400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:11,665-Speed 9656.83 samples/sec   Loss 4.4489   LearningRate 0.0075   Epoch: 14   Global Step: 242410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:12,729-Speed 9631.29 samples/sec   Loss 4.4634   LearningRate 0.0075   Epoch: 14   Global Step: 242420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:13,839-Speed 9227.26 samples/sec   Loss 4.4266   LearningRate 0.0075   Epoch: 14   Global Step: 242430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:14,880-Speed 9850.92 samples/sec   Loss 4.4256   LearningRate 0.0075   Epoch: 14   Global Step: 242440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:15,945-Speed 9619.58 samples/sec   Loss 4.4448   LearningRate 0.0075   Epoch: 14   Global Step: 242450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:17,020-Speed 9527.93 samples/sec   Loss 4.4766   LearningRate 0.0075   Epoch: 14   Global Step: 242460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:18,141-Speed 9142.19 samples/sec   Loss 4.4883   LearningRate 0.0075   Epoch: 14   Global Step: 242470   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:19,274-Speed 9042.87 samples/sec   Loss 4.4243   LearningRate 0.0075   Epoch: 14   Global Step: 242480   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:20,349-Speed 9529.38 samples/sec   Loss 4.4740   LearningRate 0.0075   Epoch: 14   Global Step: 242490   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:21,430-Speed 9482.66 samples/sec   Loss 4.4981   LearningRate 0.0075   Epoch: 14   Global Step: 242500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:22,513-Speed 9459.96 samples/sec   Loss 4.4549   LearningRate 0.0075   Epoch: 14   Global Step: 242510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:23,587-Speed 9542.69 samples/sec   Loss 4.4616   LearningRate 0.0075   Epoch: 14   Global Step: 242520   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:24,646-Speed 9674.23 samples/sec   Loss 4.3642   LearningRate 0.0075   Epoch: 14   Global Step: 242530   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:25,694-Speed 9772.86 samples/sec   Loss 4.4477   LearningRate 0.0075   Epoch: 14   Global Step: 242540   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:26,757-Speed 9642.93 samples/sec   Loss 4.4542   LearningRate 0.0075   Epoch: 14   Global Step: 242550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:27,872-Speed 9191.55 samples/sec   Loss 4.3824   LearningRate 0.0075   Epoch: 14   Global Step: 242560   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:28,962-Speed 9401.36 samples/sec   Loss 4.4910   LearningRate 0.0075   Epoch: 14   Global Step: 242570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:30,049-Speed 9423.38 samples/sec   Loss 4.4729   LearningRate 0.0075   Epoch: 14   Global Step: 242580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:31,155-Speed 9259.98 samples/sec   Loss 4.4733   LearningRate 0.0075   Epoch: 14   Global Step: 242590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:32,250-Speed 9360.79 samples/sec   Loss 4.4408   LearningRate 0.0075   Epoch: 14   Global Step: 242600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:33,297-Speed 9782.71 samples/sec   Loss 4.5060   LearningRate 0.0075   Epoch: 14   Global Step: 242610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:34,345-Speed 9784.32 samples/sec   Loss 4.5115   LearningRate 0.0075   Epoch: 14   Global Step: 242620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:35,430-Speed 9437.34 samples/sec   Loss 4.4723   LearningRate 0.0075   Epoch: 14   Global Step: 242630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:36,506-Speed 9521.61 samples/sec   Loss 4.6213   LearningRate 0.0075   Epoch: 14   Global Step: 242640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:37,566-Speed 9665.99 samples/sec   Loss 4.4246   LearningRate 0.0075   Epoch: 14   Global Step: 242650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:38,628-Speed 9648.65 samples/sec   Loss 4.3707   LearningRate 0.0075   Epoch: 14   Global Step: 242660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:39,696-Speed 9595.38 samples/sec   Loss 4.4479   LearningRate 0.0075   Epoch: 14   Global Step: 242670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:40,809-Speed 9205.40 samples/sec   Loss 4.5454   LearningRate 0.0075   Epoch: 14   Global Step: 242680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:41,891-Speed 9469.53 samples/sec   Loss 4.4417   LearningRate 0.0075   Epoch: 14   Global Step: 242690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:43,017-Speed 9098.39 samples/sec   Loss 4.4013   LearningRate 0.0075   Epoch: 14   Global Step: 242700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:44,125-Speed 9247.60 samples/sec   Loss 4.3829   LearningRate 0.0074   Epoch: 14   Global Step: 242710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:45,198-Speed 9549.34 samples/sec   Loss 4.4710   LearningRate 0.0074   Epoch: 14   Global Step: 242720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:46,258-Speed 9667.26 samples/sec   Loss 4.4228   LearningRate 0.0074   Epoch: 14   Global Step: 242730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:47,338-Speed 9491.17 samples/sec   Loss 4.4198   LearningRate 0.0074   Epoch: 14   Global Step: 242740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:48,418-Speed 9483.31 samples/sec   Loss 4.4002   LearningRate 0.0074   Epoch: 14   Global Step: 242750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:49,498-Speed 9493.29 samples/sec   Loss 4.3035   LearningRate 0.0074   Epoch: 14   Global Step: 242760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:50,595-Speed 9332.82 samples/sec   Loss 4.2938   LearningRate 0.0074   Epoch: 14   Global Step: 242770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:51,656-Speed 9663.38 samples/sec   Loss 4.4513   LearningRate 0.0074   Epoch: 14   Global Step: 242780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:20:52,713-Speed 9698.86 samples/sec   Loss 4.4338   LearningRate 0.0074   Epoch: 14   Global Step: 242790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:53,804-Speed 9387.86 samples/sec   Loss 4.4892   LearningRate 0.0074   Epoch: 14   Global Step: 242800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:54,881-Speed 9511.89 samples/sec   Loss 4.4005   LearningRate 0.0074   Epoch: 14   Global Step: 242810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:55,958-Speed 9516.21 samples/sec   Loss 4.4975   LearningRate 0.0074   Epoch: 14   Global Step: 242820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:57,040-Speed 9470.12 samples/sec   Loss 4.5238   LearningRate 0.0074   Epoch: 14   Global Step: 242830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:58,092-Speed 9732.42 samples/sec   Loss 4.4186   LearningRate 0.0074   Epoch: 14   Global Step: 242840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:20:59,185-Speed 9376.16 samples/sec   Loss 4.4352   LearningRate 0.0074   Epoch: 14   Global Step: 242850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:00,260-Speed 9532.47 samples/sec   Loss 4.3631   LearningRate 0.0074   Epoch: 14   Global Step: 242860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:01,317-Speed 9691.63 samples/sec   Loss 4.5003   LearningRate 0.0074   Epoch: 14   Global Step: 242870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:02,373-Speed 9703.51 samples/sec   Loss 4.3813   LearningRate 0.0074   Epoch: 14   Global Step: 242880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:03,475-Speed 9295.11 samples/sec   Loss 4.4484   LearningRate 0.0074   Epoch: 14   Global Step: 242890   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:04,584-Speed 9250.82 samples/sec   Loss 4.4260   LearningRate 0.0074   Epoch: 14   Global Step: 242900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:05,655-Speed 9566.49 samples/sec   Loss 4.5105   LearningRate 0.0074   Epoch: 14   Global Step: 242910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:06,701-Speed 9805.58 samples/sec   Loss 4.5101   LearningRate 0.0074   Epoch: 14   Global Step: 242920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:07,789-Speed 9411.55 samples/sec   Loss 4.2898   LearningRate 0.0074   Epoch: 14   Global Step: 242930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:08,883-Speed 9363.52 samples/sec   Loss 4.5026   LearningRate 0.0074   Epoch: 14   Global Step: 242940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:09,976-Speed 9374.37 samples/sec   Loss 4.4852   LearningRate 0.0074   Epoch: 14   Global Step: 242950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:11,049-Speed 9548.87 samples/sec   Loss 4.4503   LearningRate 0.0074   Epoch: 14   Global Step: 242960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:12,115-Speed 9608.55 samples/sec   Loss 4.4266   LearningRate 0.0074   Epoch: 14   Global Step: 242970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:13,191-Speed 9524.70 samples/sec   Loss 4.3656   LearningRate 0.0074   Epoch: 14   Global Step: 242980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:14,275-Speed 9456.55 samples/sec   Loss 4.4185   LearningRate 0.0074   Epoch: 14   Global Step: 242990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:15,377-Speed 9293.96 samples/sec   Loss 4.3651   LearningRate 0.0074   Epoch: 14   Global Step: 243000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:16,476-Speed 9327.27 samples/sec   Loss 4.3752   LearningRate 0.0074   Epoch: 14   Global Step: 243010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:17,543-Speed 9601.24 samples/sec   Loss 4.3763   LearningRate 0.0074   Epoch: 14   Global Step: 243020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:18,635-Speed 9383.72 samples/sec   Loss 4.3748   LearningRate 0.0074   Epoch: 14   Global Step: 243030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:19,737-Speed 9290.91 samples/sec   Loss 4.4254   LearningRate 0.0074   Epoch: 14   Global Step: 243040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:20,787-Speed 9761.59 samples/sec   Loss 4.4184   LearningRate 0.0074   Epoch: 14   Global Step: 243050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:21,897-Speed 9236.31 samples/sec   Loss 4.4663   LearningRate 0.0074   Epoch: 14   Global Step: 243060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:22,968-Speed 9565.04 samples/sec   Loss 4.4603   LearningRate 0.0074   Epoch: 14   Global Step: 243070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:24,086-Speed 9166.62 samples/sec   Loss 4.4685   LearningRate 0.0074   Epoch: 14   Global Step: 243080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:25,157-Speed 9571.47 samples/sec   Loss 4.4015   LearningRate 0.0074   Epoch: 14   Global Step: 243090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:26,239-Speed 9461.62 samples/sec   Loss 4.4842   LearningRate 0.0074   Epoch: 14   Global Step: 243100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:27,336-Speed 9339.75 samples/sec   Loss 4.5097   LearningRate 0.0074   Epoch: 14   Global Step: 243110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:28,446-Speed 9232.27 samples/sec   Loss 4.5176   LearningRate 0.0074   Epoch: 14   Global Step: 243120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:29,519-Speed 9548.00 samples/sec   Loss 4.4913   LearningRate 0.0074   Epoch: 14   Global Step: 243130   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:30,597-Speed 9501.14 samples/sec   Loss 4.3968   LearningRate 0.0074   Epoch: 14   Global Step: 243140   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:31,673-Speed 9522.40 samples/sec   Loss 4.4844   LearningRate 0.0074   Epoch: 14   Global Step: 243150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:32,739-Speed 9613.66 samples/sec   Loss 4.3958   LearningRate 0.0074   Epoch: 14   Global Step: 243160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:33,827-Speed 9417.25 samples/sec   Loss 4.4605   LearningRate 0.0074   Epoch: 14   Global Step: 243170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:34,894-Speed 9606.77 samples/sec   Loss 4.4616   LearningRate 0.0074   Epoch: 14   Global Step: 243180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:36,022-Speed 9084.86 samples/sec   Loss 4.4263   LearningRate 0.0074   Epoch: 14   Global Step: 243190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:37,104-Speed 9470.73 samples/sec   Loss 4.4514   LearningRate 0.0074   Epoch: 14   Global Step: 243200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:38,193-Speed 9406.34 samples/sec   Loss 4.4744   LearningRate 0.0074   Epoch: 14   Global Step: 243210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:39,287-Speed 9367.51 samples/sec   Loss 4.5465   LearningRate 0.0074   Epoch: 14   Global Step: 243220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:40,370-Speed 9459.78 samples/sec   Loss 4.5047   LearningRate 0.0074   Epoch: 14   Global Step: 243230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:41,444-Speed 9545.53 samples/sec   Loss 4.4398   LearningRate 0.0074   Epoch: 14   Global Step: 243240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:42,533-Speed 9405.84 samples/sec   Loss 4.4444   LearningRate 0.0074   Epoch: 14   Global Step: 243250   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 21:21:43,612-Speed 9499.19 samples/sec   Loss 4.5367   LearningRate 0.0074   Epoch: 14   Global Step: 243260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:44,681-Speed 9587.59 samples/sec   Loss 4.4531   LearningRate 0.0074   Epoch: 14   Global Step: 243270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:45,734-Speed 9724.05 samples/sec   Loss 4.5452   LearningRate 0.0074   Epoch: 14   Global Step: 243280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:46,786-Speed 9739.92 samples/sec   Loss 4.3546   LearningRate 0.0074   Epoch: 14   Global Step: 243290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:47,894-Speed 9247.66 samples/sec   Loss 4.3800   LearningRate 0.0074   Epoch: 14   Global Step: 243300   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:48,988-Speed 9368.45 samples/sec   Loss 4.4560   LearningRate 0.0074   Epoch: 14   Global Step: 243310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:50,062-Speed 9537.16 samples/sec   Loss 4.4641   LearningRate 0.0073   Epoch: 14   Global Step: 243320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:51,132-Speed 9583.83 samples/sec   Loss 4.4383   LearningRate 0.0073   Epoch: 14   Global Step: 243330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:52,237-Speed 9269.09 samples/sec   Loss 4.4887   LearningRate 0.0073   Epoch: 14   Global Step: 243340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:53,305-Speed 9595.93 samples/sec   Loss 4.4298   LearningRate 0.0073   Epoch: 14   Global Step: 243350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:54,387-Speed 9462.41 samples/sec   Loss 4.3160   LearningRate 0.0073   Epoch: 14   Global Step: 243360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:55,429-Speed 9838.01 samples/sec   Loss 4.3973   LearningRate 0.0073   Epoch: 14   Global Step: 243370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:56,518-Speed 9404.85 samples/sec   Loss 4.4060   LearningRate 0.0073   Epoch: 14   Global Step: 243380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:57,599-Speed 9480.83 samples/sec   Loss 4.3765   LearningRate 0.0073   Epoch: 14   Global Step: 243390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:21:58,647-Speed 9772.73 samples/sec   Loss 4.4108   LearningRate 0.0073   Epoch: 14   Global Step: 243400   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:21:59,718-Speed 9569.12 samples/sec   Loss 4.4370   LearningRate 0.0073   Epoch: 14   Global Step: 243410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:00,818-Speed 9317.75 samples/sec   Loss 4.3962   LearningRate 0.0073   Epoch: 14   Global Step: 243420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:01,906-Speed 9413.16 samples/sec   Loss 4.5225   LearningRate 0.0073   Epoch: 14   Global Step: 243430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:03,011-Speed 9273.38 samples/sec   Loss 4.4058   LearningRate 0.0073   Epoch: 14   Global Step: 243440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:04,118-Speed 9258.23 samples/sec   Loss 4.4651   LearningRate 0.0073   Epoch: 14   Global Step: 243450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:05,228-Speed 9233.97 samples/sec   Loss 4.3890   LearningRate 0.0073   Epoch: 14   Global Step: 243460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:06,308-Speed 9487.38 samples/sec   Loss 4.4489   LearningRate 0.0073   Epoch: 14   Global Step: 243470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:07,374-Speed 9607.87 samples/sec   Loss 4.4188   LearningRate 0.0073   Epoch: 14   Global Step: 243480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:08,440-Speed 9615.53 samples/sec   Loss 4.5176   LearningRate 0.0073   Epoch: 14   Global Step: 243490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:09,550-Speed 9230.66 samples/sec   Loss 4.5352   LearningRate 0.0073   Epoch: 14   Global Step: 243500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:10,660-Speed 9228.78 samples/sec   Loss 4.4442   LearningRate 0.0073   Epoch: 14   Global Step: 243510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:11,792-Speed 9046.02 samples/sec   Loss 4.4096   LearningRate 0.0073   Epoch: 14   Global Step: 243520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:12,897-Speed 9278.27 samples/sec   Loss 4.4478   LearningRate 0.0073   Epoch: 14   Global Step: 243530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:14,061-Speed 8798.37 samples/sec   Loss 4.4850   LearningRate 0.0073   Epoch: 14   Global Step: 243540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:15,144-Speed 9473.03 samples/sec   Loss 4.4484   LearningRate 0.0073   Epoch: 14   Global Step: 243550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:16,194-Speed 9756.34 samples/sec   Loss 4.5040   LearningRate 0.0073   Epoch: 14   Global Step: 243560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:17,276-Speed 9466.72 samples/sec   Loss 4.5364   LearningRate 0.0073   Epoch: 14   Global Step: 243570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:18,364-Speed 9420.73 samples/sec   Loss 4.4567   LearningRate 0.0073   Epoch: 14   Global Step: 243580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:19,432-Speed 9593.24 samples/sec   Loss 4.4446   LearningRate 0.0073   Epoch: 14   Global Step: 243590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:20,492-Speed 9661.93 samples/sec   Loss 4.3916   LearningRate 0.0073   Epoch: 14   Global Step: 243600   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:21,560-Speed 9598.90 samples/sec   Loss 4.4877   LearningRate 0.0073   Epoch: 14   Global Step: 243610   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:22,647-Speed 9427.79 samples/sec   Loss 4.4909   LearningRate 0.0073   Epoch: 14   Global Step: 243620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:23,744-Speed 9336.35 samples/sec   Loss 4.4024   LearningRate 0.0073   Epoch: 14   Global Step: 243630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:24,812-Speed 9588.45 samples/sec   Loss 4.5237   LearningRate 0.0073   Epoch: 14   Global Step: 243640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:25,885-Speed 9549.60 samples/sec   Loss 4.4134   LearningRate 0.0073   Epoch: 14   Global Step: 243650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:27,004-Speed 9161.44 samples/sec   Loss 4.4148   LearningRate 0.0073   Epoch: 14   Global Step: 243660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:28,146-Speed 8966.15 samples/sec   Loss 4.4295   LearningRate 0.0073   Epoch: 14   Global Step: 243670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:29,188-Speed 9836.66 samples/sec   Loss 4.4100   LearningRate 0.0073   Epoch: 14   Global Step: 243680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:30,248-Speed 9668.92 samples/sec   Loss 4.3552   LearningRate 0.0073   Epoch: 14   Global Step: 243690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:31,313-Speed 9617.65 samples/sec   Loss 4.4527   LearningRate 0.0073   Epoch: 14   Global Step: 243700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:32,423-Speed 9234.87 samples/sec   Loss 4.4721   LearningRate 0.0073   Epoch: 14   Global Step: 243710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:33,517-Speed 9360.29 samples/sec   Loss 4.4637   LearningRate 0.0073   Epoch: 14   Global Step: 243720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:34,640-Speed 9129.24 samples/sec   Loss 4.4801   LearningRate 0.0073   Epoch: 14   Global Step: 243730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:35,708-Speed 9594.76 samples/sec   Loss 4.4873   LearningRate 0.0073   Epoch: 14   Global Step: 243740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:36,821-Speed 9200.03 samples/sec   Loss 4.4843   LearningRate 0.0073   Epoch: 14   Global Step: 243750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:37,917-Speed 9350.08 samples/sec   Loss 4.4219   LearningRate 0.0073   Epoch: 14   Global Step: 243760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:39,018-Speed 9307.04 samples/sec   Loss 4.4553   LearningRate 0.0073   Epoch: 14   Global Step: 243770   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:40,078-Speed 9664.63 samples/sec   Loss 4.4566   LearningRate 0.0073   Epoch: 14   Global Step: 243780   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:41,171-Speed 9371.34 samples/sec   Loss 4.3756   LearningRate 0.0073   Epoch: 14   Global Step: 243790   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:42,253-Speed 9471.46 samples/sec   Loss 4.6206   LearningRate 0.0073   Epoch: 14   Global Step: 243800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:43,348-Speed 9450.34 samples/sec   Loss 4.3953   LearningRate 0.0073   Epoch: 14   Global Step: 243810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:44,421-Speed 9545.98 samples/sec   Loss 4.5068   LearningRate 0.0073   Epoch: 14   Global Step: 243820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:45,544-Speed 9126.73 samples/sec   Loss 4.4576   LearningRate 0.0073   Epoch: 14   Global Step: 243830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:46,604-Speed 9668.94 samples/sec   Loss 4.4532   LearningRate 0.0073   Epoch: 14   Global Step: 243840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:47,714-Speed 9228.53 samples/sec   Loss 4.3757   LearningRate 0.0073   Epoch: 14   Global Step: 243850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:48,847-Speed 9047.02 samples/sec   Loss 4.5126   LearningRate 0.0073   Epoch: 14   Global Step: 243860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:49,967-Speed 9142.45 samples/sec   Loss 4.5194   LearningRate 0.0073   Epoch: 14   Global Step: 243870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:51,018-Speed 9753.93 samples/sec   Loss 4.3375   LearningRate 0.0073   Epoch: 14   Global Step: 243880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:52,076-Speed 9688.40 samples/sec   Loss 4.3649   LearningRate 0.0073   Epoch: 14   Global Step: 243890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:53,116-Speed 9849.31 samples/sec   Loss 4.5461   LearningRate 0.0073   Epoch: 14   Global Step: 243900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:54,166-Speed 9759.08 samples/sec   Loss 4.4234   LearningRate 0.0073   Epoch: 14   Global Step: 243910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:55,260-Speed 9365.89 samples/sec   Loss 4.5047   LearningRate 0.0073   Epoch: 14   Global Step: 243920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:56,339-Speed 9496.54 samples/sec   Loss 4.3921   LearningRate 0.0073   Epoch: 14   Global Step: 243930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:22:57,420-Speed 9477.47 samples/sec   Loss 4.4024   LearningRate 0.0072   Epoch: 14   Global Step: 243940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:58,470-Speed 9760.18 samples/sec   Loss 4.3415   LearningRate 0.0072   Epoch: 14   Global Step: 243950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:22:59,521-Speed 9750.35 samples/sec   Loss 4.4732   LearningRate 0.0072   Epoch: 14   Global Step: 243960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:23:00,564-Speed 9823.64 samples/sec   Loss 4.4858   LearningRate 0.0072   Epoch: 14   Global Step: 243970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:23:01,628-Speed 9625.64 samples/sec   Loss 4.5101   LearningRate 0.0072   Epoch: 14   Global Step: 243980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:23:02,682-Speed 9717.34 samples/sec   Loss 4.4658   LearningRate 0.0072   Epoch: 14   Global Step: 243990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:23:03,761-Speed 9496.02 samples/sec   Loss 4.4451   LearningRate 0.0072   Epoch: 14   Global Step: 244000   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:23:25,835-[lfw][244000]XNorm: 7.556710
Training: 2022-04-11 21:23:25,836-[lfw][244000]Accuracy-Flip: 0.99650+-0.00252
Training: 2022-04-11 21:23:25,836-[lfw][244000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:23:51,317-[cfp_fp][244000]XNorm: 6.529520
Training: 2022-04-11 21:23:51,318-[cfp_fp][244000]Accuracy-Flip: 0.96971+-0.00878
Training: 2022-04-11 21:23:51,318-[cfp_fp][244000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:24:13,359-[agedb_30][244000]XNorm: 7.335563
Training: 2022-04-11 21:24:13,359-[agedb_30][244000]Accuracy-Flip: 0.97200+-0.00806
Training: 2022-04-11 21:24:13,360-[agedb_30][244000]Accuracy-Highest: 0.97250
Training: 2022-04-11 21:24:14,446-Speed 144.87 samples/sec   Loss 4.5110   LearningRate 0.0072   Epoch: 14   Global Step: 244010   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:15,511-Speed 9625.03 samples/sec   Loss 4.4725   LearningRate 0.0072   Epoch: 14   Global Step: 244020   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:16,617-Speed 9266.86 samples/sec   Loss 4.4516   LearningRate 0.0072   Epoch: 14   Global Step: 244030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:17,698-Speed 9477.89 samples/sec   Loss 4.3428   LearningRate 0.0072   Epoch: 14   Global Step: 244040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:18,773-Speed 9536.69 samples/sec   Loss 4.4094   LearningRate 0.0072   Epoch: 14   Global Step: 244050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:19,871-Speed 9329.61 samples/sec   Loss 4.4553   LearningRate 0.0072   Epoch: 14   Global Step: 244060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:20,955-Speed 9450.77 samples/sec   Loss 4.4248   LearningRate 0.0072   Epoch: 14   Global Step: 244070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:22,047-Speed 9386.03 samples/sec   Loss 4.4950   LearningRate 0.0072   Epoch: 14   Global Step: 244080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:23,171-Speed 9119.47 samples/sec   Loss 4.3912   LearningRate 0.0072   Epoch: 14   Global Step: 244090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:24,251-Speed 9481.35 samples/sec   Loss 4.5278   LearningRate 0.0072   Epoch: 14   Global Step: 244100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:25,345-Speed 9371.16 samples/sec   Loss 4.4685   LearningRate 0.0072   Epoch: 14   Global Step: 244110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:26,406-Speed 9651.92 samples/sec   Loss 4.3920   LearningRate 0.0072   Epoch: 14   Global Step: 244120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:27,515-Speed 9236.76 samples/sec   Loss 4.4605   LearningRate 0.0072   Epoch: 14   Global Step: 244130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:28,636-Speed 9143.67 samples/sec   Loss 4.5644   LearningRate 0.0072   Epoch: 14   Global Step: 244140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:29,740-Speed 9282.04 samples/sec   Loss 4.4802   LearningRate 0.0072   Epoch: 14   Global Step: 244150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:30,827-Speed 9420.60 samples/sec   Loss 4.4181   LearningRate 0.0072   Epoch: 14   Global Step: 244160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:31,907-Speed 9491.06 samples/sec   Loss 4.3918   LearningRate 0.0072   Epoch: 14   Global Step: 244170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:32,992-Speed 9439.54 samples/sec   Loss 4.4951   LearningRate 0.0072   Epoch: 14   Global Step: 244180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:34,068-Speed 9523.74 samples/sec   Loss 4.4518   LearningRate 0.0072   Epoch: 14   Global Step: 244190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:35,183-Speed 9190.31 samples/sec   Loss 4.5082   LearningRate 0.0072   Epoch: 14   Global Step: 244200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:36,271-Speed 9419.04 samples/sec   Loss 4.5365   LearningRate 0.0072   Epoch: 14   Global Step: 244210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:37,343-Speed 9561.42 samples/sec   Loss 4.4208   LearningRate 0.0072   Epoch: 14   Global Step: 244220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:38,421-Speed 9509.95 samples/sec   Loss 4.5336   LearningRate 0.0072   Epoch: 14   Global Step: 244230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:39,511-Speed 9395.95 samples/sec   Loss 4.5129   LearningRate 0.0072   Epoch: 14   Global Step: 244240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:40,615-Speed 9280.15 samples/sec   Loss 4.3692   LearningRate 0.0072   Epoch: 14   Global Step: 244250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:41,681-Speed 9612.79 samples/sec   Loss 4.4056   LearningRate 0.0072   Epoch: 14   Global Step: 244260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:42,755-Speed 9540.36 samples/sec   Loss 4.3459   LearningRate 0.0072   Epoch: 14   Global Step: 244270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:43,856-Speed 9308.27 samples/sec   Loss 4.4329   LearningRate 0.0072   Epoch: 14   Global Step: 244280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:44,909-Speed 9738.60 samples/sec   Loss 4.4390   LearningRate 0.0072   Epoch: 14   Global Step: 244290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:45,974-Speed 9620.06 samples/sec   Loss 4.4634   LearningRate 0.0072   Epoch: 14   Global Step: 244300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:47,042-Speed 9590.89 samples/sec   Loss 4.4176   LearningRate 0.0072   Epoch: 14   Global Step: 244310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:48,114-Speed 9557.82 samples/sec   Loss 4.4771   LearningRate 0.0072   Epoch: 14   Global Step: 244320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:49,218-Speed 9284.73 samples/sec   Loss 4.4584   LearningRate 0.0072   Epoch: 14   Global Step: 244330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:24:50,320-Speed 9295.90 samples/sec   Loss 4.4447   LearningRate 0.0072   Epoch: 14   Global Step: 244340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:51,418-Speed 9332.27 samples/sec   Loss 4.5149   LearningRate 0.0072   Epoch: 14   Global Step: 244350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:52,506-Speed 9414.48 samples/sec   Loss 4.3766   LearningRate 0.0072   Epoch: 14   Global Step: 244360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:53,598-Speed 9382.88 samples/sec   Loss 4.4968   LearningRate 0.0072   Epoch: 14   Global Step: 244370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:54,636-Speed 9870.81 samples/sec   Loss 4.5259   LearningRate 0.0072   Epoch: 14   Global Step: 244380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:55,700-Speed 9631.62 samples/sec   Loss 4.4680   LearningRate 0.0072   Epoch: 14   Global Step: 244390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:56,773-Speed 9556.39 samples/sec   Loss 4.4425   LearningRate 0.0072   Epoch: 14   Global Step: 244400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:57,885-Speed 9210.25 samples/sec   Loss 4.4330   LearningRate 0.0072   Epoch: 14   Global Step: 244410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:24:58,947-Speed 9649.65 samples/sec   Loss 4.5018   LearningRate 0.0072   Epoch: 14   Global Step: 244420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:00,046-Speed 9321.82 samples/sec   Loss 4.4699   LearningRate 0.0072   Epoch: 14   Global Step: 244430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:01,104-Speed 9685.55 samples/sec   Loss 4.4686   LearningRate 0.0072   Epoch: 14   Global Step: 244440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:02,181-Speed 9510.34 samples/sec   Loss 4.4491   LearningRate 0.0072   Epoch: 14   Global Step: 244450   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:03,266-Speed 9437.30 samples/sec   Loss 4.5450   LearningRate 0.0072   Epoch: 14   Global Step: 244460   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:04,335-Speed 9592.53 samples/sec   Loss 4.4423   LearningRate 0.0072   Epoch: 14   Global Step: 244470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:05,423-Speed 9412.43 samples/sec   Loss 4.4959   LearningRate 0.0072   Epoch: 14   Global Step: 244480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:06,537-Speed 9204.61 samples/sec   Loss 4.3885   LearningRate 0.0072   Epoch: 14   Global Step: 244490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:07,638-Speed 9298.43 samples/sec   Loss 4.4534   LearningRate 0.0072   Epoch: 14   Global Step: 244500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:08,678-Speed 9859.65 samples/sec   Loss 4.4593   LearningRate 0.0072   Epoch: 14   Global Step: 244510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:09,726-Speed 9771.11 samples/sec   Loss 4.4258   LearningRate 0.0072   Epoch: 14   Global Step: 244520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:10,834-Speed 9246.25 samples/sec   Loss 4.4136   LearningRate 0.0072   Epoch: 14   Global Step: 244530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:11,911-Speed 9514.00 samples/sec   Loss 4.5405   LearningRate 0.0072   Epoch: 14   Global Step: 244540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:13,018-Speed 9255.57 samples/sec   Loss 4.4798   LearningRate 0.0072   Epoch: 14   Global Step: 244550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:14,096-Speed 9499.76 samples/sec   Loss 4.3626   LearningRate 0.0071   Epoch: 14   Global Step: 244560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:15,143-Speed 9791.50 samples/sec   Loss 4.4664   LearningRate 0.0071   Epoch: 14   Global Step: 244570   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:16,205-Speed 9657.81 samples/sec   Loss 4.4605   LearningRate 0.0071   Epoch: 14   Global Step: 244580   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:17,316-Speed 9215.03 samples/sec   Loss 4.4837   LearningRate 0.0071   Epoch: 14   Global Step: 244590   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:18,403-Speed 9428.45 samples/sec   Loss 4.4446   LearningRate 0.0071   Epoch: 14   Global Step: 244600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:19,485-Speed 9468.75 samples/sec   Loss 4.3584   LearningRate 0.0071   Epoch: 14   Global Step: 244610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:20,549-Speed 9633.57 samples/sec   Loss 4.4461   LearningRate 0.0071   Epoch: 14   Global Step: 244620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:21,587-Speed 9875.34 samples/sec   Loss 4.4486   LearningRate 0.0071   Epoch: 14   Global Step: 244630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:22,636-Speed 9763.84 samples/sec   Loss 4.4834   LearningRate 0.0071   Epoch: 14   Global Step: 244640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:23,685-Speed 9763.14 samples/sec   Loss 4.3021   LearningRate 0.0071   Epoch: 14   Global Step: 244650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:24,780-Speed 9358.83 samples/sec   Loss 4.4329   LearningRate 0.0071   Epoch: 14   Global Step: 244660   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:25,847-Speed 9601.93 samples/sec   Loss 4.4281   LearningRate 0.0071   Epoch: 14   Global Step: 244670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:26,960-Speed 9203.87 samples/sec   Loss 4.3277   LearningRate 0.0071   Epoch: 14   Global Step: 244680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:28,082-Speed 9130.46 samples/sec   Loss 4.5035   LearningRate 0.0071   Epoch: 14   Global Step: 244690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:29,135-Speed 9736.30 samples/sec   Loss 4.5148   LearningRate 0.0071   Epoch: 14   Global Step: 244700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:30,205-Speed 9572.28 samples/sec   Loss 4.4934   LearningRate 0.0071   Epoch: 14   Global Step: 244710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:31,240-Speed 9902.51 samples/sec   Loss 4.4371   LearningRate 0.0071   Epoch: 14   Global Step: 244720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:32,293-Speed 9729.88 samples/sec   Loss 4.4750   LearningRate 0.0071   Epoch: 14   Global Step: 244730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:33,384-Speed 9393.03 samples/sec   Loss 4.3941   LearningRate 0.0071   Epoch: 14   Global Step: 244740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:34,479-Speed 9360.34 samples/sec   Loss 4.4632   LearningRate 0.0071   Epoch: 14   Global Step: 244750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:35,526-Speed 9781.80 samples/sec   Loss 4.3836   LearningRate 0.0071   Epoch: 14   Global Step: 244760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:36,550-Speed 10005.66 samples/sec   Loss 4.4774   LearningRate 0.0071   Epoch: 14   Global Step: 244770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:37,597-Speed 9785.33 samples/sec   Loss 4.4511   LearningRate 0.0071   Epoch: 14   Global Step: 244780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:38,681-Speed 9454.96 samples/sec   Loss 4.4887   LearningRate 0.0071   Epoch: 14   Global Step: 244790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:39,760-Speed 9491.45 samples/sec   Loss 4.4517   LearningRate 0.0071   Epoch: 14   Global Step: 244800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:40,821-Speed 9660.71 samples/sec   Loss 4.4865   LearningRate 0.0071   Epoch: 14   Global Step: 244810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:41,869-Speed 9779.06 samples/sec   Loss 4.3349   LearningRate 0.0071   Epoch: 14   Global Step: 244820   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:42,982-Speed 9198.91 samples/sec   Loss 4.5141   LearningRate 0.0071   Epoch: 14   Global Step: 244830   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:44,046-Speed 9631.29 samples/sec   Loss 4.5150   LearningRate 0.0071   Epoch: 14   Global Step: 244840   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:45,117-Speed 9574.72 samples/sec   Loss 4.4660   LearningRate 0.0071   Epoch: 14   Global Step: 244850   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:46,159-Speed 9832.62 samples/sec   Loss 4.4404   LearningRate 0.0071   Epoch: 14   Global Step: 244860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:47,242-Speed 9463.57 samples/sec   Loss 4.4043   LearningRate 0.0071   Epoch: 14   Global Step: 244870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:48,295-Speed 9726.01 samples/sec   Loss 4.4654   LearningRate 0.0071   Epoch: 14   Global Step: 244880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:49,394-Speed 9326.64 samples/sec   Loss 4.4642   LearningRate 0.0071   Epoch: 14   Global Step: 244890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:50,441-Speed 9783.10 samples/sec   Loss 4.5490   LearningRate 0.0071   Epoch: 14   Global Step: 244900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:51,547-Speed 9265.29 samples/sec   Loss 4.5813   LearningRate 0.0071   Epoch: 14   Global Step: 244910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:52,652-Speed 9267.56 samples/sec   Loss 4.4612   LearningRate 0.0071   Epoch: 14   Global Step: 244920   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:53,715-Speed 9643.17 samples/sec   Loss 4.4512   LearningRate 0.0071   Epoch: 14   Global Step: 244930   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:54,780-Speed 9624.15 samples/sec   Loss 4.4484   LearningRate 0.0071   Epoch: 14   Global Step: 244940   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:55,867-Speed 9419.16 samples/sec   Loss 4.4552   LearningRate 0.0071   Epoch: 14   Global Step: 244950   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:25:56,962-Speed 9359.14 samples/sec   Loss 4.5471   LearningRate 0.0071   Epoch: 14   Global Step: 244960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:58,050-Speed 9422.83 samples/sec   Loss 4.3149   LearningRate 0.0071   Epoch: 14   Global Step: 244970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:25:59,135-Speed 9435.63 samples/sec   Loss 4.4013   LearningRate 0.0071   Epoch: 14   Global Step: 244980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:00,215-Speed 9489.85 samples/sec   Loss 4.4628   LearningRate 0.0071   Epoch: 14   Global Step: 244990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:01,260-Speed 9806.32 samples/sec   Loss 4.5002   LearningRate 0.0071   Epoch: 14   Global Step: 245000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:02,410-Speed 8913.53 samples/sec   Loss 4.5134   LearningRate 0.0071   Epoch: 14   Global Step: 245010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:03,474-Speed 9628.89 samples/sec   Loss 4.5196   LearningRate 0.0071   Epoch: 14   Global Step: 245020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:04,540-Speed 9617.91 samples/sec   Loss 4.4375   LearningRate 0.0071   Epoch: 14   Global Step: 245030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:05,638-Speed 9331.94 samples/sec   Loss 4.4684   LearningRate 0.0071   Epoch: 14   Global Step: 245040   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:06,732-Speed 9362.61 samples/sec   Loss 4.4216   LearningRate 0.0071   Epoch: 14   Global Step: 245050   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:07,850-Speed 9164.06 samples/sec   Loss 4.4520   LearningRate 0.0071   Epoch: 14   Global Step: 245060   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:08,933-Speed 9465.39 samples/sec   Loss 4.3701   LearningRate 0.0071   Epoch: 14   Global Step: 245070   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:09,991-Speed 9682.67 samples/sec   Loss 4.4222   LearningRate 0.0071   Epoch: 14   Global Step: 245080   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:11,089-Speed 9327.57 samples/sec   Loss 4.4561   LearningRate 0.0071   Epoch: 14   Global Step: 245090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:12,145-Speed 9703.43 samples/sec   Loss 4.4480   LearningRate 0.0071   Epoch: 14   Global Step: 245100   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:13,235-Speed 9403.43 samples/sec   Loss 4.5126   LearningRate 0.0071   Epoch: 14   Global Step: 245110   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:14,276-Speed 9835.51 samples/sec   Loss 4.4377   LearningRate 0.0071   Epoch: 14   Global Step: 245120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:15,385-Speed 9247.85 samples/sec   Loss 4.4371   LearningRate 0.0071   Epoch: 14   Global Step: 245130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:16,488-Speed 9285.83 samples/sec   Loss 4.4952   LearningRate 0.0071   Epoch: 14   Global Step: 245140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:17,584-Speed 9346.42 samples/sec   Loss 4.5583   LearningRate 0.0071   Epoch: 14   Global Step: 245150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:18,712-Speed 9082.95 samples/sec   Loss 4.5449   LearningRate 0.0071   Epoch: 14   Global Step: 245160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:19,791-Speed 9495.91 samples/sec   Loss 4.3942   LearningRate 0.0071   Epoch: 14   Global Step: 245170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:20,875-Speed 9457.85 samples/sec   Loss 4.5140   LearningRate 0.0071   Epoch: 14   Global Step: 245180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:21,930-Speed 9708.64 samples/sec   Loss 4.4029   LearningRate 0.0070   Epoch: 14   Global Step: 245190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:22,983-Speed 9734.37 samples/sec   Loss 4.5208   LearningRate 0.0070   Epoch: 14   Global Step: 245200   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:24,061-Speed 9506.98 samples/sec   Loss 4.3742   LearningRate 0.0070   Epoch: 14   Global Step: 245210   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:25,134-Speed 9543.73 samples/sec   Loss 4.4377   LearningRate 0.0070   Epoch: 14   Global Step: 245220   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:26,213-Speed 9498.05 samples/sec   Loss 4.5665   LearningRate 0.0070   Epoch: 14   Global Step: 245230   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:27,299-Speed 9438.28 samples/sec   Loss 4.4014   LearningRate 0.0070   Epoch: 14   Global Step: 245240   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:28,387-Speed 9421.24 samples/sec   Loss 4.4926   LearningRate 0.0070   Epoch: 14   Global Step: 245250   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:29,495-Speed 9242.10 samples/sec   Loss 4.4651   LearningRate 0.0070   Epoch: 14   Global Step: 245260   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:30,559-Speed 9627.81 samples/sec   Loss 4.4565   LearningRate 0.0070   Epoch: 14   Global Step: 245270   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:31,635-Speed 9524.67 samples/sec   Loss 4.4508   LearningRate 0.0070   Epoch: 14   Global Step: 245280   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:32,708-Speed 9543.47 samples/sec   Loss 4.4093   LearningRate 0.0070   Epoch: 14   Global Step: 245290   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:33,835-Speed 9091.06 samples/sec   Loss 4.4159   LearningRate 0.0070   Epoch: 14   Global Step: 245300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:34,926-Speed 9390.55 samples/sec   Loss 4.4324   LearningRate 0.0070   Epoch: 14   Global Step: 245310   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:35,995-Speed 9588.08 samples/sec   Loss 4.4220   LearningRate 0.0070   Epoch: 14   Global Step: 245320   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:37,085-Speed 9404.01 samples/sec   Loss 4.5010   LearningRate 0.0070   Epoch: 14   Global Step: 245330   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:38,164-Speed 9495.26 samples/sec   Loss 4.4691   LearningRate 0.0070   Epoch: 14   Global Step: 245340   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:39,241-Speed 9511.89 samples/sec   Loss 4.4862   LearningRate 0.0070   Epoch: 14   Global Step: 245350   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:40,294-Speed 9739.63 samples/sec   Loss 4.4621   LearningRate 0.0070   Epoch: 14   Global Step: 245360   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:41,381-Speed 9424.62 samples/sec   Loss 4.3955   LearningRate 0.0070   Epoch: 14   Global Step: 245370   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:42,497-Speed 9178.77 samples/sec   Loss 4.5362   LearningRate 0.0070   Epoch: 14   Global Step: 245380   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:43,625-Speed 9084.89 samples/sec   Loss 4.4757   LearningRate 0.0070   Epoch: 14   Global Step: 245390   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:44,724-Speed 9322.68 samples/sec   Loss 4.5167   LearningRate 0.0070   Epoch: 14   Global Step: 245400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:45,753-Speed 9955.94 samples/sec   Loss 4.4843   LearningRate 0.0070   Epoch: 14   Global Step: 245410   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:46,805-Speed 9738.42 samples/sec   Loss 4.4703   LearningRate 0.0070   Epoch: 14   Global Step: 245420   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:47,872-Speed 9602.23 samples/sec   Loss 4.4697   LearningRate 0.0070   Epoch: 14   Global Step: 245430   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:48,921-Speed 9762.06 samples/sec   Loss 4.4906   LearningRate 0.0070   Epoch: 14   Global Step: 245440   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:50,013-Speed 9386.83 samples/sec   Loss 4.4403   LearningRate 0.0070   Epoch: 14   Global Step: 245450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:51,102-Speed 9409.83 samples/sec   Loss 4.3939   LearningRate 0.0070   Epoch: 14   Global Step: 245460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:52,239-Speed 9013.88 samples/sec   Loss 4.4282   LearningRate 0.0070   Epoch: 14   Global Step: 245470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:53,386-Speed 8927.56 samples/sec   Loss 4.5276   LearningRate 0.0070   Epoch: 14   Global Step: 245480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:54,454-Speed 9596.30 samples/sec   Loss 4.4557   LearningRate 0.0070   Epoch: 14   Global Step: 245490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:55,526-Speed 9562.64 samples/sec   Loss 4.4253   LearningRate 0.0070   Epoch: 14   Global Step: 245500   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:56,653-Speed 9093.65 samples/sec   Loss 4.4456   LearningRate 0.0070   Epoch: 14   Global Step: 245510   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:26:57,698-Speed 9805.27 samples/sec   Loss 4.4817   LearningRate 0.0070   Epoch: 14   Global Step: 245520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:58,764-Speed 9610.52 samples/sec   Loss 4.4725   LearningRate 0.0070   Epoch: 14   Global Step: 245530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:26:59,815-Speed 9748.84 samples/sec   Loss 4.4279   LearningRate 0.0070   Epoch: 14   Global Step: 245540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:00,885-Speed 9580.87 samples/sec   Loss 4.5060   LearningRate 0.0070   Epoch: 14   Global Step: 245550   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:01,983-Speed 9325.07 samples/sec   Loss 4.4345   LearningRate 0.0070   Epoch: 14   Global Step: 245560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:03,042-Speed 9673.74 samples/sec   Loss 4.4726   LearningRate 0.0070   Epoch: 14   Global Step: 245570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:04,148-Speed 9266.02 samples/sec   Loss 4.4512   LearningRate 0.0070   Epoch: 14   Global Step: 245580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:05,225-Speed 9517.38 samples/sec   Loss 4.4222   LearningRate 0.0070   Epoch: 14   Global Step: 245590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:06,321-Speed 9347.31 samples/sec   Loss 4.4418   LearningRate 0.0070   Epoch: 14   Global Step: 245600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:07,431-Speed 9230.23 samples/sec   Loss 4.5671   LearningRate 0.0070   Epoch: 14   Global Step: 245610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:08,532-Speed 9302.55 samples/sec   Loss 4.5095   LearningRate 0.0070   Epoch: 14   Global Step: 245620   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:09,582-Speed 9757.95 samples/sec   Loss 4.4652   LearningRate 0.0070   Epoch: 14   Global Step: 245630   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:10,625-Speed 9834.61 samples/sec   Loss 4.3613   LearningRate 0.0070   Epoch: 14   Global Step: 245640   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:11,704-Speed 9504.10 samples/sec   Loss 4.4977   LearningRate 0.0070   Epoch: 14   Global Step: 245650   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:12,755-Speed 9749.06 samples/sec   Loss 4.3626   LearningRate 0.0070   Epoch: 14   Global Step: 245660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:13,831-Speed 9522.76 samples/sec   Loss 4.4168   LearningRate 0.0070   Epoch: 14   Global Step: 245670   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:14,928-Speed 9343.13 samples/sec   Loss 4.3454   LearningRate 0.0070   Epoch: 14   Global Step: 245680   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:15,993-Speed 9618.42 samples/sec   Loss 4.3904   LearningRate 0.0070   Epoch: 14   Global Step: 245690   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:17,098-Speed 9273.82 samples/sec   Loss 4.5075   LearningRate 0.0070   Epoch: 14   Global Step: 245700   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:18,221-Speed 9118.28 samples/sec   Loss 4.4621   LearningRate 0.0070   Epoch: 14   Global Step: 245710   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:19,302-Speed 9485.96 samples/sec   Loss 4.4194   LearningRate 0.0070   Epoch: 14   Global Step: 245720   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:20,389-Speed 9427.69 samples/sec   Loss 4.4041   LearningRate 0.0070   Epoch: 14   Global Step: 245730   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:21,502-Speed 9200.91 samples/sec   Loss 4.4144   LearningRate 0.0070   Epoch: 14   Global Step: 245740   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:22,575-Speed 9548.06 samples/sec   Loss 4.4441   LearningRate 0.0070   Epoch: 14   Global Step: 245750   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:23,613-Speed 9875.46 samples/sec   Loss 4.4898   LearningRate 0.0070   Epoch: 14   Global Step: 245760   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:24,719-Speed 9261.15 samples/sec   Loss 4.4492   LearningRate 0.0070   Epoch: 14   Global Step: 245770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:25,784-Speed 9619.00 samples/sec   Loss 4.4604   LearningRate 0.0070   Epoch: 14   Global Step: 245780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:26,827-Speed 9830.06 samples/sec   Loss 4.2958   LearningRate 0.0070   Epoch: 14   Global Step: 245790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:27,878-Speed 9747.99 samples/sec   Loss 4.4157   LearningRate 0.0070   Epoch: 14   Global Step: 245800   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:28,938-Speed 9664.11 samples/sec   Loss 4.3863   LearningRate 0.0070   Epoch: 14   Global Step: 245810   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:30,000-Speed 9651.52 samples/sec   Loss 4.4425   LearningRate 0.0069   Epoch: 14   Global Step: 245820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:31,085-Speed 9448.31 samples/sec   Loss 4.5262   LearningRate 0.0069   Epoch: 14   Global Step: 245830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:32,186-Speed 9304.02 samples/sec   Loss 4.4484   LearningRate 0.0069   Epoch: 14   Global Step: 245840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:33,260-Speed 9537.41 samples/sec   Loss 4.4378   LearningRate 0.0069   Epoch: 14   Global Step: 245850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:34,335-Speed 9533.49 samples/sec   Loss 4.4471   LearningRate 0.0069   Epoch: 14   Global Step: 245860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:35,420-Speed 9440.90 samples/sec   Loss 4.3500   LearningRate 0.0069   Epoch: 14   Global Step: 245870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:36,519-Speed 9323.68 samples/sec   Loss 4.4713   LearningRate 0.0069   Epoch: 14   Global Step: 245880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:37,608-Speed 9408.20 samples/sec   Loss 4.4522   LearningRate 0.0069   Epoch: 14   Global Step: 245890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:27:38,698-Speed 9395.59 samples/sec   Loss 4.4633   LearningRate 0.0069   Epoch: 14   Global Step: 245900   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:39,750-Speed 9744.37 samples/sec   Loss 4.5197   LearningRate 0.0069   Epoch: 14   Global Step: 245910   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:40,806-Speed 9704.29 samples/sec   Loss 4.5112   LearningRate 0.0069   Epoch: 14   Global Step: 245920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:41,882-Speed 9522.27 samples/sec   Loss 4.3923   LearningRate 0.0069   Epoch: 14   Global Step: 245930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:42,974-Speed 9378.16 samples/sec   Loss 4.4881   LearningRate 0.0069   Epoch: 14   Global Step: 245940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:44,047-Speed 9547.77 samples/sec   Loss 4.4791   LearningRate 0.0069   Epoch: 14   Global Step: 245950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:45,131-Speed 9453.30 samples/sec   Loss 4.5167   LearningRate 0.0069   Epoch: 14   Global Step: 245960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:46,260-Speed 9072.27 samples/sec   Loss 4.4952   LearningRate 0.0069   Epoch: 14   Global Step: 245970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:47,393-Speed 9050.20 samples/sec   Loss 4.3798   LearningRate 0.0069   Epoch: 14   Global Step: 245980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:48,532-Speed 8992.58 samples/sec   Loss 4.4837   LearningRate 0.0069   Epoch: 14   Global Step: 245990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:27:49,635-Speed 9291.96 samples/sec   Loss 4.5163   LearningRate 0.0069   Epoch: 14   Global Step: 246000   Fp16 Grad Scale: 262144   Required: 4 hours
Training: 2022-04-11 21:28:11,569-[lfw][246000]XNorm: 7.421774
Training: 2022-04-11 21:28:11,570-[lfw][246000]Accuracy-Flip: 0.99583+-0.00239
Training: 2022-04-11 21:28:11,570-[lfw][246000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:28:36,844-[cfp_fp][246000]XNorm: 6.422978
Training: 2022-04-11 21:28:36,845-[cfp_fp][246000]Accuracy-Flip: 0.97114+-0.00774
Training: 2022-04-11 21:28:36,846-[cfp_fp][246000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:28:58,676-[agedb_30][246000]XNorm: 7.163093
Training: 2022-04-11 21:28:58,676-[agedb_30][246000]Accuracy-Flip: 0.97167+-0.00882
Training: 2022-04-11 21:28:58,677-[agedb_30][246000]Accuracy-Highest: 0.97250
Training: 2022-04-11 21:28:59,762-Speed 146.02 samples/sec   Loss 4.4916   LearningRate 0.0069   Epoch: 14   Global Step: 246010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:00,870-Speed 9247.07 samples/sec   Loss 4.3457   LearningRate 0.0069   Epoch: 14   Global Step: 246020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:01,904-Speed 9910.93 samples/sec   Loss 4.3838   LearningRate 0.0069   Epoch: 14   Global Step: 246030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:02,962-Speed 9679.41 samples/sec   Loss 4.4333   LearningRate 0.0069   Epoch: 14   Global Step: 246040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:04,045-Speed 9456.89 samples/sec   Loss 4.4544   LearningRate 0.0069   Epoch: 14   Global Step: 246050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:05,101-Speed 9710.06 samples/sec   Loss 4.5278   LearningRate 0.0069   Epoch: 14   Global Step: 246060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:06,202-Speed 9305.04 samples/sec   Loss 4.3803   LearningRate 0.0069   Epoch: 14   Global Step: 246070   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:07,309-Speed 9257.49 samples/sec   Loss 4.3871   LearningRate 0.0069   Epoch: 14   Global Step: 246080   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:08,395-Speed 9435.32 samples/sec   Loss 4.4540   LearningRate 0.0069   Epoch: 14   Global Step: 246090   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:09,469-Speed 9533.96 samples/sec   Loss 4.3353   LearningRate 0.0069   Epoch: 14   Global Step: 246100   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:10,550-Speed 9477.72 samples/sec   Loss 4.4865   LearningRate 0.0069   Epoch: 14   Global Step: 246110   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:11,632-Speed 9470.86 samples/sec   Loss 4.4118   LearningRate 0.0069   Epoch: 14   Global Step: 246120   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:12,713-Speed 9477.37 samples/sec   Loss 4.4889   LearningRate 0.0069   Epoch: 14   Global Step: 246130   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:13,779-Speed 9609.76 samples/sec   Loss 4.4900   LearningRate 0.0069   Epoch: 14   Global Step: 246140   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:14,878-Speed 9326.18 samples/sec   Loss 4.4820   LearningRate 0.0069   Epoch: 14   Global Step: 246150   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:15,964-Speed 9430.64 samples/sec   Loss 4.4850   LearningRate 0.0069   Epoch: 14   Global Step: 246160   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:17,014-Speed 9760.01 samples/sec   Loss 4.3963   LearningRate 0.0069   Epoch: 14   Global Step: 246170   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:18,109-Speed 9360.56 samples/sec   Loss 4.4965   LearningRate 0.0069   Epoch: 14   Global Step: 246180   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:19,251-Speed 8973.50 samples/sec   Loss 4.5159   LearningRate 0.0069   Epoch: 14   Global Step: 246190   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:20,303-Speed 9748.69 samples/sec   Loss 4.4273   LearningRate 0.0069   Epoch: 14   Global Step: 246200   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:21,372-Speed 9582.02 samples/sec   Loss 4.2959   LearningRate 0.0069   Epoch: 14   Global Step: 246210   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:22,481-Speed 9237.25 samples/sec   Loss 4.3841   LearningRate 0.0069   Epoch: 14   Global Step: 246220   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:23,534-Speed 9731.14 samples/sec   Loss 4.5603   LearningRate 0.0069   Epoch: 14   Global Step: 246230   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:24,586-Speed 9741.53 samples/sec   Loss 4.4487   LearningRate 0.0069   Epoch: 14   Global Step: 246240   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:25,676-Speed 9394.43 samples/sec   Loss 4.4381   LearningRate 0.0069   Epoch: 14   Global Step: 246250   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:26,747-Speed 9570.55 samples/sec   Loss 4.4769   LearningRate 0.0069   Epoch: 14   Global Step: 246260   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:27,824-Speed 9511.86 samples/sec   Loss 4.5333   LearningRate 0.0069   Epoch: 14   Global Step: 246270   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:28,930-Speed 9262.29 samples/sec   Loss 4.4558   LearningRate 0.0069   Epoch: 14   Global Step: 246280   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:30,026-Speed 9352.45 samples/sec   Loss 4.5372   LearningRate 0.0069   Epoch: 14   Global Step: 246290   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:31,149-Speed 9123.04 samples/sec   Loss 4.3946   LearningRate 0.0069   Epoch: 14   Global Step: 246300   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:32,201-Speed 9740.78 samples/sec   Loss 4.4484   LearningRate 0.0069   Epoch: 14   Global Step: 246310   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:33,281-Speed 9490.66 samples/sec   Loss 4.4378   LearningRate 0.0069   Epoch: 14   Global Step: 246320   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:34,334-Speed 9727.22 samples/sec   Loss 4.4354   LearningRate 0.0069   Epoch: 14   Global Step: 246330   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:35,412-Speed 9503.50 samples/sec   Loss 4.4635   LearningRate 0.0069   Epoch: 14   Global Step: 246340   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:36,460-Speed 9780.30 samples/sec   Loss 4.4433   LearningRate 0.0069   Epoch: 14   Global Step: 246350   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:37,527-Speed 9595.85 samples/sec   Loss 4.4001   LearningRate 0.0069   Epoch: 14   Global Step: 246360   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:38,580-Speed 9735.93 samples/sec   Loss 4.3610   LearningRate 0.0069   Epoch: 14   Global Step: 246370   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:39,614-Speed 9911.17 samples/sec   Loss 4.4597   LearningRate 0.0069   Epoch: 14   Global Step: 246380   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:40,651-Speed 9887.38 samples/sec   Loss 4.4155   LearningRate 0.0069   Epoch: 14   Global Step: 246390   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:41,765-Speed 9190.88 samples/sec   Loss 4.5000   LearningRate 0.0069   Epoch: 14   Global Step: 246400   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:42,854-Speed 9409.12 samples/sec   Loss 4.3937   LearningRate 0.0069   Epoch: 14   Global Step: 246410   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:43,920-Speed 9610.10 samples/sec   Loss 4.4474   LearningRate 0.0069   Epoch: 14   Global Step: 246420   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:45,024-Speed 9279.31 samples/sec   Loss 4.3843   LearningRate 0.0069   Epoch: 14   Global Step: 246430   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:46,093-Speed 9587.47 samples/sec   Loss 4.5301   LearningRate 0.0069   Epoch: 14   Global Step: 246440   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:47,184-Speed 9389.44 samples/sec   Loss 4.4670   LearningRate 0.0069   Epoch: 14   Global Step: 246450   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:48,284-Speed 9318.71 samples/sec   Loss 4.4492   LearningRate 0.0068   Epoch: 14   Global Step: 246460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:49,349-Speed 9615.26 samples/sec   Loss 4.4175   LearningRate 0.0068   Epoch: 14   Global Step: 246470   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:50,439-Speed 9400.00 samples/sec   Loss 4.4748   LearningRate 0.0068   Epoch: 14   Global Step: 246480   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:51,489-Speed 9759.33 samples/sec   Loss 4.4353   LearningRate 0.0068   Epoch: 14   Global Step: 246490   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:52,562-Speed 9552.88 samples/sec   Loss 4.4392   LearningRate 0.0068   Epoch: 14   Global Step: 246500   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:53,648-Speed 9433.14 samples/sec   Loss 4.3366   LearningRate 0.0068   Epoch: 14   Global Step: 246510   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:54,733-Speed 9444.34 samples/sec   Loss 4.4073   LearningRate 0.0068   Epoch: 14   Global Step: 246520   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:55,835-Speed 9299.95 samples/sec   Loss 4.4194   LearningRate 0.0068   Epoch: 14   Global Step: 246530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:56,928-Speed 9369.80 samples/sec   Loss 4.4722   LearningRate 0.0068   Epoch: 14   Global Step: 246540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:29:58,041-Speed 9213.25 samples/sec   Loss 4.4142   LearningRate 0.0068   Epoch: 14   Global Step: 246550   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:29:59,066-Speed 9988.88 samples/sec   Loss 4.4256   LearningRate 0.0068   Epoch: 14   Global Step: 246560   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:00,137-Speed 9570.17 samples/sec   Loss 4.4041   LearningRate 0.0068   Epoch: 14   Global Step: 246570   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:01,230-Speed 9368.86 samples/sec   Loss 4.4293   LearningRate 0.0068   Epoch: 14   Global Step: 246580   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:02,327-Speed 9342.41 samples/sec   Loss 4.3716   LearningRate 0.0068   Epoch: 14   Global Step: 246590   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:03,436-Speed 9244.33 samples/sec   Loss 4.4904   LearningRate 0.0068   Epoch: 14   Global Step: 246600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:04,524-Speed 9413.94 samples/sec   Loss 4.5036   LearningRate 0.0068   Epoch: 14   Global Step: 246610   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:05,601-Speed 9519.44 samples/sec   Loss 4.4534   LearningRate 0.0068   Epoch: 14   Global Step: 246620   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:06,692-Speed 9388.82 samples/sec   Loss 4.3544   LearningRate 0.0068   Epoch: 14   Global Step: 246630   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:07,857-Speed 8792.73 samples/sec   Loss 4.3996   LearningRate 0.0068   Epoch: 14   Global Step: 246640   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:08,963-Speed 9263.02 samples/sec   Loss 4.4357   LearningRate 0.0068   Epoch: 14   Global Step: 246650   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:10,086-Speed 9127.09 samples/sec   Loss 4.3893   LearningRate 0.0068   Epoch: 14   Global Step: 246660   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:11,191-Speed 9274.63 samples/sec   Loss 4.4958   LearningRate 0.0068   Epoch: 14   Global Step: 246670   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:12,343-Speed 8894.79 samples/sec   Loss 4.5662   LearningRate 0.0068   Epoch: 14   Global Step: 246680   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:13,437-Speed 9364.32 samples/sec   Loss 4.4130   LearningRate 0.0068   Epoch: 14   Global Step: 246690   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:14,491-Speed 9724.90 samples/sec   Loss 4.4026   LearningRate 0.0068   Epoch: 14   Global Step: 246700   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:15,555-Speed 9624.94 samples/sec   Loss 4.4091   LearningRate 0.0068   Epoch: 14   Global Step: 246710   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:16,611-Speed 9709.92 samples/sec   Loss 4.4232   LearningRate 0.0068   Epoch: 14   Global Step: 246720   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:17,693-Speed 9468.47 samples/sec   Loss 4.4258   LearningRate 0.0068   Epoch: 14   Global Step: 246730   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:18,757-Speed 9624.30 samples/sec   Loss 4.3744   LearningRate 0.0068   Epoch: 14   Global Step: 246740   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:19,879-Speed 9132.55 samples/sec   Loss 4.5099   LearningRate 0.0068   Epoch: 14   Global Step: 246750   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:20,962-Speed 9458.27 samples/sec   Loss 4.4948   LearningRate 0.0068   Epoch: 14   Global Step: 246760   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:22,107-Speed 8955.29 samples/sec   Loss 4.3724   LearningRate 0.0068   Epoch: 14   Global Step: 246770   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:23,216-Speed 9241.56 samples/sec   Loss 4.5073   LearningRate 0.0068   Epoch: 14   Global Step: 246780   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:24,320-Speed 9275.57 samples/sec   Loss 4.4259   LearningRate 0.0068   Epoch: 14   Global Step: 246790   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:25,425-Speed 9278.46 samples/sec   Loss 4.4859   LearningRate 0.0068   Epoch: 14   Global Step: 246800   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:26,514-Speed 9402.45 samples/sec   Loss 4.5532   LearningRate 0.0068   Epoch: 14   Global Step: 246810   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:27,562-Speed 9775.75 samples/sec   Loss 4.4768   LearningRate 0.0068   Epoch: 14   Global Step: 246820   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:28,645-Speed 9462.45 samples/sec   Loss 4.4693   LearningRate 0.0068   Epoch: 14   Global Step: 246830   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:29,741-Speed 9351.71 samples/sec   Loss 4.4427   LearningRate 0.0068   Epoch: 14   Global Step: 246840   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:30,797-Speed 9698.56 samples/sec   Loss 4.4764   LearningRate 0.0068   Epoch: 14   Global Step: 246850   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:31,854-Speed 9695.34 samples/sec   Loss 4.5067   LearningRate 0.0068   Epoch: 14   Global Step: 246860   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:32,932-Speed 9507.58 samples/sec   Loss 4.4648   LearningRate 0.0068   Epoch: 14   Global Step: 246870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:34,016-Speed 9460.24 samples/sec   Loss 4.4785   LearningRate 0.0068   Epoch: 14   Global Step: 246880   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:35,128-Speed 9211.44 samples/sec   Loss 4.4296   LearningRate 0.0068   Epoch: 14   Global Step: 246890   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:36,249-Speed 9141.89 samples/sec   Loss 4.3752   LearningRate 0.0068   Epoch: 14   Global Step: 246900   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:37,326-Speed 9511.85 samples/sec   Loss 4.4850   LearningRate 0.0068   Epoch: 14   Global Step: 246910   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-11 21:30:38,367-Speed 9845.84 samples/sec   Loss 4.4293   LearningRate 0.0068   Epoch: 14   Global Step: 246920   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:39,426-Speed 9678.68 samples/sec   Loss 4.4694   LearningRate 0.0068   Epoch: 14   Global Step: 246930   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:40,483-Speed 9690.79 samples/sec   Loss 4.4766   LearningRate 0.0068   Epoch: 14   Global Step: 246940   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:41,551-Speed 9595.60 samples/sec   Loss 4.4591   LearningRate 0.0068   Epoch: 14   Global Step: 246950   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:42,640-Speed 9406.20 samples/sec   Loss 4.4763   LearningRate 0.0068   Epoch: 14   Global Step: 246960   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:43,789-Speed 8911.93 samples/sec   Loss 4.5325   LearningRate 0.0068   Epoch: 14   Global Step: 246970   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:44,883-Speed 9369.55 samples/sec   Loss 4.5154   LearningRate 0.0068   Epoch: 14   Global Step: 246980   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:45,987-Speed 9279.30 samples/sec   Loss 4.3916   LearningRate 0.0068   Epoch: 14   Global Step: 246990   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:47,062-Speed 9529.66 samples/sec   Loss 4.4254   LearningRate 0.0068   Epoch: 14   Global Step: 247000   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:48,145-Speed 9460.98 samples/sec   Loss 4.4043   LearningRate 0.0068   Epoch: 14   Global Step: 247010   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:49,243-Speed 9327.59 samples/sec   Loss 4.5010   LearningRate 0.0068   Epoch: 14   Global Step: 247020   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:50,293-Speed 9765.49 samples/sec   Loss 4.5163   LearningRate 0.0068   Epoch: 14   Global Step: 247030   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:51,385-Speed 9387.01 samples/sec   Loss 4.4213   LearningRate 0.0068   Epoch: 14   Global Step: 247040   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:52,470-Speed 9448.88 samples/sec   Loss 4.4005   LearningRate 0.0068   Epoch: 14   Global Step: 247050   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:53,547-Speed 9519.47 samples/sec   Loss 4.4281   LearningRate 0.0068   Epoch: 14   Global Step: 247060   Fp16 Grad Scale: 131072   Required: 4 hours
Training: 2022-04-11 21:30:54,632-Speed 9443.49 samples/sec   Loss 4.4223   LearningRate 0.0068   Epoch: 14   Global Step: 247070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:30:55,707-Speed 9529.26 samples/sec   Loss 4.3786   LearningRate 0.0068   Epoch: 14   Global Step: 247080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:30:56,768-Speed 9657.65 samples/sec   Loss 4.4418   LearningRate 0.0068   Epoch: 14   Global Step: 247090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:30:57,870-Speed 9294.42 samples/sec   Loss 4.3726   LearningRate 0.0067   Epoch: 14   Global Step: 247100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:30:58,945-Speed 9528.90 samples/sec   Loss 4.5020   LearningRate 0.0067   Epoch: 14   Global Step: 247110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:00,050-Speed 9277.24 samples/sec   Loss 4.5815   LearningRate 0.0067   Epoch: 14   Global Step: 247120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:01,122-Speed 9556.61 samples/sec   Loss 4.3731   LearningRate 0.0067   Epoch: 14   Global Step: 247130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:02,207-Speed 9443.89 samples/sec   Loss 4.5220   LearningRate 0.0067   Epoch: 14   Global Step: 247140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:03,280-Speed 9548.99 samples/sec   Loss 4.4829   LearningRate 0.0067   Epoch: 14   Global Step: 247150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:04,354-Speed 9537.99 samples/sec   Loss 4.5029   LearningRate 0.0067   Epoch: 14   Global Step: 247160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:05,414-Speed 9669.46 samples/sec   Loss 4.4660   LearningRate 0.0067   Epoch: 14   Global Step: 247170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:06,518-Speed 9281.53 samples/sec   Loss 4.4861   LearningRate 0.0067   Epoch: 14   Global Step: 247180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:07,595-Speed 9512.70 samples/sec   Loss 4.4712   LearningRate 0.0067   Epoch: 14   Global Step: 247190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:08,661-Speed 9612.03 samples/sec   Loss 4.4967   LearningRate 0.0067   Epoch: 14   Global Step: 247200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:09,775-Speed 9197.32 samples/sec   Loss 4.4704   LearningRate 0.0067   Epoch: 14   Global Step: 247210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:10,855-Speed 9485.95 samples/sec   Loss 4.4270   LearningRate 0.0067   Epoch: 14   Global Step: 247220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:11,973-Speed 9166.00 samples/sec   Loss 4.4563   LearningRate 0.0067   Epoch: 14   Global Step: 247230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:13,084-Speed 9224.02 samples/sec   Loss 4.5118   LearningRate 0.0067   Epoch: 14   Global Step: 247240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:14,212-Speed 9082.88 samples/sec   Loss 4.3977   LearningRate 0.0067   Epoch: 14   Global Step: 247250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:15,292-Speed 9483.23 samples/sec   Loss 4.4588   LearningRate 0.0067   Epoch: 14   Global Step: 247260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:16,385-Speed 9373.63 samples/sec   Loss 4.3880   LearningRate 0.0067   Epoch: 14   Global Step: 247270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:17,439-Speed 9721.27 samples/sec   Loss 4.4963   LearningRate 0.0067   Epoch: 14   Global Step: 247280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:18,489-Speed 9756.20 samples/sec   Loss 4.4539   LearningRate 0.0067   Epoch: 14   Global Step: 247290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:19,567-Speed 9510.06 samples/sec   Loss 4.5018   LearningRate 0.0067   Epoch: 14   Global Step: 247300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:20,649-Speed 9462.93 samples/sec   Loss 4.5268   LearningRate 0.0067   Epoch: 14   Global Step: 247310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:21,765-Speed 9186.10 samples/sec   Loss 4.4245   LearningRate 0.0067   Epoch: 14   Global Step: 247320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:22,879-Speed 9201.67 samples/sec   Loss 4.4071   LearningRate 0.0067   Epoch: 14   Global Step: 247330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:23,988-Speed 9235.00 samples/sec   Loss 4.4121   LearningRate 0.0067   Epoch: 14   Global Step: 247340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:25,023-Speed 9895.97 samples/sec   Loss 4.5459   LearningRate 0.0067   Epoch: 14   Global Step: 247350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:26,073-Speed 9763.83 samples/sec   Loss 4.5407   LearningRate 0.0067   Epoch: 14   Global Step: 247360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:27,165-Speed 9378.46 samples/sec   Loss 4.3382   LearningRate 0.0067   Epoch: 14   Global Step: 247370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:28,283-Speed 9167.19 samples/sec   Loss 4.4054   LearningRate 0.0067   Epoch: 14   Global Step: 247380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:29,416-Speed 9038.99 samples/sec   Loss 4.4383   LearningRate 0.0067   Epoch: 14   Global Step: 247390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:30,514-Speed 9339.58 samples/sec   Loss 4.4522   LearningRate 0.0067   Epoch: 14   Global Step: 247400   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-11 21:31:31,630-Speed 9181.30 samples/sec   Loss 4.4551   LearningRate 0.0067   Epoch: 14   Global Step: 247410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:32,692-Speed 9647.83 samples/sec   Loss 4.4270   LearningRate 0.0067   Epoch: 14   Global Step: 247420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:33,789-Speed 9339.62 samples/sec   Loss 4.4902   LearningRate 0.0067   Epoch: 14   Global Step: 247430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:34,861-Speed 9555.24 samples/sec   Loss 4.4046   LearningRate 0.0067   Epoch: 14   Global Step: 247440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:35,955-Speed 9364.87 samples/sec   Loss 4.5236   LearningRate 0.0067   Epoch: 14   Global Step: 247450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:37,017-Speed 9649.66 samples/sec   Loss 4.4291   LearningRate 0.0067   Epoch: 14   Global Step: 247460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:38,077-Speed 9661.31 samples/sec   Loss 4.5323   LearningRate 0.0067   Epoch: 14   Global Step: 247470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:39,133-Speed 9709.31 samples/sec   Loss 4.4828   LearningRate 0.0067   Epoch: 14   Global Step: 247480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:40,204-Speed 9570.11 samples/sec   Loss 4.4530   LearningRate 0.0067   Epoch: 14   Global Step: 247490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:41,274-Speed 9577.54 samples/sec   Loss 4.3989   LearningRate 0.0067   Epoch: 14   Global Step: 247500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:42,313-Speed 9861.98 samples/sec   Loss 4.4588   LearningRate 0.0067   Epoch: 14   Global Step: 247510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:43,396-Speed 9453.93 samples/sec   Loss 4.4068   LearningRate 0.0067   Epoch: 14   Global Step: 247520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:44,462-Speed 9615.34 samples/sec   Loss 4.3775   LearningRate 0.0067   Epoch: 14   Global Step: 247530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:45,621-Speed 8836.10 samples/sec   Loss 4.6051   LearningRate 0.0067   Epoch: 14   Global Step: 247540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:46,684-Speed 9638.27 samples/sec   Loss 4.4515   LearningRate 0.0067   Epoch: 14   Global Step: 247550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:47,766-Speed 9476.92 samples/sec   Loss 4.2657   LearningRate 0.0067   Epoch: 14   Global Step: 247560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:48,834-Speed 9590.54 samples/sec   Loss 4.4732   LearningRate 0.0067   Epoch: 14   Global Step: 247570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:49,887-Speed 9734.57 samples/sec   Loss 4.4567   LearningRate 0.0067   Epoch: 14   Global Step: 247580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:50,916-Speed 9953.33 samples/sec   Loss 4.5225   LearningRate 0.0067   Epoch: 14   Global Step: 247590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:51,979-Speed 9638.09 samples/sec   Loss 4.3780   LearningRate 0.0067   Epoch: 14   Global Step: 247600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:31:53,088-Speed 9240.63 samples/sec   Loss 4.3358   LearningRate 0.0067   Epoch: 14   Global Step: 247610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:54,151-Speed 9641.20 samples/sec   Loss 4.4226   LearningRate 0.0067   Epoch: 14   Global Step: 247620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:55,267-Speed 9179.37 samples/sec   Loss 4.4350   LearningRate 0.0067   Epoch: 14   Global Step: 247630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:56,347-Speed 9487.61 samples/sec   Loss 4.4311   LearningRate 0.0067   Epoch: 14   Global Step: 247640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:57,439-Speed 9381.19 samples/sec   Loss 4.5929   LearningRate 0.0067   Epoch: 14   Global Step: 247650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:58,540-Speed 9311.59 samples/sec   Loss 4.4372   LearningRate 0.0067   Epoch: 14   Global Step: 247660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:31:59,680-Speed 8980.03 samples/sec   Loss 4.4614   LearningRate 0.0067   Epoch: 14   Global Step: 247670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:00,814-Speed 9036.26 samples/sec   Loss 4.5480   LearningRate 0.0067   Epoch: 14   Global Step: 247680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:01,913-Speed 9319.92 samples/sec   Loss 4.4972   LearningRate 0.0067   Epoch: 14   Global Step: 247690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:03,021-Speed 9248.12 samples/sec   Loss 4.5207   LearningRate 0.0067   Epoch: 14   Global Step: 247700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:04,109-Speed 9422.90 samples/sec   Loss 4.3861   LearningRate 0.0067   Epoch: 14   Global Step: 247710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:05,243-Speed 9034.56 samples/sec   Loss 4.4641   LearningRate 0.0067   Epoch: 14   Global Step: 247720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:06,339-Speed 9351.74 samples/sec   Loss 4.4445   LearningRate 0.0067   Epoch: 14   Global Step: 247730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:07,423-Speed 9450.96 samples/sec   Loss 4.4224   LearningRate 0.0066   Epoch: 14   Global Step: 247740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:08,499-Speed 9523.59 samples/sec   Loss 4.4072   LearningRate 0.0066   Epoch: 14   Global Step: 247750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:09,599-Speed 9314.74 samples/sec   Loss 4.4491   LearningRate 0.0066   Epoch: 14   Global Step: 247760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:10,701-Speed 9300.26 samples/sec   Loss 4.4013   LearningRate 0.0066   Epoch: 14   Global Step: 247770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:11,796-Speed 9357.40 samples/sec   Loss 4.4885   LearningRate 0.0066   Epoch: 14   Global Step: 247780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:12,874-Speed 9499.64 samples/sec   Loss 4.4887   LearningRate 0.0066   Epoch: 14   Global Step: 247790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:13,937-Speed 9645.90 samples/sec   Loss 4.4715   LearningRate 0.0066   Epoch: 14   Global Step: 247800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:14,981-Speed 9812.17 samples/sec   Loss 4.4543   LearningRate 0.0066   Epoch: 14   Global Step: 247810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:16,032-Speed 9761.20 samples/sec   Loss 4.4751   LearningRate 0.0066   Epoch: 14   Global Step: 247820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:17,133-Speed 9301.47 samples/sec   Loss 4.4659   LearningRate 0.0066   Epoch: 14   Global Step: 247830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:18,185-Speed 9745.22 samples/sec   Loss 4.4166   LearningRate 0.0066   Epoch: 14   Global Step: 247840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:19,257-Speed 9557.35 samples/sec   Loss 4.4781   LearningRate 0.0066   Epoch: 14   Global Step: 247850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:20,380-Speed 9125.27 samples/sec   Loss 4.4721   LearningRate 0.0066   Epoch: 14   Global Step: 247860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:21,455-Speed 9524.26 samples/sec   Loss 4.4047   LearningRate 0.0066   Epoch: 14   Global Step: 247870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:22,560-Speed 9273.01 samples/sec   Loss 4.4482   LearningRate 0.0066   Epoch: 14   Global Step: 247880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:23,659-Speed 9327.84 samples/sec   Loss 4.4244   LearningRate 0.0066   Epoch: 14   Global Step: 247890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:24,756-Speed 9341.11 samples/sec   Loss 4.3668   LearningRate 0.0066   Epoch: 14   Global Step: 247900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:32:25,829-Speed 9546.98 samples/sec   Loss 4.3897   LearningRate 0.0066   Epoch: 14   Global Step: 247910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:26,945-Speed 9183.94 samples/sec   Loss 4.4428   LearningRate 0.0066   Epoch: 14   Global Step: 247920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:28,065-Speed 9151.53 samples/sec   Loss 4.4575   LearningRate 0.0066   Epoch: 14   Global Step: 247930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:29,188-Speed 9120.33 samples/sec   Loss 4.4654   LearningRate 0.0066   Epoch: 14   Global Step: 247940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:30,303-Speed 9188.85 samples/sec   Loss 4.3813   LearningRate 0.0066   Epoch: 14   Global Step: 247950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:31,431-Speed 9081.84 samples/sec   Loss 4.4997   LearningRate 0.0066   Epoch: 14   Global Step: 247960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:32,528-Speed 9340.81 samples/sec   Loss 4.4408   LearningRate 0.0066   Epoch: 14   Global Step: 247970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:33,614-Speed 9437.80 samples/sec   Loss 4.4398   LearningRate 0.0066   Epoch: 14   Global Step: 247980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:34,686-Speed 9555.51 samples/sec   Loss 4.4027   LearningRate 0.0066   Epoch: 14   Global Step: 247990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:35,770-Speed 9451.35 samples/sec   Loss 4.3953   LearningRate 0.0066   Epoch: 14   Global Step: 248000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:32:57,638-[lfw][248000]XNorm: 7.377807
Training: 2022-04-11 21:32:57,638-[lfw][248000]Accuracy-Flip: 0.99600+-0.00281
Training: 2022-04-11 21:32:57,639-[lfw][248000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:33:22,930-[cfp_fp][248000]XNorm: 6.355365
Training: 2022-04-11 21:33:22,931-[cfp_fp][248000]Accuracy-Flip: 0.96800+-0.00980
Training: 2022-04-11 21:33:22,931-[cfp_fp][248000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:33:44,762-[agedb_30][248000]XNorm: 7.191690
Training: 2022-04-11 21:33:44,763-[agedb_30][248000]Accuracy-Flip: 0.97350+-0.00828
Training: 2022-04-11 21:33:44,763-[agedb_30][248000]Accuracy-Highest: 0.97350
Training: 2022-04-11 21:33:45,872-Speed 146.08 samples/sec   Loss 4.4540   LearningRate 0.0066   Epoch: 14   Global Step: 248010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:33:46,948-Speed 9521.21 samples/sec   Loss 4.4630   LearningRate 0.0066   Epoch: 14   Global Step: 248020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:33:48,045-Speed 9343.85 samples/sec   Loss 4.4350   LearningRate 0.0066   Epoch: 14   Global Step: 248030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:33:49,158-Speed 9205.35 samples/sec   Loss 4.4011   LearningRate 0.0066   Epoch: 14   Global Step: 248040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:33:50,248-Speed 9395.67 samples/sec   Loss 4.4670   LearningRate 0.0066   Epoch: 14   Global Step: 248050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:33:51,349-Speed 9305.68 samples/sec   Loss 4.4717   LearningRate 0.0066   Epoch: 14   Global Step: 248060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:33:52,439-Speed 9405.33 samples/sec   Loss 4.4463   LearningRate 0.0066   Epoch: 14   Global Step: 248070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:33:53,489-Speed 9754.86 samples/sec   Loss 4.4349   LearningRate 0.0066   Epoch: 14   Global Step: 248080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:33:54,582-Speed 9378.72 samples/sec   Loss 4.4074   LearningRate 0.0066   Epoch: 14   Global Step: 248090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:33:55,683-Speed 9305.46 samples/sec   Loss 4.4078   LearningRate 0.0066   Epoch: 14   Global Step: 248100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:33:56,795-Speed 9210.90 samples/sec   Loss 4.4617   LearningRate 0.0066   Epoch: 14   Global Step: 248110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:33:57,869-Speed 9534.70 samples/sec   Loss 4.3979   LearningRate 0.0066   Epoch: 14   Global Step: 248120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:33:58,947-Speed 9511.03 samples/sec   Loss 4.4293   LearningRate 0.0066   Epoch: 14   Global Step: 248130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:00,045-Speed 9326.39 samples/sec   Loss 4.4536   LearningRate 0.0066   Epoch: 14   Global Step: 248140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:01,113-Speed 9592.96 samples/sec   Loss 4.4265   LearningRate 0.0066   Epoch: 14   Global Step: 248150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:02,174-Speed 9659.72 samples/sec   Loss 4.5242   LearningRate 0.0066   Epoch: 14   Global Step: 248160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:03,274-Speed 9312.96 samples/sec   Loss 4.5184   LearningRate 0.0066   Epoch: 14   Global Step: 248170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:04,389-Speed 9193.89 samples/sec   Loss 4.5075   LearningRate 0.0066   Epoch: 14   Global Step: 248180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:05,504-Speed 9185.26 samples/sec   Loss 4.4505   LearningRate 0.0066   Epoch: 14   Global Step: 248190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:06,559-Speed 9709.53 samples/sec   Loss 4.4970   LearningRate 0.0066   Epoch: 14   Global Step: 248200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:07,654-Speed 9360.17 samples/sec   Loss 4.4709   LearningRate 0.0066   Epoch: 14   Global Step: 248210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:08,772-Speed 9168.45 samples/sec   Loss 4.4398   LearningRate 0.0066   Epoch: 14   Global Step: 248220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:09,870-Speed 9329.95 samples/sec   Loss 4.4085   LearningRate 0.0066   Epoch: 14   Global Step: 248230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:10,946-Speed 9522.06 samples/sec   Loss 4.3575   LearningRate 0.0066   Epoch: 14   Global Step: 248240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:12,070-Speed 9115.82 samples/sec   Loss 4.4376   LearningRate 0.0066   Epoch: 14   Global Step: 248250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:13,163-Speed 9376.72 samples/sec   Loss 4.5176   LearningRate 0.0066   Epoch: 14   Global Step: 248260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:14,237-Speed 9537.88 samples/sec   Loss 4.4801   LearningRate 0.0066   Epoch: 14   Global Step: 248270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:15,328-Speed 9390.13 samples/sec   Loss 4.5333   LearningRate 0.0066   Epoch: 14   Global Step: 248280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:16,404-Speed 9525.61 samples/sec   Loss 4.4361   LearningRate 0.0066   Epoch: 14   Global Step: 248290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:17,521-Speed 9170.31 samples/sec   Loss 4.5095   LearningRate 0.0066   Epoch: 14   Global Step: 248300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:18,636-Speed 9193.09 samples/sec   Loss 4.4474   LearningRate 0.0066   Epoch: 14   Global Step: 248310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:19,715-Speed 9496.44 samples/sec   Loss 4.4859   LearningRate 0.0066   Epoch: 14   Global Step: 248320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:20,771-Speed 9704.55 samples/sec   Loss 4.4244   LearningRate 0.0066   Epoch: 14   Global Step: 248330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:21,873-Speed 9302.58 samples/sec   Loss 4.4889   LearningRate 0.0066   Epoch: 14   Global Step: 248340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:22,947-Speed 9532.04 samples/sec   Loss 4.3984   LearningRate 0.0066   Epoch: 14   Global Step: 248350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:24,061-Speed 9204.50 samples/sec   Loss 4.4375   LearningRate 0.0066   Epoch: 14   Global Step: 248360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:25,159-Speed 9330.26 samples/sec   Loss 4.5056   LearningRate 0.0066   Epoch: 14   Global Step: 248370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:26,222-Speed 9633.08 samples/sec   Loss 4.4489   LearningRate 0.0066   Epoch: 14   Global Step: 248380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:27,301-Speed 9496.36 samples/sec   Loss 4.4358   LearningRate 0.0065   Epoch: 14   Global Step: 248390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:28,397-Speed 9351.13 samples/sec   Loss 4.4648   LearningRate 0.0065   Epoch: 14   Global Step: 248400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:29,458-Speed 9653.79 samples/sec   Loss 4.4314   LearningRate 0.0065   Epoch: 14   Global Step: 248410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:30,579-Speed 9139.80 samples/sec   Loss 4.4412   LearningRate 0.0065   Epoch: 14   Global Step: 248420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:31,708-Speed 9078.98 samples/sec   Loss 4.3827   LearningRate 0.0065   Epoch: 14   Global Step: 248430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:32,799-Speed 9393.13 samples/sec   Loss 4.3911   LearningRate 0.0065   Epoch: 14   Global Step: 248440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:33,881-Speed 9480.74 samples/sec   Loss 4.4652   LearningRate 0.0065   Epoch: 14   Global Step: 248450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:34,982-Speed 9303.15 samples/sec   Loss 4.4497   LearningRate 0.0065   Epoch: 14   Global Step: 248460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:36,104-Speed 9127.98 samples/sec   Loss 4.4596   LearningRate 0.0065   Epoch: 14   Global Step: 248470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:37,171-Speed 9608.66 samples/sec   Loss 4.4496   LearningRate 0.0065   Epoch: 14   Global Step: 248480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:38,227-Speed 9697.84 samples/sec   Loss 4.4096   LearningRate 0.0065   Epoch: 14   Global Step: 248490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:39,279-Speed 9740.89 samples/sec   Loss 4.5496   LearningRate 0.0065   Epoch: 14   Global Step: 248500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:40,343-Speed 9633.91 samples/sec   Loss 4.4728   LearningRate 0.0065   Epoch: 14   Global Step: 248510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:41,427-Speed 9450.02 samples/sec   Loss 4.5480   LearningRate 0.0065   Epoch: 14   Global Step: 248520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:42,502-Speed 9527.04 samples/sec   Loss 4.4710   LearningRate 0.0065   Epoch: 14   Global Step: 248530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:43,578-Speed 9525.63 samples/sec   Loss 4.3678   LearningRate 0.0065   Epoch: 14   Global Step: 248540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:44,690-Speed 9211.76 samples/sec   Loss 4.5416   LearningRate 0.0065   Epoch: 14   Global Step: 248550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:45,789-Speed 9334.74 samples/sec   Loss 4.5371   LearningRate 0.0065   Epoch: 14   Global Step: 248560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:46,848-Speed 9675.47 samples/sec   Loss 4.4859   LearningRate 0.0065   Epoch: 14   Global Step: 248570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:47,931-Speed 9455.27 samples/sec   Loss 4.4285   LearningRate 0.0065   Epoch: 14   Global Step: 248580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:49,085-Speed 8877.40 samples/sec   Loss 4.3999   LearningRate 0.0065   Epoch: 14   Global Step: 248590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:50,139-Speed 9725.67 samples/sec   Loss 4.4834   LearningRate 0.0065   Epoch: 14   Global Step: 248600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:51,208-Speed 9589.22 samples/sec   Loss 4.4723   LearningRate 0.0065   Epoch: 14   Global Step: 248610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:52,333-Speed 9108.23 samples/sec   Loss 4.4541   LearningRate 0.0065   Epoch: 14   Global Step: 248620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:53,401-Speed 9591.67 samples/sec   Loss 4.4288   LearningRate 0.0065   Epoch: 14   Global Step: 248630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:54,496-Speed 9351.42 samples/sec   Loss 4.3826   LearningRate 0.0065   Epoch: 14   Global Step: 248640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:55,633-Speed 9016.01 samples/sec   Loss 4.4258   LearningRate 0.0065   Epoch: 14   Global Step: 248650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:56,735-Speed 9292.50 samples/sec   Loss 4.4593   LearningRate 0.0065   Epoch: 14   Global Step: 248660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:34:57,858-Speed 9132.05 samples/sec   Loss 4.4984   LearningRate 0.0065   Epoch: 14   Global Step: 248670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:34:58,913-Speed 9706.97 samples/sec   Loss 4.4785   LearningRate 0.0065   Epoch: 14   Global Step: 248680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:35:00,005-Speed 9382.60 samples/sec   Loss 4.4823   LearningRate 0.0065   Epoch: 14   Global Step: 248690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:01,125-Speed 9149.85 samples/sec   Loss 4.5059   LearningRate 0.0065   Epoch: 14   Global Step: 248700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:02,217-Speed 9380.14 samples/sec   Loss 4.3256   LearningRate 0.0065   Epoch: 14   Global Step: 248710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:03,368-Speed 8904.85 samples/sec   Loss 4.4706   LearningRate 0.0065   Epoch: 14   Global Step: 248720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:04,466-Speed 9331.37 samples/sec   Loss 4.4293   LearningRate 0.0065   Epoch: 14   Global Step: 248730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:05,562-Speed 9349.27 samples/sec   Loss 4.4588   LearningRate 0.0065   Epoch: 14   Global Step: 248740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:06,625-Speed 9636.37 samples/sec   Loss 4.4325   LearningRate 0.0065   Epoch: 14   Global Step: 248750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:07,669-Speed 9809.39 samples/sec   Loss 4.3976   LearningRate 0.0065   Epoch: 14   Global Step: 248760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:08,772-Speed 9294.68 samples/sec   Loss 4.3965   LearningRate 0.0065   Epoch: 14   Global Step: 248770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:09,841-Speed 9588.09 samples/sec   Loss 4.4420   LearningRate 0.0065   Epoch: 14   Global Step: 248780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:10,910-Speed 9586.12 samples/sec   Loss 4.3875   LearningRate 0.0065   Epoch: 14   Global Step: 248790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:11,983-Speed 9549.66 samples/sec   Loss 4.5206   LearningRate 0.0065   Epoch: 14   Global Step: 248800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:13,021-Speed 9870.71 samples/sec   Loss 4.4422   LearningRate 0.0065   Epoch: 14   Global Step: 248810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:14,108-Speed 9429.90 samples/sec   Loss 4.4732   LearningRate 0.0065   Epoch: 14   Global Step: 248820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:15,157-Speed 9767.74 samples/sec   Loss 4.4271   LearningRate 0.0065   Epoch: 14   Global Step: 248830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:16,203-Speed 9799.96 samples/sec   Loss 4.4012   LearningRate 0.0065   Epoch: 14   Global Step: 248840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:17,280-Speed 9513.25 samples/sec   Loss 4.5663   LearningRate 0.0065   Epoch: 14   Global Step: 248850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:18,402-Speed 9126.69 samples/sec   Loss 4.3996   LearningRate 0.0065   Epoch: 14   Global Step: 248860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:19,511-Speed 9235.81 samples/sec   Loss 4.5020   LearningRate 0.0065   Epoch: 14   Global Step: 248870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:20,583-Speed 9561.76 samples/sec   Loss 4.4330   LearningRate 0.0065   Epoch: 14   Global Step: 248880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:21,662-Speed 9492.42 samples/sec   Loss 4.4856   LearningRate 0.0065   Epoch: 14   Global Step: 248890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:35:22,728-Speed 9612.79 samples/sec   Loss 4.4159   LearningRate 0.0065   Epoch: 14   Global Step: 248900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:35:23,800-Speed 9559.78 samples/sec   Loss 4.4958   LearningRate 0.0065   Epoch: 14   Global Step: 248910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:35:24,857-Speed 9686.79 samples/sec   Loss 4.5128   LearningRate 0.0065   Epoch: 14   Global Step: 248920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:25,941-Speed 9455.12 samples/sec   Loss 4.4449   LearningRate 0.0065   Epoch: 14   Global Step: 248930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:27,055-Speed 9200.49 samples/sec   Loss 4.3992   LearningRate 0.0065   Epoch: 14   Global Step: 248940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:28,152-Speed 9343.47 samples/sec   Loss 4.4234   LearningRate 0.0065   Epoch: 14   Global Step: 248950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:29,199-Speed 9782.77 samples/sec   Loss 4.5084   LearningRate 0.0065   Epoch: 14   Global Step: 248960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:30,286-Speed 9431.51 samples/sec   Loss 4.4446   LearningRate 0.0065   Epoch: 14   Global Step: 248970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:31,373-Speed 9423.57 samples/sec   Loss 4.4391   LearningRate 0.0065   Epoch: 14   Global Step: 248980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:32,488-Speed 9186.44 samples/sec   Loss 4.4667   LearningRate 0.0065   Epoch: 14   Global Step: 248990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:33,554-Speed 9611.71 samples/sec   Loss 4.4823   LearningRate 0.0065   Epoch: 14   Global Step: 249000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:34,657-Speed 9286.44 samples/sec   Loss 4.3338   LearningRate 0.0065   Epoch: 14   Global Step: 249010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:35,774-Speed 9175.79 samples/sec   Loss 4.4783   LearningRate 0.0065   Epoch: 14   Global Step: 249020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:35:36,862-Speed 9417.38 samples/sec   Loss 4.4075   LearningRate 0.0065   Epoch: 14   Global Step: 249030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:35:37,919-Speed 9695.18 samples/sec   Loss 4.4636   LearningRate 0.0065   Epoch: 14   Global Step: 249040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:39,020-Speed 9308.00 samples/sec   Loss 4.4000   LearningRate 0.0064   Epoch: 14   Global Step: 249050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:40,091-Speed 9566.16 samples/sec   Loss 4.4736   LearningRate 0.0064   Epoch: 14   Global Step: 249060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:41,188-Speed 9334.21 samples/sec   Loss 4.5353   LearningRate 0.0064   Epoch: 14   Global Step: 249070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:42,298-Speed 9235.11 samples/sec   Loss 4.3922   LearningRate 0.0064   Epoch: 14   Global Step: 249080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:43,390-Speed 9384.56 samples/sec   Loss 4.5090   LearningRate 0.0064   Epoch: 14   Global Step: 249090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:44,432-Speed 9829.27 samples/sec   Loss 4.4021   LearningRate 0.0064   Epoch: 14   Global Step: 249100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:45,513-Speed 9481.48 samples/sec   Loss 4.5009   LearningRate 0.0064   Epoch: 14   Global Step: 249110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:46,627-Speed 9201.58 samples/sec   Loss 4.4660   LearningRate 0.0064   Epoch: 14   Global Step: 249120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:47,723-Speed 9347.89 samples/sec   Loss 4.4449   LearningRate 0.0064   Epoch: 14   Global Step: 249130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:48,861-Speed 8999.96 samples/sec   Loss 4.3950   LearningRate 0.0064   Epoch: 14   Global Step: 249140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:35:49,905-Speed 9816.95 samples/sec   Loss 4.4353   LearningRate 0.0064   Epoch: 14   Global Step: 249150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:50,973-Speed 9596.89 samples/sec   Loss 4.4668   LearningRate 0.0064   Epoch: 14   Global Step: 249160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:52,107-Speed 9031.93 samples/sec   Loss 4.4191   LearningRate 0.0064   Epoch: 14   Global Step: 249170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:53,173-Speed 9608.24 samples/sec   Loss 4.2824   LearningRate 0.0064   Epoch: 14   Global Step: 249180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:54,225-Speed 9745.25 samples/sec   Loss 4.4494   LearningRate 0.0064   Epoch: 14   Global Step: 249190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:55,338-Speed 9203.77 samples/sec   Loss 4.4797   LearningRate 0.0064   Epoch: 14   Global Step: 249200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:56,464-Speed 9096.81 samples/sec   Loss 4.4069   LearningRate 0.0064   Epoch: 14   Global Step: 249210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:57,532-Speed 9598.45 samples/sec   Loss 4.4044   LearningRate 0.0064   Epoch: 14   Global Step: 249220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:58,597-Speed 9615.49 samples/sec   Loss 4.4151   LearningRate 0.0064   Epoch: 14   Global Step: 249230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:35:59,656-Speed 9683.33 samples/sec   Loss 4.4488   LearningRate 0.0064   Epoch: 14   Global Step: 249240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:00,715-Speed 9671.55 samples/sec   Loss 4.5099   LearningRate 0.0064   Epoch: 14   Global Step: 249250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:01,785-Speed 9578.74 samples/sec   Loss 4.4531   LearningRate 0.0064   Epoch: 14   Global Step: 249260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:02,899-Speed 9201.76 samples/sec   Loss 4.4893   LearningRate 0.0064   Epoch: 14   Global Step: 249270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:03,996-Speed 9334.84 samples/sec   Loss 4.4551   LearningRate 0.0064   Epoch: 14   Global Step: 249280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:05,123-Speed 9094.80 samples/sec   Loss 4.4811   LearningRate 0.0064   Epoch: 14   Global Step: 249290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:06,197-Speed 9541.77 samples/sec   Loss 4.4064   LearningRate 0.0064   Epoch: 14   Global Step: 249300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:07,248-Speed 9745.16 samples/sec   Loss 4.4592   LearningRate 0.0064   Epoch: 14   Global Step: 249310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:08,304-Speed 9702.29 samples/sec   Loss 4.4038   LearningRate 0.0064   Epoch: 14   Global Step: 249320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:09,418-Speed 9203.85 samples/sec   Loss 4.4059   LearningRate 0.0064   Epoch: 14   Global Step: 249330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:10,522-Speed 9274.42 samples/sec   Loss 4.3713   LearningRate 0.0064   Epoch: 14   Global Step: 249340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:11,625-Speed 9288.53 samples/sec   Loss 4.5061   LearningRate 0.0064   Epoch: 14   Global Step: 249350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:12,751-Speed 9104.85 samples/sec   Loss 4.4836   LearningRate 0.0064   Epoch: 14   Global Step: 249360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:13,807-Speed 9699.01 samples/sec   Loss 4.5065   LearningRate 0.0064   Epoch: 14   Global Step: 249370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:14,846-Speed 9858.66 samples/sec   Loss 4.4674   LearningRate 0.0064   Epoch: 14   Global Step: 249380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:15,946-Speed 9322.68 samples/sec   Loss 4.4141   LearningRate 0.0064   Epoch: 14   Global Step: 249390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:17,039-Speed 9371.44 samples/sec   Loss 4.4026   LearningRate 0.0064   Epoch: 14   Global Step: 249400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:18,168-Speed 9075.54 samples/sec   Loss 4.5242   LearningRate 0.0064   Epoch: 14   Global Step: 249410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:19,248-Speed 9483.10 samples/sec   Loss 4.5057   LearningRate 0.0064   Epoch: 14   Global Step: 249420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:20,305-Speed 9698.00 samples/sec   Loss 4.4137   LearningRate 0.0064   Epoch: 14   Global Step: 249430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:21,351-Speed 9794.07 samples/sec   Loss 4.4319   LearningRate 0.0064   Epoch: 14   Global Step: 249440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:22,415-Speed 9631.91 samples/sec   Loss 4.4891   LearningRate 0.0064   Epoch: 14   Global Step: 249450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:23,489-Speed 9540.80 samples/sec   Loss 4.3428   LearningRate 0.0064   Epoch: 14   Global Step: 249460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:24,561-Speed 9558.98 samples/sec   Loss 4.3802   LearningRate 0.0064   Epoch: 14   Global Step: 249470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:25,648-Speed 9428.57 samples/sec   Loss 4.4059   LearningRate 0.0064   Epoch: 14   Global Step: 249480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:26,723-Speed 9527.52 samples/sec   Loss 4.3775   LearningRate 0.0064   Epoch: 14   Global Step: 249490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:27,787-Speed 9623.70 samples/sec   Loss 4.4437   LearningRate 0.0064   Epoch: 14   Global Step: 249500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:28,893-Speed 9267.03 samples/sec   Loss 4.4975   LearningRate 0.0064   Epoch: 14   Global Step: 249510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:29,965-Speed 9562.80 samples/sec   Loss 4.3579   LearningRate 0.0064   Epoch: 14   Global Step: 249520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:31,029-Speed 9630.39 samples/sec   Loss 4.4161   LearningRate 0.0064   Epoch: 14   Global Step: 249530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:32,125-Speed 9347.20 samples/sec   Loss 4.5017   LearningRate 0.0064   Epoch: 14   Global Step: 249540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:33,183-Speed 9677.49 samples/sec   Loss 4.4790   LearningRate 0.0064   Epoch: 14   Global Step: 249550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:34,236-Speed 9737.08 samples/sec   Loss 4.5556   LearningRate 0.0064   Epoch: 14   Global Step: 249560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:35,336-Speed 9310.91 samples/sec   Loss 4.5091   LearningRate 0.0064   Epoch: 14   Global Step: 249570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:36,397-Speed 9654.07 samples/sec   Loss 4.4040   LearningRate 0.0064   Epoch: 14   Global Step: 249580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:37,453-Speed 9703.93 samples/sec   Loss 4.4376   LearningRate 0.0064   Epoch: 14   Global Step: 249590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:38,573-Speed 9153.53 samples/sec   Loss 4.4561   LearningRate 0.0064   Epoch: 14   Global Step: 249600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:39,655-Speed 9474.73 samples/sec   Loss 4.5395   LearningRate 0.0064   Epoch: 14   Global Step: 249610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:40,723-Speed 9595.30 samples/sec   Loss 4.4378   LearningRate 0.0064   Epoch: 14   Global Step: 249620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:41,787-Speed 9629.64 samples/sec   Loss 4.4597   LearningRate 0.0064   Epoch: 14   Global Step: 249630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:42,893-Speed 9260.29 samples/sec   Loss 4.4758   LearningRate 0.0064   Epoch: 14   Global Step: 249640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:44,005-Speed 9214.24 samples/sec   Loss 4.4195   LearningRate 0.0064   Epoch: 14   Global Step: 249650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:45,072-Speed 9607.66 samples/sec   Loss 4.3638   LearningRate 0.0064   Epoch: 14   Global Step: 249660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:46,156-Speed 9451.22 samples/sec   Loss 4.4654   LearningRate 0.0064   Epoch: 14   Global Step: 249670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:47,267-Speed 9226.51 samples/sec   Loss 4.4705   LearningRate 0.0064   Epoch: 14   Global Step: 249680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:48,301-Speed 9910.15 samples/sec   Loss 4.3656   LearningRate 0.0064   Epoch: 14   Global Step: 249690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:49,435-Speed 9032.23 samples/sec   Loss 4.3874   LearningRate 0.0064   Epoch: 14   Global Step: 249700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:50,548-Speed 9205.88 samples/sec   Loss 4.4336   LearningRate 0.0063   Epoch: 14   Global Step: 249710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:36:51,616-Speed 9591.99 samples/sec   Loss 4.5154   LearningRate 0.0063   Epoch: 14   Global Step: 249720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:52,686-Speed 9572.38 samples/sec   Loss 4.4798   LearningRate 0.0063   Epoch: 14   Global Step: 249730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:53,776-Speed 9396.88 samples/sec   Loss 4.4403   LearningRate 0.0063   Epoch: 14   Global Step: 249740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:54,933-Speed 8858.87 samples/sec   Loss 4.5231   LearningRate 0.0063   Epoch: 14   Global Step: 249750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:55,988-Speed 9710.71 samples/sec   Loss 4.3819   LearningRate 0.0063   Epoch: 14   Global Step: 249760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:57,059-Speed 9563.73 samples/sec   Loss 4.4750   LearningRate 0.0063   Epoch: 14   Global Step: 249770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:58,091-Speed 9934.58 samples/sec   Loss 4.4370   LearningRate 0.0063   Epoch: 14   Global Step: 249780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:36:59,120-Speed 9958.43 samples/sec   Loss 4.4496   LearningRate 0.0063   Epoch: 14   Global Step: 249790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:00,225-Speed 9276.88 samples/sec   Loss 4.3742   LearningRate 0.0063   Epoch: 14   Global Step: 249800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:01,285-Speed 9659.80 samples/sec   Loss 4.5376   LearningRate 0.0063   Epoch: 14   Global Step: 249810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:02,363-Speed 9506.71 samples/sec   Loss 4.3668   LearningRate 0.0063   Epoch: 14   Global Step: 249820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:03,425-Speed 9644.31 samples/sec   Loss 4.4156   LearningRate 0.0063   Epoch: 14   Global Step: 249830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:37:04,540-Speed 9190.67 samples/sec   Loss 4.4819   LearningRate 0.0063   Epoch: 14   Global Step: 249840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:37:05,641-Speed 9305.22 samples/sec   Loss 4.3679   LearningRate 0.0063   Epoch: 14   Global Step: 249850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:37:06,692-Speed 9752.62 samples/sec   Loss 4.4129   LearningRate 0.0063   Epoch: 14   Global Step: 249860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:37:07,759-Speed 9600.78 samples/sec   Loss 4.4897   LearningRate 0.0063   Epoch: 14   Global Step: 249870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:37:08,848-Speed 9413.32 samples/sec   Loss 4.3985   LearningRate 0.0063   Epoch: 14   Global Step: 249880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:37:09,928-Speed 9481.07 samples/sec   Loss 4.4317   LearningRate 0.0063   Epoch: 14   Global Step: 249890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:37:10,985-Speed 9698.58 samples/sec   Loss 4.4115   LearningRate 0.0063   Epoch: 14   Global Step: 249900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:37:12,076-Speed 9387.70 samples/sec   Loss 4.3899   LearningRate 0.0063   Epoch: 14   Global Step: 249910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:37:13,139-Speed 9639.24 samples/sec   Loss 4.4865   LearningRate 0.0063   Epoch: 14   Global Step: 249920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:37:14,221-Speed 9464.35 samples/sec   Loss 4.4083   LearningRate 0.0063   Epoch: 14   Global Step: 249930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:15,294-Speed 9555.83 samples/sec   Loss 4.4873   LearningRate 0.0063   Epoch: 14   Global Step: 249940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:16,388-Speed 9364.66 samples/sec   Loss 4.3430   LearningRate 0.0063   Epoch: 14   Global Step: 249950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:17,465-Speed 9514.76 samples/sec   Loss 4.4572   LearningRate 0.0063   Epoch: 14   Global Step: 249960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:18,559-Speed 9367.60 samples/sec   Loss 4.4778   LearningRate 0.0063   Epoch: 14   Global Step: 249970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:19,672-Speed 9202.68 samples/sec   Loss 4.4608   LearningRate 0.0063   Epoch: 14   Global Step: 249980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:20,775-Speed 9293.06 samples/sec   Loss 4.4632   LearningRate 0.0063   Epoch: 14   Global Step: 249990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:21,858-Speed 9459.69 samples/sec   Loss 4.4505   LearningRate 0.0063   Epoch: 14   Global Step: 250000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:37:43,946-[lfw][250000]XNorm: 7.486033
Training: 2022-04-11 21:37:43,947-[lfw][250000]Accuracy-Flip: 0.99617+-0.00224
Training: 2022-04-11 21:37:43,947-[lfw][250000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:38:09,429-[cfp_fp][250000]XNorm: 6.442418
Training: 2022-04-11 21:38:09,430-[cfp_fp][250000]Accuracy-Flip: 0.96886+-0.00983
Training: 2022-04-11 21:38:09,430-[cfp_fp][250000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:38:31,399-[agedb_30][250000]XNorm: 7.234231
Training: 2022-04-11 21:38:31,400-[agedb_30][250000]Accuracy-Flip: 0.97100+-0.00867
Training: 2022-04-11 21:38:31,400-[agedb_30][250000]Accuracy-Highest: 0.97350
Training: 2022-04-11 21:38:32,485-Speed 144.99 samples/sec   Loss 4.4628   LearningRate 0.0063   Epoch: 14   Global Step: 250010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:33,603-Speed 9166.87 samples/sec   Loss 4.3672   LearningRate 0.0063   Epoch: 14   Global Step: 250020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:34,700-Speed 9340.65 samples/sec   Loss 4.4093   LearningRate 0.0063   Epoch: 14   Global Step: 250030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:35,791-Speed 9389.31 samples/sec   Loss 4.4830   LearningRate 0.0063   Epoch: 14   Global Step: 250040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:36,879-Speed 9423.16 samples/sec   Loss 4.4056   LearningRate 0.0063   Epoch: 14   Global Step: 250050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:38:37,951-Speed 9560.92 samples/sec   Loss 4.3609   LearningRate 0.0063   Epoch: 14   Global Step: 250060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:38:39,037-Speed 9532.92 samples/sec   Loss 4.4189   LearningRate 0.0063   Epoch: 14   Global Step: 250070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:38:40,157-Speed 9151.48 samples/sec   Loss 4.4376   LearningRate 0.0063   Epoch: 14   Global Step: 250080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:38:41,240-Speed 9460.10 samples/sec   Loss 4.4340   LearningRate 0.0063   Epoch: 14   Global Step: 250090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:38:42,307-Speed 9601.20 samples/sec   Loss 4.4443   LearningRate 0.0063   Epoch: 14   Global Step: 250100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:38:43,393-Speed 9431.83 samples/sec   Loss 4.4459   LearningRate 0.0063   Epoch: 14   Global Step: 250110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:38:44,496-Speed 9297.45 samples/sec   Loss 4.4210   LearningRate 0.0063   Epoch: 14   Global Step: 250120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:38:45,573-Speed 9506.75 samples/sec   Loss 4.4598   LearningRate 0.0063   Epoch: 14   Global Step: 250130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:38:46,666-Speed 9376.79 samples/sec   Loss 4.5550   LearningRate 0.0063   Epoch: 14   Global Step: 250140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:38:47,749-Speed 9463.64 samples/sec   Loss 4.5029   LearningRate 0.0063   Epoch: 14   Global Step: 250150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:48,876-Speed 9093.33 samples/sec   Loss 4.4675   LearningRate 0.0063   Epoch: 14   Global Step: 250160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:49,956-Speed 9482.25 samples/sec   Loss 4.3748   LearningRate 0.0063   Epoch: 14   Global Step: 250170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:51,020-Speed 9629.54 samples/sec   Loss 4.3865   LearningRate 0.0063   Epoch: 14   Global Step: 250180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:52,087-Speed 9602.34 samples/sec   Loss 4.4796   LearningRate 0.0063   Epoch: 14   Global Step: 250190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:53,167-Speed 9488.76 samples/sec   Loss 4.4347   LearningRate 0.0063   Epoch: 14   Global Step: 250200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:54,269-Speed 9294.09 samples/sec   Loss 4.3875   LearningRate 0.0063   Epoch: 14   Global Step: 250210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:55,364-Speed 9357.93 samples/sec   Loss 4.4620   LearningRate 0.0063   Epoch: 14   Global Step: 250220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:56,424-Speed 9669.28 samples/sec   Loss 4.5197   LearningRate 0.0063   Epoch: 14   Global Step: 250230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:57,521-Speed 9339.18 samples/sec   Loss 4.3506   LearningRate 0.0063   Epoch: 14   Global Step: 250240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:38:58,621-Speed 9316.91 samples/sec   Loss 4.3607   LearningRate 0.0063   Epoch: 14   Global Step: 250250   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-11 21:38:59,730-Speed 9236.59 samples/sec   Loss 4.4107   LearningRate 0.0063   Epoch: 14   Global Step: 250260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:00,868-Speed 9001.52 samples/sec   Loss 4.4246   LearningRate 0.0063   Epoch: 14   Global Step: 250270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:02,028-Speed 8832.77 samples/sec   Loss 4.3925   LearningRate 0.0063   Epoch: 14   Global Step: 250280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:03,123-Speed 9354.97 samples/sec   Loss 4.4575   LearningRate 0.0063   Epoch: 14   Global Step: 250290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:04,199-Speed 9527.17 samples/sec   Loss 4.4048   LearningRate 0.0063   Epoch: 14   Global Step: 250300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:05,257-Speed 9683.42 samples/sec   Loss 4.4294   LearningRate 0.0063   Epoch: 14   Global Step: 250310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:06,336-Speed 9500.49 samples/sec   Loss 4.3876   LearningRate 0.0063   Epoch: 14   Global Step: 250320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:07,409-Speed 9563.68 samples/sec   Loss 4.4252   LearningRate 0.0063   Epoch: 14   Global Step: 250330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:08,495-Speed 9438.74 samples/sec   Loss 4.3780   LearningRate 0.0063   Epoch: 14   Global Step: 250340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:09,649-Speed 8875.96 samples/sec   Loss 4.5819   LearningRate 0.0063   Epoch: 14   Global Step: 250350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:10,922-Speed 8047.76 samples/sec   Loss 4.3836   LearningRate 0.0063   Epoch: 14   Global Step: 250360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:51,369-Speed 253.18 samples/sec   Loss 4.1669   LearningRate 0.0062   Epoch: 15   Global Step: 250370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:52,765-Speed 7342.38 samples/sec   Loss 3.8133   LearningRate 0.0062   Epoch: 15   Global Step: 250380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:54,341-Speed 6503.48 samples/sec   Loss 3.7623   LearningRate 0.0062   Epoch: 15   Global Step: 250390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:55,418-Speed 9505.32 samples/sec   Loss 3.8757   LearningRate 0.0062   Epoch: 15   Global Step: 250400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:56,735-Speed 7781.74 samples/sec   Loss 3.8940   LearningRate 0.0062   Epoch: 15   Global Step: 250410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:57,911-Speed 8716.76 samples/sec   Loss 3.8622   LearningRate 0.0062   Epoch: 15   Global Step: 250420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:39:59,234-Speed 7745.97 samples/sec   Loss 3.8714   LearningRate 0.0062   Epoch: 15   Global Step: 250430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:00,301-Speed 9599.45 samples/sec   Loss 3.8332   LearningRate 0.0062   Epoch: 15   Global Step: 250440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:01,358-Speed 9698.08 samples/sec   Loss 3.7128   LearningRate 0.0062   Epoch: 15   Global Step: 250450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:02,411-Speed 9729.62 samples/sec   Loss 3.9174   LearningRate 0.0062   Epoch: 15   Global Step: 250460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:03,488-Speed 9514.05 samples/sec   Loss 3.8739   LearningRate 0.0062   Epoch: 15   Global Step: 250470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:04,711-Speed 8373.71 samples/sec   Loss 3.8378   LearningRate 0.0062   Epoch: 15   Global Step: 250480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:05,817-Speed 9264.98 samples/sec   Loss 3.8023   LearningRate 0.0062   Epoch: 15   Global Step: 250490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:06,915-Speed 9332.33 samples/sec   Loss 3.8052   LearningRate 0.0062   Epoch: 15   Global Step: 250500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:08,032-Speed 9171.95 samples/sec   Loss 3.8868   LearningRate 0.0062   Epoch: 15   Global Step: 250510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:09,146-Speed 9195.19 samples/sec   Loss 3.7932   LearningRate 0.0062   Epoch: 15   Global Step: 250520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:10,260-Speed 9196.19 samples/sec   Loss 3.8325   LearningRate 0.0062   Epoch: 15   Global Step: 250530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:11,866-Speed 6378.59 samples/sec   Loss 3.8100   LearningRate 0.0062   Epoch: 15   Global Step: 250540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:12,996-Speed 9074.45 samples/sec   Loss 3.8991   LearningRate 0.0062   Epoch: 15   Global Step: 250550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:14,109-Speed 9199.39 samples/sec   Loss 3.7682   LearningRate 0.0062   Epoch: 15   Global Step: 250560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:15,406-Speed 7899.97 samples/sec   Loss 3.8060   LearningRate 0.0062   Epoch: 15   Global Step: 250570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:16,867-Speed 7012.24 samples/sec   Loss 3.8766   LearningRate 0.0062   Epoch: 15   Global Step: 250580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:17,933-Speed 9616.81 samples/sec   Loss 3.8671   LearningRate 0.0062   Epoch: 15   Global Step: 250590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:19,253-Speed 7757.56 samples/sec   Loss 3.9153   LearningRate 0.0062   Epoch: 15   Global Step: 250600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:20,363-Speed 9230.53 samples/sec   Loss 3.8881   LearningRate 0.0062   Epoch: 15   Global Step: 250610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:21,714-Speed 7585.15 samples/sec   Loss 3.8672   LearningRate 0.0062   Epoch: 15   Global Step: 250620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:22,850-Speed 9017.93 samples/sec   Loss 3.9059   LearningRate 0.0062   Epoch: 15   Global Step: 250630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:23,937-Speed 9422.38 samples/sec   Loss 3.8423   LearningRate 0.0062   Epoch: 15   Global Step: 250640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:25,038-Speed 9307.37 samples/sec   Loss 3.9063   LearningRate 0.0062   Epoch: 15   Global Step: 250650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:26,397-Speed 7544.28 samples/sec   Loss 3.8625   LearningRate 0.0062   Epoch: 15   Global Step: 250660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:27,486-Speed 9404.78 samples/sec   Loss 3.8226   LearningRate 0.0062   Epoch: 15   Global Step: 250670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:28,557-Speed 9568.50 samples/sec   Loss 3.8300   LearningRate 0.0062   Epoch: 15   Global Step: 250680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:29,658-Speed 9304.25 samples/sec   Loss 3.9295   LearningRate 0.0062   Epoch: 15   Global Step: 250690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:30,755-Speed 9340.77 samples/sec   Loss 3.8676   LearningRate 0.0062   Epoch: 15   Global Step: 250700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:31,869-Speed 9198.70 samples/sec   Loss 3.8102   LearningRate 0.0062   Epoch: 15   Global Step: 250710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:32,964-Speed 9356.03 samples/sec   Loss 3.8540   LearningRate 0.0062   Epoch: 15   Global Step: 250720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:34,062-Speed 9333.38 samples/sec   Loss 3.8438   LearningRate 0.0062   Epoch: 15   Global Step: 250730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:35,173-Speed 9230.32 samples/sec   Loss 3.9134   LearningRate 0.0062   Epoch: 15   Global Step: 250740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:36,319-Speed 8933.18 samples/sec   Loss 3.8988   LearningRate 0.0062   Epoch: 15   Global Step: 250750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:37,423-Speed 9285.34 samples/sec   Loss 3.8336   LearningRate 0.0062   Epoch: 15   Global Step: 250760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:38,475-Speed 9736.57 samples/sec   Loss 3.8254   LearningRate 0.0062   Epoch: 15   Global Step: 250770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:39,504-Speed 9955.54 samples/sec   Loss 4.0018   LearningRate 0.0062   Epoch: 15   Global Step: 250780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:40,596-Speed 9387.00 samples/sec   Loss 3.9697   LearningRate 0.0062   Epoch: 15   Global Step: 250790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:41,691-Speed 9358.10 samples/sec   Loss 3.8183   LearningRate 0.0062   Epoch: 15   Global Step: 250800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:42,792-Speed 9302.44 samples/sec   Loss 3.8452   LearningRate 0.0062   Epoch: 15   Global Step: 250810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:43,878-Speed 9436.88 samples/sec   Loss 3.9248   LearningRate 0.0062   Epoch: 15   Global Step: 250820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:44,925-Speed 9785.72 samples/sec   Loss 3.9222   LearningRate 0.0062   Epoch: 15   Global Step: 250830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:45,996-Speed 9565.27 samples/sec   Loss 3.8697   LearningRate 0.0062   Epoch: 15   Global Step: 250840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:47,112-Speed 9179.63 samples/sec   Loss 3.7737   LearningRate 0.0062   Epoch: 15   Global Step: 250850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:48,204-Speed 9391.26 samples/sec   Loss 3.8630   LearningRate 0.0062   Epoch: 15   Global Step: 250860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:49,367-Speed 8803.79 samples/sec   Loss 3.8933   LearningRate 0.0062   Epoch: 15   Global Step: 250870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:40:50,406-Speed 9865.57 samples/sec   Loss 3.8990   LearningRate 0.0062   Epoch: 15   Global Step: 250880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:51,476-Speed 9577.37 samples/sec   Loss 3.8947   LearningRate 0.0062   Epoch: 15   Global Step: 250890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:52,580-Speed 9283.29 samples/sec   Loss 3.8628   LearningRate 0.0062   Epoch: 15   Global Step: 250900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:53,673-Speed 9368.63 samples/sec   Loss 3.8765   LearningRate 0.0062   Epoch: 15   Global Step: 250910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:54,744-Speed 9569.66 samples/sec   Loss 3.8759   LearningRate 0.0062   Epoch: 15   Global Step: 250920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:55,814-Speed 9577.61 samples/sec   Loss 3.9128   LearningRate 0.0062   Epoch: 15   Global Step: 250930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:56,917-Speed 9294.64 samples/sec   Loss 3.9926   LearningRate 0.0062   Epoch: 15   Global Step: 250940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:57,993-Speed 9522.52 samples/sec   Loss 3.8868   LearningRate 0.0062   Epoch: 15   Global Step: 250950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:40:59,098-Speed 9267.69 samples/sec   Loss 3.8674   LearningRate 0.0062   Epoch: 15   Global Step: 250960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:00,175-Speed 9514.63 samples/sec   Loss 3.8054   LearningRate 0.0062   Epoch: 15   Global Step: 250970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:01,249-Speed 9538.90 samples/sec   Loss 3.9095   LearningRate 0.0062   Epoch: 15   Global Step: 250980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:02,315-Speed 9614.42 samples/sec   Loss 3.8964   LearningRate 0.0062   Epoch: 15   Global Step: 250990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:03,403-Speed 9414.89 samples/sec   Loss 3.8031   LearningRate 0.0062   Epoch: 15   Global Step: 251000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:04,465-Speed 9651.78 samples/sec   Loss 3.8981   LearningRate 0.0062   Epoch: 15   Global Step: 251010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:05,573-Speed 9248.80 samples/sec   Loss 3.9565   LearningRate 0.0062   Epoch: 15   Global Step: 251020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:06,676-Speed 9282.37 samples/sec   Loss 3.9303   LearningRate 0.0062   Epoch: 15   Global Step: 251030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:07,787-Speed 9227.52 samples/sec   Loss 3.7801   LearningRate 0.0061   Epoch: 15   Global Step: 251040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:08,827-Speed 9844.09 samples/sec   Loss 3.8990   LearningRate 0.0061   Epoch: 15   Global Step: 251050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:09,898-Speed 9568.26 samples/sec   Loss 3.8667   LearningRate 0.0061   Epoch: 15   Global Step: 251060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:10,977-Speed 9499.13 samples/sec   Loss 3.8695   LearningRate 0.0061   Epoch: 15   Global Step: 251070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:12,057-Speed 9488.65 samples/sec   Loss 3.8887   LearningRate 0.0061   Epoch: 15   Global Step: 251080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:13,121-Speed 9633.25 samples/sec   Loss 3.8632   LearningRate 0.0061   Epoch: 15   Global Step: 251090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:14,199-Speed 9507.35 samples/sec   Loss 4.0023   LearningRate 0.0061   Epoch: 15   Global Step: 251100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:15,308-Speed 9237.70 samples/sec   Loss 3.8567   LearningRate 0.0061   Epoch: 15   Global Step: 251110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:16,413-Speed 9271.33 samples/sec   Loss 3.9375   LearningRate 0.0061   Epoch: 15   Global Step: 251120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:17,695-Speed 7991.77 samples/sec   Loss 3.9138   LearningRate 0.0061   Epoch: 15   Global Step: 251130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:18,791-Speed 9352.83 samples/sec   Loss 3.8975   LearningRate 0.0061   Epoch: 15   Global Step: 251140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:19,913-Speed 9131.88 samples/sec   Loss 3.8661   LearningRate 0.0061   Epoch: 15   Global Step: 251150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:21,028-Speed 9186.81 samples/sec   Loss 3.8483   LearningRate 0.0061   Epoch: 15   Global Step: 251160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:22,096-Speed 9594.94 samples/sec   Loss 3.9844   LearningRate 0.0061   Epoch: 15   Global Step: 251170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:23,189-Speed 9367.67 samples/sec   Loss 3.9495   LearningRate 0.0061   Epoch: 15   Global Step: 251180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:24,292-Speed 9295.22 samples/sec   Loss 3.9083   LearningRate 0.0061   Epoch: 15   Global Step: 251190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:25,408-Speed 9173.37 samples/sec   Loss 3.8476   LearningRate 0.0061   Epoch: 15   Global Step: 251200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:26,476-Speed 9598.44 samples/sec   Loss 3.9473   LearningRate 0.0061   Epoch: 15   Global Step: 251210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:27,522-Speed 9797.05 samples/sec   Loss 3.8992   LearningRate 0.0061   Epoch: 15   Global Step: 251220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:28,671-Speed 8917.21 samples/sec   Loss 3.8811   LearningRate 0.0061   Epoch: 15   Global Step: 251230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:29,765-Speed 9366.39 samples/sec   Loss 3.8771   LearningRate 0.0061   Epoch: 15   Global Step: 251240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:30,842-Speed 9509.44 samples/sec   Loss 3.8759   LearningRate 0.0061   Epoch: 15   Global Step: 251250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:31,923-Speed 9484.48 samples/sec   Loss 3.9629   LearningRate 0.0061   Epoch: 15   Global Step: 251260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:33,014-Speed 9391.48 samples/sec   Loss 3.8998   LearningRate 0.0061   Epoch: 15   Global Step: 251270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:34,086-Speed 9556.10 samples/sec   Loss 3.8512   LearningRate 0.0061   Epoch: 15   Global Step: 251280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:35,152-Speed 9609.87 samples/sec   Loss 3.8934   LearningRate 0.0061   Epoch: 15   Global Step: 251290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:36,257-Speed 9276.61 samples/sec   Loss 3.8477   LearningRate 0.0061   Epoch: 15   Global Step: 251300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:37,335-Speed 9499.11 samples/sec   Loss 3.8947   LearningRate 0.0061   Epoch: 15   Global Step: 251310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:38,415-Speed 9489.25 samples/sec   Loss 4.0516   LearningRate 0.0061   Epoch: 15   Global Step: 251320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:39,501-Speed 9436.87 samples/sec   Loss 3.9461   LearningRate 0.0061   Epoch: 15   Global Step: 251330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:40,564-Speed 9638.95 samples/sec   Loss 3.9541   LearningRate 0.0061   Epoch: 15   Global Step: 251340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:41,634-Speed 9568.60 samples/sec   Loss 3.8910   LearningRate 0.0061   Epoch: 15   Global Step: 251350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:42,695-Speed 9659.46 samples/sec   Loss 3.9678   LearningRate 0.0061   Epoch: 15   Global Step: 251360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:43,757-Speed 9646.08 samples/sec   Loss 3.9563   LearningRate 0.0061   Epoch: 15   Global Step: 251370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:44,841-Speed 9453.60 samples/sec   Loss 3.8793   LearningRate 0.0061   Epoch: 15   Global Step: 251380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:45,930-Speed 9410.94 samples/sec   Loss 3.9301   LearningRate 0.0061   Epoch: 15   Global Step: 251390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:47,039-Speed 9244.05 samples/sec   Loss 3.9045   LearningRate 0.0061   Epoch: 15   Global Step: 251400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:48,127-Speed 9411.85 samples/sec   Loss 3.9790   LearningRate 0.0061   Epoch: 15   Global Step: 251410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:49,223-Speed 9350.19 samples/sec   Loss 3.8799   LearningRate 0.0061   Epoch: 15   Global Step: 251420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:50,321-Speed 9337.10 samples/sec   Loss 3.9304   LearningRate 0.0061   Epoch: 15   Global Step: 251430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:51,368-Speed 9788.76 samples/sec   Loss 3.9798   LearningRate 0.0061   Epoch: 15   Global Step: 251440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:52,418-Speed 9755.21 samples/sec   Loss 3.7706   LearningRate 0.0061   Epoch: 15   Global Step: 251450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:53,487-Speed 9578.46 samples/sec   Loss 3.9605   LearningRate 0.0061   Epoch: 15   Global Step: 251460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:41:54,536-Speed 9774.52 samples/sec   Loss 4.0331   LearningRate 0.0061   Epoch: 15   Global Step: 251470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:55,648-Speed 9209.46 samples/sec   Loss 3.9411   LearningRate 0.0061   Epoch: 15   Global Step: 251480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:56,753-Speed 9273.94 samples/sec   Loss 3.8990   LearningRate 0.0061   Epoch: 15   Global Step: 251490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:57,822-Speed 9582.95 samples/sec   Loss 3.9756   LearningRate 0.0061   Epoch: 15   Global Step: 251500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:41:58,932-Speed 9231.84 samples/sec   Loss 3.8906   LearningRate 0.0061   Epoch: 15   Global Step: 251510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:00,065-Speed 9043.57 samples/sec   Loss 3.9717   LearningRate 0.0061   Epoch: 15   Global Step: 251520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:01,176-Speed 9221.42 samples/sec   Loss 3.8907   LearningRate 0.0061   Epoch: 15   Global Step: 251530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:02,257-Speed 9479.51 samples/sec   Loss 3.9342   LearningRate 0.0061   Epoch: 15   Global Step: 251540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:03,304-Speed 9793.99 samples/sec   Loss 3.9635   LearningRate 0.0061   Epoch: 15   Global Step: 251550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:04,374-Speed 9569.99 samples/sec   Loss 3.9958   LearningRate 0.0061   Epoch: 15   Global Step: 251560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:05,438-Speed 9627.87 samples/sec   Loss 4.0152   LearningRate 0.0061   Epoch: 15   Global Step: 251570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:06,516-Speed 9510.58 samples/sec   Loss 3.8695   LearningRate 0.0061   Epoch: 15   Global Step: 251580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:07,582-Speed 9607.44 samples/sec   Loss 3.9427   LearningRate 0.0061   Epoch: 15   Global Step: 251590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:08,676-Speed 9369.92 samples/sec   Loss 3.9217   LearningRate 0.0061   Epoch: 15   Global Step: 251600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:09,821-Speed 8945.45 samples/sec   Loss 4.0150   LearningRate 0.0061   Epoch: 15   Global Step: 251610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:10,905-Speed 9452.38 samples/sec   Loss 3.7965   LearningRate 0.0061   Epoch: 15   Global Step: 251620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:11,945-Speed 9859.66 samples/sec   Loss 3.9516   LearningRate 0.0061   Epoch: 15   Global Step: 251630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:13,008-Speed 9633.10 samples/sec   Loss 3.8813   LearningRate 0.0061   Epoch: 15   Global Step: 251640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:14,077-Speed 9585.35 samples/sec   Loss 3.9319   LearningRate 0.0061   Epoch: 15   Global Step: 251650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:15,116-Speed 9861.84 samples/sec   Loss 3.9595   LearningRate 0.0061   Epoch: 15   Global Step: 251660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:16,198-Speed 9477.09 samples/sec   Loss 3.9349   LearningRate 0.0061   Epoch: 15   Global Step: 251670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:17,295-Speed 9340.34 samples/sec   Loss 3.8778   LearningRate 0.0061   Epoch: 15   Global Step: 251680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:18,383-Speed 9417.37 samples/sec   Loss 3.8696   LearningRate 0.0061   Epoch: 15   Global Step: 251690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:19,439-Speed 9706.04 samples/sec   Loss 3.9822   LearningRate 0.0061   Epoch: 15   Global Step: 251700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:20,547-Speed 9249.93 samples/sec   Loss 3.9657   LearningRate 0.0061   Epoch: 15   Global Step: 251710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:21,666-Speed 9152.69 samples/sec   Loss 3.9625   LearningRate 0.0060   Epoch: 15   Global Step: 251720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:22,766-Speed 9316.05 samples/sec   Loss 4.0009   LearningRate 0.0060   Epoch: 15   Global Step: 251730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:23,865-Speed 9320.39 samples/sec   Loss 3.9512   LearningRate 0.0060   Epoch: 15   Global Step: 251740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:24,937-Speed 9559.48 samples/sec   Loss 3.9495   LearningRate 0.0060   Epoch: 15   Global Step: 251750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:26,011-Speed 9542.41 samples/sec   Loss 3.9296   LearningRate 0.0060   Epoch: 15   Global Step: 251760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:27,113-Speed 9302.03 samples/sec   Loss 3.9180   LearningRate 0.0060   Epoch: 15   Global Step: 251770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:28,228-Speed 9184.81 samples/sec   Loss 3.9405   LearningRate 0.0060   Epoch: 15   Global Step: 251780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:29,313-Speed 9449.90 samples/sec   Loss 3.9194   LearningRate 0.0060   Epoch: 15   Global Step: 251790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:30,410-Speed 9338.07 samples/sec   Loss 3.8892   LearningRate 0.0060   Epoch: 15   Global Step: 251800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:31,485-Speed 9534.02 samples/sec   Loss 3.9613   LearningRate 0.0060   Epoch: 15   Global Step: 251810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:32,561-Speed 9523.45 samples/sec   Loss 3.9012   LearningRate 0.0060   Epoch: 15   Global Step: 251820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:33,639-Speed 9503.75 samples/sec   Loss 3.8886   LearningRate 0.0060   Epoch: 15   Global Step: 251830   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-11 21:42:34,700-Speed 9661.65 samples/sec   Loss 4.0110   LearningRate 0.0060   Epoch: 15   Global Step: 251840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:35,763-Speed 9639.20 samples/sec   Loss 3.8862   LearningRate 0.0060   Epoch: 15   Global Step: 251850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:36,863-Speed 9309.96 samples/sec   Loss 4.0117   LearningRate 0.0060   Epoch: 15   Global Step: 251860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:37,983-Speed 9149.24 samples/sec   Loss 3.8901   LearningRate 0.0060   Epoch: 15   Global Step: 251870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:39,098-Speed 9188.04 samples/sec   Loss 3.9247   LearningRate 0.0060   Epoch: 15   Global Step: 251880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:40,176-Speed 9508.03 samples/sec   Loss 3.9117   LearningRate 0.0060   Epoch: 15   Global Step: 251890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:41,223-Speed 9779.09 samples/sec   Loss 3.9937   LearningRate 0.0060   Epoch: 15   Global Step: 251900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:42,317-Speed 9374.13 samples/sec   Loss 3.9467   LearningRate 0.0060   Epoch: 15   Global Step: 251910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:43,419-Speed 9300.02 samples/sec   Loss 3.8922   LearningRate 0.0060   Epoch: 15   Global Step: 251920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:44,479-Speed 9661.00 samples/sec   Loss 3.9806   LearningRate 0.0060   Epoch: 15   Global Step: 251930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:45,540-Speed 9664.10 samples/sec   Loss 4.0174   LearningRate 0.0060   Epoch: 15   Global Step: 251940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:46,605-Speed 9616.56 samples/sec   Loss 3.9694   LearningRate 0.0060   Epoch: 15   Global Step: 251950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:47,704-Speed 9328.69 samples/sec   Loss 3.9666   LearningRate 0.0060   Epoch: 15   Global Step: 251960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:42:48,775-Speed 9563.87 samples/sec   Loss 3.9656   LearningRate 0.0060   Epoch: 15   Global Step: 251970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:49,893-Speed 9165.64 samples/sec   Loss 3.9132   LearningRate 0.0060   Epoch: 15   Global Step: 251980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:50,942-Speed 9765.75 samples/sec   Loss 3.8673   LearningRate 0.0060   Epoch: 15   Global Step: 251990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:42:52,065-Speed 9119.51 samples/sec   Loss 4.0172   LearningRate 0.0060   Epoch: 15   Global Step: 252000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:43:14,054-[lfw][252000]XNorm: 7.405152
Training: 2022-04-11 21:43:14,055-[lfw][252000]Accuracy-Flip: 0.99583+-0.00261
Training: 2022-04-11 21:43:14,055-[lfw][252000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:43:39,449-[cfp_fp][252000]XNorm: 6.373097
Training: 2022-04-11 21:43:39,450-[cfp_fp][252000]Accuracy-Flip: 0.97043+-0.00706
Training: 2022-04-11 21:43:39,450-[cfp_fp][252000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:44:01,317-[agedb_30][252000]XNorm: 7.182551
Training: 2022-04-11 21:44:01,318-[agedb_30][252000]Accuracy-Flip: 0.97050+-0.00823
Training: 2022-04-11 21:44:01,318-[agedb_30][252000]Accuracy-Highest: 0.97350
Training: 2022-04-11 21:44:02,401-Speed 145.59 samples/sec   Loss 3.9166   LearningRate 0.0060   Epoch: 15   Global Step: 252010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:03,442-Speed 9841.96 samples/sec   Loss 3.9233   LearningRate 0.0060   Epoch: 15   Global Step: 252020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:04,499-Speed 9698.22 samples/sec   Loss 3.9180   LearningRate 0.0060   Epoch: 15   Global Step: 252030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:05,572-Speed 9543.24 samples/sec   Loss 3.8353   LearningRate 0.0060   Epoch: 15   Global Step: 252040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:06,631-Speed 9670.95 samples/sec   Loss 3.9005   LearningRate 0.0060   Epoch: 15   Global Step: 252050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:07,703-Speed 9562.24 samples/sec   Loss 3.9602   LearningRate 0.0060   Epoch: 15   Global Step: 252060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:08,778-Speed 9529.77 samples/sec   Loss 3.9447   LearningRate 0.0060   Epoch: 15   Global Step: 252070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:09,878-Speed 9318.43 samples/sec   Loss 3.9406   LearningRate 0.0060   Epoch: 15   Global Step: 252080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:10,947-Speed 9586.69 samples/sec   Loss 4.0464   LearningRate 0.0060   Epoch: 15   Global Step: 252090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:12,012-Speed 9618.45 samples/sec   Loss 3.9437   LearningRate 0.0060   Epoch: 15   Global Step: 252100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:13,084-Speed 9562.62 samples/sec   Loss 3.9924   LearningRate 0.0060   Epoch: 15   Global Step: 252110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:14,141-Speed 9690.31 samples/sec   Loss 4.0180   LearningRate 0.0060   Epoch: 15   Global Step: 252120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:15,230-Speed 9409.10 samples/sec   Loss 3.9511   LearningRate 0.0060   Epoch: 15   Global Step: 252130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:16,322-Speed 9385.26 samples/sec   Loss 3.9452   LearningRate 0.0060   Epoch: 15   Global Step: 252140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:17,396-Speed 9536.21 samples/sec   Loss 3.9027   LearningRate 0.0060   Epoch: 15   Global Step: 252150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:18,532-Speed 9020.96 samples/sec   Loss 3.9219   LearningRate 0.0060   Epoch: 15   Global Step: 252160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:19,636-Speed 9279.70 samples/sec   Loss 4.0150   LearningRate 0.0060   Epoch: 15   Global Step: 252170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:20,716-Speed 9486.43 samples/sec   Loss 3.9406   LearningRate 0.0060   Epoch: 15   Global Step: 252180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:21,848-Speed 9055.52 samples/sec   Loss 3.8659   LearningRate 0.0060   Epoch: 15   Global Step: 252190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:23,174-Speed 7724.42 samples/sec   Loss 3.9285   LearningRate 0.0060   Epoch: 15   Global Step: 252200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:24,327-Speed 8886.34 samples/sec   Loss 3.9242   LearningRate 0.0060   Epoch: 15   Global Step: 252210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:25,434-Speed 9258.04 samples/sec   Loss 3.9176   LearningRate 0.0060   Epoch: 15   Global Step: 252220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:26,565-Speed 9060.25 samples/sec   Loss 4.0158   LearningRate 0.0060   Epoch: 15   Global Step: 252230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:27,629-Speed 9626.79 samples/sec   Loss 3.9774   LearningRate 0.0060   Epoch: 15   Global Step: 252240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:28,748-Speed 9157.11 samples/sec   Loss 3.9704   LearningRate 0.0060   Epoch: 15   Global Step: 252250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:29,879-Speed 9064.15 samples/sec   Loss 3.9325   LearningRate 0.0060   Epoch: 15   Global Step: 252260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:30,991-Speed 9211.17 samples/sec   Loss 3.9287   LearningRate 0.0060   Epoch: 15   Global Step: 252270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:32,071-Speed 9488.25 samples/sec   Loss 4.0526   LearningRate 0.0060   Epoch: 15   Global Step: 252280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:33,158-Speed 9434.20 samples/sec   Loss 3.9120   LearningRate 0.0060   Epoch: 15   Global Step: 252290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:34,237-Speed 9489.45 samples/sec   Loss 3.9114   LearningRate 0.0060   Epoch: 15   Global Step: 252300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:35,348-Speed 9222.35 samples/sec   Loss 3.9401   LearningRate 0.0060   Epoch: 15   Global Step: 252310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:36,413-Speed 9622.72 samples/sec   Loss 4.0149   LearningRate 0.0060   Epoch: 15   Global Step: 252320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:37,494-Speed 9481.56 samples/sec   Loss 3.9715   LearningRate 0.0060   Epoch: 15   Global Step: 252330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:38,557-Speed 9635.02 samples/sec   Loss 3.9754   LearningRate 0.0060   Epoch: 15   Global Step: 252340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:39,652-Speed 9358.49 samples/sec   Loss 3.9724   LearningRate 0.0060   Epoch: 15   Global Step: 252350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:40,718-Speed 9607.37 samples/sec   Loss 3.9141   LearningRate 0.0060   Epoch: 15   Global Step: 252360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:41,759-Speed 9842.40 samples/sec   Loss 4.0262   LearningRate 0.0060   Epoch: 15   Global Step: 252370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:42,831-Speed 9558.95 samples/sec   Loss 3.8561   LearningRate 0.0060   Epoch: 15   Global Step: 252380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:43,900-Speed 9585.46 samples/sec   Loss 3.8763   LearningRate 0.0060   Epoch: 15   Global Step: 252390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:45,068-Speed 8767.50 samples/sec   Loss 3.9969   LearningRate 0.0059   Epoch: 15   Global Step: 252400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:46,202-Speed 9041.53 samples/sec   Loss 3.8975   LearningRate 0.0059   Epoch: 15   Global Step: 252410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:47,304-Speed 9301.39 samples/sec   Loss 3.8930   LearningRate 0.0059   Epoch: 15   Global Step: 252420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:48,387-Speed 9456.97 samples/sec   Loss 4.0688   LearningRate 0.0059   Epoch: 15   Global Step: 252430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:49,490-Speed 9294.78 samples/sec   Loss 4.0711   LearningRate 0.0059   Epoch: 15   Global Step: 252440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:50,548-Speed 9685.08 samples/sec   Loss 3.9199   LearningRate 0.0059   Epoch: 15   Global Step: 252450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:51,577-Speed 9956.69 samples/sec   Loss 3.9661   LearningRate 0.0059   Epoch: 15   Global Step: 252460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:52,698-Speed 9137.97 samples/sec   Loss 3.9095   LearningRate 0.0059   Epoch: 15   Global Step: 252470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:53,787-Speed 9408.14 samples/sec   Loss 3.9033   LearningRate 0.0059   Epoch: 15   Global Step: 252480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:54,874-Speed 9424.63 samples/sec   Loss 3.9900   LearningRate 0.0059   Epoch: 15   Global Step: 252490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:55,967-Speed 9378.35 samples/sec   Loss 4.0315   LearningRate 0.0059   Epoch: 15   Global Step: 252500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:57,079-Speed 9214.32 samples/sec   Loss 4.0365   LearningRate 0.0059   Epoch: 15   Global Step: 252510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:44:58,163-Speed 9449.30 samples/sec   Loss 3.9354   LearningRate 0.0059   Epoch: 15   Global Step: 252520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:44:59,235-Speed 9560.08 samples/sec   Loss 3.9650   LearningRate 0.0059   Epoch: 15   Global Step: 252530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:00,321-Speed 9429.70 samples/sec   Loss 4.0360   LearningRate 0.0059   Epoch: 15   Global Step: 252540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:01,429-Speed 9245.94 samples/sec   Loss 3.9604   LearningRate 0.0059   Epoch: 15   Global Step: 252550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:02,529-Speed 9319.62 samples/sec   Loss 4.0170   LearningRate 0.0059   Epoch: 15   Global Step: 252560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:03,605-Speed 9520.45 samples/sec   Loss 3.9403   LearningRate 0.0059   Epoch: 15   Global Step: 252570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:04,684-Speed 9494.11 samples/sec   Loss 3.9369   LearningRate 0.0059   Epoch: 15   Global Step: 252580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:05,750-Speed 9613.74 samples/sec   Loss 3.9473   LearningRate 0.0059   Epoch: 15   Global Step: 252590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:06,810-Speed 9669.30 samples/sec   Loss 3.9748   LearningRate 0.0059   Epoch: 15   Global Step: 252600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:07,906-Speed 9348.60 samples/sec   Loss 3.9300   LearningRate 0.0059   Epoch: 15   Global Step: 252610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:08,943-Speed 9882.85 samples/sec   Loss 3.9576   LearningRate 0.0059   Epoch: 15   Global Step: 252620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:10,033-Speed 9399.18 samples/sec   Loss 3.9101   LearningRate 0.0059   Epoch: 15   Global Step: 252630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:11,092-Speed 9669.97 samples/sec   Loss 4.0240   LearningRate 0.0059   Epoch: 15   Global Step: 252640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:12,178-Speed 9437.99 samples/sec   Loss 4.1584   LearningRate 0.0059   Epoch: 15   Global Step: 252650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:13,273-Speed 9352.68 samples/sec   Loss 4.0030   LearningRate 0.0059   Epoch: 15   Global Step: 252660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:14,337-Speed 9634.97 samples/sec   Loss 4.0307   LearningRate 0.0059   Epoch: 15   Global Step: 252670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:15,469-Speed 9053.58 samples/sec   Loss 3.9777   LearningRate 0.0059   Epoch: 15   Global Step: 252680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:16,568-Speed 9319.81 samples/sec   Loss 4.0219   LearningRate 0.0059   Epoch: 15   Global Step: 252690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:17,649-Speed 9479.83 samples/sec   Loss 3.9661   LearningRate 0.0059   Epoch: 15   Global Step: 252700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:18,718-Speed 9584.55 samples/sec   Loss 4.0529   LearningRate 0.0059   Epoch: 15   Global Step: 252710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:19,783-Speed 9620.04 samples/sec   Loss 4.0512   LearningRate 0.0059   Epoch: 15   Global Step: 252720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:20,887-Speed 9280.61 samples/sec   Loss 3.9319   LearningRate 0.0059   Epoch: 15   Global Step: 252730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:21,961-Speed 9539.46 samples/sec   Loss 4.0060   LearningRate 0.0059   Epoch: 15   Global Step: 252740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:23,085-Speed 9117.27 samples/sec   Loss 3.9377   LearningRate 0.0059   Epoch: 15   Global Step: 252750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:24,192-Speed 9249.69 samples/sec   Loss 3.9905   LearningRate 0.0059   Epoch: 15   Global Step: 252760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:25,308-Speed 9187.86 samples/sec   Loss 4.0457   LearningRate 0.0059   Epoch: 15   Global Step: 252770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:26,413-Speed 9277.95 samples/sec   Loss 4.0390   LearningRate 0.0059   Epoch: 15   Global Step: 252780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:27,489-Speed 9519.75 samples/sec   Loss 4.0180   LearningRate 0.0059   Epoch: 15   Global Step: 252790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:28,609-Speed 9147.12 samples/sec   Loss 4.0034   LearningRate 0.0059   Epoch: 15   Global Step: 252800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:29,696-Speed 9426.11 samples/sec   Loss 3.9426   LearningRate 0.0059   Epoch: 15   Global Step: 252810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:30,790-Speed 9359.66 samples/sec   Loss 3.8598   LearningRate 0.0059   Epoch: 15   Global Step: 252820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:31,906-Speed 9184.89 samples/sec   Loss 4.1346   LearningRate 0.0059   Epoch: 15   Global Step: 252830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:33,018-Speed 9212.35 samples/sec   Loss 4.0150   LearningRate 0.0059   Epoch: 15   Global Step: 252840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:34,093-Speed 9534.11 samples/sec   Loss 3.9678   LearningRate 0.0059   Epoch: 15   Global Step: 252850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:35,168-Speed 9534.27 samples/sec   Loss 3.9953   LearningRate 0.0059   Epoch: 15   Global Step: 252860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:36,222-Speed 9720.50 samples/sec   Loss 3.9988   LearningRate 0.0059   Epoch: 15   Global Step: 252870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:37,282-Speed 9661.24 samples/sec   Loss 4.1124   LearningRate 0.0059   Epoch: 15   Global Step: 252880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:38,377-Speed 9361.01 samples/sec   Loss 3.9967   LearningRate 0.0059   Epoch: 15   Global Step: 252890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:39,459-Speed 9465.01 samples/sec   Loss 3.9989   LearningRate 0.0059   Epoch: 15   Global Step: 252900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:40,568-Speed 9245.53 samples/sec   Loss 4.0217   LearningRate 0.0059   Epoch: 15   Global Step: 252910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:41,660-Speed 9374.89 samples/sec   Loss 4.0654   LearningRate 0.0059   Epoch: 15   Global Step: 252920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:42,743-Speed 9462.81 samples/sec   Loss 4.0323   LearningRate 0.0059   Epoch: 15   Global Step: 252930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:43,820-Speed 9518.14 samples/sec   Loss 3.9808   LearningRate 0.0059   Epoch: 15   Global Step: 252940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:44,904-Speed 9455.97 samples/sec   Loss 3.9089   LearningRate 0.0059   Epoch: 15   Global Step: 252950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:45,994-Speed 9399.50 samples/sec   Loss 4.0635   LearningRate 0.0059   Epoch: 15   Global Step: 252960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:47,041-Speed 9782.39 samples/sec   Loss 3.9888   LearningRate 0.0059   Epoch: 15   Global Step: 252970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:48,117-Speed 9519.81 samples/sec   Loss 3.9788   LearningRate 0.0059   Epoch: 15   Global Step: 252980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:49,204-Speed 9424.52 samples/sec   Loss 4.0412   LearningRate 0.0059   Epoch: 15   Global Step: 252990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:50,262-Speed 9688.67 samples/sec   Loss 3.9576   LearningRate 0.0059   Epoch: 15   Global Step: 253000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:51,356-Speed 9370.27 samples/sec   Loss 4.0545   LearningRate 0.0059   Epoch: 15   Global Step: 253010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:52,450-Speed 9358.51 samples/sec   Loss 3.9835   LearningRate 0.0059   Epoch: 15   Global Step: 253020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:53,545-Speed 9359.85 samples/sec   Loss 3.9759   LearningRate 0.0059   Epoch: 15   Global Step: 253030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:54,662-Speed 9176.39 samples/sec   Loss 4.0260   LearningRate 0.0059   Epoch: 15   Global Step: 253040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:55,786-Speed 9109.75 samples/sec   Loss 4.0583   LearningRate 0.0059   Epoch: 15   Global Step: 253050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:45:56,888-Speed 9296.60 samples/sec   Loss 3.8905   LearningRate 0.0059   Epoch: 15   Global Step: 253060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:57,990-Speed 9299.12 samples/sec   Loss 4.0004   LearningRate 0.0059   Epoch: 15   Global Step: 253070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:45:59,058-Speed 9592.59 samples/sec   Loss 4.0605   LearningRate 0.0058   Epoch: 15   Global Step: 253080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:00,177-Speed 9162.04 samples/sec   Loss 4.0022   LearningRate 0.0058   Epoch: 15   Global Step: 253090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:01,330-Speed 8882.57 samples/sec   Loss 3.9245   LearningRate 0.0058   Epoch: 15   Global Step: 253100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:02,433-Speed 9296.44 samples/sec   Loss 4.0093   LearningRate 0.0058   Epoch: 15   Global Step: 253110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:03,527-Speed 9365.99 samples/sec   Loss 3.9875   LearningRate 0.0058   Epoch: 15   Global Step: 253120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:04,610-Speed 9455.80 samples/sec   Loss 3.9623   LearningRate 0.0058   Epoch: 15   Global Step: 253130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:05,679-Speed 9587.06 samples/sec   Loss 4.0540   LearningRate 0.0058   Epoch: 15   Global Step: 253140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:06,759-Speed 9487.50 samples/sec   Loss 3.9629   LearningRate 0.0058   Epoch: 15   Global Step: 253150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:07,823-Speed 9634.86 samples/sec   Loss 3.9522   LearningRate 0.0058   Epoch: 15   Global Step: 253160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:08,888-Speed 9623.05 samples/sec   Loss 3.9207   LearningRate 0.0058   Epoch: 15   Global Step: 253170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:09,948-Speed 9665.36 samples/sec   Loss 4.0675   LearningRate 0.0058   Epoch: 15   Global Step: 253180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:11,022-Speed 9540.89 samples/sec   Loss 4.0415   LearningRate 0.0058   Epoch: 15   Global Step: 253190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:12,087-Speed 9612.82 samples/sec   Loss 3.9971   LearningRate 0.0058   Epoch: 15   Global Step: 253200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:13,175-Speed 9423.22 samples/sec   Loss 3.9770   LearningRate 0.0058   Epoch: 15   Global Step: 253210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:14,236-Speed 9652.66 samples/sec   Loss 4.0101   LearningRate 0.0058   Epoch: 15   Global Step: 253220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:15,321-Speed 9446.51 samples/sec   Loss 4.1014   LearningRate 0.0058   Epoch: 15   Global Step: 253230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:16,402-Speed 9473.17 samples/sec   Loss 3.9737   LearningRate 0.0058   Epoch: 15   Global Step: 253240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:17,514-Speed 9213.76 samples/sec   Loss 4.0012   LearningRate 0.0058   Epoch: 15   Global Step: 253250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:18,590-Speed 9530.65 samples/sec   Loss 4.0672   LearningRate 0.0058   Epoch: 15   Global Step: 253260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:19,680-Speed 9402.99 samples/sec   Loss 4.0504   LearningRate 0.0058   Epoch: 15   Global Step: 253270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:20,752-Speed 9553.64 samples/sec   Loss 4.0348   LearningRate 0.0058   Epoch: 15   Global Step: 253280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:21,903-Speed 8909.21 samples/sec   Loss 4.1066   LearningRate 0.0058   Epoch: 15   Global Step: 253290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:22,947-Speed 9806.54 samples/sec   Loss 4.0426   LearningRate 0.0058   Epoch: 15   Global Step: 253300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:24,032-Speed 9445.65 samples/sec   Loss 4.0111   LearningRate 0.0058   Epoch: 15   Global Step: 253310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:25,071-Speed 9866.28 samples/sec   Loss 4.0503   LearningRate 0.0058   Epoch: 15   Global Step: 253320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:26,154-Speed 9454.84 samples/sec   Loss 4.0375   LearningRate 0.0058   Epoch: 15   Global Step: 253330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:27,250-Speed 9351.38 samples/sec   Loss 3.9773   LearningRate 0.0058   Epoch: 15   Global Step: 253340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:28,324-Speed 9541.64 samples/sec   Loss 3.9855   LearningRate 0.0058   Epoch: 15   Global Step: 253350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:29,383-Speed 9671.78 samples/sec   Loss 4.0406   LearningRate 0.0058   Epoch: 15   Global Step: 253360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:30,459-Speed 9521.33 samples/sec   Loss 4.0325   LearningRate 0.0058   Epoch: 15   Global Step: 253370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:31,526-Speed 9609.25 samples/sec   Loss 4.0651   LearningRate 0.0058   Epoch: 15   Global Step: 253380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:32,585-Speed 9674.25 samples/sec   Loss 3.9687   LearningRate 0.0058   Epoch: 15   Global Step: 253390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:33,665-Speed 9484.33 samples/sec   Loss 3.9778   LearningRate 0.0058   Epoch: 15   Global Step: 253400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:34,744-Speed 9492.00 samples/sec   Loss 3.9847   LearningRate 0.0058   Epoch: 15   Global Step: 253410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:35,832-Speed 9421.34 samples/sec   Loss 3.9146   LearningRate 0.0058   Epoch: 15   Global Step: 253420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:36,946-Speed 9196.02 samples/sec   Loss 3.9923   LearningRate 0.0058   Epoch: 15   Global Step: 253430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:38,033-Speed 9429.06 samples/sec   Loss 3.9806   LearningRate 0.0058   Epoch: 15   Global Step: 253440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:39,165-Speed 9054.88 samples/sec   Loss 4.0181   LearningRate 0.0058   Epoch: 15   Global Step: 253450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:40,240-Speed 9527.87 samples/sec   Loss 3.9311   LearningRate 0.0058   Epoch: 15   Global Step: 253460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:41,320-Speed 9482.54 samples/sec   Loss 3.9764   LearningRate 0.0058   Epoch: 15   Global Step: 253470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:42,380-Speed 9670.91 samples/sec   Loss 3.9573   LearningRate 0.0058   Epoch: 15   Global Step: 253480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:43,440-Speed 9670.02 samples/sec   Loss 4.0689   LearningRate 0.0058   Epoch: 15   Global Step: 253490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:44,480-Speed 9848.15 samples/sec   Loss 4.0852   LearningRate 0.0058   Epoch: 15   Global Step: 253500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:45,565-Speed 9439.40 samples/sec   Loss 3.9906   LearningRate 0.0058   Epoch: 15   Global Step: 253510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:46,637-Speed 9561.83 samples/sec   Loss 4.0474   LearningRate 0.0058   Epoch: 15   Global Step: 253520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:47,737-Speed 9318.32 samples/sec   Loss 4.0220   LearningRate 0.0058   Epoch: 15   Global Step: 253530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:48,845-Speed 9243.13 samples/sec   Loss 3.9805   LearningRate 0.0058   Epoch: 15   Global Step: 253540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:49,897-Speed 9741.14 samples/sec   Loss 4.0684   LearningRate 0.0058   Epoch: 15   Global Step: 253550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:50,967-Speed 9577.92 samples/sec   Loss 3.9560   LearningRate 0.0058   Epoch: 15   Global Step: 253560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:52,061-Speed 9367.23 samples/sec   Loss 3.9983   LearningRate 0.0058   Epoch: 15   Global Step: 253570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:53,200-Speed 8990.01 samples/sec   Loss 4.0273   LearningRate 0.0058   Epoch: 15   Global Step: 253580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:54,272-Speed 9556.44 samples/sec   Loss 4.1266   LearningRate 0.0058   Epoch: 15   Global Step: 253590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:55,350-Speed 9506.62 samples/sec   Loss 4.0938   LearningRate 0.0058   Epoch: 15   Global Step: 253600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:56,426-Speed 9525.14 samples/sec   Loss 4.1231   LearningRate 0.0058   Epoch: 15   Global Step: 253610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:57,488-Speed 9652.09 samples/sec   Loss 4.0369   LearningRate 0.0058   Epoch: 15   Global Step: 253620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:46:58,559-Speed 9564.64 samples/sec   Loss 3.9875   LearningRate 0.0058   Epoch: 15   Global Step: 253630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:46:59,664-Speed 9270.97 samples/sec   Loss 4.0810   LearningRate 0.0058   Epoch: 15   Global Step: 253640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:00,740-Speed 9523.40 samples/sec   Loss 3.9775   LearningRate 0.0058   Epoch: 15   Global Step: 253650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:01,807-Speed 9611.54 samples/sec   Loss 4.0556   LearningRate 0.0058   Epoch: 15   Global Step: 253660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:02,908-Speed 9304.45 samples/sec   Loss 4.1119   LearningRate 0.0058   Epoch: 15   Global Step: 253670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:04,008-Speed 9315.23 samples/sec   Loss 4.0964   LearningRate 0.0058   Epoch: 15   Global Step: 253680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:05,099-Speed 9388.47 samples/sec   Loss 4.0098   LearningRate 0.0058   Epoch: 15   Global Step: 253690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:06,165-Speed 9612.33 samples/sec   Loss 4.0156   LearningRate 0.0058   Epoch: 15   Global Step: 253700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:07,267-Speed 9291.30 samples/sec   Loss 4.0229   LearningRate 0.0058   Epoch: 15   Global Step: 253710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:08,384-Speed 9179.43 samples/sec   Loss 4.0446   LearningRate 0.0058   Epoch: 15   Global Step: 253720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:09,439-Speed 9708.26 samples/sec   Loss 4.0570   LearningRate 0.0058   Epoch: 15   Global Step: 253730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:47:10,542-Speed 9289.03 samples/sec   Loss 4.0999   LearningRate 0.0058   Epoch: 15   Global Step: 253740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:47:11,605-Speed 9644.99 samples/sec   Loss 4.0957   LearningRate 0.0058   Epoch: 15   Global Step: 253750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:12,682-Speed 9507.09 samples/sec   Loss 3.9709   LearningRate 0.0058   Epoch: 15   Global Step: 253760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:13,730-Speed 9780.13 samples/sec   Loss 3.9800   LearningRate 0.0058   Epoch: 15   Global Step: 253770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:14,803-Speed 9550.55 samples/sec   Loss 3.9447   LearningRate 0.0057   Epoch: 15   Global Step: 253780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:15,906-Speed 9284.73 samples/sec   Loss 4.0018   LearningRate 0.0057   Epoch: 15   Global Step: 253790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:17,021-Speed 9189.16 samples/sec   Loss 4.0394   LearningRate 0.0057   Epoch: 15   Global Step: 253800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:18,142-Speed 9145.25 samples/sec   Loss 4.0005   LearningRate 0.0057   Epoch: 15   Global Step: 253810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:19,229-Speed 9428.78 samples/sec   Loss 4.0013   LearningRate 0.0057   Epoch: 15   Global Step: 253820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:20,334-Speed 9270.16 samples/sec   Loss 4.0077   LearningRate 0.0057   Epoch: 15   Global Step: 253830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:21,439-Speed 9275.82 samples/sec   Loss 4.0956   LearningRate 0.0057   Epoch: 15   Global Step: 253840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:22,508-Speed 9580.59 samples/sec   Loss 4.0018   LearningRate 0.0057   Epoch: 15   Global Step: 253850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:47:23,557-Speed 9769.27 samples/sec   Loss 4.1339   LearningRate 0.0057   Epoch: 15   Global Step: 253860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:24,622-Speed 9619.44 samples/sec   Loss 3.9881   LearningRate 0.0057   Epoch: 15   Global Step: 253870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:25,683-Speed 9658.73 samples/sec   Loss 4.1044   LearningRate 0.0057   Epoch: 15   Global Step: 253880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:26,710-Speed 9969.55 samples/sec   Loss 4.0065   LearningRate 0.0057   Epoch: 15   Global Step: 253890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:27,786-Speed 9530.36 samples/sec   Loss 4.0251   LearningRate 0.0057   Epoch: 15   Global Step: 253900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:28,868-Speed 9465.63 samples/sec   Loss 4.0574   LearningRate 0.0057   Epoch: 15   Global Step: 253910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:29,988-Speed 9146.71 samples/sec   Loss 4.0095   LearningRate 0.0057   Epoch: 15   Global Step: 253920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:31,103-Speed 9188.57 samples/sec   Loss 3.9835   LearningRate 0.0057   Epoch: 15   Global Step: 253930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:32,207-Speed 9280.36 samples/sec   Loss 4.1119   LearningRate 0.0057   Epoch: 15   Global Step: 253940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:33,303-Speed 9355.96 samples/sec   Loss 4.0401   LearningRate 0.0057   Epoch: 15   Global Step: 253950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:47:34,348-Speed 9801.03 samples/sec   Loss 4.0861   LearningRate 0.0057   Epoch: 15   Global Step: 253960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:47:35,487-Speed 8994.76 samples/sec   Loss 4.0180   LearningRate 0.0057   Epoch: 15   Global Step: 253970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:47:36,626-Speed 8996.96 samples/sec   Loss 4.0180   LearningRate 0.0057   Epoch: 15   Global Step: 253980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:47:37,738-Speed 9215.50 samples/sec   Loss 4.0331   LearningRate 0.0057   Epoch: 15   Global Step: 253990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:47:38,855-Speed 9174.17 samples/sec   Loss 4.0187   LearningRate 0.0057   Epoch: 15   Global Step: 254000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:48:01,030-[lfw][254000]XNorm: 7.178992
Training: 2022-04-11 21:48:01,031-[lfw][254000]Accuracy-Flip: 0.99667+-0.00269
Training: 2022-04-11 21:48:01,031-[lfw][254000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:48:26,713-[cfp_fp][254000]XNorm: 6.200063
Training: 2022-04-11 21:48:26,713-[cfp_fp][254000]Accuracy-Flip: 0.96957+-0.00777
Training: 2022-04-11 21:48:26,714-[cfp_fp][254000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:48:48,834-[agedb_30][254000]XNorm: 6.969766
Training: 2022-04-11 21:48:48,835-[agedb_30][254000]Accuracy-Flip: 0.97133+-0.00859
Training: 2022-04-11 21:48:48,835-[agedb_30][254000]Accuracy-Highest: 0.97350
Training: 2022-04-11 21:48:49,914-Speed 144.11 samples/sec   Loss 4.0010   LearningRate 0.0057   Epoch: 15   Global Step: 254010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:48:50,983-Speed 9590.72 samples/sec   Loss 3.9976   LearningRate 0.0057   Epoch: 15   Global Step: 254020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:48:52,033-Speed 9755.71 samples/sec   Loss 4.0115   LearningRate 0.0057   Epoch: 15   Global Step: 254030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:48:53,118-Speed 9440.07 samples/sec   Loss 4.0696   LearningRate 0.0057   Epoch: 15   Global Step: 254040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:48:54,219-Speed 9313.99 samples/sec   Loss 4.0402   LearningRate 0.0057   Epoch: 15   Global Step: 254050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:48:55,299-Speed 9486.08 samples/sec   Loss 4.0563   LearningRate 0.0057   Epoch: 15   Global Step: 254060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:48:56,402-Speed 9295.25 samples/sec   Loss 4.0530   LearningRate 0.0057   Epoch: 15   Global Step: 254070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:48:57,466-Speed 9625.04 samples/sec   Loss 4.0024   LearningRate 0.0057   Epoch: 15   Global Step: 254080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:48:58,536-Speed 9575.29 samples/sec   Loss 4.1104   LearningRate 0.0057   Epoch: 15   Global Step: 254090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:48:59,602-Speed 9608.98 samples/sec   Loss 4.0649   LearningRate 0.0057   Epoch: 15   Global Step: 254100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:00,662-Speed 9665.55 samples/sec   Loss 4.0320   LearningRate 0.0057   Epoch: 15   Global Step: 254110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:01,782-Speed 9154.08 samples/sec   Loss 4.0532   LearningRate 0.0057   Epoch: 15   Global Step: 254120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:02,841-Speed 9668.69 samples/sec   Loss 4.0254   LearningRate 0.0057   Epoch: 15   Global Step: 254130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:03,933-Speed 9384.23 samples/sec   Loss 4.0722   LearningRate 0.0057   Epoch: 15   Global Step: 254140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:05,009-Speed 9526.76 samples/sec   Loss 4.0077   LearningRate 0.0057   Epoch: 15   Global Step: 254150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:06,046-Speed 9880.92 samples/sec   Loss 4.0843   LearningRate 0.0057   Epoch: 15   Global Step: 254160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:07,104-Speed 9678.47 samples/sec   Loss 4.0782   LearningRate 0.0057   Epoch: 15   Global Step: 254170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:08,181-Speed 9523.31 samples/sec   Loss 3.9757   LearningRate 0.0057   Epoch: 15   Global Step: 254180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:09,312-Speed 9053.70 samples/sec   Loss 4.0362   LearningRate 0.0057   Epoch: 15   Global Step: 254190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:10,437-Speed 9104.72 samples/sec   Loss 4.1374   LearningRate 0.0057   Epoch: 15   Global Step: 254200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:11,543-Speed 9265.35 samples/sec   Loss 4.0418   LearningRate 0.0057   Epoch: 15   Global Step: 254210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:12,638-Speed 9359.59 samples/sec   Loss 4.1229   LearningRate 0.0057   Epoch: 15   Global Step: 254220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:13,719-Speed 9475.43 samples/sec   Loss 3.9737   LearningRate 0.0057   Epoch: 15   Global Step: 254230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:14,797-Speed 9502.68 samples/sec   Loss 4.0803   LearningRate 0.0057   Epoch: 15   Global Step: 254240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:15,838-Speed 9842.87 samples/sec   Loss 4.0889   LearningRate 0.0057   Epoch: 15   Global Step: 254250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:16,890-Speed 9743.48 samples/sec   Loss 4.0930   LearningRate 0.0057   Epoch: 15   Global Step: 254260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:18,019-Speed 9078.38 samples/sec   Loss 4.0546   LearningRate 0.0057   Epoch: 15   Global Step: 254270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:19,117-Speed 9329.86 samples/sec   Loss 4.0956   LearningRate 0.0057   Epoch: 15   Global Step: 254280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:20,195-Speed 9509.99 samples/sec   Loss 4.1120   LearningRate 0.0057   Epoch: 15   Global Step: 254290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:21,294-Speed 9326.84 samples/sec   Loss 4.0699   LearningRate 0.0057   Epoch: 15   Global Step: 254300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:22,407-Speed 9200.87 samples/sec   Loss 4.0464   LearningRate 0.0057   Epoch: 15   Global Step: 254310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:23,534-Speed 9089.78 samples/sec   Loss 4.1153   LearningRate 0.0057   Epoch: 15   Global Step: 254320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:24,635-Speed 9307.54 samples/sec   Loss 4.1022   LearningRate 0.0057   Epoch: 15   Global Step: 254330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:25,697-Speed 9644.29 samples/sec   Loss 4.0263   LearningRate 0.0057   Epoch: 15   Global Step: 254340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:26,775-Speed 9509.67 samples/sec   Loss 4.0117   LearningRate 0.0057   Epoch: 15   Global Step: 254350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:27,878-Speed 9286.15 samples/sec   Loss 3.9683   LearningRate 0.0057   Epoch: 15   Global Step: 254360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:29,045-Speed 8780.90 samples/sec   Loss 4.1636   LearningRate 0.0057   Epoch: 15   Global Step: 254370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:30,092-Speed 9785.04 samples/sec   Loss 4.0437   LearningRate 0.0057   Epoch: 15   Global Step: 254380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:31,134-Speed 9833.56 samples/sec   Loss 4.0535   LearningRate 0.0057   Epoch: 15   Global Step: 254390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:32,233-Speed 9325.10 samples/sec   Loss 4.1194   LearningRate 0.0057   Epoch: 15   Global Step: 254400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:33,364-Speed 9060.49 samples/sec   Loss 4.0262   LearningRate 0.0057   Epoch: 15   Global Step: 254410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:34,485-Speed 9134.98 samples/sec   Loss 4.0570   LearningRate 0.0057   Epoch: 15   Global Step: 254420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:35,608-Speed 9123.65 samples/sec   Loss 4.0883   LearningRate 0.0057   Epoch: 15   Global Step: 254430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:36,744-Speed 9018.65 samples/sec   Loss 4.0353   LearningRate 0.0057   Epoch: 15   Global Step: 254440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:37,811-Speed 9608.29 samples/sec   Loss 4.1185   LearningRate 0.0057   Epoch: 15   Global Step: 254450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:38,946-Speed 9036.19 samples/sec   Loss 4.1100   LearningRate 0.0057   Epoch: 15   Global Step: 254460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:40,055-Speed 9242.79 samples/sec   Loss 4.1221   LearningRate 0.0057   Epoch: 15   Global Step: 254470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:41,153-Speed 9326.89 samples/sec   Loss 4.0424   LearningRate 0.0056   Epoch: 15   Global Step: 254480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:42,252-Speed 9323.98 samples/sec   Loss 4.0242   LearningRate 0.0056   Epoch: 15   Global Step: 254490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:43,338-Speed 9434.67 samples/sec   Loss 3.9973   LearningRate 0.0056   Epoch: 15   Global Step: 254500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:44,388-Speed 9755.71 samples/sec   Loss 4.1003   LearningRate 0.0056   Epoch: 15   Global Step: 254510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:45,502-Speed 9200.90 samples/sec   Loss 4.1129   LearningRate 0.0056   Epoch: 15   Global Step: 254520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:46,591-Speed 9403.55 samples/sec   Loss 4.2118   LearningRate 0.0056   Epoch: 15   Global Step: 254530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:47,665-Speed 9544.66 samples/sec   Loss 4.0488   LearningRate 0.0056   Epoch: 15   Global Step: 254540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:48,747-Speed 9463.28 samples/sec   Loss 4.0643   LearningRate 0.0056   Epoch: 15   Global Step: 254550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:49,830-Speed 9464.05 samples/sec   Loss 4.0215   LearningRate 0.0056   Epoch: 15   Global Step: 254560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:50,982-Speed 8895.45 samples/sec   Loss 4.1097   LearningRate 0.0056   Epoch: 15   Global Step: 254570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:52,132-Speed 8913.16 samples/sec   Loss 4.0260   LearningRate 0.0056   Epoch: 15   Global Step: 254580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:49:53,210-Speed 9499.50 samples/sec   Loss 4.0430   LearningRate 0.0056   Epoch: 15   Global Step: 254590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:54,326-Speed 9184.33 samples/sec   Loss 4.0018   LearningRate 0.0056   Epoch: 15   Global Step: 254600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:55,432-Speed 9256.92 samples/sec   Loss 3.9662   LearningRate 0.0056   Epoch: 15   Global Step: 254610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:56,553-Speed 9146.53 samples/sec   Loss 4.0753   LearningRate 0.0056   Epoch: 15   Global Step: 254620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:57,592-Speed 9861.83 samples/sec   Loss 3.9917   LearningRate 0.0056   Epoch: 15   Global Step: 254630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:58,659-Speed 9599.65 samples/sec   Loss 4.0549   LearningRate 0.0056   Epoch: 15   Global Step: 254640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:49:59,756-Speed 9340.85 samples/sec   Loss 4.0616   LearningRate 0.0056   Epoch: 15   Global Step: 254650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:00,851-Speed 9354.81 samples/sec   Loss 4.0367   LearningRate 0.0056   Epoch: 15   Global Step: 254660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:01,949-Speed 9338.60 samples/sec   Loss 4.1253   LearningRate 0.0056   Epoch: 15   Global Step: 254670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:03,031-Speed 9466.73 samples/sec   Loss 4.0886   LearningRate 0.0056   Epoch: 15   Global Step: 254680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:04,118-Speed 9421.62 samples/sec   Loss 4.1597   LearningRate 0.0056   Epoch: 15   Global Step: 254690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:05,190-Speed 9560.24 samples/sec   Loss 4.0826   LearningRate 0.0056   Epoch: 15   Global Step: 254700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:06,263-Speed 9546.97 samples/sec   Loss 4.1409   LearningRate 0.0056   Epoch: 15   Global Step: 254710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:07,370-Speed 9258.09 samples/sec   Loss 4.0425   LearningRate 0.0056   Epoch: 15   Global Step: 254720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:08,538-Speed 8773.21 samples/sec   Loss 4.0572   LearningRate 0.0056   Epoch: 15   Global Step: 254730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:09,637-Speed 9319.65 samples/sec   Loss 4.0956   LearningRate 0.0056   Epoch: 15   Global Step: 254740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:10,706-Speed 9583.35 samples/sec   Loss 4.0161   LearningRate 0.0056   Epoch: 15   Global Step: 254750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:11,806-Speed 9314.07 samples/sec   Loss 3.9533   LearningRate 0.0056   Epoch: 15   Global Step: 254760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:12,886-Speed 9486.83 samples/sec   Loss 4.1109   LearningRate 0.0056   Epoch: 15   Global Step: 254770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:13,997-Speed 9228.58 samples/sec   Loss 4.0382   LearningRate 0.0056   Epoch: 15   Global Step: 254780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:15,091-Speed 9365.25 samples/sec   Loss 4.0343   LearningRate 0.0056   Epoch: 15   Global Step: 254790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:16,208-Speed 9173.60 samples/sec   Loss 4.1738   LearningRate 0.0056   Epoch: 15   Global Step: 254800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:17,303-Speed 9362.70 samples/sec   Loss 4.0692   LearningRate 0.0056   Epoch: 15   Global Step: 254810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:18,366-Speed 9637.90 samples/sec   Loss 4.1364   LearningRate 0.0056   Epoch: 15   Global Step: 254820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:19,463-Speed 9332.82 samples/sec   Loss 4.0113   LearningRate 0.0056   Epoch: 15   Global Step: 254830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:20,532-Speed 9591.68 samples/sec   Loss 3.9734   LearningRate 0.0056   Epoch: 15   Global Step: 254840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:21,592-Speed 9663.22 samples/sec   Loss 4.0540   LearningRate 0.0056   Epoch: 15   Global Step: 254850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:22,700-Speed 9242.91 samples/sec   Loss 4.0610   LearningRate 0.0056   Epoch: 15   Global Step: 254860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:23,786-Speed 9440.39 samples/sec   Loss 4.0353   LearningRate 0.0056   Epoch: 15   Global Step: 254870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:24,824-Speed 9870.77 samples/sec   Loss 4.0191   LearningRate 0.0056   Epoch: 15   Global Step: 254880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:25,940-Speed 9179.10 samples/sec   Loss 4.1371   LearningRate 0.0056   Epoch: 15   Global Step: 254890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:27,041-Speed 9306.87 samples/sec   Loss 4.0879   LearningRate 0.0056   Epoch: 15   Global Step: 254900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:28,135-Speed 9365.03 samples/sec   Loss 4.0105   LearningRate 0.0056   Epoch: 15   Global Step: 254910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:29,189-Speed 9724.66 samples/sec   Loss 4.0944   LearningRate 0.0056   Epoch: 15   Global Step: 254920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:30,268-Speed 9494.79 samples/sec   Loss 4.1194   LearningRate 0.0056   Epoch: 15   Global Step: 254930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:31,337-Speed 9583.50 samples/sec   Loss 4.0829   LearningRate 0.0056   Epoch: 15   Global Step: 254940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:32,448-Speed 9224.13 samples/sec   Loss 4.1139   LearningRate 0.0056   Epoch: 15   Global Step: 254950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:33,558-Speed 9229.36 samples/sec   Loss 4.1346   LearningRate 0.0056   Epoch: 15   Global Step: 254960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:34,634-Speed 9524.37 samples/sec   Loss 4.0722   LearningRate 0.0056   Epoch: 15   Global Step: 254970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:35,709-Speed 9533.76 samples/sec   Loss 4.0559   LearningRate 0.0056   Epoch: 15   Global Step: 254980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:36,759-Speed 9763.05 samples/sec   Loss 4.0497   LearningRate 0.0056   Epoch: 15   Global Step: 254990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:37,838-Speed 9490.56 samples/sec   Loss 4.0905   LearningRate 0.0056   Epoch: 15   Global Step: 255000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:38,905-Speed 9606.46 samples/sec   Loss 4.0457   LearningRate 0.0056   Epoch: 15   Global Step: 255010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:39,978-Speed 9546.24 samples/sec   Loss 4.0923   LearningRate 0.0056   Epoch: 15   Global Step: 255020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:41,047-Speed 9585.37 samples/sec   Loss 4.1259   LearningRate 0.0056   Epoch: 15   Global Step: 255030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:42,132-Speed 9443.16 samples/sec   Loss 4.0537   LearningRate 0.0056   Epoch: 15   Global Step: 255040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:43,244-Speed 9212.79 samples/sec   Loss 3.9883   LearningRate 0.0056   Epoch: 15   Global Step: 255050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:44,338-Speed 9370.99 samples/sec   Loss 4.0848   LearningRate 0.0056   Epoch: 15   Global Step: 255060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:45,417-Speed 9488.52 samples/sec   Loss 4.0680   LearningRate 0.0056   Epoch: 15   Global Step: 255070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:46,553-Speed 9022.09 samples/sec   Loss 4.0705   LearningRate 0.0056   Epoch: 15   Global Step: 255080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:47,670-Speed 9173.02 samples/sec   Loss 4.1061   LearningRate 0.0056   Epoch: 15   Global Step: 255090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:48,707-Speed 9886.94 samples/sec   Loss 3.9456   LearningRate 0.0056   Epoch: 15   Global Step: 255100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:49,787-Speed 9485.02 samples/sec   Loss 4.0476   LearningRate 0.0056   Epoch: 15   Global Step: 255110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:50,881-Speed 9364.58 samples/sec   Loss 3.9901   LearningRate 0.0056   Epoch: 15   Global Step: 255120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:51,991-Speed 9225.40 samples/sec   Loss 4.0074   LearningRate 0.0056   Epoch: 15   Global Step: 255130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:53,083-Speed 9388.48 samples/sec   Loss 4.0516   LearningRate 0.0056   Epoch: 15   Global Step: 255140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:54,202-Speed 9154.49 samples/sec   Loss 4.0743   LearningRate 0.0056   Epoch: 15   Global Step: 255150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:50:55,280-Speed 9506.54 samples/sec   Loss 4.1404   LearningRate 0.0056   Epoch: 15   Global Step: 255160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:56,378-Speed 9329.46 samples/sec   Loss 4.0446   LearningRate 0.0056   Epoch: 15   Global Step: 255170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:57,475-Speed 9342.98 samples/sec   Loss 4.0133   LearningRate 0.0055   Epoch: 15   Global Step: 255180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:58,576-Speed 9305.34 samples/sec   Loss 4.0313   LearningRate 0.0055   Epoch: 15   Global Step: 255190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:50:59,668-Speed 9376.39 samples/sec   Loss 4.1053   LearningRate 0.0055   Epoch: 15   Global Step: 255200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:00,802-Speed 9038.53 samples/sec   Loss 4.0819   LearningRate 0.0055   Epoch: 15   Global Step: 255210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:01,916-Speed 9192.05 samples/sec   Loss 4.0624   LearningRate 0.0055   Epoch: 15   Global Step: 255220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:03,002-Speed 9437.53 samples/sec   Loss 4.0822   LearningRate 0.0055   Epoch: 15   Global Step: 255230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:04,077-Speed 9529.04 samples/sec   Loss 4.1109   LearningRate 0.0055   Epoch: 15   Global Step: 255240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:05,119-Speed 9831.76 samples/sec   Loss 4.0839   LearningRate 0.0055   Epoch: 15   Global Step: 255250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:06,218-Speed 9328.84 samples/sec   Loss 4.0836   LearningRate 0.0055   Epoch: 15   Global Step: 255260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:07,304-Speed 9443.13 samples/sec   Loss 4.0731   LearningRate 0.0055   Epoch: 15   Global Step: 255270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:08,403-Speed 9317.37 samples/sec   Loss 4.0385   LearningRate 0.0055   Epoch: 15   Global Step: 255280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:09,493-Speed 9402.08 samples/sec   Loss 4.0194   LearningRate 0.0055   Epoch: 15   Global Step: 255290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:10,598-Speed 9274.13 samples/sec   Loss 4.0369   LearningRate 0.0055   Epoch: 15   Global Step: 255300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:11,681-Speed 9463.84 samples/sec   Loss 4.0601   LearningRate 0.0055   Epoch: 15   Global Step: 255310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:12,797-Speed 9178.58 samples/sec   Loss 3.9944   LearningRate 0.0055   Epoch: 15   Global Step: 255320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:13,915-Speed 9162.02 samples/sec   Loss 4.0864   LearningRate 0.0055   Epoch: 15   Global Step: 255330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:14,976-Speed 9658.86 samples/sec   Loss 4.1201   LearningRate 0.0055   Epoch: 15   Global Step: 255340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:16,050-Speed 9543.30 samples/sec   Loss 4.1278   LearningRate 0.0055   Epoch: 15   Global Step: 255350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:17,139-Speed 9409.42 samples/sec   Loss 4.0912   LearningRate 0.0055   Epoch: 15   Global Step: 255360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:18,218-Speed 9494.36 samples/sec   Loss 4.0895   LearningRate 0.0055   Epoch: 15   Global Step: 255370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:19,353-Speed 9027.29 samples/sec   Loss 4.0584   LearningRate 0.0055   Epoch: 15   Global Step: 255380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:20,452-Speed 9328.31 samples/sec   Loss 4.1158   LearningRate 0.0055   Epoch: 15   Global Step: 255390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:21,539-Speed 9425.38 samples/sec   Loss 4.0959   LearningRate 0.0055   Epoch: 15   Global Step: 255400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:22,624-Speed 9439.52 samples/sec   Loss 4.0795   LearningRate 0.0055   Epoch: 15   Global Step: 255410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:23,711-Speed 9423.50 samples/sec   Loss 4.1310   LearningRate 0.0055   Epoch: 15   Global Step: 255420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:24,773-Speed 9654.46 samples/sec   Loss 4.1033   LearningRate 0.0055   Epoch: 15   Global Step: 255430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:25,826-Speed 9731.73 samples/sec   Loss 4.0356   LearningRate 0.0055   Epoch: 15   Global Step: 255440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:26,905-Speed 9499.06 samples/sec   Loss 4.1455   LearningRate 0.0055   Epoch: 15   Global Step: 255450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:27,996-Speed 9387.73 samples/sec   Loss 4.1277   LearningRate 0.0055   Epoch: 15   Global Step: 255460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:29,087-Speed 9387.47 samples/sec   Loss 4.0895   LearningRate 0.0055   Epoch: 15   Global Step: 255470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:30,162-Speed 9529.23 samples/sec   Loss 4.1000   LearningRate 0.0055   Epoch: 15   Global Step: 255480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:31,270-Speed 9258.37 samples/sec   Loss 4.0484   LearningRate 0.0055   Epoch: 15   Global Step: 255490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:32,329-Speed 9668.18 samples/sec   Loss 4.0790   LearningRate 0.0055   Epoch: 15   Global Step: 255500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:33,381-Speed 9744.09 samples/sec   Loss 4.1156   LearningRate 0.0055   Epoch: 15   Global Step: 255510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:34,467-Speed 9433.66 samples/sec   Loss 4.1185   LearningRate 0.0055   Epoch: 15   Global Step: 255520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:35,571-Speed 9275.72 samples/sec   Loss 4.0124   LearningRate 0.0055   Epoch: 15   Global Step: 255530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:36,718-Speed 8935.79 samples/sec   Loss 4.0106   LearningRate 0.0055   Epoch: 15   Global Step: 255540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:37,795-Speed 9509.82 samples/sec   Loss 4.0340   LearningRate 0.0055   Epoch: 15   Global Step: 255550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:38,863-Speed 9594.09 samples/sec   Loss 4.0405   LearningRate 0.0055   Epoch: 15   Global Step: 255560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:39,929-Speed 9614.37 samples/sec   Loss 4.0958   LearningRate 0.0055   Epoch: 15   Global Step: 255570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:41,027-Speed 9325.44 samples/sec   Loss 4.1180   LearningRate 0.0055   Epoch: 15   Global Step: 255580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:42,111-Speed 9455.84 samples/sec   Loss 4.1482   LearningRate 0.0055   Epoch: 15   Global Step: 255590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:43,273-Speed 8818.97 samples/sec   Loss 4.1529   LearningRate 0.0055   Epoch: 15   Global Step: 255600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:44,342-Speed 9591.74 samples/sec   Loss 4.0309   LearningRate 0.0055   Epoch: 15   Global Step: 255610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:45,387-Speed 9804.95 samples/sec   Loss 4.0490   LearningRate 0.0055   Epoch: 15   Global Step: 255620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:46,436-Speed 9759.79 samples/sec   Loss 4.1106   LearningRate 0.0055   Epoch: 15   Global Step: 255630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:47,481-Speed 9810.39 samples/sec   Loss 4.0768   LearningRate 0.0055   Epoch: 15   Global Step: 255640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:48,591-Speed 9231.40 samples/sec   Loss 4.1646   LearningRate 0.0055   Epoch: 15   Global Step: 255650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:49,680-Speed 9408.45 samples/sec   Loss 4.0729   LearningRate 0.0055   Epoch: 15   Global Step: 255660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:50,802-Speed 9128.74 samples/sec   Loss 4.1489   LearningRate 0.0055   Epoch: 15   Global Step: 255670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:51,938-Speed 9019.80 samples/sec   Loss 4.0726   LearningRate 0.0055   Epoch: 15   Global Step: 255680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:53,073-Speed 9031.17 samples/sec   Loss 4.0787   LearningRate 0.0055   Epoch: 15   Global Step: 255690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:54,172-Speed 9319.94 samples/sec   Loss 4.0956   LearningRate 0.0055   Epoch: 15   Global Step: 255700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:55,258-Speed 9431.39 samples/sec   Loss 4.1315   LearningRate 0.0055   Epoch: 15   Global Step: 255710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:51:56,306-Speed 9774.27 samples/sec   Loss 4.0709   LearningRate 0.0055   Epoch: 15   Global Step: 255720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:57,413-Speed 9259.49 samples/sec   Loss 4.0938   LearningRate 0.0055   Epoch: 15   Global Step: 255730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:58,500-Speed 9427.14 samples/sec   Loss 4.1244   LearningRate 0.0055   Epoch: 15   Global Step: 255740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:51:59,581-Speed 9476.78 samples/sec   Loss 4.0477   LearningRate 0.0055   Epoch: 15   Global Step: 255750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:00,693-Speed 9215.86 samples/sec   Loss 4.1257   LearningRate 0.0055   Epoch: 15   Global Step: 255760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:01,790-Speed 9337.69 samples/sec   Loss 4.1879   LearningRate 0.0055   Epoch: 15   Global Step: 255770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:02,916-Speed 9106.73 samples/sec   Loss 4.0775   LearningRate 0.0055   Epoch: 15   Global Step: 255780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:03,974-Speed 9681.72 samples/sec   Loss 4.0973   LearningRate 0.0055   Epoch: 15   Global Step: 255790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:05,053-Speed 9492.14 samples/sec   Loss 4.0600   LearningRate 0.0055   Epoch: 15   Global Step: 255800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:06,146-Speed 9379.04 samples/sec   Loss 4.1090   LearningRate 0.0055   Epoch: 15   Global Step: 255810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:07,212-Speed 9612.56 samples/sec   Loss 4.0952   LearningRate 0.0055   Epoch: 15   Global Step: 255820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:52:08,330-Speed 9163.83 samples/sec   Loss 4.0753   LearningRate 0.0055   Epoch: 15   Global Step: 255830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:52:09,460-Speed 9062.65 samples/sec   Loss 4.0720   LearningRate 0.0055   Epoch: 15   Global Step: 255840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:52:10,531-Speed 9566.35 samples/sec   Loss 4.1092   LearningRate 0.0055   Epoch: 15   Global Step: 255850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:11,565-Speed 9910.86 samples/sec   Loss 4.1272   LearningRate 0.0055   Epoch: 15   Global Step: 255860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:12,644-Speed 9508.50 samples/sec   Loss 4.1561   LearningRate 0.0055   Epoch: 15   Global Step: 255870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:13,693-Speed 9765.11 samples/sec   Loss 4.0716   LearningRate 0.0055   Epoch: 15   Global Step: 255880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:14,759-Speed 9616.33 samples/sec   Loss 4.1506   LearningRate 0.0054   Epoch: 15   Global Step: 255890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:15,817-Speed 9680.16 samples/sec   Loss 4.1093   LearningRate 0.0054   Epoch: 15   Global Step: 255900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:16,862-Speed 9802.15 samples/sec   Loss 4.0323   LearningRate 0.0054   Epoch: 15   Global Step: 255910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:17,924-Speed 9647.13 samples/sec   Loss 4.0837   LearningRate 0.0054   Epoch: 15   Global Step: 255920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:19,069-Speed 8955.85 samples/sec   Loss 4.0639   LearningRate 0.0054   Epoch: 15   Global Step: 255930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:20,148-Speed 9498.20 samples/sec   Loss 4.0606   LearningRate 0.0054   Epoch: 15   Global Step: 255940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:52:21,256-Speed 9242.74 samples/sec   Loss 4.0136   LearningRate 0.0054   Epoch: 15   Global Step: 255950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:52:22,355-Speed 9323.95 samples/sec   Loss 4.0532   LearningRate 0.0054   Epoch: 15   Global Step: 255960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:52:23,407-Speed 9738.17 samples/sec   Loss 4.1183   LearningRate 0.0054   Epoch: 15   Global Step: 255970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:52:24,500-Speed 9377.28 samples/sec   Loss 4.1127   LearningRate 0.0054   Epoch: 15   Global Step: 255980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:52:25,587-Speed 9430.18 samples/sec   Loss 4.0632   LearningRate 0.0054   Epoch: 15   Global Step: 255990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:52:26,702-Speed 9187.37 samples/sec   Loss 4.0786   LearningRate 0.0054   Epoch: 15   Global Step: 256000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:52:48,792-[lfw][256000]XNorm: 7.242381
Training: 2022-04-11 21:52:48,793-[lfw][256000]Accuracy-Flip: 0.99617+-0.00269
Training: 2022-04-11 21:52:48,793-[lfw][256000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:53:14,304-[cfp_fp][256000]XNorm: 6.275827
Training: 2022-04-11 21:53:14,305-[cfp_fp][256000]Accuracy-Flip: 0.97086+-0.00547
Training: 2022-04-11 21:53:14,305-[cfp_fp][256000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:53:36,314-[agedb_30][256000]XNorm: 7.043263
Training: 2022-04-11 21:53:36,315-[agedb_30][256000]Accuracy-Flip: 0.96883+-0.00928
Training: 2022-04-11 21:53:36,315-[agedb_30][256000]Accuracy-Highest: 0.97350
Training: 2022-04-11 21:53:37,414-Speed 144.81 samples/sec   Loss 4.2174   LearningRate 0.0054   Epoch: 15   Global Step: 256010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:53:38,519-Speed 9268.58 samples/sec   Loss 4.2231   LearningRate 0.0054   Epoch: 15   Global Step: 256020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:39,625-Speed 9263.86 samples/sec   Loss 4.0869   LearningRate 0.0054   Epoch: 15   Global Step: 256030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:40,702-Speed 9512.48 samples/sec   Loss 4.1004   LearningRate 0.0054   Epoch: 15   Global Step: 256040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:41,792-Speed 9408.27 samples/sec   Loss 4.2157   LearningRate 0.0054   Epoch: 15   Global Step: 256050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:42,919-Speed 9094.07 samples/sec   Loss 4.0989   LearningRate 0.0054   Epoch: 15   Global Step: 256060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:44,008-Speed 9401.94 samples/sec   Loss 4.1158   LearningRate 0.0054   Epoch: 15   Global Step: 256070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:45,087-Speed 9495.06 samples/sec   Loss 4.0374   LearningRate 0.0054   Epoch: 15   Global Step: 256080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:46,162-Speed 9531.29 samples/sec   Loss 4.0555   LearningRate 0.0054   Epoch: 15   Global Step: 256090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:47,291-Speed 9076.07 samples/sec   Loss 4.0849   LearningRate 0.0054   Epoch: 15   Global Step: 256100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:48,362-Speed 9569.89 samples/sec   Loss 3.9945   LearningRate 0.0054   Epoch: 15   Global Step: 256110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:49,442-Speed 9492.75 samples/sec   Loss 4.1681   LearningRate 0.0054   Epoch: 15   Global Step: 256120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:53:50,552-Speed 9224.88 samples/sec   Loss 4.1580   LearningRate 0.0054   Epoch: 15   Global Step: 256130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:53:51,632-Speed 9494.95 samples/sec   Loss 4.0572   LearningRate 0.0054   Epoch: 15   Global Step: 256140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:53:52,708-Speed 9515.82 samples/sec   Loss 4.0756   LearningRate 0.0054   Epoch: 15   Global Step: 256150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:53:53,772-Speed 9632.14 samples/sec   Loss 4.0487   LearningRate 0.0054   Epoch: 15   Global Step: 256160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:54,832-Speed 9665.13 samples/sec   Loss 4.1041   LearningRate 0.0054   Epoch: 15   Global Step: 256170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:55,975-Speed 8962.89 samples/sec   Loss 4.1260   LearningRate 0.0054   Epoch: 15   Global Step: 256180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:57,084-Speed 9235.22 samples/sec   Loss 4.0549   LearningRate 0.0054   Epoch: 15   Global Step: 256190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:58,202-Speed 9171.39 samples/sec   Loss 4.1090   LearningRate 0.0054   Epoch: 15   Global Step: 256200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:53:59,276-Speed 9539.66 samples/sec   Loss 4.0891   LearningRate 0.0054   Epoch: 15   Global Step: 256210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:00,403-Speed 9084.51 samples/sec   Loss 4.1196   LearningRate 0.0054   Epoch: 15   Global Step: 256220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:01,499-Speed 9354.14 samples/sec   Loss 4.0947   LearningRate 0.0054   Epoch: 15   Global Step: 256230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:02,547-Speed 9775.25 samples/sec   Loss 4.1913   LearningRate 0.0054   Epoch: 15   Global Step: 256240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:03,633-Speed 9437.21 samples/sec   Loss 4.1994   LearningRate 0.0054   Epoch: 15   Global Step: 256250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:04,706-Speed 9547.75 samples/sec   Loss 4.1697   LearningRate 0.0054   Epoch: 15   Global Step: 256260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:05,800-Speed 9364.81 samples/sec   Loss 4.1300   LearningRate 0.0054   Epoch: 15   Global Step: 256270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:06,921-Speed 9136.86 samples/sec   Loss 4.1725   LearningRate 0.0054   Epoch: 15   Global Step: 256280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:08,036-Speed 9193.61 samples/sec   Loss 4.2407   LearningRate 0.0054   Epoch: 15   Global Step: 256290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:09,162-Speed 9096.53 samples/sec   Loss 4.2334   LearningRate 0.0054   Epoch: 15   Global Step: 256300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:10,224-Speed 9650.27 samples/sec   Loss 4.0222   LearningRate 0.0054   Epoch: 15   Global Step: 256310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:11,359-Speed 9025.30 samples/sec   Loss 4.0377   LearningRate 0.0054   Epoch: 15   Global Step: 256320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:12,444-Speed 9454.07 samples/sec   Loss 4.0331   LearningRate 0.0054   Epoch: 15   Global Step: 256330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:13,503-Speed 9672.74 samples/sec   Loss 3.9791   LearningRate 0.0054   Epoch: 15   Global Step: 256340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:14,592-Speed 9404.79 samples/sec   Loss 4.0756   LearningRate 0.0054   Epoch: 15   Global Step: 256350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:15,668-Speed 9530.06 samples/sec   Loss 4.2101   LearningRate 0.0054   Epoch: 15   Global Step: 256360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:16,784-Speed 9179.26 samples/sec   Loss 4.1247   LearningRate 0.0054   Epoch: 15   Global Step: 256370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:17,903-Speed 9154.71 samples/sec   Loss 4.0776   LearningRate 0.0054   Epoch: 15   Global Step: 256380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:18,986-Speed 9462.14 samples/sec   Loss 4.1167   LearningRate 0.0054   Epoch: 15   Global Step: 256390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:20,112-Speed 9098.53 samples/sec   Loss 4.2075   LearningRate 0.0054   Epoch: 15   Global Step: 256400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:21,220-Speed 9249.00 samples/sec   Loss 4.1605   LearningRate 0.0054   Epoch: 15   Global Step: 256410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:22,362-Speed 8968.62 samples/sec   Loss 4.1115   LearningRate 0.0054   Epoch: 15   Global Step: 256420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:23,453-Speed 9391.92 samples/sec   Loss 4.0420   LearningRate 0.0054   Epoch: 15   Global Step: 256430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:24,513-Speed 9668.87 samples/sec   Loss 4.0826   LearningRate 0.0054   Epoch: 15   Global Step: 256440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:25,591-Speed 9509.28 samples/sec   Loss 4.1168   LearningRate 0.0054   Epoch: 15   Global Step: 256450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:26,639-Speed 9783.08 samples/sec   Loss 4.0476   LearningRate 0.0054   Epoch: 15   Global Step: 256460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:27,723-Speed 9450.26 samples/sec   Loss 4.0509   LearningRate 0.0054   Epoch: 15   Global Step: 256470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:28,823-Speed 9310.32 samples/sec   Loss 4.1431   LearningRate 0.0054   Epoch: 15   Global Step: 256480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:29,874-Speed 9752.46 samples/sec   Loss 4.0339   LearningRate 0.0054   Epoch: 15   Global Step: 256490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:30,938-Speed 9627.61 samples/sec   Loss 4.0528   LearningRate 0.0054   Epoch: 15   Global Step: 256500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:31,997-Speed 9674.41 samples/sec   Loss 4.0969   LearningRate 0.0054   Epoch: 15   Global Step: 256510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:33,089-Speed 9384.92 samples/sec   Loss 4.0717   LearningRate 0.0054   Epoch: 15   Global Step: 256520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:34,177-Speed 9421.98 samples/sec   Loss 4.0846   LearningRate 0.0054   Epoch: 15   Global Step: 256530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:35,252-Speed 9531.77 samples/sec   Loss 4.0931   LearningRate 0.0054   Epoch: 15   Global Step: 256540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:36,362-Speed 9229.52 samples/sec   Loss 4.0663   LearningRate 0.0054   Epoch: 15   Global Step: 256550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:37,469-Speed 9251.44 samples/sec   Loss 4.1232   LearningRate 0.0054   Epoch: 15   Global Step: 256560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:38,564-Speed 9353.43 samples/sec   Loss 4.1755   LearningRate 0.0054   Epoch: 15   Global Step: 256570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:39,671-Speed 9258.65 samples/sec   Loss 4.1121   LearningRate 0.0054   Epoch: 15   Global Step: 256580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:40,727-Speed 9702.03 samples/sec   Loss 4.0894   LearningRate 0.0054   Epoch: 15   Global Step: 256590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:41,849-Speed 9131.20 samples/sec   Loss 4.1225   LearningRate 0.0054   Epoch: 15   Global Step: 256600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:42,955-Speed 9265.46 samples/sec   Loss 4.1192   LearningRate 0.0053   Epoch: 15   Global Step: 256610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:44,051-Speed 9354.45 samples/sec   Loss 4.1665   LearningRate 0.0053   Epoch: 15   Global Step: 256620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:45,113-Speed 9647.17 samples/sec   Loss 4.0829   LearningRate 0.0053   Epoch: 15   Global Step: 256630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:46,146-Speed 9917.99 samples/sec   Loss 4.1196   LearningRate 0.0053   Epoch: 15   Global Step: 256640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:47,192-Speed 9794.62 samples/sec   Loss 4.0702   LearningRate 0.0053   Epoch: 15   Global Step: 256650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:48,268-Speed 9522.98 samples/sec   Loss 4.0762   LearningRate 0.0053   Epoch: 15   Global Step: 256660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:49,320-Speed 9741.21 samples/sec   Loss 4.1317   LearningRate 0.0053   Epoch: 15   Global Step: 256670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:50,428-Speed 9243.41 samples/sec   Loss 4.1173   LearningRate 0.0053   Epoch: 15   Global Step: 256680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:51,557-Speed 9074.73 samples/sec   Loss 4.0613   LearningRate 0.0053   Epoch: 15   Global Step: 256690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:52,672-Speed 9191.64 samples/sec   Loss 4.1041   LearningRate 0.0053   Epoch: 15   Global Step: 256700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:53,766-Speed 9360.79 samples/sec   Loss 4.0477   LearningRate 0.0053   Epoch: 15   Global Step: 256710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:54,951-Speed 8651.46 samples/sec   Loss 4.0996   LearningRate 0.0053   Epoch: 15   Global Step: 256720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:56,048-Speed 9339.13 samples/sec   Loss 4.0398   LearningRate 0.0053   Epoch: 15   Global Step: 256730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:54:57,154-Speed 9261.66 samples/sec   Loss 4.1549   LearningRate 0.0053   Epoch: 15   Global Step: 256740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:58,216-Speed 9645.03 samples/sec   Loss 4.1029   LearningRate 0.0053   Epoch: 15   Global Step: 256750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:54:59,311-Speed 9353.11 samples/sec   Loss 4.1048   LearningRate 0.0053   Epoch: 15   Global Step: 256760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:00,374-Speed 9639.07 samples/sec   Loss 4.0695   LearningRate 0.0053   Epoch: 15   Global Step: 256770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:01,452-Speed 9509.21 samples/sec   Loss 4.0764   LearningRate 0.0053   Epoch: 15   Global Step: 256780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:02,562-Speed 9236.15 samples/sec   Loss 4.1534   LearningRate 0.0053   Epoch: 15   Global Step: 256790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:03,653-Speed 9389.25 samples/sec   Loss 4.0887   LearningRate 0.0053   Epoch: 15   Global Step: 256800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:04,771-Speed 9164.63 samples/sec   Loss 4.0774   LearningRate 0.0053   Epoch: 15   Global Step: 256810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:05,829-Speed 9681.21 samples/sec   Loss 4.1133   LearningRate 0.0053   Epoch: 15   Global Step: 256820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:06,915-Speed 9434.46 samples/sec   Loss 4.1547   LearningRate 0.0053   Epoch: 15   Global Step: 256830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:08,013-Speed 9333.96 samples/sec   Loss 4.1810   LearningRate 0.0053   Epoch: 15   Global Step: 256840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:09,077-Speed 9626.19 samples/sec   Loss 4.0343   LearningRate 0.0053   Epoch: 15   Global Step: 256850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:10,145-Speed 9594.68 samples/sec   Loss 4.1715   LearningRate 0.0053   Epoch: 15   Global Step: 256860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:11,272-Speed 9091.90 samples/sec   Loss 4.0286   LearningRate 0.0053   Epoch: 15   Global Step: 256870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:12,344-Speed 9565.86 samples/sec   Loss 4.1413   LearningRate 0.0053   Epoch: 15   Global Step: 256880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:13,447-Speed 9294.08 samples/sec   Loss 4.0566   LearningRate 0.0053   Epoch: 15   Global Step: 256890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:14,594-Speed 8933.21 samples/sec   Loss 4.0759   LearningRate 0.0053   Epoch: 15   Global Step: 256900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:15,651-Speed 9703.78 samples/sec   Loss 4.1617   LearningRate 0.0053   Epoch: 15   Global Step: 256910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:16,758-Speed 9249.46 samples/sec   Loss 4.1599   LearningRate 0.0053   Epoch: 15   Global Step: 256920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:17,813-Speed 9713.20 samples/sec   Loss 4.1094   LearningRate 0.0053   Epoch: 15   Global Step: 256930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:18,874-Speed 9663.85 samples/sec   Loss 4.0656   LearningRate 0.0053   Epoch: 15   Global Step: 256940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:20,061-Speed 8633.99 samples/sec   Loss 4.0948   LearningRate 0.0053   Epoch: 15   Global Step: 256950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:21,147-Speed 9433.35 samples/sec   Loss 4.1177   LearningRate 0.0053   Epoch: 15   Global Step: 256960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:22,246-Speed 9324.39 samples/sec   Loss 4.1034   LearningRate 0.0053   Epoch: 15   Global Step: 256970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:23,336-Speed 9395.43 samples/sec   Loss 4.1705   LearningRate 0.0053   Epoch: 15   Global Step: 256980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:24,460-Speed 9122.01 samples/sec   Loss 4.0737   LearningRate 0.0053   Epoch: 15   Global Step: 256990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:25,541-Speed 9477.99 samples/sec   Loss 4.0836   LearningRate 0.0053   Epoch: 15   Global Step: 257000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:26,676-Speed 9020.79 samples/sec   Loss 4.1722   LearningRate 0.0053   Epoch: 15   Global Step: 257010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:27,804-Speed 9090.86 samples/sec   Loss 4.0630   LearningRate 0.0053   Epoch: 15   Global Step: 257020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:28,946-Speed 8963.90 samples/sec   Loss 4.1080   LearningRate 0.0053   Epoch: 15   Global Step: 257030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:30,049-Speed 9297.19 samples/sec   Loss 4.0262   LearningRate 0.0053   Epoch: 15   Global Step: 257040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:31,135-Speed 9437.05 samples/sec   Loss 4.0929   LearningRate 0.0053   Epoch: 15   Global Step: 257050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:32,196-Speed 9658.90 samples/sec   Loss 4.1652   LearningRate 0.0053   Epoch: 15   Global Step: 257060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:33,294-Speed 9326.81 samples/sec   Loss 4.1684   LearningRate 0.0053   Epoch: 15   Global Step: 257070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:34,367-Speed 9552.13 samples/sec   Loss 4.0497   LearningRate 0.0053   Epoch: 15   Global Step: 257080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:35,483-Speed 9179.13 samples/sec   Loss 4.0744   LearningRate 0.0053   Epoch: 15   Global Step: 257090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:36,555-Speed 9559.27 samples/sec   Loss 4.1150   LearningRate 0.0053   Epoch: 15   Global Step: 257100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:37,676-Speed 9139.55 samples/sec   Loss 4.1753   LearningRate 0.0053   Epoch: 15   Global Step: 257110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:38,744-Speed 9594.96 samples/sec   Loss 4.0269   LearningRate 0.0053   Epoch: 15   Global Step: 257120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:39,833-Speed 9408.57 samples/sec   Loss 4.1622   LearningRate 0.0053   Epoch: 15   Global Step: 257130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:40,932-Speed 9326.41 samples/sec   Loss 4.0699   LearningRate 0.0053   Epoch: 15   Global Step: 257140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:41,989-Speed 9691.92 samples/sec   Loss 4.1604   LearningRate 0.0053   Epoch: 15   Global Step: 257150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:43,072-Speed 9461.97 samples/sec   Loss 4.0456   LearningRate 0.0053   Epoch: 15   Global Step: 257160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:44,176-Speed 9280.70 samples/sec   Loss 4.1103   LearningRate 0.0053   Epoch: 15   Global Step: 257170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:45,265-Speed 9403.43 samples/sec   Loss 4.1332   LearningRate 0.0053   Epoch: 15   Global Step: 257180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:46,348-Speed 9464.42 samples/sec   Loss 4.1138   LearningRate 0.0053   Epoch: 15   Global Step: 257190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:47,513-Speed 8795.42 samples/sec   Loss 4.0829   LearningRate 0.0053   Epoch: 15   Global Step: 257200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:48,625-Speed 9226.03 samples/sec   Loss 4.0962   LearningRate 0.0053   Epoch: 15   Global Step: 257210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:55:49,671-Speed 9796.09 samples/sec   Loss 4.1631   LearningRate 0.0053   Epoch: 15   Global Step: 257220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:50,730-Speed 9674.50 samples/sec   Loss 4.0456   LearningRate 0.0053   Epoch: 15   Global Step: 257230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:51,858-Speed 9082.09 samples/sec   Loss 4.1188   LearningRate 0.0053   Epoch: 15   Global Step: 257240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:52,957-Speed 9325.81 samples/sec   Loss 4.1222   LearningRate 0.0053   Epoch: 15   Global Step: 257250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:54,013-Speed 9702.17 samples/sec   Loss 4.1602   LearningRate 0.0053   Epoch: 15   Global Step: 257260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:55,103-Speed 9399.24 samples/sec   Loss 4.1034   LearningRate 0.0053   Epoch: 15   Global Step: 257270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:56,190-Speed 9423.96 samples/sec   Loss 4.0715   LearningRate 0.0053   Epoch: 15   Global Step: 257280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:57,272-Speed 9468.26 samples/sec   Loss 4.1748   LearningRate 0.0053   Epoch: 15   Global Step: 257290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:58,351-Speed 9492.54 samples/sec   Loss 4.0531   LearningRate 0.0053   Epoch: 15   Global Step: 257300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:55:59,458-Speed 9260.79 samples/sec   Loss 4.1903   LearningRate 0.0053   Epoch: 15   Global Step: 257310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:00,599-Speed 8979.45 samples/sec   Loss 4.1989   LearningRate 0.0053   Epoch: 15   Global Step: 257320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:01,657-Speed 9683.72 samples/sec   Loss 4.1419   LearningRate 0.0053   Epoch: 15   Global Step: 257330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:02,724-Speed 9604.74 samples/sec   Loss 4.1246   LearningRate 0.0052   Epoch: 15   Global Step: 257340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:03,831-Speed 9250.76 samples/sec   Loss 4.0800   LearningRate 0.0052   Epoch: 15   Global Step: 257350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:04,924-Speed 9376.77 samples/sec   Loss 4.1109   LearningRate 0.0052   Epoch: 15   Global Step: 257360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:05,960-Speed 9891.76 samples/sec   Loss 4.0965   LearningRate 0.0052   Epoch: 15   Global Step: 257370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:07,014-Speed 9721.22 samples/sec   Loss 4.0393   LearningRate 0.0052   Epoch: 15   Global Step: 257380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:08,062-Speed 9779.14 samples/sec   Loss 4.0263   LearningRate 0.0052   Epoch: 15   Global Step: 257390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:09,154-Speed 9378.82 samples/sec   Loss 4.2534   LearningRate 0.0052   Epoch: 15   Global Step: 257400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:10,251-Speed 9343.93 samples/sec   Loss 4.1472   LearningRate 0.0052   Epoch: 15   Global Step: 257410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:11,311-Speed 9657.23 samples/sec   Loss 4.0888   LearningRate 0.0052   Epoch: 15   Global Step: 257420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:12,402-Speed 9401.60 samples/sec   Loss 4.1411   LearningRate 0.0052   Epoch: 15   Global Step: 257430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:13,474-Speed 9552.05 samples/sec   Loss 4.1140   LearningRate 0.0052   Epoch: 15   Global Step: 257440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:14,540-Speed 9612.28 samples/sec   Loss 4.0754   LearningRate 0.0052   Epoch: 15   Global Step: 257450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:15,646-Speed 9262.70 samples/sec   Loss 4.0863   LearningRate 0.0052   Epoch: 15   Global Step: 257460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:16,783-Speed 9016.96 samples/sec   Loss 4.1782   LearningRate 0.0052   Epoch: 15   Global Step: 257470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:17,907-Speed 9113.20 samples/sec   Loss 4.1456   LearningRate 0.0052   Epoch: 15   Global Step: 257480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:19,087-Speed 8686.20 samples/sec   Loss 4.1586   LearningRate 0.0052   Epoch: 15   Global Step: 257490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:20,147-Speed 9661.68 samples/sec   Loss 4.1619   LearningRate 0.0052   Epoch: 15   Global Step: 257500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:21,214-Speed 9600.56 samples/sec   Loss 4.1880   LearningRate 0.0052   Epoch: 15   Global Step: 257510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:22,289-Speed 9531.34 samples/sec   Loss 4.1041   LearningRate 0.0052   Epoch: 15   Global Step: 257520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:23,380-Speed 9392.61 samples/sec   Loss 4.1490   LearningRate 0.0052   Epoch: 15   Global Step: 257530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:24,473-Speed 9381.69 samples/sec   Loss 4.1003   LearningRate 0.0052   Epoch: 15   Global Step: 257540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:25,555-Speed 9465.51 samples/sec   Loss 4.1251   LearningRate 0.0052   Epoch: 15   Global Step: 257550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:26,653-Speed 9328.66 samples/sec   Loss 4.1285   LearningRate 0.0052   Epoch: 15   Global Step: 257560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:27,789-Speed 9024.35 samples/sec   Loss 4.1065   LearningRate 0.0052   Epoch: 15   Global Step: 257570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:28,857-Speed 9594.85 samples/sec   Loss 4.1376   LearningRate 0.0052   Epoch: 15   Global Step: 257580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:29,955-Speed 9328.35 samples/sec   Loss 4.1587   LearningRate 0.0052   Epoch: 15   Global Step: 257590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:31,081-Speed 9112.68 samples/sec   Loss 4.1002   LearningRate 0.0052   Epoch: 15   Global Step: 257600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:32,246-Speed 8794.04 samples/sec   Loss 4.0705   LearningRate 0.0052   Epoch: 15   Global Step: 257610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:33,338-Speed 9384.74 samples/sec   Loss 4.1400   LearningRate 0.0052   Epoch: 15   Global Step: 257620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:34,439-Speed 9302.61 samples/sec   Loss 4.1163   LearningRate 0.0052   Epoch: 15   Global Step: 257630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:35,527-Speed 9420.43 samples/sec   Loss 4.1001   LearningRate 0.0052   Epoch: 15   Global Step: 257640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:36,644-Speed 9166.93 samples/sec   Loss 4.0838   LearningRate 0.0052   Epoch: 15   Global Step: 257650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:37,790-Speed 8943.71 samples/sec   Loss 4.1813   LearningRate 0.0052   Epoch: 15   Global Step: 257660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:38,885-Speed 9356.01 samples/sec   Loss 4.0553   LearningRate 0.0052   Epoch: 15   Global Step: 257670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:39,975-Speed 9402.19 samples/sec   Loss 4.0774   LearningRate 0.0052   Epoch: 15   Global Step: 257680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:41,061-Speed 9434.07 samples/sec   Loss 4.1484   LearningRate 0.0052   Epoch: 15   Global Step: 257690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:42,126-Speed 9622.59 samples/sec   Loss 4.0291   LearningRate 0.0052   Epoch: 15   Global Step: 257700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:43,182-Speed 9704.31 samples/sec   Loss 4.0950   LearningRate 0.0052   Epoch: 15   Global Step: 257710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:44,272-Speed 9395.18 samples/sec   Loss 4.0685   LearningRate 0.0052   Epoch: 15   Global Step: 257720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:45,376-Speed 9285.53 samples/sec   Loss 4.1876   LearningRate 0.0052   Epoch: 15   Global Step: 257730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:46,481-Speed 9267.27 samples/sec   Loss 4.1846   LearningRate 0.0052   Epoch: 15   Global Step: 257740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:47,593-Speed 9213.13 samples/sec   Loss 4.1437   LearningRate 0.0052   Epoch: 15   Global Step: 257750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:48,670-Speed 9520.05 samples/sec   Loss 4.2767   LearningRate 0.0052   Epoch: 15   Global Step: 257760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:49,751-Speed 9479.46 samples/sec   Loss 4.1169   LearningRate 0.0052   Epoch: 15   Global Step: 257770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:50,868-Speed 9173.23 samples/sec   Loss 4.1588   LearningRate 0.0052   Epoch: 15   Global Step: 257780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:51,953-Speed 9439.47 samples/sec   Loss 4.1373   LearningRate 0.0052   Epoch: 15   Global Step: 257790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:56:53,054-Speed 9306.52 samples/sec   Loss 4.0423   LearningRate 0.0052   Epoch: 15   Global Step: 257800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:54,133-Speed 9512.46 samples/sec   Loss 4.1484   LearningRate 0.0052   Epoch: 15   Global Step: 257810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:55,248-Speed 9184.59 samples/sec   Loss 4.1810   LearningRate 0.0052   Epoch: 15   Global Step: 257820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:56,357-Speed 9242.81 samples/sec   Loss 4.1365   LearningRate 0.0052   Epoch: 15   Global Step: 257830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:57,487-Speed 9070.40 samples/sec   Loss 4.1102   LearningRate 0.0052   Epoch: 15   Global Step: 257840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:58,603-Speed 9183.34 samples/sec   Loss 4.2298   LearningRate 0.0052   Epoch: 15   Global Step: 257850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:56:59,697-Speed 9362.76 samples/sec   Loss 4.1767   LearningRate 0.0052   Epoch: 15   Global Step: 257860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:00,839-Speed 8973.39 samples/sec   Loss 4.0935   LearningRate 0.0052   Epoch: 15   Global Step: 257870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:01,897-Speed 9691.07 samples/sec   Loss 4.0721   LearningRate 0.0052   Epoch: 15   Global Step: 257880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:02,951-Speed 9720.19 samples/sec   Loss 4.1167   LearningRate 0.0052   Epoch: 15   Global Step: 257890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:04,054-Speed 9284.68 samples/sec   Loss 4.2073   LearningRate 0.0052   Epoch: 15   Global Step: 257900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:57:05,164-Speed 9234.22 samples/sec   Loss 4.1972   LearningRate 0.0052   Epoch: 15   Global Step: 257910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:57:06,275-Speed 9225.75 samples/sec   Loss 4.1194   LearningRate 0.0052   Epoch: 15   Global Step: 257920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:57:07,357-Speed 9467.79 samples/sec   Loss 4.1427   LearningRate 0.0052   Epoch: 15   Global Step: 257930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:08,419-Speed 9648.91 samples/sec   Loss 4.0056   LearningRate 0.0052   Epoch: 15   Global Step: 257940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:09,516-Speed 9339.69 samples/sec   Loss 4.1211   LearningRate 0.0052   Epoch: 15   Global Step: 257950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:10,585-Speed 9582.24 samples/sec   Loss 4.0686   LearningRate 0.0052   Epoch: 15   Global Step: 257960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:11,713-Speed 9087.04 samples/sec   Loss 4.1684   LearningRate 0.0052   Epoch: 15   Global Step: 257970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:12,797-Speed 9451.86 samples/sec   Loss 4.1801   LearningRate 0.0052   Epoch: 15   Global Step: 257980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:13,919-Speed 9130.15 samples/sec   Loss 4.1599   LearningRate 0.0052   Epoch: 15   Global Step: 257990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:15,029-Speed 9231.78 samples/sec   Loss 4.1362   LearningRate 0.0052   Epoch: 15   Global Step: 258000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:57:37,352-[lfw][258000]XNorm: 7.148704
Training: 2022-04-11 21:57:37,352-[lfw][258000]Accuracy-Flip: 0.99717+-0.00299
Training: 2022-04-11 21:57:37,353-[lfw][258000]Accuracy-Highest: 0.99733
Training: 2022-04-11 21:58:02,795-[cfp_fp][258000]XNorm: 6.214708
Training: 2022-04-11 21:58:02,796-[cfp_fp][258000]Accuracy-Flip: 0.97086+-0.00853
Training: 2022-04-11 21:58:02,797-[cfp_fp][258000]Accuracy-Highest: 0.97143
Training: 2022-04-11 21:58:24,794-[agedb_30][258000]XNorm: 6.985307
Training: 2022-04-11 21:58:24,795-[agedb_30][258000]Accuracy-Flip: 0.97083+-0.01014
Training: 2022-04-11 21:58:24,795-[agedb_30][258000]Accuracy-Highest: 0.97350
Training: 2022-04-11 21:58:25,911-Speed 144.47 samples/sec   Loss 4.0846   LearningRate 0.0052   Epoch: 15   Global Step: 258010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:26,983-Speed 9556.50 samples/sec   Loss 4.1335   LearningRate 0.0052   Epoch: 15   Global Step: 258020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:28,124-Speed 8982.82 samples/sec   Loss 4.1306   LearningRate 0.0052   Epoch: 15   Global Step: 258030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:58:29,200-Speed 9513.23 samples/sec   Loss 4.0809   LearningRate 0.0052   Epoch: 15   Global Step: 258040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:30,285-Speed 9443.21 samples/sec   Loss 4.2575   LearningRate 0.0052   Epoch: 15   Global Step: 258050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:31,389-Speed 9284.38 samples/sec   Loss 4.2614   LearningRate 0.0052   Epoch: 15   Global Step: 258060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:32,496-Speed 9257.67 samples/sec   Loss 4.1540   LearningRate 0.0051   Epoch: 15   Global Step: 258070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:33,658-Speed 8816.22 samples/sec   Loss 4.1871   LearningRate 0.0051   Epoch: 15   Global Step: 258080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:34,773-Speed 9195.68 samples/sec   Loss 4.1268   LearningRate 0.0051   Epoch: 15   Global Step: 258090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:35,846-Speed 9542.96 samples/sec   Loss 4.1432   LearningRate 0.0051   Epoch: 15   Global Step: 258100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:36,964-Speed 9163.54 samples/sec   Loss 4.1290   LearningRate 0.0051   Epoch: 15   Global Step: 258110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:38,046-Speed 9471.48 samples/sec   Loss 4.1246   LearningRate 0.0051   Epoch: 15   Global Step: 258120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:39,162-Speed 9181.37 samples/sec   Loss 4.1206   LearningRate 0.0051   Epoch: 15   Global Step: 258130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:40,229-Speed 9607.24 samples/sec   Loss 4.1737   LearningRate 0.0051   Epoch: 15   Global Step: 258140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:58:41,372-Speed 8962.52 samples/sec   Loss 4.2068   LearningRate 0.0051   Epoch: 15   Global Step: 258150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:58:42,503-Speed 9072.01 samples/sec   Loss 4.0543   LearningRate 0.0051   Epoch: 15   Global Step: 258160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:58:43,562-Speed 9668.60 samples/sec   Loss 4.1038   LearningRate 0.0051   Epoch: 15   Global Step: 258170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:58:44,634-Speed 9556.91 samples/sec   Loss 4.1571   LearningRate 0.0051   Epoch: 15   Global Step: 258180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:58:45,672-Speed 9869.38 samples/sec   Loss 4.0303   LearningRate 0.0051   Epoch: 15   Global Step: 258190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:58:46,841-Speed 8766.41 samples/sec   Loss 4.0875   LearningRate 0.0051   Epoch: 15   Global Step: 258200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:58:47,956-Speed 9189.83 samples/sec   Loss 4.1903   LearningRate 0.0051   Epoch: 15   Global Step: 258210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:49,072-Speed 9183.17 samples/sec   Loss 4.0615   LearningRate 0.0051   Epoch: 15   Global Step: 258220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:50,172-Speed 9313.56 samples/sec   Loss 4.1964   LearningRate 0.0051   Epoch: 15   Global Step: 258230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:51,249-Speed 9510.62 samples/sec   Loss 4.1150   LearningRate 0.0051   Epoch: 15   Global Step: 258240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:52,373-Speed 9120.82 samples/sec   Loss 4.1297   LearningRate 0.0051   Epoch: 15   Global Step: 258250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:53,513-Speed 8984.67 samples/sec   Loss 4.0790   LearningRate 0.0051   Epoch: 15   Global Step: 258260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:54,625-Speed 9216.23 samples/sec   Loss 4.0925   LearningRate 0.0051   Epoch: 15   Global Step: 258270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:55,756-Speed 9055.70 samples/sec   Loss 4.1316   LearningRate 0.0051   Epoch: 15   Global Step: 258280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:56,849-Speed 9369.79 samples/sec   Loss 4.1137   LearningRate 0.0051   Epoch: 15   Global Step: 258290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:57,907-Speed 9689.29 samples/sec   Loss 4.0229   LearningRate 0.0051   Epoch: 15   Global Step: 258300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:58:59,004-Speed 9341.07 samples/sec   Loss 4.1300   LearningRate 0.0051   Epoch: 15   Global Step: 258310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:00,110-Speed 9256.74 samples/sec   Loss 4.1924   LearningRate 0.0051   Epoch: 15   Global Step: 258320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:01,181-Speed 9577.74 samples/sec   Loss 4.1395   LearningRate 0.0051   Epoch: 15   Global Step: 258330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:02,317-Speed 9018.53 samples/sec   Loss 4.1449   LearningRate 0.0051   Epoch: 15   Global Step: 258340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:03,472-Speed 8873.05 samples/sec   Loss 4.0988   LearningRate 0.0051   Epoch: 15   Global Step: 258350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:04,551-Speed 9493.04 samples/sec   Loss 4.1283   LearningRate 0.0051   Epoch: 15   Global Step: 258360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:05,601-Speed 9753.74 samples/sec   Loss 4.1402   LearningRate 0.0051   Epoch: 15   Global Step: 258370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:06,696-Speed 9363.47 samples/sec   Loss 4.1909   LearningRate 0.0051   Epoch: 15   Global Step: 258380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:07,795-Speed 9318.83 samples/sec   Loss 4.0892   LearningRate 0.0051   Epoch: 15   Global Step: 258390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:08,889-Speed 9368.40 samples/sec   Loss 4.2120   LearningRate 0.0051   Epoch: 15   Global Step: 258400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:09,943-Speed 9721.33 samples/sec   Loss 4.1289   LearningRate 0.0051   Epoch: 15   Global Step: 258410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:11,070-Speed 9084.85 samples/sec   Loss 4.1227   LearningRate 0.0051   Epoch: 15   Global Step: 258420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:12,156-Speed 9437.57 samples/sec   Loss 4.1139   LearningRate 0.0051   Epoch: 15   Global Step: 258430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:13,289-Speed 9047.36 samples/sec   Loss 4.1601   LearningRate 0.0051   Epoch: 15   Global Step: 258440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:14,383-Speed 9364.72 samples/sec   Loss 4.0954   LearningRate 0.0051   Epoch: 15   Global Step: 258450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:15,492-Speed 9240.74 samples/sec   Loss 4.0345   LearningRate 0.0051   Epoch: 15   Global Step: 258460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:16,557-Speed 9613.48 samples/sec   Loss 4.1797   LearningRate 0.0051   Epoch: 15   Global Step: 258470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:17,662-Speed 9271.58 samples/sec   Loss 4.1088   LearningRate 0.0051   Epoch: 15   Global Step: 258480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:18,768-Speed 9270.14 samples/sec   Loss 4.1419   LearningRate 0.0051   Epoch: 15   Global Step: 258490   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:19,859-Speed 9398.81 samples/sec   Loss 4.1662   LearningRate 0.0051   Epoch: 15   Global Step: 258500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:20,946-Speed 9424.50 samples/sec   Loss 4.2438   LearningRate 0.0051   Epoch: 15   Global Step: 258510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:22,022-Speed 9524.81 samples/sec   Loss 4.1555   LearningRate 0.0051   Epoch: 15   Global Step: 258520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:23,119-Speed 9341.39 samples/sec   Loss 4.0495   LearningRate 0.0051   Epoch: 15   Global Step: 258530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:24,190-Speed 9569.64 samples/sec   Loss 4.1335   LearningRate 0.0051   Epoch: 15   Global Step: 258540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:25,304-Speed 9192.16 samples/sec   Loss 4.0917   LearningRate 0.0051   Epoch: 15   Global Step: 258550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:26,447-Speed 8961.69 samples/sec   Loss 4.1745   LearningRate 0.0051   Epoch: 15   Global Step: 258560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:27,510-Speed 9637.42 samples/sec   Loss 4.1413   LearningRate 0.0051   Epoch: 15   Global Step: 258570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:28,575-Speed 9622.00 samples/sec   Loss 4.1713   LearningRate 0.0051   Epoch: 15   Global Step: 258580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:29,685-Speed 9233.30 samples/sec   Loss 4.2048   LearningRate 0.0051   Epoch: 15   Global Step: 258590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:30,790-Speed 9270.30 samples/sec   Loss 4.0936   LearningRate 0.0051   Epoch: 15   Global Step: 258600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:31,891-Speed 9312.02 samples/sec   Loss 4.1740   LearningRate 0.0051   Epoch: 15   Global Step: 258610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:32,994-Speed 9282.48 samples/sec   Loss 4.1696   LearningRate 0.0051   Epoch: 15   Global Step: 258620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:34,098-Speed 9281.89 samples/sec   Loss 4.1057   LearningRate 0.0051   Epoch: 15   Global Step: 258630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:35,186-Speed 9418.57 samples/sec   Loss 4.0734   LearningRate 0.0051   Epoch: 15   Global Step: 258640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:36,262-Speed 9525.70 samples/sec   Loss 4.0862   LearningRate 0.0051   Epoch: 15   Global Step: 258650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:37,374-Speed 9212.96 samples/sec   Loss 4.1073   LearningRate 0.0051   Epoch: 15   Global Step: 258660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:38,470-Speed 9352.18 samples/sec   Loss 4.2419   LearningRate 0.0051   Epoch: 15   Global Step: 258670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:39,551-Speed 9477.30 samples/sec   Loss 4.0855   LearningRate 0.0051   Epoch: 15   Global Step: 258680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:40,629-Speed 9496.17 samples/sec   Loss 4.1428   LearningRate 0.0051   Epoch: 15   Global Step: 258690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:41,695-Speed 9619.41 samples/sec   Loss 4.1857   LearningRate 0.0051   Epoch: 15   Global Step: 258700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:42,815-Speed 9151.78 samples/sec   Loss 4.0894   LearningRate 0.0051   Epoch: 15   Global Step: 258710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:43,906-Speed 9387.29 samples/sec   Loss 4.1727   LearningRate 0.0051   Epoch: 15   Global Step: 258720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:44,991-Speed 9447.99 samples/sec   Loss 4.0932   LearningRate 0.0051   Epoch: 15   Global Step: 258730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:46,089-Speed 9325.77 samples/sec   Loss 4.1889   LearningRate 0.0051   Epoch: 15   Global Step: 258740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:47,152-Speed 9638.91 samples/sec   Loss 4.2205   LearningRate 0.0051   Epoch: 15   Global Step: 258750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:48,206-Speed 9719.63 samples/sec   Loss 4.1453   LearningRate 0.0051   Epoch: 15   Global Step: 258760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 21:59:49,308-Speed 9299.50 samples/sec   Loss 4.0827   LearningRate 0.0051   Epoch: 15   Global Step: 258770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:50,382-Speed 9547.53 samples/sec   Loss 4.2092   LearningRate 0.0051   Epoch: 15   Global Step: 258780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:51,492-Speed 9231.93 samples/sec   Loss 4.0984   LearningRate 0.0051   Epoch: 15   Global Step: 258790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:52,568-Speed 9517.72 samples/sec   Loss 4.1517   LearningRate 0.0051   Epoch: 15   Global Step: 258800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:53,690-Speed 9136.12 samples/sec   Loss 4.1461   LearningRate 0.0050   Epoch: 15   Global Step: 258810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:54,810-Speed 9143.53 samples/sec   Loss 4.1295   LearningRate 0.0050   Epoch: 15   Global Step: 258820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:55,926-Speed 9180.50 samples/sec   Loss 4.0911   LearningRate 0.0050   Epoch: 15   Global Step: 258830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:57,080-Speed 8879.45 samples/sec   Loss 4.1526   LearningRate 0.0050   Epoch: 15   Global Step: 258840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:58,199-Speed 9155.77 samples/sec   Loss 4.1230   LearningRate 0.0050   Epoch: 15   Global Step: 258850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 21:59:59,337-Speed 9004.04 samples/sec   Loss 4.2522   LearningRate 0.0050   Epoch: 15   Global Step: 258860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:00,489-Speed 8893.12 samples/sec   Loss 4.2240   LearningRate 0.0050   Epoch: 15   Global Step: 258870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:01,596-Speed 9258.26 samples/sec   Loss 4.1608   LearningRate 0.0050   Epoch: 15   Global Step: 258880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:02,723-Speed 9091.43 samples/sec   Loss 4.1455   LearningRate 0.0050   Epoch: 15   Global Step: 258890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:03,801-Speed 9505.57 samples/sec   Loss 4.1826   LearningRate 0.0050   Epoch: 15   Global Step: 258900   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:04,892-Speed 9392.89 samples/sec   Loss 4.0750   LearningRate 0.0050   Epoch: 15   Global Step: 258910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:06,003-Speed 9216.15 samples/sec   Loss 4.0898   LearningRate 0.0050   Epoch: 15   Global Step: 258920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:07,126-Speed 9122.01 samples/sec   Loss 4.1467   LearningRate 0.0050   Epoch: 15   Global Step: 258930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:08,229-Speed 9290.99 samples/sec   Loss 4.0582   LearningRate 0.0050   Epoch: 15   Global Step: 258940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:09,340-Speed 9226.40 samples/sec   Loss 4.1595   LearningRate 0.0050   Epoch: 15   Global Step: 258950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:10,414-Speed 9535.89 samples/sec   Loss 4.0704   LearningRate 0.0050   Epoch: 15   Global Step: 258960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:11,481-Speed 9602.66 samples/sec   Loss 4.1678   LearningRate 0.0050   Epoch: 15   Global Step: 258970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:12,546-Speed 9620.61 samples/sec   Loss 4.0953   LearningRate 0.0050   Epoch: 15   Global Step: 258980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:13,701-Speed 8872.76 samples/sec   Loss 4.0782   LearningRate 0.0050   Epoch: 15   Global Step: 258990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:14,793-Speed 9387.77 samples/sec   Loss 4.1198   LearningRate 0.0050   Epoch: 15   Global Step: 259000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:15,922-Speed 9072.08 samples/sec   Loss 4.1426   LearningRate 0.0050   Epoch: 15   Global Step: 259010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:16,975-Speed 9729.85 samples/sec   Loss 4.1779   LearningRate 0.0050   Epoch: 15   Global Step: 259020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:18,080-Speed 9276.47 samples/sec   Loss 4.0911   LearningRate 0.0050   Epoch: 15   Global Step: 259030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:19,207-Speed 9094.89 samples/sec   Loss 4.1363   LearningRate 0.0050   Epoch: 15   Global Step: 259040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:20,282-Speed 9526.27 samples/sec   Loss 4.0469   LearningRate 0.0050   Epoch: 15   Global Step: 259050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:21,367-Speed 9442.16 samples/sec   Loss 4.0553   LearningRate 0.0050   Epoch: 15   Global Step: 259060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:22,514-Speed 8932.62 samples/sec   Loss 4.2023   LearningRate 0.0050   Epoch: 15   Global Step: 259070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:23,597-Speed 9465.01 samples/sec   Loss 4.0869   LearningRate 0.0050   Epoch: 15   Global Step: 259080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:24,664-Speed 9601.02 samples/sec   Loss 4.0144   LearningRate 0.0050   Epoch: 15   Global Step: 259090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:25,770-Speed 9265.74 samples/sec   Loss 4.2339   LearningRate 0.0050   Epoch: 15   Global Step: 259100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:26,860-Speed 9394.61 samples/sec   Loss 4.1199   LearningRate 0.0050   Epoch: 15   Global Step: 259110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:27,962-Speed 9303.00 samples/sec   Loss 4.0674   LearningRate 0.0050   Epoch: 15   Global Step: 259120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:29,077-Speed 9187.47 samples/sec   Loss 4.2122   LearningRate 0.0050   Epoch: 15   Global Step: 259130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:30,192-Speed 9190.05 samples/sec   Loss 4.1236   LearningRate 0.0050   Epoch: 15   Global Step: 259140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:31,257-Speed 9625.54 samples/sec   Loss 4.1655   LearningRate 0.0050   Epoch: 15   Global Step: 259150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:32,363-Speed 9259.78 samples/sec   Loss 4.1324   LearningRate 0.0050   Epoch: 15   Global Step: 259160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:33,504-Speed 8980.95 samples/sec   Loss 4.1234   LearningRate 0.0050   Epoch: 15   Global Step: 259170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:34,647-Speed 8968.39 samples/sec   Loss 4.0401   LearningRate 0.0050   Epoch: 15   Global Step: 259180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:35,715-Speed 9593.18 samples/sec   Loss 4.2152   LearningRate 0.0050   Epoch: 15   Global Step: 259190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:36,807-Speed 9378.62 samples/sec   Loss 4.0902   LearningRate 0.0050   Epoch: 15   Global Step: 259200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:37,913-Speed 9263.94 samples/sec   Loss 4.0940   LearningRate 0.0050   Epoch: 15   Global Step: 259210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:39,010-Speed 9339.31 samples/sec   Loss 4.1769   LearningRate 0.0050   Epoch: 15   Global Step: 259220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:40,084-Speed 9542.10 samples/sec   Loss 4.0701   LearningRate 0.0050   Epoch: 15   Global Step: 259230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:41,130-Speed 9791.79 samples/sec   Loss 4.0977   LearningRate 0.0050   Epoch: 15   Global Step: 259240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:42,210-Speed 9489.78 samples/sec   Loss 4.1545   LearningRate 0.0050   Epoch: 15   Global Step: 259250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:43,346-Speed 9025.09 samples/sec   Loss 4.1800   LearningRate 0.0050   Epoch: 15   Global Step: 259260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:44,395-Speed 9767.27 samples/sec   Loss 4.1379   LearningRate 0.0050   Epoch: 15   Global Step: 259270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:45,462-Speed 9599.18 samples/sec   Loss 4.1987   LearningRate 0.0050   Epoch: 15   Global Step: 259280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:46,518-Speed 9699.10 samples/sec   Loss 4.1378   LearningRate 0.0050   Epoch: 15   Global Step: 259290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:47,637-Speed 9159.71 samples/sec   Loss 4.0865   LearningRate 0.0050   Epoch: 15   Global Step: 259300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:48,764-Speed 9089.43 samples/sec   Loss 4.1401   LearningRate 0.0050   Epoch: 15   Global Step: 259310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:49,829-Speed 9624.96 samples/sec   Loss 4.0383   LearningRate 0.0050   Epoch: 15   Global Step: 259320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:50,918-Speed 9414.66 samples/sec   Loss 4.0611   LearningRate 0.0050   Epoch: 15   Global Step: 259330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:00:52,026-Speed 9245.46 samples/sec   Loss 4.0718   LearningRate 0.0050   Epoch: 15   Global Step: 259340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:53,135-Speed 9235.46 samples/sec   Loss 4.2239   LearningRate 0.0050   Epoch: 15   Global Step: 259350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:54,241-Speed 9270.08 samples/sec   Loss 4.1061   LearningRate 0.0050   Epoch: 15   Global Step: 259360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:55,360-Speed 9153.28 samples/sec   Loss 4.1770   LearningRate 0.0050   Epoch: 15   Global Step: 259370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:56,479-Speed 9154.53 samples/sec   Loss 4.1539   LearningRate 0.0050   Epoch: 15   Global Step: 259380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:57,590-Speed 9220.19 samples/sec   Loss 4.1842   LearningRate 0.0050   Epoch: 15   Global Step: 259390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:58,677-Speed 9430.14 samples/sec   Loss 4.1214   LearningRate 0.0050   Epoch: 15   Global Step: 259400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:00:59,762-Speed 9438.53 samples/sec   Loss 4.2026   LearningRate 0.0050   Epoch: 15   Global Step: 259410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:00,832-Speed 9575.03 samples/sec   Loss 4.1310   LearningRate 0.0050   Epoch: 15   Global Step: 259420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:01,943-Speed 9230.62 samples/sec   Loss 4.1627   LearningRate 0.0050   Epoch: 15   Global Step: 259430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:03,031-Speed 9415.01 samples/sec   Loss 4.1057   LearningRate 0.0050   Epoch: 15   Global Step: 259440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:04,116-Speed 9436.57 samples/sec   Loss 4.1030   LearningRate 0.0050   Epoch: 15   Global Step: 259450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:05,276-Speed 8834.69 samples/sec   Loss 4.0621   LearningRate 0.0050   Epoch: 15   Global Step: 259460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:06,388-Speed 9213.60 samples/sec   Loss 4.1491   LearningRate 0.0050   Epoch: 15   Global Step: 259470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:07,464-Speed 9518.86 samples/sec   Loss 4.1619   LearningRate 0.0050   Epoch: 15   Global Step: 259480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:08,559-Speed 9360.18 samples/sec   Loss 4.1682   LearningRate 0.0050   Epoch: 15   Global Step: 259490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:09,668-Speed 9236.35 samples/sec   Loss 4.0692   LearningRate 0.0050   Epoch: 15   Global Step: 259500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:10,748-Speed 9487.50 samples/sec   Loss 4.1786   LearningRate 0.0050   Epoch: 15   Global Step: 259510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:11,811-Speed 9644.23 samples/sec   Loss 4.0722   LearningRate 0.0050   Epoch: 15   Global Step: 259520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:12,881-Speed 9579.01 samples/sec   Loss 4.1547   LearningRate 0.0050   Epoch: 15   Global Step: 259530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:13,959-Speed 9500.60 samples/sec   Loss 4.1559   LearningRate 0.0050   Epoch: 15   Global Step: 259540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:15,051-Speed 9388.32 samples/sec   Loss 4.1993   LearningRate 0.0049   Epoch: 15   Global Step: 259550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:16,149-Speed 9324.59 samples/sec   Loss 4.1655   LearningRate 0.0049   Epoch: 15   Global Step: 259560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:17,221-Speed 9559.74 samples/sec   Loss 4.1947   LearningRate 0.0049   Epoch: 15   Global Step: 259570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:18,285-Speed 9631.30 samples/sec   Loss 4.1382   LearningRate 0.0049   Epoch: 15   Global Step: 259580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:19,400-Speed 9186.81 samples/sec   Loss 4.1075   LearningRate 0.0049   Epoch: 15   Global Step: 259590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:20,500-Speed 9318.79 samples/sec   Loss 4.1315   LearningRate 0.0049   Epoch: 15   Global Step: 259600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:21,637-Speed 9008.42 samples/sec   Loss 4.0746   LearningRate 0.0049   Epoch: 15   Global Step: 259610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:22,724-Speed 9425.16 samples/sec   Loss 4.1474   LearningRate 0.0049   Epoch: 15   Global Step: 259620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:23,818-Speed 9365.52 samples/sec   Loss 4.1573   LearningRate 0.0049   Epoch: 15   Global Step: 259630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:24,900-Speed 9472.56 samples/sec   Loss 4.0878   LearningRate 0.0049   Epoch: 15   Global Step: 259640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:25,974-Speed 9533.88 samples/sec   Loss 4.1913   LearningRate 0.0049   Epoch: 15   Global Step: 259650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:27,093-Speed 9157.77 samples/sec   Loss 4.1926   LearningRate 0.0049   Epoch: 15   Global Step: 259660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:28,286-Speed 8589.64 samples/sec   Loss 4.0526   LearningRate 0.0049   Epoch: 15   Global Step: 259670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:29,405-Speed 9155.07 samples/sec   Loss 4.1160   LearningRate 0.0049   Epoch: 15   Global Step: 259680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:30,526-Speed 9145.16 samples/sec   Loss 4.1414   LearningRate 0.0049   Epoch: 15   Global Step: 259690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:31,586-Speed 9670.67 samples/sec   Loss 4.1216   LearningRate 0.0049   Epoch: 15   Global Step: 259700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:32,753-Speed 8778.03 samples/sec   Loss 4.2567   LearningRate 0.0049   Epoch: 15   Global Step: 259710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:33,852-Speed 9321.06 samples/sec   Loss 4.1917   LearningRate 0.0049   Epoch: 15   Global Step: 259720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:34,961-Speed 9239.44 samples/sec   Loss 4.1786   LearningRate 0.0049   Epoch: 15   Global Step: 259730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:36,041-Speed 9487.39 samples/sec   Loss 4.1775   LearningRate 0.0049   Epoch: 15   Global Step: 259740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:37,091-Speed 9756.06 samples/sec   Loss 4.0082   LearningRate 0.0049   Epoch: 15   Global Step: 259750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:38,201-Speed 9226.67 samples/sec   Loss 4.1598   LearningRate 0.0049   Epoch: 15   Global Step: 259760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:39,296-Speed 9360.97 samples/sec   Loss 4.1699   LearningRate 0.0049   Epoch: 15   Global Step: 259770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:40,436-Speed 8990.39 samples/sec   Loss 4.0227   LearningRate 0.0049   Epoch: 15   Global Step: 259780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:41,512-Speed 9521.59 samples/sec   Loss 4.1000   LearningRate 0.0049   Epoch: 15   Global Step: 259790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:42,582-Speed 9574.51 samples/sec   Loss 4.1502   LearningRate 0.0049   Epoch: 15   Global Step: 259800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:43,671-Speed 9410.79 samples/sec   Loss 4.1128   LearningRate 0.0049   Epoch: 15   Global Step: 259810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:44,779-Speed 9246.56 samples/sec   Loss 4.1640   LearningRate 0.0049   Epoch: 15   Global Step: 259820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:45,852-Speed 9544.97 samples/sec   Loss 4.1303   LearningRate 0.0049   Epoch: 15   Global Step: 259830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:46,965-Speed 9205.62 samples/sec   Loss 4.2141   LearningRate 0.0049   Epoch: 15   Global Step: 259840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:48,101-Speed 9021.61 samples/sec   Loss 4.0687   LearningRate 0.0049   Epoch: 15   Global Step: 259850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:01:49,257-Speed 8866.03 samples/sec   Loss 4.1282   LearningRate 0.0049   Epoch: 15   Global Step: 259860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:50,353-Speed 9350.45 samples/sec   Loss 4.1747   LearningRate 0.0049   Epoch: 15   Global Step: 259870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:51,488-Speed 9027.93 samples/sec   Loss 4.1089   LearningRate 0.0049   Epoch: 15   Global Step: 259880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:52,646-Speed 8847.88 samples/sec   Loss 4.1266   LearningRate 0.0049   Epoch: 15   Global Step: 259890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:53,733-Speed 9417.63 samples/sec   Loss 4.0871   LearningRate 0.0049   Epoch: 15   Global Step: 259900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:54,855-Speed 9134.66 samples/sec   Loss 4.1508   LearningRate 0.0049   Epoch: 15   Global Step: 259910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:55,945-Speed 9401.48 samples/sec   Loss 4.1309   LearningRate 0.0049   Epoch: 15   Global Step: 259920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:57,037-Speed 9377.67 samples/sec   Loss 4.1545   LearningRate 0.0049   Epoch: 15   Global Step: 259930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:58,133-Speed 9349.18 samples/sec   Loss 4.1894   LearningRate 0.0049   Epoch: 15   Global Step: 259940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:01:59,200-Speed 9606.00 samples/sec   Loss 4.2098   LearningRate 0.0049   Epoch: 15   Global Step: 259950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:02:00,290-Speed 9396.31 samples/sec   Loss 4.2073   LearningRate 0.0049   Epoch: 15   Global Step: 259960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:02:01,414-Speed 9117.40 samples/sec   Loss 4.0808   LearningRate 0.0049   Epoch: 15   Global Step: 259970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:02:02,506-Speed 9386.34 samples/sec   Loss 4.1646   LearningRate 0.0049   Epoch: 15   Global Step: 259980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:02:03,551-Speed 9801.94 samples/sec   Loss 4.0652   LearningRate 0.0049   Epoch: 15   Global Step: 259990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:02:04,609-Speed 9685.15 samples/sec   Loss 4.0816   LearningRate 0.0049   Epoch: 15   Global Step: 260000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:02:26,777-[lfw][260000]XNorm: 7.209153
Training: 2022-04-11 22:02:26,777-[lfw][260000]Accuracy-Flip: 0.99617+-0.00248
Training: 2022-04-11 22:02:26,778-[lfw][260000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:02:52,380-[cfp_fp][260000]XNorm: 6.235661
Training: 2022-04-11 22:02:52,381-[cfp_fp][260000]Accuracy-Flip: 0.96943+-0.00980
Training: 2022-04-11 22:02:52,381-[cfp_fp][260000]Accuracy-Highest: 0.97143
Training: 2022-04-11 22:03:14,502-[agedb_30][260000]XNorm: 7.008566
Training: 2022-04-11 22:03:14,503-[agedb_30][260000]Accuracy-Flip: 0.97083+-0.01023
Training: 2022-04-11 22:03:14,503-[agedb_30][260000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:03:15,581-Speed 144.28 samples/sec   Loss 4.1186   LearningRate 0.0049   Epoch: 15   Global Step: 260010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:16,666-Speed 9441.91 samples/sec   Loss 4.1005   LearningRate 0.0049   Epoch: 15   Global Step: 260020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:17,751-Speed 9438.35 samples/sec   Loss 4.1735   LearningRate 0.0049   Epoch: 15   Global Step: 260030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:18,850-Speed 9322.75 samples/sec   Loss 4.1442   LearningRate 0.0049   Epoch: 15   Global Step: 260040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:19,958-Speed 9247.33 samples/sec   Loss 4.2354   LearningRate 0.0049   Epoch: 15   Global Step: 260050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:21,028-Speed 9582.20 samples/sec   Loss 4.1181   LearningRate 0.0049   Epoch: 15   Global Step: 260060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:22,119-Speed 9391.19 samples/sec   Loss 4.1786   LearningRate 0.0049   Epoch: 15   Global Step: 260070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:23,228-Speed 9235.04 samples/sec   Loss 4.1843   LearningRate 0.0049   Epoch: 15   Global Step: 260080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:24,325-Speed 9342.46 samples/sec   Loss 4.1518   LearningRate 0.0049   Epoch: 15   Global Step: 260090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:25,444-Speed 9152.64 samples/sec   Loss 4.2131   LearningRate 0.0049   Epoch: 15   Global Step: 260100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:26,630-Speed 8638.96 samples/sec   Loss 4.0794   LearningRate 0.0049   Epoch: 15   Global Step: 260110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:27,706-Speed 9522.64 samples/sec   Loss 4.1361   LearningRate 0.0049   Epoch: 15   Global Step: 260120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:28,767-Speed 9662.43 samples/sec   Loss 4.1477   LearningRate 0.0049   Epoch: 15   Global Step: 260130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:29,869-Speed 9290.36 samples/sec   Loss 4.1305   LearningRate 0.0049   Epoch: 15   Global Step: 260140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:30,966-Speed 9347.25 samples/sec   Loss 4.1969   LearningRate 0.0049   Epoch: 15   Global Step: 260150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:32,055-Speed 9402.26 samples/sec   Loss 4.1509   LearningRate 0.0049   Epoch: 15   Global Step: 260160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:33,163-Speed 9253.06 samples/sec   Loss 4.1870   LearningRate 0.0049   Epoch: 15   Global Step: 260170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:34,260-Speed 9344.02 samples/sec   Loss 4.0752   LearningRate 0.0049   Epoch: 15   Global Step: 260180   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:35,375-Speed 9185.92 samples/sec   Loss 4.2288   LearningRate 0.0049   Epoch: 15   Global Step: 260190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:36,505-Speed 9071.01 samples/sec   Loss 4.1419   LearningRate 0.0049   Epoch: 15   Global Step: 260200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:37,676-Speed 8745.83 samples/sec   Loss 4.1296   LearningRate 0.0049   Epoch: 15   Global Step: 260210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:38,781-Speed 9275.69 samples/sec   Loss 4.1418   LearningRate 0.0049   Epoch: 15   Global Step: 260220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:39,877-Speed 9347.24 samples/sec   Loss 4.0466   LearningRate 0.0049   Epoch: 15   Global Step: 260230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:40,955-Speed 9509.85 samples/sec   Loss 4.1170   LearningRate 0.0049   Epoch: 15   Global Step: 260240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:42,060-Speed 9272.24 samples/sec   Loss 4.0410   LearningRate 0.0049   Epoch: 15   Global Step: 260250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:43,161-Speed 9305.46 samples/sec   Loss 4.1563   LearningRate 0.0049   Epoch: 15   Global Step: 260260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:44,266-Speed 9269.18 samples/sec   Loss 4.0703   LearningRate 0.0049   Epoch: 15   Global Step: 260270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:45,369-Speed 9289.47 samples/sec   Loss 4.1945   LearningRate 0.0049   Epoch: 15   Global Step: 260280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:46,485-Speed 9184.53 samples/sec   Loss 4.1435   LearningRate 0.0049   Epoch: 15   Global Step: 260290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:47,560-Speed 9529.94 samples/sec   Loss 4.1834   LearningRate 0.0049   Epoch: 15   Global Step: 260300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:48,644-Speed 9451.62 samples/sec   Loss 4.1452   LearningRate 0.0048   Epoch: 15   Global Step: 260310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:49,761-Speed 9176.30 samples/sec   Loss 4.1680   LearningRate 0.0048   Epoch: 15   Global Step: 260320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:50,889-Speed 9083.33 samples/sec   Loss 4.1032   LearningRate 0.0048   Epoch: 15   Global Step: 260330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:51,964-Speed 9527.32 samples/sec   Loss 4.1488   LearningRate 0.0048   Epoch: 15   Global Step: 260340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:53,048-Speed 9459.03 samples/sec   Loss 4.1524   LearningRate 0.0048   Epoch: 15   Global Step: 260350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:54,180-Speed 9048.20 samples/sec   Loss 4.1766   LearningRate 0.0048   Epoch: 15   Global Step: 260360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:55,324-Speed 8957.11 samples/sec   Loss 4.1180   LearningRate 0.0048   Epoch: 15   Global Step: 260370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:56,464-Speed 8985.84 samples/sec   Loss 4.1656   LearningRate 0.0048   Epoch: 15   Global Step: 260380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:57,549-Speed 9450.97 samples/sec   Loss 4.1508   LearningRate 0.0048   Epoch: 15   Global Step: 260390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:03:58,738-Speed 8611.84 samples/sec   Loss 4.1851   LearningRate 0.0048   Epoch: 15   Global Step: 260400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:03:59,882-Speed 8957.62 samples/sec   Loss 4.1746   LearningRate 0.0048   Epoch: 15   Global Step: 260410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:00,986-Speed 9282.52 samples/sec   Loss 4.1947   LearningRate 0.0048   Epoch: 15   Global Step: 260420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:02,099-Speed 9200.81 samples/sec   Loss 4.1389   LearningRate 0.0048   Epoch: 15   Global Step: 260430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:03,183-Speed 9455.26 samples/sec   Loss 4.0908   LearningRate 0.0048   Epoch: 15   Global Step: 260440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:04,292-Speed 9246.86 samples/sec   Loss 4.1064   LearningRate 0.0048   Epoch: 15   Global Step: 260450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:05,417-Speed 9102.75 samples/sec   Loss 4.2259   LearningRate 0.0048   Epoch: 15   Global Step: 260460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:06,498-Speed 9481.62 samples/sec   Loss 4.1802   LearningRate 0.0048   Epoch: 15   Global Step: 260470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:07,614-Speed 9175.80 samples/sec   Loss 4.1966   LearningRate 0.0048   Epoch: 15   Global Step: 260480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:08,729-Speed 9190.42 samples/sec   Loss 4.1240   LearningRate 0.0048   Epoch: 15   Global Step: 260490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:09,823-Speed 9368.79 samples/sec   Loss 4.1844   LearningRate 0.0048   Epoch: 15   Global Step: 260500   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:04:10,897-Speed 9538.88 samples/sec   Loss 4.1250   LearningRate 0.0048   Epoch: 15   Global Step: 260510   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:04:11,975-Speed 9505.40 samples/sec   Loss 4.0996   LearningRate 0.0048   Epoch: 15   Global Step: 260520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:13,116-Speed 8977.37 samples/sec   Loss 4.1784   LearningRate 0.0048   Epoch: 15   Global Step: 260530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:14,198-Speed 9472.19 samples/sec   Loss 4.1324   LearningRate 0.0048   Epoch: 15   Global Step: 260540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:15,326-Speed 9078.96 samples/sec   Loss 4.1093   LearningRate 0.0048   Epoch: 15   Global Step: 260550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:16,435-Speed 9239.74 samples/sec   Loss 4.1587   LearningRate 0.0048   Epoch: 15   Global Step: 260560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:17,533-Speed 9338.35 samples/sec   Loss 4.1437   LearningRate 0.0048   Epoch: 15   Global Step: 260570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:18,582-Speed 9768.18 samples/sec   Loss 4.1027   LearningRate 0.0048   Epoch: 15   Global Step: 260580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:19,638-Speed 9699.16 samples/sec   Loss 4.2318   LearningRate 0.0048   Epoch: 15   Global Step: 260590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:20,700-Speed 9643.15 samples/sec   Loss 4.1705   LearningRate 0.0048   Epoch: 15   Global Step: 260600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:21,783-Speed 9465.65 samples/sec   Loss 4.1271   LearningRate 0.0048   Epoch: 15   Global Step: 260610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:22,869-Speed 9434.05 samples/sec   Loss 4.1710   LearningRate 0.0048   Epoch: 15   Global Step: 260620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:04:23,999-Speed 9067.95 samples/sec   Loss 4.1557   LearningRate 0.0048   Epoch: 15   Global Step: 260630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:04:25,081-Speed 9464.01 samples/sec   Loss 4.0982   LearningRate 0.0048   Epoch: 15   Global Step: 260640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:26,152-Speed 9567.47 samples/sec   Loss 4.1225   LearningRate 0.0048   Epoch: 15   Global Step: 260650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:27,236-Speed 9452.99 samples/sec   Loss 4.1610   LearningRate 0.0048   Epoch: 15   Global Step: 260660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:28,351-Speed 9191.44 samples/sec   Loss 4.0703   LearningRate 0.0048   Epoch: 15   Global Step: 260670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:29,479-Speed 9084.69 samples/sec   Loss 4.1942   LearningRate 0.0048   Epoch: 15   Global Step: 260680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:30,612-Speed 9047.34 samples/sec   Loss 4.1220   LearningRate 0.0048   Epoch: 15   Global Step: 260690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:31,699-Speed 9423.91 samples/sec   Loss 4.1752   LearningRate 0.0048   Epoch: 15   Global Step: 260700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:32,827-Speed 9084.15 samples/sec   Loss 4.0804   LearningRate 0.0048   Epoch: 15   Global Step: 260710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:33,900-Speed 9555.56 samples/sec   Loss 4.1290   LearningRate 0.0048   Epoch: 15   Global Step: 260720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:35,023-Speed 9124.25 samples/sec   Loss 4.1042   LearningRate 0.0048   Epoch: 15   Global Step: 260730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:36,127-Speed 9275.03 samples/sec   Loss 4.1582   LearningRate 0.0048   Epoch: 15   Global Step: 260740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:04:37,198-Speed 9566.33 samples/sec   Loss 4.0841   LearningRate 0.0048   Epoch: 15   Global Step: 260750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:04:38,280-Speed 9472.04 samples/sec   Loss 4.1123   LearningRate 0.0048   Epoch: 15   Global Step: 260760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:04:39,336-Speed 9701.35 samples/sec   Loss 4.1333   LearningRate 0.0048   Epoch: 15   Global Step: 260770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:40,455-Speed 9152.53 samples/sec   Loss 4.1689   LearningRate 0.0048   Epoch: 15   Global Step: 260780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:41,538-Speed 9464.55 samples/sec   Loss 4.1397   LearningRate 0.0048   Epoch: 15   Global Step: 260790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:42,603-Speed 9616.40 samples/sec   Loss 4.1513   LearningRate 0.0048   Epoch: 15   Global Step: 260800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:43,714-Speed 9225.54 samples/sec   Loss 4.1631   LearningRate 0.0048   Epoch: 15   Global Step: 260810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:44,811-Speed 9335.11 samples/sec   Loss 4.1732   LearningRate 0.0048   Epoch: 15   Global Step: 260820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:04:45,959-Speed 8925.81 samples/sec   Loss 4.2053   LearningRate 0.0048   Epoch: 15   Global Step: 260830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:04:47,021-Speed 9656.01 samples/sec   Loss 4.1292   LearningRate 0.0048   Epoch: 15   Global Step: 260840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:04:48,082-Speed 9657.61 samples/sec   Loss 4.2235   LearningRate 0.0048   Epoch: 15   Global Step: 260850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:04:49,128-Speed 9800.32 samples/sec   Loss 4.2094   LearningRate 0.0048   Epoch: 15   Global Step: 260860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:04:50,209-Speed 9478.36 samples/sec   Loss 4.1667   LearningRate 0.0048   Epoch: 15   Global Step: 260870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:04:51,280-Speed 9568.61 samples/sec   Loss 4.1461   LearningRate 0.0048   Epoch: 15   Global Step: 260880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:04:52,394-Speed 9198.67 samples/sec   Loss 4.2054   LearningRate 0.0048   Epoch: 15   Global Step: 260890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:04:53,505-Speed 9218.78 samples/sec   Loss 4.2781   LearningRate 0.0048   Epoch: 15   Global Step: 260900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:04:54,610-Speed 9276.22 samples/sec   Loss 4.2369   LearningRate 0.0048   Epoch: 15   Global Step: 260910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:04:55,683-Speed 9541.90 samples/sec   Loss 4.1342   LearningRate 0.0048   Epoch: 15   Global Step: 260920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:56,755-Speed 9569.27 samples/sec   Loss 4.1043   LearningRate 0.0048   Epoch: 15   Global Step: 260930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:57,887-Speed 9048.18 samples/sec   Loss 4.0885   LearningRate 0.0048   Epoch: 15   Global Step: 260940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:04:58,975-Speed 9413.54 samples/sec   Loss 4.1396   LearningRate 0.0048   Epoch: 15   Global Step: 260950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:00,062-Speed 9432.61 samples/sec   Loss 4.1309   LearningRate 0.0048   Epoch: 15   Global Step: 260960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:01,152-Speed 9396.71 samples/sec   Loss 4.1228   LearningRate 0.0048   Epoch: 15   Global Step: 260970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:02,261-Speed 9238.77 samples/sec   Loss 4.0521   LearningRate 0.0048   Epoch: 15   Global Step: 260980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:03,368-Speed 9256.00 samples/sec   Loss 4.1699   LearningRate 0.0048   Epoch: 15   Global Step: 260990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:04,472-Speed 9280.93 samples/sec   Loss 4.0672   LearningRate 0.0048   Epoch: 15   Global Step: 261000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:05,600-Speed 9084.67 samples/sec   Loss 4.1907   LearningRate 0.0048   Epoch: 15   Global Step: 261010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:06,715-Speed 9188.38 samples/sec   Loss 4.1633   LearningRate 0.0048   Epoch: 15   Global Step: 261020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:07,838-Speed 9127.15 samples/sec   Loss 4.2448   LearningRate 0.0048   Epoch: 15   Global Step: 261030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:08,959-Speed 9137.93 samples/sec   Loss 4.1464   LearningRate 0.0048   Epoch: 15   Global Step: 261040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:10,047-Speed 9415.50 samples/sec   Loss 4.1141   LearningRate 0.0048   Epoch: 15   Global Step: 261050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:11,135-Speed 9421.73 samples/sec   Loss 4.0719   LearningRate 0.0048   Epoch: 15   Global Step: 261060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:12,290-Speed 8869.82 samples/sec   Loss 4.2580   LearningRate 0.0047   Epoch: 15   Global Step: 261070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:13,419-Speed 9074.84 samples/sec   Loss 4.1905   LearningRate 0.0047   Epoch: 15   Global Step: 261080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:14,560-Speed 8981.56 samples/sec   Loss 4.1861   LearningRate 0.0047   Epoch: 15   Global Step: 261090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:15,691-Speed 9055.52 samples/sec   Loss 4.1187   LearningRate 0.0047   Epoch: 15   Global Step: 261100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:16,762-Speed 9572.52 samples/sec   Loss 4.1618   LearningRate 0.0047   Epoch: 15   Global Step: 261110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:17,846-Speed 9448.53 samples/sec   Loss 4.0536   LearningRate 0.0047   Epoch: 15   Global Step: 261120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:18,930-Speed 9453.28 samples/sec   Loss 4.1021   LearningRate 0.0047   Epoch: 15   Global Step: 261130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:20,030-Speed 9313.29 samples/sec   Loss 4.0349   LearningRate 0.0047   Epoch: 15   Global Step: 261140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:21,118-Speed 9422.20 samples/sec   Loss 4.0915   LearningRate 0.0047   Epoch: 15   Global Step: 261150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:22,227-Speed 9240.84 samples/sec   Loss 4.1120   LearningRate 0.0047   Epoch: 15   Global Step: 261160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:23,347-Speed 9140.89 samples/sec   Loss 4.1576   LearningRate 0.0047   Epoch: 15   Global Step: 261170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:24,422-Speed 9532.14 samples/sec   Loss 4.1552   LearningRate 0.0047   Epoch: 15   Global Step: 261180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:25,530-Speed 9248.25 samples/sec   Loss 4.0101   LearningRate 0.0047   Epoch: 15   Global Step: 261190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:26,605-Speed 9532.78 samples/sec   Loss 4.1543   LearningRate 0.0047   Epoch: 15   Global Step: 261200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:27,726-Speed 9143.72 samples/sec   Loss 4.2264   LearningRate 0.0047   Epoch: 15   Global Step: 261210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:28,844-Speed 9167.80 samples/sec   Loss 4.1589   LearningRate 0.0047   Epoch: 15   Global Step: 261220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:29,939-Speed 9355.65 samples/sec   Loss 4.1476   LearningRate 0.0047   Epoch: 15   Global Step: 261230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:31,016-Speed 9507.71 samples/sec   Loss 4.1191   LearningRate 0.0047   Epoch: 15   Global Step: 261240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:32,124-Speed 9254.42 samples/sec   Loss 4.1291   LearningRate 0.0047   Epoch: 15   Global Step: 261250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:33,189-Speed 9618.01 samples/sec   Loss 4.0743   LearningRate 0.0047   Epoch: 15   Global Step: 261260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:34,256-Speed 9605.90 samples/sec   Loss 4.0997   LearningRate 0.0047   Epoch: 15   Global Step: 261270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:35,332-Speed 9524.28 samples/sec   Loss 4.2596   LearningRate 0.0047   Epoch: 15   Global Step: 261280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:36,417-Speed 9436.30 samples/sec   Loss 4.1589   LearningRate 0.0047   Epoch: 15   Global Step: 261290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:37,510-Speed 9380.50 samples/sec   Loss 4.1861   LearningRate 0.0047   Epoch: 15   Global Step: 261300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:38,589-Speed 9496.10 samples/sec   Loss 4.2021   LearningRate 0.0047   Epoch: 15   Global Step: 261310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:39,674-Speed 9435.83 samples/sec   Loss 4.0891   LearningRate 0.0047   Epoch: 15   Global Step: 261320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:40,735-Speed 9660.60 samples/sec   Loss 4.0638   LearningRate 0.0047   Epoch: 15   Global Step: 261330   Fp16 Grad Scale: 262144   Required: 3 hours
Training: 2022-04-11 22:05:41,845-Speed 9233.63 samples/sec   Loss 4.0608   LearningRate 0.0047   Epoch: 15   Global Step: 261340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:42,961-Speed 9179.38 samples/sec   Loss 4.1663   LearningRate 0.0047   Epoch: 15   Global Step: 261350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:44,104-Speed 8958.38 samples/sec   Loss 4.1858   LearningRate 0.0047   Epoch: 15   Global Step: 261360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:45,189-Speed 9442.50 samples/sec   Loss 4.1224   LearningRate 0.0047   Epoch: 15   Global Step: 261370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:46,315-Speed 9126.11 samples/sec   Loss 4.0410   LearningRate 0.0047   Epoch: 15   Global Step: 261380   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:47,371-Speed 9696.05 samples/sec   Loss 4.1699   LearningRate 0.0047   Epoch: 15   Global Step: 261390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:48,491-Speed 9148.80 samples/sec   Loss 4.1567   LearningRate 0.0047   Epoch: 15   Global Step: 261400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:49,595-Speed 9286.00 samples/sec   Loss 4.0915   LearningRate 0.0047   Epoch: 15   Global Step: 261410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:50,680-Speed 9438.33 samples/sec   Loss 4.1270   LearningRate 0.0047   Epoch: 15   Global Step: 261420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:51,831-Speed 8905.04 samples/sec   Loss 4.1839   LearningRate 0.0047   Epoch: 15   Global Step: 261430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:52,908-Speed 9507.28 samples/sec   Loss 4.1663   LearningRate 0.0047   Epoch: 15   Global Step: 261440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:54,008-Speed 9316.13 samples/sec   Loss 4.1475   LearningRate 0.0047   Epoch: 15   Global Step: 261450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:05:55,134-Speed 9096.63 samples/sec   Loss 4.1471   LearningRate 0.0047   Epoch: 15   Global Step: 261460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:56,215-Speed 9484.83 samples/sec   Loss 4.2205   LearningRate 0.0047   Epoch: 15   Global Step: 261470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:57,335-Speed 9151.34 samples/sec   Loss 4.1329   LearningRate 0.0047   Epoch: 15   Global Step: 261480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:58,442-Speed 9251.65 samples/sec   Loss 4.1031   LearningRate 0.0047   Epoch: 15   Global Step: 261490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:05:59,548-Speed 9266.86 samples/sec   Loss 4.1226   LearningRate 0.0047   Epoch: 15   Global Step: 261500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:00,628-Speed 9482.61 samples/sec   Loss 4.1139   LearningRate 0.0047   Epoch: 15   Global Step: 261510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:01,769-Speed 8978.89 samples/sec   Loss 4.1541   LearningRate 0.0047   Epoch: 15   Global Step: 261520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:02,879-Speed 9232.47 samples/sec   Loss 4.1432   LearningRate 0.0047   Epoch: 15   Global Step: 261530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:04,005-Speed 9099.57 samples/sec   Loss 4.1361   LearningRate 0.0047   Epoch: 15   Global Step: 261540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:05,123-Speed 9165.97 samples/sec   Loss 4.1827   LearningRate 0.0047   Epoch: 15   Global Step: 261550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:06,238-Speed 9188.91 samples/sec   Loss 4.2542   LearningRate 0.0047   Epoch: 15   Global Step: 261560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:07,331-Speed 9378.61 samples/sec   Loss 4.2063   LearningRate 0.0047   Epoch: 15   Global Step: 261570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:08,443-Speed 9211.28 samples/sec   Loss 4.1899   LearningRate 0.0047   Epoch: 15   Global Step: 261580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:09,542-Speed 9320.46 samples/sec   Loss 4.1890   LearningRate 0.0047   Epoch: 15   Global Step: 261590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:10,651-Speed 9244.70 samples/sec   Loss 4.1484   LearningRate 0.0047   Epoch: 15   Global Step: 261600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:11,709-Speed 9676.48 samples/sec   Loss 4.0048   LearningRate 0.0047   Epoch: 15   Global Step: 261610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:12,831-Speed 9136.43 samples/sec   Loss 4.1137   LearningRate 0.0047   Epoch: 15   Global Step: 261620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:13,957-Speed 9100.82 samples/sec   Loss 4.1174   LearningRate 0.0047   Epoch: 15   Global Step: 261630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:15,040-Speed 9452.66 samples/sec   Loss 4.2179   LearningRate 0.0047   Epoch: 15   Global Step: 261640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:16,138-Speed 9333.95 samples/sec   Loss 4.1517   LearningRate 0.0047   Epoch: 15   Global Step: 261650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:17,207-Speed 9592.96 samples/sec   Loss 4.3379   LearningRate 0.0047   Epoch: 15   Global Step: 261660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:18,301-Speed 9364.92 samples/sec   Loss 4.0884   LearningRate 0.0047   Epoch: 15   Global Step: 261670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:19,365-Speed 9622.64 samples/sec   Loss 4.2419   LearningRate 0.0047   Epoch: 15   Global Step: 261680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:20,435-Speed 9574.26 samples/sec   Loss 4.1834   LearningRate 0.0047   Epoch: 15   Global Step: 261690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:21,528-Speed 9377.10 samples/sec   Loss 4.1876   LearningRate 0.0047   Epoch: 15   Global Step: 261700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:22,646-Speed 9166.52 samples/sec   Loss 4.1842   LearningRate 0.0047   Epoch: 15   Global Step: 261710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:23,773-Speed 9089.71 samples/sec   Loss 4.1868   LearningRate 0.0047   Epoch: 15   Global Step: 261720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:24,893-Speed 9153.13 samples/sec   Loss 4.1179   LearningRate 0.0047   Epoch: 15   Global Step: 261730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:25,946-Speed 9728.72 samples/sec   Loss 4.1447   LearningRate 0.0047   Epoch: 15   Global Step: 261740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:27,062-Speed 9180.85 samples/sec   Loss 4.1100   LearningRate 0.0047   Epoch: 15   Global Step: 261750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:28,161-Speed 9321.27 samples/sec   Loss 4.1718   LearningRate 0.0047   Epoch: 15   Global Step: 261760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:29,295-Speed 9035.57 samples/sec   Loss 4.0368   LearningRate 0.0047   Epoch: 15   Global Step: 261770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:30,377-Speed 9469.38 samples/sec   Loss 4.1712   LearningRate 0.0047   Epoch: 15   Global Step: 261780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:31,510-Speed 9040.61 samples/sec   Loss 4.0375   LearningRate 0.0047   Epoch: 15   Global Step: 261790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:32,579-Speed 9591.79 samples/sec   Loss 4.1126   LearningRate 0.0047   Epoch: 15   Global Step: 261800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:33,685-Speed 9260.27 samples/sec   Loss 4.0773   LearningRate 0.0047   Epoch: 15   Global Step: 261810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:34,780-Speed 9358.91 samples/sec   Loss 4.1898   LearningRate 0.0047   Epoch: 15   Global Step: 261820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:35,934-Speed 8879.14 samples/sec   Loss 4.1491   LearningRate 0.0047   Epoch: 15   Global Step: 261830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:37,014-Speed 9488.56 samples/sec   Loss 4.1268   LearningRate 0.0046   Epoch: 15   Global Step: 261840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:38,104-Speed 9405.44 samples/sec   Loss 4.1122   LearningRate 0.0046   Epoch: 15   Global Step: 261850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:39,207-Speed 9287.37 samples/sec   Loss 4.0910   LearningRate 0.0046   Epoch: 15   Global Step: 261860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:40,331-Speed 9114.68 samples/sec   Loss 4.1577   LearningRate 0.0046   Epoch: 15   Global Step: 261870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:41,450-Speed 9156.43 samples/sec   Loss 4.1957   LearningRate 0.0046   Epoch: 15   Global Step: 261880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:42,552-Speed 9298.24 samples/sec   Loss 4.1822   LearningRate 0.0046   Epoch: 15   Global Step: 261890   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:06:43,665-Speed 9204.90 samples/sec   Loss 4.1321   LearningRate 0.0046   Epoch: 15   Global Step: 261900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:44,745-Speed 9485.74 samples/sec   Loss 4.0766   LearningRate 0.0046   Epoch: 15   Global Step: 261910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:45,829-Speed 9456.70 samples/sec   Loss 4.0714   LearningRate 0.0046   Epoch: 15   Global Step: 261920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:46,905-Speed 9528.16 samples/sec   Loss 4.1382   LearningRate 0.0046   Epoch: 15   Global Step: 261930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:47,990-Speed 9438.17 samples/sec   Loss 4.2018   LearningRate 0.0046   Epoch: 15   Global Step: 261940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:49,097-Speed 9255.09 samples/sec   Loss 4.1190   LearningRate 0.0046   Epoch: 15   Global Step: 261950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:50,208-Speed 9225.97 samples/sec   Loss 4.1652   LearningRate 0.0046   Epoch: 15   Global Step: 261960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:51,326-Speed 9162.58 samples/sec   Loss 4.1030   LearningRate 0.0046   Epoch: 15   Global Step: 261970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:52,389-Speed 9636.95 samples/sec   Loss 4.0666   LearningRate 0.0046   Epoch: 15   Global Step: 261980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:53,459-Speed 9575.86 samples/sec   Loss 4.1930   LearningRate 0.0046   Epoch: 15   Global Step: 261990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:06:54,559-Speed 9312.76 samples/sec   Loss 4.1160   LearningRate 0.0046   Epoch: 15   Global Step: 262000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:07:16,558-[lfw][262000]XNorm: 7.071909
Training: 2022-04-11 22:07:16,559-[lfw][262000]Accuracy-Flip: 0.99650+-0.00302
Training: 2022-04-11 22:07:16,560-[lfw][262000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:07:41,988-[cfp_fp][262000]XNorm: 6.119181
Training: 2022-04-11 22:07:41,988-[cfp_fp][262000]Accuracy-Flip: 0.97171+-0.00873
Training: 2022-04-11 22:07:41,989-[cfp_fp][262000]Accuracy-Highest: 0.97171
Training: 2022-04-11 22:08:03,956-[agedb_30][262000]XNorm: 6.837060
Training: 2022-04-11 22:08:03,957-[agedb_30][262000]Accuracy-Flip: 0.97067+-0.00775
Training: 2022-04-11 22:08:03,957-[agedb_30][262000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:08:05,060-Speed 145.25 samples/sec   Loss 4.1320   LearningRate 0.0046   Epoch: 15   Global Step: 262010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:06,118-Speed 9689.59 samples/sec   Loss 4.1761   LearningRate 0.0046   Epoch: 15   Global Step: 262020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:07,219-Speed 9306.56 samples/sec   Loss 4.1756   LearningRate 0.0046   Epoch: 15   Global Step: 262030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:08,323-Speed 9278.72 samples/sec   Loss 4.1373   LearningRate 0.0046   Epoch: 15   Global Step: 262040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:09,451-Speed 9087.45 samples/sec   Loss 4.0788   LearningRate 0.0046   Epoch: 15   Global Step: 262050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:10,535-Speed 9449.92 samples/sec   Loss 4.1164   LearningRate 0.0046   Epoch: 15   Global Step: 262060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:11,595-Speed 9665.27 samples/sec   Loss 4.0991   LearningRate 0.0046   Epoch: 15   Global Step: 262070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:12,692-Speed 9339.05 samples/sec   Loss 4.1572   LearningRate 0.0046   Epoch: 15   Global Step: 262080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:13,777-Speed 9447.12 samples/sec   Loss 4.2021   LearningRate 0.0046   Epoch: 15   Global Step: 262090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:14,874-Speed 9338.08 samples/sec   Loss 4.1618   LearningRate 0.0046   Epoch: 15   Global Step: 262100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:15,989-Speed 9188.79 samples/sec   Loss 4.1136   LearningRate 0.0046   Epoch: 15   Global Step: 262110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:17,106-Speed 9174.68 samples/sec   Loss 4.0846   LearningRate 0.0046   Epoch: 15   Global Step: 262120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:18,201-Speed 9360.11 samples/sec   Loss 4.1356   LearningRate 0.0046   Epoch: 15   Global Step: 262130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:19,279-Speed 9498.36 samples/sec   Loss 4.1490   LearningRate 0.0046   Epoch: 15   Global Step: 262140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:20,391-Speed 9215.32 samples/sec   Loss 4.0898   LearningRate 0.0046   Epoch: 15   Global Step: 262150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:21,554-Speed 8808.35 samples/sec   Loss 4.1666   LearningRate 0.0046   Epoch: 15   Global Step: 262160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:22,629-Speed 9532.19 samples/sec   Loss 4.1003   LearningRate 0.0046   Epoch: 15   Global Step: 262170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:23,694-Speed 9622.42 samples/sec   Loss 4.1434   LearningRate 0.0046   Epoch: 15   Global Step: 262180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:24,810-Speed 9186.53 samples/sec   Loss 4.2017   LearningRate 0.0046   Epoch: 15   Global Step: 262190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:25,908-Speed 9323.00 samples/sec   Loss 4.1795   LearningRate 0.0046   Epoch: 15   Global Step: 262200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:26,972-Speed 9634.40 samples/sec   Loss 4.1408   LearningRate 0.0046   Epoch: 15   Global Step: 262210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:28,070-Speed 9329.70 samples/sec   Loss 4.1521   LearningRate 0.0046   Epoch: 15   Global Step: 262220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:29,180-Speed 9230.51 samples/sec   Loss 4.1252   LearningRate 0.0046   Epoch: 15   Global Step: 262230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:30,280-Speed 9318.23 samples/sec   Loss 4.1685   LearningRate 0.0046   Epoch: 15   Global Step: 262240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:31,357-Speed 9513.24 samples/sec   Loss 4.1272   LearningRate 0.0046   Epoch: 15   Global Step: 262250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:32,444-Speed 9427.28 samples/sec   Loss 4.2359   LearningRate 0.0046   Epoch: 15   Global Step: 262260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:33,541-Speed 9340.53 samples/sec   Loss 4.1537   LearningRate 0.0046   Epoch: 15   Global Step: 262270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:34,608-Speed 9608.95 samples/sec   Loss 4.1609   LearningRate 0.0046   Epoch: 15   Global Step: 262280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:35,713-Speed 9273.80 samples/sec   Loss 4.0448   LearningRate 0.0046   Epoch: 15   Global Step: 262290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:36,832-Speed 9152.07 samples/sec   Loss 4.1341   LearningRate 0.0046   Epoch: 15   Global Step: 262300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:37,975-Speed 8962.88 samples/sec   Loss 4.1736   LearningRate 0.0046   Epoch: 15   Global Step: 262310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:39,116-Speed 8980.44 samples/sec   Loss 4.1434   LearningRate 0.0046   Epoch: 15   Global Step: 262320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:40,236-Speed 9145.06 samples/sec   Loss 4.0047   LearningRate 0.0046   Epoch: 15   Global Step: 262330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:41,361-Speed 9110.90 samples/sec   Loss 4.2277   LearningRate 0.0046   Epoch: 15   Global Step: 262340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:42,453-Speed 9381.85 samples/sec   Loss 4.1444   LearningRate 0.0046   Epoch: 15   Global Step: 262350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:43,539-Speed 9433.86 samples/sec   Loss 4.1884   LearningRate 0.0046   Epoch: 15   Global Step: 262360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:44,651-Speed 9211.08 samples/sec   Loss 4.1865   LearningRate 0.0046   Epoch: 15   Global Step: 262370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:45,739-Speed 9416.30 samples/sec   Loss 4.1680   LearningRate 0.0046   Epoch: 15   Global Step: 262380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:46,842-Speed 9292.69 samples/sec   Loss 4.0908   LearningRate 0.0046   Epoch: 15   Global Step: 262390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:48,011-Speed 8765.62 samples/sec   Loss 4.1316   LearningRate 0.0046   Epoch: 15   Global Step: 262400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:49,103-Speed 9382.62 samples/sec   Loss 4.1124   LearningRate 0.0046   Epoch: 15   Global Step: 262410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:50,179-Speed 9521.57 samples/sec   Loss 4.1479   LearningRate 0.0046   Epoch: 15   Global Step: 262420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:51,280-Speed 9303.38 samples/sec   Loss 4.2711   LearningRate 0.0046   Epoch: 15   Global Step: 262430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:52,403-Speed 9130.51 samples/sec   Loss 4.1337   LearningRate 0.0046   Epoch: 15   Global Step: 262440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:53,506-Speed 9284.56 samples/sec   Loss 4.1193   LearningRate 0.0046   Epoch: 15   Global Step: 262450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:54,635-Speed 9074.36 samples/sec   Loss 4.1327   LearningRate 0.0046   Epoch: 15   Global Step: 262460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:55,766-Speed 9059.22 samples/sec   Loss 4.1548   LearningRate 0.0046   Epoch: 15   Global Step: 262470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:56,866-Speed 9314.80 samples/sec   Loss 4.1243   LearningRate 0.0046   Epoch: 15   Global Step: 262480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:08:57,947-Speed 9484.78 samples/sec   Loss 4.1401   LearningRate 0.0046   Epoch: 15   Global Step: 262490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:08:59,046-Speed 9324.65 samples/sec   Loss 4.1642   LearningRate 0.0046   Epoch: 15   Global Step: 262500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:00,173-Speed 9084.42 samples/sec   Loss 4.1253   LearningRate 0.0046   Epoch: 15   Global Step: 262510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:01,271-Speed 9346.99 samples/sec   Loss 4.1801   LearningRate 0.0046   Epoch: 15   Global Step: 262520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:02,400-Speed 9073.34 samples/sec   Loss 4.1067   LearningRate 0.0046   Epoch: 15   Global Step: 262530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:03,541-Speed 8984.17 samples/sec   Loss 4.1411   LearningRate 0.0046   Epoch: 15   Global Step: 262540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:04,632-Speed 9395.14 samples/sec   Loss 4.2271   LearningRate 0.0046   Epoch: 15   Global Step: 262550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:05,738-Speed 9261.69 samples/sec   Loss 4.1127   LearningRate 0.0046   Epoch: 15   Global Step: 262560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:06,847-Speed 9243.50 samples/sec   Loss 4.1760   LearningRate 0.0046   Epoch: 15   Global Step: 262570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:07,901-Speed 9721.00 samples/sec   Loss 4.1045   LearningRate 0.0046   Epoch: 15   Global Step: 262580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:09,020-Speed 9152.28 samples/sec   Loss 4.1773   LearningRate 0.0046   Epoch: 15   Global Step: 262590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:10,154-Speed 9036.87 samples/sec   Loss 4.1355   LearningRate 0.0046   Epoch: 15   Global Step: 262600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:11,265-Speed 9226.24 samples/sec   Loss 4.2116   LearningRate 0.0046   Epoch: 15   Global Step: 262610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:12,363-Speed 9330.44 samples/sec   Loss 4.1493   LearningRate 0.0045   Epoch: 15   Global Step: 262620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:13,475-Speed 9213.53 samples/sec   Loss 4.1067   LearningRate 0.0045   Epoch: 15   Global Step: 262630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:14,651-Speed 8715.63 samples/sec   Loss 4.1315   LearningRate 0.0045   Epoch: 15   Global Step: 262640   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:15,815-Speed 8800.60 samples/sec   Loss 4.2152   LearningRate 0.0045   Epoch: 15   Global Step: 262650   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:16,953-Speed 9007.54 samples/sec   Loss 4.1045   LearningRate 0.0045   Epoch: 15   Global Step: 262660   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:18,072-Speed 9150.53 samples/sec   Loss 4.1482   LearningRate 0.0045   Epoch: 15   Global Step: 262670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:19,162-Speed 9404.47 samples/sec   Loss 4.1351   LearningRate 0.0045   Epoch: 15   Global Step: 262680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:20,234-Speed 9556.36 samples/sec   Loss 4.1798   LearningRate 0.0045   Epoch: 15   Global Step: 262690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:21,345-Speed 9220.08 samples/sec   Loss 4.2132   LearningRate 0.0045   Epoch: 15   Global Step: 262700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:22,437-Speed 9384.83 samples/sec   Loss 4.1826   LearningRate 0.0045   Epoch: 15   Global Step: 262710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:23,544-Speed 9256.03 samples/sec   Loss 4.1624   LearningRate 0.0045   Epoch: 15   Global Step: 262720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:24,627-Speed 9459.01 samples/sec   Loss 4.2016   LearningRate 0.0045   Epoch: 15   Global Step: 262730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:25,806-Speed 8687.26 samples/sec   Loss 4.3045   LearningRate 0.0045   Epoch: 15   Global Step: 262740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:26,931-Speed 9107.62 samples/sec   Loss 4.0913   LearningRate 0.0045   Epoch: 15   Global Step: 262750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:28,094-Speed 8813.85 samples/sec   Loss 4.1919   LearningRate 0.0045   Epoch: 15   Global Step: 262760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:29,220-Speed 9102.13 samples/sec   Loss 4.1412   LearningRate 0.0045   Epoch: 15   Global Step: 262770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:30,344-Speed 9122.55 samples/sec   Loss 4.2095   LearningRate 0.0045   Epoch: 15   Global Step: 262780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:31,421-Speed 9509.62 samples/sec   Loss 4.1113   LearningRate 0.0045   Epoch: 15   Global Step: 262790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:32,581-Speed 8837.10 samples/sec   Loss 4.1220   LearningRate 0.0045   Epoch: 15   Global Step: 262800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:33,716-Speed 9028.11 samples/sec   Loss 4.0916   LearningRate 0.0045   Epoch: 15   Global Step: 262810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:34,841-Speed 9102.79 samples/sec   Loss 4.2217   LearningRate 0.0045   Epoch: 15   Global Step: 262820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:35,968-Speed 9098.33 samples/sec   Loss 4.1461   LearningRate 0.0045   Epoch: 15   Global Step: 262830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:37,068-Speed 9307.51 samples/sec   Loss 4.1412   LearningRate 0.0045   Epoch: 15   Global Step: 262840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:09:38,177-Speed 9238.39 samples/sec   Loss 4.0322   LearningRate 0.0045   Epoch: 15   Global Step: 262850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:39,241-Speed 9634.96 samples/sec   Loss 4.1297   LearningRate 0.0045   Epoch: 15   Global Step: 262860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:40,354-Speed 9203.32 samples/sec   Loss 4.2083   LearningRate 0.0045   Epoch: 15   Global Step: 262870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:41,433-Speed 9494.95 samples/sec   Loss 4.1173   LearningRate 0.0045   Epoch: 15   Global Step: 262880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:42,535-Speed 9298.43 samples/sec   Loss 4.1146   LearningRate 0.0045   Epoch: 15   Global Step: 262890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:43,661-Speed 9103.62 samples/sec   Loss 4.2275   LearningRate 0.0045   Epoch: 15   Global Step: 262900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:44,791-Speed 9065.85 samples/sec   Loss 4.1724   LearningRate 0.0045   Epoch: 15   Global Step: 262910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:45,837-Speed 9797.70 samples/sec   Loss 4.2041   LearningRate 0.0045   Epoch: 15   Global Step: 262920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:46,930-Speed 9372.51 samples/sec   Loss 4.1149   LearningRate 0.0045   Epoch: 15   Global Step: 262930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:09:48,028-Speed 9327.00 samples/sec   Loss 4.1294   LearningRate 0.0045   Epoch: 15   Global Step: 262940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:09:49,117-Speed 9411.90 samples/sec   Loss 4.1779   LearningRate 0.0045   Epoch: 15   Global Step: 262950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:09:50,242-Speed 9108.70 samples/sec   Loss 4.1517   LearningRate 0.0045   Epoch: 15   Global Step: 262960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:09:51,385-Speed 8968.06 samples/sec   Loss 4.1165   LearningRate 0.0045   Epoch: 15   Global Step: 262970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:09:52,472-Speed 9425.28 samples/sec   Loss 4.0925   LearningRate 0.0045   Epoch: 15   Global Step: 262980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:09:53,550-Speed 9498.87 samples/sec   Loss 4.0862   LearningRate 0.0045   Epoch: 15   Global Step: 262990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:09:54,594-Speed 9814.04 samples/sec   Loss 4.1289   LearningRate 0.0045   Epoch: 15   Global Step: 263000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:09:55,717-Speed 9126.76 samples/sec   Loss 4.1949   LearningRate 0.0045   Epoch: 15   Global Step: 263010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:09:56,858-Speed 8977.00 samples/sec   Loss 4.0995   LearningRate 0.0045   Epoch: 15   Global Step: 263020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:09:58,030-Speed 8742.86 samples/sec   Loss 4.1975   LearningRate 0.0045   Epoch: 15   Global Step: 263030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:09:59,156-Speed 9098.87 samples/sec   Loss 4.1968   LearningRate 0.0045   Epoch: 15   Global Step: 263040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:00,257-Speed 9307.57 samples/sec   Loss 4.1256   LearningRate 0.0045   Epoch: 15   Global Step: 263050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:01,321-Speed 9631.95 samples/sec   Loss 4.1103   LearningRate 0.0045   Epoch: 15   Global Step: 263060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:02,380-Speed 9676.23 samples/sec   Loss 4.1847   LearningRate 0.0045   Epoch: 15   Global Step: 263070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:03,435-Speed 9707.69 samples/sec   Loss 4.0852   LearningRate 0.0045   Epoch: 15   Global Step: 263080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:04,544-Speed 9237.44 samples/sec   Loss 4.2321   LearningRate 0.0045   Epoch: 15   Global Step: 263090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:05,630-Speed 9435.68 samples/sec   Loss 4.1713   LearningRate 0.0045   Epoch: 15   Global Step: 263100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:06,743-Speed 9210.82 samples/sec   Loss 4.2104   LearningRate 0.0045   Epoch: 15   Global Step: 263110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:07,808-Speed 9618.54 samples/sec   Loss 4.1473   LearningRate 0.0045   Epoch: 15   Global Step: 263120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:08,863-Speed 9715.33 samples/sec   Loss 4.2111   LearningRate 0.0045   Epoch: 15   Global Step: 263130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:09,944-Speed 9485.64 samples/sec   Loss 4.1495   LearningRate 0.0045   Epoch: 15   Global Step: 263140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:11,050-Speed 9262.40 samples/sec   Loss 4.1834   LearningRate 0.0045   Epoch: 15   Global Step: 263150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:12,154-Speed 9278.15 samples/sec   Loss 4.1420   LearningRate 0.0045   Epoch: 15   Global Step: 263160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:13,264-Speed 9228.28 samples/sec   Loss 4.2126   LearningRate 0.0045   Epoch: 15   Global Step: 263170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:14,343-Speed 9499.39 samples/sec   Loss 4.1629   LearningRate 0.0045   Epoch: 15   Global Step: 263180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:15,411-Speed 9589.23 samples/sec   Loss 4.1226   LearningRate 0.0045   Epoch: 15   Global Step: 263190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:16,495-Speed 9456.47 samples/sec   Loss 4.1587   LearningRate 0.0045   Epoch: 15   Global Step: 263200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:17,569-Speed 9540.39 samples/sec   Loss 4.1843   LearningRate 0.0045   Epoch: 15   Global Step: 263210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:18,690-Speed 9133.39 samples/sec   Loss 4.1226   LearningRate 0.0045   Epoch: 15   Global Step: 263220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:19,754-Speed 9632.43 samples/sec   Loss 4.1502   LearningRate 0.0045   Epoch: 15   Global Step: 263230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:20,863-Speed 9239.56 samples/sec   Loss 4.1728   LearningRate 0.0045   Epoch: 15   Global Step: 263240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:22,001-Speed 9007.98 samples/sec   Loss 4.1477   LearningRate 0.0045   Epoch: 15   Global Step: 263250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:23,107-Speed 9260.93 samples/sec   Loss 4.1896   LearningRate 0.0045   Epoch: 15   Global Step: 263260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:24,233-Speed 9101.31 samples/sec   Loss 4.1482   LearningRate 0.0045   Epoch: 15   Global Step: 263270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:25,334-Speed 9313.28 samples/sec   Loss 4.1357   LearningRate 0.0045   Epoch: 15   Global Step: 263280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:26,450-Speed 9184.10 samples/sec   Loss 4.1990   LearningRate 0.0045   Epoch: 15   Global Step: 263290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:27,527-Speed 9507.62 samples/sec   Loss 4.2040   LearningRate 0.0045   Epoch: 15   Global Step: 263300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:28,651-Speed 9115.98 samples/sec   Loss 4.1542   LearningRate 0.0045   Epoch: 15   Global Step: 263310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:29,718-Speed 9605.34 samples/sec   Loss 4.0206   LearningRate 0.0045   Epoch: 15   Global Step: 263320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:30,795-Speed 9509.11 samples/sec   Loss 4.1424   LearningRate 0.0045   Epoch: 15   Global Step: 263330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:31,864-Speed 9585.68 samples/sec   Loss 4.1493   LearningRate 0.0045   Epoch: 15   Global Step: 263340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:32,973-Speed 9242.52 samples/sec   Loss 4.2214   LearningRate 0.0045   Epoch: 15   Global Step: 263350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:34,169-Speed 8576.44 samples/sec   Loss 4.1642   LearningRate 0.0045   Epoch: 15   Global Step: 263360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:35,297-Speed 9081.13 samples/sec   Loss 4.1426   LearningRate 0.0045   Epoch: 15   Global Step: 263370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:36,380-Speed 9460.58 samples/sec   Loss 4.0442   LearningRate 0.0045   Epoch: 15   Global Step: 263380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:37,458-Speed 9499.90 samples/sec   Loss 4.2009   LearningRate 0.0045   Epoch: 15   Global Step: 263390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:38,587-Speed 9075.05 samples/sec   Loss 4.1243   LearningRate 0.0045   Epoch: 15   Global Step: 263400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:39,655-Speed 9600.28 samples/sec   Loss 4.1531   LearningRate 0.0044   Epoch: 15   Global Step: 263410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:40,781-Speed 9099.76 samples/sec   Loss 4.2412   LearningRate 0.0044   Epoch: 15   Global Step: 263420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:41,830-Speed 9770.55 samples/sec   Loss 4.1323   LearningRate 0.0044   Epoch: 15   Global Step: 263430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:42,999-Speed 8761.22 samples/sec   Loss 4.0893   LearningRate 0.0044   Epoch: 15   Global Step: 263440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:44,130-Speed 9061.17 samples/sec   Loss 4.0574   LearningRate 0.0044   Epoch: 15   Global Step: 263450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:45,225-Speed 9357.34 samples/sec   Loss 4.1434   LearningRate 0.0044   Epoch: 15   Global Step: 263460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:46,305-Speed 9488.79 samples/sec   Loss 4.2204   LearningRate 0.0044   Epoch: 15   Global Step: 263470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:47,359-Speed 9725.75 samples/sec   Loss 4.0959   LearningRate 0.0044   Epoch: 15   Global Step: 263480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:10:48,435-Speed 9518.17 samples/sec   Loss 4.1804   LearningRate 0.0044   Epoch: 15   Global Step: 263490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:49,570-Speed 9026.13 samples/sec   Loss 4.1558   LearningRate 0.0044   Epoch: 15   Global Step: 263500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:50,692-Speed 9129.14 samples/sec   Loss 4.1056   LearningRate 0.0044   Epoch: 15   Global Step: 263510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:51,751-Speed 9681.38 samples/sec   Loss 4.0488   LearningRate 0.0044   Epoch: 15   Global Step: 263520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:52,823-Speed 9559.89 samples/sec   Loss 4.2489   LearningRate 0.0044   Epoch: 15   Global Step: 263530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:53,938-Speed 9188.63 samples/sec   Loss 4.2426   LearningRate 0.0044   Epoch: 15   Global Step: 263540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:55,098-Speed 8831.13 samples/sec   Loss 4.2005   LearningRate 0.0044   Epoch: 15   Global Step: 263550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:56,199-Speed 9307.26 samples/sec   Loss 4.2533   LearningRate 0.0044   Epoch: 15   Global Step: 263560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:57,311-Speed 9210.69 samples/sec   Loss 4.0767   LearningRate 0.0044   Epoch: 15   Global Step: 263570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:58,382-Speed 9571.18 samples/sec   Loss 4.0900   LearningRate 0.0044   Epoch: 15   Global Step: 263580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:10:59,472-Speed 9397.33 samples/sec   Loss 4.1434   LearningRate 0.0044   Epoch: 15   Global Step: 263590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:11:00,518-Speed 9789.98 samples/sec   Loss 4.1767   LearningRate 0.0044   Epoch: 15   Global Step: 263600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:01,672-Speed 8878.18 samples/sec   Loss 4.2213   LearningRate 0.0044   Epoch: 15   Global Step: 263610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:02,820-Speed 8938.06 samples/sec   Loss 4.1084   LearningRate 0.0044   Epoch: 15   Global Step: 263620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:03,916-Speed 9342.79 samples/sec   Loss 4.2187   LearningRate 0.0044   Epoch: 15   Global Step: 263630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:05,007-Speed 9390.14 samples/sec   Loss 4.1624   LearningRate 0.0044   Epoch: 15   Global Step: 263640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:06,122-Speed 9194.80 samples/sec   Loss 4.1538   LearningRate 0.0044   Epoch: 15   Global Step: 263650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:07,239-Speed 9169.42 samples/sec   Loss 4.1201   LearningRate 0.0044   Epoch: 15   Global Step: 263660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:08,336-Speed 9335.92 samples/sec   Loss 4.1552   LearningRate 0.0044   Epoch: 15   Global Step: 263670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:09,414-Speed 9514.91 samples/sec   Loss 4.1303   LearningRate 0.0044   Epoch: 15   Global Step: 263680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:10,529-Speed 9188.45 samples/sec   Loss 4.2340   LearningRate 0.0044   Epoch: 15   Global Step: 263690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:11,682-Speed 8879.62 samples/sec   Loss 4.1364   LearningRate 0.0044   Epoch: 15   Global Step: 263700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:11:12,777-Speed 9361.22 samples/sec   Loss 4.1376   LearningRate 0.0044   Epoch: 15   Global Step: 263710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:11:13,908-Speed 9055.07 samples/sec   Loss 4.1314   LearningRate 0.0044   Epoch: 15   Global Step: 263720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:11:15,063-Speed 8874.76 samples/sec   Loss 4.1094   LearningRate 0.0044   Epoch: 15   Global Step: 263730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:11:16,175-Speed 9215.44 samples/sec   Loss 4.2350   LearningRate 0.0044   Epoch: 15   Global Step: 263740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:17,233-Speed 9681.83 samples/sec   Loss 4.2030   LearningRate 0.0044   Epoch: 15   Global Step: 263750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:18,341-Speed 9246.33 samples/sec   Loss 4.2319   LearningRate 0.0044   Epoch: 15   Global Step: 263760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:19,448-Speed 9255.08 samples/sec   Loss 4.2007   LearningRate 0.0044   Epoch: 15   Global Step: 263770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:20,540-Speed 9384.54 samples/sec   Loss 4.0525   LearningRate 0.0044   Epoch: 15   Global Step: 263780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:21,632-Speed 9389.90 samples/sec   Loss 4.1495   LearningRate 0.0044   Epoch: 15   Global Step: 263790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:22,699-Speed 9597.87 samples/sec   Loss 4.1495   LearningRate 0.0044   Epoch: 15   Global Step: 263800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:23,770-Speed 9569.36 samples/sec   Loss 4.1306   LearningRate 0.0044   Epoch: 15   Global Step: 263810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:24,913-Speed 8962.38 samples/sec   Loss 4.0226   LearningRate 0.0044   Epoch: 15   Global Step: 263820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:25,995-Speed 9470.36 samples/sec   Loss 4.2117   LearningRate 0.0044   Epoch: 15   Global Step: 263830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:27,089-Speed 9361.28 samples/sec   Loss 4.1438   LearningRate 0.0044   Epoch: 15   Global Step: 263840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:11:28,180-Speed 9394.44 samples/sec   Loss 4.1881   LearningRate 0.0044   Epoch: 15   Global Step: 263850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:11:29,304-Speed 9118.98 samples/sec   Loss 4.0704   LearningRate 0.0044   Epoch: 15   Global Step: 263860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:11:30,366-Speed 9643.28 samples/sec   Loss 4.2706   LearningRate 0.0044   Epoch: 15   Global Step: 263870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:11:31,427-Speed 9657.18 samples/sec   Loss 4.1514   LearningRate 0.0044   Epoch: 15   Global Step: 263880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:32,501-Speed 9546.74 samples/sec   Loss 4.1071   LearningRate 0.0044   Epoch: 15   Global Step: 263890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:33,608-Speed 9254.35 samples/sec   Loss 4.1118   LearningRate 0.0044   Epoch: 15   Global Step: 263900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:34,727-Speed 9151.95 samples/sec   Loss 4.1802   LearningRate 0.0044   Epoch: 15   Global Step: 263910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:35,821-Speed 9368.79 samples/sec   Loss 4.1492   LearningRate 0.0044   Epoch: 15   Global Step: 263920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:36,909-Speed 9419.07 samples/sec   Loss 4.1103   LearningRate 0.0044   Epoch: 15   Global Step: 263930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:38,006-Speed 9334.19 samples/sec   Loss 4.1680   LearningRate 0.0044   Epoch: 15   Global Step: 263940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:39,107-Speed 9311.23 samples/sec   Loss 4.0785   LearningRate 0.0044   Epoch: 15   Global Step: 263950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:40,203-Speed 9357.58 samples/sec   Loss 4.1434   LearningRate 0.0044   Epoch: 15   Global Step: 263960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:41,262-Speed 9672.38 samples/sec   Loss 4.1796   LearningRate 0.0044   Epoch: 15   Global Step: 263970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:42,364-Speed 9294.93 samples/sec   Loss 4.0715   LearningRate 0.0044   Epoch: 15   Global Step: 263980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:43,523-Speed 8841.69 samples/sec   Loss 4.1900   LearningRate 0.0044   Epoch: 15   Global Step: 263990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:11:44,664-Speed 8978.71 samples/sec   Loss 4.2114   LearningRate 0.0044   Epoch: 15   Global Step: 264000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:12:06,622-[lfw][264000]XNorm: 7.037474
Training: 2022-04-11 22:12:06,623-[lfw][264000]Accuracy-Flip: 0.99650+-0.00293
Training: 2022-04-11 22:12:06,623-[lfw][264000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:12:32,021-[cfp_fp][264000]XNorm: 6.096041
Training: 2022-04-11 22:12:32,022-[cfp_fp][264000]Accuracy-Flip: 0.97043+-0.00756
Training: 2022-04-11 22:12:32,022-[cfp_fp][264000]Accuracy-Highest: 0.97171
Training: 2022-04-11 22:12:53,939-[agedb_30][264000]XNorm: 6.834484
Training: 2022-04-11 22:12:53,940-[agedb_30][264000]Accuracy-Flip: 0.97317+-0.00740
Training: 2022-04-11 22:12:53,940-[agedb_30][264000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:12:55,053-Speed 145.48 samples/sec   Loss 4.1893   LearningRate 0.0044   Epoch: 15   Global Step: 264010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:12:56,171-Speed 9165.99 samples/sec   Loss 4.0995   LearningRate 0.0044   Epoch: 15   Global Step: 264020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:12:57,292-Speed 9137.67 samples/sec   Loss 4.1031   LearningRate 0.0044   Epoch: 15   Global Step: 264030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:12:58,396-Speed 9284.14 samples/sec   Loss 4.1762   LearningRate 0.0044   Epoch: 15   Global Step: 264040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:12:59,494-Speed 9336.12 samples/sec   Loss 4.0901   LearningRate 0.0044   Epoch: 15   Global Step: 264050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:00,584-Speed 9399.99 samples/sec   Loss 4.1587   LearningRate 0.0044   Epoch: 15   Global Step: 264060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:01,678-Speed 9365.65 samples/sec   Loss 4.1779   LearningRate 0.0044   Epoch: 15   Global Step: 264070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:02,775-Speed 9342.55 samples/sec   Loss 4.1357   LearningRate 0.0044   Epoch: 15   Global Step: 264080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:03,834-Speed 9668.50 samples/sec   Loss 4.1214   LearningRate 0.0044   Epoch: 15   Global Step: 264090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:04,901-Speed 9606.19 samples/sec   Loss 4.1519   LearningRate 0.0044   Epoch: 15   Global Step: 264100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:06,009-Speed 9247.55 samples/sec   Loss 4.1984   LearningRate 0.0044   Epoch: 15   Global Step: 264110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:07,085-Speed 9516.84 samples/sec   Loss 4.1433   LearningRate 0.0044   Epoch: 15   Global Step: 264120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:08,181-Speed 9353.29 samples/sec   Loss 4.0975   LearningRate 0.0044   Epoch: 15   Global Step: 264130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:09,284-Speed 9291.56 samples/sec   Loss 4.0994   LearningRate 0.0044   Epoch: 15   Global Step: 264140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:10,402-Speed 9163.22 samples/sec   Loss 4.1388   LearningRate 0.0044   Epoch: 15   Global Step: 264150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:11,543-Speed 8979.67 samples/sec   Loss 4.1441   LearningRate 0.0044   Epoch: 15   Global Step: 264160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:12,658-Speed 9189.57 samples/sec   Loss 4.1465   LearningRate 0.0044   Epoch: 15   Global Step: 264170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:13,729-Speed 9565.58 samples/sec   Loss 4.0461   LearningRate 0.0044   Epoch: 15   Global Step: 264180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:14,819-Speed 9398.98 samples/sec   Loss 4.1490   LearningRate 0.0044   Epoch: 15   Global Step: 264190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:15,950-Speed 9059.89 samples/sec   Loss 4.1821   LearningRate 0.0043   Epoch: 15   Global Step: 264200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:17,029-Speed 9490.21 samples/sec   Loss 4.2312   LearningRate 0.0043   Epoch: 15   Global Step: 264210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:18,142-Speed 9204.43 samples/sec   Loss 4.1194   LearningRate 0.0043   Epoch: 15   Global Step: 264220   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:19,221-Speed 9494.53 samples/sec   Loss 4.1545   LearningRate 0.0043   Epoch: 15   Global Step: 264230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:20,293-Speed 9558.32 samples/sec   Loss 4.1980   LearningRate 0.0043   Epoch: 15   Global Step: 264240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:21,435-Speed 8980.43 samples/sec   Loss 4.1654   LearningRate 0.0043   Epoch: 15   Global Step: 264250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:22,574-Speed 8989.26 samples/sec   Loss 4.2329   LearningRate 0.0043   Epoch: 15   Global Step: 264260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:23,674-Speed 9321.43 samples/sec   Loss 4.1389   LearningRate 0.0043   Epoch: 15   Global Step: 264270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:24,851-Speed 8705.96 samples/sec   Loss 4.1661   LearningRate 0.0043   Epoch: 15   Global Step: 264280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:25,916-Speed 9622.32 samples/sec   Loss 4.1956   LearningRate 0.0043   Epoch: 15   Global Step: 264290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:27,026-Speed 9227.64 samples/sec   Loss 4.1748   LearningRate 0.0043   Epoch: 15   Global Step: 264300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:28,116-Speed 9402.16 samples/sec   Loss 4.2046   LearningRate 0.0043   Epoch: 15   Global Step: 264310   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:29,221-Speed 9270.70 samples/sec   Loss 4.2647   LearningRate 0.0043   Epoch: 15   Global Step: 264320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:30,288-Speed 9610.42 samples/sec   Loss 4.1519   LearningRate 0.0043   Epoch: 15   Global Step: 264330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:31,416-Speed 9078.59 samples/sec   Loss 4.1558   LearningRate 0.0043   Epoch: 15   Global Step: 264340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:32,525-Speed 9242.37 samples/sec   Loss 4.0789   LearningRate 0.0043   Epoch: 15   Global Step: 264350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:33,659-Speed 9033.55 samples/sec   Loss 4.0673   LearningRate 0.0043   Epoch: 15   Global Step: 264360   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:34,748-Speed 9407.86 samples/sec   Loss 4.0989   LearningRate 0.0043   Epoch: 15   Global Step: 264370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:13:35,815-Speed 9600.25 samples/sec   Loss 4.1273   LearningRate 0.0043   Epoch: 15   Global Step: 264380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:36,969-Speed 8883.74 samples/sec   Loss 4.1101   LearningRate 0.0043   Epoch: 15   Global Step: 264390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:38,056-Speed 9418.38 samples/sec   Loss 4.1427   LearningRate 0.0043   Epoch: 15   Global Step: 264400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:13:39,174-Speed 9173.35 samples/sec   Loss 4.1410   LearningRate 0.0043   Epoch: 15   Global Step: 264410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:13:40,231-Speed 9694.47 samples/sec   Loss 4.1107   LearningRate 0.0043   Epoch: 15   Global Step: 264420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:13:41,336-Speed 9266.09 samples/sec   Loss 4.1056   LearningRate 0.0043   Epoch: 15   Global Step: 264430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:13:42,512-Speed 8712.74 samples/sec   Loss 4.1986   LearningRate 0.0043   Epoch: 15   Global Step: 264440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:13:43,630-Speed 9161.22 samples/sec   Loss 4.1955   LearningRate 0.0043   Epoch: 15   Global Step: 264450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:13:44,745-Speed 9196.90 samples/sec   Loss 4.2134   LearningRate 0.0043   Epoch: 15   Global Step: 264460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:13:45,878-Speed 9045.20 samples/sec   Loss 4.2032   LearningRate 0.0043   Epoch: 15   Global Step: 264470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:13:47,030-Speed 8889.09 samples/sec   Loss 4.1049   LearningRate 0.0043   Epoch: 15   Global Step: 264480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:13:48,096-Speed 9611.17 samples/sec   Loss 4.1747   LearningRate 0.0043   Epoch: 15   Global Step: 264490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:13:49,220-Speed 9116.20 samples/sec   Loss 4.1579   LearningRate 0.0043   Epoch: 15   Global Step: 264500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:50,355-Speed 9025.57 samples/sec   Loss 4.3133   LearningRate 0.0043   Epoch: 15   Global Step: 264510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:51,454-Speed 9330.94 samples/sec   Loss 4.0733   LearningRate 0.0043   Epoch: 15   Global Step: 264520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:52,570-Speed 9176.80 samples/sec   Loss 4.1082   LearningRate 0.0043   Epoch: 15   Global Step: 264530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:53,666-Speed 9355.33 samples/sec   Loss 4.1189   LearningRate 0.0043   Epoch: 15   Global Step: 264540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:54,751-Speed 9443.52 samples/sec   Loss 4.1683   LearningRate 0.0043   Epoch: 15   Global Step: 264550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:55,811-Speed 9661.39 samples/sec   Loss 4.1123   LearningRate 0.0043   Epoch: 15   Global Step: 264560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:56,891-Speed 9489.82 samples/sec   Loss 4.1218   LearningRate 0.0043   Epoch: 15   Global Step: 264570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:57,984-Speed 9370.80 samples/sec   Loss 4.0550   LearningRate 0.0043   Epoch: 15   Global Step: 264580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:13:59,107-Speed 9121.73 samples/sec   Loss 4.1767   LearningRate 0.0043   Epoch: 15   Global Step: 264590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:00,213-Speed 9263.22 samples/sec   Loss 4.0955   LearningRate 0.0043   Epoch: 15   Global Step: 264600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:01,337-Speed 9118.32 samples/sec   Loss 4.1562   LearningRate 0.0043   Epoch: 15   Global Step: 264610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:02,429-Speed 9380.75 samples/sec   Loss 4.0842   LearningRate 0.0043   Epoch: 15   Global Step: 264620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:03,514-Speed 9457.28 samples/sec   Loss 4.1701   LearningRate 0.0043   Epoch: 15   Global Step: 264630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:04,650-Speed 9017.28 samples/sec   Loss 4.1455   LearningRate 0.0043   Epoch: 15   Global Step: 264640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:05,736-Speed 9432.29 samples/sec   Loss 4.1508   LearningRate 0.0043   Epoch: 15   Global Step: 264650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:06,835-Speed 9328.39 samples/sec   Loss 4.1351   LearningRate 0.0043   Epoch: 15   Global Step: 264660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:07,936-Speed 9306.18 samples/sec   Loss 4.1439   LearningRate 0.0043   Epoch: 15   Global Step: 264670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:09,040-Speed 9275.92 samples/sec   Loss 4.2556   LearningRate 0.0043   Epoch: 15   Global Step: 264680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:10,152-Speed 9220.40 samples/sec   Loss 4.0792   LearningRate 0.0043   Epoch: 15   Global Step: 264690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:11,231-Speed 9491.59 samples/sec   Loss 4.1318   LearningRate 0.0043   Epoch: 15   Global Step: 264700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:12,367-Speed 9022.97 samples/sec   Loss 4.2032   LearningRate 0.0043   Epoch: 15   Global Step: 264710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:13,469-Speed 9291.90 samples/sec   Loss 4.2178   LearningRate 0.0043   Epoch: 15   Global Step: 264720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:14,621-Speed 8897.99 samples/sec   Loss 4.1806   LearningRate 0.0043   Epoch: 15   Global Step: 264730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:15,777-Speed 8857.89 samples/sec   Loss 4.1139   LearningRate 0.0043   Epoch: 15   Global Step: 264740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:16,885-Speed 9249.94 samples/sec   Loss 4.1591   LearningRate 0.0043   Epoch: 15   Global Step: 264750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:17,970-Speed 9446.13 samples/sec   Loss 4.0945   LearningRate 0.0043   Epoch: 15   Global Step: 264760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:19,035-Speed 9618.35 samples/sec   Loss 4.1288   LearningRate 0.0043   Epoch: 15   Global Step: 264770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:20,153-Speed 9161.78 samples/sec   Loss 4.1063   LearningRate 0.0043   Epoch: 15   Global Step: 264780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:21,281-Speed 9089.64 samples/sec   Loss 4.1603   LearningRate 0.0043   Epoch: 15   Global Step: 264790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:22,434-Speed 8888.64 samples/sec   Loss 4.1206   LearningRate 0.0043   Epoch: 15   Global Step: 264800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:23,558-Speed 9119.96 samples/sec   Loss 4.1976   LearningRate 0.0043   Epoch: 15   Global Step: 264810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:24,712-Speed 8876.01 samples/sec   Loss 4.1927   LearningRate 0.0043   Epoch: 15   Global Step: 264820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:25,786-Speed 9544.23 samples/sec   Loss 4.1281   LearningRate 0.0043   Epoch: 15   Global Step: 264830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:26,851-Speed 9619.09 samples/sec   Loss 4.1480   LearningRate 0.0043   Epoch: 15   Global Step: 264840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:27,956-Speed 9265.88 samples/sec   Loss 4.0772   LearningRate 0.0043   Epoch: 15   Global Step: 264850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:29,046-Speed 9400.80 samples/sec   Loss 4.2171   LearningRate 0.0043   Epoch: 15   Global Step: 264860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:30,134-Speed 9413.70 samples/sec   Loss 4.1095   LearningRate 0.0043   Epoch: 15   Global Step: 264870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:31,202-Speed 9600.93 samples/sec   Loss 4.1411   LearningRate 0.0043   Epoch: 15   Global Step: 264880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:32,302-Speed 9308.33 samples/sec   Loss 4.2403   LearningRate 0.0043   Epoch: 15   Global Step: 264890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:33,432-Speed 9069.60 samples/sec   Loss 4.1545   LearningRate 0.0043   Epoch: 15   Global Step: 264900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:34,543-Speed 9227.20 samples/sec   Loss 4.0410   LearningRate 0.0043   Epoch: 15   Global Step: 264910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:35,674-Speed 9054.38 samples/sec   Loss 4.2625   LearningRate 0.0043   Epoch: 15   Global Step: 264920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:36,759-Speed 9444.74 samples/sec   Loss 4.1790   LearningRate 0.0043   Epoch: 15   Global Step: 264930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:37,896-Speed 9015.01 samples/sec   Loss 4.1166   LearningRate 0.0043   Epoch: 15   Global Step: 264940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:39,026-Speed 9061.13 samples/sec   Loss 4.0399   LearningRate 0.0043   Epoch: 15   Global Step: 264950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:40,110-Speed 9457.61 samples/sec   Loss 4.1513   LearningRate 0.0043   Epoch: 15   Global Step: 264960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:41,181-Speed 9565.71 samples/sec   Loss 4.1384   LearningRate 0.0043   Epoch: 15   Global Step: 264970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:42,250-Speed 9590.11 samples/sec   Loss 4.1343   LearningRate 0.0043   Epoch: 15   Global Step: 264980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:43,349-Speed 9323.12 samples/sec   Loss 4.1163   LearningRate 0.0043   Epoch: 15   Global Step: 264990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:44,499-Speed 8905.30 samples/sec   Loss 4.1455   LearningRate 0.0043   Epoch: 15   Global Step: 265000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:45,588-Speed 9412.34 samples/sec   Loss 4.1924   LearningRate 0.0042   Epoch: 15   Global Step: 265010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:46,668-Speed 9485.53 samples/sec   Loss 4.2538   LearningRate 0.0042   Epoch: 15   Global Step: 265020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:47,771-Speed 9287.07 samples/sec   Loss 4.1850   LearningRate 0.0042   Epoch: 15   Global Step: 265030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:48,867-Speed 9346.05 samples/sec   Loss 4.0131   LearningRate 0.0042   Epoch: 15   Global Step: 265040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:50,075-Speed 8484.18 samples/sec   Loss 4.1164   LearningRate 0.0042   Epoch: 15   Global Step: 265050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:14:51,144-Speed 9587.19 samples/sec   Loss 4.1260   LearningRate 0.0042   Epoch: 15   Global Step: 265060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:52,305-Speed 8830.93 samples/sec   Loss 4.0551   LearningRate 0.0042   Epoch: 15   Global Step: 265070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:53,390-Speed 9444.65 samples/sec   Loss 4.1804   LearningRate 0.0042   Epoch: 15   Global Step: 265080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:54,471-Speed 9474.92 samples/sec   Loss 4.1197   LearningRate 0.0042   Epoch: 15   Global Step: 265090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:55,647-Speed 8712.40 samples/sec   Loss 4.2261   LearningRate 0.0042   Epoch: 15   Global Step: 265100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:56,738-Speed 9396.53 samples/sec   Loss 4.1356   LearningRate 0.0042   Epoch: 15   Global Step: 265110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:57,845-Speed 9256.11 samples/sec   Loss 4.0959   LearningRate 0.0042   Epoch: 15   Global Step: 265120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:14:58,941-Speed 9350.54 samples/sec   Loss 4.1075   LearningRate 0.0042   Epoch: 15   Global Step: 265130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:00,064-Speed 9120.99 samples/sec   Loss 4.0990   LearningRate 0.0042   Epoch: 15   Global Step: 265140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:01,181-Speed 9171.92 samples/sec   Loss 4.1225   LearningRate 0.0042   Epoch: 15   Global Step: 265150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:02,259-Speed 9509.66 samples/sec   Loss 4.2503   LearningRate 0.0042   Epoch: 15   Global Step: 265160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:03,396-Speed 9009.65 samples/sec   Loss 4.1895   LearningRate 0.0042   Epoch: 15   Global Step: 265170   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:04,520-Speed 9113.37 samples/sec   Loss 4.1782   LearningRate 0.0042   Epoch: 15   Global Step: 265180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:05,611-Speed 9394.77 samples/sec   Loss 4.1898   LearningRate 0.0042   Epoch: 15   Global Step: 265190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:06,684-Speed 9549.09 samples/sec   Loss 4.0648   LearningRate 0.0042   Epoch: 15   Global Step: 265200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:07,771-Speed 9422.23 samples/sec   Loss 4.0982   LearningRate 0.0042   Epoch: 15   Global Step: 265210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:08,915-Speed 8955.77 samples/sec   Loss 4.1232   LearningRate 0.0042   Epoch: 15   Global Step: 265220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:10,050-Speed 9032.11 samples/sec   Loss 4.1436   LearningRate 0.0042   Epoch: 15   Global Step: 265230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:11,194-Speed 8956.78 samples/sec   Loss 4.2092   LearningRate 0.0042   Epoch: 15   Global Step: 265240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:12,314-Speed 9148.10 samples/sec   Loss 4.1116   LearningRate 0.0042   Epoch: 15   Global Step: 265250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:13,386-Speed 9559.82 samples/sec   Loss 4.1859   LearningRate 0.0042   Epoch: 15   Global Step: 265260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:14,507-Speed 9136.58 samples/sec   Loss 4.2011   LearningRate 0.0042   Epoch: 15   Global Step: 265270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:15,643-Speed 9018.26 samples/sec   Loss 4.2311   LearningRate 0.0042   Epoch: 15   Global Step: 265280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:16,736-Speed 9377.77 samples/sec   Loss 4.1946   LearningRate 0.0042   Epoch: 15   Global Step: 265290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:17,860-Speed 9115.82 samples/sec   Loss 4.1295   LearningRate 0.0042   Epoch: 15   Global Step: 265300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:18,993-Speed 9047.22 samples/sec   Loss 4.1457   LearningRate 0.0042   Epoch: 15   Global Step: 265310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:20,101-Speed 9245.72 samples/sec   Loss 4.1985   LearningRate 0.0042   Epoch: 15   Global Step: 265320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:21,241-Speed 8983.88 samples/sec   Loss 4.2724   LearningRate 0.0042   Epoch: 15   Global Step: 265330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:22,363-Speed 9135.70 samples/sec   Loss 4.1498   LearningRate 0.0042   Epoch: 15   Global Step: 265340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:23,486-Speed 9128.43 samples/sec   Loss 4.1079   LearningRate 0.0042   Epoch: 15   Global Step: 265350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:24,598-Speed 9209.49 samples/sec   Loss 4.0634   LearningRate 0.0042   Epoch: 15   Global Step: 265360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:25,688-Speed 9406.17 samples/sec   Loss 4.1635   LearningRate 0.0042   Epoch: 15   Global Step: 265370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:26,829-Speed 8976.74 samples/sec   Loss 4.0955   LearningRate 0.0042   Epoch: 15   Global Step: 265380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:27,911-Speed 9471.34 samples/sec   Loss 4.1836   LearningRate 0.0042   Epoch: 15   Global Step: 265390   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:29,012-Speed 9308.49 samples/sec   Loss 4.0530   LearningRate 0.0042   Epoch: 15   Global Step: 265400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:30,110-Speed 9330.21 samples/sec   Loss 4.1177   LearningRate 0.0042   Epoch: 15   Global Step: 265410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:31,202-Speed 9380.19 samples/sec   Loss 4.0719   LearningRate 0.0042   Epoch: 15   Global Step: 265420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:32,305-Speed 9283.76 samples/sec   Loss 4.2075   LearningRate 0.0042   Epoch: 15   Global Step: 265430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:33,389-Speed 9456.08 samples/sec   Loss 4.0588   LearningRate 0.0042   Epoch: 15   Global Step: 265440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:34,508-Speed 9159.18 samples/sec   Loss 4.1341   LearningRate 0.0042   Epoch: 15   Global Step: 265450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:35,597-Speed 9415.58 samples/sec   Loss 4.1735   LearningRate 0.0042   Epoch: 15   Global Step: 265460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:36,681-Speed 9446.63 samples/sec   Loss 4.1872   LearningRate 0.0042   Epoch: 15   Global Step: 265470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:37,777-Speed 9349.97 samples/sec   Loss 4.0625   LearningRate 0.0042   Epoch: 15   Global Step: 265480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:38,838-Speed 9660.61 samples/sec   Loss 4.1127   LearningRate 0.0042   Epoch: 15   Global Step: 265490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:39,943-Speed 9273.96 samples/sec   Loss 4.0943   LearningRate 0.0042   Epoch: 15   Global Step: 265500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:41,039-Speed 9349.53 samples/sec   Loss 4.1629   LearningRate 0.0042   Epoch: 15   Global Step: 265510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:42,143-Speed 9276.94 samples/sec   Loss 4.1038   LearningRate 0.0042   Epoch: 15   Global Step: 265520   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:43,264-Speed 9140.69 samples/sec   Loss 4.0961   LearningRate 0.0042   Epoch: 15   Global Step: 265530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:44,372-Speed 9248.42 samples/sec   Loss 4.2485   LearningRate 0.0042   Epoch: 15   Global Step: 265540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:45,441-Speed 9586.11 samples/sec   Loss 4.1068   LearningRate 0.0042   Epoch: 15   Global Step: 265550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:46,577-Speed 9018.95 samples/sec   Loss 4.1024   LearningRate 0.0042   Epoch: 15   Global Step: 265560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:47,681-Speed 9281.15 samples/sec   Loss 4.1694   LearningRate 0.0042   Epoch: 15   Global Step: 265570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:15:48,806-Speed 9103.90 samples/sec   Loss 4.2849   LearningRate 0.0042   Epoch: 15   Global Step: 265580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:49,856-Speed 9758.40 samples/sec   Loss 4.0875   LearningRate 0.0042   Epoch: 15   Global Step: 265590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:50,951-Speed 9358.26 samples/sec   Loss 4.1173   LearningRate 0.0042   Epoch: 15   Global Step: 265600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:52,047-Speed 9352.27 samples/sec   Loss 4.2499   LearningRate 0.0042   Epoch: 15   Global Step: 265610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:53,172-Speed 9103.84 samples/sec   Loss 4.1262   LearningRate 0.0042   Epoch: 15   Global Step: 265620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:54,282-Speed 9238.32 samples/sec   Loss 4.1612   LearningRate 0.0042   Epoch: 15   Global Step: 265630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:55,395-Speed 9201.74 samples/sec   Loss 4.0946   LearningRate 0.0042   Epoch: 15   Global Step: 265640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:56,462-Speed 9606.53 samples/sec   Loss 4.1007   LearningRate 0.0042   Epoch: 15   Global Step: 265650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:57,551-Speed 9404.09 samples/sec   Loss 4.1833   LearningRate 0.0042   Epoch: 15   Global Step: 265660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:58,663-Speed 9214.15 samples/sec   Loss 4.1527   LearningRate 0.0042   Epoch: 15   Global Step: 265670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:15:59,787-Speed 9117.88 samples/sec   Loss 4.2777   LearningRate 0.0042   Epoch: 15   Global Step: 265680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:00,867-Speed 9483.10 samples/sec   Loss 4.0660   LearningRate 0.0042   Epoch: 15   Global Step: 265690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:01,919-Speed 9743.40 samples/sec   Loss 4.1456   LearningRate 0.0042   Epoch: 15   Global Step: 265700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:03,017-Speed 9331.87 samples/sec   Loss 4.1215   LearningRate 0.0042   Epoch: 15   Global Step: 265710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:04,131-Speed 9198.02 samples/sec   Loss 4.1088   LearningRate 0.0042   Epoch: 15   Global Step: 265720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:05,227-Speed 9350.71 samples/sec   Loss 4.2200   LearningRate 0.0042   Epoch: 15   Global Step: 265730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:06,382-Speed 8867.82 samples/sec   Loss 4.1171   LearningRate 0.0042   Epoch: 15   Global Step: 265740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:07,514-Speed 9054.13 samples/sec   Loss 4.0721   LearningRate 0.0042   Epoch: 15   Global Step: 265750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:08,708-Speed 8577.26 samples/sec   Loss 4.1364   LearningRate 0.0042   Epoch: 15   Global Step: 265760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:09,816-Speed 9247.07 samples/sec   Loss 4.1051   LearningRate 0.0042   Epoch: 15   Global Step: 265770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:10,947-Speed 9060.14 samples/sec   Loss 4.0878   LearningRate 0.0042   Epoch: 15   Global Step: 265780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:12,095-Speed 8926.94 samples/sec   Loss 4.1185   LearningRate 0.0042   Epoch: 15   Global Step: 265790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:13,198-Speed 9289.56 samples/sec   Loss 4.1021   LearningRate 0.0042   Epoch: 15   Global Step: 265800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:14,248-Speed 9766.76 samples/sec   Loss 4.1071   LearningRate 0.0042   Epoch: 15   Global Step: 265810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:15,365-Speed 9170.96 samples/sec   Loss 4.0705   LearningRate 0.0041   Epoch: 15   Global Step: 265820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:16,446-Speed 9473.40 samples/sec   Loss 4.1637   LearningRate 0.0041   Epoch: 15   Global Step: 265830   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:17,552-Speed 9268.55 samples/sec   Loss 4.0943   LearningRate 0.0041   Epoch: 15   Global Step: 265840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:18,666-Speed 9194.79 samples/sec   Loss 4.1410   LearningRate 0.0041   Epoch: 15   Global Step: 265850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:19,777-Speed 9221.24 samples/sec   Loss 4.2074   LearningRate 0.0041   Epoch: 15   Global Step: 265860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:20,912-Speed 9025.59 samples/sec   Loss 4.1886   LearningRate 0.0041   Epoch: 15   Global Step: 265870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:22,029-Speed 9175.86 samples/sec   Loss 4.1583   LearningRate 0.0041   Epoch: 15   Global Step: 265880   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:23,110-Speed 9477.58 samples/sec   Loss 4.1177   LearningRate 0.0041   Epoch: 15   Global Step: 265890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:24,200-Speed 9401.48 samples/sec   Loss 4.1755   LearningRate 0.0041   Epoch: 15   Global Step: 265900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:25,272-Speed 9555.32 samples/sec   Loss 4.1077   LearningRate 0.0041   Epoch: 15   Global Step: 265910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:26,332-Speed 9666.82 samples/sec   Loss 4.1401   LearningRate 0.0041   Epoch: 15   Global Step: 265920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:27,428-Speed 9346.47 samples/sec   Loss 4.2049   LearningRate 0.0041   Epoch: 15   Global Step: 265930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:28,545-Speed 9173.17 samples/sec   Loss 4.1658   LearningRate 0.0041   Epoch: 15   Global Step: 265940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:29,656-Speed 9223.88 samples/sec   Loss 4.1282   LearningRate 0.0041   Epoch: 15   Global Step: 265950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:30,786-Speed 9060.34 samples/sec   Loss 4.2069   LearningRate 0.0041   Epoch: 15   Global Step: 265960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:31,919-Speed 9045.22 samples/sec   Loss 4.1727   LearningRate 0.0041   Epoch: 15   Global Step: 265970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:33,005-Speed 9441.85 samples/sec   Loss 4.1132   LearningRate 0.0041   Epoch: 15   Global Step: 265980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:16:34,066-Speed 9656.45 samples/sec   Loss 4.1071   LearningRate 0.0041   Epoch: 15   Global Step: 265990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:35,208-Speed 8976.20 samples/sec   Loss 4.1506   LearningRate 0.0041   Epoch: 15   Global Step: 266000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:16:57,100-[lfw][266000]XNorm: 7.006951
Training: 2022-04-11 22:16:57,101-[lfw][266000]Accuracy-Flip: 0.99683+-0.00283
Training: 2022-04-11 22:16:57,102-[lfw][266000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:17:22,445-[cfp_fp][266000]XNorm: 6.055339
Training: 2022-04-11 22:17:22,446-[cfp_fp][266000]Accuracy-Flip: 0.97100+-0.00694
Training: 2022-04-11 22:17:22,446-[cfp_fp][266000]Accuracy-Highest: 0.97171
Training: 2022-04-11 22:17:44,293-[agedb_30][266000]XNorm: 6.813855
Training: 2022-04-11 22:17:44,294-[agedb_30][266000]Accuracy-Flip: 0.97167+-0.00872
Training: 2022-04-11 22:17:44,295-[agedb_30][266000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:17:45,388-Speed 145.91 samples/sec   Loss 4.1075   LearningRate 0.0041   Epoch: 15   Global Step: 266010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:17:46,480-Speed 9383.09 samples/sec   Loss 4.1934   LearningRate 0.0041   Epoch: 15   Global Step: 266020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:17:47,603-Speed 9124.33 samples/sec   Loss 4.0943   LearningRate 0.0041   Epoch: 15   Global Step: 266030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:17:48,717-Speed 9199.94 samples/sec   Loss 4.1732   LearningRate 0.0041   Epoch: 15   Global Step: 266040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:17:49,786-Speed 9577.52 samples/sec   Loss 4.1551   LearningRate 0.0041   Epoch: 15   Global Step: 266050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:17:50,886-Speed 9314.41 samples/sec   Loss 4.1110   LearningRate 0.0041   Epoch: 15   Global Step: 266060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:17:52,003-Speed 9178.48 samples/sec   Loss 4.1208   LearningRate 0.0041   Epoch: 15   Global Step: 266070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:17:53,123-Speed 9151.23 samples/sec   Loss 4.1562   LearningRate 0.0041   Epoch: 15   Global Step: 266080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:17:54,230-Speed 9253.25 samples/sec   Loss 4.1668   LearningRate 0.0041   Epoch: 15   Global Step: 266090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:17:55,287-Speed 9696.28 samples/sec   Loss 4.1518   LearningRate 0.0041   Epoch: 15   Global Step: 266100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:17:56,405-Speed 9167.93 samples/sec   Loss 4.1353   LearningRate 0.0041   Epoch: 15   Global Step: 266110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:17:57,556-Speed 8897.74 samples/sec   Loss 4.0304   LearningRate 0.0041   Epoch: 15   Global Step: 266120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:17:58,652-Speed 9350.17 samples/sec   Loss 4.1835   LearningRate 0.0041   Epoch: 15   Global Step: 266130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:17:59,731-Speed 9490.74 samples/sec   Loss 4.0632   LearningRate 0.0041   Epoch: 15   Global Step: 266140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:00,800-Speed 9586.42 samples/sec   Loss 4.0399   LearningRate 0.0041   Epoch: 15   Global Step: 266150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:01,920-Speed 9146.52 samples/sec   Loss 4.1012   LearningRate 0.0041   Epoch: 15   Global Step: 266160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:03,033-Speed 9211.76 samples/sec   Loss 4.0853   LearningRate 0.0041   Epoch: 15   Global Step: 266170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:04,107-Speed 9541.28 samples/sec   Loss 3.9800   LearningRate 0.0041   Epoch: 15   Global Step: 266180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:05,174-Speed 9606.73 samples/sec   Loss 4.0734   LearningRate 0.0041   Epoch: 15   Global Step: 266190   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:18:06,264-Speed 9391.16 samples/sec   Loss 4.1504   LearningRate 0.0041   Epoch: 15   Global Step: 266200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:07,388-Speed 9115.73 samples/sec   Loss 4.1372   LearningRate 0.0041   Epoch: 15   Global Step: 266210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:08,525-Speed 9015.98 samples/sec   Loss 4.2733   LearningRate 0.0041   Epoch: 15   Global Step: 266220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:09,628-Speed 9293.61 samples/sec   Loss 4.0681   LearningRate 0.0041   Epoch: 15   Global Step: 266230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:10,718-Speed 9394.79 samples/sec   Loss 4.0755   LearningRate 0.0041   Epoch: 15   Global Step: 266240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:11,845-Speed 9091.22 samples/sec   Loss 4.0489   LearningRate 0.0041   Epoch: 15   Global Step: 266250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:12,984-Speed 8996.64 samples/sec   Loss 4.1521   LearningRate 0.0041   Epoch: 15   Global Step: 266260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:14,070-Speed 9437.67 samples/sec   Loss 4.2069   LearningRate 0.0041   Epoch: 15   Global Step: 266270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:15,197-Speed 9090.34 samples/sec   Loss 4.1976   LearningRate 0.0041   Epoch: 15   Global Step: 266280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:16,351-Speed 8882.34 samples/sec   Loss 4.0505   LearningRate 0.0041   Epoch: 15   Global Step: 266290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:17,440-Speed 9404.35 samples/sec   Loss 4.1648   LearningRate 0.0041   Epoch: 15   Global Step: 266300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:18,534-Speed 9370.29 samples/sec   Loss 4.1091   LearningRate 0.0041   Epoch: 15   Global Step: 266310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:19,636-Speed 9293.08 samples/sec   Loss 4.0986   LearningRate 0.0041   Epoch: 15   Global Step: 266320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:20,764-Speed 9084.26 samples/sec   Loss 4.1222   LearningRate 0.0041   Epoch: 15   Global Step: 266330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:21,897-Speed 9043.15 samples/sec   Loss 4.1613   LearningRate 0.0041   Epoch: 15   Global Step: 266340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:22,988-Speed 9393.87 samples/sec   Loss 4.1397   LearningRate 0.0041   Epoch: 15   Global Step: 266350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:24,081-Speed 9376.12 samples/sec   Loss 4.0905   LearningRate 0.0041   Epoch: 15   Global Step: 266360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:25,207-Speed 9099.24 samples/sec   Loss 4.0964   LearningRate 0.0041   Epoch: 15   Global Step: 266370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:26,340-Speed 9043.74 samples/sec   Loss 4.0510   LearningRate 0.0041   Epoch: 15   Global Step: 266380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:27,447-Speed 9256.09 samples/sec   Loss 4.1837   LearningRate 0.0041   Epoch: 15   Global Step: 266390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:28,507-Speed 9661.56 samples/sec   Loss 4.1417   LearningRate 0.0041   Epoch: 15   Global Step: 266400   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:18:29,595-Speed 9418.18 samples/sec   Loss 4.2412   LearningRate 0.0041   Epoch: 15   Global Step: 266410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:18:30,677-Speed 9469.04 samples/sec   Loss 4.1622   LearningRate 0.0041   Epoch: 15   Global Step: 266420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:18:31,776-Speed 9327.21 samples/sec   Loss 4.1399   LearningRate 0.0041   Epoch: 15   Global Step: 266430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:32,893-Speed 9171.71 samples/sec   Loss 4.0882   LearningRate 0.0041   Epoch: 15   Global Step: 266440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:34,001-Speed 9240.16 samples/sec   Loss 4.1376   LearningRate 0.0041   Epoch: 15   Global Step: 266450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:35,132-Speed 9063.72 samples/sec   Loss 4.0486   LearningRate 0.0041   Epoch: 15   Global Step: 266460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:36,265-Speed 9038.66 samples/sec   Loss 4.0978   LearningRate 0.0041   Epoch: 15   Global Step: 266470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:37,416-Speed 8908.99 samples/sec   Loss 4.1428   LearningRate 0.0041   Epoch: 15   Global Step: 266480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:38,505-Speed 9412.64 samples/sec   Loss 4.1427   LearningRate 0.0041   Epoch: 15   Global Step: 266490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:39,646-Speed 8973.88 samples/sec   Loss 4.1696   LearningRate 0.0041   Epoch: 15   Global Step: 266500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:40,760-Speed 9200.98 samples/sec   Loss 4.0982   LearningRate 0.0041   Epoch: 15   Global Step: 266510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:41,865-Speed 9270.93 samples/sec   Loss 4.2387   LearningRate 0.0041   Epoch: 15   Global Step: 266520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:42,946-Speed 9478.46 samples/sec   Loss 4.1320   LearningRate 0.0041   Epoch: 15   Global Step: 266530   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:18:44,049-Speed 9290.57 samples/sec   Loss 4.0544   LearningRate 0.0041   Epoch: 15   Global Step: 266540   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:18:45,136-Speed 9430.60 samples/sec   Loss 4.1210   LearningRate 0.0041   Epoch: 15   Global Step: 266550   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:18:46,222-Speed 9430.48 samples/sec   Loss 4.1263   LearningRate 0.0041   Epoch: 15   Global Step: 266560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:18:47,339-Speed 9175.51 samples/sec   Loss 4.1886   LearningRate 0.0041   Epoch: 15   Global Step: 266570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:48,470-Speed 9061.20 samples/sec   Loss 4.1411   LearningRate 0.0041   Epoch: 15   Global Step: 266580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:49,618-Speed 8924.28 samples/sec   Loss 4.1744   LearningRate 0.0041   Epoch: 15   Global Step: 266590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:50,728-Speed 9228.46 samples/sec   Loss 4.1087   LearningRate 0.0041   Epoch: 15   Global Step: 266600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:51,881-Speed 8886.66 samples/sec   Loss 4.0789   LearningRate 0.0041   Epoch: 15   Global Step: 266610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:53,003-Speed 9128.28 samples/sec   Loss 4.2258   LearningRate 0.0041   Epoch: 15   Global Step: 266620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:54,203-Speed 8533.86 samples/sec   Loss 4.1610   LearningRate 0.0041   Epoch: 15   Global Step: 266630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:55,340-Speed 9015.92 samples/sec   Loss 4.1930   LearningRate 0.0041   Epoch: 15   Global Step: 266640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:56,403-Speed 9639.04 samples/sec   Loss 4.1807   LearningRate 0.0040   Epoch: 15   Global Step: 266650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:57,511-Speed 9253.62 samples/sec   Loss 4.1656   LearningRate 0.0040   Epoch: 15   Global Step: 266660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:18:58,559-Speed 9771.84 samples/sec   Loss 4.1813   LearningRate 0.0040   Epoch: 15   Global Step: 266670   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:18:59,669-Speed 9234.10 samples/sec   Loss 4.1714   LearningRate 0.0040   Epoch: 15   Global Step: 266680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:00,804-Speed 9027.01 samples/sec   Loss 4.1796   LearningRate 0.0040   Epoch: 15   Global Step: 266690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:01,897-Speed 9372.98 samples/sec   Loss 4.1464   LearningRate 0.0040   Epoch: 15   Global Step: 266700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:03,040-Speed 8968.22 samples/sec   Loss 4.0767   LearningRate 0.0040   Epoch: 15   Global Step: 266710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:04,153-Speed 9202.52 samples/sec   Loss 4.1898   LearningRate 0.0040   Epoch: 15   Global Step: 266720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:05,265-Speed 9216.53 samples/sec   Loss 4.1904   LearningRate 0.0040   Epoch: 15   Global Step: 266730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:06,351-Speed 9432.95 samples/sec   Loss 4.1095   LearningRate 0.0040   Epoch: 15   Global Step: 266740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:07,420-Speed 9582.20 samples/sec   Loss 4.0918   LearningRate 0.0040   Epoch: 15   Global Step: 266750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:08,527-Speed 9254.80 samples/sec   Loss 4.1245   LearningRate 0.0040   Epoch: 15   Global Step: 266760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:09,617-Speed 9398.77 samples/sec   Loss 4.1662   LearningRate 0.0040   Epoch: 15   Global Step: 266770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:10,721-Speed 9286.22 samples/sec   Loss 4.2179   LearningRate 0.0040   Epoch: 15   Global Step: 266780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:19:11,784-Speed 9639.07 samples/sec   Loss 4.2051   LearningRate 0.0040   Epoch: 15   Global Step: 266790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:12,900-Speed 9178.19 samples/sec   Loss 4.1484   LearningRate 0.0040   Epoch: 15   Global Step: 266800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:14,034-Speed 9041.06 samples/sec   Loss 4.1254   LearningRate 0.0040   Epoch: 15   Global Step: 266810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:15,134-Speed 9311.05 samples/sec   Loss 4.1321   LearningRate 0.0040   Epoch: 15   Global Step: 266820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:16,303-Speed 8763.74 samples/sec   Loss 4.1853   LearningRate 0.0040   Epoch: 15   Global Step: 266830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:17,449-Speed 8938.60 samples/sec   Loss 4.1861   LearningRate 0.0040   Epoch: 15   Global Step: 266840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:18,584-Speed 9028.03 samples/sec   Loss 4.1086   LearningRate 0.0040   Epoch: 15   Global Step: 266850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:19,651-Speed 9606.68 samples/sec   Loss 4.1343   LearningRate 0.0040   Epoch: 15   Global Step: 266860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:20,783-Speed 9058.20 samples/sec   Loss 4.1344   LearningRate 0.0040   Epoch: 15   Global Step: 266870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:21,877-Speed 9359.89 samples/sec   Loss 4.1072   LearningRate 0.0040   Epoch: 15   Global Step: 266880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:22,960-Speed 9469.98 samples/sec   Loss 4.1193   LearningRate 0.0040   Epoch: 15   Global Step: 266890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:24,050-Speed 9398.67 samples/sec   Loss 4.1791   LearningRate 0.0040   Epoch: 15   Global Step: 266900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:25,121-Speed 9567.22 samples/sec   Loss 4.1474   LearningRate 0.0040   Epoch: 15   Global Step: 266910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:26,193-Speed 9556.71 samples/sec   Loss 4.1073   LearningRate 0.0040   Epoch: 15   Global Step: 266920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:27,306-Speed 9198.44 samples/sec   Loss 4.1398   LearningRate 0.0040   Epoch: 15   Global Step: 266930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:28,474-Speed 8778.25 samples/sec   Loss 4.1389   LearningRate 0.0040   Epoch: 15   Global Step: 266940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:29,564-Speed 9398.91 samples/sec   Loss 4.2253   LearningRate 0.0040   Epoch: 15   Global Step: 266950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:30,644-Speed 9484.75 samples/sec   Loss 4.0878   LearningRate 0.0040   Epoch: 15   Global Step: 266960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:31,754-Speed 9229.90 samples/sec   Loss 4.1675   LearningRate 0.0040   Epoch: 15   Global Step: 266970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:32,854-Speed 9312.76 samples/sec   Loss 4.0989   LearningRate 0.0040   Epoch: 15   Global Step: 266980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:33,994-Speed 8989.46 samples/sec   Loss 4.1621   LearningRate 0.0040   Epoch: 15   Global Step: 266990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:35,084-Speed 9401.00 samples/sec   Loss 4.1039   LearningRate 0.0040   Epoch: 15   Global Step: 267000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:36,231-Speed 8931.70 samples/sec   Loss 4.1194   LearningRate 0.0040   Epoch: 15   Global Step: 267010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:37,322-Speed 9398.22 samples/sec   Loss 4.1317   LearningRate 0.0040   Epoch: 15   Global Step: 267020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:38,423-Speed 9303.55 samples/sec   Loss 4.0983   LearningRate 0.0040   Epoch: 15   Global Step: 267030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:39,538-Speed 9187.87 samples/sec   Loss 4.0749   LearningRate 0.0040   Epoch: 15   Global Step: 267040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:19:40,892-Speed 7566.51 samples/sec   Loss 4.1784   LearningRate 0.0040   Epoch: 15   Global Step: 267050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:10,659-Speed 344.02 samples/sec   Loss 3.9493   LearningRate 0.0040   Epoch: 16   Global Step: 267060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:12,294-Speed 6267.48 samples/sec   Loss 3.6054   LearningRate 0.0040   Epoch: 16   Global Step: 267070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:13,823-Speed 6702.70 samples/sec   Loss 3.5812   LearningRate 0.0040   Epoch: 16   Global Step: 267080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:14,907-Speed 9445.91 samples/sec   Loss 3.6582   LearningRate 0.0040   Epoch: 16   Global Step: 267090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:20:15,999-Speed 9382.66 samples/sec   Loss 3.6699   LearningRate 0.0040   Epoch: 16   Global Step: 267100   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:20:17,325-Speed 7727.40 samples/sec   Loss 3.6341   LearningRate 0.0040   Epoch: 16   Global Step: 267110   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:20:18,840-Speed 6765.19 samples/sec   Loss 3.6860   LearningRate 0.0040   Epoch: 16   Global Step: 267120   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:20:19,975-Speed 9023.27 samples/sec   Loss 3.6697   LearningRate 0.0040   Epoch: 16   Global Step: 267130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:21,081-Speed 9264.68 samples/sec   Loss 3.5985   LearningRate 0.0040   Epoch: 16   Global Step: 267140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:22,225-Speed 8958.41 samples/sec   Loss 3.6188   LearningRate 0.0040   Epoch: 16   Global Step: 267150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:23,374-Speed 8919.89 samples/sec   Loss 3.5744   LearningRate 0.0040   Epoch: 16   Global Step: 267160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:24,524-Speed 8910.78 samples/sec   Loss 3.6237   LearningRate 0.0040   Epoch: 16   Global Step: 267170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:25,657-Speed 9039.72 samples/sec   Loss 3.6814   LearningRate 0.0040   Epoch: 16   Global Step: 267180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:26,788-Speed 9063.32 samples/sec   Loss 3.6738   LearningRate 0.0040   Epoch: 16   Global Step: 267190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:27,848-Speed 9665.14 samples/sec   Loss 3.6188   LearningRate 0.0040   Epoch: 16   Global Step: 267200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:28,948-Speed 9312.16 samples/sec   Loss 3.5732   LearningRate 0.0040   Epoch: 16   Global Step: 267210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:30,056-Speed 9246.68 samples/sec   Loss 3.5086   LearningRate 0.0040   Epoch: 16   Global Step: 267220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:31,134-Speed 9512.04 samples/sec   Loss 3.6493   LearningRate 0.0040   Epoch: 16   Global Step: 267230   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:20:32,217-Speed 9460.98 samples/sec   Loss 3.5087   LearningRate 0.0040   Epoch: 16   Global Step: 267240   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:20:33,334-Speed 9167.45 samples/sec   Loss 3.6728   LearningRate 0.0040   Epoch: 16   Global Step: 267250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:34,404-Speed 9575.39 samples/sec   Loss 3.6031   LearningRate 0.0040   Epoch: 16   Global Step: 267260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:35,503-Speed 9326.74 samples/sec   Loss 3.5924   LearningRate 0.0040   Epoch: 16   Global Step: 267270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:36,570-Speed 9602.34 samples/sec   Loss 3.6689   LearningRate 0.0040   Epoch: 16   Global Step: 267280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:37,632-Speed 9649.59 samples/sec   Loss 3.5509   LearningRate 0.0040   Epoch: 16   Global Step: 267290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:38,679-Speed 9780.07 samples/sec   Loss 3.6419   LearningRate 0.0040   Epoch: 16   Global Step: 267300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:39,841-Speed 8817.90 samples/sec   Loss 3.6128   LearningRate 0.0040   Epoch: 16   Global Step: 267310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:40,971-Speed 9070.01 samples/sec   Loss 3.6122   LearningRate 0.0040   Epoch: 16   Global Step: 267320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:42,050-Speed 9494.45 samples/sec   Loss 3.6645   LearningRate 0.0040   Epoch: 16   Global Step: 267330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:43,200-Speed 8910.72 samples/sec   Loss 3.6698   LearningRate 0.0040   Epoch: 16   Global Step: 267340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:44,270-Speed 9577.56 samples/sec   Loss 3.6970   LearningRate 0.0040   Epoch: 16   Global Step: 267350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:20:45,344-Speed 9534.87 samples/sec   Loss 3.5769   LearningRate 0.0040   Epoch: 16   Global Step: 267360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:46,460-Speed 9185.79 samples/sec   Loss 3.6390   LearningRate 0.0040   Epoch: 16   Global Step: 267370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:47,508-Speed 9771.21 samples/sec   Loss 3.5766   LearningRate 0.0040   Epoch: 16   Global Step: 267380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:48,597-Speed 9446.54 samples/sec   Loss 3.5938   LearningRate 0.0040   Epoch: 16   Global Step: 267390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:49,882-Speed 7976.92 samples/sec   Loss 3.6429   LearningRate 0.0040   Epoch: 16   Global Step: 267400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:50,953-Speed 9569.94 samples/sec   Loss 3.7269   LearningRate 0.0040   Epoch: 16   Global Step: 267410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:52,200-Speed 8210.34 samples/sec   Loss 3.7118   LearningRate 0.0040   Epoch: 16   Global Step: 267420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:53,465-Speed 8101.95 samples/sec   Loss 3.7172   LearningRate 0.0040   Epoch: 16   Global Step: 267430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:54,703-Speed 8275.04 samples/sec   Loss 3.6109   LearningRate 0.0040   Epoch: 16   Global Step: 267440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:55,964-Speed 8130.22 samples/sec   Loss 3.5983   LearningRate 0.0040   Epoch: 16   Global Step: 267450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:57,092-Speed 9082.85 samples/sec   Loss 3.5538   LearningRate 0.0040   Epoch: 16   Global Step: 267460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:58,342-Speed 8196.41 samples/sec   Loss 3.5540   LearningRate 0.0040   Epoch: 16   Global Step: 267470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:20:59,442-Speed 9317.45 samples/sec   Loss 3.5814   LearningRate 0.0039   Epoch: 16   Global Step: 267480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:00,694-Speed 8176.92 samples/sec   Loss 3.6305   LearningRate 0.0039   Epoch: 16   Global Step: 267490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:01,778-Speed 9452.32 samples/sec   Loss 3.6871   LearningRate 0.0039   Epoch: 16   Global Step: 267500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:02,843-Speed 9628.64 samples/sec   Loss 3.7257   LearningRate 0.0039   Epoch: 16   Global Step: 267510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:04,003-Speed 8826.23 samples/sec   Loss 3.6128   LearningRate 0.0039   Epoch: 16   Global Step: 267520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:05,087-Speed 9451.15 samples/sec   Loss 3.6883   LearningRate 0.0039   Epoch: 16   Global Step: 267530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:06,225-Speed 9003.27 samples/sec   Loss 3.7533   LearningRate 0.0039   Epoch: 16   Global Step: 267540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:07,312-Speed 9427.74 samples/sec   Loss 3.6707   LearningRate 0.0039   Epoch: 16   Global Step: 267550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:08,364-Speed 9748.96 samples/sec   Loss 3.6257   LearningRate 0.0039   Epoch: 16   Global Step: 267560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:09,486-Speed 9128.66 samples/sec   Loss 3.5473   LearningRate 0.0039   Epoch: 16   Global Step: 267570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:10,586-Speed 9316.52 samples/sec   Loss 3.5770   LearningRate 0.0039   Epoch: 16   Global Step: 267580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:11,745-Speed 8843.17 samples/sec   Loss 3.6495   LearningRate 0.0039   Epoch: 16   Global Step: 267590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:12,859-Speed 9198.83 samples/sec   Loss 3.6373   LearningRate 0.0039   Epoch: 16   Global Step: 267600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:13,959-Speed 9314.89 samples/sec   Loss 3.6672   LearningRate 0.0039   Epoch: 16   Global Step: 267610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:15,031-Speed 9554.42 samples/sec   Loss 3.6610   LearningRate 0.0039   Epoch: 16   Global Step: 267620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:16,118-Speed 9424.58 samples/sec   Loss 3.7166   LearningRate 0.0039   Epoch: 16   Global Step: 267630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:17,219-Speed 9305.26 samples/sec   Loss 3.7498   LearningRate 0.0039   Epoch: 16   Global Step: 267640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:18,310-Speed 9393.60 samples/sec   Loss 3.6745   LearningRate 0.0039   Epoch: 16   Global Step: 267650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:21:19,421-Speed 9223.23 samples/sec   Loss 3.6788   LearningRate 0.0039   Epoch: 16   Global Step: 267660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:21:20,542-Speed 9136.06 samples/sec   Loss 3.6986   LearningRate 0.0039   Epoch: 16   Global Step: 267670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:21:21,630-Speed 9417.51 samples/sec   Loss 3.6936   LearningRate 0.0039   Epoch: 16   Global Step: 267680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:21:22,741-Speed 9227.28 samples/sec   Loss 3.6708   LearningRate 0.0039   Epoch: 16   Global Step: 267690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:21:23,825-Speed 9451.89 samples/sec   Loss 3.6503   LearningRate 0.0039   Epoch: 16   Global Step: 267700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:21:24,918-Speed 9372.63 samples/sec   Loss 3.5940   LearningRate 0.0039   Epoch: 16   Global Step: 267710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:21:25,994-Speed 9526.72 samples/sec   Loss 3.6352   LearningRate 0.0039   Epoch: 16   Global Step: 267720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:21:27,109-Speed 9182.33 samples/sec   Loss 3.6705   LearningRate 0.0039   Epoch: 16   Global Step: 267730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:21:28,206-Speed 9343.51 samples/sec   Loss 3.5971   LearningRate 0.0039   Epoch: 16   Global Step: 267740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:21:29,318-Speed 9216.04 samples/sec   Loss 3.6193   LearningRate 0.0039   Epoch: 16   Global Step: 267750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:30,426-Speed 9246.97 samples/sec   Loss 3.7624   LearningRate 0.0039   Epoch: 16   Global Step: 267760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:31,490-Speed 9626.87 samples/sec   Loss 3.5838   LearningRate 0.0039   Epoch: 16   Global Step: 267770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:32,583-Speed 9375.08 samples/sec   Loss 3.6306   LearningRate 0.0039   Epoch: 16   Global Step: 267780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:33,685-Speed 9297.74 samples/sec   Loss 3.6230   LearningRate 0.0039   Epoch: 16   Global Step: 267790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:34,834-Speed 8934.17 samples/sec   Loss 3.7017   LearningRate 0.0039   Epoch: 16   Global Step: 267800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:35,887-Speed 9734.29 samples/sec   Loss 3.7104   LearningRate 0.0039   Epoch: 16   Global Step: 267810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:37,002-Speed 9193.69 samples/sec   Loss 3.6720   LearningRate 0.0039   Epoch: 16   Global Step: 267820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:38,117-Speed 9188.84 samples/sec   Loss 3.7567   LearningRate 0.0039   Epoch: 16   Global Step: 267830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:39,279-Speed 8811.48 samples/sec   Loss 3.7040   LearningRate 0.0039   Epoch: 16   Global Step: 267840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:40,384-Speed 9276.89 samples/sec   Loss 3.6629   LearningRate 0.0039   Epoch: 16   Global Step: 267850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:41,460-Speed 9523.40 samples/sec   Loss 3.7161   LearningRate 0.0039   Epoch: 16   Global Step: 267860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:42,553-Speed 9374.18 samples/sec   Loss 3.6981   LearningRate 0.0039   Epoch: 16   Global Step: 267870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:43,639-Speed 9426.57 samples/sec   Loss 3.6026   LearningRate 0.0039   Epoch: 16   Global Step: 267880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:44,729-Speed 9398.93 samples/sec   Loss 3.5815   LearningRate 0.0039   Epoch: 16   Global Step: 267890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:45,878-Speed 8919.77 samples/sec   Loss 3.5763   LearningRate 0.0039   Epoch: 16   Global Step: 267900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:46,955-Speed 9517.95 samples/sec   Loss 3.6471   LearningRate 0.0039   Epoch: 16   Global Step: 267910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:48,046-Speed 9390.06 samples/sec   Loss 3.6993   LearningRate 0.0039   Epoch: 16   Global Step: 267920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:49,151-Speed 9273.47 samples/sec   Loss 3.6521   LearningRate 0.0039   Epoch: 16   Global Step: 267930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:50,271-Speed 9149.45 samples/sec   Loss 3.6199   LearningRate 0.0039   Epoch: 16   Global Step: 267940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:51,356-Speed 9445.83 samples/sec   Loss 3.5820   LearningRate 0.0039   Epoch: 16   Global Step: 267950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:21:52,425-Speed 9583.67 samples/sec   Loss 3.6767   LearningRate 0.0039   Epoch: 16   Global Step: 267960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:53,509-Speed 9452.07 samples/sec   Loss 3.7047   LearningRate 0.0039   Epoch: 16   Global Step: 267970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:54,612-Speed 9289.68 samples/sec   Loss 3.6309   LearningRate 0.0039   Epoch: 16   Global Step: 267980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:55,762-Speed 8910.83 samples/sec   Loss 3.6498   LearningRate 0.0039   Epoch: 16   Global Step: 267990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:21:56,851-Speed 9411.03 samples/sec   Loss 3.6609   LearningRate 0.0039   Epoch: 16   Global Step: 268000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:22:18,703-[lfw][268000]XNorm: 6.989682
Training: 2022-04-11 22:22:18,704-[lfw][268000]Accuracy-Flip: 0.99617+-0.00289
Training: 2022-04-11 22:22:18,705-[lfw][268000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:22:43,936-[cfp_fp][268000]XNorm: 6.071566
Training: 2022-04-11 22:22:43,937-[cfp_fp][268000]Accuracy-Flip: 0.96886+-0.00755
Training: 2022-04-11 22:22:43,937-[cfp_fp][268000]Accuracy-Highest: 0.97171
Training: 2022-04-11 22:23:05,989-[agedb_30][268000]XNorm: 6.805370
Training: 2022-04-11 22:23:05,990-[agedb_30][268000]Accuracy-Flip: 0.97033+-0.00939
Training: 2022-04-11 22:23:05,990-[agedb_30][268000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:23:07,085-Speed 145.80 samples/sec   Loss 3.7482   LearningRate 0.0039   Epoch: 16   Global Step: 268010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:08,149-Speed 9630.71 samples/sec   Loss 3.6823   LearningRate 0.0039   Epoch: 16   Global Step: 268020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:09,273-Speed 9117.04 samples/sec   Loss 3.7584   LearningRate 0.0039   Epoch: 16   Global Step: 268030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:10,371-Speed 9328.94 samples/sec   Loss 3.6488   LearningRate 0.0039   Epoch: 16   Global Step: 268040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:11,438-Speed 9608.43 samples/sec   Loss 3.7521   LearningRate 0.0039   Epoch: 16   Global Step: 268050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:12,498-Speed 9661.46 samples/sec   Loss 3.6556   LearningRate 0.0039   Epoch: 16   Global Step: 268060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:13,620-Speed 9134.87 samples/sec   Loss 3.7516   LearningRate 0.0039   Epoch: 16   Global Step: 268070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:14,716-Speed 9348.28 samples/sec   Loss 3.6733   LearningRate 0.0039   Epoch: 16   Global Step: 268080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:15,804-Speed 9421.37 samples/sec   Loss 3.6169   LearningRate 0.0039   Epoch: 16   Global Step: 268090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:16,936-Speed 9047.90 samples/sec   Loss 3.7066   LearningRate 0.0039   Epoch: 16   Global Step: 268100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:18,046-Speed 9231.22 samples/sec   Loss 3.6899   LearningRate 0.0039   Epoch: 16   Global Step: 268110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:19,237-Speed 8606.21 samples/sec   Loss 3.6690   LearningRate 0.0039   Epoch: 16   Global Step: 268120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:20,379-Speed 8971.52 samples/sec   Loss 3.6637   LearningRate 0.0039   Epoch: 16   Global Step: 268130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:21,504-Speed 9104.89 samples/sec   Loss 3.6395   LearningRate 0.0039   Epoch: 16   Global Step: 268140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:22,623-Speed 9152.32 samples/sec   Loss 3.5817   LearningRate 0.0039   Epoch: 16   Global Step: 268150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:23,723-Speed 9321.78 samples/sec   Loss 3.6937   LearningRate 0.0039   Epoch: 16   Global Step: 268160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:24,834-Speed 9220.61 samples/sec   Loss 3.7418   LearningRate 0.0039   Epoch: 16   Global Step: 268170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:25,946-Speed 9218.03 samples/sec   Loss 3.6292   LearningRate 0.0039   Epoch: 16   Global Step: 268180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:27,024-Speed 9500.30 samples/sec   Loss 3.6293   LearningRate 0.0039   Epoch: 16   Global Step: 268190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:28,134-Speed 9228.63 samples/sec   Loss 3.6014   LearningRate 0.0039   Epoch: 16   Global Step: 268200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:29,220-Speed 9441.03 samples/sec   Loss 3.6575   LearningRate 0.0039   Epoch: 16   Global Step: 268210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:30,307-Speed 9427.99 samples/sec   Loss 3.7083   LearningRate 0.0039   Epoch: 16   Global Step: 268220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:31,407-Speed 9310.80 samples/sec   Loss 3.6235   LearningRate 0.0039   Epoch: 16   Global Step: 268230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:32,519-Speed 9213.24 samples/sec   Loss 3.6961   LearningRate 0.0039   Epoch: 16   Global Step: 268240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:33,662-Speed 8963.54 samples/sec   Loss 3.6503   LearningRate 0.0039   Epoch: 16   Global Step: 268250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:34,797-Speed 9031.85 samples/sec   Loss 3.7624   LearningRate 0.0039   Epoch: 16   Global Step: 268260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:35,884-Speed 9424.06 samples/sec   Loss 3.6829   LearningRate 0.0039   Epoch: 16   Global Step: 268270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:36,939-Speed 9716.07 samples/sec   Loss 3.6919   LearningRate 0.0039   Epoch: 16   Global Step: 268280   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:38,071-Speed 9049.39 samples/sec   Loss 3.6502   LearningRate 0.0039   Epoch: 16   Global Step: 268290   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:39,223-Speed 8892.55 samples/sec   Loss 3.7225   LearningRate 0.0039   Epoch: 16   Global Step: 268300   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:40,329-Speed 9266.24 samples/sec   Loss 3.7417   LearningRate 0.0039   Epoch: 16   Global Step: 268310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:41,434-Speed 9282.90 samples/sec   Loss 3.7295   LearningRate 0.0038   Epoch: 16   Global Step: 268320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:42,571-Speed 9010.29 samples/sec   Loss 3.6087   LearningRate 0.0038   Epoch: 16   Global Step: 268330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:43,705-Speed 9039.20 samples/sec   Loss 3.6676   LearningRate 0.0038   Epoch: 16   Global Step: 268340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:44,752-Speed 9777.99 samples/sec   Loss 3.7352   LearningRate 0.0038   Epoch: 16   Global Step: 268350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:45,882-Speed 9073.17 samples/sec   Loss 3.7195   LearningRate 0.0038   Epoch: 16   Global Step: 268360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:46,987-Speed 9268.26 samples/sec   Loss 3.7029   LearningRate 0.0038   Epoch: 16   Global Step: 268370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:48,125-Speed 9003.02 samples/sec   Loss 3.6104   LearningRate 0.0038   Epoch: 16   Global Step: 268380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:49,291-Speed 8789.61 samples/sec   Loss 3.6707   LearningRate 0.0038   Epoch: 16   Global Step: 268390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:50,396-Speed 9274.35 samples/sec   Loss 3.6793   LearningRate 0.0038   Epoch: 16   Global Step: 268400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:51,472-Speed 9520.66 samples/sec   Loss 3.6299   LearningRate 0.0038   Epoch: 16   Global Step: 268410   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:52,536-Speed 9633.03 samples/sec   Loss 3.6080   LearningRate 0.0038   Epoch: 16   Global Step: 268420   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:53,632-Speed 9353.76 samples/sec   Loss 3.6706   LearningRate 0.0038   Epoch: 16   Global Step: 268430   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:54,673-Speed 9844.96 samples/sec   Loss 3.6445   LearningRate 0.0038   Epoch: 16   Global Step: 268440   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:55,786-Speed 9202.57 samples/sec   Loss 3.6506   LearningRate 0.0038   Epoch: 16   Global Step: 268450   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:56,943-Speed 8852.50 samples/sec   Loss 3.7181   LearningRate 0.0038   Epoch: 16   Global Step: 268460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:23:58,018-Speed 9536.59 samples/sec   Loss 3.6886   LearningRate 0.0038   Epoch: 16   Global Step: 268470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:23:59,110-Speed 9382.29 samples/sec   Loss 3.6856   LearningRate 0.0038   Epoch: 16   Global Step: 268480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:00,218-Speed 9248.56 samples/sec   Loss 3.7411   LearningRate 0.0038   Epoch: 16   Global Step: 268490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:01,352-Speed 9031.40 samples/sec   Loss 3.6548   LearningRate 0.0038   Epoch: 16   Global Step: 268500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:02,459-Speed 9255.97 samples/sec   Loss 3.7071   LearningRate 0.0038   Epoch: 16   Global Step: 268510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:03,583-Speed 9113.89 samples/sec   Loss 3.6546   LearningRate 0.0038   Epoch: 16   Global Step: 268520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:04,739-Speed 8864.71 samples/sec   Loss 3.7191   LearningRate 0.0038   Epoch: 16   Global Step: 268530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:05,848-Speed 9239.67 samples/sec   Loss 3.6957   LearningRate 0.0038   Epoch: 16   Global Step: 268540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:06,957-Speed 9237.05 samples/sec   Loss 3.6626   LearningRate 0.0038   Epoch: 16   Global Step: 268550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:08,041-Speed 9454.98 samples/sec   Loss 3.6201   LearningRate 0.0038   Epoch: 16   Global Step: 268560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:09,151-Speed 9225.90 samples/sec   Loss 3.6823   LearningRate 0.0038   Epoch: 16   Global Step: 268570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:24:10,252-Speed 9302.64 samples/sec   Loss 3.6582   LearningRate 0.0038   Epoch: 16   Global Step: 268580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:24:11,370-Speed 9170.06 samples/sec   Loss 3.6976   LearningRate 0.0038   Epoch: 16   Global Step: 268590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:12,439-Speed 9591.58 samples/sec   Loss 3.7354   LearningRate 0.0038   Epoch: 16   Global Step: 268600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:13,563-Speed 9116.30 samples/sec   Loss 3.6796   LearningRate 0.0038   Epoch: 16   Global Step: 268610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:14,660-Speed 9338.28 samples/sec   Loss 3.6723   LearningRate 0.0038   Epoch: 16   Global Step: 268620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:15,747-Speed 9428.12 samples/sec   Loss 3.6752   LearningRate 0.0038   Epoch: 16   Global Step: 268630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:16,869-Speed 9126.11 samples/sec   Loss 3.6714   LearningRate 0.0038   Epoch: 16   Global Step: 268640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:18,016-Speed 8931.48 samples/sec   Loss 3.7143   LearningRate 0.0038   Epoch: 16   Global Step: 268650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:19,157-Speed 8983.18 samples/sec   Loss 3.7587   LearningRate 0.0038   Epoch: 16   Global Step: 268660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:20,264-Speed 9250.93 samples/sec   Loss 3.6930   LearningRate 0.0038   Epoch: 16   Global Step: 268670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:21,327-Speed 9640.23 samples/sec   Loss 3.6962   LearningRate 0.0038   Epoch: 16   Global Step: 268680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:22,431-Speed 9284.86 samples/sec   Loss 3.7732   LearningRate 0.0038   Epoch: 16   Global Step: 268690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:24:23,557-Speed 9098.50 samples/sec   Loss 3.6586   LearningRate 0.0038   Epoch: 16   Global Step: 268700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:24,658-Speed 9312.89 samples/sec   Loss 3.7049   LearningRate 0.0038   Epoch: 16   Global Step: 268710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:25,753-Speed 9359.29 samples/sec   Loss 3.6657   LearningRate 0.0038   Epoch: 16   Global Step: 268720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:26,831-Speed 9499.69 samples/sec   Loss 3.6659   LearningRate 0.0038   Epoch: 16   Global Step: 268730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:27,897-Speed 9609.58 samples/sec   Loss 3.6584   LearningRate 0.0038   Epoch: 16   Global Step: 268740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:28,964-Speed 9613.24 samples/sec   Loss 3.6667   LearningRate 0.0038   Epoch: 16   Global Step: 268750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:30,087-Speed 9123.38 samples/sec   Loss 3.6711   LearningRate 0.0038   Epoch: 16   Global Step: 268760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:31,162-Speed 9532.63 samples/sec   Loss 3.7220   LearningRate 0.0038   Epoch: 16   Global Step: 268770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:32,296-Speed 9037.07 samples/sec   Loss 3.7496   LearningRate 0.0038   Epoch: 16   Global Step: 268780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:33,444-Speed 8922.84 samples/sec   Loss 3.7102   LearningRate 0.0038   Epoch: 16   Global Step: 268790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:34,555-Speed 9225.06 samples/sec   Loss 3.7154   LearningRate 0.0038   Epoch: 16   Global Step: 268800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:24:35,635-Speed 9481.15 samples/sec   Loss 3.6738   LearningRate 0.0038   Epoch: 16   Global Step: 268810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:36,719-Speed 9455.22 samples/sec   Loss 3.6787   LearningRate 0.0038   Epoch: 16   Global Step: 268820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:37,827-Speed 9240.31 samples/sec   Loss 3.6304   LearningRate 0.0038   Epoch: 16   Global Step: 268830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:38,904-Speed 9517.99 samples/sec   Loss 3.7307   LearningRate 0.0038   Epoch: 16   Global Step: 268840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:39,982-Speed 9501.43 samples/sec   Loss 3.7222   LearningRate 0.0038   Epoch: 16   Global Step: 268850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:41,132-Speed 8912.47 samples/sec   Loss 3.7487   LearningRate 0.0038   Epoch: 16   Global Step: 268860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:42,240-Speed 9247.58 samples/sec   Loss 3.7127   LearningRate 0.0038   Epoch: 16   Global Step: 268870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:43,361-Speed 9143.75 samples/sec   Loss 3.8183   LearningRate 0.0038   Epoch: 16   Global Step: 268880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:44,489-Speed 9082.50 samples/sec   Loss 3.7010   LearningRate 0.0038   Epoch: 16   Global Step: 268890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:45,554-Speed 9618.39 samples/sec   Loss 3.6774   LearningRate 0.0038   Epoch: 16   Global Step: 268900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:46,631-Speed 9520.35 samples/sec   Loss 3.7305   LearningRate 0.0038   Epoch: 16   Global Step: 268910   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:24:47,738-Speed 9248.58 samples/sec   Loss 3.6867   LearningRate 0.0038   Epoch: 16   Global Step: 268920   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:24:48,817-Speed 9501.49 samples/sec   Loss 3.7083   LearningRate 0.0038   Epoch: 16   Global Step: 268930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:24:49,887-Speed 9569.21 samples/sec   Loss 3.6518   LearningRate 0.0038   Epoch: 16   Global Step: 268940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:50,999-Speed 9218.82 samples/sec   Loss 3.7461   LearningRate 0.0038   Epoch: 16   Global Step: 268950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:52,106-Speed 9250.69 samples/sec   Loss 3.6462   LearningRate 0.0038   Epoch: 16   Global Step: 268960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:53,169-Speed 9641.08 samples/sec   Loss 3.6521   LearningRate 0.0038   Epoch: 16   Global Step: 268970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:54,235-Speed 9619.43 samples/sec   Loss 3.6941   LearningRate 0.0038   Epoch: 16   Global Step: 268980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:55,303-Speed 9584.64 samples/sec   Loss 3.7565   LearningRate 0.0038   Epoch: 16   Global Step: 268990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:56,405-Speed 9301.81 samples/sec   Loss 3.7141   LearningRate 0.0038   Epoch: 16   Global Step: 269000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:57,513-Speed 9246.59 samples/sec   Loss 3.7609   LearningRate 0.0038   Epoch: 16   Global Step: 269010   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:58,641-Speed 9091.04 samples/sec   Loss 3.7159   LearningRate 0.0038   Epoch: 16   Global Step: 269020   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:24:59,808-Speed 8777.80 samples/sec   Loss 3.6781   LearningRate 0.0038   Epoch: 16   Global Step: 269030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:00,898-Speed 9393.57 samples/sec   Loss 3.6971   LearningRate 0.0038   Epoch: 16   Global Step: 269040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:25:01,998-Speed 9318.92 samples/sec   Loss 3.6452   LearningRate 0.0038   Epoch: 16   Global Step: 269050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:03,166-Speed 8773.73 samples/sec   Loss 3.8042   LearningRate 0.0038   Epoch: 16   Global Step: 269060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:04,258-Speed 9375.69 samples/sec   Loss 3.6688   LearningRate 0.0038   Epoch: 16   Global Step: 269070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:05,350-Speed 9388.65 samples/sec   Loss 3.7647   LearningRate 0.0038   Epoch: 16   Global Step: 269080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:06,456-Speed 9270.18 samples/sec   Loss 3.7349   LearningRate 0.0038   Epoch: 16   Global Step: 269090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:07,563-Speed 9257.88 samples/sec   Loss 3.7298   LearningRate 0.0038   Epoch: 16   Global Step: 269100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:08,659-Speed 9345.65 samples/sec   Loss 3.6841   LearningRate 0.0038   Epoch: 16   Global Step: 269110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:09,790-Speed 9055.86 samples/sec   Loss 3.7431   LearningRate 0.0038   Epoch: 16   Global Step: 269120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:10,864-Speed 9540.90 samples/sec   Loss 3.6871   LearningRate 0.0038   Epoch: 16   Global Step: 269130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:11,951-Speed 9423.11 samples/sec   Loss 3.7416   LearningRate 0.0038   Epoch: 16   Global Step: 269140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:13,009-Speed 9689.01 samples/sec   Loss 3.6720   LearningRate 0.0038   Epoch: 16   Global Step: 269150   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:25:14,078-Speed 9582.48 samples/sec   Loss 3.7294   LearningRate 0.0038   Epoch: 16   Global Step: 269160   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:25:15,138-Speed 9664.54 samples/sec   Loss 3.7633   LearningRate 0.0038   Epoch: 16   Global Step: 269170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:16,222-Speed 9457.56 samples/sec   Loss 3.7472   LearningRate 0.0037   Epoch: 16   Global Step: 269180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:17,323-Speed 9305.15 samples/sec   Loss 3.7140   LearningRate 0.0037   Epoch: 16   Global Step: 269190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:18,416-Speed 9378.35 samples/sec   Loss 3.8256   LearningRate 0.0037   Epoch: 16   Global Step: 269200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:19,528-Speed 9212.31 samples/sec   Loss 3.6983   LearningRate 0.0037   Epoch: 16   Global Step: 269210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:20,674-Speed 8938.40 samples/sec   Loss 3.6381   LearningRate 0.0037   Epoch: 16   Global Step: 269220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:21,769-Speed 9357.48 samples/sec   Loss 3.7596   LearningRate 0.0037   Epoch: 16   Global Step: 269230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:22,944-Speed 8720.67 samples/sec   Loss 3.6838   LearningRate 0.0037   Epoch: 16   Global Step: 269240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:24,095-Speed 8902.97 samples/sec   Loss 3.8393   LearningRate 0.0037   Epoch: 16   Global Step: 269250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:25,186-Speed 9395.04 samples/sec   Loss 3.7833   LearningRate 0.0037   Epoch: 16   Global Step: 269260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:26,299-Speed 9199.50 samples/sec   Loss 3.7298   LearningRate 0.0037   Epoch: 16   Global Step: 269270   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:25:27,387-Speed 9416.40 samples/sec   Loss 3.6678   LearningRate 0.0037   Epoch: 16   Global Step: 269280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:28,450-Speed 9639.24 samples/sec   Loss 3.7084   LearningRate 0.0037   Epoch: 16   Global Step: 269290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:29,567-Speed 9183.23 samples/sec   Loss 3.6787   LearningRate 0.0037   Epoch: 16   Global Step: 269300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:30,641-Speed 9537.57 samples/sec   Loss 3.7162   LearningRate 0.0037   Epoch: 16   Global Step: 269310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:31,729-Speed 9416.63 samples/sec   Loss 3.7572   LearningRate 0.0037   Epoch: 16   Global Step: 269320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:32,869-Speed 8984.31 samples/sec   Loss 3.7627   LearningRate 0.0037   Epoch: 16   Global Step: 269330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:33,982-Speed 9207.03 samples/sec   Loss 3.8166   LearningRate 0.0037   Epoch: 16   Global Step: 269340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:35,027-Speed 9803.65 samples/sec   Loss 3.7090   LearningRate 0.0037   Epoch: 16   Global Step: 269350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:36,105-Speed 9503.62 samples/sec   Loss 3.6927   LearningRate 0.0037   Epoch: 16   Global Step: 269360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:25:37,240-Speed 9030.48 samples/sec   Loss 3.7358   LearningRate 0.0037   Epoch: 16   Global Step: 269370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:25:38,386-Speed 8935.93 samples/sec   Loss 3.7449   LearningRate 0.0037   Epoch: 16   Global Step: 269380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:25:39,494-Speed 9254.06 samples/sec   Loss 3.6354   LearningRate 0.0037   Epoch: 16   Global Step: 269390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:25:40,640-Speed 8936.53 samples/sec   Loss 3.7713   LearningRate 0.0037   Epoch: 16   Global Step: 269400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:25:41,780-Speed 8996.90 samples/sec   Loss 3.7715   LearningRate 0.0037   Epoch: 16   Global Step: 269410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:25:42,886-Speed 9261.90 samples/sec   Loss 3.6895   LearningRate 0.0037   Epoch: 16   Global Step: 269420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:25:44,029-Speed 8966.16 samples/sec   Loss 3.7661   LearningRate 0.0037   Epoch: 16   Global Step: 269430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:25:45,136-Speed 9248.51 samples/sec   Loss 3.7692   LearningRate 0.0037   Epoch: 16   Global Step: 269440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:25:46,264-Speed 9089.50 samples/sec   Loss 3.8183   LearningRate 0.0037   Epoch: 16   Global Step: 269450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:25:47,405-Speed 8972.95 samples/sec   Loss 3.7446   LearningRate 0.0037   Epoch: 16   Global Step: 269460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:48,485-Speed 9497.78 samples/sec   Loss 3.7376   LearningRate 0.0037   Epoch: 16   Global Step: 269470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:49,639-Speed 8876.32 samples/sec   Loss 3.6996   LearningRate 0.0037   Epoch: 16   Global Step: 269480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:50,728-Speed 9409.83 samples/sec   Loss 3.8155   LearningRate 0.0037   Epoch: 16   Global Step: 269490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:51,809-Speed 9472.33 samples/sec   Loss 3.7249   LearningRate 0.0037   Epoch: 16   Global Step: 269500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:52,904-Speed 9355.75 samples/sec   Loss 3.7278   LearningRate 0.0037   Epoch: 16   Global Step: 269510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:54,014-Speed 9236.69 samples/sec   Loss 3.8200   LearningRate 0.0037   Epoch: 16   Global Step: 269520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:55,132-Speed 9157.85 samples/sec   Loss 3.7629   LearningRate 0.0037   Epoch: 16   Global Step: 269530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:56,215-Speed 9463.10 samples/sec   Loss 3.7222   LearningRate 0.0037   Epoch: 16   Global Step: 269540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:57,324-Speed 9239.27 samples/sec   Loss 3.7268   LearningRate 0.0037   Epoch: 16   Global Step: 269550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:25:58,442-Speed 9167.36 samples/sec   Loss 3.8195   LearningRate 0.0037   Epoch: 16   Global Step: 269560   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:25:59,592-Speed 8913.86 samples/sec   Loss 3.7529   LearningRate 0.0037   Epoch: 16   Global Step: 269570   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:00,703-Speed 9224.23 samples/sec   Loss 3.7918   LearningRate 0.0037   Epoch: 16   Global Step: 269580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:01,825-Speed 9130.68 samples/sec   Loss 3.7403   LearningRate 0.0037   Epoch: 16   Global Step: 269590   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:02,923-Speed 9326.32 samples/sec   Loss 3.7191   LearningRate 0.0037   Epoch: 16   Global Step: 269600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:04,014-Speed 9392.49 samples/sec   Loss 3.6929   LearningRate 0.0037   Epoch: 16   Global Step: 269610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:05,131-Speed 9172.19 samples/sec   Loss 3.7927   LearningRate 0.0037   Epoch: 16   Global Step: 269620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:06,300-Speed 8766.79 samples/sec   Loss 3.7343   LearningRate 0.0037   Epoch: 16   Global Step: 269630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:07,408-Speed 9248.21 samples/sec   Loss 3.7466   LearningRate 0.0037   Epoch: 16   Global Step: 269640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:08,507-Speed 9323.01 samples/sec   Loss 3.8456   LearningRate 0.0037   Epoch: 16   Global Step: 269650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:09,620-Speed 9207.20 samples/sec   Loss 3.7755   LearningRate 0.0037   Epoch: 16   Global Step: 269660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:10,763-Speed 8962.39 samples/sec   Loss 3.7878   LearningRate 0.0037   Epoch: 16   Global Step: 269670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:11,860-Speed 9337.53 samples/sec   Loss 3.7529   LearningRate 0.0037   Epoch: 16   Global Step: 269680   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:12,953-Speed 9373.27 samples/sec   Loss 3.7848   LearningRate 0.0037   Epoch: 16   Global Step: 269690   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:14,012-Speed 9672.50 samples/sec   Loss 3.7791   LearningRate 0.0037   Epoch: 16   Global Step: 269700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:15,093-Speed 9477.93 samples/sec   Loss 3.7714   LearningRate 0.0037   Epoch: 16   Global Step: 269710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:16,181-Speed 9424.65 samples/sec   Loss 3.7460   LearningRate 0.0037   Epoch: 16   Global Step: 269720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:17,279-Speed 9330.04 samples/sec   Loss 3.8549   LearningRate 0.0037   Epoch: 16   Global Step: 269730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:18,441-Speed 8818.94 samples/sec   Loss 3.8213   LearningRate 0.0037   Epoch: 16   Global Step: 269740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:19,565-Speed 9117.64 samples/sec   Loss 3.6917   LearningRate 0.0037   Epoch: 16   Global Step: 269750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:20,639-Speed 9543.12 samples/sec   Loss 3.7425   LearningRate 0.0037   Epoch: 16   Global Step: 269760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:21,812-Speed 8733.62 samples/sec   Loss 3.7692   LearningRate 0.0037   Epoch: 16   Global Step: 269770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:22,907-Speed 9354.64 samples/sec   Loss 3.8524   LearningRate 0.0037   Epoch: 16   Global Step: 269780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:23,983-Speed 9523.05 samples/sec   Loss 3.7325   LearningRate 0.0037   Epoch: 16   Global Step: 269790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:25,129-Speed 8940.76 samples/sec   Loss 3.8416   LearningRate 0.0037   Epoch: 16   Global Step: 269800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:26,263-Speed 9030.54 samples/sec   Loss 3.7436   LearningRate 0.0037   Epoch: 16   Global Step: 269810   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:27,372-Speed 9237.35 samples/sec   Loss 3.7804   LearningRate 0.0037   Epoch: 16   Global Step: 269820   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:28,476-Speed 9283.45 samples/sec   Loss 3.7936   LearningRate 0.0037   Epoch: 16   Global Step: 269830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:29,534-Speed 9682.07 samples/sec   Loss 3.7604   LearningRate 0.0037   Epoch: 16   Global Step: 269840   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:30,617-Speed 9461.72 samples/sec   Loss 3.7596   LearningRate 0.0037   Epoch: 16   Global Step: 269850   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:31,706-Speed 9409.15 samples/sec   Loss 3.7071   LearningRate 0.0037   Epoch: 16   Global Step: 269860   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:32,815-Speed 9235.46 samples/sec   Loss 3.7340   LearningRate 0.0037   Epoch: 16   Global Step: 269870   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:33,962-Speed 8933.74 samples/sec   Loss 3.7533   LearningRate 0.0037   Epoch: 16   Global Step: 269880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:35,067-Speed 9270.00 samples/sec   Loss 3.7495   LearningRate 0.0037   Epoch: 16   Global Step: 269890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:36,199-Speed 9059.93 samples/sec   Loss 3.6887   LearningRate 0.0037   Epoch: 16   Global Step: 269900   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:37,326-Speed 9096.22 samples/sec   Loss 3.7212   LearningRate 0.0037   Epoch: 16   Global Step: 269910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:38,547-Speed 8390.07 samples/sec   Loss 3.8235   LearningRate 0.0037   Epoch: 16   Global Step: 269920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:26:39,662-Speed 9189.29 samples/sec   Loss 3.7562   LearningRate 0.0037   Epoch: 16   Global Step: 269930   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:40,733-Speed 9567.34 samples/sec   Loss 3.7803   LearningRate 0.0037   Epoch: 16   Global Step: 269940   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:41,844-Speed 9221.74 samples/sec   Loss 3.6939   LearningRate 0.0037   Epoch: 16   Global Step: 269950   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:42,896-Speed 9735.53 samples/sec   Loss 3.8077   LearningRate 0.0037   Epoch: 16   Global Step: 269960   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:44,029-Speed 9043.19 samples/sec   Loss 3.7732   LearningRate 0.0037   Epoch: 16   Global Step: 269970   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:45,119-Speed 9394.88 samples/sec   Loss 3.7726   LearningRate 0.0037   Epoch: 16   Global Step: 269980   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:46,229-Speed 9237.21 samples/sec   Loss 3.8637   LearningRate 0.0037   Epoch: 16   Global Step: 269990   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:26:47,299-Speed 9573.91 samples/sec   Loss 3.7602   LearningRate 0.0037   Epoch: 16   Global Step: 270000   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:27:09,298-[lfw][270000]XNorm: 6.974490
Training: 2022-04-11 22:27:09,298-[lfw][270000]Accuracy-Flip: 0.99733+-0.00309
Training: 2022-04-11 22:27:09,299-[lfw][270000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:27:34,696-[cfp_fp][270000]XNorm: 6.065904
Training: 2022-04-11 22:27:34,697-[cfp_fp][270000]Accuracy-Flip: 0.97071+-0.00662
Training: 2022-04-11 22:27:34,698-[cfp_fp][270000]Accuracy-Highest: 0.97171
Training: 2022-04-11 22:27:56,661-[agedb_30][270000]XNorm: 6.788168
Training: 2022-04-11 22:27:56,661-[agedb_30][270000]Accuracy-Flip: 0.97033+-0.00927
Training: 2022-04-11 22:27:56,662-[agedb_30][270000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:27:57,747-Speed 145.36 samples/sec   Loss 3.8149   LearningRate 0.0037   Epoch: 16   Global Step: 270010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:27:58,833-Speed 9440.52 samples/sec   Loss 3.7148   LearningRate 0.0037   Epoch: 16   Global Step: 270020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:27:59,962-Speed 9077.05 samples/sec   Loss 3.8257   LearningRate 0.0037   Epoch: 16   Global Step: 270030   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:01,075-Speed 9199.18 samples/sec   Loss 3.7350   LearningRate 0.0037   Epoch: 16   Global Step: 270040   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:02,200-Speed 9108.56 samples/sec   Loss 3.7002   LearningRate 0.0036   Epoch: 16   Global Step: 270050   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:03,271-Speed 9572.63 samples/sec   Loss 3.7090   LearningRate 0.0036   Epoch: 16   Global Step: 270060   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:04,334-Speed 9636.56 samples/sec   Loss 3.8054   LearningRate 0.0036   Epoch: 16   Global Step: 270070   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:05,437-Speed 9282.53 samples/sec   Loss 3.7716   LearningRate 0.0036   Epoch: 16   Global Step: 270080   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:06,551-Speed 9202.08 samples/sec   Loss 3.7054   LearningRate 0.0036   Epoch: 16   Global Step: 270090   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:07,683-Speed 9047.41 samples/sec   Loss 3.7252   LearningRate 0.0036   Epoch: 16   Global Step: 270100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:08,810-Speed 9093.83 samples/sec   Loss 3.7246   LearningRate 0.0036   Epoch: 16   Global Step: 270110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:09,981-Speed 8749.25 samples/sec   Loss 3.7079   LearningRate 0.0036   Epoch: 16   Global Step: 270120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:11,036-Speed 9718.73 samples/sec   Loss 3.7224   LearningRate 0.0036   Epoch: 16   Global Step: 270130   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:12,160-Speed 9109.01 samples/sec   Loss 3.7191   LearningRate 0.0036   Epoch: 16   Global Step: 270140   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:13,293-Speed 9043.05 samples/sec   Loss 3.7228   LearningRate 0.0036   Epoch: 16   Global Step: 270150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:14,376-Speed 9458.95 samples/sec   Loss 3.7662   LearningRate 0.0036   Epoch: 16   Global Step: 270160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:15,497-Speed 9146.42 samples/sec   Loss 3.7924   LearningRate 0.0036   Epoch: 16   Global Step: 270170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:16,595-Speed 9327.49 samples/sec   Loss 3.7294   LearningRate 0.0036   Epoch: 16   Global Step: 270180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:17,712-Speed 9174.98 samples/sec   Loss 3.7588   LearningRate 0.0036   Epoch: 16   Global Step: 270190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:18,834-Speed 9132.91 samples/sec   Loss 3.7433   LearningRate 0.0036   Epoch: 16   Global Step: 270200   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:19,976-Speed 8969.93 samples/sec   Loss 3.7599   LearningRate 0.0036   Epoch: 16   Global Step: 270210   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:21,083-Speed 9255.63 samples/sec   Loss 3.8189   LearningRate 0.0036   Epoch: 16   Global Step: 270220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:22,202-Speed 9156.06 samples/sec   Loss 3.8621   LearningRate 0.0036   Epoch: 16   Global Step: 270230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:23,321-Speed 9158.54 samples/sec   Loss 3.7541   LearningRate 0.0036   Epoch: 16   Global Step: 270240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:24,390-Speed 9581.88 samples/sec   Loss 3.7814   LearningRate 0.0036   Epoch: 16   Global Step: 270250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:25,534-Speed 8955.07 samples/sec   Loss 3.7190   LearningRate 0.0036   Epoch: 16   Global Step: 270260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:26,649-Speed 9196.34 samples/sec   Loss 3.7589   LearningRate 0.0036   Epoch: 16   Global Step: 270270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:27,749-Speed 9315.14 samples/sec   Loss 3.7053   LearningRate 0.0036   Epoch: 16   Global Step: 270280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:28,854-Speed 9279.34 samples/sec   Loss 3.7319   LearningRate 0.0036   Epoch: 16   Global Step: 270290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:29,923-Speed 9580.82 samples/sec   Loss 3.7336   LearningRate 0.0036   Epoch: 16   Global Step: 270300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:30,986-Speed 9633.93 samples/sec   Loss 3.6802   LearningRate 0.0036   Epoch: 16   Global Step: 270310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:32,065-Speed 9510.26 samples/sec   Loss 3.8150   LearningRate 0.0036   Epoch: 16   Global Step: 270320   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:33,188-Speed 9124.09 samples/sec   Loss 3.6975   LearningRate 0.0036   Epoch: 16   Global Step: 270330   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:34,291-Speed 9290.19 samples/sec   Loss 3.6208   LearningRate 0.0036   Epoch: 16   Global Step: 270340   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:35,372-Speed 9473.23 samples/sec   Loss 3.7273   LearningRate 0.0036   Epoch: 16   Global Step: 270350   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:36,439-Speed 9604.94 samples/sec   Loss 3.7332   LearningRate 0.0036   Epoch: 16   Global Step: 270360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:37,535-Speed 9349.06 samples/sec   Loss 3.7708   LearningRate 0.0036   Epoch: 16   Global Step: 270370   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:38,631-Speed 9345.35 samples/sec   Loss 3.7418   LearningRate 0.0036   Epoch: 16   Global Step: 270380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:39,741-Speed 9232.09 samples/sec   Loss 3.8493   LearningRate 0.0036   Epoch: 16   Global Step: 270390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:40,845-Speed 9280.06 samples/sec   Loss 3.7536   LearningRate 0.0036   Epoch: 16   Global Step: 270400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:41,934-Speed 9413.58 samples/sec   Loss 3.8024   LearningRate 0.0036   Epoch: 16   Global Step: 270410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:43,002-Speed 9587.33 samples/sec   Loss 3.7284   LearningRate 0.0036   Epoch: 16   Global Step: 270420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:44,111-Speed 9243.12 samples/sec   Loss 3.7558   LearningRate 0.0036   Epoch: 16   Global Step: 270430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:45,182-Speed 9563.92 samples/sec   Loss 3.7909   LearningRate 0.0036   Epoch: 16   Global Step: 270440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:46,305-Speed 9134.51 samples/sec   Loss 3.7300   LearningRate 0.0036   Epoch: 16   Global Step: 270450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:47,402-Speed 9335.79 samples/sec   Loss 3.7618   LearningRate 0.0036   Epoch: 16   Global Step: 270460   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:48,513-Speed 9219.35 samples/sec   Loss 3.7921   LearningRate 0.0036   Epoch: 16   Global Step: 270470   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:49,595-Speed 9472.15 samples/sec   Loss 3.7663   LearningRate 0.0036   Epoch: 16   Global Step: 270480   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:28:50,716-Speed 9136.81 samples/sec   Loss 3.8227   LearningRate 0.0036   Epoch: 16   Global Step: 270490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:51,828-Speed 9214.83 samples/sec   Loss 3.7820   LearningRate 0.0036   Epoch: 16   Global Step: 270500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:52,972-Speed 8952.76 samples/sec   Loss 3.7390   LearningRate 0.0036   Epoch: 16   Global Step: 270510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:54,080-Speed 9248.40 samples/sec   Loss 3.8210   LearningRate 0.0036   Epoch: 16   Global Step: 270520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:55,198-Speed 9162.98 samples/sec   Loss 3.8352   LearningRate 0.0036   Epoch: 16   Global Step: 270530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:56,328-Speed 9067.54 samples/sec   Loss 3.7252   LearningRate 0.0036   Epoch: 16   Global Step: 270540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:57,455-Speed 9093.84 samples/sec   Loss 3.7246   LearningRate 0.0036   Epoch: 16   Global Step: 270550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:58,583-Speed 9086.16 samples/sec   Loss 3.7668   LearningRate 0.0036   Epoch: 16   Global Step: 270560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:28:59,671-Speed 9414.43 samples/sec   Loss 3.7715   LearningRate 0.0036   Epoch: 16   Global Step: 270570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:00,808-Speed 9006.37 samples/sec   Loss 3.7754   LearningRate 0.0036   Epoch: 16   Global Step: 270580   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:01,906-Speed 9335.19 samples/sec   Loss 3.7381   LearningRate 0.0036   Epoch: 16   Global Step: 270590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:03,010-Speed 9285.81 samples/sec   Loss 3.7176   LearningRate 0.0036   Epoch: 16   Global Step: 270600   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:04,098-Speed 9414.01 samples/sec   Loss 3.7662   LearningRate 0.0036   Epoch: 16   Global Step: 270610   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:05,211-Speed 9209.28 samples/sec   Loss 3.7417   LearningRate 0.0036   Epoch: 16   Global Step: 270620   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:06,263-Speed 9733.96 samples/sec   Loss 3.8239   LearningRate 0.0036   Epoch: 16   Global Step: 270630   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:07,353-Speed 9398.59 samples/sec   Loss 3.7439   LearningRate 0.0036   Epoch: 16   Global Step: 270640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:08,445-Speed 9382.14 samples/sec   Loss 3.7114   LearningRate 0.0036   Epoch: 16   Global Step: 270650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:09,616-Speed 8758.08 samples/sec   Loss 3.7403   LearningRate 0.0036   Epoch: 16   Global Step: 270660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:10,728-Speed 9206.26 samples/sec   Loss 3.7934   LearningRate 0.0036   Epoch: 16   Global Step: 270670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:11,838-Speed 9234.08 samples/sec   Loss 3.7286   LearningRate 0.0036   Epoch: 16   Global Step: 270680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:12,928-Speed 9403.61 samples/sec   Loss 3.7785   LearningRate 0.0036   Epoch: 16   Global Step: 270690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:14,048-Speed 9146.62 samples/sec   Loss 3.7435   LearningRate 0.0036   Epoch: 16   Global Step: 270700   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:15,172-Speed 9109.62 samples/sec   Loss 3.7938   LearningRate 0.0036   Epoch: 16   Global Step: 270710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:16,275-Speed 9295.80 samples/sec   Loss 3.7711   LearningRate 0.0036   Epoch: 16   Global Step: 270720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:17,402-Speed 9096.41 samples/sec   Loss 3.7115   LearningRate 0.0036   Epoch: 16   Global Step: 270730   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:18,531-Speed 9071.99 samples/sec   Loss 3.7779   LearningRate 0.0036   Epoch: 16   Global Step: 270740   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:19,643-Speed 9213.95 samples/sec   Loss 3.8081   LearningRate 0.0036   Epoch: 16   Global Step: 270750   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:20,783-Speed 8990.69 samples/sec   Loss 3.7675   LearningRate 0.0036   Epoch: 16   Global Step: 270760   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:21,870-Speed 9425.05 samples/sec   Loss 3.7769   LearningRate 0.0036   Epoch: 16   Global Step: 270770   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:22,963-Speed 9372.53 samples/sec   Loss 3.8610   LearningRate 0.0036   Epoch: 16   Global Step: 270780   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:24,069-Speed 9263.86 samples/sec   Loss 3.7633   LearningRate 0.0036   Epoch: 16   Global Step: 270790   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:25,146-Speed 9513.56 samples/sec   Loss 3.7298   LearningRate 0.0036   Epoch: 16   Global Step: 270800   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:26,232-Speed 9436.54 samples/sec   Loss 3.8113   LearningRate 0.0036   Epoch: 16   Global Step: 270810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:29:27,376-Speed 8955.40 samples/sec   Loss 3.7931   LearningRate 0.0036   Epoch: 16   Global Step: 270820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:29:28,496-Speed 9155.32 samples/sec   Loss 3.8192   LearningRate 0.0036   Epoch: 16   Global Step: 270830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:29:29,605-Speed 9237.83 samples/sec   Loss 3.8020   LearningRate 0.0036   Epoch: 16   Global Step: 270840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:29:30,771-Speed 8787.25 samples/sec   Loss 3.7166   LearningRate 0.0036   Epoch: 16   Global Step: 270850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:29:31,910-Speed 8993.75 samples/sec   Loss 3.8414   LearningRate 0.0036   Epoch: 16   Global Step: 270860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:29:33,054-Speed 8952.21 samples/sec   Loss 3.7300   LearningRate 0.0036   Epoch: 16   Global Step: 270870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:29:34,162-Speed 9253.47 samples/sec   Loss 3.6696   LearningRate 0.0036   Epoch: 16   Global Step: 270880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:29:35,234-Speed 9550.95 samples/sec   Loss 3.7411   LearningRate 0.0036   Epoch: 16   Global Step: 270890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:29:36,321-Speed 9425.98 samples/sec   Loss 3.7932   LearningRate 0.0036   Epoch: 16   Global Step: 270900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-11 22:29:37,403-Speed 9471.90 samples/sec   Loss 3.7742   LearningRate 0.0036   Epoch: 16   Global Step: 270910   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:38,512-Speed 9237.80 samples/sec   Loss 3.7879   LearningRate 0.0036   Epoch: 16   Global Step: 270920   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:39,624-Speed 9220.13 samples/sec   Loss 3.7376   LearningRate 0.0035   Epoch: 16   Global Step: 270930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:40,753-Speed 9071.59 samples/sec   Loss 3.7996   LearningRate 0.0035   Epoch: 16   Global Step: 270940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:41,861-Speed 9254.05 samples/sec   Loss 3.8484   LearningRate 0.0035   Epoch: 16   Global Step: 270950   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:42,968-Speed 9251.82 samples/sec   Loss 3.6939   LearningRate 0.0035   Epoch: 16   Global Step: 270960   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:44,063-Speed 9357.00 samples/sec   Loss 3.7680   LearningRate 0.0035   Epoch: 16   Global Step: 270970   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:45,180-Speed 9172.29 samples/sec   Loss 3.7643   LearningRate 0.0035   Epoch: 16   Global Step: 270980   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:46,326-Speed 8943.50 samples/sec   Loss 3.7551   LearningRate 0.0035   Epoch: 16   Global Step: 270990   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:47,434-Speed 9249.05 samples/sec   Loss 3.6823   LearningRate 0.0035   Epoch: 16   Global Step: 271000   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:48,571-Speed 9009.57 samples/sec   Loss 3.8059   LearningRate 0.0035   Epoch: 16   Global Step: 271010   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:49,689-Speed 9164.94 samples/sec   Loss 3.7992   LearningRate 0.0035   Epoch: 16   Global Step: 271020   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:29:50,839-Speed 8909.13 samples/sec   Loss 3.7645   LearningRate 0.0035   Epoch: 16   Global Step: 271030   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:51,994-Speed 8871.97 samples/sec   Loss 3.8547   LearningRate 0.0035   Epoch: 16   Global Step: 271040   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:53,108-Speed 9191.79 samples/sec   Loss 3.8010   LearningRate 0.0035   Epoch: 16   Global Step: 271050   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:54,288-Speed 8686.41 samples/sec   Loss 3.7973   LearningRate 0.0035   Epoch: 16   Global Step: 271060   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:55,408-Speed 9147.66 samples/sec   Loss 3.8074   LearningRate 0.0035   Epoch: 16   Global Step: 271070   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:56,538-Speed 9067.36 samples/sec   Loss 3.8058   LearningRate 0.0035   Epoch: 16   Global Step: 271080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:57,696-Speed 8855.29 samples/sec   Loss 3.7831   LearningRate 0.0035   Epoch: 16   Global Step: 271090   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:58,791-Speed 9361.29 samples/sec   Loss 3.7226   LearningRate 0.0035   Epoch: 16   Global Step: 271100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:29:59,873-Speed 9475.54 samples/sec   Loss 3.7685   LearningRate 0.0035   Epoch: 16   Global Step: 271110   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:01,029-Speed 8858.89 samples/sec   Loss 3.7917   LearningRate 0.0035   Epoch: 16   Global Step: 271120   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:02,157-Speed 9084.94 samples/sec   Loss 3.7999   LearningRate 0.0035   Epoch: 16   Global Step: 271130   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:30:03,286-Speed 9077.13 samples/sec   Loss 3.8475   LearningRate 0.0035   Epoch: 16   Global Step: 271140   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:30:04,388-Speed 9295.20 samples/sec   Loss 3.8303   LearningRate 0.0035   Epoch: 16   Global Step: 271150   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:05,442-Speed 9713.75 samples/sec   Loss 3.8208   LearningRate 0.0035   Epoch: 16   Global Step: 271160   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:06,517-Speed 9539.29 samples/sec   Loss 3.7575   LearningRate 0.0035   Epoch: 16   Global Step: 271170   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:07,627-Speed 9228.92 samples/sec   Loss 3.7496   LearningRate 0.0035   Epoch: 16   Global Step: 271180   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:08,698-Speed 9565.35 samples/sec   Loss 3.7967   LearningRate 0.0035   Epoch: 16   Global Step: 271190   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:09,803-Speed 9276.55 samples/sec   Loss 3.7505   LearningRate 0.0035   Epoch: 16   Global Step: 271200   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:10,885-Speed 9469.28 samples/sec   Loss 3.9895   LearningRate 0.0035   Epoch: 16   Global Step: 271210   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:12,034-Speed 8913.62 samples/sec   Loss 3.8036   LearningRate 0.0035   Epoch: 16   Global Step: 271220   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:13,151-Speed 9176.09 samples/sec   Loss 3.8413   LearningRate 0.0035   Epoch: 16   Global Step: 271230   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:14,300-Speed 8918.78 samples/sec   Loss 3.8127   LearningRate 0.0035   Epoch: 16   Global Step: 271240   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:15,395-Speed 9352.27 samples/sec   Loss 3.8088   LearningRate 0.0035   Epoch: 16   Global Step: 271250   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:30:16,536-Speed 8987.10 samples/sec   Loss 3.7100   LearningRate 0.0035   Epoch: 16   Global Step: 271260   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:30:17,635-Speed 9329.14 samples/sec   Loss 3.7880   LearningRate 0.0035   Epoch: 16   Global Step: 271270   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:18,704-Speed 9579.88 samples/sec   Loss 3.7829   LearningRate 0.0035   Epoch: 16   Global Step: 271280   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:19,785-Speed 9479.73 samples/sec   Loss 3.7633   LearningRate 0.0035   Epoch: 16   Global Step: 271290   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:20,896-Speed 9222.33 samples/sec   Loss 3.8060   LearningRate 0.0035   Epoch: 16   Global Step: 271300   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:22,043-Speed 8933.75 samples/sec   Loss 3.7459   LearningRate 0.0035   Epoch: 16   Global Step: 271310   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:23,126-Speed 9461.33 samples/sec   Loss 3.7987   LearningRate 0.0035   Epoch: 16   Global Step: 271320   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:24,253-Speed 9093.37 samples/sec   Loss 3.8032   LearningRate 0.0035   Epoch: 16   Global Step: 271330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:25,303-Speed 9753.07 samples/sec   Loss 3.8103   LearningRate 0.0035   Epoch: 16   Global Step: 271340   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:26,466-Speed 8813.36 samples/sec   Loss 3.8236   LearningRate 0.0035   Epoch: 16   Global Step: 271350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:27,534-Speed 9587.57 samples/sec   Loss 3.7172   LearningRate 0.0035   Epoch: 16   Global Step: 271360   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:28,687-Speed 8892.80 samples/sec   Loss 3.7532   LearningRate 0.0035   Epoch: 16   Global Step: 271370   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:30:29,784-Speed 9335.28 samples/sec   Loss 3.7912   LearningRate 0.0035   Epoch: 16   Global Step: 271380   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:30,864-Speed 9490.69 samples/sec   Loss 3.8101   LearningRate 0.0035   Epoch: 16   Global Step: 271390   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:31,965-Speed 9304.09 samples/sec   Loss 3.7651   LearningRate 0.0035   Epoch: 16   Global Step: 271400   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:33,087-Speed 9130.01 samples/sec   Loss 3.7411   LearningRate 0.0035   Epoch: 16   Global Step: 271410   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:34,228-Speed 8979.93 samples/sec   Loss 3.8093   LearningRate 0.0035   Epoch: 16   Global Step: 271420   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:35,366-Speed 9003.99 samples/sec   Loss 3.8686   LearningRate 0.0035   Epoch: 16   Global Step: 271430   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:36,492-Speed 9100.74 samples/sec   Loss 3.8195   LearningRate 0.0035   Epoch: 16   Global Step: 271440   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:37,636-Speed 8956.09 samples/sec   Loss 3.7419   LearningRate 0.0035   Epoch: 16   Global Step: 271450   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:38,713-Speed 9510.98 samples/sec   Loss 3.7689   LearningRate 0.0035   Epoch: 16   Global Step: 271460   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:39,799-Speed 9439.70 samples/sec   Loss 3.7754   LearningRate 0.0035   Epoch: 16   Global Step: 271470   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:40,870-Speed 9571.67 samples/sec   Loss 3.7186   LearningRate 0.0035   Epoch: 16   Global Step: 271480   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:41,944-Speed 9539.21 samples/sec   Loss 3.7338   LearningRate 0.0035   Epoch: 16   Global Step: 271490   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:43,025-Speed 9473.06 samples/sec   Loss 3.8450   LearningRate 0.0035   Epoch: 16   Global Step: 271500   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:44,077-Speed 9745.49 samples/sec   Loss 3.7613   LearningRate 0.0035   Epoch: 16   Global Step: 271510   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:45,160-Speed 9457.61 samples/sec   Loss 3.7928   LearningRate 0.0035   Epoch: 16   Global Step: 271520   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:46,258-Speed 9333.44 samples/sec   Loss 3.8363   LearningRate 0.0035   Epoch: 16   Global Step: 271530   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:47,352-Speed 9374.06 samples/sec   Loss 3.7445   LearningRate 0.0035   Epoch: 16   Global Step: 271540   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:48,462-Speed 9226.70 samples/sec   Loss 3.7110   LearningRate 0.0035   Epoch: 16   Global Step: 271550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:49,564-Speed 9293.22 samples/sec   Loss 3.8174   LearningRate 0.0035   Epoch: 16   Global Step: 271560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:50,646-Speed 9470.45 samples/sec   Loss 3.8109   LearningRate 0.0035   Epoch: 16   Global Step: 271570   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:51,759-Speed 9211.08 samples/sec   Loss 3.8207   LearningRate 0.0035   Epoch: 16   Global Step: 271580   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:30:52,875-Speed 9178.71 samples/sec   Loss 3.8252   LearningRate 0.0035   Epoch: 16   Global Step: 271590   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:30:53,943-Speed 9593.52 samples/sec   Loss 3.7589   LearningRate 0.0035   Epoch: 16   Global Step: 271600   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:55,140-Speed 8558.27 samples/sec   Loss 3.8290   LearningRate 0.0035   Epoch: 16   Global Step: 271610   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:56,260-Speed 9147.84 samples/sec   Loss 3.8261   LearningRate 0.0035   Epoch: 16   Global Step: 271620   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:57,418-Speed 8846.29 samples/sec   Loss 3.7414   LearningRate 0.0035   Epoch: 16   Global Step: 271630   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:58,534-Speed 9190.08 samples/sec   Loss 3.8151   LearningRate 0.0035   Epoch: 16   Global Step: 271640   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:30:59,670-Speed 9016.70 samples/sec   Loss 3.8092   LearningRate 0.0035   Epoch: 16   Global Step: 271650   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:00,729-Speed 9675.91 samples/sec   Loss 3.7729   LearningRate 0.0035   Epoch: 16   Global Step: 271660   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:01,784-Speed 9713.51 samples/sec   Loss 3.8089   LearningRate 0.0035   Epoch: 16   Global Step: 271670   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:02,884-Speed 9307.54 samples/sec   Loss 3.8180   LearningRate 0.0035   Epoch: 16   Global Step: 271680   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:03,975-Speed 9393.38 samples/sec   Loss 3.7942   LearningRate 0.0035   Epoch: 16   Global Step: 271690   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:05,111-Speed 9016.55 samples/sec   Loss 3.7153   LearningRate 0.0035   Epoch: 16   Global Step: 271700   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:31:06,179-Speed 9593.96 samples/sec   Loss 3.7697   LearningRate 0.0035   Epoch: 16   Global Step: 271710   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:31:07,313-Speed 9036.71 samples/sec   Loss 3.8141   LearningRate 0.0035   Epoch: 16   Global Step: 271720   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:31:08,467-Speed 8883.21 samples/sec   Loss 3.9448   LearningRate 0.0035   Epoch: 16   Global Step: 271730   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:31:09,532-Speed 9624.63 samples/sec   Loss 3.8077   LearningRate 0.0035   Epoch: 16   Global Step: 271740   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:10,615-Speed 9456.66 samples/sec   Loss 3.8211   LearningRate 0.0035   Epoch: 16   Global Step: 271750   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:11,723-Speed 9253.76 samples/sec   Loss 3.8186   LearningRate 0.0035   Epoch: 16   Global Step: 271760   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:12,831-Speed 9242.14 samples/sec   Loss 3.7501   LearningRate 0.0035   Epoch: 16   Global Step: 271770   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:13,895-Speed 9633.13 samples/sec   Loss 3.8717   LearningRate 0.0035   Epoch: 16   Global Step: 271780   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:14,988-Speed 9372.49 samples/sec   Loss 3.8704   LearningRate 0.0035   Epoch: 16   Global Step: 271790   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:16,083-Speed 9358.17 samples/sec   Loss 3.8036   LearningRate 0.0035   Epoch: 16   Global Step: 271800   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:17,193-Speed 9232.15 samples/sec   Loss 3.7588   LearningRate 0.0035   Epoch: 16   Global Step: 271810   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:18,308-Speed 9187.09 samples/sec   Loss 3.7904   LearningRate 0.0034   Epoch: 16   Global Step: 271820   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:19,406-Speed 9331.33 samples/sec   Loss 3.9174   LearningRate 0.0034   Epoch: 16   Global Step: 271830   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:20,525-Speed 9156.81 samples/sec   Loss 3.8256   LearningRate 0.0034   Epoch: 16   Global Step: 271840   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:31:21,614-Speed 9408.86 samples/sec   Loss 3.8507   LearningRate 0.0034   Epoch: 16   Global Step: 271850   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:31:22,684-Speed 9575.21 samples/sec   Loss 3.8252   LearningRate 0.0034   Epoch: 16   Global Step: 271860   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:31:23,795-Speed 9217.78 samples/sec   Loss 3.8345   LearningRate 0.0034   Epoch: 16   Global Step: 271870   Fp16 Grad Scale: 131072   Required: 3 hours
Training: 2022-04-11 22:31:24,913-Speed 9166.59 samples/sec   Loss 3.7948   LearningRate 0.0034   Epoch: 16   Global Step: 271880   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:25,975-Speed 9644.30 samples/sec   Loss 3.7619   LearningRate 0.0034   Epoch: 16   Global Step: 271890   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-11 22:31:27,057-Speed 9470.26 samples/sec   Loss 3.7708   LearningRate 0.0034   Epoch: 16   Global Step: 271900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:31:28,225-Speed 8774.77 samples/sec   Loss 3.7733   LearningRate 0.0034   Epoch: 16   Global Step: 271910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:31:29,352-Speed 9090.13 samples/sec   Loss 3.7868   LearningRate 0.0034   Epoch: 16   Global Step: 271920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:31:30,437-Speed 9448.42 samples/sec   Loss 3.8361   LearningRate 0.0034   Epoch: 16   Global Step: 271930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:31:31,514-Speed 9512.36 samples/sec   Loss 3.8092   LearningRate 0.0034   Epoch: 16   Global Step: 271940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:31:32,624-Speed 9231.43 samples/sec   Loss 3.7939   LearningRate 0.0034   Epoch: 16   Global Step: 271950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:31:33,712-Speed 9420.86 samples/sec   Loss 3.9309   LearningRate 0.0034   Epoch: 16   Global Step: 271960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:31:34,803-Speed 9390.84 samples/sec   Loss 3.6996   LearningRate 0.0034   Epoch: 16   Global Step: 271970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:31:35,906-Speed 9288.91 samples/sec   Loss 3.7556   LearningRate 0.0034   Epoch: 16   Global Step: 271980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:31:37,011-Speed 9269.31 samples/sec   Loss 3.7674   LearningRate 0.0034   Epoch: 16   Global Step: 271990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:31:38,085-Speed 9535.03 samples/sec   Loss 3.7546   LearningRate 0.0034   Epoch: 16   Global Step: 272000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:32:00,167-[lfw][272000]XNorm: 6.936082
Training: 2022-04-11 22:32:00,168-[lfw][272000]Accuracy-Flip: 0.99683+-0.00311
Training: 2022-04-11 22:32:00,168-[lfw][272000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:32:25,642-[cfp_fp][272000]XNorm: 6.039087
Training: 2022-04-11 22:32:25,643-[cfp_fp][272000]Accuracy-Flip: 0.97029+-0.00887
Training: 2022-04-11 22:32:25,643-[cfp_fp][272000]Accuracy-Highest: 0.97171
Training: 2022-04-11 22:32:47,668-[agedb_30][272000]XNorm: 6.750198
Training: 2022-04-11 22:32:47,669-[agedb_30][272000]Accuracy-Flip: 0.97150+-0.00880
Training: 2022-04-11 22:32:47,669-[agedb_30][272000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:32:48,794-Speed 144.82 samples/sec   Loss 3.7791   LearningRate 0.0034   Epoch: 16   Global Step: 272010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:32:49,884-Speed 9397.35 samples/sec   Loss 3.7800   LearningRate 0.0034   Epoch: 16   Global Step: 272020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:32:51,013-Speed 9080.67 samples/sec   Loss 3.7325   LearningRate 0.0034   Epoch: 16   Global Step: 272030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:32:52,145-Speed 9044.80 samples/sec   Loss 3.8428   LearningRate 0.0034   Epoch: 16   Global Step: 272040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:32:53,229-Speed 9453.48 samples/sec   Loss 3.8092   LearningRate 0.0034   Epoch: 16   Global Step: 272050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:32:54,341-Speed 9210.05 samples/sec   Loss 3.8569   LearningRate 0.0034   Epoch: 16   Global Step: 272060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:32:55,427-Speed 9439.86 samples/sec   Loss 3.7896   LearningRate 0.0034   Epoch: 16   Global Step: 272070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:32:56,548-Speed 9143.55 samples/sec   Loss 3.7917   LearningRate 0.0034   Epoch: 16   Global Step: 272080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:32:57,726-Speed 8695.97 samples/sec   Loss 3.7697   LearningRate 0.0034   Epoch: 16   Global Step: 272090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:32:58,860-Speed 9032.99 samples/sec   Loss 3.8176   LearningRate 0.0034   Epoch: 16   Global Step: 272100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:32:59,979-Speed 9160.52 samples/sec   Loss 3.8448   LearningRate 0.0034   Epoch: 16   Global Step: 272110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:01,085-Speed 9262.05 samples/sec   Loss 3.8098   LearningRate 0.0034   Epoch: 16   Global Step: 272120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:02,167-Speed 9467.37 samples/sec   Loss 3.8119   LearningRate 0.0034   Epoch: 16   Global Step: 272130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:03,292-Speed 9105.38 samples/sec   Loss 3.8286   LearningRate 0.0034   Epoch: 16   Global Step: 272140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:04,443-Speed 8905.55 samples/sec   Loss 3.7243   LearningRate 0.0034   Epoch: 16   Global Step: 272150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:05,577-Speed 9032.36 samples/sec   Loss 3.8244   LearningRate 0.0034   Epoch: 16   Global Step: 272160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:06,686-Speed 9239.36 samples/sec   Loss 3.8111   LearningRate 0.0034   Epoch: 16   Global Step: 272170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:07,801-Speed 9189.19 samples/sec   Loss 3.7669   LearningRate 0.0034   Epoch: 16   Global Step: 272180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:08,910-Speed 9237.09 samples/sec   Loss 3.8070   LearningRate 0.0034   Epoch: 16   Global Step: 272190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:09,990-Speed 9489.82 samples/sec   Loss 3.8460   LearningRate 0.0034   Epoch: 16   Global Step: 272200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:11,082-Speed 9387.10 samples/sec   Loss 3.8978   LearningRate 0.0034   Epoch: 16   Global Step: 272210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:12,207-Speed 9106.21 samples/sec   Loss 3.8799   LearningRate 0.0034   Epoch: 16   Global Step: 272220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:13,350-Speed 8963.29 samples/sec   Loss 3.8452   LearningRate 0.0034   Epoch: 16   Global Step: 272230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:14,492-Speed 8972.69 samples/sec   Loss 3.7783   LearningRate 0.0034   Epoch: 16   Global Step: 272240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:15,602-Speed 9231.75 samples/sec   Loss 3.8779   LearningRate 0.0034   Epoch: 16   Global Step: 272250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:16,723-Speed 9141.81 samples/sec   Loss 3.8734   LearningRate 0.0034   Epoch: 16   Global Step: 272260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:17,804-Speed 9483.80 samples/sec   Loss 3.7947   LearningRate 0.0034   Epoch: 16   Global Step: 272270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:18,965-Speed 8823.96 samples/sec   Loss 3.8151   LearningRate 0.0034   Epoch: 16   Global Step: 272280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:20,083-Speed 9159.68 samples/sec   Loss 3.7444   LearningRate 0.0034   Epoch: 16   Global Step: 272290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:21,201-Speed 9164.42 samples/sec   Loss 3.7155   LearningRate 0.0034   Epoch: 16   Global Step: 272300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:22,321-Speed 9154.27 samples/sec   Loss 3.8495   LearningRate 0.0034   Epoch: 16   Global Step: 272310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:23,434-Speed 9204.85 samples/sec   Loss 3.8571   LearningRate 0.0034   Epoch: 16   Global Step: 272320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:24,510-Speed 9524.71 samples/sec   Loss 3.7675   LearningRate 0.0034   Epoch: 16   Global Step: 272330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:25,635-Speed 9106.85 samples/sec   Loss 3.8138   LearningRate 0.0034   Epoch: 16   Global Step: 272340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:26,721-Speed 9433.14 samples/sec   Loss 3.7456   LearningRate 0.0034   Epoch: 16   Global Step: 272350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:27,850-Speed 9075.96 samples/sec   Loss 3.6946   LearningRate 0.0034   Epoch: 16   Global Step: 272360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:28,959-Speed 9241.86 samples/sec   Loss 3.8668   LearningRate 0.0034   Epoch: 16   Global Step: 272370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:30,022-Speed 9638.19 samples/sec   Loss 3.8044   LearningRate 0.0034   Epoch: 16   Global Step: 272380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:31,111-Speed 9408.14 samples/sec   Loss 3.7887   LearningRate 0.0034   Epoch: 16   Global Step: 272390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:32,222-Speed 9227.83 samples/sec   Loss 3.9154   LearningRate 0.0034   Epoch: 16   Global Step: 272400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:33,381-Speed 8833.67 samples/sec   Loss 3.8039   LearningRate 0.0034   Epoch: 16   Global Step: 272410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:34,450-Speed 9588.88 samples/sec   Loss 3.8658   LearningRate 0.0034   Epoch: 16   Global Step: 272420   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:35,546-Speed 9350.57 samples/sec   Loss 3.8460   LearningRate 0.0034   Epoch: 16   Global Step: 272430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:36,663-Speed 9169.99 samples/sec   Loss 3.7799   LearningRate 0.0034   Epoch: 16   Global Step: 272440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:37,803-Speed 8987.99 samples/sec   Loss 3.7885   LearningRate 0.0034   Epoch: 16   Global Step: 272450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:38,924-Speed 9141.24 samples/sec   Loss 3.7894   LearningRate 0.0034   Epoch: 16   Global Step: 272460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:40,032-Speed 9249.69 samples/sec   Loss 3.7441   LearningRate 0.0034   Epoch: 16   Global Step: 272470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:41,160-Speed 9084.90 samples/sec   Loss 3.6951   LearningRate 0.0034   Epoch: 16   Global Step: 272480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:42,226-Speed 9610.33 samples/sec   Loss 3.7101   LearningRate 0.0034   Epoch: 16   Global Step: 272490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:43,319-Speed 9370.45 samples/sec   Loss 3.7658   LearningRate 0.0034   Epoch: 16   Global Step: 272500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:44,415-Speed 9358.21 samples/sec   Loss 3.7748   LearningRate 0.0034   Epoch: 16   Global Step: 272510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:45,499-Speed 9443.87 samples/sec   Loss 3.7408   LearningRate 0.0034   Epoch: 16   Global Step: 272520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:46,615-Speed 9184.05 samples/sec   Loss 3.8157   LearningRate 0.0034   Epoch: 16   Global Step: 272530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:47,740-Speed 9102.25 samples/sec   Loss 3.8210   LearningRate 0.0034   Epoch: 16   Global Step: 272540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:48,870-Speed 9071.90 samples/sec   Loss 3.8433   LearningRate 0.0034   Epoch: 16   Global Step: 272550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:49,993-Speed 9122.06 samples/sec   Loss 3.8965   LearningRate 0.0034   Epoch: 16   Global Step: 272560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:33:51,109-Speed 9185.63 samples/sec   Loss 3.8782   LearningRate 0.0034   Epoch: 16   Global Step: 272570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:52,258-Speed 8919.44 samples/sec   Loss 3.8077   LearningRate 0.0034   Epoch: 16   Global Step: 272580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:53,405-Speed 8932.54 samples/sec   Loss 3.7452   LearningRate 0.0034   Epoch: 16   Global Step: 272590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:54,519-Speed 9197.97 samples/sec   Loss 3.9018   LearningRate 0.0034   Epoch: 16   Global Step: 272600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:55,611-Speed 9385.03 samples/sec   Loss 3.8516   LearningRate 0.0034   Epoch: 16   Global Step: 272610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:56,781-Speed 8753.58 samples/sec   Loss 3.8418   LearningRate 0.0034   Epoch: 16   Global Step: 272620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:57,906-Speed 9105.22 samples/sec   Loss 3.7747   LearningRate 0.0034   Epoch: 16   Global Step: 272630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:33:59,044-Speed 9002.43 samples/sec   Loss 3.8492   LearningRate 0.0034   Epoch: 16   Global Step: 272640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:00,158-Speed 9197.02 samples/sec   Loss 3.8197   LearningRate 0.0034   Epoch: 16   Global Step: 272650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:01,275-Speed 9173.23 samples/sec   Loss 3.8283   LearningRate 0.0034   Epoch: 16   Global Step: 272660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:02,362-Speed 9428.49 samples/sec   Loss 3.8503   LearningRate 0.0034   Epoch: 16   Global Step: 272670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:03,492-Speed 9070.67 samples/sec   Loss 3.7931   LearningRate 0.0034   Epoch: 16   Global Step: 272680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:04,607-Speed 9184.54 samples/sec   Loss 3.8314   LearningRate 0.0034   Epoch: 16   Global Step: 272690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:05,747-Speed 8993.15 samples/sec   Loss 3.7808   LearningRate 0.0034   Epoch: 16   Global Step: 272700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:06,843-Speed 9345.92 samples/sec   Loss 3.7610   LearningRate 0.0034   Epoch: 16   Global Step: 272710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:07,918-Speed 9531.62 samples/sec   Loss 3.8051   LearningRate 0.0034   Epoch: 16   Global Step: 272720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:09,018-Speed 9311.58 samples/sec   Loss 3.7978   LearningRate 0.0033   Epoch: 16   Global Step: 272730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:10,149-Speed 9061.08 samples/sec   Loss 3.7142   LearningRate 0.0033   Epoch: 16   Global Step: 272740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:11,235-Speed 9431.97 samples/sec   Loss 3.7978   LearningRate 0.0033   Epoch: 16   Global Step: 272750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:12,439-Speed 8510.08 samples/sec   Loss 3.8352   LearningRate 0.0033   Epoch: 16   Global Step: 272760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:13,546-Speed 9259.31 samples/sec   Loss 3.8039   LearningRate 0.0033   Epoch: 16   Global Step: 272770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:14,677-Speed 9061.59 samples/sec   Loss 3.8212   LearningRate 0.0033   Epoch: 16   Global Step: 272780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:15,781-Speed 9278.45 samples/sec   Loss 3.8104   LearningRate 0.0033   Epoch: 16   Global Step: 272790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:16,946-Speed 8790.96 samples/sec   Loss 3.8230   LearningRate 0.0033   Epoch: 16   Global Step: 272800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:18,059-Speed 9209.85 samples/sec   Loss 3.8519   LearningRate 0.0033   Epoch: 16   Global Step: 272810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:19,185-Speed 9101.45 samples/sec   Loss 3.8131   LearningRate 0.0033   Epoch: 16   Global Step: 272820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:20,261-Speed 9516.82 samples/sec   Loss 3.7817   LearningRate 0.0033   Epoch: 16   Global Step: 272830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:21,426-Speed 8797.24 samples/sec   Loss 3.8443   LearningRate 0.0033   Epoch: 16   Global Step: 272840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:22,522-Speed 9355.89 samples/sec   Loss 3.9326   LearningRate 0.0033   Epoch: 16   Global Step: 272850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:23,706-Speed 8651.04 samples/sec   Loss 3.8368   LearningRate 0.0033   Epoch: 16   Global Step: 272860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:24,816-Speed 9231.05 samples/sec   Loss 3.7488   LearningRate 0.0033   Epoch: 16   Global Step: 272870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:25,913-Speed 9334.93 samples/sec   Loss 3.8189   LearningRate 0.0033   Epoch: 16   Global Step: 272880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:27,031-Speed 9167.82 samples/sec   Loss 3.7768   LearningRate 0.0033   Epoch: 16   Global Step: 272890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:28,164-Speed 9042.93 samples/sec   Loss 3.8289   LearningRate 0.0033   Epoch: 16   Global Step: 272900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:29,273-Speed 9237.07 samples/sec   Loss 3.8223   LearningRate 0.0033   Epoch: 16   Global Step: 272910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:30,377-Speed 9284.62 samples/sec   Loss 3.7604   LearningRate 0.0033   Epoch: 16   Global Step: 272920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:31,531-Speed 8885.05 samples/sec   Loss 3.8744   LearningRate 0.0033   Epoch: 16   Global Step: 272930   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:32,653-Speed 9132.45 samples/sec   Loss 3.7476   LearningRate 0.0033   Epoch: 16   Global Step: 272940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:33,769-Speed 9194.97 samples/sec   Loss 3.8737   LearningRate 0.0033   Epoch: 16   Global Step: 272950   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:34,889-Speed 9150.06 samples/sec   Loss 3.8233   LearningRate 0.0033   Epoch: 16   Global Step: 272960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:35,983-Speed 9365.99 samples/sec   Loss 3.9168   LearningRate 0.0033   Epoch: 16   Global Step: 272970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:37,065-Speed 9471.50 samples/sec   Loss 3.7787   LearningRate 0.0033   Epoch: 16   Global Step: 272980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:38,140-Speed 9530.26 samples/sec   Loss 3.8313   LearningRate 0.0033   Epoch: 16   Global Step: 272990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:39,334-Speed 8579.42 samples/sec   Loss 3.8905   LearningRate 0.0033   Epoch: 16   Global Step: 273000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:40,393-Speed 9672.51 samples/sec   Loss 3.7193   LearningRate 0.0033   Epoch: 16   Global Step: 273010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:41,540-Speed 8931.47 samples/sec   Loss 3.8071   LearningRate 0.0033   Epoch: 16   Global Step: 273020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:42,666-Speed 9100.93 samples/sec   Loss 3.7703   LearningRate 0.0033   Epoch: 16   Global Step: 273030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:43,788-Speed 9131.36 samples/sec   Loss 3.8740   LearningRate 0.0033   Epoch: 16   Global Step: 273040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:44,949-Speed 8826.77 samples/sec   Loss 3.8666   LearningRate 0.0033   Epoch: 16   Global Step: 273050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:46,099-Speed 8910.02 samples/sec   Loss 3.8305   LearningRate 0.0033   Epoch: 16   Global Step: 273060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:47,216-Speed 9172.82 samples/sec   Loss 3.7998   LearningRate 0.0033   Epoch: 16   Global Step: 273070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:48,321-Speed 9273.42 samples/sec   Loss 3.8185   LearningRate 0.0033   Epoch: 16   Global Step: 273080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:49,433-Speed 9213.14 samples/sec   Loss 3.8107   LearningRate 0.0033   Epoch: 16   Global Step: 273090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:34:50,517-Speed 9455.42 samples/sec   Loss 3.8036   LearningRate 0.0033   Epoch: 16   Global Step: 273100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:51,611-Speed 9363.28 samples/sec   Loss 3.7348   LearningRate 0.0033   Epoch: 16   Global Step: 273110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:52,712-Speed 9316.46 samples/sec   Loss 3.8412   LearningRate 0.0033   Epoch: 16   Global Step: 273120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:53,795-Speed 9456.24 samples/sec   Loss 3.8156   LearningRate 0.0033   Epoch: 16   Global Step: 273130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:54,870-Speed 9533.54 samples/sec   Loss 3.7223   LearningRate 0.0033   Epoch: 16   Global Step: 273140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:56,053-Speed 8658.16 samples/sec   Loss 3.7075   LearningRate 0.0033   Epoch: 16   Global Step: 273150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:57,213-Speed 8834.34 samples/sec   Loss 3.8389   LearningRate 0.0033   Epoch: 16   Global Step: 273160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:58,306-Speed 9375.84 samples/sec   Loss 3.8895   LearningRate 0.0033   Epoch: 16   Global Step: 273170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:34:59,423-Speed 9175.64 samples/sec   Loss 3.7560   LearningRate 0.0033   Epoch: 16   Global Step: 273180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:00,519-Speed 9348.27 samples/sec   Loss 3.8329   LearningRate 0.0033   Epoch: 16   Global Step: 273190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:01,674-Speed 8866.17 samples/sec   Loss 3.7767   LearningRate 0.0033   Epoch: 16   Global Step: 273200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:02,802-Speed 9086.70 samples/sec   Loss 3.8474   LearningRate 0.0033   Epoch: 16   Global Step: 273210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:03,895-Speed 9372.88 samples/sec   Loss 3.7995   LearningRate 0.0033   Epoch: 16   Global Step: 273220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:05,018-Speed 9124.51 samples/sec   Loss 3.7858   LearningRate 0.0033   Epoch: 16   Global Step: 273230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:06,180-Speed 8816.83 samples/sec   Loss 3.8758   LearningRate 0.0033   Epoch: 16   Global Step: 273240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:07,296-Speed 9183.36 samples/sec   Loss 3.7206   LearningRate 0.0033   Epoch: 16   Global Step: 273250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:08,443-Speed 8933.58 samples/sec   Loss 3.8128   LearningRate 0.0033   Epoch: 16   Global Step: 273260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:09,604-Speed 8820.76 samples/sec   Loss 3.8735   LearningRate 0.0033   Epoch: 16   Global Step: 273270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:10,696-Speed 9380.48 samples/sec   Loss 3.8377   LearningRate 0.0033   Epoch: 16   Global Step: 273280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:11,834-Speed 9005.02 samples/sec   Loss 3.7971   LearningRate 0.0033   Epoch: 16   Global Step: 273290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:12,968-Speed 9033.11 samples/sec   Loss 3.8359   LearningRate 0.0033   Epoch: 16   Global Step: 273300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:14,101-Speed 9042.69 samples/sec   Loss 3.8081   LearningRate 0.0033   Epoch: 16   Global Step: 273310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:15,214-Speed 9214.85 samples/sec   Loss 3.7573   LearningRate 0.0033   Epoch: 16   Global Step: 273320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:16,317-Speed 9289.00 samples/sec   Loss 3.8139   LearningRate 0.0033   Epoch: 16   Global Step: 273330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:17,474-Speed 8855.88 samples/sec   Loss 3.8613   LearningRate 0.0033   Epoch: 16   Global Step: 273340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:18,623-Speed 8909.91 samples/sec   Loss 3.7964   LearningRate 0.0033   Epoch: 16   Global Step: 273350   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:19,713-Speed 9406.47 samples/sec   Loss 3.9223   LearningRate 0.0033   Epoch: 16   Global Step: 273360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:20,793-Speed 9480.32 samples/sec   Loss 3.8334   LearningRate 0.0033   Epoch: 16   Global Step: 273370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:21,936-Speed 8966.43 samples/sec   Loss 3.8694   LearningRate 0.0033   Epoch: 16   Global Step: 273380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:23,054-Speed 9164.22 samples/sec   Loss 3.7634   LearningRate 0.0033   Epoch: 16   Global Step: 273390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:24,171-Speed 9173.25 samples/sec   Loss 3.8142   LearningRate 0.0033   Epoch: 16   Global Step: 273400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:25,307-Speed 9017.89 samples/sec   Loss 3.8976   LearningRate 0.0033   Epoch: 16   Global Step: 273410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:26,401-Speed 9369.87 samples/sec   Loss 3.8463   LearningRate 0.0033   Epoch: 16   Global Step: 273420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:27,513-Speed 9218.94 samples/sec   Loss 3.8586   LearningRate 0.0033   Epoch: 16   Global Step: 273430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:28,581-Speed 9587.36 samples/sec   Loss 3.7175   LearningRate 0.0033   Epoch: 16   Global Step: 273440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:29,662-Speed 9482.61 samples/sec   Loss 3.7942   LearningRate 0.0033   Epoch: 16   Global Step: 273450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:30,747-Speed 9437.38 samples/sec   Loss 3.8568   LearningRate 0.0033   Epoch: 16   Global Step: 273460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:31,849-Speed 9317.78 samples/sec   Loss 3.8397   LearningRate 0.0033   Epoch: 16   Global Step: 273470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:32,949-Speed 9316.23 samples/sec   Loss 3.8012   LearningRate 0.0033   Epoch: 16   Global Step: 273480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:34,017-Speed 9592.15 samples/sec   Loss 3.8636   LearningRate 0.0033   Epoch: 16   Global Step: 273490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:35,139-Speed 9132.03 samples/sec   Loss 3.8707   LearningRate 0.0033   Epoch: 16   Global Step: 273500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:36,233-Speed 9360.66 samples/sec   Loss 3.7988   LearningRate 0.0033   Epoch: 16   Global Step: 273510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:37,337-Speed 9286.47 samples/sec   Loss 3.7757   LearningRate 0.0033   Epoch: 16   Global Step: 273520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:38,518-Speed 8676.92 samples/sec   Loss 3.8050   LearningRate 0.0033   Epoch: 16   Global Step: 273530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:39,591-Speed 9548.48 samples/sec   Loss 3.7909   LearningRate 0.0033   Epoch: 16   Global Step: 273540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:40,637-Speed 9792.50 samples/sec   Loss 3.8548   LearningRate 0.0033   Epoch: 16   Global Step: 273550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:41,753-Speed 9181.53 samples/sec   Loss 3.8213   LearningRate 0.0033   Epoch: 16   Global Step: 273560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:42,864-Speed 9218.93 samples/sec   Loss 3.8094   LearningRate 0.0033   Epoch: 16   Global Step: 273570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:43,975-Speed 9229.35 samples/sec   Loss 3.8649   LearningRate 0.0033   Epoch: 16   Global Step: 273580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:45,049-Speed 9544.32 samples/sec   Loss 3.8024   LearningRate 0.0033   Epoch: 16   Global Step: 273590   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:46,120-Speed 9568.22 samples/sec   Loss 3.7618   LearningRate 0.0033   Epoch: 16   Global Step: 273600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:35:47,203-Speed 9463.52 samples/sec   Loss 3.7505   LearningRate 0.0033   Epoch: 16   Global Step: 273610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:48,292-Speed 9408.59 samples/sec   Loss 3.7672   LearningRate 0.0033   Epoch: 16   Global Step: 273620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:49,396-Speed 9279.36 samples/sec   Loss 3.7473   LearningRate 0.0033   Epoch: 16   Global Step: 273630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:50,588-Speed 8597.17 samples/sec   Loss 3.7689   LearningRate 0.0032   Epoch: 16   Global Step: 273640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:51,728-Speed 8989.95 samples/sec   Loss 3.8494   LearningRate 0.0032   Epoch: 16   Global Step: 273650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:52,875-Speed 8931.83 samples/sec   Loss 3.8320   LearningRate 0.0032   Epoch: 16   Global Step: 273660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:54,057-Speed 8663.15 samples/sec   Loss 3.8896   LearningRate 0.0032   Epoch: 16   Global Step: 273670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:55,229-Speed 8748.31 samples/sec   Loss 3.8016   LearningRate 0.0032   Epoch: 16   Global Step: 273680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:56,349-Speed 9143.38 samples/sec   Loss 3.7956   LearningRate 0.0032   Epoch: 16   Global Step: 273690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:57,458-Speed 9239.60 samples/sec   Loss 3.7329   LearningRate 0.0032   Epoch: 16   Global Step: 273700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:58,515-Speed 9693.09 samples/sec   Loss 3.8371   LearningRate 0.0032   Epoch: 16   Global Step: 273710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:35:59,687-Speed 8746.64 samples/sec   Loss 3.8010   LearningRate 0.0032   Epoch: 16   Global Step: 273720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:00,821-Speed 9031.84 samples/sec   Loss 3.8885   LearningRate 0.0032   Epoch: 16   Global Step: 273730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:01,953-Speed 9051.61 samples/sec   Loss 3.9324   LearningRate 0.0032   Epoch: 16   Global Step: 273740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:03,051-Speed 9336.19 samples/sec   Loss 3.8389   LearningRate 0.0032   Epoch: 16   Global Step: 273750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:04,171-Speed 9156.05 samples/sec   Loss 3.8474   LearningRate 0.0032   Epoch: 16   Global Step: 273760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:05,335-Speed 8797.78 samples/sec   Loss 3.9005   LearningRate 0.0032   Epoch: 16   Global Step: 273770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:06,443-Speed 9245.16 samples/sec   Loss 3.7495   LearningRate 0.0032   Epoch: 16   Global Step: 273780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:07,575-Speed 9052.01 samples/sec   Loss 3.8211   LearningRate 0.0032   Epoch: 16   Global Step: 273790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:08,693-Speed 9168.72 samples/sec   Loss 3.8035   LearningRate 0.0032   Epoch: 16   Global Step: 273800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:09,799-Speed 9258.36 samples/sec   Loss 3.7353   LearningRate 0.0032   Epoch: 16   Global Step: 273810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:36:10,875-Speed 9523.97 samples/sec   Loss 3.7721   LearningRate 0.0032   Epoch: 16   Global Step: 273820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:36:11,977-Speed 9296.15 samples/sec   Loss 3.7303   LearningRate 0.0032   Epoch: 16   Global Step: 273830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:13,112-Speed 9032.70 samples/sec   Loss 3.8234   LearningRate 0.0032   Epoch: 16   Global Step: 273840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:14,220-Speed 9245.09 samples/sec   Loss 3.8370   LearningRate 0.0032   Epoch: 16   Global Step: 273850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:15,337-Speed 9172.78 samples/sec   Loss 3.8008   LearningRate 0.0032   Epoch: 16   Global Step: 273860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:16,459-Speed 9132.51 samples/sec   Loss 3.8309   LearningRate 0.0032   Epoch: 16   Global Step: 273870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:17,535-Speed 9525.46 samples/sec   Loss 3.7907   LearningRate 0.0032   Epoch: 16   Global Step: 273880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:18,670-Speed 9022.27 samples/sec   Loss 3.8052   LearningRate 0.0032   Epoch: 16   Global Step: 273890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:19,788-Speed 9166.86 samples/sec   Loss 3.8122   LearningRate 0.0032   Epoch: 16   Global Step: 273900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:20,897-Speed 9240.68 samples/sec   Loss 3.8468   LearningRate 0.0032   Epoch: 16   Global Step: 273910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:21,996-Speed 9319.51 samples/sec   Loss 3.8558   LearningRate 0.0032   Epoch: 16   Global Step: 273920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:36:23,091-Speed 9360.94 samples/sec   Loss 3.8098   LearningRate 0.0032   Epoch: 16   Global Step: 273930   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:36:24,170-Speed 9494.62 samples/sec   Loss 3.7908   LearningRate 0.0032   Epoch: 16   Global Step: 273940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:36:25,274-Speed 9279.56 samples/sec   Loss 3.7114   LearningRate 0.0032   Epoch: 16   Global Step: 273950   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:36:26,397-Speed 9126.78 samples/sec   Loss 3.7893   LearningRate 0.0032   Epoch: 16   Global Step: 273960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:36:27,492-Speed 9355.54 samples/sec   Loss 3.8596   LearningRate 0.0032   Epoch: 16   Global Step: 273970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:36:28,595-Speed 9287.34 samples/sec   Loss 3.8369   LearningRate 0.0032   Epoch: 16   Global Step: 273980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:36:29,719-Speed 9114.43 samples/sec   Loss 3.8530   LearningRate 0.0032   Epoch: 16   Global Step: 273990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:36:30,858-Speed 8999.62 samples/sec   Loss 3.8803   LearningRate 0.0032   Epoch: 16   Global Step: 274000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:36:53,035-[lfw][274000]XNorm: 6.892952
Training: 2022-04-11 22:36:53,035-[lfw][274000]Accuracy-Flip: 0.99717+-0.00308
Training: 2022-04-11 22:36:53,036-[lfw][274000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:37:18,574-[cfp_fp][274000]XNorm: 5.968948
Training: 2022-04-11 22:37:18,574-[cfp_fp][274000]Accuracy-Flip: 0.97157+-0.00794
Training: 2022-04-11 22:37:18,575-[cfp_fp][274000]Accuracy-Highest: 0.97171
Training: 2022-04-11 22:37:40,607-[agedb_30][274000]XNorm: 6.717413
Training: 2022-04-11 22:37:40,607-[agedb_30][274000]Accuracy-Flip: 0.97050+-0.00966
Training: 2022-04-11 22:37:40,608-[agedb_30][274000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:37:41,737-Speed 144.47 samples/sec   Loss 3.8143   LearningRate 0.0032   Epoch: 16   Global Step: 274010   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:37:42,781-Speed 9816.50 samples/sec   Loss 3.8650   LearningRate 0.0032   Epoch: 16   Global Step: 274020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:43,889-Speed 9241.46 samples/sec   Loss 3.8661   LearningRate 0.0032   Epoch: 16   Global Step: 274030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:44,999-Speed 9233.86 samples/sec   Loss 3.8201   LearningRate 0.0032   Epoch: 16   Global Step: 274040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:46,059-Speed 9669.37 samples/sec   Loss 3.8441   LearningRate 0.0032   Epoch: 16   Global Step: 274050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:47,163-Speed 9280.33 samples/sec   Loss 3.9227   LearningRate 0.0032   Epoch: 16   Global Step: 274060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:48,242-Speed 9493.11 samples/sec   Loss 3.8145   LearningRate 0.0032   Epoch: 16   Global Step: 274070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:49,342-Speed 9321.49 samples/sec   Loss 3.7796   LearningRate 0.0032   Epoch: 16   Global Step: 274080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:50,482-Speed 8984.54 samples/sec   Loss 3.8085   LearningRate 0.0032   Epoch: 16   Global Step: 274090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:51,580-Speed 9333.55 samples/sec   Loss 3.8614   LearningRate 0.0032   Epoch: 16   Global Step: 274100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:52,712-Speed 9052.88 samples/sec   Loss 3.8404   LearningRate 0.0032   Epoch: 16   Global Step: 274110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:53,848-Speed 9019.00 samples/sec   Loss 3.8200   LearningRate 0.0032   Epoch: 16   Global Step: 274120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:37:55,045-Speed 8557.22 samples/sec   Loss 3.8289   LearningRate 0.0032   Epoch: 16   Global Step: 274130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:37:56,136-Speed 9392.05 samples/sec   Loss 3.8610   LearningRate 0.0032   Epoch: 16   Global Step: 274140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:57,220-Speed 9454.37 samples/sec   Loss 3.8743   LearningRate 0.0032   Epoch: 16   Global Step: 274150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:58,295-Speed 9531.09 samples/sec   Loss 3.8398   LearningRate 0.0032   Epoch: 16   Global Step: 274160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:37:59,451-Speed 8859.64 samples/sec   Loss 3.8392   LearningRate 0.0032   Epoch: 16   Global Step: 274170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:00,617-Speed 8785.15 samples/sec   Loss 3.8053   LearningRate 0.0032   Epoch: 16   Global Step: 274180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:01,696-Speed 9501.99 samples/sec   Loss 3.8567   LearningRate 0.0032   Epoch: 16   Global Step: 274190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:02,812-Speed 9177.42 samples/sec   Loss 3.7856   LearningRate 0.0032   Epoch: 16   Global Step: 274200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:03,980-Speed 8779.22 samples/sec   Loss 3.8758   LearningRate 0.0032   Epoch: 16   Global Step: 274210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:05,109-Speed 9070.61 samples/sec   Loss 3.8674   LearningRate 0.0032   Epoch: 16   Global Step: 274220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:06,240-Speed 9056.00 samples/sec   Loss 3.8508   LearningRate 0.0032   Epoch: 16   Global Step: 274230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:07,360-Speed 9153.32 samples/sec   Loss 3.8552   LearningRate 0.0032   Epoch: 16   Global Step: 274240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:08,523-Speed 8805.01 samples/sec   Loss 3.8379   LearningRate 0.0032   Epoch: 16   Global Step: 274250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:09,649-Speed 9102.77 samples/sec   Loss 3.8648   LearningRate 0.0032   Epoch: 16   Global Step: 274260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:10,776-Speed 9095.27 samples/sec   Loss 3.8749   LearningRate 0.0032   Epoch: 16   Global Step: 274270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:11,909-Speed 9042.16 samples/sec   Loss 3.7311   LearningRate 0.0032   Epoch: 16   Global Step: 274280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:13,026-Speed 9170.97 samples/sec   Loss 3.8184   LearningRate 0.0032   Epoch: 16   Global Step: 274290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:14,175-Speed 8916.96 samples/sec   Loss 3.8390   LearningRate 0.0032   Epoch: 16   Global Step: 274300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:15,271-Speed 9350.34 samples/sec   Loss 3.8519   LearningRate 0.0032   Epoch: 16   Global Step: 274310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:16,419-Speed 8926.70 samples/sec   Loss 3.8068   LearningRate 0.0032   Epoch: 16   Global Step: 274320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:17,522-Speed 9284.12 samples/sec   Loss 3.8112   LearningRate 0.0032   Epoch: 16   Global Step: 274330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:18,667-Speed 8955.10 samples/sec   Loss 3.8785   LearningRate 0.0032   Epoch: 16   Global Step: 274340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:19,799-Speed 9047.14 samples/sec   Loss 3.8197   LearningRate 0.0032   Epoch: 16   Global Step: 274350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:20,882-Speed 9458.34 samples/sec   Loss 3.7957   LearningRate 0.0032   Epoch: 16   Global Step: 274360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:22,056-Speed 8732.93 samples/sec   Loss 3.8744   LearningRate 0.0032   Epoch: 16   Global Step: 274370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:23,189-Speed 9036.51 samples/sec   Loss 3.9533   LearningRate 0.0032   Epoch: 16   Global Step: 274380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:24,364-Speed 8724.47 samples/sec   Loss 3.8657   LearningRate 0.0032   Epoch: 16   Global Step: 274390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:25,480-Speed 9178.13 samples/sec   Loss 3.8245   LearningRate 0.0032   Epoch: 16   Global Step: 274400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:26,574-Speed 9366.07 samples/sec   Loss 3.8712   LearningRate 0.0032   Epoch: 16   Global Step: 274410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:27,653-Speed 9494.56 samples/sec   Loss 3.8717   LearningRate 0.0032   Epoch: 16   Global Step: 274420   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:28,726-Speed 9553.69 samples/sec   Loss 3.8179   LearningRate 0.0032   Epoch: 16   Global Step: 274430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:29,834-Speed 9247.89 samples/sec   Loss 3.8553   LearningRate 0.0032   Epoch: 16   Global Step: 274440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:30,943-Speed 9232.80 samples/sec   Loss 3.8291   LearningRate 0.0032   Epoch: 16   Global Step: 274450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:32,073-Speed 9067.89 samples/sec   Loss 3.8346   LearningRate 0.0032   Epoch: 16   Global Step: 274460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:33,248-Speed 8723.04 samples/sec   Loss 3.8523   LearningRate 0.0032   Epoch: 16   Global Step: 274470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:34,358-Speed 9235.96 samples/sec   Loss 3.8337   LearningRate 0.0032   Epoch: 16   Global Step: 274480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:35,427-Speed 9580.12 samples/sec   Loss 3.8725   LearningRate 0.0032   Epoch: 16   Global Step: 274490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:36,601-Speed 8728.01 samples/sec   Loss 3.8163   LearningRate 0.0032   Epoch: 16   Global Step: 274500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:37,708-Speed 9258.20 samples/sec   Loss 3.7352   LearningRate 0.0032   Epoch: 16   Global Step: 274510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:38,807-Speed 9321.57 samples/sec   Loss 3.8354   LearningRate 0.0032   Epoch: 16   Global Step: 274520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:39,943-Speed 9020.72 samples/sec   Loss 3.8418   LearningRate 0.0032   Epoch: 16   Global Step: 274530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:41,082-Speed 8994.14 samples/sec   Loss 3.7364   LearningRate 0.0032   Epoch: 16   Global Step: 274540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:38:42,177-Speed 9352.20 samples/sec   Loss 3.8265   LearningRate 0.0032   Epoch: 16   Global Step: 274550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:43,287-Speed 9236.23 samples/sec   Loss 3.8087   LearningRate 0.0032   Epoch: 16   Global Step: 274560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:44,441-Speed 8873.58 samples/sec   Loss 3.8417   LearningRate 0.0032   Epoch: 16   Global Step: 274570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:45,630-Speed 8621.10 samples/sec   Loss 3.8684   LearningRate 0.0031   Epoch: 16   Global Step: 274580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:46,739-Speed 9239.15 samples/sec   Loss 3.9231   LearningRate 0.0031   Epoch: 16   Global Step: 274590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:38:47,823-Speed 9452.18 samples/sec   Loss 3.7636   LearningRate 0.0031   Epoch: 16   Global Step: 274600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:38:48,935-Speed 9219.50 samples/sec   Loss 3.7922   LearningRate 0.0031   Epoch: 16   Global Step: 274610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:38:50,029-Speed 9364.87 samples/sec   Loss 3.8567   LearningRate 0.0031   Epoch: 16   Global Step: 274620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:38:51,177-Speed 8919.82 samples/sec   Loss 3.8260   LearningRate 0.0031   Epoch: 16   Global Step: 274630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:38:52,252-Speed 9531.95 samples/sec   Loss 3.7376   LearningRate 0.0031   Epoch: 16   Global Step: 274640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:38:53,364-Speed 9213.74 samples/sec   Loss 3.8742   LearningRate 0.0031   Epoch: 16   Global Step: 274650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:38:54,530-Speed 8789.89 samples/sec   Loss 3.8613   LearningRate 0.0031   Epoch: 16   Global Step: 274660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:38:55,662-Speed 9048.92 samples/sec   Loss 3.8471   LearningRate 0.0031   Epoch: 16   Global Step: 274670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:38:56,806-Speed 8957.11 samples/sec   Loss 3.7781   LearningRate 0.0031   Epoch: 16   Global Step: 274680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:38:57,902-Speed 9345.09 samples/sec   Loss 3.8320   LearningRate 0.0031   Epoch: 16   Global Step: 274690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:38:59,068-Speed 8785.49 samples/sec   Loss 3.8010   LearningRate 0.0031   Epoch: 16   Global Step: 274700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:00,179-Speed 9220.62 samples/sec   Loss 3.7463   LearningRate 0.0031   Epoch: 16   Global Step: 274710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:01,297-Speed 9168.87 samples/sec   Loss 3.8537   LearningRate 0.0031   Epoch: 16   Global Step: 274720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:02,358-Speed 9651.34 samples/sec   Loss 3.8776   LearningRate 0.0031   Epoch: 16   Global Step: 274730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:03,439-Speed 9478.71 samples/sec   Loss 3.8195   LearningRate 0.0031   Epoch: 16   Global Step: 274740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:04,539-Speed 9318.28 samples/sec   Loss 3.8610   LearningRate 0.0031   Epoch: 16   Global Step: 274750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:05,674-Speed 9032.11 samples/sec   Loss 3.7814   LearningRate 0.0031   Epoch: 16   Global Step: 274760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:06,763-Speed 9408.27 samples/sec   Loss 3.8352   LearningRate 0.0031   Epoch: 16   Global Step: 274770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:07,872-Speed 9241.41 samples/sec   Loss 3.7549   LearningRate 0.0031   Epoch: 16   Global Step: 274780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:08,975-Speed 9288.94 samples/sec   Loss 3.8909   LearningRate 0.0031   Epoch: 16   Global Step: 274790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:10,073-Speed 9332.25 samples/sec   Loss 3.8255   LearningRate 0.0031   Epoch: 16   Global Step: 274800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:11,178-Speed 9273.65 samples/sec   Loss 3.7578   LearningRate 0.0031   Epoch: 16   Global Step: 274810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:12,294-Speed 9178.23 samples/sec   Loss 3.8769   LearningRate 0.0031   Epoch: 16   Global Step: 274820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:13,449-Speed 8873.27 samples/sec   Loss 3.9462   LearningRate 0.0031   Epoch: 16   Global Step: 274830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:14,619-Speed 8753.94 samples/sec   Loss 3.8606   LearningRate 0.0031   Epoch: 16   Global Step: 274840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:15,685-Speed 9614.17 samples/sec   Loss 3.8144   LearningRate 0.0031   Epoch: 16   Global Step: 274850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:16,821-Speed 9016.68 samples/sec   Loss 3.8677   LearningRate 0.0031   Epoch: 16   Global Step: 274860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:17,911-Speed 9401.25 samples/sec   Loss 3.8053   LearningRate 0.0031   Epoch: 16   Global Step: 274870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:18,996-Speed 9440.63 samples/sec   Loss 3.7804   LearningRate 0.0031   Epoch: 16   Global Step: 274880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:20,115-Speed 9163.07 samples/sec   Loss 3.8429   LearningRate 0.0031   Epoch: 16   Global Step: 274890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:21,245-Speed 9061.56 samples/sec   Loss 3.8099   LearningRate 0.0031   Epoch: 16   Global Step: 274900   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:39:22,390-Speed 8955.20 samples/sec   Loss 3.8736   LearningRate 0.0031   Epoch: 16   Global Step: 274910   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:39:23,490-Speed 9317.39 samples/sec   Loss 3.7794   LearningRate 0.0031   Epoch: 16   Global Step: 274920   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:39:24,623-Speed 9044.01 samples/sec   Loss 3.7779   LearningRate 0.0031   Epoch: 16   Global Step: 274930   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:39:25,769-Speed 8938.15 samples/sec   Loss 3.8808   LearningRate 0.0031   Epoch: 16   Global Step: 274940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:26,908-Speed 8997.60 samples/sec   Loss 3.8104   LearningRate 0.0031   Epoch: 16   Global Step: 274950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:28,023-Speed 9189.23 samples/sec   Loss 3.8464   LearningRate 0.0031   Epoch: 16   Global Step: 274960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:29,158-Speed 9024.49 samples/sec   Loss 3.8725   LearningRate 0.0031   Epoch: 16   Global Step: 274970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:30,293-Speed 9028.06 samples/sec   Loss 3.8110   LearningRate 0.0031   Epoch: 16   Global Step: 274980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:31,413-Speed 9147.09 samples/sec   Loss 3.8236   LearningRate 0.0031   Epoch: 16   Global Step: 274990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:32,551-Speed 9001.74 samples/sec   Loss 3.8648   LearningRate 0.0031   Epoch: 16   Global Step: 275000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:33,684-Speed 9047.02 samples/sec   Loss 3.8817   LearningRate 0.0031   Epoch: 16   Global Step: 275010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:34,849-Speed 8792.53 samples/sec   Loss 3.9426   LearningRate 0.0031   Epoch: 16   Global Step: 275020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:35,961-Speed 9218.79 samples/sec   Loss 3.9405   LearningRate 0.0031   Epoch: 16   Global Step: 275030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:37,046-Speed 9444.83 samples/sec   Loss 3.8438   LearningRate 0.0031   Epoch: 16   Global Step: 275040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:39:38,207-Speed 8824.04 samples/sec   Loss 3.7765   LearningRate 0.0031   Epoch: 16   Global Step: 275050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:39:39,320-Speed 9201.22 samples/sec   Loss 3.8159   LearningRate 0.0031   Epoch: 16   Global Step: 275060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:40,473-Speed 8889.07 samples/sec   Loss 3.7910   LearningRate 0.0031   Epoch: 16   Global Step: 275070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:41,612-Speed 8995.04 samples/sec   Loss 3.8727   LearningRate 0.0031   Epoch: 16   Global Step: 275080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:42,726-Speed 9193.16 samples/sec   Loss 3.8672   LearningRate 0.0031   Epoch: 16   Global Step: 275090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:43,873-Speed 8937.27 samples/sec   Loss 3.8299   LearningRate 0.0031   Epoch: 16   Global Step: 275100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:44,989-Speed 9183.88 samples/sec   Loss 3.8532   LearningRate 0.0031   Epoch: 16   Global Step: 275110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:46,093-Speed 9280.12 samples/sec   Loss 3.7781   LearningRate 0.0031   Epoch: 16   Global Step: 275120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:47,243-Speed 8904.39 samples/sec   Loss 3.7875   LearningRate 0.0031   Epoch: 16   Global Step: 275130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:48,332-Speed 9410.91 samples/sec   Loss 3.9310   LearningRate 0.0031   Epoch: 16   Global Step: 275140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:49,460-Speed 9084.80 samples/sec   Loss 3.8636   LearningRate 0.0031   Epoch: 16   Global Step: 275150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:50,583-Speed 9122.85 samples/sec   Loss 3.7395   LearningRate 0.0031   Epoch: 16   Global Step: 275160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:51,720-Speed 9011.68 samples/sec   Loss 3.8562   LearningRate 0.0031   Epoch: 16   Global Step: 275170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:52,838-Speed 9166.13 samples/sec   Loss 3.8243   LearningRate 0.0031   Epoch: 16   Global Step: 275180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:53,974-Speed 9023.43 samples/sec   Loss 3.8030   LearningRate 0.0031   Epoch: 16   Global Step: 275190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:55,109-Speed 9022.19 samples/sec   Loss 3.7513   LearningRate 0.0031   Epoch: 16   Global Step: 275200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:56,252-Speed 8960.53 samples/sec   Loss 3.9401   LearningRate 0.0031   Epoch: 16   Global Step: 275210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:57,392-Speed 8987.32 samples/sec   Loss 3.7834   LearningRate 0.0031   Epoch: 16   Global Step: 275220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:58,474-Speed 9477.10 samples/sec   Loss 3.7562   LearningRate 0.0031   Epoch: 16   Global Step: 275230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:39:59,628-Speed 8870.67 samples/sec   Loss 3.8307   LearningRate 0.0031   Epoch: 16   Global Step: 275240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:00,753-Speed 9111.52 samples/sec   Loss 3.8095   LearningRate 0.0031   Epoch: 16   Global Step: 275250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:01,890-Speed 9016.16 samples/sec   Loss 3.8277   LearningRate 0.0031   Epoch: 16   Global Step: 275260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:03,002-Speed 9209.96 samples/sec   Loss 3.8637   LearningRate 0.0031   Epoch: 16   Global Step: 275270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:04,158-Speed 8863.94 samples/sec   Loss 3.8592   LearningRate 0.0031   Epoch: 16   Global Step: 275280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:05,263-Speed 9278.21 samples/sec   Loss 3.7921   LearningRate 0.0031   Epoch: 16   Global Step: 275290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:06,387-Speed 9115.16 samples/sec   Loss 3.7680   LearningRate 0.0031   Epoch: 16   Global Step: 275300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:07,494-Speed 9254.53 samples/sec   Loss 3.8839   LearningRate 0.0031   Epoch: 16   Global Step: 275310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:08,568-Speed 9538.38 samples/sec   Loss 3.8952   LearningRate 0.0031   Epoch: 16   Global Step: 275320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:09,648-Speed 9482.54 samples/sec   Loss 3.8605   LearningRate 0.0031   Epoch: 16   Global Step: 275330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:10,734-Speed 9439.20 samples/sec   Loss 3.8262   LearningRate 0.0031   Epoch: 16   Global Step: 275340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:11,863-Speed 9068.13 samples/sec   Loss 3.7867   LearningRate 0.0031   Epoch: 16   Global Step: 275350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:13,055-Speed 8599.08 samples/sec   Loss 3.7444   LearningRate 0.0031   Epoch: 16   Global Step: 275360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:14,184-Speed 9074.13 samples/sec   Loss 3.8635   LearningRate 0.0031   Epoch: 16   Global Step: 275370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:15,249-Speed 9620.15 samples/sec   Loss 3.8622   LearningRate 0.0031   Epoch: 16   Global Step: 275380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:16,387-Speed 9003.95 samples/sec   Loss 3.7997   LearningRate 0.0031   Epoch: 16   Global Step: 275390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:17,495-Speed 9245.12 samples/sec   Loss 3.8313   LearningRate 0.0031   Epoch: 16   Global Step: 275400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:18,571-Speed 9525.95 samples/sec   Loss 3.7777   LearningRate 0.0031   Epoch: 16   Global Step: 275410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:19,664-Speed 9375.60 samples/sec   Loss 3.9297   LearningRate 0.0031   Epoch: 16   Global Step: 275420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:20,749-Speed 9446.27 samples/sec   Loss 3.9260   LearningRate 0.0031   Epoch: 16   Global Step: 275430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:21,827-Speed 9505.40 samples/sec   Loss 3.7532   LearningRate 0.0031   Epoch: 16   Global Step: 275440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:22,954-Speed 9091.07 samples/sec   Loss 3.7613   LearningRate 0.0031   Epoch: 16   Global Step: 275450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:24,067-Speed 9205.18 samples/sec   Loss 3.8520   LearningRate 0.0031   Epoch: 16   Global Step: 275460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:25,183-Speed 9182.24 samples/sec   Loss 3.8824   LearningRate 0.0031   Epoch: 16   Global Step: 275470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:26,304-Speed 9138.59 samples/sec   Loss 3.8004   LearningRate 0.0031   Epoch: 16   Global Step: 275480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:27,473-Speed 8767.55 samples/sec   Loss 3.8966   LearningRate 0.0031   Epoch: 16   Global Step: 275490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:28,598-Speed 9100.65 samples/sec   Loss 3.8232   LearningRate 0.0031   Epoch: 16   Global Step: 275500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:29,681-Speed 9468.69 samples/sec   Loss 3.8716   LearningRate 0.0031   Epoch: 16   Global Step: 275510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:30,801-Speed 9141.10 samples/sec   Loss 3.8454   LearningRate 0.0031   Epoch: 16   Global Step: 275520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:31,879-Speed 9509.10 samples/sec   Loss 3.8536   LearningRate 0.0030   Epoch: 16   Global Step: 275530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:32,987-Speed 9249.39 samples/sec   Loss 3.8165   LearningRate 0.0030   Epoch: 16   Global Step: 275540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:34,071-Speed 9446.69 samples/sec   Loss 3.9376   LearningRate 0.0030   Epoch: 16   Global Step: 275550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:35,179-Speed 9250.61 samples/sec   Loss 3.8199   LearningRate 0.0030   Epoch: 16   Global Step: 275560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:36,321-Speed 8966.75 samples/sec   Loss 3.8226   LearningRate 0.0030   Epoch: 16   Global Step: 275570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:37,488-Speed 8781.82 samples/sec   Loss 3.8178   LearningRate 0.0030   Epoch: 16   Global Step: 275580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:38,655-Speed 8780.57 samples/sec   Loss 3.8541   LearningRate 0.0030   Epoch: 16   Global Step: 275590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:39,786-Speed 9070.47 samples/sec   Loss 3.9089   LearningRate 0.0030   Epoch: 16   Global Step: 275600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:40,863-Speed 9512.66 samples/sec   Loss 3.8098   LearningRate 0.0030   Epoch: 16   Global Step: 275610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:41,966-Speed 9286.59 samples/sec   Loss 3.8641   LearningRate 0.0030   Epoch: 16   Global Step: 275620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:43,086-Speed 9148.62 samples/sec   Loss 3.8937   LearningRate 0.0030   Epoch: 16   Global Step: 275630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:44,157-Speed 9565.13 samples/sec   Loss 3.8914   LearningRate 0.0030   Epoch: 16   Global Step: 275640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:45,284-Speed 9092.82 samples/sec   Loss 3.9332   LearningRate 0.0030   Epoch: 16   Global Step: 275650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:46,343-Speed 9672.82 samples/sec   Loss 3.8414   LearningRate 0.0030   Epoch: 16   Global Step: 275660   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:47,424-Speed 9480.23 samples/sec   Loss 3.7792   LearningRate 0.0030   Epoch: 16   Global Step: 275670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:48,521-Speed 9339.18 samples/sec   Loss 3.8953   LearningRate 0.0030   Epoch: 16   Global Step: 275680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:40:49,612-Speed 9390.09 samples/sec   Loss 3.9043   LearningRate 0.0030   Epoch: 16   Global Step: 275690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:50,718-Speed 9291.51 samples/sec   Loss 3.8819   LearningRate 0.0030   Epoch: 16   Global Step: 275700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:51,831-Speed 9197.67 samples/sec   Loss 3.8937   LearningRate 0.0030   Epoch: 16   Global Step: 275710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:52,943-Speed 9220.29 samples/sec   Loss 3.8468   LearningRate 0.0030   Epoch: 16   Global Step: 275720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:54,119-Speed 8707.18 samples/sec   Loss 3.8548   LearningRate 0.0030   Epoch: 16   Global Step: 275730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:55,223-Speed 9286.00 samples/sec   Loss 3.8955   LearningRate 0.0030   Epoch: 16   Global Step: 275740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:56,345-Speed 9133.76 samples/sec   Loss 3.7789   LearningRate 0.0030   Epoch: 16   Global Step: 275750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:57,486-Speed 8981.58 samples/sec   Loss 3.8313   LearningRate 0.0030   Epoch: 16   Global Step: 275760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:58,618-Speed 9048.30 samples/sec   Loss 3.8471   LearningRate 0.0030   Epoch: 16   Global Step: 275770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:40:59,762-Speed 8960.55 samples/sec   Loss 3.8414   LearningRate 0.0030   Epoch: 16   Global Step: 275780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:00,883-Speed 9139.22 samples/sec   Loss 3.8890   LearningRate 0.0030   Epoch: 16   Global Step: 275790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:41:01,972-Speed 9405.95 samples/sec   Loss 3.8367   LearningRate 0.0030   Epoch: 16   Global Step: 275800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:03,071-Speed 9324.51 samples/sec   Loss 3.9143   LearningRate 0.0030   Epoch: 16   Global Step: 275810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:04,159-Speed 9417.77 samples/sec   Loss 3.9207   LearningRate 0.0030   Epoch: 16   Global Step: 275820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:05,224-Speed 9613.14 samples/sec   Loss 3.8968   LearningRate 0.0030   Epoch: 16   Global Step: 275830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:06,286-Speed 9651.14 samples/sec   Loss 3.8722   LearningRate 0.0030   Epoch: 16   Global Step: 275840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:07,401-Speed 9189.93 samples/sec   Loss 3.8262   LearningRate 0.0030   Epoch: 16   Global Step: 275850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:08,521-Speed 9145.15 samples/sec   Loss 3.8141   LearningRate 0.0030   Epoch: 16   Global Step: 275860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:09,676-Speed 8873.31 samples/sec   Loss 3.8575   LearningRate 0.0030   Epoch: 16   Global Step: 275870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:10,789-Speed 9208.94 samples/sec   Loss 3.7359   LearningRate 0.0030   Epoch: 16   Global Step: 275880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:11,887-Speed 9331.16 samples/sec   Loss 3.8398   LearningRate 0.0030   Epoch: 16   Global Step: 275890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:12,952-Speed 9617.71 samples/sec   Loss 3.8890   LearningRate 0.0030   Epoch: 16   Global Step: 275900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:14,067-Speed 9190.15 samples/sec   Loss 3.8951   LearningRate 0.0030   Epoch: 16   Global Step: 275910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:15,233-Speed 8788.21 samples/sec   Loss 3.8070   LearningRate 0.0030   Epoch: 16   Global Step: 275920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:16,317-Speed 9448.31 samples/sec   Loss 3.8015   LearningRate 0.0030   Epoch: 16   Global Step: 275930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:17,438-Speed 9144.33 samples/sec   Loss 3.8726   LearningRate 0.0030   Epoch: 16   Global Step: 275940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:18,561-Speed 9123.54 samples/sec   Loss 3.8303   LearningRate 0.0030   Epoch: 16   Global Step: 275950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:19,688-Speed 9087.05 samples/sec   Loss 3.8751   LearningRate 0.0030   Epoch: 16   Global Step: 275960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:20,792-Speed 9286.10 samples/sec   Loss 3.7864   LearningRate 0.0030   Epoch: 16   Global Step: 275970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:21,884-Speed 9380.89 samples/sec   Loss 3.9316   LearningRate 0.0030   Epoch: 16   Global Step: 275980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:22,962-Speed 9510.32 samples/sec   Loss 3.8318   LearningRate 0.0030   Epoch: 16   Global Step: 275990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:41:24,052-Speed 9396.34 samples/sec   Loss 3.8075   LearningRate 0.0030   Epoch: 16   Global Step: 276000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:41:45,905-[lfw][276000]XNorm: 6.900148
Training: 2022-04-11 22:41:45,906-[lfw][276000]Accuracy-Flip: 0.99583+-0.00281
Training: 2022-04-11 22:41:45,906-[lfw][276000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:42:11,233-[cfp_fp][276000]XNorm: 5.996978
Training: 2022-04-11 22:42:11,234-[cfp_fp][276000]Accuracy-Flip: 0.97243+-0.00670
Training: 2022-04-11 22:42:11,234-[cfp_fp][276000]Accuracy-Highest: 0.97243
Training: 2022-04-11 22:42:33,078-[agedb_30][276000]XNorm: 6.713654
Training: 2022-04-11 22:42:33,079-[agedb_30][276000]Accuracy-Flip: 0.97233+-0.00901
Training: 2022-04-11 22:42:33,079-[agedb_30][276000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:42:34,144-Speed 146.09 samples/sec   Loss 3.8436   LearningRate 0.0030   Epoch: 16   Global Step: 276010   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:42:35,220-Speed 9522.98 samples/sec   Loss 3.8884   LearningRate 0.0030   Epoch: 16   Global Step: 276020   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:42:36,327-Speed 9260.48 samples/sec   Loss 3.7923   LearningRate 0.0030   Epoch: 16   Global Step: 276030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:42:37,424-Speed 9336.66 samples/sec   Loss 3.8758   LearningRate 0.0030   Epoch: 16   Global Step: 276040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:42:38,565-Speed 8982.61 samples/sec   Loss 3.9269   LearningRate 0.0030   Epoch: 16   Global Step: 276050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:42:39,620-Speed 9710.11 samples/sec   Loss 3.7974   LearningRate 0.0030   Epoch: 16   Global Step: 276060   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:42:40,722-Speed 9343.20 samples/sec   Loss 3.8229   LearningRate 0.0030   Epoch: 16   Global Step: 276070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:41,864-Speed 8970.97 samples/sec   Loss 3.7969   LearningRate 0.0030   Epoch: 16   Global Step: 276080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:42,993-Speed 9072.85 samples/sec   Loss 3.8760   LearningRate 0.0030   Epoch: 16   Global Step: 276090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:44,129-Speed 9018.43 samples/sec   Loss 3.8804   LearningRate 0.0030   Epoch: 16   Global Step: 276100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:45,241-Speed 9212.73 samples/sec   Loss 3.8648   LearningRate 0.0030   Epoch: 16   Global Step: 276110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:46,360-Speed 9157.56 samples/sec   Loss 3.8286   LearningRate 0.0030   Epoch: 16   Global Step: 276120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:47,467-Speed 9251.63 samples/sec   Loss 3.8293   LearningRate 0.0030   Epoch: 16   Global Step: 276130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:48,613-Speed 8943.08 samples/sec   Loss 3.7843   LearningRate 0.0030   Epoch: 16   Global Step: 276140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:49,735-Speed 9126.19 samples/sec   Loss 3.9446   LearningRate 0.0030   Epoch: 16   Global Step: 276150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:50,850-Speed 9197.79 samples/sec   Loss 3.8376   LearningRate 0.0030   Epoch: 16   Global Step: 276160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:51,995-Speed 8944.96 samples/sec   Loss 3.9197   LearningRate 0.0030   Epoch: 16   Global Step: 276170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:42:53,103-Speed 9249.41 samples/sec   Loss 3.8887   LearningRate 0.0030   Epoch: 16   Global Step: 276180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:42:54,230-Speed 9086.51 samples/sec   Loss 3.8539   LearningRate 0.0030   Epoch: 16   Global Step: 276190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:55,344-Speed 9203.84 samples/sec   Loss 3.8291   LearningRate 0.0030   Epoch: 16   Global Step: 276200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:56,473-Speed 9072.30 samples/sec   Loss 3.8613   LearningRate 0.0030   Epoch: 16   Global Step: 276210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:57,586-Speed 9212.05 samples/sec   Loss 3.8475   LearningRate 0.0030   Epoch: 16   Global Step: 276220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:58,710-Speed 9114.31 samples/sec   Loss 3.8404   LearningRate 0.0030   Epoch: 16   Global Step: 276230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:42:59,810-Speed 9313.87 samples/sec   Loss 3.8607   LearningRate 0.0030   Epoch: 16   Global Step: 276240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:00,923-Speed 9205.08 samples/sec   Loss 3.8636   LearningRate 0.0030   Epoch: 16   Global Step: 276250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:02,036-Speed 9200.68 samples/sec   Loss 3.9067   LearningRate 0.0030   Epoch: 16   Global Step: 276260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:03,124-Speed 9422.38 samples/sec   Loss 3.8274   LearningRate 0.0030   Epoch: 16   Global Step: 276270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:04,242-Speed 9161.73 samples/sec   Loss 3.8447   LearningRate 0.0030   Epoch: 16   Global Step: 276280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:05,343-Speed 9304.51 samples/sec   Loss 3.8856   LearningRate 0.0030   Epoch: 16   Global Step: 276290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:43:06,449-Speed 9262.97 samples/sec   Loss 3.9106   LearningRate 0.0030   Epoch: 16   Global Step: 276300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:07,605-Speed 8864.21 samples/sec   Loss 3.8496   LearningRate 0.0030   Epoch: 16   Global Step: 276310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:08,774-Speed 8769.67 samples/sec   Loss 3.8906   LearningRate 0.0030   Epoch: 16   Global Step: 276320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:09,921-Speed 8935.55 samples/sec   Loss 3.7813   LearningRate 0.0030   Epoch: 16   Global Step: 276330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:11,035-Speed 9193.81 samples/sec   Loss 3.8892   LearningRate 0.0030   Epoch: 16   Global Step: 276340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:12,199-Speed 8804.65 samples/sec   Loss 3.8660   LearningRate 0.0030   Epoch: 16   Global Step: 276350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:13,334-Speed 9028.32 samples/sec   Loss 3.8384   LearningRate 0.0030   Epoch: 16   Global Step: 276360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:14,448-Speed 9196.50 samples/sec   Loss 3.8313   LearningRate 0.0030   Epoch: 16   Global Step: 276370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:15,623-Speed 8718.76 samples/sec   Loss 3.8231   LearningRate 0.0030   Epoch: 16   Global Step: 276380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:16,725-Speed 9292.56 samples/sec   Loss 3.8279   LearningRate 0.0030   Epoch: 16   Global Step: 276390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:17,891-Speed 8788.52 samples/sec   Loss 3.8992   LearningRate 0.0030   Epoch: 16   Global Step: 276400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:43:19,016-Speed 9107.75 samples/sec   Loss 3.8861   LearningRate 0.0030   Epoch: 16   Global Step: 276410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:20,140-Speed 9117.07 samples/sec   Loss 3.8958   LearningRate 0.0030   Epoch: 16   Global Step: 276420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:21,254-Speed 9200.62 samples/sec   Loss 3.8519   LearningRate 0.0030   Epoch: 16   Global Step: 276430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:22,375-Speed 9140.10 samples/sec   Loss 3.8265   LearningRate 0.0030   Epoch: 16   Global Step: 276440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:23,486-Speed 9223.34 samples/sec   Loss 3.7962   LearningRate 0.0030   Epoch: 16   Global Step: 276450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:24,614-Speed 9084.84 samples/sec   Loss 3.8219   LearningRate 0.0030   Epoch: 16   Global Step: 276460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:25,736-Speed 9133.02 samples/sec   Loss 3.8983   LearningRate 0.0030   Epoch: 16   Global Step: 276470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:26,870-Speed 9028.50 samples/sec   Loss 3.8412   LearningRate 0.0030   Epoch: 16   Global Step: 276480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:27,985-Speed 9198.97 samples/sec   Loss 3.8743   LearningRate 0.0029   Epoch: 16   Global Step: 276490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:29,079-Speed 9361.03 samples/sec   Loss 3.8158   LearningRate 0.0029   Epoch: 16   Global Step: 276500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:30,193-Speed 9196.41 samples/sec   Loss 3.8395   LearningRate 0.0029   Epoch: 16   Global Step: 276510   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:43:31,320-Speed 9095.49 samples/sec   Loss 3.8130   LearningRate 0.0029   Epoch: 16   Global Step: 276520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:43:32,455-Speed 9024.99 samples/sec   Loss 3.8060   LearningRate 0.0029   Epoch: 16   Global Step: 276530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:43:33,643-Speed 8621.28 samples/sec   Loss 3.8196   LearningRate 0.0029   Epoch: 16   Global Step: 276540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:43:34,759-Speed 9184.11 samples/sec   Loss 3.9228   LearningRate 0.0029   Epoch: 16   Global Step: 276550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:35,860-Speed 9302.95 samples/sec   Loss 3.9363   LearningRate 0.0029   Epoch: 16   Global Step: 276560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:36,978-Speed 9167.93 samples/sec   Loss 3.8277   LearningRate 0.0029   Epoch: 16   Global Step: 276570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:38,089-Speed 9221.59 samples/sec   Loss 3.7477   LearningRate 0.0029   Epoch: 16   Global Step: 276580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:39,178-Speed 9411.88 samples/sec   Loss 3.8751   LearningRate 0.0029   Epoch: 16   Global Step: 276590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:40,302-Speed 9119.07 samples/sec   Loss 3.8817   LearningRate 0.0029   Epoch: 16   Global Step: 276600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:41,456-Speed 8878.87 samples/sec   Loss 3.8556   LearningRate 0.0029   Epoch: 16   Global Step: 276610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:42,614-Speed 8848.24 samples/sec   Loss 3.8854   LearningRate 0.0029   Epoch: 16   Global Step: 276620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:43,770-Speed 8856.14 samples/sec   Loss 3.9248   LearningRate 0.0029   Epoch: 16   Global Step: 276630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:44,863-Speed 9373.20 samples/sec   Loss 3.8979   LearningRate 0.0029   Epoch: 16   Global Step: 276640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:45,976-Speed 9208.73 samples/sec   Loss 3.8688   LearningRate 0.0029   Epoch: 16   Global Step: 276650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:43:47,047-Speed 9565.00 samples/sec   Loss 3.9117   LearningRate 0.0029   Epoch: 16   Global Step: 276660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:48,138-Speed 9393.71 samples/sec   Loss 3.8760   LearningRate 0.0029   Epoch: 16   Global Step: 276670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:49,266-Speed 9084.23 samples/sec   Loss 3.7514   LearningRate 0.0029   Epoch: 16   Global Step: 276680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:50,384-Speed 9167.28 samples/sec   Loss 3.8472   LearningRate 0.0029   Epoch: 16   Global Step: 276690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:51,496-Speed 9209.08 samples/sec   Loss 3.8227   LearningRate 0.0029   Epoch: 16   Global Step: 276700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:52,613-Speed 9173.96 samples/sec   Loss 3.8535   LearningRate 0.0029   Epoch: 16   Global Step: 276710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:53,740-Speed 9093.03 samples/sec   Loss 3.8580   LearningRate 0.0029   Epoch: 16   Global Step: 276720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:54,884-Speed 8955.84 samples/sec   Loss 3.9338   LearningRate 0.0029   Epoch: 16   Global Step: 276730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:55,987-Speed 9286.96 samples/sec   Loss 3.9013   LearningRate 0.0029   Epoch: 16   Global Step: 276740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:57,099-Speed 9221.39 samples/sec   Loss 3.9397   LearningRate 0.0029   Epoch: 16   Global Step: 276750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:43:58,266-Speed 8777.48 samples/sec   Loss 3.9321   LearningRate 0.0029   Epoch: 16   Global Step: 276760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:43:59,432-Speed 8785.73 samples/sec   Loss 3.9172   LearningRate 0.0029   Epoch: 16   Global Step: 276770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:00,550-Speed 9167.52 samples/sec   Loss 3.9418   LearningRate 0.0029   Epoch: 16   Global Step: 276780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:01,657-Speed 9255.59 samples/sec   Loss 3.7881   LearningRate 0.0029   Epoch: 16   Global Step: 276790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:02,774-Speed 9171.14 samples/sec   Loss 3.7801   LearningRate 0.0029   Epoch: 16   Global Step: 276800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:03,897-Speed 9124.62 samples/sec   Loss 3.9263   LearningRate 0.0029   Epoch: 16   Global Step: 276810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:05,003-Speed 9261.22 samples/sec   Loss 3.9041   LearningRate 0.0029   Epoch: 16   Global Step: 276820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:06,122-Speed 9155.26 samples/sec   Loss 3.7799   LearningRate 0.0029   Epoch: 16   Global Step: 276830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:07,265-Speed 8969.85 samples/sec   Loss 3.8685   LearningRate 0.0029   Epoch: 16   Global Step: 276840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:08,371-Speed 9260.26 samples/sec   Loss 3.9082   LearningRate 0.0029   Epoch: 16   Global Step: 276850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:09,517-Speed 8942.44 samples/sec   Loss 3.8659   LearningRate 0.0029   Epoch: 16   Global Step: 276860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:10,637-Speed 9151.44 samples/sec   Loss 3.9080   LearningRate 0.0029   Epoch: 16   Global Step: 276870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:11,742-Speed 9270.35 samples/sec   Loss 3.9062   LearningRate 0.0029   Epoch: 16   Global Step: 276880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:12,906-Speed 8800.29 samples/sec   Loss 3.9100   LearningRate 0.0029   Epoch: 16   Global Step: 276890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:14,120-Speed 8438.82 samples/sec   Loss 3.8160   LearningRate 0.0029   Epoch: 16   Global Step: 276900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:15,252-Speed 9054.98 samples/sec   Loss 3.8958   LearningRate 0.0029   Epoch: 16   Global Step: 276910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:16,356-Speed 9284.71 samples/sec   Loss 3.8747   LearningRate 0.0029   Epoch: 16   Global Step: 276920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:17,441-Speed 9445.31 samples/sec   Loss 3.8045   LearningRate 0.0029   Epoch: 16   Global Step: 276930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:18,565-Speed 9113.97 samples/sec   Loss 3.8087   LearningRate 0.0029   Epoch: 16   Global Step: 276940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:19,689-Speed 9110.65 samples/sec   Loss 3.8545   LearningRate 0.0029   Epoch: 16   Global Step: 276950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:20,820-Speed 9062.77 samples/sec   Loss 3.8884   LearningRate 0.0029   Epoch: 16   Global Step: 276960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:21,919-Speed 9324.85 samples/sec   Loss 3.8907   LearningRate 0.0029   Epoch: 16   Global Step: 276970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:44:23,055-Speed 9022.36 samples/sec   Loss 3.7369   LearningRate 0.0029   Epoch: 16   Global Step: 276980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:44:24,139-Speed 9451.34 samples/sec   Loss 3.8020   LearningRate 0.0029   Epoch: 16   Global Step: 276990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:25,219-Speed 9484.04 samples/sec   Loss 3.8229   LearningRate 0.0029   Epoch: 16   Global Step: 277000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:26,328-Speed 9236.91 samples/sec   Loss 3.8003   LearningRate 0.0029   Epoch: 16   Global Step: 277010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:27,497-Speed 8765.09 samples/sec   Loss 3.8594   LearningRate 0.0029   Epoch: 16   Global Step: 277020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:28,629-Speed 9055.49 samples/sec   Loss 3.8837   LearningRate 0.0029   Epoch: 16   Global Step: 277030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:29,779-Speed 8907.33 samples/sec   Loss 3.8633   LearningRate 0.0029   Epoch: 16   Global Step: 277040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:30,925-Speed 8939.50 samples/sec   Loss 3.8683   LearningRate 0.0029   Epoch: 16   Global Step: 277050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:32,033-Speed 9250.42 samples/sec   Loss 3.8296   LearningRate 0.0029   Epoch: 16   Global Step: 277060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:33,228-Speed 8575.15 samples/sec   Loss 3.8338   LearningRate 0.0029   Epoch: 16   Global Step: 277070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:34,344-Speed 9176.90 samples/sec   Loss 3.8477   LearningRate 0.0029   Epoch: 16   Global Step: 277080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:35,433-Speed 9409.97 samples/sec   Loss 3.9437   LearningRate 0.0029   Epoch: 16   Global Step: 277090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:44:36,562-Speed 9077.38 samples/sec   Loss 3.8060   LearningRate 0.0029   Epoch: 16   Global Step: 277100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:44:37,669-Speed 9252.02 samples/sec   Loss 3.8505   LearningRate 0.0029   Epoch: 16   Global Step: 277110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:44:38,751-Speed 9471.42 samples/sec   Loss 3.8706   LearningRate 0.0029   Epoch: 16   Global Step: 277120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:44:39,878-Speed 9096.84 samples/sec   Loss 3.8023   LearningRate 0.0029   Epoch: 16   Global Step: 277130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:44:40,984-Speed 9264.97 samples/sec   Loss 3.7835   LearningRate 0.0029   Epoch: 16   Global Step: 277140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:42,107-Speed 9118.75 samples/sec   Loss 3.8545   LearningRate 0.0029   Epoch: 16   Global Step: 277150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:43,238-Speed 9060.92 samples/sec   Loss 3.7971   LearningRate 0.0029   Epoch: 16   Global Step: 277160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:44,356-Speed 9159.94 samples/sec   Loss 3.8944   LearningRate 0.0029   Epoch: 16   Global Step: 277170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:45,467-Speed 9227.28 samples/sec   Loss 3.8902   LearningRate 0.0029   Epoch: 16   Global Step: 277180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:46,550-Speed 9457.41 samples/sec   Loss 3.8077   LearningRate 0.0029   Epoch: 16   Global Step: 277190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:47,683-Speed 9042.60 samples/sec   Loss 3.9161   LearningRate 0.0029   Epoch: 16   Global Step: 277200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:48,763-Speed 9483.81 samples/sec   Loss 3.8175   LearningRate 0.0029   Epoch: 16   Global Step: 277210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:49,846-Speed 9462.24 samples/sec   Loss 3.9816   LearningRate 0.0029   Epoch: 16   Global Step: 277220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:50,970-Speed 9116.52 samples/sec   Loss 3.8385   LearningRate 0.0029   Epoch: 16   Global Step: 277230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:52,063-Speed 9381.56 samples/sec   Loss 3.8494   LearningRate 0.0029   Epoch: 16   Global Step: 277240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:44:53,174-Speed 9220.91 samples/sec   Loss 3.7822   LearningRate 0.0029   Epoch: 16   Global Step: 277250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:44:54,288-Speed 9199.00 samples/sec   Loss 3.8272   LearningRate 0.0029   Epoch: 16   Global Step: 277260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:44:55,362-Speed 9537.44 samples/sec   Loss 3.8118   LearningRate 0.0029   Epoch: 16   Global Step: 277270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:56,458-Speed 9354.17 samples/sec   Loss 3.9055   LearningRate 0.0029   Epoch: 16   Global Step: 277280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:57,620-Speed 8818.02 samples/sec   Loss 3.7689   LearningRate 0.0029   Epoch: 16   Global Step: 277290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:58,695-Speed 9529.01 samples/sec   Loss 3.9199   LearningRate 0.0029   Epoch: 16   Global Step: 277300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:44:59,818-Speed 9122.36 samples/sec   Loss 3.8724   LearningRate 0.0029   Epoch: 16   Global Step: 277310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:00,946-Speed 9083.20 samples/sec   Loss 3.7559   LearningRate 0.0029   Epoch: 16   Global Step: 277320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:02,055-Speed 9236.42 samples/sec   Loss 3.8743   LearningRate 0.0029   Epoch: 16   Global Step: 277330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:03,204-Speed 8916.16 samples/sec   Loss 3.8485   LearningRate 0.0029   Epoch: 16   Global Step: 277340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:04,351-Speed 8937.72 samples/sec   Loss 3.8929   LearningRate 0.0029   Epoch: 16   Global Step: 277350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:05,447-Speed 9347.57 samples/sec   Loss 3.8418   LearningRate 0.0029   Epoch: 16   Global Step: 277360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:06,559-Speed 9207.47 samples/sec   Loss 3.8444   LearningRate 0.0029   Epoch: 16   Global Step: 277370   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:07,716-Speed 8854.26 samples/sec   Loss 3.8827   LearningRate 0.0029   Epoch: 16   Global Step: 277380   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:08,808-Speed 9389.00 samples/sec   Loss 3.9862   LearningRate 0.0029   Epoch: 16   Global Step: 277390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:09,873-Speed 9618.74 samples/sec   Loss 3.8082   LearningRate 0.0029   Epoch: 16   Global Step: 277400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:10,962-Speed 9415.34 samples/sec   Loss 3.9270   LearningRate 0.0029   Epoch: 16   Global Step: 277410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:12,052-Speed 9402.75 samples/sec   Loss 3.8100   LearningRate 0.0029   Epoch: 16   Global Step: 277420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:13,177-Speed 9102.61 samples/sec   Loss 3.8739   LearningRate 0.0029   Epoch: 16   Global Step: 277430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:14,284-Speed 9256.94 samples/sec   Loss 3.8697   LearningRate 0.0029   Epoch: 16   Global Step: 277440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:15,409-Speed 9108.10 samples/sec   Loss 3.8222   LearningRate 0.0029   Epoch: 16   Global Step: 277450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:16,486-Speed 9509.79 samples/sec   Loss 3.8072   LearningRate 0.0029   Epoch: 16   Global Step: 277460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:17,639-Speed 8887.48 samples/sec   Loss 3.7892   LearningRate 0.0028   Epoch: 16   Global Step: 277470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:18,800-Speed 8822.07 samples/sec   Loss 3.9717   LearningRate 0.0028   Epoch: 16   Global Step: 277480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:19,887-Speed 9429.67 samples/sec   Loss 3.7863   LearningRate 0.0028   Epoch: 16   Global Step: 277490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:20,990-Speed 9295.08 samples/sec   Loss 3.8707   LearningRate 0.0028   Epoch: 16   Global Step: 277500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:22,056-Speed 9610.35 samples/sec   Loss 3.8064   LearningRate 0.0028   Epoch: 16   Global Step: 277510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:23,158-Speed 9293.94 samples/sec   Loss 3.9172   LearningRate 0.0028   Epoch: 16   Global Step: 277520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:24,322-Speed 8804.53 samples/sec   Loss 3.9159   LearningRate 0.0028   Epoch: 16   Global Step: 277530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:25,455-Speed 9041.51 samples/sec   Loss 3.8668   LearningRate 0.0028   Epoch: 16   Global Step: 277540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:26,575-Speed 9147.85 samples/sec   Loss 3.8859   LearningRate 0.0028   Epoch: 16   Global Step: 277550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:27,704-Speed 9079.38 samples/sec   Loss 3.7802   LearningRate 0.0028   Epoch: 16   Global Step: 277560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:28,797-Speed 9373.26 samples/sec   Loss 3.7751   LearningRate 0.0028   Epoch: 16   Global Step: 277570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:29,891-Speed 9372.03 samples/sec   Loss 3.8248   LearningRate 0.0028   Epoch: 16   Global Step: 277580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:31,004-Speed 9200.50 samples/sec   Loss 3.9549   LearningRate 0.0028   Epoch: 16   Global Step: 277590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:32,148-Speed 8960.84 samples/sec   Loss 3.8326   LearningRate 0.0028   Epoch: 16   Global Step: 277600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:33,315-Speed 8779.58 samples/sec   Loss 3.8507   LearningRate 0.0028   Epoch: 16   Global Step: 277610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:34,461-Speed 8935.85 samples/sec   Loss 3.7920   LearningRate 0.0028   Epoch: 16   Global Step: 277620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:35,582-Speed 9141.74 samples/sec   Loss 3.9320   LearningRate 0.0028   Epoch: 16   Global Step: 277630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:36,689-Speed 9256.31 samples/sec   Loss 3.8977   LearningRate 0.0028   Epoch: 16   Global Step: 277640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:37,826-Speed 9009.82 samples/sec   Loss 3.9020   LearningRate 0.0028   Epoch: 16   Global Step: 277650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:38,967-Speed 8983.32 samples/sec   Loss 3.8225   LearningRate 0.0028   Epoch: 16   Global Step: 277660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:40,108-Speed 8981.24 samples/sec   Loss 3.7819   LearningRate 0.0028   Epoch: 16   Global Step: 277670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:41,252-Speed 8955.95 samples/sec   Loss 3.8912   LearningRate 0.0028   Epoch: 16   Global Step: 277680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:42,384-Speed 9052.54 samples/sec   Loss 3.7971   LearningRate 0.0028   Epoch: 16   Global Step: 277690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:43,494-Speed 9226.71 samples/sec   Loss 3.8845   LearningRate 0.0028   Epoch: 16   Global Step: 277700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:44,612-Speed 9165.47 samples/sec   Loss 3.8607   LearningRate 0.0028   Epoch: 16   Global Step: 277710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:45,717-Speed 9272.23 samples/sec   Loss 3.8225   LearningRate 0.0028   Epoch: 16   Global Step: 277720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:46,826-Speed 9237.28 samples/sec   Loss 3.8838   LearningRate 0.0028   Epoch: 16   Global Step: 277730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:47,960-Speed 9038.02 samples/sec   Loss 3.9236   LearningRate 0.0028   Epoch: 16   Global Step: 277740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:49,117-Speed 8852.85 samples/sec   Loss 3.8389   LearningRate 0.0028   Epoch: 16   Global Step: 277750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:50,269-Speed 8894.82 samples/sec   Loss 3.9061   LearningRate 0.0028   Epoch: 16   Global Step: 277760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:51,381-Speed 9213.93 samples/sec   Loss 3.8108   LearningRate 0.0028   Epoch: 16   Global Step: 277770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:52,462-Speed 9483.36 samples/sec   Loss 3.7779   LearningRate 0.0028   Epoch: 16   Global Step: 277780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:45:53,549-Speed 9428.73 samples/sec   Loss 3.8741   LearningRate 0.0028   Epoch: 16   Global Step: 277790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:54,666-Speed 9168.72 samples/sec   Loss 3.9656   LearningRate 0.0028   Epoch: 16   Global Step: 277800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:55,774-Speed 9249.68 samples/sec   Loss 3.7664   LearningRate 0.0028   Epoch: 16   Global Step: 277810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:56,897-Speed 9119.17 samples/sec   Loss 3.8777   LearningRate 0.0028   Epoch: 16   Global Step: 277820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:58,026-Speed 9090.34 samples/sec   Loss 3.9030   LearningRate 0.0028   Epoch: 16   Global Step: 277830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:45:59,099-Speed 9542.19 samples/sec   Loss 3.8825   LearningRate 0.0028   Epoch: 16   Global Step: 277840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:00,195-Speed 9352.66 samples/sec   Loss 3.9551   LearningRate 0.0028   Epoch: 16   Global Step: 277850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:01,292-Speed 9332.16 samples/sec   Loss 3.9473   LearningRate 0.0028   Epoch: 16   Global Step: 277860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:02,416-Speed 9121.51 samples/sec   Loss 3.8924   LearningRate 0.0028   Epoch: 16   Global Step: 277870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:03,520-Speed 9277.37 samples/sec   Loss 3.7503   LearningRate 0.0028   Epoch: 16   Global Step: 277880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:04,631-Speed 9220.58 samples/sec   Loss 3.9364   LearningRate 0.0028   Epoch: 16   Global Step: 277890   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:46:05,740-Speed 9242.14 samples/sec   Loss 3.8767   LearningRate 0.0028   Epoch: 16   Global Step: 277900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:06,859-Speed 9151.78 samples/sec   Loss 3.8362   LearningRate 0.0028   Epoch: 16   Global Step: 277910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:07,981-Speed 9133.08 samples/sec   Loss 3.8698   LearningRate 0.0028   Epoch: 16   Global Step: 277920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:09,120-Speed 8996.27 samples/sec   Loss 3.7804   LearningRate 0.0028   Epoch: 16   Global Step: 277930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:10,242-Speed 9140.96 samples/sec   Loss 3.8460   LearningRate 0.0028   Epoch: 16   Global Step: 277940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:11,403-Speed 8822.43 samples/sec   Loss 3.8214   LearningRate 0.0028   Epoch: 16   Global Step: 277950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:12,514-Speed 9222.16 samples/sec   Loss 3.9226   LearningRate 0.0028   Epoch: 16   Global Step: 277960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:13,688-Speed 8724.39 samples/sec   Loss 3.9034   LearningRate 0.0028   Epoch: 16   Global Step: 277970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:14,790-Speed 9293.39 samples/sec   Loss 3.8631   LearningRate 0.0028   Epoch: 16   Global Step: 277980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:15,918-Speed 9088.62 samples/sec   Loss 3.8888   LearningRate 0.0028   Epoch: 16   Global Step: 277990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:46:17,048-Speed 9066.61 samples/sec   Loss 3.8054   LearningRate 0.0028   Epoch: 16   Global Step: 278000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:46:39,249-[lfw][278000]XNorm: 6.848850
Training: 2022-04-11 22:46:39,250-[lfw][278000]Accuracy-Flip: 0.99683+-0.00273
Training: 2022-04-11 22:46:39,250-[lfw][278000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:47:04,961-[cfp_fp][278000]XNorm: 5.966458
Training: 2022-04-11 22:47:04,961-[cfp_fp][278000]Accuracy-Flip: 0.97257+-0.00903
Training: 2022-04-11 22:47:04,962-[cfp_fp][278000]Accuracy-Highest: 0.97257
Training: 2022-04-11 22:47:26,951-[agedb_30][278000]XNorm: 6.658534
Training: 2022-04-11 22:47:26,951-[agedb_30][278000]Accuracy-Flip: 0.97200+-0.00933
Training: 2022-04-11 22:47:26,952-[agedb_30][278000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:47:28,054-Speed 144.21 samples/sec   Loss 3.7723   LearningRate 0.0028   Epoch: 16   Global Step: 278010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:29,268-Speed 8441.06 samples/sec   Loss 3.8825   LearningRate 0.0028   Epoch: 16   Global Step: 278020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:30,414-Speed 8941.32 samples/sec   Loss 3.8407   LearningRate 0.0028   Epoch: 16   Global Step: 278030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:31,610-Speed 8570.92 samples/sec   Loss 3.9064   LearningRate 0.0028   Epoch: 16   Global Step: 278040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:32,699-Speed 9408.00 samples/sec   Loss 3.8645   LearningRate 0.0028   Epoch: 16   Global Step: 278050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:33,843-Speed 8955.03 samples/sec   Loss 3.9286   LearningRate 0.0028   Epoch: 16   Global Step: 278060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:34,975-Speed 9050.97 samples/sec   Loss 3.9169   LearningRate 0.0028   Epoch: 16   Global Step: 278070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:36,070-Speed 9358.30 samples/sec   Loss 3.9268   LearningRate 0.0028   Epoch: 16   Global Step: 278080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:37,209-Speed 8997.08 samples/sec   Loss 3.8778   LearningRate 0.0028   Epoch: 16   Global Step: 278090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:38,371-Speed 8818.11 samples/sec   Loss 3.8168   LearningRate 0.0028   Epoch: 16   Global Step: 278100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:39,523-Speed 8889.83 samples/sec   Loss 3.8440   LearningRate 0.0028   Epoch: 16   Global Step: 278110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:40,667-Speed 8959.28 samples/sec   Loss 3.9051   LearningRate 0.0028   Epoch: 16   Global Step: 278120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:41,813-Speed 8943.66 samples/sec   Loss 3.9339   LearningRate 0.0028   Epoch: 16   Global Step: 278130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:42,910-Speed 9336.81 samples/sec   Loss 3.9532   LearningRate 0.0028   Epoch: 16   Global Step: 278140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:44,019-Speed 9238.09 samples/sec   Loss 3.8056   LearningRate 0.0028   Epoch: 16   Global Step: 278150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:45,094-Speed 9532.71 samples/sec   Loss 3.9150   LearningRate 0.0028   Epoch: 16   Global Step: 278160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:46,281-Speed 8627.95 samples/sec   Loss 3.9068   LearningRate 0.0028   Epoch: 16   Global Step: 278170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:47,400-Speed 9163.21 samples/sec   Loss 3.8579   LearningRate 0.0028   Epoch: 16   Global Step: 278180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:48,562-Speed 8820.25 samples/sec   Loss 3.8147   LearningRate 0.0028   Epoch: 16   Global Step: 278190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:49,670-Speed 9243.11 samples/sec   Loss 3.8299   LearningRate 0.0028   Epoch: 16   Global Step: 278200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:47:50,762-Speed 9385.91 samples/sec   Loss 3.8971   LearningRate 0.0028   Epoch: 16   Global Step: 278210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:47:51,831-Speed 9580.71 samples/sec   Loss 3.8485   LearningRate 0.0028   Epoch: 16   Global Step: 278220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:47:53,001-Speed 8754.92 samples/sec   Loss 3.8087   LearningRate 0.0028   Epoch: 16   Global Step: 278230   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:47:54,136-Speed 9047.46 samples/sec   Loss 3.8380   LearningRate 0.0028   Epoch: 16   Global Step: 278240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:47:55,259-Speed 9122.58 samples/sec   Loss 3.8512   LearningRate 0.0028   Epoch: 16   Global Step: 278250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:47:56,409-Speed 8911.38 samples/sec   Loss 3.7963   LearningRate 0.0028   Epoch: 16   Global Step: 278260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:47:57,528-Speed 9161.76 samples/sec   Loss 3.8882   LearningRate 0.0028   Epoch: 16   Global Step: 278270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:47:58,686-Speed 8845.08 samples/sec   Loss 3.9334   LearningRate 0.0028   Epoch: 16   Global Step: 278280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:47:59,782-Speed 9351.31 samples/sec   Loss 3.8240   LearningRate 0.0028   Epoch: 16   Global Step: 278290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:48:00,881-Speed 9328.20 samples/sec   Loss 3.8226   LearningRate 0.0028   Epoch: 16   Global Step: 278300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:48:01,978-Speed 9331.38 samples/sec   Loss 3.7778   LearningRate 0.0028   Epoch: 16   Global Step: 278310   Fp16 Grad Scale: 262144   Required: 2 hours
Training: 2022-04-11 22:48:03,053-Speed 9530.27 samples/sec   Loss 3.8579   LearningRate 0.0028   Epoch: 16   Global Step: 278320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:04,179-Speed 9099.58 samples/sec   Loss 3.7967   LearningRate 0.0028   Epoch: 16   Global Step: 278330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:05,346-Speed 8779.49 samples/sec   Loss 3.8859   LearningRate 0.0028   Epoch: 16   Global Step: 278340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:06,410-Speed 9632.38 samples/sec   Loss 3.8065   LearningRate 0.0028   Epoch: 16   Global Step: 278350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:07,523-Speed 9207.37 samples/sec   Loss 3.8884   LearningRate 0.0028   Epoch: 16   Global Step: 278360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:08,650-Speed 9086.72 samples/sec   Loss 3.8599   LearningRate 0.0028   Epoch: 16   Global Step: 278370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:09,761-Speed 9252.29 samples/sec   Loss 3.7941   LearningRate 0.0028   Epoch: 16   Global Step: 278380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:10,880-Speed 9152.63 samples/sec   Loss 3.8744   LearningRate 0.0028   Epoch: 16   Global Step: 278390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:12,006-Speed 9100.03 samples/sec   Loss 3.8614   LearningRate 0.0028   Epoch: 16   Global Step: 278400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:13,159-Speed 8881.37 samples/sec   Loss 3.8145   LearningRate 0.0028   Epoch: 16   Global Step: 278410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:14,305-Speed 8943.56 samples/sec   Loss 3.8638   LearningRate 0.0028   Epoch: 16   Global Step: 278420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:15,429-Speed 9116.84 samples/sec   Loss 3.8666   LearningRate 0.0028   Epoch: 16   Global Step: 278430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:16,533-Speed 9285.29 samples/sec   Loss 3.8414   LearningRate 0.0028   Epoch: 16   Global Step: 278440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:17,626-Speed 9371.22 samples/sec   Loss 3.8631   LearningRate 0.0028   Epoch: 16   Global Step: 278450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:18,749-Speed 9124.99 samples/sec   Loss 3.8764   LearningRate 0.0028   Epoch: 16   Global Step: 278460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:19,950-Speed 8526.99 samples/sec   Loss 3.8439   LearningRate 0.0027   Epoch: 16   Global Step: 278470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:21,067-Speed 9180.55 samples/sec   Loss 3.8619   LearningRate 0.0027   Epoch: 16   Global Step: 278480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:22,198-Speed 9052.79 samples/sec   Loss 3.9193   LearningRate 0.0027   Epoch: 16   Global Step: 278490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:23,328-Speed 9071.84 samples/sec   Loss 3.8454   LearningRate 0.0027   Epoch: 16   Global Step: 278500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:24,454-Speed 9092.85 samples/sec   Loss 3.8121   LearningRate 0.0027   Epoch: 16   Global Step: 278510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:25,570-Speed 9182.50 samples/sec   Loss 3.8901   LearningRate 0.0027   Epoch: 16   Global Step: 278520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:26,672-Speed 9300.95 samples/sec   Loss 3.8012   LearningRate 0.0027   Epoch: 16   Global Step: 278530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:27,769-Speed 9342.06 samples/sec   Loss 3.8634   LearningRate 0.0027   Epoch: 16   Global Step: 278540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:28,896-Speed 9091.17 samples/sec   Loss 3.8156   LearningRate 0.0027   Epoch: 16   Global Step: 278550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:30,010-Speed 9196.62 samples/sec   Loss 3.8184   LearningRate 0.0027   Epoch: 16   Global Step: 278560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:31,198-Speed 8622.15 samples/sec   Loss 3.8041   LearningRate 0.0027   Epoch: 16   Global Step: 278570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:32,332-Speed 9032.94 samples/sec   Loss 3.9110   LearningRate 0.0027   Epoch: 16   Global Step: 278580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:33,503-Speed 8753.84 samples/sec   Loss 3.9150   LearningRate 0.0027   Epoch: 16   Global Step: 278590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:34,608-Speed 9270.13 samples/sec   Loss 3.8176   LearningRate 0.0027   Epoch: 16   Global Step: 278600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:35,706-Speed 9329.49 samples/sec   Loss 3.8449   LearningRate 0.0027   Epoch: 16   Global Step: 278610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:36,852-Speed 8946.79 samples/sec   Loss 3.9433   LearningRate 0.0027   Epoch: 16   Global Step: 278620   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:48:37,950-Speed 9324.73 samples/sec   Loss 3.8580   LearningRate 0.0027   Epoch: 16   Global Step: 278630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:48:39,051-Speed 9312.24 samples/sec   Loss 3.8198   LearningRate 0.0027   Epoch: 16   Global Step: 278640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:40,168-Speed 9174.90 samples/sec   Loss 3.8784   LearningRate 0.0027   Epoch: 16   Global Step: 278650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:41,288-Speed 9143.32 samples/sec   Loss 3.9281   LearningRate 0.0027   Epoch: 16   Global Step: 278660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:42,426-Speed 9003.21 samples/sec   Loss 3.8112   LearningRate 0.0027   Epoch: 16   Global Step: 278670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:43,532-Speed 9269.26 samples/sec   Loss 3.9140   LearningRate 0.0027   Epoch: 16   Global Step: 278680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:44,701-Speed 8764.34 samples/sec   Loss 3.8438   LearningRate 0.0027   Epoch: 16   Global Step: 278690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:45,833-Speed 9049.13 samples/sec   Loss 3.8541   LearningRate 0.0027   Epoch: 16   Global Step: 278700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:46,948-Speed 9189.66 samples/sec   Loss 3.8849   LearningRate 0.0027   Epoch: 16   Global Step: 278710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:48,048-Speed 9314.70 samples/sec   Loss 3.8132   LearningRate 0.0027   Epoch: 16   Global Step: 278720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:49,280-Speed 8312.13 samples/sec   Loss 3.8819   LearningRate 0.0027   Epoch: 16   Global Step: 278730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:50,425-Speed 8953.28 samples/sec   Loss 3.8791   LearningRate 0.0027   Epoch: 16   Global Step: 278740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:48:51,498-Speed 9553.39 samples/sec   Loss 3.7725   LearningRate 0.0027   Epoch: 16   Global Step: 278750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:52,620-Speed 9128.89 samples/sec   Loss 3.8849   LearningRate 0.0027   Epoch: 16   Global Step: 278760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:53,800-Speed 8686.37 samples/sec   Loss 3.8748   LearningRate 0.0027   Epoch: 16   Global Step: 278770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:54,920-Speed 9147.93 samples/sec   Loss 3.8833   LearningRate 0.0027   Epoch: 16   Global Step: 278780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:56,034-Speed 9280.12 samples/sec   Loss 3.9054   LearningRate 0.0027   Epoch: 16   Global Step: 278790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:57,158-Speed 9116.82 samples/sec   Loss 3.8443   LearningRate 0.0027   Epoch: 16   Global Step: 278800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:58,263-Speed 9272.45 samples/sec   Loss 3.8358   LearningRate 0.0027   Epoch: 16   Global Step: 278810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:48:59,359-Speed 9354.10 samples/sec   Loss 3.8386   LearningRate 0.0027   Epoch: 16   Global Step: 278820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:00,502-Speed 8958.78 samples/sec   Loss 3.8789   LearningRate 0.0027   Epoch: 16   Global Step: 278830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:01,638-Speed 9021.12 samples/sec   Loss 3.8866   LearningRate 0.0027   Epoch: 16   Global Step: 278840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:02,785-Speed 8936.34 samples/sec   Loss 3.8971   LearningRate 0.0027   Epoch: 16   Global Step: 278850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:03,883-Speed 9330.53 samples/sec   Loss 3.7465   LearningRate 0.0027   Epoch: 16   Global Step: 278860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:04,976-Speed 9370.95 samples/sec   Loss 3.8586   LearningRate 0.0027   Epoch: 16   Global Step: 278870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:06,095-Speed 9157.84 samples/sec   Loss 3.8195   LearningRate 0.0027   Epoch: 16   Global Step: 278880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:07,186-Speed 9395.18 samples/sec   Loss 3.8284   LearningRate 0.0027   Epoch: 16   Global Step: 278890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:08,376-Speed 8607.38 samples/sec   Loss 3.8906   LearningRate 0.0027   Epoch: 16   Global Step: 278900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:09,508-Speed 9047.36 samples/sec   Loss 3.9743   LearningRate 0.0027   Epoch: 16   Global Step: 278910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:10,675-Speed 8784.76 samples/sec   Loss 3.7706   LearningRate 0.0027   Epoch: 16   Global Step: 278920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:11,766-Speed 9390.45 samples/sec   Loss 3.8594   LearningRate 0.0027   Epoch: 16   Global Step: 278930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:12,851-Speed 9438.85 samples/sec   Loss 3.8247   LearningRate 0.0027   Epoch: 16   Global Step: 278940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:13,986-Speed 9031.57 samples/sec   Loss 3.8968   LearningRate 0.0027   Epoch: 16   Global Step: 278950   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:15,149-Speed 8808.70 samples/sec   Loss 3.7846   LearningRate 0.0027   Epoch: 16   Global Step: 278960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:16,292-Speed 8967.64 samples/sec   Loss 3.9067   LearningRate 0.0027   Epoch: 16   Global Step: 278970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:17,391-Speed 9322.08 samples/sec   Loss 3.9084   LearningRate 0.0027   Epoch: 16   Global Step: 278980   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:18,482-Speed 9389.49 samples/sec   Loss 3.9220   LearningRate 0.0027   Epoch: 16   Global Step: 278990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:19,645-Speed 8812.32 samples/sec   Loss 3.8377   LearningRate 0.0027   Epoch: 16   Global Step: 279000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:20,780-Speed 9027.88 samples/sec   Loss 3.8248   LearningRate 0.0027   Epoch: 16   Global Step: 279010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:21,927-Speed 8931.56 samples/sec   Loss 3.9016   LearningRate 0.0027   Epoch: 16   Global Step: 279020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:23,079-Speed 8895.26 samples/sec   Loss 3.8974   LearningRate 0.0027   Epoch: 16   Global Step: 279030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:24,239-Speed 8828.86 samples/sec   Loss 3.9229   LearningRate 0.0027   Epoch: 16   Global Step: 279040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:25,355-Speed 9183.94 samples/sec   Loss 3.8674   LearningRate 0.0027   Epoch: 16   Global Step: 279050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:26,472-Speed 9173.69 samples/sec   Loss 3.8639   LearningRate 0.0027   Epoch: 16   Global Step: 279060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:27,621-Speed 8917.55 samples/sec   Loss 3.8432   LearningRate 0.0027   Epoch: 16   Global Step: 279070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:28,800-Speed 8691.27 samples/sec   Loss 3.8740   LearningRate 0.0027   Epoch: 16   Global Step: 279080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:29,901-Speed 9306.11 samples/sec   Loss 3.8102   LearningRate 0.0027   Epoch: 16   Global Step: 279090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:31,058-Speed 8857.16 samples/sec   Loss 3.8819   LearningRate 0.0027   Epoch: 16   Global Step: 279100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:32,163-Speed 9273.96 samples/sec   Loss 3.8327   LearningRate 0.0027   Epoch: 16   Global Step: 279110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:33,253-Speed 9399.96 samples/sec   Loss 3.8685   LearningRate 0.0027   Epoch: 16   Global Step: 279120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:34,417-Speed 8799.26 samples/sec   Loss 3.8477   LearningRate 0.0027   Epoch: 16   Global Step: 279130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:35,488-Speed 9569.76 samples/sec   Loss 3.8864   LearningRate 0.0027   Epoch: 16   Global Step: 279140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:36,580-Speed 9381.52 samples/sec   Loss 3.8775   LearningRate 0.0027   Epoch: 16   Global Step: 279150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:37,719-Speed 8996.29 samples/sec   Loss 3.9232   LearningRate 0.0027   Epoch: 16   Global Step: 279160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:38,864-Speed 8948.72 samples/sec   Loss 3.7978   LearningRate 0.0027   Epoch: 16   Global Step: 279170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:39,931-Speed 9604.88 samples/sec   Loss 3.7514   LearningRate 0.0027   Epoch: 16   Global Step: 279180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:41,071-Speed 8984.68 samples/sec   Loss 3.7954   LearningRate 0.0027   Epoch: 16   Global Step: 279190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:42,216-Speed 8945.63 samples/sec   Loss 3.8708   LearningRate 0.0027   Epoch: 16   Global Step: 279200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:43,360-Speed 8958.85 samples/sec   Loss 3.8243   LearningRate 0.0027   Epoch: 16   Global Step: 279210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:44,519-Speed 8839.80 samples/sec   Loss 3.9650   LearningRate 0.0027   Epoch: 16   Global Step: 279220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:45,690-Speed 8751.91 samples/sec   Loss 3.8401   LearningRate 0.0027   Epoch: 16   Global Step: 279230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:46,867-Speed 8702.75 samples/sec   Loss 3.8320   LearningRate 0.0027   Epoch: 16   Global Step: 279240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:47,939-Speed 9563.16 samples/sec   Loss 3.8894   LearningRate 0.0027   Epoch: 16   Global Step: 279250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:49,057-Speed 9163.80 samples/sec   Loss 3.8801   LearningRate 0.0027   Epoch: 16   Global Step: 279260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:50,203-Speed 8937.94 samples/sec   Loss 3.9228   LearningRate 0.0027   Epoch: 16   Global Step: 279270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:51,313-Speed 9228.18 samples/sec   Loss 3.8634   LearningRate 0.0027   Epoch: 16   Global Step: 279280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:52,439-Speed 9102.90 samples/sec   Loss 3.9243   LearningRate 0.0027   Epoch: 16   Global Step: 279290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:53,549-Speed 9234.02 samples/sec   Loss 3.8806   LearningRate 0.0027   Epoch: 16   Global Step: 279300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:49:54,609-Speed 9661.85 samples/sec   Loss 3.8855   LearningRate 0.0027   Epoch: 16   Global Step: 279310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:55,730-Speed 9144.22 samples/sec   Loss 3.8269   LearningRate 0.0027   Epoch: 16   Global Step: 279320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:56,797-Speed 9597.80 samples/sec   Loss 3.8339   LearningRate 0.0027   Epoch: 16   Global Step: 279330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:57,904-Speed 9261.96 samples/sec   Loss 3.9001   LearningRate 0.0027   Epoch: 16   Global Step: 279340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:49:59,054-Speed 8910.00 samples/sec   Loss 3.9057   LearningRate 0.0027   Epoch: 16   Global Step: 279350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:00,177-Speed 9117.88 samples/sec   Loss 3.8388   LearningRate 0.0027   Epoch: 16   Global Step: 279360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:01,333-Speed 8868.17 samples/sec   Loss 3.8129   LearningRate 0.0027   Epoch: 16   Global Step: 279370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:02,509-Speed 8711.64 samples/sec   Loss 3.8915   LearningRate 0.0027   Epoch: 16   Global Step: 279380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:03,635-Speed 9097.48 samples/sec   Loss 3.8994   LearningRate 0.0027   Epoch: 16   Global Step: 279390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:04,780-Speed 8949.18 samples/sec   Loss 3.8185   LearningRate 0.0027   Epoch: 16   Global Step: 279400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:05,918-Speed 9011.43 samples/sec   Loss 3.8216   LearningRate 0.0027   Epoch: 16   Global Step: 279410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:50:07,028-Speed 9229.60 samples/sec   Loss 3.9316   LearningRate 0.0027   Epoch: 16   Global Step: 279420   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:50:08,170-Speed 8967.96 samples/sec   Loss 3.8034   LearningRate 0.0027   Epoch: 16   Global Step: 279430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:09,255-Speed 9443.69 samples/sec   Loss 3.7882   LearningRate 0.0027   Epoch: 16   Global Step: 279440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:10,382-Speed 9090.87 samples/sec   Loss 3.8900   LearningRate 0.0027   Epoch: 16   Global Step: 279450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:11,495-Speed 9203.07 samples/sec   Loss 3.7639   LearningRate 0.0027   Epoch: 16   Global Step: 279460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:12,631-Speed 9024.59 samples/sec   Loss 3.8464   LearningRate 0.0027   Epoch: 16   Global Step: 279470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:13,732-Speed 9305.41 samples/sec   Loss 3.9018   LearningRate 0.0026   Epoch: 16   Global Step: 279480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:14,852-Speed 9148.34 samples/sec   Loss 3.8232   LearningRate 0.0026   Epoch: 16   Global Step: 279490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:15,955-Speed 9291.19 samples/sec   Loss 3.9188   LearningRate 0.0026   Epoch: 16   Global Step: 279500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:17,095-Speed 8984.10 samples/sec   Loss 3.8832   LearningRate 0.0026   Epoch: 16   Global Step: 279510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:18,212-Speed 9169.52 samples/sec   Loss 3.9190   LearningRate 0.0026   Epoch: 16   Global Step: 279520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:19,339-Speed 9091.21 samples/sec   Loss 3.8861   LearningRate 0.0026   Epoch: 16   Global Step: 279530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:50:20,526-Speed 8629.76 samples/sec   Loss 3.8185   LearningRate 0.0026   Epoch: 16   Global Step: 279540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:50:21,623-Speed 9340.15 samples/sec   Loss 3.8191   LearningRate 0.0026   Epoch: 16   Global Step: 279550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:22,775-Speed 8894.35 samples/sec   Loss 3.8325   LearningRate 0.0026   Epoch: 16   Global Step: 279560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:23,902-Speed 9098.85 samples/sec   Loss 3.8494   LearningRate 0.0026   Epoch: 16   Global Step: 279570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:25,011-Speed 9233.21 samples/sec   Loss 3.8579   LearningRate 0.0026   Epoch: 16   Global Step: 279580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:26,137-Speed 9106.62 samples/sec   Loss 3.8518   LearningRate 0.0026   Epoch: 16   Global Step: 279590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:27,313-Speed 8711.30 samples/sec   Loss 3.8431   LearningRate 0.0026   Epoch: 16   Global Step: 279600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:28,458-Speed 8949.79 samples/sec   Loss 3.8915   LearningRate 0.0026   Epoch: 16   Global Step: 279610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:29,559-Speed 9307.87 samples/sec   Loss 3.8941   LearningRate 0.0026   Epoch: 16   Global Step: 279620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:30,684-Speed 9105.55 samples/sec   Loss 3.9456   LearningRate 0.0026   Epoch: 16   Global Step: 279630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:31,781-Speed 9341.42 samples/sec   Loss 3.8789   LearningRate 0.0026   Epoch: 16   Global Step: 279640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:32,879-Speed 9332.04 samples/sec   Loss 3.8714   LearningRate 0.0026   Epoch: 16   Global Step: 279650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:50:33,969-Speed 9397.39 samples/sec   Loss 3.8855   LearningRate 0.0026   Epoch: 16   Global Step: 279660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:35,091-Speed 9133.24 samples/sec   Loss 3.8501   LearningRate 0.0026   Epoch: 16   Global Step: 279670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:36,168-Speed 9518.82 samples/sec   Loss 3.7929   LearningRate 0.0026   Epoch: 16   Global Step: 279680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:37,280-Speed 9206.54 samples/sec   Loss 3.8379   LearningRate 0.0026   Epoch: 16   Global Step: 279690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:38,380-Speed 9317.01 samples/sec   Loss 3.8442   LearningRate 0.0026   Epoch: 16   Global Step: 279700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:39,484-Speed 9280.10 samples/sec   Loss 3.9168   LearningRate 0.0026   Epoch: 16   Global Step: 279710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:40,634-Speed 8910.64 samples/sec   Loss 3.8904   LearningRate 0.0026   Epoch: 16   Global Step: 279720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:41,761-Speed 9091.87 samples/sec   Loss 3.7821   LearningRate 0.0026   Epoch: 16   Global Step: 279730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:42,890-Speed 9072.75 samples/sec   Loss 3.8844   LearningRate 0.0026   Epoch: 16   Global Step: 279740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:43,987-Speed 9345.94 samples/sec   Loss 3.8072   LearningRate 0.0026   Epoch: 16   Global Step: 279750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:45,061-Speed 9543.57 samples/sec   Loss 3.9171   LearningRate 0.0026   Epoch: 16   Global Step: 279760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:50:46,125-Speed 9631.19 samples/sec   Loss 3.8636   LearningRate 0.0026   Epoch: 16   Global Step: 279770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:50:47,192-Speed 9596.31 samples/sec   Loss 3.8298   LearningRate 0.0026   Epoch: 16   Global Step: 279780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:50:48,295-Speed 9288.10 samples/sec   Loss 3.8932   LearningRate 0.0026   Epoch: 16   Global Step: 279790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:49,421-Speed 9104.12 samples/sec   Loss 3.7885   LearningRate 0.0026   Epoch: 16   Global Step: 279800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:50,535-Speed 9195.84 samples/sec   Loss 3.8338   LearningRate 0.0026   Epoch: 16   Global Step: 279810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:51,663-Speed 9082.56 samples/sec   Loss 3.9611   LearningRate 0.0026   Epoch: 16   Global Step: 279820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:52,809-Speed 8939.59 samples/sec   Loss 3.8701   LearningRate 0.0026   Epoch: 16   Global Step: 279830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:53,935-Speed 9096.84 samples/sec   Loss 3.9276   LearningRate 0.0026   Epoch: 16   Global Step: 279840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:55,087-Speed 8894.89 samples/sec   Loss 3.8949   LearningRate 0.0026   Epoch: 16   Global Step: 279850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:56,252-Speed 8797.34 samples/sec   Loss 3.9652   LearningRate 0.0026   Epoch: 16   Global Step: 279860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:57,404-Speed 8890.71 samples/sec   Loss 3.9459   LearningRate 0.0026   Epoch: 16   Global Step: 279870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:58,560-Speed 8865.50 samples/sec   Loss 3.8066   LearningRate 0.0026   Epoch: 16   Global Step: 279880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:50:59,675-Speed 9188.15 samples/sec   Loss 3.8440   LearningRate 0.0026   Epoch: 16   Global Step: 279890   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:51:00,790-Speed 9192.63 samples/sec   Loss 3.8461   LearningRate 0.0026   Epoch: 16   Global Step: 279900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:01,920-Speed 9065.20 samples/sec   Loss 3.8369   LearningRate 0.0026   Epoch: 16   Global Step: 279910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:02,997-Speed 9520.57 samples/sec   Loss 3.8206   LearningRate 0.0026   Epoch: 16   Global Step: 279920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:04,104-Speed 9256.19 samples/sec   Loss 3.9402   LearningRate 0.0026   Epoch: 16   Global Step: 279930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:05,170-Speed 9616.99 samples/sec   Loss 3.8907   LearningRate 0.0026   Epoch: 16   Global Step: 279940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:06,281-Speed 9214.75 samples/sec   Loss 3.8892   LearningRate 0.0026   Epoch: 16   Global Step: 279950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:07,397-Speed 9181.21 samples/sec   Loss 3.8510   LearningRate 0.0026   Epoch: 16   Global Step: 279960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:08,510-Speed 9206.31 samples/sec   Loss 3.8475   LearningRate 0.0026   Epoch: 16   Global Step: 279970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:09,630-Speed 9147.41 samples/sec   Loss 3.8837   LearningRate 0.0026   Epoch: 16   Global Step: 279980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:10,741-Speed 9219.79 samples/sec   Loss 3.8107   LearningRate 0.0026   Epoch: 16   Global Step: 279990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:11,849-Speed 9248.02 samples/sec   Loss 3.8380   LearningRate 0.0026   Epoch: 16   Global Step: 280000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:51:33,704-[lfw][280000]XNorm: 6.746896
Training: 2022-04-11 22:51:33,705-[lfw][280000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-11 22:51:33,705-[lfw][280000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:51:58,993-[cfp_fp][280000]XNorm: 5.880494
Training: 2022-04-11 22:51:58,994-[cfp_fp][280000]Accuracy-Flip: 0.97257+-0.00810
Training: 2022-04-11 22:51:58,994-[cfp_fp][280000]Accuracy-Highest: 0.97257
Training: 2022-04-11 22:52:20,864-[agedb_30][280000]XNorm: 6.566937
Training: 2022-04-11 22:52:20,865-[agedb_30][280000]Accuracy-Flip: 0.97300+-0.00912
Training: 2022-04-11 22:52:20,865-[agedb_30][280000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:52:21,997-Speed 145.98 samples/sec   Loss 3.7978   LearningRate 0.0026   Epoch: 16   Global Step: 280010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:23,074-Speed 9509.78 samples/sec   Loss 3.9451   LearningRate 0.0026   Epoch: 16   Global Step: 280020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:24,216-Speed 8971.31 samples/sec   Loss 3.8587   LearningRate 0.0026   Epoch: 16   Global Step: 280030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:25,330-Speed 9197.89 samples/sec   Loss 3.9285   LearningRate 0.0026   Epoch: 16   Global Step: 280040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:26,391-Speed 9657.28 samples/sec   Loss 3.8831   LearningRate 0.0026   Epoch: 16   Global Step: 280050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:27,510-Speed 9159.15 samples/sec   Loss 3.8133   LearningRate 0.0026   Epoch: 16   Global Step: 280060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:28,622-Speed 9215.43 samples/sec   Loss 3.9250   LearningRate 0.0026   Epoch: 16   Global Step: 280070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:29,737-Speed 9180.72 samples/sec   Loss 3.8401   LearningRate 0.0026   Epoch: 16   Global Step: 280080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:30,859-Speed 9133.96 samples/sec   Loss 3.8880   LearningRate 0.0026   Epoch: 16   Global Step: 280090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:32,006-Speed 8931.77 samples/sec   Loss 3.8693   LearningRate 0.0026   Epoch: 16   Global Step: 280100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:52:33,136-Speed 9074.94 samples/sec   Loss 3.9166   LearningRate 0.0026   Epoch: 16   Global Step: 280110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:52:34,265-Speed 9069.87 samples/sec   Loss 3.8419   LearningRate 0.0026   Epoch: 16   Global Step: 280120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:35,368-Speed 9288.82 samples/sec   Loss 3.8143   LearningRate 0.0026   Epoch: 16   Global Step: 280130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:36,482-Speed 9201.23 samples/sec   Loss 3.8493   LearningRate 0.0026   Epoch: 16   Global Step: 280140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:37,645-Speed 8810.88 samples/sec   Loss 3.8034   LearningRate 0.0026   Epoch: 16   Global Step: 280150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:38,770-Speed 9108.68 samples/sec   Loss 3.8317   LearningRate 0.0026   Epoch: 16   Global Step: 280160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:39,872-Speed 9297.31 samples/sec   Loss 3.9069   LearningRate 0.0026   Epoch: 16   Global Step: 280170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:40,959-Speed 9426.61 samples/sec   Loss 3.8715   LearningRate 0.0026   Epoch: 16   Global Step: 280180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:42,111-Speed 8889.70 samples/sec   Loss 3.9005   LearningRate 0.0026   Epoch: 16   Global Step: 280190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:43,241-Speed 9067.83 samples/sec   Loss 3.8460   LearningRate 0.0026   Epoch: 16   Global Step: 280200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:44,430-Speed 8618.68 samples/sec   Loss 3.7793   LearningRate 0.0026   Epoch: 16   Global Step: 280210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:45,549-Speed 9158.13 samples/sec   Loss 3.8736   LearningRate 0.0026   Epoch: 16   Global Step: 280220   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:52:46,690-Speed 8979.02 samples/sec   Loss 3.8489   LearningRate 0.0026   Epoch: 16   Global Step: 280230   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:52:47,864-Speed 8727.74 samples/sec   Loss 3.7772   LearningRate 0.0026   Epoch: 16   Global Step: 280240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:52:48,994-Speed 9066.96 samples/sec   Loss 3.8944   LearningRate 0.0026   Epoch: 16   Global Step: 280250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:52:50,161-Speed 8777.98 samples/sec   Loss 3.8884   LearningRate 0.0026   Epoch: 16   Global Step: 280260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:51,296-Speed 9030.42 samples/sec   Loss 3.8188   LearningRate 0.0026   Epoch: 16   Global Step: 280270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:52,434-Speed 9000.56 samples/sec   Loss 3.9139   LearningRate 0.0026   Epoch: 16   Global Step: 280280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:53,577-Speed 8962.17 samples/sec   Loss 3.8727   LearningRate 0.0026   Epoch: 16   Global Step: 280290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:54,659-Speed 9475.40 samples/sec   Loss 3.8363   LearningRate 0.0026   Epoch: 16   Global Step: 280300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:55,766-Speed 9249.47 samples/sec   Loss 3.8459   LearningRate 0.0026   Epoch: 16   Global Step: 280310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:56,936-Speed 8763.79 samples/sec   Loss 3.8016   LearningRate 0.0026   Epoch: 16   Global Step: 280320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:58,060-Speed 9114.32 samples/sec   Loss 3.8008   LearningRate 0.0026   Epoch: 16   Global Step: 280330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:52:59,185-Speed 9104.72 samples/sec   Loss 3.8444   LearningRate 0.0026   Epoch: 16   Global Step: 280340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:00,301-Speed 9181.25 samples/sec   Loss 3.8954   LearningRate 0.0026   Epoch: 16   Global Step: 280350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:01,430-Speed 9075.65 samples/sec   Loss 3.8831   LearningRate 0.0026   Epoch: 16   Global Step: 280360   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:53:02,545-Speed 9184.86 samples/sec   Loss 3.9190   LearningRate 0.0026   Epoch: 16   Global Step: 280370   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:53:03,664-Speed 9164.15 samples/sec   Loss 3.8478   LearningRate 0.0026   Epoch: 16   Global Step: 280380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:04,768-Speed 9280.38 samples/sec   Loss 3.9000   LearningRate 0.0026   Epoch: 16   Global Step: 280390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:05,882-Speed 9193.31 samples/sec   Loss 3.8774   LearningRate 0.0026   Epoch: 16   Global Step: 280400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:07,019-Speed 9018.72 samples/sec   Loss 3.8811   LearningRate 0.0026   Epoch: 16   Global Step: 280410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:08,116-Speed 9338.18 samples/sec   Loss 3.8224   LearningRate 0.0026   Epoch: 16   Global Step: 280420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:09,281-Speed 8793.69 samples/sec   Loss 3.8565   LearningRate 0.0026   Epoch: 16   Global Step: 280430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:10,382-Speed 9300.90 samples/sec   Loss 3.8914   LearningRate 0.0026   Epoch: 16   Global Step: 280440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:11,520-Speed 9005.90 samples/sec   Loss 3.8421   LearningRate 0.0026   Epoch: 16   Global Step: 280450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:12,644-Speed 9117.00 samples/sec   Loss 3.7619   LearningRate 0.0026   Epoch: 16   Global Step: 280460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:13,778-Speed 9034.12 samples/sec   Loss 3.8376   LearningRate 0.0026   Epoch: 16   Global Step: 280470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:14,862-Speed 9447.04 samples/sec   Loss 3.8363   LearningRate 0.0026   Epoch: 16   Global Step: 280480   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:53:15,993-Speed 9065.25 samples/sec   Loss 3.9531   LearningRate 0.0026   Epoch: 16   Global Step: 280490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:53:17,097-Speed 9282.74 samples/sec   Loss 3.7872   LearningRate 0.0026   Epoch: 16   Global Step: 280500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:18,240-Speed 8960.23 samples/sec   Loss 3.8982   LearningRate 0.0026   Epoch: 16   Global Step: 280510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:19,315-Speed 9529.39 samples/sec   Loss 3.8540   LearningRate 0.0025   Epoch: 16   Global Step: 280520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:20,401-Speed 9433.99 samples/sec   Loss 3.8964   LearningRate 0.0025   Epoch: 16   Global Step: 280530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:21,511-Speed 9231.38 samples/sec   Loss 3.8728   LearningRate 0.0025   Epoch: 16   Global Step: 280540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:22,623-Speed 9210.47 samples/sec   Loss 3.8580   LearningRate 0.0025   Epoch: 16   Global Step: 280550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:23,795-Speed 8747.51 samples/sec   Loss 3.8657   LearningRate 0.0025   Epoch: 16   Global Step: 280560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:24,847-Speed 9735.16 samples/sec   Loss 3.7513   LearningRate 0.0025   Epoch: 16   Global Step: 280570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:25,937-Speed 9407.20 samples/sec   Loss 3.8767   LearningRate 0.0025   Epoch: 16   Global Step: 280580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:27,017-Speed 9485.34 samples/sec   Loss 3.9063   LearningRate 0.0025   Epoch: 16   Global Step: 280590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:28,140-Speed 9129.07 samples/sec   Loss 3.8690   LearningRate 0.0025   Epoch: 16   Global Step: 280600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:53:29,288-Speed 8923.57 samples/sec   Loss 3.9215   LearningRate 0.0025   Epoch: 16   Global Step: 280610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:53:30,397-Speed 9231.41 samples/sec   Loss 3.8378   LearningRate 0.0025   Epoch: 16   Global Step: 280620   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:53:31,502-Speed 9277.81 samples/sec   Loss 3.8445   LearningRate 0.0025   Epoch: 16   Global Step: 280630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:32,670-Speed 8769.21 samples/sec   Loss 3.8493   LearningRate 0.0025   Epoch: 16   Global Step: 280640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:33,828-Speed 8852.82 samples/sec   Loss 3.8080   LearningRate 0.0025   Epoch: 16   Global Step: 280650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:34,914-Speed 9434.10 samples/sec   Loss 3.8572   LearningRate 0.0025   Epoch: 16   Global Step: 280660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:36,073-Speed 8837.56 samples/sec   Loss 3.8181   LearningRate 0.0025   Epoch: 16   Global Step: 280670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:37,239-Speed 8784.92 samples/sec   Loss 3.8759   LearningRate 0.0025   Epoch: 16   Global Step: 280680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:38,404-Speed 8794.63 samples/sec   Loss 3.8247   LearningRate 0.0025   Epoch: 16   Global Step: 280690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:39,549-Speed 8947.46 samples/sec   Loss 3.9088   LearningRate 0.0025   Epoch: 16   Global Step: 280700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:40,666-Speed 9175.62 samples/sec   Loss 3.8445   LearningRate 0.0025   Epoch: 16   Global Step: 280710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:41,808-Speed 8964.82 samples/sec   Loss 3.8550   LearningRate 0.0025   Epoch: 16   Global Step: 280720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:42,950-Speed 8975.52 samples/sec   Loss 3.8765   LearningRate 0.0025   Epoch: 16   Global Step: 280730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:53:44,057-Speed 9264.82 samples/sec   Loss 3.8362   LearningRate 0.0025   Epoch: 16   Global Step: 280740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:45,140-Speed 9458.76 samples/sec   Loss 3.7997   LearningRate 0.0025   Epoch: 16   Global Step: 280750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:46,220-Speed 9483.79 samples/sec   Loss 3.8337   LearningRate 0.0025   Epoch: 16   Global Step: 280760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:47,319-Speed 9324.01 samples/sec   Loss 3.9038   LearningRate 0.0025   Epoch: 16   Global Step: 280770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:48,443-Speed 9116.65 samples/sec   Loss 3.8725   LearningRate 0.0025   Epoch: 16   Global Step: 280780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:49,546-Speed 9290.16 samples/sec   Loss 3.8552   LearningRate 0.0025   Epoch: 16   Global Step: 280790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:50,654-Speed 9241.75 samples/sec   Loss 3.8191   LearningRate 0.0025   Epoch: 16   Global Step: 280800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:51,792-Speed 9007.51 samples/sec   Loss 3.8289   LearningRate 0.0025   Epoch: 16   Global Step: 280810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:52,868-Speed 9517.54 samples/sec   Loss 3.8162   LearningRate 0.0025   Epoch: 16   Global Step: 280820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:53,978-Speed 9232.02 samples/sec   Loss 3.8988   LearningRate 0.0025   Epoch: 16   Global Step: 280830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:55,079-Speed 9306.02 samples/sec   Loss 3.7335   LearningRate 0.0025   Epoch: 16   Global Step: 280840   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:53:56,214-Speed 9027.07 samples/sec   Loss 3.8624   LearningRate 0.0025   Epoch: 16   Global Step: 280850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:53:57,284-Speed 9576.34 samples/sec   Loss 3.7994   LearningRate 0.0025   Epoch: 16   Global Step: 280860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:58,428-Speed 8959.43 samples/sec   Loss 3.8747   LearningRate 0.0025   Epoch: 16   Global Step: 280870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:53:59,565-Speed 9009.78 samples/sec   Loss 3.8227   LearningRate 0.0025   Epoch: 16   Global Step: 280880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:00,742-Speed 8702.72 samples/sec   Loss 3.8530   LearningRate 0.0025   Epoch: 16   Global Step: 280890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:01,859-Speed 9180.55 samples/sec   Loss 3.9304   LearningRate 0.0025   Epoch: 16   Global Step: 280900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:02,967-Speed 9242.26 samples/sec   Loss 3.8847   LearningRate 0.0025   Epoch: 16   Global Step: 280910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:04,037-Speed 9583.37 samples/sec   Loss 3.8806   LearningRate 0.0025   Epoch: 16   Global Step: 280920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:05,162-Speed 9103.98 samples/sec   Loss 3.8829   LearningRate 0.0025   Epoch: 16   Global Step: 280930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:06,243-Speed 9477.89 samples/sec   Loss 3.8058   LearningRate 0.0025   Epoch: 16   Global Step: 280940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:07,290-Speed 9782.91 samples/sec   Loss 3.8852   LearningRate 0.0025   Epoch: 16   Global Step: 280950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:08,366-Speed 9524.05 samples/sec   Loss 3.9204   LearningRate 0.0025   Epoch: 16   Global Step: 280960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:54:09,490-Speed 9120.20 samples/sec   Loss 3.9090   LearningRate 0.0025   Epoch: 16   Global Step: 280970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:54:10,588-Speed 9330.63 samples/sec   Loss 3.8813   LearningRate 0.0025   Epoch: 16   Global Step: 280980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:11,693-Speed 9269.93 samples/sec   Loss 3.8646   LearningRate 0.0025   Epoch: 16   Global Step: 280990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:12,855-Speed 8815.58 samples/sec   Loss 3.9056   LearningRate 0.0025   Epoch: 16   Global Step: 281000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:14,039-Speed 8653.41 samples/sec   Loss 3.8489   LearningRate 0.0025   Epoch: 16   Global Step: 281010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:15,174-Speed 9026.16 samples/sec   Loss 3.8180   LearningRate 0.0025   Epoch: 16   Global Step: 281020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:16,283-Speed 9241.50 samples/sec   Loss 3.9262   LearningRate 0.0025   Epoch: 16   Global Step: 281030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:17,416-Speed 9043.73 samples/sec   Loss 3.8324   LearningRate 0.0025   Epoch: 16   Global Step: 281040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:18,562-Speed 8940.12 samples/sec   Loss 3.8289   LearningRate 0.0025   Epoch: 16   Global Step: 281050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:19,741-Speed 8689.42 samples/sec   Loss 3.8427   LearningRate 0.0025   Epoch: 16   Global Step: 281060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:20,888-Speed 8938.26 samples/sec   Loss 3.9191   LearningRate 0.0025   Epoch: 16   Global Step: 281070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:22,039-Speed 8901.75 samples/sec   Loss 3.9460   LearningRate 0.0025   Epoch: 16   Global Step: 281080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:23,186-Speed 8928.15 samples/sec   Loss 3.7916   LearningRate 0.0025   Epoch: 16   Global Step: 281090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:24,289-Speed 9288.42 samples/sec   Loss 3.8188   LearningRate 0.0025   Epoch: 16   Global Step: 281100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:25,420-Speed 9059.16 samples/sec   Loss 3.9484   LearningRate 0.0025   Epoch: 16   Global Step: 281110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:26,535-Speed 9195.04 samples/sec   Loss 3.9037   LearningRate 0.0025   Epoch: 16   Global Step: 281120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:27,688-Speed 8882.12 samples/sec   Loss 3.8678   LearningRate 0.0025   Epoch: 16   Global Step: 281130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:28,793-Speed 9272.27 samples/sec   Loss 3.9190   LearningRate 0.0025   Epoch: 16   Global Step: 281140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:29,927-Speed 9039.16 samples/sec   Loss 3.8223   LearningRate 0.0025   Epoch: 16   Global Step: 281150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:31,072-Speed 8944.82 samples/sec   Loss 3.8788   LearningRate 0.0025   Epoch: 16   Global Step: 281160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:32,210-Speed 9000.80 samples/sec   Loss 3.8287   LearningRate 0.0025   Epoch: 16   Global Step: 281170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:33,372-Speed 8823.64 samples/sec   Loss 3.8292   LearningRate 0.0025   Epoch: 16   Global Step: 281180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:34,496-Speed 9113.91 samples/sec   Loss 3.8026   LearningRate 0.0025   Epoch: 16   Global Step: 281190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:35,646-Speed 8910.20 samples/sec   Loss 3.8397   LearningRate 0.0025   Epoch: 16   Global Step: 281200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:36,760-Speed 9193.18 samples/sec   Loss 3.7647   LearningRate 0.0025   Epoch: 16   Global Step: 281210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:37,870-Speed 9240.02 samples/sec   Loss 3.9202   LearningRate 0.0025   Epoch: 16   Global Step: 281220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:38,995-Speed 9105.61 samples/sec   Loss 3.8526   LearningRate 0.0025   Epoch: 16   Global Step: 281230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:40,116-Speed 9135.78 samples/sec   Loss 3.8064   LearningRate 0.0025   Epoch: 16   Global Step: 281240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:41,252-Speed 9025.11 samples/sec   Loss 3.8898   LearningRate 0.0025   Epoch: 16   Global Step: 281250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:42,355-Speed 9280.63 samples/sec   Loss 3.8825   LearningRate 0.0025   Epoch: 16   Global Step: 281260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:43,504-Speed 8919.84 samples/sec   Loss 3.8339   LearningRate 0.0025   Epoch: 16   Global Step: 281270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:44,600-Speed 9351.91 samples/sec   Loss 3.7654   LearningRate 0.0025   Epoch: 16   Global Step: 281280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:54:45,695-Speed 9356.78 samples/sec   Loss 3.9499   LearningRate 0.0025   Epoch: 16   Global Step: 281290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:46,779-Speed 9452.31 samples/sec   Loss 3.9079   LearningRate 0.0025   Epoch: 16   Global Step: 281300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:47,864-Speed 9439.74 samples/sec   Loss 3.8083   LearningRate 0.0025   Epoch: 16   Global Step: 281310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:49,024-Speed 8831.33 samples/sec   Loss 3.8728   LearningRate 0.0025   Epoch: 16   Global Step: 281320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:50,167-Speed 8963.94 samples/sec   Loss 3.8351   LearningRate 0.0025   Epoch: 16   Global Step: 281330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:51,285-Speed 9166.90 samples/sec   Loss 3.8134   LearningRate 0.0025   Epoch: 16   Global Step: 281340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:52,380-Speed 9351.54 samples/sec   Loss 3.9478   LearningRate 0.0025   Epoch: 16   Global Step: 281350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:53,541-Speed 8832.13 samples/sec   Loss 3.8064   LearningRate 0.0025   Epoch: 16   Global Step: 281360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:54,639-Speed 9330.04 samples/sec   Loss 3.8782   LearningRate 0.0025   Epoch: 16   Global Step: 281370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:55,734-Speed 9354.59 samples/sec   Loss 3.8909   LearningRate 0.0025   Epoch: 16   Global Step: 281380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:54:56,840-Speed 9267.22 samples/sec   Loss 3.8336   LearningRate 0.0025   Epoch: 16   Global Step: 281390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:54:58,023-Speed 8664.26 samples/sec   Loss 3.8825   LearningRate 0.0025   Epoch: 16   Global Step: 281400   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:54:59,153-Speed 9070.74 samples/sec   Loss 3.8557   LearningRate 0.0025   Epoch: 16   Global Step: 281410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:55:00,279-Speed 9092.69 samples/sec   Loss 3.8516   LearningRate 0.0025   Epoch: 16   Global Step: 281420   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:55:01,386-Speed 9258.44 samples/sec   Loss 3.9245   LearningRate 0.0025   Epoch: 16   Global Step: 281430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:55:02,523-Speed 9012.36 samples/sec   Loss 3.7993   LearningRate 0.0025   Epoch: 16   Global Step: 281440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:03,641-Speed 9166.66 samples/sec   Loss 3.8850   LearningRate 0.0025   Epoch: 16   Global Step: 281450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:04,720-Speed 9501.54 samples/sec   Loss 3.8305   LearningRate 0.0025   Epoch: 16   Global Step: 281460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:05,833-Speed 9203.33 samples/sec   Loss 3.8825   LearningRate 0.0025   Epoch: 16   Global Step: 281470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:06,950-Speed 9172.69 samples/sec   Loss 3.8697   LearningRate 0.0025   Epoch: 16   Global Step: 281480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:08,094-Speed 8953.61 samples/sec   Loss 3.8879   LearningRate 0.0025   Epoch: 16   Global Step: 281490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:09,199-Speed 9269.31 samples/sec   Loss 3.9524   LearningRate 0.0025   Epoch: 16   Global Step: 281500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:10,328-Speed 9074.55 samples/sec   Loss 3.8828   LearningRate 0.0025   Epoch: 16   Global Step: 281510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:11,456-Speed 9082.43 samples/sec   Loss 3.8447   LearningRate 0.0025   Epoch: 16   Global Step: 281520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:12,612-Speed 8864.01 samples/sec   Loss 3.9068   LearningRate 0.0025   Epoch: 16   Global Step: 281530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:13,697-Speed 9441.92 samples/sec   Loss 3.7837   LearningRate 0.0025   Epoch: 16   Global Step: 281540   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:55:14,848-Speed 8902.89 samples/sec   Loss 3.7684   LearningRate 0.0025   Epoch: 16   Global Step: 281550   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:55:15,989-Speed 8987.66 samples/sec   Loss 3.9085   LearningRate 0.0025   Epoch: 16   Global Step: 281560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:17,094-Speed 9272.68 samples/sec   Loss 3.8752   LearningRate 0.0024   Epoch: 16   Global Step: 281570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:18,161-Speed 9602.70 samples/sec   Loss 3.8767   LearningRate 0.0024   Epoch: 16   Global Step: 281580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:19,304-Speed 8963.59 samples/sec   Loss 3.8176   LearningRate 0.0024   Epoch: 16   Global Step: 281590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:20,417-Speed 9199.84 samples/sec   Loss 3.8356   LearningRate 0.0024   Epoch: 16   Global Step: 281600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:21,573-Speed 8864.51 samples/sec   Loss 3.7916   LearningRate 0.0024   Epoch: 16   Global Step: 281610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:22,670-Speed 9341.11 samples/sec   Loss 3.9017   LearningRate 0.0024   Epoch: 16   Global Step: 281620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:23,828-Speed 8848.42 samples/sec   Loss 3.8780   LearningRate 0.0024   Epoch: 16   Global Step: 281630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:24,982-Speed 8876.71 samples/sec   Loss 3.8215   LearningRate 0.0024   Epoch: 16   Global Step: 281640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:26,115-Speed 9040.26 samples/sec   Loss 3.8393   LearningRate 0.0024   Epoch: 16   Global Step: 281650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:27,256-Speed 8983.39 samples/sec   Loss 3.9210   LearningRate 0.0024   Epoch: 16   Global Step: 281660   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:55:28,358-Speed 9297.49 samples/sec   Loss 3.8738   LearningRate 0.0024   Epoch: 16   Global Step: 281670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:55:29,466-Speed 9243.87 samples/sec   Loss 3.9550   LearningRate 0.0024   Epoch: 16   Global Step: 281680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:55:30,583-Speed 9176.09 samples/sec   Loss 3.8008   LearningRate 0.0024   Epoch: 16   Global Step: 281690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:55:31,699-Speed 9176.26 samples/sec   Loss 3.9421   LearningRate 0.0024   Epoch: 16   Global Step: 281700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:32,821-Speed 9131.14 samples/sec   Loss 3.9412   LearningRate 0.0024   Epoch: 16   Global Step: 281710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:33,929-Speed 9261.88 samples/sec   Loss 3.7829   LearningRate 0.0024   Epoch: 16   Global Step: 281720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:35,025-Speed 9350.87 samples/sec   Loss 3.8741   LearningRate 0.0024   Epoch: 16   Global Step: 281730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:36,912-Speed 5426.70 samples/sec   Loss 3.8623   LearningRate 0.0024   Epoch: 16   Global Step: 281740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:38,025-Speed 9209.24 samples/sec   Loss 3.9031   LearningRate 0.0024   Epoch: 16   Global Step: 281750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:39,156-Speed 9057.83 samples/sec   Loss 3.7834   LearningRate 0.0024   Epoch: 16   Global Step: 281760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:40,271-Speed 9183.48 samples/sec   Loss 3.8684   LearningRate 0.0024   Epoch: 16   Global Step: 281770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:55:41,455-Speed 8654.83 samples/sec   Loss 3.9187   LearningRate 0.0024   Epoch: 16   Global Step: 281780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:55:42,565-Speed 9234.71 samples/sec   Loss 3.8062   LearningRate 0.0024   Epoch: 16   Global Step: 281790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:55:43,719-Speed 8875.46 samples/sec   Loss 3.9130   LearningRate 0.0024   Epoch: 16   Global Step: 281800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:55:44,836-Speed 9170.71 samples/sec   Loss 3.8552   LearningRate 0.0024   Epoch: 16   Global Step: 281810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:55:45,922-Speed 9442.95 samples/sec   Loss 3.8561   LearningRate 0.0024   Epoch: 16   Global Step: 281820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:55:47,044-Speed 9132.00 samples/sec   Loss 3.8762   LearningRate 0.0024   Epoch: 16   Global Step: 281830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:55:48,159-Speed 9187.95 samples/sec   Loss 3.8871   LearningRate 0.0024   Epoch: 16   Global Step: 281840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:55:49,238-Speed 9495.93 samples/sec   Loss 3.8634   LearningRate 0.0024   Epoch: 16   Global Step: 281850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:55:50,361-Speed 9122.56 samples/sec   Loss 3.7854   LearningRate 0.0024   Epoch: 16   Global Step: 281860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:55:51,441-Speed 9479.68 samples/sec   Loss 3.9176   LearningRate 0.0024   Epoch: 16   Global Step: 281870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:52,547-Speed 9263.65 samples/sec   Loss 3.8954   LearningRate 0.0024   Epoch: 16   Global Step: 281880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:53,679-Speed 9055.85 samples/sec   Loss 3.7497   LearningRate 0.0024   Epoch: 16   Global Step: 281890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:54,793-Speed 9195.00 samples/sec   Loss 3.8709   LearningRate 0.0024   Epoch: 16   Global Step: 281900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:55,876-Speed 9459.29 samples/sec   Loss 3.9203   LearningRate 0.0024   Epoch: 16   Global Step: 281910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:57,019-Speed 8962.05 samples/sec   Loss 3.7808   LearningRate 0.0024   Epoch: 16   Global Step: 281920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:58,127-Speed 9246.53 samples/sec   Loss 3.8955   LearningRate 0.0024   Epoch: 16   Global Step: 281930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:55:59,239-Speed 9221.73 samples/sec   Loss 3.8213   LearningRate 0.0024   Epoch: 16   Global Step: 281940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:56:00,355-Speed 9174.47 samples/sec   Loss 3.8070   LearningRate 0.0024   Epoch: 16   Global Step: 281950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:56:01,476-Speed 9142.96 samples/sec   Loss 3.8190   LearningRate 0.0024   Epoch: 16   Global Step: 281960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:56:02,599-Speed 9122.61 samples/sec   Loss 3.9588   LearningRate 0.0024   Epoch: 16   Global Step: 281970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:56:03,675-Speed 9525.48 samples/sec   Loss 3.8863   LearningRate 0.0024   Epoch: 16   Global Step: 281980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:56:04,776-Speed 9305.90 samples/sec   Loss 3.8517   LearningRate 0.0024   Epoch: 16   Global Step: 281990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:56:05,896-Speed 9147.41 samples/sec   Loss 3.8703   LearningRate 0.0024   Epoch: 16   Global Step: 282000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:56:27,725-[lfw][282000]XNorm: 6.857243
Training: 2022-04-11 22:56:27,725-[lfw][282000]Accuracy-Flip: 0.99733+-0.00309
Training: 2022-04-11 22:56:27,726-[lfw][282000]Accuracy-Highest: 0.99733
Training: 2022-04-11 22:56:52,884-[cfp_fp][282000]XNorm: 5.975030
Training: 2022-04-11 22:56:52,884-[cfp_fp][282000]Accuracy-Flip: 0.97386+-0.00919
Training: 2022-04-11 22:56:52,885-[cfp_fp][282000]Accuracy-Highest: 0.97386
Training: 2022-04-11 22:57:14,592-[agedb_30][282000]XNorm: 6.682816
Training: 2022-04-11 22:57:14,593-[agedb_30][282000]Accuracy-Flip: 0.97250+-0.00870
Training: 2022-04-11 22:57:14,593-[agedb_30][282000]Accuracy-Highest: 0.97350
Training: 2022-04-11 22:57:15,695-Speed 146.71 samples/sec   Loss 3.8338   LearningRate 0.0024   Epoch: 16   Global Step: 282010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:16,815-Speed 9148.69 samples/sec   Loss 3.8612   LearningRate 0.0024   Epoch: 16   Global Step: 282020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:17,945-Speed 9069.32 samples/sec   Loss 3.8369   LearningRate 0.0024   Epoch: 16   Global Step: 282030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:19,064-Speed 9153.02 samples/sec   Loss 3.8576   LearningRate 0.0024   Epoch: 16   Global Step: 282040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:20,187-Speed 9128.91 samples/sec   Loss 3.8185   LearningRate 0.0024   Epoch: 16   Global Step: 282050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:21,272-Speed 9440.29 samples/sec   Loss 3.9598   LearningRate 0.0024   Epoch: 16   Global Step: 282060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:22,409-Speed 9021.77 samples/sec   Loss 3.8742   LearningRate 0.0024   Epoch: 16   Global Step: 282070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:23,487-Speed 9504.14 samples/sec   Loss 3.8644   LearningRate 0.0024   Epoch: 16   Global Step: 282080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:57:24,594-Speed 9254.77 samples/sec   Loss 3.8651   LearningRate 0.0024   Epoch: 16   Global Step: 282090   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:57:25,695-Speed 9306.39 samples/sec   Loss 3.7992   LearningRate 0.0024   Epoch: 16   Global Step: 282100   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:57:26,790-Speed 9352.28 samples/sec   Loss 3.8838   LearningRate 0.0024   Epoch: 16   Global Step: 282110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:27,888-Speed 9336.81 samples/sec   Loss 3.8403   LearningRate 0.0024   Epoch: 16   Global Step: 282120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:29,070-Speed 8668.52 samples/sec   Loss 3.7843   LearningRate 0.0024   Epoch: 16   Global Step: 282130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:30,190-Speed 9146.75 samples/sec   Loss 3.8915   LearningRate 0.0024   Epoch: 16   Global Step: 282140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:31,327-Speed 9009.37 samples/sec   Loss 3.8485   LearningRate 0.0024   Epoch: 16   Global Step: 282150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:32,450-Speed 9121.48 samples/sec   Loss 3.8901   LearningRate 0.0024   Epoch: 16   Global Step: 282160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:33,591-Speed 8980.58 samples/sec   Loss 3.8500   LearningRate 0.0024   Epoch: 16   Global Step: 282170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:34,754-Speed 8807.83 samples/sec   Loss 3.8401   LearningRate 0.0024   Epoch: 16   Global Step: 282180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:35,866-Speed 9220.14 samples/sec   Loss 3.8148   LearningRate 0.0024   Epoch: 16   Global Step: 282190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:36,987-Speed 9148.33 samples/sec   Loss 3.8550   LearningRate 0.0024   Epoch: 16   Global Step: 282200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:38,075-Speed 9412.27 samples/sec   Loss 3.9032   LearningRate 0.0024   Epoch: 16   Global Step: 282210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:57:39,183-Speed 9247.80 samples/sec   Loss 3.9255   LearningRate 0.0024   Epoch: 16   Global Step: 282220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:40,331-Speed 8928.06 samples/sec   Loss 3.8605   LearningRate 0.0024   Epoch: 16   Global Step: 282230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:41,464-Speed 9036.49 samples/sec   Loss 3.9283   LearningRate 0.0024   Epoch: 16   Global Step: 282240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:42,571-Speed 9256.82 samples/sec   Loss 3.8454   LearningRate 0.0024   Epoch: 16   Global Step: 282250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:43,710-Speed 8995.28 samples/sec   Loss 3.8238   LearningRate 0.0024   Epoch: 16   Global Step: 282260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:44,842-Speed 9057.05 samples/sec   Loss 3.8076   LearningRate 0.0024   Epoch: 16   Global Step: 282270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:45,998-Speed 8856.08 samples/sec   Loss 3.8694   LearningRate 0.0024   Epoch: 16   Global Step: 282280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:47,143-Speed 8954.45 samples/sec   Loss 3.8273   LearningRate 0.0024   Epoch: 16   Global Step: 282290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:48,289-Speed 8943.52 samples/sec   Loss 3.8977   LearningRate 0.0024   Epoch: 16   Global Step: 282300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:49,400-Speed 9220.52 samples/sec   Loss 3.8799   LearningRate 0.0024   Epoch: 16   Global Step: 282310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:50,553-Speed 8880.64 samples/sec   Loss 3.9055   LearningRate 0.0024   Epoch: 16   Global Step: 282320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:57:51,660-Speed 9256.64 samples/sec   Loss 3.9431   LearningRate 0.0024   Epoch: 16   Global Step: 282330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:52,759-Speed 9327.10 samples/sec   Loss 3.9137   LearningRate 0.0024   Epoch: 16   Global Step: 282340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:53,897-Speed 9002.42 samples/sec   Loss 3.8364   LearningRate 0.0024   Epoch: 16   Global Step: 282350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:55,048-Speed 8894.65 samples/sec   Loss 3.8131   LearningRate 0.0024   Epoch: 16   Global Step: 282360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:56,201-Speed 8890.39 samples/sec   Loss 3.9555   LearningRate 0.0024   Epoch: 16   Global Step: 282370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:57,289-Speed 9416.20 samples/sec   Loss 3.8118   LearningRate 0.0024   Epoch: 16   Global Step: 282380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:58,446-Speed 8857.76 samples/sec   Loss 3.8796   LearningRate 0.0024   Epoch: 16   Global Step: 282390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:57:59,519-Speed 9549.86 samples/sec   Loss 3.9079   LearningRate 0.0024   Epoch: 16   Global Step: 282400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:00,651-Speed 9053.91 samples/sec   Loss 3.8348   LearningRate 0.0024   Epoch: 16   Global Step: 282410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:01,795-Speed 8957.20 samples/sec   Loss 3.8644   LearningRate 0.0024   Epoch: 16   Global Step: 282420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:02,880-Speed 9452.55 samples/sec   Loss 3.8103   LearningRate 0.0024   Epoch: 16   Global Step: 282430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:03,969-Speed 9407.49 samples/sec   Loss 3.8786   LearningRate 0.0024   Epoch: 16   Global Step: 282440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:05,038-Speed 9586.71 samples/sec   Loss 3.9038   LearningRate 0.0024   Epoch: 16   Global Step: 282450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:06,197-Speed 8836.38 samples/sec   Loss 3.8555   LearningRate 0.0024   Epoch: 16   Global Step: 282460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:07,334-Speed 9016.44 samples/sec   Loss 3.8463   LearningRate 0.0024   Epoch: 16   Global Step: 282470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:08,491-Speed 8852.72 samples/sec   Loss 3.7898   LearningRate 0.0024   Epoch: 16   Global Step: 282480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:09,614-Speed 9124.95 samples/sec   Loss 3.7671   LearningRate 0.0024   Epoch: 16   Global Step: 282490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:10,752-Speed 9000.54 samples/sec   Loss 3.8650   LearningRate 0.0024   Epoch: 16   Global Step: 282500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:11,875-Speed 9133.33 samples/sec   Loss 3.8312   LearningRate 0.0024   Epoch: 16   Global Step: 282510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:13,020-Speed 8942.11 samples/sec   Loss 3.8685   LearningRate 0.0024   Epoch: 16   Global Step: 282520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:14,216-Speed 8565.68 samples/sec   Loss 3.8100   LearningRate 0.0024   Epoch: 16   Global Step: 282530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:58:15,342-Speed 9108.52 samples/sec   Loss 3.8380   LearningRate 0.0024   Epoch: 16   Global Step: 282540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:16,513-Speed 8743.79 samples/sec   Loss 3.8970   LearningRate 0.0024   Epoch: 16   Global Step: 282550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:17,659-Speed 8941.53 samples/sec   Loss 3.9172   LearningRate 0.0024   Epoch: 16   Global Step: 282560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:18,818-Speed 8840.84 samples/sec   Loss 3.8933   LearningRate 0.0024   Epoch: 16   Global Step: 282570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:19,973-Speed 8874.41 samples/sec   Loss 3.8563   LearningRate 0.0024   Epoch: 16   Global Step: 282580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:21,118-Speed 8946.67 samples/sec   Loss 3.8376   LearningRate 0.0024   Epoch: 16   Global Step: 282590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:22,258-Speed 8986.44 samples/sec   Loss 3.8652   LearningRate 0.0024   Epoch: 16   Global Step: 282600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:23,386-Speed 9082.23 samples/sec   Loss 3.9164   LearningRate 0.0024   Epoch: 16   Global Step: 282610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:24,534-Speed 8926.18 samples/sec   Loss 3.8089   LearningRate 0.0024   Epoch: 16   Global Step: 282620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:25,641-Speed 9250.87 samples/sec   Loss 3.8189   LearningRate 0.0024   Epoch: 16   Global Step: 282630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:26,737-Speed 9347.97 samples/sec   Loss 3.9115   LearningRate 0.0024   Epoch: 16   Global Step: 282640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:27,880-Speed 8962.88 samples/sec   Loss 3.7963   LearningRate 0.0023   Epoch: 16   Global Step: 282650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:28,950-Speed 9576.86 samples/sec   Loss 3.8266   LearningRate 0.0023   Epoch: 16   Global Step: 282660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:30,071-Speed 9145.60 samples/sec   Loss 3.9511   LearningRate 0.0023   Epoch: 16   Global Step: 282670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:31,232-Speed 8825.84 samples/sec   Loss 3.8393   LearningRate 0.0023   Epoch: 16   Global Step: 282680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:32,400-Speed 8771.63 samples/sec   Loss 3.9105   LearningRate 0.0023   Epoch: 16   Global Step: 282690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:33,532-Speed 9053.88 samples/sec   Loss 3.8320   LearningRate 0.0023   Epoch: 16   Global Step: 282700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:34,648-Speed 9177.87 samples/sec   Loss 3.7442   LearningRate 0.0023   Epoch: 16   Global Step: 282710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:35,748-Speed 9313.39 samples/sec   Loss 3.8429   LearningRate 0.0023   Epoch: 16   Global Step: 282720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:36,882-Speed 9032.97 samples/sec   Loss 3.8533   LearningRate 0.0023   Epoch: 16   Global Step: 282730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:38,044-Speed 8819.37 samples/sec   Loss 3.8624   LearningRate 0.0023   Epoch: 16   Global Step: 282740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:58:39,165-Speed 9140.10 samples/sec   Loss 3.8322   LearningRate 0.0023   Epoch: 16   Global Step: 282750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:40,331-Speed 8786.77 samples/sec   Loss 3.9126   LearningRate 0.0023   Epoch: 16   Global Step: 282760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:41,433-Speed 9295.76 samples/sec   Loss 3.8558   LearningRate 0.0023   Epoch: 16   Global Step: 282770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:42,529-Speed 9349.87 samples/sec   Loss 3.8705   LearningRate 0.0023   Epoch: 16   Global Step: 282780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:43,643-Speed 9194.73 samples/sec   Loss 3.8600   LearningRate 0.0023   Epoch: 16   Global Step: 282790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:44,752-Speed 9246.05 samples/sec   Loss 3.9158   LearningRate 0.0023   Epoch: 16   Global Step: 282800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:45,862-Speed 9227.52 samples/sec   Loss 3.8121   LearningRate 0.0023   Epoch: 16   Global Step: 282810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:47,023-Speed 8820.11 samples/sec   Loss 3.8703   LearningRate 0.0023   Epoch: 16   Global Step: 282820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:48,132-Speed 9244.91 samples/sec   Loss 3.8286   LearningRate 0.0023   Epoch: 16   Global Step: 282830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:49,278-Speed 8945.35 samples/sec   Loss 3.8450   LearningRate 0.0023   Epoch: 16   Global Step: 282840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:50,363-Speed 9444.02 samples/sec   Loss 3.8022   LearningRate 0.0023   Epoch: 16   Global Step: 282850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:58:51,484-Speed 9138.63 samples/sec   Loss 3.8869   LearningRate 0.0023   Epoch: 16   Global Step: 282860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:58:52,621-Speed 9005.84 samples/sec   Loss 3.9278   LearningRate 0.0023   Epoch: 16   Global Step: 282870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:53,785-Speed 8804.57 samples/sec   Loss 3.8827   LearningRate 0.0023   Epoch: 16   Global Step: 282880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:54,984-Speed 8545.45 samples/sec   Loss 3.8151   LearningRate 0.0023   Epoch: 16   Global Step: 282890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:56,085-Speed 9301.77 samples/sec   Loss 3.8113   LearningRate 0.0023   Epoch: 16   Global Step: 282900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:57,200-Speed 9192.86 samples/sec   Loss 3.8566   LearningRate 0.0023   Epoch: 16   Global Step: 282910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:58,348-Speed 8921.63 samples/sec   Loss 3.8650   LearningRate 0.0023   Epoch: 16   Global Step: 282920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:58:59,453-Speed 9277.00 samples/sec   Loss 3.8621   LearningRate 0.0023   Epoch: 16   Global Step: 282930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:00,598-Speed 8945.25 samples/sec   Loss 3.8578   LearningRate 0.0023   Epoch: 16   Global Step: 282940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:01,764-Speed 8788.05 samples/sec   Loss 3.8423   LearningRate 0.0023   Epoch: 16   Global Step: 282950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:02,855-Speed 9388.87 samples/sec   Loss 3.8732   LearningRate 0.0023   Epoch: 16   Global Step: 282960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:03,942-Speed 9429.57 samples/sec   Loss 3.9083   LearningRate 0.0023   Epoch: 16   Global Step: 282970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:59:05,064-Speed 9136.62 samples/sec   Loss 3.8773   LearningRate 0.0023   Epoch: 16   Global Step: 282980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:06,150-Speed 9431.60 samples/sec   Loss 3.8601   LearningRate 0.0023   Epoch: 16   Global Step: 282990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:07,238-Speed 9420.22 samples/sec   Loss 3.8004   LearningRate 0.0023   Epoch: 16   Global Step: 283000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:08,407-Speed 8761.74 samples/sec   Loss 3.8342   LearningRate 0.0023   Epoch: 16   Global Step: 283010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:09,562-Speed 8867.39 samples/sec   Loss 3.8422   LearningRate 0.0023   Epoch: 16   Global Step: 283020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:10,681-Speed 9159.21 samples/sec   Loss 3.8403   LearningRate 0.0023   Epoch: 16   Global Step: 283030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:11,798-Speed 9169.35 samples/sec   Loss 3.8827   LearningRate 0.0023   Epoch: 16   Global Step: 283040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:12,887-Speed 9416.41 samples/sec   Loss 3.9007   LearningRate 0.0023   Epoch: 16   Global Step: 283050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:14,037-Speed 8904.24 samples/sec   Loss 3.8124   LearningRate 0.0023   Epoch: 16   Global Step: 283060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:15,188-Speed 8900.95 samples/sec   Loss 3.8818   LearningRate 0.0023   Epoch: 16   Global Step: 283070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:16,273-Speed 9448.79 samples/sec   Loss 3.8337   LearningRate 0.0023   Epoch: 16   Global Step: 283080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:17,417-Speed 8952.77 samples/sec   Loss 3.8898   LearningRate 0.0023   Epoch: 16   Global Step: 283090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:18,519-Speed 9298.43 samples/sec   Loss 3.8098   LearningRate 0.0023   Epoch: 16   Global Step: 283100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:19,658-Speed 8994.97 samples/sec   Loss 3.8642   LearningRate 0.0023   Epoch: 16   Global Step: 283110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:20,764-Speed 9260.24 samples/sec   Loss 3.8186   LearningRate 0.0023   Epoch: 16   Global Step: 283120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:21,876-Speed 9306.74 samples/sec   Loss 3.8227   LearningRate 0.0023   Epoch: 16   Global Step: 283130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:22,974-Speed 9334.14 samples/sec   Loss 3.8613   LearningRate 0.0023   Epoch: 16   Global Step: 283140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:24,123-Speed 8919.94 samples/sec   Loss 3.9189   LearningRate 0.0023   Epoch: 16   Global Step: 283150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:25,218-Speed 9351.29 samples/sec   Loss 3.9084   LearningRate 0.0023   Epoch: 16   Global Step: 283160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:26,437-Speed 8405.55 samples/sec   Loss 3.9305   LearningRate 0.0023   Epoch: 16   Global Step: 283170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:27,613-Speed 8710.28 samples/sec   Loss 3.9182   LearningRate 0.0023   Epoch: 16   Global Step: 283180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 22:59:28,740-Speed 9093.67 samples/sec   Loss 3.7868   LearningRate 0.0023   Epoch: 16   Global Step: 283190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:29,845-Speed 9269.24 samples/sec   Loss 3.7557   LearningRate 0.0023   Epoch: 16   Global Step: 283200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:30,989-Speed 8959.47 samples/sec   Loss 3.7719   LearningRate 0.0023   Epoch: 16   Global Step: 283210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:32,149-Speed 8829.41 samples/sec   Loss 3.8178   LearningRate 0.0023   Epoch: 16   Global Step: 283220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:33,297-Speed 8927.03 samples/sec   Loss 3.8707   LearningRate 0.0023   Epoch: 16   Global Step: 283230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:34,424-Speed 9093.56 samples/sec   Loss 3.8461   LearningRate 0.0023   Epoch: 16   Global Step: 283240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:35,530-Speed 9260.22 samples/sec   Loss 3.9598   LearningRate 0.0023   Epoch: 16   Global Step: 283250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:36,633-Speed 9288.21 samples/sec   Loss 3.7988   LearningRate 0.0023   Epoch: 16   Global Step: 283260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:37,801-Speed 8771.32 samples/sec   Loss 3.8339   LearningRate 0.0023   Epoch: 16   Global Step: 283270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:38,920-Speed 9153.40 samples/sec   Loss 3.9198   LearningRate 0.0023   Epoch: 16   Global Step: 283280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:40,043-Speed 9125.91 samples/sec   Loss 3.8033   LearningRate 0.0023   Epoch: 16   Global Step: 283290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:41,183-Speed 9007.29 samples/sec   Loss 3.9409   LearningRate 0.0023   Epoch: 16   Global Step: 283300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:42,282-Speed 9326.50 samples/sec   Loss 3.8011   LearningRate 0.0023   Epoch: 16   Global Step: 283310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:43,452-Speed 8755.58 samples/sec   Loss 3.8082   LearningRate 0.0023   Epoch: 16   Global Step: 283320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:44,602-Speed 8911.11 samples/sec   Loss 3.9136   LearningRate 0.0023   Epoch: 16   Global Step: 283330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:45,673-Speed 9563.78 samples/sec   Loss 3.9136   LearningRate 0.0023   Epoch: 16   Global Step: 283340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:46,772-Speed 9320.70 samples/sec   Loss 3.9019   LearningRate 0.0023   Epoch: 16   Global Step: 283350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:47,901-Speed 9071.61 samples/sec   Loss 3.8619   LearningRate 0.0023   Epoch: 16   Global Step: 283360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:49,013-Speed 9223.28 samples/sec   Loss 3.7791   LearningRate 0.0023   Epoch: 16   Global Step: 283370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:50,112-Speed 9317.76 samples/sec   Loss 3.9471   LearningRate 0.0023   Epoch: 16   Global Step: 283380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:51,227-Speed 9186.23 samples/sec   Loss 3.9200   LearningRate 0.0023   Epoch: 16   Global Step: 283390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:52,374-Speed 8935.02 samples/sec   Loss 3.8879   LearningRate 0.0023   Epoch: 16   Global Step: 283400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:53,475-Speed 9302.86 samples/sec   Loss 3.8691   LearningRate 0.0023   Epoch: 16   Global Step: 283410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:54,575-Speed 9317.74 samples/sec   Loss 3.8563   LearningRate 0.0023   Epoch: 16   Global Step: 283420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:55,689-Speed 9195.79 samples/sec   Loss 3.8296   LearningRate 0.0023   Epoch: 16   Global Step: 283430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 22:59:56,785-Speed 9347.89 samples/sec   Loss 3.9117   LearningRate 0.0023   Epoch: 16   Global Step: 283440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:57,935-Speed 8914.55 samples/sec   Loss 3.8088   LearningRate 0.0023   Epoch: 16   Global Step: 283450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 22:59:59,109-Speed 8724.67 samples/sec   Loss 3.8689   LearningRate 0.0023   Epoch: 16   Global Step: 283460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:00,253-Speed 8956.97 samples/sec   Loss 3.8223   LearningRate 0.0023   Epoch: 16   Global Step: 283470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:01,331-Speed 9514.37 samples/sec   Loss 3.9081   LearningRate 0.0023   Epoch: 16   Global Step: 283480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:02,421-Speed 9392.77 samples/sec   Loss 3.8700   LearningRate 0.0023   Epoch: 16   Global Step: 283490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:03,490-Speed 9584.71 samples/sec   Loss 3.8954   LearningRate 0.0023   Epoch: 16   Global Step: 283500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:04,590-Speed 9315.83 samples/sec   Loss 3.8567   LearningRate 0.0023   Epoch: 16   Global Step: 283510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:05,676-Speed 9433.42 samples/sec   Loss 3.8480   LearningRate 0.0023   Epoch: 16   Global Step: 283520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:06,734-Speed 9684.43 samples/sec   Loss 3.8508   LearningRate 0.0023   Epoch: 16   Global Step: 283530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:07,817-Speed 9464.52 samples/sec   Loss 3.9172   LearningRate 0.0023   Epoch: 16   Global Step: 283540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:08,928-Speed 9218.90 samples/sec   Loss 3.8944   LearningRate 0.0023   Epoch: 16   Global Step: 283550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:10,042-Speed 9193.08 samples/sec   Loss 3.8330   LearningRate 0.0023   Epoch: 16   Global Step: 283560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:11,192-Speed 8908.97 samples/sec   Loss 3.9412   LearningRate 0.0023   Epoch: 16   Global Step: 283570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:12,364-Speed 8745.66 samples/sec   Loss 3.8642   LearningRate 0.0023   Epoch: 16   Global Step: 283580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:13,427-Speed 9641.28 samples/sec   Loss 3.8424   LearningRate 0.0023   Epoch: 16   Global Step: 283590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:14,570-Speed 8960.73 samples/sec   Loss 3.8227   LearningRate 0.0023   Epoch: 16   Global Step: 283600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:15,721-Speed 8902.90 samples/sec   Loss 3.9396   LearningRate 0.0023   Epoch: 16   Global Step: 283610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:16,800-Speed 9494.45 samples/sec   Loss 3.8278   LearningRate 0.0023   Epoch: 16   Global Step: 283620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:17,902-Speed 9298.40 samples/sec   Loss 3.7516   LearningRate 0.0023   Epoch: 16   Global Step: 283630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:19,021-Speed 9161.30 samples/sec   Loss 3.9112   LearningRate 0.0023   Epoch: 16   Global Step: 283640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:20,131-Speed 9235.55 samples/sec   Loss 3.8416   LearningRate 0.0023   Epoch: 16   Global Step: 283650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:21,261-Speed 9065.75 samples/sec   Loss 3.8144   LearningRate 0.0023   Epoch: 16   Global Step: 283660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:00:22,461-Speed 8537.48 samples/sec   Loss 4.0328   LearningRate 0.0023   Epoch: 16   Global Step: 283670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:23,601-Speed 8982.11 samples/sec   Loss 3.8254   LearningRate 0.0023   Epoch: 16   Global Step: 283680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:24,698-Speed 9340.16 samples/sec   Loss 3.8305   LearningRate 0.0023   Epoch: 16   Global Step: 283690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:25,897-Speed 8547.87 samples/sec   Loss 3.8197   LearningRate 0.0023   Epoch: 16   Global Step: 283700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:27,057-Speed 8832.55 samples/sec   Loss 3.7826   LearningRate 0.0023   Epoch: 16   Global Step: 283710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:28,220-Speed 8810.10 samples/sec   Loss 3.8752   LearningRate 0.0023   Epoch: 16   Global Step: 283720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:29,353-Speed 9039.14 samples/sec   Loss 3.8027   LearningRate 0.0023   Epoch: 16   Global Step: 283730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:30,736-Speed 7408.68 samples/sec   Loss 3.8656   LearningRate 0.0023   Epoch: 16   Global Step: 283740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:58,625-Speed 367.19 samples/sec   Loss 3.8192   LearningRate 0.0022   Epoch: 17   Global Step: 283750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:00:59,978-Speed 7579.24 samples/sec   Loss 3.5038   LearningRate 0.0022   Epoch: 17   Global Step: 283760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:01,151-Speed 8736.32 samples/sec   Loss 3.4445   LearningRate 0.0022   Epoch: 17   Global Step: 283770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:01:02,818-Speed 6146.99 samples/sec   Loss 3.3952   LearningRate 0.0022   Epoch: 17   Global Step: 283780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:04,433-Speed 6340.72 samples/sec   Loss 3.3905   LearningRate 0.0022   Epoch: 17   Global Step: 283790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:05,642-Speed 8474.65 samples/sec   Loss 3.4260   LearningRate 0.0022   Epoch: 17   Global Step: 283800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:06,804-Speed 8823.11 samples/sec   Loss 3.4252   LearningRate 0.0022   Epoch: 17   Global Step: 283810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:07,855-Speed 9750.06 samples/sec   Loss 3.4625   LearningRate 0.0022   Epoch: 17   Global Step: 283820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:09,027-Speed 8742.31 samples/sec   Loss 3.4486   LearningRate 0.0022   Epoch: 17   Global Step: 283830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:10,202-Speed 8719.98 samples/sec   Loss 3.5122   LearningRate 0.0022   Epoch: 17   Global Step: 283840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:11,308-Speed 9259.84 samples/sec   Loss 3.4640   LearningRate 0.0022   Epoch: 17   Global Step: 283850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:12,443-Speed 9027.26 samples/sec   Loss 3.4688   LearningRate 0.0022   Epoch: 17   Global Step: 283860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:13,561-Speed 9170.44 samples/sec   Loss 3.4699   LearningRate 0.0022   Epoch: 17   Global Step: 283870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:14,677-Speed 9276.70 samples/sec   Loss 3.4882   LearningRate 0.0022   Epoch: 17   Global Step: 283880   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:01:15,796-Speed 9157.23 samples/sec   Loss 3.5229   LearningRate 0.0022   Epoch: 17   Global Step: 283890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:16,929-Speed 9049.12 samples/sec   Loss 3.4912   LearningRate 0.0022   Epoch: 17   Global Step: 283900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:18,043-Speed 9192.66 samples/sec   Loss 3.5341   LearningRate 0.0022   Epoch: 17   Global Step: 283910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:19,127-Speed 9453.39 samples/sec   Loss 3.4809   LearningRate 0.0022   Epoch: 17   Global Step: 283920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:20,230-Speed 9293.82 samples/sec   Loss 3.4687   LearningRate 0.0022   Epoch: 17   Global Step: 283930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:21,336-Speed 9268.46 samples/sec   Loss 3.4351   LearningRate 0.0022   Epoch: 17   Global Step: 283940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:22,445-Speed 9240.39 samples/sec   Loss 3.5552   LearningRate 0.0022   Epoch: 17   Global Step: 283950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:23,601-Speed 8860.84 samples/sec   Loss 3.3921   LearningRate 0.0022   Epoch: 17   Global Step: 283960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:25,021-Speed 7215.04 samples/sec   Loss 3.3789   LearningRate 0.0022   Epoch: 17   Global Step: 283970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:26,156-Speed 9028.84 samples/sec   Loss 3.4393   LearningRate 0.0022   Epoch: 17   Global Step: 283980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:01:27,345-Speed 8616.79 samples/sec   Loss 3.3752   LearningRate 0.0022   Epoch: 17   Global Step: 283990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:01:28,457-Speed 9211.02 samples/sec   Loss 3.4216   LearningRate 0.0022   Epoch: 17   Global Step: 284000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:01:50,554-[lfw][284000]XNorm: 6.808961
Training: 2022-04-11 23:01:50,555-[lfw][284000]Accuracy-Flip: 0.99683+-0.00283
Training: 2022-04-11 23:01:50,555-[lfw][284000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:02:16,072-[cfp_fp][284000]XNorm: 5.943644
Training: 2022-04-11 23:02:16,073-[cfp_fp][284000]Accuracy-Flip: 0.97214+-0.00884
Training: 2022-04-11 23:02:16,073-[cfp_fp][284000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:02:38,145-[agedb_30][284000]XNorm: 6.613818
Training: 2022-04-11 23:02:38,145-[agedb_30][284000]Accuracy-Flip: 0.97300+-0.00833
Training: 2022-04-11 23:02:38,146-[agedb_30][284000]Accuracy-Highest: 0.97350
Training: 2022-04-11 23:02:39,252-Speed 144.65 samples/sec   Loss 3.5006   LearningRate 0.0022   Epoch: 17   Global Step: 284010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:40,355-Speed 9292.26 samples/sec   Loss 3.4498   LearningRate 0.0022   Epoch: 17   Global Step: 284020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:41,480-Speed 9103.98 samples/sec   Loss 3.4370   LearningRate 0.0022   Epoch: 17   Global Step: 284030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:42,575-Speed 9360.05 samples/sec   Loss 3.4808   LearningRate 0.0022   Epoch: 17   Global Step: 284040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:43,747-Speed 8741.38 samples/sec   Loss 3.4537   LearningRate 0.0022   Epoch: 17   Global Step: 284050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:44,889-Speed 8975.51 samples/sec   Loss 3.5411   LearningRate 0.0022   Epoch: 17   Global Step: 284060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:46,001-Speed 9216.28 samples/sec   Loss 3.5077   LearningRate 0.0022   Epoch: 17   Global Step: 284070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:47,074-Speed 9547.02 samples/sec   Loss 3.4425   LearningRate 0.0022   Epoch: 17   Global Step: 284080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:48,233-Speed 8840.12 samples/sec   Loss 3.4519   LearningRate 0.0022   Epoch: 17   Global Step: 284090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:49,302-Speed 9586.78 samples/sec   Loss 3.5136   LearningRate 0.0022   Epoch: 17   Global Step: 284100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:50,411-Speed 9236.58 samples/sec   Loss 3.4285   LearningRate 0.0022   Epoch: 17   Global Step: 284110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:02:51,553-Speed 8973.93 samples/sec   Loss 3.4649   LearningRate 0.0022   Epoch: 17   Global Step: 284120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:02:52,703-Speed 8903.32 samples/sec   Loss 3.5147   LearningRate 0.0022   Epoch: 17   Global Step: 284130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:02:53,849-Speed 8941.66 samples/sec   Loss 3.4386   LearningRate 0.0022   Epoch: 17   Global Step: 284140   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:02:54,978-Speed 9080.83 samples/sec   Loss 3.5411   LearningRate 0.0022   Epoch: 17   Global Step: 284150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:56,086-Speed 9240.60 samples/sec   Loss 3.4517   LearningRate 0.0022   Epoch: 17   Global Step: 284160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:57,220-Speed 9038.54 samples/sec   Loss 3.4915   LearningRate 0.0022   Epoch: 17   Global Step: 284170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:58,321-Speed 9306.78 samples/sec   Loss 3.4473   LearningRate 0.0022   Epoch: 17   Global Step: 284180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:02:59,387-Speed 9616.59 samples/sec   Loss 3.4125   LearningRate 0.0022   Epoch: 17   Global Step: 284190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:00,478-Speed 9386.95 samples/sec   Loss 3.4626   LearningRate 0.0022   Epoch: 17   Global Step: 284200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:01,701-Speed 8378.94 samples/sec   Loss 3.4482   LearningRate 0.0022   Epoch: 17   Global Step: 284210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:02,849-Speed 8926.31 samples/sec   Loss 3.4777   LearningRate 0.0022   Epoch: 17   Global Step: 284220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:04,006-Speed 8850.86 samples/sec   Loss 3.5788   LearningRate 0.0022   Epoch: 17   Global Step: 284230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:05,082-Speed 9524.97 samples/sec   Loss 3.4990   LearningRate 0.0022   Epoch: 17   Global Step: 284240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:06,180-Speed 9335.89 samples/sec   Loss 3.4883   LearningRate 0.0022   Epoch: 17   Global Step: 284250   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:03:07,680-Speed 6828.12 samples/sec   Loss 3.3820   LearningRate 0.0022   Epoch: 17   Global Step: 284260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:03:08,796-Speed 9184.30 samples/sec   Loss 3.5774   LearningRate 0.0022   Epoch: 17   Global Step: 284270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:03:09,927-Speed 9058.55 samples/sec   Loss 3.4990   LearningRate 0.0022   Epoch: 17   Global Step: 284280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:03:11,442-Speed 6760.65 samples/sec   Loss 3.4078   LearningRate 0.0022   Epoch: 17   Global Step: 284290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:03:12,757-Speed 7793.30 samples/sec   Loss 3.5084   LearningRate 0.0022   Epoch: 17   Global Step: 284300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:03:14,054-Speed 7896.05 samples/sec   Loss 3.5200   LearningRate 0.0022   Epoch: 17   Global Step: 284310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:03:15,157-Speed 9292.38 samples/sec   Loss 3.4706   LearningRate 0.0022   Epoch: 17   Global Step: 284320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:16,430-Speed 8047.56 samples/sec   Loss 3.4677   LearningRate 0.0022   Epoch: 17   Global Step: 284330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:17,590-Speed 8831.21 samples/sec   Loss 3.4784   LearningRate 0.0022   Epoch: 17   Global Step: 284340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:18,730-Speed 8988.28 samples/sec   Loss 3.5153   LearningRate 0.0022   Epoch: 17   Global Step: 284350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:20,000-Speed 8070.36 samples/sec   Loss 3.5364   LearningRate 0.0022   Epoch: 17   Global Step: 284360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:21,102-Speed 9292.22 samples/sec   Loss 3.5600   LearningRate 0.0022   Epoch: 17   Global Step: 284370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:22,186-Speed 9451.50 samples/sec   Loss 3.4334   LearningRate 0.0022   Epoch: 17   Global Step: 284380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:23,387-Speed 8537.75 samples/sec   Loss 3.4260   LearningRate 0.0022   Epoch: 17   Global Step: 284390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:24,536-Speed 8915.99 samples/sec   Loss 3.3988   LearningRate 0.0022   Epoch: 17   Global Step: 284400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:25,620-Speed 9450.50 samples/sec   Loss 3.4554   LearningRate 0.0022   Epoch: 17   Global Step: 284410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:26,756-Speed 9015.79 samples/sec   Loss 3.5104   LearningRate 0.0022   Epoch: 17   Global Step: 284420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:27,917-Speed 8831.50 samples/sec   Loss 3.4705   LearningRate 0.0022   Epoch: 17   Global Step: 284430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:29,201-Speed 7973.85 samples/sec   Loss 3.4281   LearningRate 0.0022   Epoch: 17   Global Step: 284440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:30,292-Speed 9389.69 samples/sec   Loss 3.5810   LearningRate 0.0022   Epoch: 17   Global Step: 284450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:31,388-Speed 9353.93 samples/sec   Loss 3.5044   LearningRate 0.0022   Epoch: 17   Global Step: 284460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:32,508-Speed 9149.09 samples/sec   Loss 3.4749   LearningRate 0.0022   Epoch: 17   Global Step: 284470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:03:33,637-Speed 9071.13 samples/sec   Loss 3.3985   LearningRate 0.0022   Epoch: 17   Global Step: 284480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:03:34,752-Speed 9190.21 samples/sec   Loss 3.4313   LearningRate 0.0022   Epoch: 17   Global Step: 284490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:03:35,820-Speed 9591.24 samples/sec   Loss 3.4465   LearningRate 0.0022   Epoch: 17   Global Step: 284500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:03:36,958-Speed 9008.17 samples/sec   Loss 3.4601   LearningRate 0.0022   Epoch: 17   Global Step: 284510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:03:38,097-Speed 8990.39 samples/sec   Loss 3.5396   LearningRate 0.0022   Epoch: 17   Global Step: 284520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:03:39,234-Speed 9016.21 samples/sec   Loss 3.5066   LearningRate 0.0022   Epoch: 17   Global Step: 284530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:03:40,421-Speed 8626.75 samples/sec   Loss 3.4823   LearningRate 0.0022   Epoch: 17   Global Step: 284540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:03:41,544-Speed 9123.72 samples/sec   Loss 3.4951   LearningRate 0.0022   Epoch: 17   Global Step: 284550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:03:42,677-Speed 9053.55 samples/sec   Loss 3.5235   LearningRate 0.0022   Epoch: 17   Global Step: 284560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:03:43,771-Speed 9360.39 samples/sec   Loss 3.4949   LearningRate 0.0022   Epoch: 17   Global Step: 284570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:44,930-Speed 8845.85 samples/sec   Loss 3.4114   LearningRate 0.0022   Epoch: 17   Global Step: 284580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:46,047-Speed 9172.43 samples/sec   Loss 3.4537   LearningRate 0.0022   Epoch: 17   Global Step: 284590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:47,176-Speed 9072.29 samples/sec   Loss 3.4882   LearningRate 0.0022   Epoch: 17   Global Step: 284600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:48,329-Speed 8891.67 samples/sec   Loss 3.4691   LearningRate 0.0022   Epoch: 17   Global Step: 284610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:49,527-Speed 8554.70 samples/sec   Loss 3.5455   LearningRate 0.0022   Epoch: 17   Global Step: 284620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:50,686-Speed 8836.07 samples/sec   Loss 3.4825   LearningRate 0.0022   Epoch: 17   Global Step: 284630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:51,773-Speed 9425.56 samples/sec   Loss 3.4545   LearningRate 0.0022   Epoch: 17   Global Step: 284640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:52,945-Speed 8743.21 samples/sec   Loss 3.5222   LearningRate 0.0022   Epoch: 17   Global Step: 284650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:54,082-Speed 9010.82 samples/sec   Loss 3.5185   LearningRate 0.0022   Epoch: 17   Global Step: 284660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:55,264-Speed 8666.74 samples/sec   Loss 3.5093   LearningRate 0.0022   Epoch: 17   Global Step: 284670   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:03:56,439-Speed 8720.18 samples/sec   Loss 3.5268   LearningRate 0.0022   Epoch: 17   Global Step: 284680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:57,578-Speed 8998.33 samples/sec   Loss 3.4027   LearningRate 0.0022   Epoch: 17   Global Step: 284690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:58,717-Speed 8992.52 samples/sec   Loss 3.4304   LearningRate 0.0022   Epoch: 17   Global Step: 284700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:03:59,827-Speed 9228.43 samples/sec   Loss 3.5114   LearningRate 0.0022   Epoch: 17   Global Step: 284710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:00,954-Speed 9094.47 samples/sec   Loss 3.5100   LearningRate 0.0022   Epoch: 17   Global Step: 284720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:02,087-Speed 9049.39 samples/sec   Loss 3.4890   LearningRate 0.0022   Epoch: 17   Global Step: 284730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:03,233-Speed 8940.18 samples/sec   Loss 3.4476   LearningRate 0.0022   Epoch: 17   Global Step: 284740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:04,329-Speed 9346.17 samples/sec   Loss 3.5246   LearningRate 0.0022   Epoch: 17   Global Step: 284750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:05,434-Speed 9275.12 samples/sec   Loss 3.4807   LearningRate 0.0022   Epoch: 17   Global Step: 284760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:06,581-Speed 8934.58 samples/sec   Loss 3.5326   LearningRate 0.0022   Epoch: 17   Global Step: 284770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:04:07,717-Speed 9013.39 samples/sec   Loss 3.5205   LearningRate 0.0022   Epoch: 17   Global Step: 284780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:04:08,788-Speed 9573.39 samples/sec   Loss 3.4272   LearningRate 0.0022   Epoch: 17   Global Step: 284790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:04:09,911-Speed 9123.56 samples/sec   Loss 3.4549   LearningRate 0.0022   Epoch: 17   Global Step: 284800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:04:11,008-Speed 9338.72 samples/sec   Loss 3.4191   LearningRate 0.0022   Epoch: 17   Global Step: 284810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:04:12,177-Speed 8760.81 samples/sec   Loss 3.4584   LearningRate 0.0022   Epoch: 17   Global Step: 284820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:04:13,315-Speed 9012.46 samples/sec   Loss 3.5038   LearningRate 0.0022   Epoch: 17   Global Step: 284830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:04:14,390-Speed 9527.81 samples/sec   Loss 3.5695   LearningRate 0.0022   Epoch: 17   Global Step: 284840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:04:15,534-Speed 8955.45 samples/sec   Loss 3.4222   LearningRate 0.0022   Epoch: 17   Global Step: 284850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:04:16,667-Speed 9046.66 samples/sec   Loss 3.4228   LearningRate 0.0022   Epoch: 17   Global Step: 284860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:04:17,800-Speed 9043.85 samples/sec   Loss 3.4892   LearningRate 0.0022   Epoch: 17   Global Step: 284870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:18,911-Speed 9220.18 samples/sec   Loss 3.4388   LearningRate 0.0021   Epoch: 17   Global Step: 284880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:20,002-Speed 9398.58 samples/sec   Loss 3.4781   LearningRate 0.0021   Epoch: 17   Global Step: 284890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:21,102-Speed 9314.59 samples/sec   Loss 3.4779   LearningRate 0.0021   Epoch: 17   Global Step: 284900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:22,264-Speed 8819.26 samples/sec   Loss 3.5308   LearningRate 0.0021   Epoch: 17   Global Step: 284910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:23,410-Speed 8935.89 samples/sec   Loss 3.4939   LearningRate 0.0021   Epoch: 17   Global Step: 284920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:24,497-Speed 9429.21 samples/sec   Loss 3.5082   LearningRate 0.0021   Epoch: 17   Global Step: 284930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:25,599-Speed 9298.22 samples/sec   Loss 3.5097   LearningRate 0.0021   Epoch: 17   Global Step: 284940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:26,658-Speed 9674.68 samples/sec   Loss 3.4602   LearningRate 0.0021   Epoch: 17   Global Step: 284950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:27,801-Speed 8962.13 samples/sec   Loss 3.5424   LearningRate 0.0021   Epoch: 17   Global Step: 284960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:28,925-Speed 9109.34 samples/sec   Loss 3.4807   LearningRate 0.0021   Epoch: 17   Global Step: 284970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:30,057-Speed 9055.37 samples/sec   Loss 3.4581   LearningRate 0.0021   Epoch: 17   Global Step: 284980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:31,184-Speed 9091.41 samples/sec   Loss 3.5742   LearningRate 0.0021   Epoch: 17   Global Step: 284990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:32,322-Speed 9007.78 samples/sec   Loss 3.5798   LearningRate 0.0021   Epoch: 17   Global Step: 285000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:33,451-Speed 9076.96 samples/sec   Loss 3.5174   LearningRate 0.0021   Epoch: 17   Global Step: 285010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:34,584-Speed 9037.40 samples/sec   Loss 3.4932   LearningRate 0.0021   Epoch: 17   Global Step: 285020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:35,764-Speed 8686.38 samples/sec   Loss 3.4991   LearningRate 0.0021   Epoch: 17   Global Step: 285030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:36,901-Speed 9010.20 samples/sec   Loss 3.4518   LearningRate 0.0021   Epoch: 17   Global Step: 285040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:38,074-Speed 8730.59 samples/sec   Loss 3.4995   LearningRate 0.0021   Epoch: 17   Global Step: 285050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:39,200-Speed 9102.71 samples/sec   Loss 3.4351   LearningRate 0.0021   Epoch: 17   Global Step: 285060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:40,313-Speed 9206.45 samples/sec   Loss 3.4797   LearningRate 0.0021   Epoch: 17   Global Step: 285070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:04:41,418-Speed 9271.67 samples/sec   Loss 3.5204   LearningRate 0.0021   Epoch: 17   Global Step: 285080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:42,527-Speed 9242.03 samples/sec   Loss 3.5051   LearningRate 0.0021   Epoch: 17   Global Step: 285090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:43,633-Speed 9269.06 samples/sec   Loss 3.4556   LearningRate 0.0021   Epoch: 17   Global Step: 285100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:44,723-Speed 9397.53 samples/sec   Loss 3.5163   LearningRate 0.0021   Epoch: 17   Global Step: 285110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:45,849-Speed 9096.94 samples/sec   Loss 3.5602   LearningRate 0.0021   Epoch: 17   Global Step: 285120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:46,987-Speed 9003.76 samples/sec   Loss 3.5111   LearningRate 0.0021   Epoch: 17   Global Step: 285130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:48,123-Speed 9023.27 samples/sec   Loss 3.4852   LearningRate 0.0021   Epoch: 17   Global Step: 285140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:49,274-Speed 8899.72 samples/sec   Loss 3.5200   LearningRate 0.0021   Epoch: 17   Global Step: 285150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:50,415-Speed 8981.30 samples/sec   Loss 3.5869   LearningRate 0.0021   Epoch: 17   Global Step: 285160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:51,523-Speed 9251.85 samples/sec   Loss 3.5357   LearningRate 0.0021   Epoch: 17   Global Step: 285170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:52,583-Speed 9665.99 samples/sec   Loss 3.4821   LearningRate 0.0021   Epoch: 17   Global Step: 285180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:04:53,675-Speed 9381.12 samples/sec   Loss 3.4638   LearningRate 0.0021   Epoch: 17   Global Step: 285190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:54,774-Speed 9319.44 samples/sec   Loss 3.5515   LearningRate 0.0021   Epoch: 17   Global Step: 285200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:55,891-Speed 9175.59 samples/sec   Loss 3.5377   LearningRate 0.0021   Epoch: 17   Global Step: 285210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:56,991-Speed 9306.97 samples/sec   Loss 3.5012   LearningRate 0.0021   Epoch: 17   Global Step: 285220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:58,084-Speed 9380.52 samples/sec   Loss 3.4844   LearningRate 0.0021   Epoch: 17   Global Step: 285230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:04:59,173-Speed 9404.53 samples/sec   Loss 3.4651   LearningRate 0.0021   Epoch: 17   Global Step: 285240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:00,284-Speed 9225.72 samples/sec   Loss 3.4881   LearningRate 0.0021   Epoch: 17   Global Step: 285250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:01,409-Speed 9110.21 samples/sec   Loss 3.5035   LearningRate 0.0021   Epoch: 17   Global Step: 285260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:02,527-Speed 9163.34 samples/sec   Loss 3.5347   LearningRate 0.0021   Epoch: 17   Global Step: 285270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:03,608-Speed 9478.11 samples/sec   Loss 3.4607   LearningRate 0.0021   Epoch: 17   Global Step: 285280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:04,736-Speed 9087.35 samples/sec   Loss 3.5779   LearningRate 0.0021   Epoch: 17   Global Step: 285290   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:05:05,848-Speed 9210.79 samples/sec   Loss 3.5300   LearningRate 0.0021   Epoch: 17   Global Step: 285300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:05:06,944-Speed 9349.63 samples/sec   Loss 3.5591   LearningRate 0.0021   Epoch: 17   Global Step: 285310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:08,075-Speed 9057.24 samples/sec   Loss 3.5114   LearningRate 0.0021   Epoch: 17   Global Step: 285320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:09,177-Speed 9294.33 samples/sec   Loss 3.5692   LearningRate 0.0021   Epoch: 17   Global Step: 285330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:10,308-Speed 9058.96 samples/sec   Loss 3.5795   LearningRate 0.0021   Epoch: 17   Global Step: 285340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:11,456-Speed 8923.64 samples/sec   Loss 3.5412   LearningRate 0.0021   Epoch: 17   Global Step: 285350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:12,606-Speed 8918.36 samples/sec   Loss 3.4608   LearningRate 0.0021   Epoch: 17   Global Step: 285360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:13,751-Speed 8945.07 samples/sec   Loss 3.4483   LearningRate 0.0021   Epoch: 17   Global Step: 285370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:14,891-Speed 8993.67 samples/sec   Loss 3.4330   LearningRate 0.0021   Epoch: 17   Global Step: 285380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:16,022-Speed 9059.91 samples/sec   Loss 3.4842   LearningRate 0.0021   Epoch: 17   Global Step: 285390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:17,236-Speed 8440.07 samples/sec   Loss 3.4333   LearningRate 0.0021   Epoch: 17   Global Step: 285400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:18,342-Speed 9265.00 samples/sec   Loss 3.4456   LearningRate 0.0021   Epoch: 17   Global Step: 285410   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:05:19,446-Speed 9283.24 samples/sec   Loss 3.4947   LearningRate 0.0021   Epoch: 17   Global Step: 285420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:20,591-Speed 8946.64 samples/sec   Loss 3.5611   LearningRate 0.0021   Epoch: 17   Global Step: 285430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:21,710-Speed 9158.53 samples/sec   Loss 3.5965   LearningRate 0.0021   Epoch: 17   Global Step: 285440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:22,851-Speed 8976.17 samples/sec   Loss 3.4539   LearningRate 0.0021   Epoch: 17   Global Step: 285450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:23,946-Speed 9362.78 samples/sec   Loss 3.4783   LearningRate 0.0021   Epoch: 17   Global Step: 285460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:25,038-Speed 9381.57 samples/sec   Loss 3.5133   LearningRate 0.0021   Epoch: 17   Global Step: 285470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:26,113-Speed 9527.31 samples/sec   Loss 3.4333   LearningRate 0.0021   Epoch: 17   Global Step: 285480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:27,249-Speed 9024.47 samples/sec   Loss 3.5640   LearningRate 0.0021   Epoch: 17   Global Step: 285490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:28,339-Speed 9397.50 samples/sec   Loss 3.5160   LearningRate 0.0021   Epoch: 17   Global Step: 285500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:29,515-Speed 8708.70 samples/sec   Loss 3.5183   LearningRate 0.0021   Epoch: 17   Global Step: 285510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:30,635-Speed 9148.63 samples/sec   Loss 3.4925   LearningRate 0.0021   Epoch: 17   Global Step: 285520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:31,735-Speed 9325.97 samples/sec   Loss 3.4752   LearningRate 0.0021   Epoch: 17   Global Step: 285530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:32,829-Speed 9361.92 samples/sec   Loss 3.5341   LearningRate 0.0021   Epoch: 17   Global Step: 285540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:33,987-Speed 8845.94 samples/sec   Loss 3.5258   LearningRate 0.0021   Epoch: 17   Global Step: 285550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:35,104-Speed 9180.23 samples/sec   Loss 3.5077   LearningRate 0.0021   Epoch: 17   Global Step: 285560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:36,183-Speed 9493.73 samples/sec   Loss 3.4784   LearningRate 0.0021   Epoch: 17   Global Step: 285570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:37,256-Speed 9545.98 samples/sec   Loss 3.5262   LearningRate 0.0021   Epoch: 17   Global Step: 285580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:38,373-Speed 9176.28 samples/sec   Loss 3.5060   LearningRate 0.0021   Epoch: 17   Global Step: 285590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:39,447-Speed 9538.73 samples/sec   Loss 3.5016   LearningRate 0.0021   Epoch: 17   Global Step: 285600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:40,561-Speed 9193.07 samples/sec   Loss 3.4253   LearningRate 0.0021   Epoch: 17   Global Step: 285610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:41,664-Speed 9289.59 samples/sec   Loss 3.5325   LearningRate 0.0021   Epoch: 17   Global Step: 285620   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:05:42,761-Speed 9349.91 samples/sec   Loss 3.5152   LearningRate 0.0021   Epoch: 17   Global Step: 285630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:05:43,887-Speed 9096.26 samples/sec   Loss 3.5171   LearningRate 0.0021   Epoch: 17   Global Step: 285640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:45,014-Speed 9090.09 samples/sec   Loss 3.4797   LearningRate 0.0021   Epoch: 17   Global Step: 285650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:46,117-Speed 9291.46 samples/sec   Loss 3.5612   LearningRate 0.0021   Epoch: 17   Global Step: 285660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:47,295-Speed 8696.84 samples/sec   Loss 3.4965   LearningRate 0.0021   Epoch: 17   Global Step: 285670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:48,447-Speed 8889.03 samples/sec   Loss 3.5372   LearningRate 0.0021   Epoch: 17   Global Step: 285680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:49,604-Speed 8860.88 samples/sec   Loss 3.5149   LearningRate 0.0021   Epoch: 17   Global Step: 285690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:50,688-Speed 9453.29 samples/sec   Loss 3.4274   LearningRate 0.0021   Epoch: 17   Global Step: 285700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:51,753-Speed 9619.76 samples/sec   Loss 3.5806   LearningRate 0.0021   Epoch: 17   Global Step: 285710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:52,866-Speed 9207.92 samples/sec   Loss 3.4816   LearningRate 0.0021   Epoch: 17   Global Step: 285720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:53,982-Speed 9183.52 samples/sec   Loss 3.6355   LearningRate 0.0021   Epoch: 17   Global Step: 285730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:55,128-Speed 8937.32 samples/sec   Loss 3.5413   LearningRate 0.0021   Epoch: 17   Global Step: 285740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:05:56,247-Speed 9157.72 samples/sec   Loss 3.4397   LearningRate 0.0021   Epoch: 17   Global Step: 285750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:05:57,342-Speed 9358.49 samples/sec   Loss 3.5289   LearningRate 0.0021   Epoch: 17   Global Step: 285760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:58,499-Speed 8854.46 samples/sec   Loss 3.6070   LearningRate 0.0021   Epoch: 17   Global Step: 285770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:05:59,622-Speed 9121.16 samples/sec   Loss 3.5880   LearningRate 0.0021   Epoch: 17   Global Step: 285780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:00,765-Speed 8959.09 samples/sec   Loss 3.4926   LearningRate 0.0021   Epoch: 17   Global Step: 285790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:01,851-Speed 9438.71 samples/sec   Loss 3.5134   LearningRate 0.0021   Epoch: 17   Global Step: 285800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:02,972-Speed 9138.95 samples/sec   Loss 3.4403   LearningRate 0.0021   Epoch: 17   Global Step: 285810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:04,063-Speed 9395.09 samples/sec   Loss 3.4524   LearningRate 0.0021   Epoch: 17   Global Step: 285820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:05,155-Speed 9382.77 samples/sec   Loss 3.5095   LearningRate 0.0021   Epoch: 17   Global Step: 285830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:06,359-Speed 8505.89 samples/sec   Loss 3.4677   LearningRate 0.0021   Epoch: 17   Global Step: 285840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:07,496-Speed 9008.57 samples/sec   Loss 3.5389   LearningRate 0.0021   Epoch: 17   Global Step: 285850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:08,599-Speed 9292.20 samples/sec   Loss 3.6047   LearningRate 0.0021   Epoch: 17   Global Step: 285860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:06:09,742-Speed 8964.09 samples/sec   Loss 3.5882   LearningRate 0.0021   Epoch: 17   Global Step: 285870   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:06:10,829-Speed 9428.30 samples/sec   Loss 3.5101   LearningRate 0.0021   Epoch: 17   Global Step: 285880   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:06:11,919-Speed 9400.04 samples/sec   Loss 3.5385   LearningRate 0.0021   Epoch: 17   Global Step: 285890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:13,051-Speed 9055.86 samples/sec   Loss 3.5075   LearningRate 0.0021   Epoch: 17   Global Step: 285900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:14,175-Speed 9110.57 samples/sec   Loss 3.5414   LearningRate 0.0021   Epoch: 17   Global Step: 285910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:15,301-Speed 9103.32 samples/sec   Loss 3.4932   LearningRate 0.0021   Epoch: 17   Global Step: 285920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:16,428-Speed 9086.39 samples/sec   Loss 3.6678   LearningRate 0.0021   Epoch: 17   Global Step: 285930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:17,569-Speed 8981.90 samples/sec   Loss 3.5995   LearningRate 0.0021   Epoch: 17   Global Step: 285940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:18,730-Speed 8820.06 samples/sec   Loss 3.4815   LearningRate 0.0021   Epoch: 17   Global Step: 285950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:19,812-Speed 9473.68 samples/sec   Loss 3.5031   LearningRate 0.0021   Epoch: 17   Global Step: 285960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:20,952-Speed 8990.33 samples/sec   Loss 3.5497   LearningRate 0.0021   Epoch: 17   Global Step: 285970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:22,102-Speed 8907.20 samples/sec   Loss 3.5035   LearningRate 0.0021   Epoch: 17   Global Step: 285980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:23,209-Speed 9256.26 samples/sec   Loss 3.4890   LearningRate 0.0021   Epoch: 17   Global Step: 285990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:06:24,318-Speed 9245.94 samples/sec   Loss 3.4916   LearningRate 0.0021   Epoch: 17   Global Step: 286000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:06:46,419-[lfw][286000]XNorm: 6.755071
Training: 2022-04-11 23:06:46,420-[lfw][286000]Accuracy-Flip: 0.99600+-0.00281
Training: 2022-04-11 23:06:46,420-[lfw][286000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:07:11,982-[cfp_fp][286000]XNorm: 5.887999
Training: 2022-04-11 23:07:11,983-[cfp_fp][286000]Accuracy-Flip: 0.97329+-0.00740
Training: 2022-04-11 23:07:11,983-[cfp_fp][286000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:07:33,992-[agedb_30][286000]XNorm: 6.568239
Training: 2022-04-11 23:07:33,993-[agedb_30][286000]Accuracy-Flip: 0.97317+-0.00855
Training: 2022-04-11 23:07:33,993-[agedb_30][286000]Accuracy-Highest: 0.97350
Training: 2022-04-11 23:07:35,070-Speed 144.73 samples/sec   Loss 3.5722   LearningRate 0.0021   Epoch: 17   Global Step: 286010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:36,137-Speed 9601.57 samples/sec   Loss 3.5254   LearningRate 0.0021   Epoch: 17   Global Step: 286020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:37,269-Speed 9050.56 samples/sec   Loss 3.5247   LearningRate 0.0020   Epoch: 17   Global Step: 286030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:38,329-Speed 9658.01 samples/sec   Loss 3.5424   LearningRate 0.0020   Epoch: 17   Global Step: 286040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:39,450-Speed 9142.93 samples/sec   Loss 3.4834   LearningRate 0.0020   Epoch: 17   Global Step: 286050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:40,528-Speed 9507.43 samples/sec   Loss 3.4885   LearningRate 0.0020   Epoch: 17   Global Step: 286060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:41,644-Speed 9178.40 samples/sec   Loss 3.5547   LearningRate 0.0020   Epoch: 17   Global Step: 286070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:42,748-Speed 9285.65 samples/sec   Loss 3.5373   LearningRate 0.0020   Epoch: 17   Global Step: 286080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:43,856-Speed 9243.04 samples/sec   Loss 3.5114   LearningRate 0.0020   Epoch: 17   Global Step: 286090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:44,920-Speed 9632.74 samples/sec   Loss 3.4847   LearningRate 0.0020   Epoch: 17   Global Step: 286100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:46,042-Speed 9131.46 samples/sec   Loss 3.6205   LearningRate 0.0020   Epoch: 17   Global Step: 286110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:47,155-Speed 9201.56 samples/sec   Loss 3.4545   LearningRate 0.0020   Epoch: 17   Global Step: 286120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:48,287-Speed 9051.81 samples/sec   Loss 3.5276   LearningRate 0.0020   Epoch: 17   Global Step: 286130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:49,355-Speed 9591.50 samples/sec   Loss 3.4182   LearningRate 0.0020   Epoch: 17   Global Step: 286140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:50,445-Speed 9402.86 samples/sec   Loss 3.4654   LearningRate 0.0020   Epoch: 17   Global Step: 286150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:51,598-Speed 8888.85 samples/sec   Loss 3.5565   LearningRate 0.0020   Epoch: 17   Global Step: 286160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:52,725-Speed 9085.53 samples/sec   Loss 3.5709   LearningRate 0.0020   Epoch: 17   Global Step: 286170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:53,833-Speed 9252.17 samples/sec   Loss 3.4885   LearningRate 0.0020   Epoch: 17   Global Step: 286180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:54,922-Speed 9408.16 samples/sec   Loss 3.5372   LearningRate 0.0020   Epoch: 17   Global Step: 286190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:56,125-Speed 8514.03 samples/sec   Loss 3.4050   LearningRate 0.0020   Epoch: 17   Global Step: 286200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:07:57,211-Speed 9429.05 samples/sec   Loss 3.5630   LearningRate 0.0020   Epoch: 17   Global Step: 286210   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:07:58,356-Speed 8956.91 samples/sec   Loss 3.5809   LearningRate 0.0020   Epoch: 17   Global Step: 286220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:07:59,465-Speed 9245.24 samples/sec   Loss 3.5298   LearningRate 0.0020   Epoch: 17   Global Step: 286230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:00,590-Speed 9108.24 samples/sec   Loss 3.5576   LearningRate 0.0020   Epoch: 17   Global Step: 286240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:01,740-Speed 8904.32 samples/sec   Loss 3.4594   LearningRate 0.0020   Epoch: 17   Global Step: 286250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:02,853-Speed 9205.44 samples/sec   Loss 3.5679   LearningRate 0.0020   Epoch: 17   Global Step: 286260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:03,959-Speed 9270.05 samples/sec   Loss 3.5218   LearningRate 0.0020   Epoch: 17   Global Step: 286270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:05,056-Speed 9340.03 samples/sec   Loss 3.5647   LearningRate 0.0020   Epoch: 17   Global Step: 286280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:06,193-Speed 9007.92 samples/sec   Loss 3.5184   LearningRate 0.0020   Epoch: 17   Global Step: 286290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:07,315-Speed 9134.53 samples/sec   Loss 3.4520   LearningRate 0.0020   Epoch: 17   Global Step: 286300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:08,460-Speed 8945.85 samples/sec   Loss 3.4603   LearningRate 0.0020   Epoch: 17   Global Step: 286310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:09,577-Speed 9170.12 samples/sec   Loss 3.4997   LearningRate 0.0020   Epoch: 17   Global Step: 286320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:10,694-Speed 9178.82 samples/sec   Loss 3.6195   LearningRate 0.0020   Epoch: 17   Global Step: 286330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:11,862-Speed 8769.15 samples/sec   Loss 3.5619   LearningRate 0.0020   Epoch: 17   Global Step: 286340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:12,917-Speed 9712.19 samples/sec   Loss 3.5391   LearningRate 0.0020   Epoch: 17   Global Step: 286350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:14,040-Speed 9128.29 samples/sec   Loss 3.5415   LearningRate 0.0020   Epoch: 17   Global Step: 286360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:15,175-Speed 9024.13 samples/sec   Loss 3.5551   LearningRate 0.0020   Epoch: 17   Global Step: 286370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:16,319-Speed 8954.34 samples/sec   Loss 3.4620   LearningRate 0.0020   Epoch: 17   Global Step: 286380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:17,446-Speed 9095.66 samples/sec   Loss 3.5027   LearningRate 0.0020   Epoch: 17   Global Step: 286390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:18,600-Speed 8878.94 samples/sec   Loss 3.4729   LearningRate 0.0020   Epoch: 17   Global Step: 286400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:19,720-Speed 9147.36 samples/sec   Loss 3.5522   LearningRate 0.0020   Epoch: 17   Global Step: 286410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:20,799-Speed 9504.14 samples/sec   Loss 3.5057   LearningRate 0.0020   Epoch: 17   Global Step: 286420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:21,936-Speed 9009.91 samples/sec   Loss 3.5717   LearningRate 0.0020   Epoch: 17   Global Step: 286430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:23,093-Speed 8849.83 samples/sec   Loss 3.5574   LearningRate 0.0020   Epoch: 17   Global Step: 286440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:24,218-Speed 9108.84 samples/sec   Loss 3.5815   LearningRate 0.0020   Epoch: 17   Global Step: 286450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:25,299-Speed 9475.47 samples/sec   Loss 3.5593   LearningRate 0.0020   Epoch: 17   Global Step: 286460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:26,393-Speed 9370.28 samples/sec   Loss 3.5881   LearningRate 0.0020   Epoch: 17   Global Step: 286470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:27,472-Speed 9492.13 samples/sec   Loss 3.5187   LearningRate 0.0020   Epoch: 17   Global Step: 286480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:28,624-Speed 8899.10 samples/sec   Loss 3.5082   LearningRate 0.0020   Epoch: 17   Global Step: 286490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:29,748-Speed 9109.00 samples/sec   Loss 3.5061   LearningRate 0.0020   Epoch: 17   Global Step: 286500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:30,855-Speed 9261.70 samples/sec   Loss 3.4816   LearningRate 0.0020   Epoch: 17   Global Step: 286510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:31,958-Speed 9285.95 samples/sec   Loss 3.4800   LearningRate 0.0020   Epoch: 17   Global Step: 286520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:08:33,061-Speed 9289.89 samples/sec   Loss 3.5200   LearningRate 0.0020   Epoch: 17   Global Step: 286530   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:08:34,157-Speed 9349.03 samples/sec   Loss 3.5806   LearningRate 0.0020   Epoch: 17   Global Step: 286540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:35,270-Speed 9205.16 samples/sec   Loss 3.4952   LearningRate 0.0020   Epoch: 17   Global Step: 286550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:36,423-Speed 8885.72 samples/sec   Loss 3.5785   LearningRate 0.0020   Epoch: 17   Global Step: 286560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:37,516-Speed 9371.69 samples/sec   Loss 3.5098   LearningRate 0.0020   Epoch: 17   Global Step: 286570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:38,641-Speed 9108.83 samples/sec   Loss 3.5080   LearningRate 0.0020   Epoch: 17   Global Step: 286580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:39,753-Speed 9217.29 samples/sec   Loss 3.6064   LearningRate 0.0020   Epoch: 17   Global Step: 286590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:40,847-Speed 9378.26 samples/sec   Loss 3.4990   LearningRate 0.0020   Epoch: 17   Global Step: 286600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:41,992-Speed 8953.34 samples/sec   Loss 3.5272   LearningRate 0.0020   Epoch: 17   Global Step: 286610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:43,082-Speed 9391.96 samples/sec   Loss 3.4231   LearningRate 0.0020   Epoch: 17   Global Step: 286620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:44,241-Speed 8841.75 samples/sec   Loss 3.5732   LearningRate 0.0020   Epoch: 17   Global Step: 286630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:45,351-Speed 9230.85 samples/sec   Loss 3.5064   LearningRate 0.0020   Epoch: 17   Global Step: 286640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:08:46,449-Speed 9329.80 samples/sec   Loss 3.5461   LearningRate 0.0020   Epoch: 17   Global Step: 286650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:47,586-Speed 9010.78 samples/sec   Loss 3.5648   LearningRate 0.0020   Epoch: 17   Global Step: 286660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:48,733-Speed 8938.39 samples/sec   Loss 3.4407   LearningRate 0.0020   Epoch: 17   Global Step: 286670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:49,914-Speed 8672.75 samples/sec   Loss 3.4862   LearningRate 0.0020   Epoch: 17   Global Step: 286680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:51,049-Speed 9028.07 samples/sec   Loss 3.4433   LearningRate 0.0020   Epoch: 17   Global Step: 286690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:52,164-Speed 9193.84 samples/sec   Loss 3.5657   LearningRate 0.0020   Epoch: 17   Global Step: 286700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:53,338-Speed 8726.89 samples/sec   Loss 3.5959   LearningRate 0.0020   Epoch: 17   Global Step: 286710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:54,482-Speed 8957.26 samples/sec   Loss 3.5139   LearningRate 0.0020   Epoch: 17   Global Step: 286720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:55,593-Speed 9220.36 samples/sec   Loss 3.4875   LearningRate 0.0020   Epoch: 17   Global Step: 286730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:56,749-Speed 8863.59 samples/sec   Loss 3.4533   LearningRate 0.0020   Epoch: 17   Global Step: 286740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:57,903-Speed 8882.91 samples/sec   Loss 3.5369   LearningRate 0.0020   Epoch: 17   Global Step: 286750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:08:59,038-Speed 9021.91 samples/sec   Loss 3.5538   LearningRate 0.0020   Epoch: 17   Global Step: 286760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:00,167-Speed 9080.33 samples/sec   Loss 3.5359   LearningRate 0.0020   Epoch: 17   Global Step: 286770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:01,324-Speed 8849.94 samples/sec   Loss 3.5507   LearningRate 0.0020   Epoch: 17   Global Step: 286780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:02,451-Speed 9094.80 samples/sec   Loss 3.5725   LearningRate 0.0020   Epoch: 17   Global Step: 286790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:03,617-Speed 8783.58 samples/sec   Loss 3.6002   LearningRate 0.0020   Epoch: 17   Global Step: 286800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:04,681-Speed 9628.82 samples/sec   Loss 3.4754   LearningRate 0.0020   Epoch: 17   Global Step: 286810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:05,800-Speed 9155.22 samples/sec   Loss 3.5303   LearningRate 0.0020   Epoch: 17   Global Step: 286820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:06,961-Speed 8828.99 samples/sec   Loss 3.5063   LearningRate 0.0020   Epoch: 17   Global Step: 286830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:08,100-Speed 8993.34 samples/sec   Loss 3.5792   LearningRate 0.0020   Epoch: 17   Global Step: 286840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:09,261-Speed 8823.57 samples/sec   Loss 3.5952   LearningRate 0.0020   Epoch: 17   Global Step: 286850   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:09:10,362-Speed 9306.66 samples/sec   Loss 3.4918   LearningRate 0.0020   Epoch: 17   Global Step: 286860   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:09:11,461-Speed 9323.74 samples/sec   Loss 3.5824   LearningRate 0.0020   Epoch: 17   Global Step: 286870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:12,627-Speed 8788.66 samples/sec   Loss 3.5155   LearningRate 0.0020   Epoch: 17   Global Step: 286880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:13,772-Speed 8952.55 samples/sec   Loss 3.4996   LearningRate 0.0020   Epoch: 17   Global Step: 286890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:14,852-Speed 9489.20 samples/sec   Loss 3.4209   LearningRate 0.0020   Epoch: 17   Global Step: 286900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:15,978-Speed 9093.90 samples/sec   Loss 3.5120   LearningRate 0.0020   Epoch: 17   Global Step: 286910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:17,075-Speed 9344.49 samples/sec   Loss 3.5454   LearningRate 0.0020   Epoch: 17   Global Step: 286920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:18,201-Speed 9098.57 samples/sec   Loss 3.4595   LearningRate 0.0020   Epoch: 17   Global Step: 286930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:19,318-Speed 9175.04 samples/sec   Loss 3.5366   LearningRate 0.0020   Epoch: 17   Global Step: 286940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:20,458-Speed 8986.47 samples/sec   Loss 3.4223   LearningRate 0.0020   Epoch: 17   Global Step: 286950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:21,589-Speed 9060.90 samples/sec   Loss 3.5173   LearningRate 0.0020   Epoch: 17   Global Step: 286960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:22,740-Speed 8902.56 samples/sec   Loss 3.5625   LearningRate 0.0020   Epoch: 17   Global Step: 286970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:09:23,833-Speed 9371.84 samples/sec   Loss 3.5504   LearningRate 0.0020   Epoch: 17   Global Step: 286980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:24,952-Speed 9161.16 samples/sec   Loss 3.5438   LearningRate 0.0020   Epoch: 17   Global Step: 286990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:26,075-Speed 9116.94 samples/sec   Loss 3.6042   LearningRate 0.0020   Epoch: 17   Global Step: 287000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:27,208-Speed 9048.23 samples/sec   Loss 3.5473   LearningRate 0.0020   Epoch: 17   Global Step: 287010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:28,350-Speed 8978.58 samples/sec   Loss 3.5090   LearningRate 0.0020   Epoch: 17   Global Step: 287020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:29,468-Speed 9164.85 samples/sec   Loss 3.5018   LearningRate 0.0020   Epoch: 17   Global Step: 287030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:30,602-Speed 9034.82 samples/sec   Loss 3.5538   LearningRate 0.0020   Epoch: 17   Global Step: 287040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:31,783-Speed 8675.18 samples/sec   Loss 3.5516   LearningRate 0.0020   Epoch: 17   Global Step: 287050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:32,931-Speed 8920.24 samples/sec   Loss 3.6219   LearningRate 0.0020   Epoch: 17   Global Step: 287060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:34,023-Speed 9385.30 samples/sec   Loss 3.5398   LearningRate 0.0020   Epoch: 17   Global Step: 287070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:35,090-Speed 9608.68 samples/sec   Loss 3.5315   LearningRate 0.0020   Epoch: 17   Global Step: 287080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:36,211-Speed 9133.62 samples/sec   Loss 3.5085   LearningRate 0.0020   Epoch: 17   Global Step: 287090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:37,308-Speed 9343.80 samples/sec   Loss 3.6252   LearningRate 0.0020   Epoch: 17   Global Step: 287100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:38,442-Speed 9031.81 samples/sec   Loss 3.4333   LearningRate 0.0020   Epoch: 17   Global Step: 287110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:39,554-Speed 9213.97 samples/sec   Loss 3.5998   LearningRate 0.0020   Epoch: 17   Global Step: 287120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:40,645-Speed 9396.32 samples/sec   Loss 3.5595   LearningRate 0.0020   Epoch: 17   Global Step: 287130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:41,716-Speed 9562.90 samples/sec   Loss 3.5832   LearningRate 0.0020   Epoch: 17   Global Step: 287140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:42,831-Speed 9188.49 samples/sec   Loss 3.4951   LearningRate 0.0020   Epoch: 17   Global Step: 287150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:44,018-Speed 8637.43 samples/sec   Loss 3.5449   LearningRate 0.0020   Epoch: 17   Global Step: 287160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:45,142-Speed 9108.81 samples/sec   Loss 3.5313   LearningRate 0.0020   Epoch: 17   Global Step: 287170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:46,232-Speed 9400.97 samples/sec   Loss 3.5235   LearningRate 0.0020   Epoch: 17   Global Step: 287180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:47,358-Speed 9103.58 samples/sec   Loss 3.5858   LearningRate 0.0020   Epoch: 17   Global Step: 287190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:48,426-Speed 9587.87 samples/sec   Loss 3.5514   LearningRate 0.0020   Epoch: 17   Global Step: 287200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:49,531-Speed 9275.67 samples/sec   Loss 3.6494   LearningRate 0.0019   Epoch: 17   Global Step: 287210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:50,631-Speed 9314.83 samples/sec   Loss 3.5385   LearningRate 0.0019   Epoch: 17   Global Step: 287220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:51,762-Speed 9061.99 samples/sec   Loss 3.5323   LearningRate 0.0019   Epoch: 17   Global Step: 287230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:52,902-Speed 8991.11 samples/sec   Loss 3.5470   LearningRate 0.0019   Epoch: 17   Global Step: 287240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:54,020-Speed 9165.66 samples/sec   Loss 3.6139   LearningRate 0.0019   Epoch: 17   Global Step: 287250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:55,149-Speed 9076.63 samples/sec   Loss 3.6014   LearningRate 0.0019   Epoch: 17   Global Step: 287260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:56,287-Speed 9004.59 samples/sec   Loss 3.5381   LearningRate 0.0019   Epoch: 17   Global Step: 287270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:57,375-Speed 9415.22 samples/sec   Loss 3.5867   LearningRate 0.0019   Epoch: 17   Global Step: 287280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:09:58,479-Speed 9284.28 samples/sec   Loss 3.5112   LearningRate 0.0019   Epoch: 17   Global Step: 287290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:09:59,582-Speed 9286.92 samples/sec   Loss 3.4966   LearningRate 0.0019   Epoch: 17   Global Step: 287300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:00,731-Speed 8921.65 samples/sec   Loss 3.4608   LearningRate 0.0019   Epoch: 17   Global Step: 287310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:01,857-Speed 9098.40 samples/sec   Loss 3.5100   LearningRate 0.0019   Epoch: 17   Global Step: 287320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:03,008-Speed 8898.79 samples/sec   Loss 3.5271   LearningRate 0.0019   Epoch: 17   Global Step: 287330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:04,099-Speed 9388.34 samples/sec   Loss 3.5556   LearningRate 0.0019   Epoch: 17   Global Step: 287340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:05,230-Speed 9057.68 samples/sec   Loss 3.5224   LearningRate 0.0019   Epoch: 17   Global Step: 287350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:06,370-Speed 8992.91 samples/sec   Loss 3.5130   LearningRate 0.0019   Epoch: 17   Global Step: 287360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:07,476-Speed 9259.13 samples/sec   Loss 3.5675   LearningRate 0.0019   Epoch: 17   Global Step: 287370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:08,670-Speed 8584.20 samples/sec   Loss 3.5671   LearningRate 0.0019   Epoch: 17   Global Step: 287380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:09,830-Speed 8834.27 samples/sec   Loss 3.5387   LearningRate 0.0019   Epoch: 17   Global Step: 287390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:10,971-Speed 8984.31 samples/sec   Loss 3.5222   LearningRate 0.0019   Epoch: 17   Global Step: 287400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:12,089-Speed 9167.53 samples/sec   Loss 3.5033   LearningRate 0.0019   Epoch: 17   Global Step: 287410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:13,194-Speed 9270.52 samples/sec   Loss 3.5355   LearningRate 0.0019   Epoch: 17   Global Step: 287420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:14,326-Speed 9053.37 samples/sec   Loss 3.5349   LearningRate 0.0019   Epoch: 17   Global Step: 287430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:16,305-Speed 5174.49 samples/sec   Loss 3.5886   LearningRate 0.0019   Epoch: 17   Global Step: 287440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:17,427-Speed 9132.18 samples/sec   Loss 3.4906   LearningRate 0.0019   Epoch: 17   Global Step: 287450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:18,531-Speed 9283.13 samples/sec   Loss 3.5395   LearningRate 0.0019   Epoch: 17   Global Step: 287460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:19,608-Speed 9509.67 samples/sec   Loss 3.4607   LearningRate 0.0019   Epoch: 17   Global Step: 287470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:20,721-Speed 9201.55 samples/sec   Loss 3.5886   LearningRate 0.0019   Epoch: 17   Global Step: 287480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:21,796-Speed 9538.22 samples/sec   Loss 3.5298   LearningRate 0.0019   Epoch: 17   Global Step: 287490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:22,913-Speed 9174.77 samples/sec   Loss 3.5922   LearningRate 0.0019   Epoch: 17   Global Step: 287500   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:24,001-Speed 9415.14 samples/sec   Loss 3.4990   LearningRate 0.0019   Epoch: 17   Global Step: 287510   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:25,164-Speed 8805.73 samples/sec   Loss 3.5562   LearningRate 0.0019   Epoch: 17   Global Step: 287520   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:26,275-Speed 9226.31 samples/sec   Loss 3.5119   LearningRate 0.0019   Epoch: 17   Global Step: 287530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:27,357-Speed 9474.02 samples/sec   Loss 3.5672   LearningRate 0.0019   Epoch: 17   Global Step: 287540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:28,460-Speed 9289.79 samples/sec   Loss 3.5028   LearningRate 0.0019   Epoch: 17   Global Step: 287550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:29,528-Speed 9595.54 samples/sec   Loss 3.4663   LearningRate 0.0019   Epoch: 17   Global Step: 287560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:30,645-Speed 9168.22 samples/sec   Loss 3.5773   LearningRate 0.0019   Epoch: 17   Global Step: 287570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:31,739-Speed 9372.16 samples/sec   Loss 3.4658   LearningRate 0.0019   Epoch: 17   Global Step: 287580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:32,809-Speed 9573.03 samples/sec   Loss 3.5415   LearningRate 0.0019   Epoch: 17   Global Step: 287590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:33,903-Speed 9363.98 samples/sec   Loss 3.5218   LearningRate 0.0019   Epoch: 17   Global Step: 287600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:35,035-Speed 9052.81 samples/sec   Loss 3.5477   LearningRate 0.0019   Epoch: 17   Global Step: 287610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:36,139-Speed 9278.22 samples/sec   Loss 3.4977   LearningRate 0.0019   Epoch: 17   Global Step: 287620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:37,297-Speed 8851.10 samples/sec   Loss 3.5608   LearningRate 0.0019   Epoch: 17   Global Step: 287630   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:38,396-Speed 9320.24 samples/sec   Loss 3.5013   LearningRate 0.0019   Epoch: 17   Global Step: 287640   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:39,517-Speed 9140.91 samples/sec   Loss 3.5600   LearningRate 0.0019   Epoch: 17   Global Step: 287650   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:40,643-Speed 9103.64 samples/sec   Loss 3.5958   LearningRate 0.0019   Epoch: 17   Global Step: 287660   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:41,701-Speed 9678.13 samples/sec   Loss 3.6269   LearningRate 0.0019   Epoch: 17   Global Step: 287670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:42,813-Speed 9218.55 samples/sec   Loss 3.5164   LearningRate 0.0019   Epoch: 17   Global Step: 287680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:43,920-Speed 9251.67 samples/sec   Loss 3.5347   LearningRate 0.0019   Epoch: 17   Global Step: 287690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:45,013-Speed 9376.70 samples/sec   Loss 3.5657   LearningRate 0.0019   Epoch: 17   Global Step: 287700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:46,155-Speed 8971.70 samples/sec   Loss 3.6128   LearningRate 0.0019   Epoch: 17   Global Step: 287710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:47,310-Speed 8869.42 samples/sec   Loss 3.5953   LearningRate 0.0019   Epoch: 17   Global Step: 287720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:48,380-Speed 9581.28 samples/sec   Loss 3.6039   LearningRate 0.0019   Epoch: 17   Global Step: 287730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:49,551-Speed 8744.19 samples/sec   Loss 3.5251   LearningRate 0.0019   Epoch: 17   Global Step: 287740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:50,640-Speed 9405.41 samples/sec   Loss 3.5105   LearningRate 0.0019   Epoch: 17   Global Step: 287750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:51,758-Speed 9174.08 samples/sec   Loss 3.5490   LearningRate 0.0019   Epoch: 17   Global Step: 287760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:52,879-Speed 9136.13 samples/sec   Loss 3.6169   LearningRate 0.0019   Epoch: 17   Global Step: 287770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:53,975-Speed 9352.82 samples/sec   Loss 3.5821   LearningRate 0.0019   Epoch: 17   Global Step: 287780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:55,037-Speed 9644.99 samples/sec   Loss 3.5529   LearningRate 0.0019   Epoch: 17   Global Step: 287790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:56,204-Speed 8782.05 samples/sec   Loss 3.5658   LearningRate 0.0019   Epoch: 17   Global Step: 287800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:57,330-Speed 9097.23 samples/sec   Loss 3.5313   LearningRate 0.0019   Epoch: 17   Global Step: 287810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:10:58,411-Speed 9475.35 samples/sec   Loss 3.5567   LearningRate 0.0019   Epoch: 17   Global Step: 287820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:10:59,503-Speed 9387.52 samples/sec   Loss 3.5254   LearningRate 0.0019   Epoch: 17   Global Step: 287830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:00,639-Speed 9019.46 samples/sec   Loss 3.5764   LearningRate 0.0019   Epoch: 17   Global Step: 287840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:01,754-Speed 9184.25 samples/sec   Loss 3.5921   LearningRate 0.0019   Epoch: 17   Global Step: 287850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:02,892-Speed 9009.94 samples/sec   Loss 3.5905   LearningRate 0.0019   Epoch: 17   Global Step: 287860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:04,040-Speed 8919.99 samples/sec   Loss 3.5607   LearningRate 0.0019   Epoch: 17   Global Step: 287870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:05,141-Speed 9306.26 samples/sec   Loss 3.5557   LearningRate 0.0019   Epoch: 17   Global Step: 287880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:06,267-Speed 9106.77 samples/sec   Loss 3.4839   LearningRate 0.0019   Epoch: 17   Global Step: 287890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:07,433-Speed 8782.12 samples/sec   Loss 3.5244   LearningRate 0.0019   Epoch: 17   Global Step: 287900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:08,517-Speed 9451.95 samples/sec   Loss 3.5922   LearningRate 0.0019   Epoch: 17   Global Step: 287910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:09,630-Speed 9204.53 samples/sec   Loss 3.5617   LearningRate 0.0019   Epoch: 17   Global Step: 287920   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:11:10,778-Speed 8930.77 samples/sec   Loss 3.5829   LearningRate 0.0019   Epoch: 17   Global Step: 287930   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:11:11,929-Speed 8898.51 samples/sec   Loss 3.4251   LearningRate 0.0019   Epoch: 17   Global Step: 287940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:11:13,031-Speed 9297.93 samples/sec   Loss 3.5077   LearningRate 0.0019   Epoch: 17   Global Step: 287950   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:11:14,165-Speed 9033.46 samples/sec   Loss 3.6234   LearningRate 0.0019   Epoch: 17   Global Step: 287960   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:11:15,274-Speed 9241.60 samples/sec   Loss 3.4985   LearningRate 0.0019   Epoch: 17   Global Step: 287970   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:11:16,391-Speed 9171.61 samples/sec   Loss 3.5985   LearningRate 0.0019   Epoch: 17   Global Step: 287980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:17,542-Speed 8900.52 samples/sec   Loss 3.4991   LearningRate 0.0019   Epoch: 17   Global Step: 287990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:18,676-Speed 9036.64 samples/sec   Loss 3.5530   LearningRate 0.0019   Epoch: 17   Global Step: 288000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:11:40,672-[lfw][288000]XNorm: 6.748821
Training: 2022-04-11 23:11:40,673-[lfw][288000]Accuracy-Flip: 0.99683+-0.00263
Training: 2022-04-11 23:11:40,673-[lfw][288000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:12:06,130-[cfp_fp][288000]XNorm: 5.885726
Training: 2022-04-11 23:12:06,131-[cfp_fp][288000]Accuracy-Flip: 0.97243+-0.00852
Training: 2022-04-11 23:12:06,132-[cfp_fp][288000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:12:28,060-[agedb_30][288000]XNorm: 6.571554
Training: 2022-04-11 23:12:28,061-[agedb_30][288000]Accuracy-Flip: 0.97417+-0.00827
Training: 2022-04-11 23:12:28,061-[agedb_30][288000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:12:29,175-Speed 145.25 samples/sec   Loss 3.5005   LearningRate 0.0019   Epoch: 17   Global Step: 288010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:30,268-Speed 9376.69 samples/sec   Loss 3.5566   LearningRate 0.0019   Epoch: 17   Global Step: 288020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:31,387-Speed 9153.95 samples/sec   Loss 3.4939   LearningRate 0.0019   Epoch: 17   Global Step: 288030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:32,519-Speed 9053.89 samples/sec   Loss 3.5262   LearningRate 0.0019   Epoch: 17   Global Step: 288040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:33,630-Speed 9220.43 samples/sec   Loss 3.5642   LearningRate 0.0019   Epoch: 17   Global Step: 288050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:34,744-Speed 9197.90 samples/sec   Loss 3.5688   LearningRate 0.0019   Epoch: 17   Global Step: 288060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:35,894-Speed 8911.93 samples/sec   Loss 3.5588   LearningRate 0.0019   Epoch: 17   Global Step: 288070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:37,000-Speed 9259.47 samples/sec   Loss 3.5739   LearningRate 0.0019   Epoch: 17   Global Step: 288080   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:12:38,113-Speed 9207.95 samples/sec   Loss 3.5641   LearningRate 0.0019   Epoch: 17   Global Step: 288090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:39,225-Speed 9210.53 samples/sec   Loss 3.4952   LearningRate 0.0019   Epoch: 17   Global Step: 288100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:40,330-Speed 9268.92 samples/sec   Loss 3.5894   LearningRate 0.0019   Epoch: 17   Global Step: 288110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:41,507-Speed 8709.93 samples/sec   Loss 3.5397   LearningRate 0.0019   Epoch: 17   Global Step: 288120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:42,626-Speed 9157.79 samples/sec   Loss 3.5755   LearningRate 0.0019   Epoch: 17   Global Step: 288130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:43,772-Speed 8938.82 samples/sec   Loss 3.5731   LearningRate 0.0019   Epoch: 17   Global Step: 288140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:44,892-Speed 9147.74 samples/sec   Loss 3.5120   LearningRate 0.0019   Epoch: 17   Global Step: 288150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:46,013-Speed 9137.29 samples/sec   Loss 3.5527   LearningRate 0.0019   Epoch: 17   Global Step: 288160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:47,106-Speed 9373.83 samples/sec   Loss 3.6225   LearningRate 0.0019   Epoch: 17   Global Step: 288170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:48,246-Speed 8986.12 samples/sec   Loss 3.5032   LearningRate 0.0019   Epoch: 17   Global Step: 288180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:51,156-Speed 3519.27 samples/sec   Loss 3.4417   LearningRate 0.0019   Epoch: 17   Global Step: 288190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:12:52,304-Speed 8928.22 samples/sec   Loss 3.5833   LearningRate 0.0019   Epoch: 17   Global Step: 288200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:53,403-Speed 9326.92 samples/sec   Loss 3.5666   LearningRate 0.0019   Epoch: 17   Global Step: 288210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:54,574-Speed 8750.93 samples/sec   Loss 3.6178   LearningRate 0.0019   Epoch: 17   Global Step: 288220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:55,734-Speed 8833.13 samples/sec   Loss 3.4728   LearningRate 0.0019   Epoch: 17   Global Step: 288230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:56,873-Speed 8994.75 samples/sec   Loss 3.6022   LearningRate 0.0019   Epoch: 17   Global Step: 288240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:57,998-Speed 9108.69 samples/sec   Loss 3.5398   LearningRate 0.0019   Epoch: 17   Global Step: 288250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:12:59,134-Speed 9019.28 samples/sec   Loss 3.5364   LearningRate 0.0019   Epoch: 17   Global Step: 288260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:00,276-Speed 8973.51 samples/sec   Loss 3.6283   LearningRate 0.0019   Epoch: 17   Global Step: 288270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:01,406-Speed 9070.45 samples/sec   Loss 3.5852   LearningRate 0.0019   Epoch: 17   Global Step: 288280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:02,493-Speed 9422.87 samples/sec   Loss 3.5793   LearningRate 0.0019   Epoch: 17   Global Step: 288290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:03,620-Speed 9091.17 samples/sec   Loss 3.5290   LearningRate 0.0019   Epoch: 17   Global Step: 288300   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:13:04,702-Speed 9463.10 samples/sec   Loss 3.4333   LearningRate 0.0019   Epoch: 17   Global Step: 288310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:13:05,855-Speed 8889.69 samples/sec   Loss 3.5263   LearningRate 0.0019   Epoch: 17   Global Step: 288320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:13:06,978-Speed 9120.54 samples/sec   Loss 3.4556   LearningRate 0.0019   Epoch: 17   Global Step: 288330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:13:08,036-Speed 9688.84 samples/sec   Loss 3.5563   LearningRate 0.0019   Epoch: 17   Global Step: 288340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:09,096-Speed 9667.88 samples/sec   Loss 3.5529   LearningRate 0.0019   Epoch: 17   Global Step: 288350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:10,193-Speed 9335.50 samples/sec   Loss 3.5088   LearningRate 0.0019   Epoch: 17   Global Step: 288360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:11,304-Speed 9228.69 samples/sec   Loss 3.6329   LearningRate 0.0019   Epoch: 17   Global Step: 288370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:12,426-Speed 9131.26 samples/sec   Loss 3.5232   LearningRate 0.0019   Epoch: 17   Global Step: 288380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:13,574-Speed 8928.56 samples/sec   Loss 3.6280   LearningRate 0.0019   Epoch: 17   Global Step: 288390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:14,706-Speed 9048.77 samples/sec   Loss 3.5403   LearningRate 0.0019   Epoch: 17   Global Step: 288400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:15,860-Speed 8878.52 samples/sec   Loss 3.5555   LearningRate 0.0019   Epoch: 17   Global Step: 288410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:16,985-Speed 9108.27 samples/sec   Loss 3.6223   LearningRate 0.0018   Epoch: 17   Global Step: 288420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:18,119-Speed 9027.25 samples/sec   Loss 3.4854   LearningRate 0.0018   Epoch: 17   Global Step: 288430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:19,218-Speed 9327.38 samples/sec   Loss 3.5646   LearningRate 0.0018   Epoch: 17   Global Step: 288440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:13:20,345-Speed 9090.63 samples/sec   Loss 3.5979   LearningRate 0.0018   Epoch: 17   Global Step: 288450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:21,424-Speed 9495.70 samples/sec   Loss 3.5419   LearningRate 0.0018   Epoch: 17   Global Step: 288460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:22,573-Speed 8916.27 samples/sec   Loss 3.5124   LearningRate 0.0018   Epoch: 17   Global Step: 288470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:23,711-Speed 9003.02 samples/sec   Loss 3.6683   LearningRate 0.0018   Epoch: 17   Global Step: 288480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:24,822-Speed 9223.12 samples/sec   Loss 3.5562   LearningRate 0.0018   Epoch: 17   Global Step: 288490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:25,969-Speed 8933.07 samples/sec   Loss 3.4890   LearningRate 0.0018   Epoch: 17   Global Step: 288500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:27,185-Speed 8428.32 samples/sec   Loss 3.5706   LearningRate 0.0018   Epoch: 17   Global Step: 288510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:28,304-Speed 9159.67 samples/sec   Loss 3.5439   LearningRate 0.0018   Epoch: 17   Global Step: 288520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:29,422-Speed 9160.35 samples/sec   Loss 3.4984   LearningRate 0.0018   Epoch: 17   Global Step: 288530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:30,558-Speed 9019.11 samples/sec   Loss 3.6017   LearningRate 0.0018   Epoch: 17   Global Step: 288540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:31,732-Speed 8726.49 samples/sec   Loss 3.5871   LearningRate 0.0018   Epoch: 17   Global Step: 288550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:32,894-Speed 8821.29 samples/sec   Loss 3.6069   LearningRate 0.0018   Epoch: 17   Global Step: 288560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:34,034-Speed 8985.00 samples/sec   Loss 3.4803   LearningRate 0.0018   Epoch: 17   Global Step: 288570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:35,150-Speed 9184.67 samples/sec   Loss 3.6474   LearningRate 0.0018   Epoch: 17   Global Step: 288580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:36,288-Speed 8996.83 samples/sec   Loss 3.5885   LearningRate 0.0018   Epoch: 17   Global Step: 288590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:37,410-Speed 9131.54 samples/sec   Loss 3.5472   LearningRate 0.0018   Epoch: 17   Global Step: 288600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:38,523-Speed 9211.12 samples/sec   Loss 3.5767   LearningRate 0.0018   Epoch: 17   Global Step: 288610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:39,650-Speed 9092.02 samples/sec   Loss 3.4749   LearningRate 0.0018   Epoch: 17   Global Step: 288620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:40,763-Speed 9204.90 samples/sec   Loss 3.5882   LearningRate 0.0018   Epoch: 17   Global Step: 288630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:41,868-Speed 9270.84 samples/sec   Loss 3.5775   LearningRate 0.0018   Epoch: 17   Global Step: 288640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:42,979-Speed 9219.87 samples/sec   Loss 3.5616   LearningRate 0.0018   Epoch: 17   Global Step: 288650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:44,139-Speed 8832.34 samples/sec   Loss 3.5357   LearningRate 0.0018   Epoch: 17   Global Step: 288660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:45,246-Speed 9256.35 samples/sec   Loss 3.5784   LearningRate 0.0018   Epoch: 17   Global Step: 288670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:46,408-Speed 8819.78 samples/sec   Loss 3.5722   LearningRate 0.0018   Epoch: 17   Global Step: 288680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:47,565-Speed 8854.76 samples/sec   Loss 3.5641   LearningRate 0.0018   Epoch: 17   Global Step: 288690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:48,713-Speed 8926.34 samples/sec   Loss 3.5338   LearningRate 0.0018   Epoch: 17   Global Step: 288700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:49,844-Speed 9058.03 samples/sec   Loss 3.5475   LearningRate 0.0018   Epoch: 17   Global Step: 288710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:50,990-Speed 8938.95 samples/sec   Loss 3.5919   LearningRate 0.0018   Epoch: 17   Global Step: 288720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:52,132-Speed 8983.21 samples/sec   Loss 3.5206   LearningRate 0.0018   Epoch: 17   Global Step: 288730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:53,240-Speed 9242.68 samples/sec   Loss 3.5820   LearningRate 0.0018   Epoch: 17   Global Step: 288740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:13:54,376-Speed 9019.37 samples/sec   Loss 3.5791   LearningRate 0.0018   Epoch: 17   Global Step: 288750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:13:55,509-Speed 9042.70 samples/sec   Loss 3.5342   LearningRate 0.0018   Epoch: 17   Global Step: 288760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:13:56,617-Speed 9245.72 samples/sec   Loss 3.5609   LearningRate 0.0018   Epoch: 17   Global Step: 288770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:13:57,759-Speed 8972.62 samples/sec   Loss 3.5608   LearningRate 0.0018   Epoch: 17   Global Step: 288780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:13:58,894-Speed 9027.05 samples/sec   Loss 3.5276   LearningRate 0.0018   Epoch: 17   Global Step: 288790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:00,029-Speed 9026.18 samples/sec   Loss 3.6002   LearningRate 0.0018   Epoch: 17   Global Step: 288800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:01,155-Speed 9103.71 samples/sec   Loss 3.4992   LearningRate 0.0018   Epoch: 17   Global Step: 288810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:02,297-Speed 8969.32 samples/sec   Loss 3.5064   LearningRate 0.0018   Epoch: 17   Global Step: 288820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:03,448-Speed 8902.23 samples/sec   Loss 3.5531   LearningRate 0.0018   Epoch: 17   Global Step: 288830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:04,575-Speed 9090.18 samples/sec   Loss 3.5879   LearningRate 0.0018   Epoch: 17   Global Step: 288840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:05,730-Speed 8870.26 samples/sec   Loss 3.6340   LearningRate 0.0018   Epoch: 17   Global Step: 288850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:06,883-Speed 8886.92 samples/sec   Loss 3.5148   LearningRate 0.0018   Epoch: 17   Global Step: 288860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:08,037-Speed 8878.67 samples/sec   Loss 3.5134   LearningRate 0.0018   Epoch: 17   Global Step: 288870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:14:09,152-Speed 9190.83 samples/sec   Loss 3.5275   LearningRate 0.0018   Epoch: 17   Global Step: 288880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:14:10,306-Speed 8874.08 samples/sec   Loss 3.6696   LearningRate 0.0018   Epoch: 17   Global Step: 288890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:14:11,418-Speed 9218.40 samples/sec   Loss 3.4944   LearningRate 0.0018   Epoch: 17   Global Step: 288900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:14:12,592-Speed 8730.38 samples/sec   Loss 3.5262   LearningRate 0.0018   Epoch: 17   Global Step: 288910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:14:13,712-Speed 9141.55 samples/sec   Loss 3.5127   LearningRate 0.0018   Epoch: 17   Global Step: 288920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:14:15,912-Speed 4657.41 samples/sec   Loss 3.5658   LearningRate 0.0018   Epoch: 17   Global Step: 288930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:14:17,073-Speed 8822.09 samples/sec   Loss 3.5972   LearningRate 0.0018   Epoch: 17   Global Step: 288940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:14:18,213-Speed 8987.74 samples/sec   Loss 3.4678   LearningRate 0.0018   Epoch: 17   Global Step: 288950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:14:20,211-Speed 5127.14 samples/sec   Loss 3.6476   LearningRate 0.0018   Epoch: 17   Global Step: 288960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:14:21,330-Speed 9157.02 samples/sec   Loss 3.5519   LearningRate 0.0018   Epoch: 17   Global Step: 288970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:22,483-Speed 8887.91 samples/sec   Loss 3.6042   LearningRate 0.0018   Epoch: 17   Global Step: 288980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:24,716-Speed 4587.95 samples/sec   Loss 3.5821   LearningRate 0.0018   Epoch: 17   Global Step: 288990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:25,868-Speed 8897.09 samples/sec   Loss 3.5239   LearningRate 0.0018   Epoch: 17   Global Step: 289000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:27,062-Speed 8581.78 samples/sec   Loss 3.6516   LearningRate 0.0018   Epoch: 17   Global Step: 289010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:28,986-Speed 5323.78 samples/sec   Loss 3.5559   LearningRate 0.0018   Epoch: 17   Global Step: 289020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:30,116-Speed 9067.72 samples/sec   Loss 3.5054   LearningRate 0.0018   Epoch: 17   Global Step: 289030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:31,273-Speed 8853.90 samples/sec   Loss 3.4421   LearningRate 0.0018   Epoch: 17   Global Step: 289040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:32,412-Speed 8999.16 samples/sec   Loss 3.5839   LearningRate 0.0018   Epoch: 17   Global Step: 289050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:33,565-Speed 8883.97 samples/sec   Loss 3.6543   LearningRate 0.0018   Epoch: 17   Global Step: 289060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:34,671-Speed 9267.94 samples/sec   Loss 3.6315   LearningRate 0.0018   Epoch: 17   Global Step: 289070   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:14:35,790-Speed 9149.16 samples/sec   Loss 3.5545   LearningRate 0.0018   Epoch: 17   Global Step: 289080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:36,879-Speed 9407.35 samples/sec   Loss 3.6254   LearningRate 0.0018   Epoch: 17   Global Step: 289090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:37,979-Speed 9319.77 samples/sec   Loss 3.5459   LearningRate 0.0018   Epoch: 17   Global Step: 289100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:39,092-Speed 9201.94 samples/sec   Loss 3.6030   LearningRate 0.0018   Epoch: 17   Global Step: 289110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:40,221-Speed 9077.36 samples/sec   Loss 3.5479   LearningRate 0.0018   Epoch: 17   Global Step: 289120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:41,378-Speed 8860.41 samples/sec   Loss 3.5210   LearningRate 0.0018   Epoch: 17   Global Step: 289130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:42,515-Speed 9009.61 samples/sec   Loss 3.5823   LearningRate 0.0018   Epoch: 17   Global Step: 289140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:43,690-Speed 8719.07 samples/sec   Loss 3.6237   LearningRate 0.0018   Epoch: 17   Global Step: 289150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:44,820-Speed 9068.91 samples/sec   Loss 3.4451   LearningRate 0.0018   Epoch: 17   Global Step: 289160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:45,958-Speed 9004.41 samples/sec   Loss 3.5709   LearningRate 0.0018   Epoch: 17   Global Step: 289170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:47,067-Speed 9237.49 samples/sec   Loss 3.5191   LearningRate 0.0018   Epoch: 17   Global Step: 289180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:14:48,158-Speed 9390.28 samples/sec   Loss 3.5466   LearningRate 0.0018   Epoch: 17   Global Step: 289190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:14:49,286-Speed 9078.03 samples/sec   Loss 3.5280   LearningRate 0.0018   Epoch: 17   Global Step: 289200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:14:50,386-Speed 9314.61 samples/sec   Loss 3.5523   LearningRate 0.0018   Epoch: 17   Global Step: 289210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:51,480-Speed 9368.87 samples/sec   Loss 3.6042   LearningRate 0.0018   Epoch: 17   Global Step: 289220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:52,619-Speed 9001.10 samples/sec   Loss 3.5409   LearningRate 0.0018   Epoch: 17   Global Step: 289230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:53,763-Speed 8950.80 samples/sec   Loss 3.5603   LearningRate 0.0018   Epoch: 17   Global Step: 289240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:54,894-Speed 9061.17 samples/sec   Loss 3.5225   LearningRate 0.0018   Epoch: 17   Global Step: 289250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:56,040-Speed 8943.85 samples/sec   Loss 3.5235   LearningRate 0.0018   Epoch: 17   Global Step: 289260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:57,183-Speed 8968.48 samples/sec   Loss 3.5682   LearningRate 0.0018   Epoch: 17   Global Step: 289270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:58,320-Speed 9008.72 samples/sec   Loss 3.5932   LearningRate 0.0018   Epoch: 17   Global Step: 289280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:14:59,458-Speed 9003.55 samples/sec   Loss 3.5885   LearningRate 0.0018   Epoch: 17   Global Step: 289290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:00,590-Speed 9048.97 samples/sec   Loss 3.4905   LearningRate 0.0018   Epoch: 17   Global Step: 289300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:01,692-Speed 9294.55 samples/sec   Loss 3.5757   LearningRate 0.0018   Epoch: 17   Global Step: 289310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:02,804-Speed 9220.75 samples/sec   Loss 3.5401   LearningRate 0.0018   Epoch: 17   Global Step: 289320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:03,903-Speed 9329.67 samples/sec   Loss 3.5812   LearningRate 0.0018   Epoch: 17   Global Step: 289330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:05,013-Speed 9227.84 samples/sec   Loss 3.5280   LearningRate 0.0018   Epoch: 17   Global Step: 289340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:06,123-Speed 9229.73 samples/sec   Loss 3.6086   LearningRate 0.0018   Epoch: 17   Global Step: 289350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:07,205-Speed 9471.44 samples/sec   Loss 3.5608   LearningRate 0.0018   Epoch: 17   Global Step: 289360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:08,385-Speed 8682.03 samples/sec   Loss 3.6021   LearningRate 0.0018   Epoch: 17   Global Step: 289370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:09,535-Speed 8908.12 samples/sec   Loss 3.5596   LearningRate 0.0018   Epoch: 17   Global Step: 289380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:10,653-Speed 9160.20 samples/sec   Loss 3.6021   LearningRate 0.0018   Epoch: 17   Global Step: 289390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:11,823-Speed 8757.20 samples/sec   Loss 3.6335   LearningRate 0.0018   Epoch: 17   Global Step: 289400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:12,974-Speed 8902.64 samples/sec   Loss 3.6105   LearningRate 0.0018   Epoch: 17   Global Step: 289410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:14,082-Speed 9247.07 samples/sec   Loss 3.5810   LearningRate 0.0018   Epoch: 17   Global Step: 289420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:15,220-Speed 9004.52 samples/sec   Loss 3.5579   LearningRate 0.0018   Epoch: 17   Global Step: 289430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:16,329-Speed 9244.63 samples/sec   Loss 3.5882   LearningRate 0.0018   Epoch: 17   Global Step: 289440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:17,460-Speed 9059.41 samples/sec   Loss 3.6027   LearningRate 0.0018   Epoch: 17   Global Step: 289450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:18,616-Speed 8865.55 samples/sec   Loss 3.4980   LearningRate 0.0018   Epoch: 17   Global Step: 289460   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:19,720-Speed 9273.54 samples/sec   Loss 3.5324   LearningRate 0.0018   Epoch: 17   Global Step: 289470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:20,787-Speed 9609.85 samples/sec   Loss 3.5848   LearningRate 0.0018   Epoch: 17   Global Step: 289480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:21,964-Speed 8705.35 samples/sec   Loss 3.5681   LearningRate 0.0018   Epoch: 17   Global Step: 289490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:23,083-Speed 9150.33 samples/sec   Loss 3.5754   LearningRate 0.0018   Epoch: 17   Global Step: 289500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:24,250-Speed 8779.38 samples/sec   Loss 3.6105   LearningRate 0.0018   Epoch: 17   Global Step: 289510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:25,411-Speed 8829.43 samples/sec   Loss 3.5464   LearningRate 0.0018   Epoch: 17   Global Step: 289520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:26,537-Speed 9105.92 samples/sec   Loss 3.5868   LearningRate 0.0018   Epoch: 17   Global Step: 289530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:27,672-Speed 9022.10 samples/sec   Loss 3.5534   LearningRate 0.0018   Epoch: 17   Global Step: 289540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:28,789-Speed 9178.10 samples/sec   Loss 3.6045   LearningRate 0.0018   Epoch: 17   Global Step: 289550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:29,922-Speed 9043.40 samples/sec   Loss 3.5681   LearningRate 0.0018   Epoch: 17   Global Step: 289560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:31,056-Speed 9031.54 samples/sec   Loss 3.5992   LearningRate 0.0018   Epoch: 17   Global Step: 289570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:32,206-Speed 8907.35 samples/sec   Loss 3.4784   LearningRate 0.0018   Epoch: 17   Global Step: 289580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:33,367-Speed 8827.42 samples/sec   Loss 3.5475   LearningRate 0.0018   Epoch: 17   Global Step: 289590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:34,489-Speed 9136.30 samples/sec   Loss 3.5009   LearningRate 0.0018   Epoch: 17   Global Step: 289600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:35,615-Speed 9094.28 samples/sec   Loss 3.5804   LearningRate 0.0018   Epoch: 17   Global Step: 289610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:36,753-Speed 9004.66 samples/sec   Loss 3.5989   LearningRate 0.0018   Epoch: 17   Global Step: 289620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:37,852-Speed 9324.89 samples/sec   Loss 3.5472   LearningRate 0.0018   Epoch: 17   Global Step: 289630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:39,104-Speed 8183.20 samples/sec   Loss 3.5728   LearningRate 0.0018   Epoch: 17   Global Step: 289640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:40,207-Speed 9288.97 samples/sec   Loss 3.5647   LearningRate 0.0018   Epoch: 17   Global Step: 289650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:41,396-Speed 8617.59 samples/sec   Loss 3.6183   LearningRate 0.0017   Epoch: 17   Global Step: 289660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:42,526-Speed 9064.42 samples/sec   Loss 3.5617   LearningRate 0.0017   Epoch: 17   Global Step: 289670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:43,672-Speed 8937.85 samples/sec   Loss 3.5922   LearningRate 0.0017   Epoch: 17   Global Step: 289680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:44,776-Speed 9285.15 samples/sec   Loss 3.5936   LearningRate 0.0017   Epoch: 17   Global Step: 289690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:45,881-Speed 9274.14 samples/sec   Loss 3.5274   LearningRate 0.0017   Epoch: 17   Global Step: 289700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:47,021-Speed 8986.04 samples/sec   Loss 3.5829   LearningRate 0.0017   Epoch: 17   Global Step: 289710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:48,231-Speed 8466.45 samples/sec   Loss 3.5598   LearningRate 0.0017   Epoch: 17   Global Step: 289720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:49,361-Speed 9067.72 samples/sec   Loss 3.6018   LearningRate 0.0017   Epoch: 17   Global Step: 289730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:50,472-Speed 9229.34 samples/sec   Loss 3.5474   LearningRate 0.0017   Epoch: 17   Global Step: 289740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:51,582-Speed 9229.23 samples/sec   Loss 3.6186   LearningRate 0.0017   Epoch: 17   Global Step: 289750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:52,674-Speed 9381.00 samples/sec   Loss 3.6162   LearningRate 0.0017   Epoch: 17   Global Step: 289760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:53,815-Speed 8976.41 samples/sec   Loss 3.6136   LearningRate 0.0017   Epoch: 17   Global Step: 289770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:54,892-Speed 9517.52 samples/sec   Loss 3.6018   LearningRate 0.0017   Epoch: 17   Global Step: 289780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:56,036-Speed 8952.16 samples/sec   Loss 3.5720   LearningRate 0.0017   Epoch: 17   Global Step: 289790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:15:57,167-Speed 9064.27 samples/sec   Loss 3.5211   LearningRate 0.0017   Epoch: 17   Global Step: 289800   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:58,283-Speed 9187.41 samples/sec   Loss 3.6158   LearningRate 0.0017   Epoch: 17   Global Step: 289810   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:15:59,422-Speed 8992.19 samples/sec   Loss 3.5976   LearningRate 0.0017   Epoch: 17   Global Step: 289820   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:16:00,554-Speed 9052.05 samples/sec   Loss 3.5716   LearningRate 0.0017   Epoch: 17   Global Step: 289830   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:16:01,682-Speed 9079.43 samples/sec   Loss 3.4911   LearningRate 0.0017   Epoch: 17   Global Step: 289840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:02,803-Speed 9141.32 samples/sec   Loss 3.6016   LearningRate 0.0017   Epoch: 17   Global Step: 289850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:03,938-Speed 9032.45 samples/sec   Loss 3.4997   LearningRate 0.0017   Epoch: 17   Global Step: 289860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:05,094-Speed 8859.56 samples/sec   Loss 3.5787   LearningRate 0.0017   Epoch: 17   Global Step: 289870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:06,194-Speed 9319.51 samples/sec   Loss 3.5386   LearningRate 0.0017   Epoch: 17   Global Step: 289880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:07,338-Speed 8950.02 samples/sec   Loss 3.5469   LearningRate 0.0017   Epoch: 17   Global Step: 289890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:08,437-Speed 9330.97 samples/sec   Loss 3.5506   LearningRate 0.0017   Epoch: 17   Global Step: 289900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:09,568-Speed 9055.76 samples/sec   Loss 3.5844   LearningRate 0.0017   Epoch: 17   Global Step: 289910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:10,691-Speed 9122.72 samples/sec   Loss 3.5112   LearningRate 0.0017   Epoch: 17   Global Step: 289920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:11,861-Speed 8757.37 samples/sec   Loss 3.6260   LearningRate 0.0017   Epoch: 17   Global Step: 289930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:12,949-Speed 9416.68 samples/sec   Loss 3.5668   LearningRate 0.0017   Epoch: 17   Global Step: 289940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:14,110-Speed 8824.14 samples/sec   Loss 3.5764   LearningRate 0.0017   Epoch: 17   Global Step: 289950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:15,204-Speed 9368.82 samples/sec   Loss 3.5369   LearningRate 0.0017   Epoch: 17   Global Step: 289960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:16,328-Speed 9117.01 samples/sec   Loss 3.6219   LearningRate 0.0017   Epoch: 17   Global Step: 289970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:17,469-Speed 8980.43 samples/sec   Loss 3.5500   LearningRate 0.0017   Epoch: 17   Global Step: 289980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:18,601-Speed 9054.13 samples/sec   Loss 3.5787   LearningRate 0.0017   Epoch: 17   Global Step: 289990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:19,778-Speed 8704.99 samples/sec   Loss 3.6273   LearningRate 0.0017   Epoch: 17   Global Step: 290000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:16:41,938-[lfw][290000]XNorm: 6.694873
Training: 2022-04-11 23:16:41,939-[lfw][290000]Accuracy-Flip: 0.99667+-0.00289
Training: 2022-04-11 23:16:41,939-[lfw][290000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:17:07,587-[cfp_fp][290000]XNorm: 5.841547
Training: 2022-04-11 23:17:07,588-[cfp_fp][290000]Accuracy-Flip: 0.97257+-0.00885
Training: 2022-04-11 23:17:07,588-[cfp_fp][290000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:17:29,700-[agedb_30][290000]XNorm: 6.521216
Training: 2022-04-11 23:17:29,700-[agedb_30][290000]Accuracy-Flip: 0.97050+-0.00931
Training: 2022-04-11 23:17:29,701-[agedb_30][290000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:17:30,802-Speed 144.18 samples/sec   Loss 3.5535   LearningRate 0.0017   Epoch: 17   Global Step: 290010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:31,917-Speed 9183.48 samples/sec   Loss 3.6246   LearningRate 0.0017   Epoch: 17   Global Step: 290020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:33,047-Speed 9069.04 samples/sec   Loss 3.5733   LearningRate 0.0017   Epoch: 17   Global Step: 290030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:34,171-Speed 9115.84 samples/sec   Loss 3.5779   LearningRate 0.0017   Epoch: 17   Global Step: 290040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:17:35,279-Speed 9246.62 samples/sec   Loss 3.5380   LearningRate 0.0017   Epoch: 17   Global Step: 290050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:17:36,347-Speed 9591.21 samples/sec   Loss 3.5807   LearningRate 0.0017   Epoch: 17   Global Step: 290060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:37,457-Speed 9237.94 samples/sec   Loss 3.5892   LearningRate 0.0017   Epoch: 17   Global Step: 290070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:38,610-Speed 8881.58 samples/sec   Loss 3.5305   LearningRate 0.0017   Epoch: 17   Global Step: 290080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:39,745-Speed 9027.43 samples/sec   Loss 3.5621   LearningRate 0.0017   Epoch: 17   Global Step: 290090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:40,881-Speed 9022.59 samples/sec   Loss 3.5577   LearningRate 0.0017   Epoch: 17   Global Step: 290100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:42,010-Speed 9078.53 samples/sec   Loss 3.5466   LearningRate 0.0017   Epoch: 17   Global Step: 290110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:43,165-Speed 8868.36 samples/sec   Loss 3.6631   LearningRate 0.0017   Epoch: 17   Global Step: 290120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:44,274-Speed 9240.23 samples/sec   Loss 3.5547   LearningRate 0.0017   Epoch: 17   Global Step: 290130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:45,413-Speed 8998.66 samples/sec   Loss 3.6177   LearningRate 0.0017   Epoch: 17   Global Step: 290140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:46,528-Speed 9191.10 samples/sec   Loss 3.5878   LearningRate 0.0017   Epoch: 17   Global Step: 290150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:47,664-Speed 9016.05 samples/sec   Loss 3.5554   LearningRate 0.0017   Epoch: 17   Global Step: 290160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:17:48,771-Speed 9255.13 samples/sec   Loss 3.5942   LearningRate 0.0017   Epoch: 17   Global Step: 290170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:17:49,889-Speed 9170.34 samples/sec   Loss 3.6135   LearningRate 0.0017   Epoch: 17   Global Step: 290180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:17:50,984-Speed 9357.00 samples/sec   Loss 3.5697   LearningRate 0.0017   Epoch: 17   Global Step: 290190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:52,117-Speed 9038.30 samples/sec   Loss 3.5513   LearningRate 0.0017   Epoch: 17   Global Step: 290200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:53,216-Speed 9322.24 samples/sec   Loss 3.5494   LearningRate 0.0017   Epoch: 17   Global Step: 290210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:54,333-Speed 9172.49 samples/sec   Loss 3.6253   LearningRate 0.0017   Epoch: 17   Global Step: 290220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:55,452-Speed 9160.82 samples/sec   Loss 3.6060   LearningRate 0.0017   Epoch: 17   Global Step: 290230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:56,553-Speed 9303.92 samples/sec   Loss 3.6745   LearningRate 0.0017   Epoch: 17   Global Step: 290240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:57,685-Speed 9051.48 samples/sec   Loss 3.6491   LearningRate 0.0017   Epoch: 17   Global Step: 290250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:58,789-Speed 9278.92 samples/sec   Loss 3.5752   LearningRate 0.0017   Epoch: 17   Global Step: 290260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:17:59,923-Speed 9033.91 samples/sec   Loss 3.5374   LearningRate 0.0017   Epoch: 17   Global Step: 290270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:01,069-Speed 8939.25 samples/sec   Loss 3.5821   LearningRate 0.0017   Epoch: 17   Global Step: 290280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:02,191-Speed 9133.19 samples/sec   Loss 3.5482   LearningRate 0.0017   Epoch: 17   Global Step: 290290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:03,298-Speed 9252.15 samples/sec   Loss 3.5978   LearningRate 0.0017   Epoch: 17   Global Step: 290300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:04,429-Speed 9064.28 samples/sec   Loss 3.5563   LearningRate 0.0017   Epoch: 17   Global Step: 290310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:05,587-Speed 8851.42 samples/sec   Loss 3.6069   LearningRate 0.0017   Epoch: 17   Global Step: 290320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:06,732-Speed 8943.88 samples/sec   Loss 3.5397   LearningRate 0.0017   Epoch: 17   Global Step: 290330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:07,812-Speed 9490.82 samples/sec   Loss 3.5634   LearningRate 0.0017   Epoch: 17   Global Step: 290340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:08,909-Speed 9341.56 samples/sec   Loss 3.5532   LearningRate 0.0017   Epoch: 17   Global Step: 290350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:10,026-Speed 9171.04 samples/sec   Loss 3.5234   LearningRate 0.0017   Epoch: 17   Global Step: 290360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:11,172-Speed 8938.41 samples/sec   Loss 3.6600   LearningRate 0.0017   Epoch: 17   Global Step: 290370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:12,288-Speed 9183.65 samples/sec   Loss 3.6173   LearningRate 0.0017   Epoch: 17   Global Step: 290380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:13,393-Speed 9272.63 samples/sec   Loss 3.5145   LearningRate 0.0017   Epoch: 17   Global Step: 290390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:14,534-Speed 8980.91 samples/sec   Loss 3.5088   LearningRate 0.0017   Epoch: 17   Global Step: 290400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:15,688-Speed 8875.85 samples/sec   Loss 3.5627   LearningRate 0.0017   Epoch: 17   Global Step: 290410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:16,788-Speed 9316.78 samples/sec   Loss 3.6464   LearningRate 0.0017   Epoch: 17   Global Step: 290420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:17,959-Speed 8751.59 samples/sec   Loss 3.6755   LearningRate 0.0017   Epoch: 17   Global Step: 290430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:19,078-Speed 9154.04 samples/sec   Loss 3.5415   LearningRate 0.0017   Epoch: 17   Global Step: 290440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:20,273-Speed 8573.33 samples/sec   Loss 3.5610   LearningRate 0.0017   Epoch: 17   Global Step: 290450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:21,443-Speed 8760.11 samples/sec   Loss 3.4857   LearningRate 0.0017   Epoch: 17   Global Step: 290460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:22,599-Speed 8865.22 samples/sec   Loss 3.5427   LearningRate 0.0017   Epoch: 17   Global Step: 290470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:23,707-Speed 9246.90 samples/sec   Loss 3.6089   LearningRate 0.0017   Epoch: 17   Global Step: 290480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:24,806-Speed 9322.89 samples/sec   Loss 3.5816   LearningRate 0.0017   Epoch: 17   Global Step: 290490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:18:25,899-Speed 9371.57 samples/sec   Loss 3.6728   LearningRate 0.0017   Epoch: 17   Global Step: 290500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:27,061-Speed 8824.28 samples/sec   Loss 3.6197   LearningRate 0.0017   Epoch: 17   Global Step: 290510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:18:28,202-Speed 8976.23 samples/sec   Loss 3.6307   LearningRate 0.0017   Epoch: 17   Global Step: 290520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:18:29,302-Speed 9313.54 samples/sec   Loss 3.5202   LearningRate 0.0017   Epoch: 17   Global Step: 290530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:18:30,405-Speed 9292.37 samples/sec   Loss 3.6254   LearningRate 0.0017   Epoch: 17   Global Step: 290540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:18:31,545-Speed 8986.04 samples/sec   Loss 3.6348   LearningRate 0.0017   Epoch: 17   Global Step: 290550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:18:32,670-Speed 9105.87 samples/sec   Loss 3.5763   LearningRate 0.0017   Epoch: 17   Global Step: 290560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:18:33,770-Speed 9322.49 samples/sec   Loss 3.5426   LearningRate 0.0017   Epoch: 17   Global Step: 290570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:18:34,876-Speed 9259.54 samples/sec   Loss 3.5827   LearningRate 0.0017   Epoch: 17   Global Step: 290580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:18:35,978-Speed 9297.87 samples/sec   Loss 3.5428   LearningRate 0.0017   Epoch: 17   Global Step: 290590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:18:37,170-Speed 8599.26 samples/sec   Loss 3.6033   LearningRate 0.0017   Epoch: 17   Global Step: 290600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:18:38,310-Speed 8984.71 samples/sec   Loss 3.5560   LearningRate 0.0017   Epoch: 17   Global Step: 290610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:39,473-Speed 8811.98 samples/sec   Loss 3.6403   LearningRate 0.0017   Epoch: 17   Global Step: 290620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:40,624-Speed 8899.51 samples/sec   Loss 3.5307   LearningRate 0.0017   Epoch: 17   Global Step: 290630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:41,797-Speed 8733.17 samples/sec   Loss 3.5413   LearningRate 0.0017   Epoch: 17   Global Step: 290640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:42,963-Speed 8789.22 samples/sec   Loss 3.6521   LearningRate 0.0017   Epoch: 17   Global Step: 290650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:44,123-Speed 8839.48 samples/sec   Loss 3.4396   LearningRate 0.0017   Epoch: 17   Global Step: 290660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:45,253-Speed 9061.97 samples/sec   Loss 3.5046   LearningRate 0.0017   Epoch: 17   Global Step: 290670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:46,362-Speed 9247.63 samples/sec   Loss 3.5486   LearningRate 0.0017   Epoch: 17   Global Step: 290680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:47,484-Speed 9129.94 samples/sec   Loss 3.5632   LearningRate 0.0017   Epoch: 17   Global Step: 290690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:48,623-Speed 8989.09 samples/sec   Loss 3.5453   LearningRate 0.0017   Epoch: 17   Global Step: 290700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:49,753-Speed 9070.23 samples/sec   Loss 3.5409   LearningRate 0.0017   Epoch: 17   Global Step: 290710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:18:50,929-Speed 8710.77 samples/sec   Loss 3.5180   LearningRate 0.0017   Epoch: 17   Global Step: 290720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:18:52,084-Speed 8869.53 samples/sec   Loss 3.6496   LearningRate 0.0017   Epoch: 17   Global Step: 290730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:53,152-Speed 9594.05 samples/sec   Loss 3.6104   LearningRate 0.0017   Epoch: 17   Global Step: 290740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:54,307-Speed 8871.15 samples/sec   Loss 3.6173   LearningRate 0.0017   Epoch: 17   Global Step: 290750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:55,492-Speed 8647.83 samples/sec   Loss 3.5854   LearningRate 0.0017   Epoch: 17   Global Step: 290760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:56,617-Speed 9109.52 samples/sec   Loss 3.5458   LearningRate 0.0017   Epoch: 17   Global Step: 290770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:57,791-Speed 8729.67 samples/sec   Loss 3.5441   LearningRate 0.0017   Epoch: 17   Global Step: 290780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:18:58,920-Speed 9071.21 samples/sec   Loss 3.5902   LearningRate 0.0017   Epoch: 17   Global Step: 290790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:00,049-Speed 9079.02 samples/sec   Loss 3.5848   LearningRate 0.0017   Epoch: 17   Global Step: 290800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:01,160-Speed 9227.13 samples/sec   Loss 3.5624   LearningRate 0.0017   Epoch: 17   Global Step: 290810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:02,305-Speed 8947.22 samples/sec   Loss 3.5832   LearningRate 0.0017   Epoch: 17   Global Step: 290820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:03,415-Speed 9230.38 samples/sec   Loss 3.6171   LearningRate 0.0017   Epoch: 17   Global Step: 290830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:04,531-Speed 9189.00 samples/sec   Loss 3.6372   LearningRate 0.0017   Epoch: 17   Global Step: 290840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:05,606-Speed 9533.64 samples/sec   Loss 3.5964   LearningRate 0.0017   Epoch: 17   Global Step: 290850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:06,713-Speed 9248.87 samples/sec   Loss 3.5274   LearningRate 0.0017   Epoch: 17   Global Step: 290860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:07,845-Speed 9051.82 samples/sec   Loss 3.5169   LearningRate 0.0017   Epoch: 17   Global Step: 290870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:08,962-Speed 9177.24 samples/sec   Loss 3.5445   LearningRate 0.0017   Epoch: 17   Global Step: 290880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:10,065-Speed 9281.99 samples/sec   Loss 3.6187   LearningRate 0.0017   Epoch: 17   Global Step: 290890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:11,202-Speed 9016.40 samples/sec   Loss 3.5547   LearningRate 0.0017   Epoch: 17   Global Step: 290900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:12,326-Speed 9109.95 samples/sec   Loss 3.6111   LearningRate 0.0017   Epoch: 17   Global Step: 290910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:13,455-Speed 9079.09 samples/sec   Loss 3.5310   LearningRate 0.0017   Epoch: 17   Global Step: 290920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:14,567-Speed 9215.00 samples/sec   Loss 3.6917   LearningRate 0.0017   Epoch: 17   Global Step: 290930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:15,686-Speed 9154.71 samples/sec   Loss 3.5294   LearningRate 0.0017   Epoch: 17   Global Step: 290940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:16,831-Speed 8952.33 samples/sec   Loss 3.5913   LearningRate 0.0016   Epoch: 17   Global Step: 290950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:17,965-Speed 9033.67 samples/sec   Loss 3.5834   LearningRate 0.0016   Epoch: 17   Global Step: 290960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:19,115-Speed 8906.51 samples/sec   Loss 3.5443   LearningRate 0.0016   Epoch: 17   Global Step: 290970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:20,311-Speed 8565.90 samples/sec   Loss 3.5171   LearningRate 0.0016   Epoch: 17   Global Step: 290980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:21,477-Speed 8789.64 samples/sec   Loss 3.5372   LearningRate 0.0016   Epoch: 17   Global Step: 290990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:22,562-Speed 9439.74 samples/sec   Loss 3.5781   LearningRate 0.0016   Epoch: 17   Global Step: 291000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:23,734-Speed 8743.34 samples/sec   Loss 3.5642   LearningRate 0.0016   Epoch: 17   Global Step: 291010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:24,877-Speed 8962.59 samples/sec   Loss 3.5361   LearningRate 0.0016   Epoch: 17   Global Step: 291020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:26,016-Speed 8999.96 samples/sec   Loss 3.5905   LearningRate 0.0016   Epoch: 17   Global Step: 291030   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:19:27,132-Speed 9186.47 samples/sec   Loss 3.6136   LearningRate 0.0016   Epoch: 17   Global Step: 291040   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:19:28,243-Speed 9219.41 samples/sec   Loss 3.5154   LearningRate 0.0016   Epoch: 17   Global Step: 291050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:29,406-Speed 8810.91 samples/sec   Loss 3.6191   LearningRate 0.0016   Epoch: 17   Global Step: 291060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:30,542-Speed 9014.83 samples/sec   Loss 3.5701   LearningRate 0.0016   Epoch: 17   Global Step: 291070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:31,641-Speed 9320.19 samples/sec   Loss 3.6044   LearningRate 0.0016   Epoch: 17   Global Step: 291080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:32,746-Speed 9273.84 samples/sec   Loss 3.5330   LearningRate 0.0016   Epoch: 17   Global Step: 291090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:33,890-Speed 8963.40 samples/sec   Loss 3.5150   LearningRate 0.0016   Epoch: 17   Global Step: 291100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:35,031-Speed 8978.59 samples/sec   Loss 3.4376   LearningRate 0.0016   Epoch: 17   Global Step: 291110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:36,108-Speed 9512.28 samples/sec   Loss 3.5437   LearningRate 0.0016   Epoch: 17   Global Step: 291120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:37,219-Speed 9220.90 samples/sec   Loss 3.5562   LearningRate 0.0016   Epoch: 17   Global Step: 291130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:38,366-Speed 8935.76 samples/sec   Loss 3.5920   LearningRate 0.0016   Epoch: 17   Global Step: 291140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:39,554-Speed 8626.48 samples/sec   Loss 3.5527   LearningRate 0.0016   Epoch: 17   Global Step: 291150   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:19:40,645-Speed 9388.89 samples/sec   Loss 3.5748   LearningRate 0.0016   Epoch: 17   Global Step: 291160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:41,762-Speed 9172.56 samples/sec   Loss 3.5452   LearningRate 0.0016   Epoch: 17   Global Step: 291170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:42,895-Speed 9047.22 samples/sec   Loss 3.5946   LearningRate 0.0016   Epoch: 17   Global Step: 291180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:44,026-Speed 9052.66 samples/sec   Loss 3.5140   LearningRate 0.0016   Epoch: 17   Global Step: 291190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:45,215-Speed 8618.36 samples/sec   Loss 3.6288   LearningRate 0.0016   Epoch: 17   Global Step: 291200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:46,341-Speed 9101.37 samples/sec   Loss 3.5977   LearningRate 0.0016   Epoch: 17   Global Step: 291210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:47,447-Speed 9262.65 samples/sec   Loss 3.5961   LearningRate 0.0016   Epoch: 17   Global Step: 291220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:48,601-Speed 8881.92 samples/sec   Loss 3.6004   LearningRate 0.0016   Epoch: 17   Global Step: 291230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:49,718-Speed 9174.35 samples/sec   Loss 3.5890   LearningRate 0.0016   Epoch: 17   Global Step: 291240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:50,847-Speed 9070.98 samples/sec   Loss 3.5453   LearningRate 0.0016   Epoch: 17   Global Step: 291250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:51,971-Speed 9112.83 samples/sec   Loss 3.5922   LearningRate 0.0016   Epoch: 17   Global Step: 291260   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:19:53,097-Speed 9104.42 samples/sec   Loss 3.5514   LearningRate 0.0016   Epoch: 17   Global Step: 291270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:19:54,261-Speed 8800.19 samples/sec   Loss 3.5788   LearningRate 0.0016   Epoch: 17   Global Step: 291280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:55,356-Speed 9350.36 samples/sec   Loss 3.6730   LearningRate 0.0016   Epoch: 17   Global Step: 291290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:56,479-Speed 9128.39 samples/sec   Loss 3.5654   LearningRate 0.0016   Epoch: 17   Global Step: 291300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:57,603-Speed 9122.27 samples/sec   Loss 3.5346   LearningRate 0.0016   Epoch: 17   Global Step: 291310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:58,705-Speed 9297.83 samples/sec   Loss 3.5624   LearningRate 0.0016   Epoch: 17   Global Step: 291320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:19:59,858-Speed 8882.37 samples/sec   Loss 3.5574   LearningRate 0.0016   Epoch: 17   Global Step: 291330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:01,030-Speed 8745.33 samples/sec   Loss 3.5468   LearningRate 0.0016   Epoch: 17   Global Step: 291340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:02,151-Speed 9136.91 samples/sec   Loss 3.5545   LearningRate 0.0016   Epoch: 17   Global Step: 291350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:03,289-Speed 9004.93 samples/sec   Loss 3.6086   LearningRate 0.0016   Epoch: 17   Global Step: 291360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:04,416-Speed 9094.48 samples/sec   Loss 3.5288   LearningRate 0.0016   Epoch: 17   Global Step: 291370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:05,502-Speed 9433.88 samples/sec   Loss 3.6269   LearningRate 0.0016   Epoch: 17   Global Step: 291380   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:20:06,612-Speed 9229.19 samples/sec   Loss 3.5843   LearningRate 0.0016   Epoch: 17   Global Step: 291390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:07,705-Speed 9377.46 samples/sec   Loss 3.5385   LearningRate 0.0016   Epoch: 17   Global Step: 291400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:08,836-Speed 9054.49 samples/sec   Loss 3.5593   LearningRate 0.0016   Epoch: 17   Global Step: 291410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:09,964-Speed 9086.90 samples/sec   Loss 3.5631   LearningRate 0.0016   Epoch: 17   Global Step: 291420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:11,120-Speed 8858.47 samples/sec   Loss 3.5964   LearningRate 0.0016   Epoch: 17   Global Step: 291430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:12,262-Speed 8973.37 samples/sec   Loss 3.6106   LearningRate 0.0016   Epoch: 17   Global Step: 291440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:13,475-Speed 8447.20 samples/sec   Loss 3.5657   LearningRate 0.0016   Epoch: 17   Global Step: 291450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:14,606-Speed 9059.93 samples/sec   Loss 3.6189   LearningRate 0.0016   Epoch: 17   Global Step: 291460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:15,730-Speed 9113.17 samples/sec   Loss 3.5684   LearningRate 0.0016   Epoch: 17   Global Step: 291470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:16,887-Speed 8864.51 samples/sec   Loss 3.5653   LearningRate 0.0016   Epoch: 17   Global Step: 291480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:18,063-Speed 8710.36 samples/sec   Loss 3.6504   LearningRate 0.0016   Epoch: 17   Global Step: 291490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:20:19,155-Speed 9378.63 samples/sec   Loss 3.5588   LearningRate 0.0016   Epoch: 17   Global Step: 291500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:20,305-Speed 8910.04 samples/sec   Loss 3.5642   LearningRate 0.0016   Epoch: 17   Global Step: 291510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:21,441-Speed 9020.47 samples/sec   Loss 3.5799   LearningRate 0.0016   Epoch: 17   Global Step: 291520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:22,562-Speed 9144.45 samples/sec   Loss 3.5502   LearningRate 0.0016   Epoch: 17   Global Step: 291530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:23,693-Speed 9054.44 samples/sec   Loss 3.6658   LearningRate 0.0016   Epoch: 17   Global Step: 291540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:24,834-Speed 8982.06 samples/sec   Loss 3.6280   LearningRate 0.0016   Epoch: 17   Global Step: 291550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:25,934-Speed 9311.95 samples/sec   Loss 3.7194   LearningRate 0.0016   Epoch: 17   Global Step: 291560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:27,079-Speed 8951.50 samples/sec   Loss 3.4827   LearningRate 0.0016   Epoch: 17   Global Step: 291570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:28,258-Speed 8691.38 samples/sec   Loss 3.6632   LearningRate 0.0016   Epoch: 17   Global Step: 291580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:29,378-Speed 9151.33 samples/sec   Loss 3.5624   LearningRate 0.0016   Epoch: 17   Global Step: 291590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:30,481-Speed 9282.66 samples/sec   Loss 3.5212   LearningRate 0.0016   Epoch: 17   Global Step: 291600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:20:31,583-Speed 9296.15 samples/sec   Loss 3.6366   LearningRate 0.0016   Epoch: 17   Global Step: 291610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:32,729-Speed 8943.02 samples/sec   Loss 3.5390   LearningRate 0.0016   Epoch: 17   Global Step: 291620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:33,876-Speed 8941.09 samples/sec   Loss 3.6013   LearningRate 0.0016   Epoch: 17   Global Step: 291630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:35,025-Speed 8914.29 samples/sec   Loss 3.5400   LearningRate 0.0016   Epoch: 17   Global Step: 291640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:36,183-Speed 8849.36 samples/sec   Loss 3.6145   LearningRate 0.0016   Epoch: 17   Global Step: 291650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:37,320-Speed 9008.21 samples/sec   Loss 3.5810   LearningRate 0.0016   Epoch: 17   Global Step: 291660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:38,432-Speed 9218.58 samples/sec   Loss 3.5766   LearningRate 0.0016   Epoch: 17   Global Step: 291670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:39,568-Speed 9011.03 samples/sec   Loss 3.5956   LearningRate 0.0016   Epoch: 17   Global Step: 291680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:40,664-Speed 9356.04 samples/sec   Loss 3.5571   LearningRate 0.0016   Epoch: 17   Global Step: 291690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:41,746-Speed 9463.32 samples/sec   Loss 3.5471   LearningRate 0.0016   Epoch: 17   Global Step: 291700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:42,852-Speed 9265.86 samples/sec   Loss 3.7030   LearningRate 0.0016   Epoch: 17   Global Step: 291710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:20:43,988-Speed 9015.54 samples/sec   Loss 3.6558   LearningRate 0.0016   Epoch: 17   Global Step: 291720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:20:45,058-Speed 9575.32 samples/sec   Loss 3.5829   LearningRate 0.0016   Epoch: 17   Global Step: 291730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:46,183-Speed 9112.35 samples/sec   Loss 3.5553   LearningRate 0.0016   Epoch: 17   Global Step: 291740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:47,340-Speed 8860.98 samples/sec   Loss 3.6493   LearningRate 0.0016   Epoch: 17   Global Step: 291750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:48,469-Speed 9070.25 samples/sec   Loss 3.5246   LearningRate 0.0016   Epoch: 17   Global Step: 291760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:49,567-Speed 9334.25 samples/sec   Loss 3.5335   LearningRate 0.0016   Epoch: 17   Global Step: 291770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:50,658-Speed 9386.00 samples/sec   Loss 3.5312   LearningRate 0.0016   Epoch: 17   Global Step: 291780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:20:51,776-Speed 9165.48 samples/sec   Loss 3.5767   LearningRate 0.0016   Epoch: 17   Global Step: 291790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:20:52,888-Speed 9217.25 samples/sec   Loss 3.6238   LearningRate 0.0016   Epoch: 17   Global Step: 291800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:20:54,100-Speed 8449.89 samples/sec   Loss 3.5489   LearningRate 0.0016   Epoch: 17   Global Step: 291810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:20:55,221-Speed 9144.56 samples/sec   Loss 3.5426   LearningRate 0.0016   Epoch: 17   Global Step: 291820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:20:56,389-Speed 8768.05 samples/sec   Loss 3.5791   LearningRate 0.0016   Epoch: 17   Global Step: 291830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:20:57,511-Speed 9134.16 samples/sec   Loss 3.6477   LearningRate 0.0016   Epoch: 17   Global Step: 291840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:20:58,667-Speed 8866.65 samples/sec   Loss 3.5573   LearningRate 0.0016   Epoch: 17   Global Step: 291850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:20:59,800-Speed 9040.76 samples/sec   Loss 3.6644   LearningRate 0.0016   Epoch: 17   Global Step: 291860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:21:00,904-Speed 9280.85 samples/sec   Loss 3.5894   LearningRate 0.0016   Epoch: 17   Global Step: 291870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:21:02,003-Speed 9326.49 samples/sec   Loss 3.6174   LearningRate 0.0016   Epoch: 17   Global Step: 291880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:21:03,120-Speed 9169.51 samples/sec   Loss 3.6221   LearningRate 0.0016   Epoch: 17   Global Step: 291890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:21:04,217-Speed 9344.64 samples/sec   Loss 3.5744   LearningRate 0.0016   Epoch: 17   Global Step: 291900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:21:05,365-Speed 8923.32 samples/sec   Loss 3.5073   LearningRate 0.0016   Epoch: 17   Global Step: 291910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:21:06,510-Speed 8943.49 samples/sec   Loss 3.6098   LearningRate 0.0016   Epoch: 17   Global Step: 291920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:21:07,654-Speed 8960.82 samples/sec   Loss 3.6011   LearningRate 0.0016   Epoch: 17   Global Step: 291930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:21:08,762-Speed 9243.15 samples/sec   Loss 3.5216   LearningRate 0.0016   Epoch: 17   Global Step: 291940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:21:09,889-Speed 9093.26 samples/sec   Loss 3.5389   LearningRate 0.0016   Epoch: 17   Global Step: 291950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:21:11,025-Speed 9021.18 samples/sec   Loss 3.5402   LearningRate 0.0016   Epoch: 17   Global Step: 291960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:21:12,138-Speed 9212.77 samples/sec   Loss 3.5270   LearningRate 0.0016   Epoch: 17   Global Step: 291970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:21:13,266-Speed 9077.75 samples/sec   Loss 3.6037   LearningRate 0.0016   Epoch: 17   Global Step: 291980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:21:14,442-Speed 8716.87 samples/sec   Loss 3.6674   LearningRate 0.0016   Epoch: 17   Global Step: 291990   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:21:15,534-Speed 9377.59 samples/sec   Loss 3.4707   LearningRate 0.0016   Epoch: 17   Global Step: 292000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:21:37,471-[lfw][292000]XNorm: 6.713849
Training: 2022-04-11 23:21:37,472-[lfw][292000]Accuracy-Flip: 0.99717+-0.00299
Training: 2022-04-11 23:21:37,472-[lfw][292000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:22:02,822-[cfp_fp][292000]XNorm: 5.849598
Training: 2022-04-11 23:22:02,823-[cfp_fp][292000]Accuracy-Flip: 0.97314+-0.00845
Training: 2022-04-11 23:22:02,823-[cfp_fp][292000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:22:24,760-[agedb_30][292000]XNorm: 6.526053
Training: 2022-04-11 23:22:24,760-[agedb_30][292000]Accuracy-Flip: 0.97300+-0.00859
Training: 2022-04-11 23:22:24,760-[agedb_30][292000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:22:25,863-Speed 145.60 samples/sec   Loss 3.5518   LearningRate 0.0016   Epoch: 17   Global Step: 292010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:26,992-Speed 9075.84 samples/sec   Loss 3.5565   LearningRate 0.0016   Epoch: 17   Global Step: 292020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:28,120-Speed 9089.73 samples/sec   Loss 3.6113   LearningRate 0.0016   Epoch: 17   Global Step: 292030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:29,261-Speed 8977.95 samples/sec   Loss 3.6589   LearningRate 0.0016   Epoch: 17   Global Step: 292040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:30,334-Speed 9546.77 samples/sec   Loss 3.5877   LearningRate 0.0016   Epoch: 17   Global Step: 292050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:31,429-Speed 9357.37 samples/sec   Loss 3.6744   LearningRate 0.0016   Epoch: 17   Global Step: 292060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:32,574-Speed 8953.52 samples/sec   Loss 3.5929   LearningRate 0.0016   Epoch: 17   Global Step: 292070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:33,735-Speed 8825.31 samples/sec   Loss 3.6835   LearningRate 0.0016   Epoch: 17   Global Step: 292080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:22:34,855-Speed 9144.35 samples/sec   Loss 3.6174   LearningRate 0.0016   Epoch: 17   Global Step: 292090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:22:35,989-Speed 9035.43 samples/sec   Loss 3.5440   LearningRate 0.0016   Epoch: 17   Global Step: 292100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:22:37,115-Speed 9101.82 samples/sec   Loss 3.5765   LearningRate 0.0016   Epoch: 17   Global Step: 292110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:22:38,235-Speed 9147.68 samples/sec   Loss 3.5756   LearningRate 0.0016   Epoch: 17   Global Step: 292120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:22:39,373-Speed 9003.21 samples/sec   Loss 3.5941   LearningRate 0.0016   Epoch: 17   Global Step: 292130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:22:40,540-Speed 8781.73 samples/sec   Loss 3.6422   LearningRate 0.0016   Epoch: 17   Global Step: 292140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:22:41,665-Speed 9108.17 samples/sec   Loss 3.6388   LearningRate 0.0016   Epoch: 17   Global Step: 292150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:22:42,767-Speed 9296.94 samples/sec   Loss 3.6539   LearningRate 0.0016   Epoch: 17   Global Step: 292160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:22:43,896-Speed 9073.14 samples/sec   Loss 3.6163   LearningRate 0.0016   Epoch: 17   Global Step: 292170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:22:44,992-Speed 9347.09 samples/sec   Loss 3.5855   LearningRate 0.0016   Epoch: 17   Global Step: 292180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:46,085-Speed 9372.75 samples/sec   Loss 3.5646   LearningRate 0.0016   Epoch: 17   Global Step: 292190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:47,196-Speed 9229.33 samples/sec   Loss 3.5609   LearningRate 0.0016   Epoch: 17   Global Step: 292200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:48,319-Speed 9123.48 samples/sec   Loss 3.5652   LearningRate 0.0016   Epoch: 17   Global Step: 292210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:49,426-Speed 9250.85 samples/sec   Loss 3.5982   LearningRate 0.0016   Epoch: 17   Global Step: 292220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:50,566-Speed 8991.08 samples/sec   Loss 3.5614   LearningRate 0.0016   Epoch: 17   Global Step: 292230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:51,710-Speed 8957.15 samples/sec   Loss 3.6314   LearningRate 0.0016   Epoch: 17   Global Step: 292240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:52,844-Speed 9032.67 samples/sec   Loss 3.6098   LearningRate 0.0016   Epoch: 17   Global Step: 292250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:53,955-Speed 9217.06 samples/sec   Loss 3.6026   LearningRate 0.0015   Epoch: 17   Global Step: 292260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:55,100-Speed 8950.64 samples/sec   Loss 3.5710   LearningRate 0.0015   Epoch: 17   Global Step: 292270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:56,224-Speed 9122.01 samples/sec   Loss 3.6568   LearningRate 0.0015   Epoch: 17   Global Step: 292280   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:22:57,341-Speed 9171.65 samples/sec   Loss 3.6319   LearningRate 0.0015   Epoch: 17   Global Step: 292290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:58,434-Speed 9373.10 samples/sec   Loss 3.6415   LearningRate 0.0015   Epoch: 17   Global Step: 292300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:22:59,532-Speed 9335.93 samples/sec   Loss 3.5771   LearningRate 0.0015   Epoch: 17   Global Step: 292310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:00,623-Speed 9390.74 samples/sec   Loss 3.5981   LearningRate 0.0015   Epoch: 17   Global Step: 292320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:01,765-Speed 8970.41 samples/sec   Loss 3.6005   LearningRate 0.0015   Epoch: 17   Global Step: 292330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:02,877-Speed 9210.22 samples/sec   Loss 3.5590   LearningRate 0.0015   Epoch: 17   Global Step: 292340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:04,023-Speed 8944.55 samples/sec   Loss 3.5985   LearningRate 0.0015   Epoch: 17   Global Step: 292350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:05,128-Speed 9274.89 samples/sec   Loss 3.5958   LearningRate 0.0015   Epoch: 17   Global Step: 292360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:06,249-Speed 9135.18 samples/sec   Loss 3.6468   LearningRate 0.0015   Epoch: 17   Global Step: 292370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:07,380-Speed 9061.68 samples/sec   Loss 3.5687   LearningRate 0.0015   Epoch: 17   Global Step: 292380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:08,540-Speed 8829.87 samples/sec   Loss 3.6056   LearningRate 0.0015   Epoch: 17   Global Step: 292390   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:09,650-Speed 9231.84 samples/sec   Loss 3.5389   LearningRate 0.0015   Epoch: 17   Global Step: 292400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:10,758-Speed 9249.13 samples/sec   Loss 3.5520   LearningRate 0.0015   Epoch: 17   Global Step: 292410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:11,913-Speed 8873.17 samples/sec   Loss 3.6107   LearningRate 0.0015   Epoch: 17   Global Step: 292420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:13,048-Speed 9026.83 samples/sec   Loss 3.5594   LearningRate 0.0015   Epoch: 17   Global Step: 292430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:14,194-Speed 8942.89 samples/sec   Loss 3.5492   LearningRate 0.0015   Epoch: 17   Global Step: 292440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:15,354-Speed 8830.43 samples/sec   Loss 3.5225   LearningRate 0.0015   Epoch: 17   Global Step: 292450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:16,472-Speed 9164.77 samples/sec   Loss 3.6119   LearningRate 0.0015   Epoch: 17   Global Step: 292460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:17,609-Speed 9012.10 samples/sec   Loss 3.6281   LearningRate 0.0015   Epoch: 17   Global Step: 292470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:18,747-Speed 9004.08 samples/sec   Loss 3.5933   LearningRate 0.0015   Epoch: 17   Global Step: 292480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:19,885-Speed 9007.19 samples/sec   Loss 3.6087   LearningRate 0.0015   Epoch: 17   Global Step: 292490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:20,971-Speed 9433.37 samples/sec   Loss 3.6283   LearningRate 0.0015   Epoch: 17   Global Step: 292500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:22,089-Speed 9161.88 samples/sec   Loss 3.6398   LearningRate 0.0015   Epoch: 17   Global Step: 292510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:23,214-Speed 9102.24 samples/sec   Loss 3.4980   LearningRate 0.0015   Epoch: 17   Global Step: 292520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:24,320-Speed 9267.83 samples/sec   Loss 3.5453   LearningRate 0.0015   Epoch: 17   Global Step: 292530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:25,470-Speed 8911.31 samples/sec   Loss 3.5647   LearningRate 0.0015   Epoch: 17   Global Step: 292540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:26,598-Speed 9075.40 samples/sec   Loss 3.6174   LearningRate 0.0015   Epoch: 17   Global Step: 292550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:27,717-Speed 9160.44 samples/sec   Loss 3.5804   LearningRate 0.0015   Epoch: 17   Global Step: 292560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:28,843-Speed 9097.28 samples/sec   Loss 3.6580   LearningRate 0.0015   Epoch: 17   Global Step: 292570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:29,974-Speed 9059.67 samples/sec   Loss 3.6745   LearningRate 0.0015   Epoch: 17   Global Step: 292580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:31,121-Speed 8939.40 samples/sec   Loss 3.6046   LearningRate 0.0015   Epoch: 17   Global Step: 292590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:32,304-Speed 8662.00 samples/sec   Loss 3.6147   LearningRate 0.0015   Epoch: 17   Global Step: 292600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:33,428-Speed 9115.71 samples/sec   Loss 3.6179   LearningRate 0.0015   Epoch: 17   Global Step: 292610   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:34,552-Speed 9119.08 samples/sec   Loss 3.6203   LearningRate 0.0015   Epoch: 17   Global Step: 292620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:35,680-Speed 9081.22 samples/sec   Loss 3.7232   LearningRate 0.0015   Epoch: 17   Global Step: 292630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:36,819-Speed 8995.27 samples/sec   Loss 3.4914   LearningRate 0.0015   Epoch: 17   Global Step: 292640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:37,923-Speed 9278.49 samples/sec   Loss 3.6209   LearningRate 0.0015   Epoch: 17   Global Step: 292650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:39,069-Speed 8944.61 samples/sec   Loss 3.6362   LearningRate 0.0015   Epoch: 17   Global Step: 292660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:40,216-Speed 8929.48 samples/sec   Loss 3.5735   LearningRate 0.0015   Epoch: 17   Global Step: 292670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:41,309-Speed 9372.85 samples/sec   Loss 3.5742   LearningRate 0.0015   Epoch: 17   Global Step: 292680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:42,409-Speed 9320.29 samples/sec   Loss 3.5978   LearningRate 0.0015   Epoch: 17   Global Step: 292690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:43,508-Speed 9319.66 samples/sec   Loss 3.4981   LearningRate 0.0015   Epoch: 17   Global Step: 292700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:44,642-Speed 9033.93 samples/sec   Loss 3.6553   LearningRate 0.0015   Epoch: 17   Global Step: 292710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:45,772-Speed 9063.92 samples/sec   Loss 3.5302   LearningRate 0.0015   Epoch: 17   Global Step: 292720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:46,890-Speed 9170.16 samples/sec   Loss 3.5972   LearningRate 0.0015   Epoch: 17   Global Step: 292730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:48,022-Speed 9046.59 samples/sec   Loss 3.5812   LearningRate 0.0015   Epoch: 17   Global Step: 292740   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:49,161-Speed 8999.83 samples/sec   Loss 3.6006   LearningRate 0.0015   Epoch: 17   Global Step: 292750   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:50,338-Speed 8706.24 samples/sec   Loss 3.5781   LearningRate 0.0015   Epoch: 17   Global Step: 292760   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:51,465-Speed 9093.94 samples/sec   Loss 3.6201   LearningRate 0.0015   Epoch: 17   Global Step: 292770   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:52,546-Speed 9472.05 samples/sec   Loss 3.5717   LearningRate 0.0015   Epoch: 17   Global Step: 292780   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:53,678-Speed 9053.01 samples/sec   Loss 3.4791   LearningRate 0.0015   Epoch: 17   Global Step: 292790   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:23:54,846-Speed 8768.18 samples/sec   Loss 3.5628   LearningRate 0.0015   Epoch: 17   Global Step: 292800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:55,926-Speed 9489.82 samples/sec   Loss 3.5987   LearningRate 0.0015   Epoch: 17   Global Step: 292810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:57,053-Speed 9088.70 samples/sec   Loss 3.5986   LearningRate 0.0015   Epoch: 17   Global Step: 292820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:58,212-Speed 8849.04 samples/sec   Loss 3.5768   LearningRate 0.0015   Epoch: 17   Global Step: 292830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:23:59,365-Speed 8884.02 samples/sec   Loss 3.6610   LearningRate 0.0015   Epoch: 17   Global Step: 292840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:00,510-Speed 8948.79 samples/sec   Loss 3.5925   LearningRate 0.0015   Epoch: 17   Global Step: 292850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:01,626-Speed 9177.66 samples/sec   Loss 3.5231   LearningRate 0.0015   Epoch: 17   Global Step: 292860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:02,775-Speed 8916.65 samples/sec   Loss 3.5606   LearningRate 0.0015   Epoch: 17   Global Step: 292870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:03,888-Speed 9206.00 samples/sec   Loss 3.6073   LearningRate 0.0015   Epoch: 17   Global Step: 292880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:05,026-Speed 9006.52 samples/sec   Loss 3.6125   LearningRate 0.0015   Epoch: 17   Global Step: 292890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:06,158-Speed 9053.87 samples/sec   Loss 3.6022   LearningRate 0.0015   Epoch: 17   Global Step: 292900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:07,358-Speed 8538.52 samples/sec   Loss 3.5835   LearningRate 0.0015   Epoch: 17   Global Step: 292910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:08,483-Speed 9109.32 samples/sec   Loss 3.6719   LearningRate 0.0015   Epoch: 17   Global Step: 292920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:09,631-Speed 8928.25 samples/sec   Loss 3.5948   LearningRate 0.0015   Epoch: 17   Global Step: 292930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:10,768-Speed 9010.68 samples/sec   Loss 3.5401   LearningRate 0.0015   Epoch: 17   Global Step: 292940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:11,920-Speed 8894.68 samples/sec   Loss 3.6335   LearningRate 0.0015   Epoch: 17   Global Step: 292950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:13,071-Speed 8896.64 samples/sec   Loss 3.6842   LearningRate 0.0015   Epoch: 17   Global Step: 292960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:14,177-Speed 9268.00 samples/sec   Loss 3.7259   LearningRate 0.0015   Epoch: 17   Global Step: 292970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:15,299-Speed 9129.18 samples/sec   Loss 3.5491   LearningRate 0.0015   Epoch: 17   Global Step: 292980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:16,376-Speed 9515.19 samples/sec   Loss 3.5628   LearningRate 0.0015   Epoch: 17   Global Step: 292990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:17,452-Speed 9526.00 samples/sec   Loss 3.5818   LearningRate 0.0015   Epoch: 17   Global Step: 293000   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:24:18,532-Speed 9477.78 samples/sec   Loss 3.5717   LearningRate 0.0015   Epoch: 17   Global Step: 293010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:19,617-Speed 9447.37 samples/sec   Loss 3.5374   LearningRate 0.0015   Epoch: 17   Global Step: 293020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:20,779-Speed 8816.54 samples/sec   Loss 3.6331   LearningRate 0.0015   Epoch: 17   Global Step: 293030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:21,891-Speed 9214.25 samples/sec   Loss 3.6111   LearningRate 0.0015   Epoch: 17   Global Step: 293040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:22,982-Speed 9392.41 samples/sec   Loss 3.6154   LearningRate 0.0015   Epoch: 17   Global Step: 293050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:24,096-Speed 9192.97 samples/sec   Loss 3.6368   LearningRate 0.0015   Epoch: 17   Global Step: 293060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:25,239-Speed 8965.11 samples/sec   Loss 3.6646   LearningRate 0.0015   Epoch: 17   Global Step: 293070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:26,358-Speed 9156.19 samples/sec   Loss 3.5270   LearningRate 0.0015   Epoch: 17   Global Step: 293080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:27,452-Speed 9370.90 samples/sec   Loss 3.5048   LearningRate 0.0015   Epoch: 17   Global Step: 293090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:28,606-Speed 8876.19 samples/sec   Loss 3.6374   LearningRate 0.0015   Epoch: 17   Global Step: 293100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:29,717-Speed 9225.33 samples/sec   Loss 3.6489   LearningRate 0.0015   Epoch: 17   Global Step: 293110   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:24:30,853-Speed 9018.80 samples/sec   Loss 3.5571   LearningRate 0.0015   Epoch: 17   Global Step: 293120   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:24:31,950-Speed 9343.36 samples/sec   Loss 3.5917   LearningRate 0.0015   Epoch: 17   Global Step: 293130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:24:33,086-Speed 9014.96 samples/sec   Loss 3.5398   LearningRate 0.0015   Epoch: 17   Global Step: 293140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:24:34,184-Speed 9334.36 samples/sec   Loss 3.6733   LearningRate 0.0015   Epoch: 17   Global Step: 293150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:24:35,309-Speed 9104.78 samples/sec   Loss 3.5507   LearningRate 0.0015   Epoch: 17   Global Step: 293160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:24:36,410-Speed 9313.22 samples/sec   Loss 3.6248   LearningRate 0.0015   Epoch: 17   Global Step: 293170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:24:37,588-Speed 8690.68 samples/sec   Loss 3.6390   LearningRate 0.0015   Epoch: 17   Global Step: 293180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:24:38,739-Speed 8902.01 samples/sec   Loss 3.5644   LearningRate 0.0015   Epoch: 17   Global Step: 293190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:24:39,834-Speed 9359.53 samples/sec   Loss 3.5886   LearningRate 0.0015   Epoch: 17   Global Step: 293200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:24:40,932-Speed 9328.51 samples/sec   Loss 3.6822   LearningRate 0.0015   Epoch: 17   Global Step: 293210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:24:42,013-Speed 9480.34 samples/sec   Loss 3.5772   LearningRate 0.0015   Epoch: 17   Global Step: 293220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:24:43,176-Speed 8807.33 samples/sec   Loss 3.5020   LearningRate 0.0015   Epoch: 17   Global Step: 293230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:44,335-Speed 8843.23 samples/sec   Loss 3.5604   LearningRate 0.0015   Epoch: 17   Global Step: 293240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:45,418-Speed 9456.78 samples/sec   Loss 3.5601   LearningRate 0.0015   Epoch: 17   Global Step: 293250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:46,557-Speed 9000.43 samples/sec   Loss 3.6361   LearningRate 0.0015   Epoch: 17   Global Step: 293260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:47,665-Speed 9248.22 samples/sec   Loss 3.6230   LearningRate 0.0015   Epoch: 17   Global Step: 293270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:48,805-Speed 8989.77 samples/sec   Loss 3.5789   LearningRate 0.0015   Epoch: 17   Global Step: 293280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:49,973-Speed 8769.27 samples/sec   Loss 3.5829   LearningRate 0.0015   Epoch: 17   Global Step: 293290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:51,096-Speed 9126.42 samples/sec   Loss 3.5653   LearningRate 0.0015   Epoch: 17   Global Step: 293300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:52,246-Speed 8912.24 samples/sec   Loss 3.5614   LearningRate 0.0015   Epoch: 17   Global Step: 293310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:53,358-Speed 9211.02 samples/sec   Loss 3.5948   LearningRate 0.0015   Epoch: 17   Global Step: 293320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:54,514-Speed 8864.42 samples/sec   Loss 3.5544   LearningRate 0.0015   Epoch: 17   Global Step: 293330   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:24:55,670-Speed 8863.40 samples/sec   Loss 3.6229   LearningRate 0.0015   Epoch: 17   Global Step: 293340   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:24:56,817-Speed 8927.28 samples/sec   Loss 3.5770   LearningRate 0.0015   Epoch: 17   Global Step: 293350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:57,969-Speed 8900.28 samples/sec   Loss 3.6541   LearningRate 0.0015   Epoch: 17   Global Step: 293360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:24:59,149-Speed 8677.78 samples/sec   Loss 3.5694   LearningRate 0.0015   Epoch: 17   Global Step: 293370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:00,241-Speed 9387.91 samples/sec   Loss 3.6173   LearningRate 0.0015   Epoch: 17   Global Step: 293380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:01,349-Speed 9245.86 samples/sec   Loss 3.5136   LearningRate 0.0015   Epoch: 17   Global Step: 293390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:02,468-Speed 9154.39 samples/sec   Loss 3.6321   LearningRate 0.0015   Epoch: 17   Global Step: 293400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:03,608-Speed 8987.15 samples/sec   Loss 3.6284   LearningRate 0.0015   Epoch: 17   Global Step: 293410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:04,732-Speed 9118.56 samples/sec   Loss 3.6161   LearningRate 0.0015   Epoch: 17   Global Step: 293420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:05,928-Speed 8570.76 samples/sec   Loss 3.6236   LearningRate 0.0015   Epoch: 17   Global Step: 293430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:07,074-Speed 8942.29 samples/sec   Loss 3.6016   LearningRate 0.0015   Epoch: 17   Global Step: 293440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:08,189-Speed 9185.40 samples/sec   Loss 3.5572   LearningRate 0.0015   Epoch: 17   Global Step: 293450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:25:09,300-Speed 9220.05 samples/sec   Loss 3.6272   LearningRate 0.0015   Epoch: 17   Global Step: 293460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:10,408-Speed 9244.53 samples/sec   Loss 3.5806   LearningRate 0.0015   Epoch: 17   Global Step: 293470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:11,535-Speed 9089.95 samples/sec   Loss 3.6428   LearningRate 0.0015   Epoch: 17   Global Step: 293480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:12,673-Speed 9006.78 samples/sec   Loss 3.5599   LearningRate 0.0015   Epoch: 17   Global Step: 293490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:13,815-Speed 8971.71 samples/sec   Loss 3.5548   LearningRate 0.0015   Epoch: 17   Global Step: 293500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:14,926-Speed 9223.06 samples/sec   Loss 3.6591   LearningRate 0.0015   Epoch: 17   Global Step: 293510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:16,090-Speed 8799.83 samples/sec   Loss 3.6485   LearningRate 0.0015   Epoch: 17   Global Step: 293520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:17,188-Speed 9338.58 samples/sec   Loss 3.5937   LearningRate 0.0015   Epoch: 17   Global Step: 293530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:18,286-Speed 9330.10 samples/sec   Loss 3.5876   LearningRate 0.0015   Epoch: 17   Global Step: 293540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:19,430-Speed 8955.70 samples/sec   Loss 3.6794   LearningRate 0.0015   Epoch: 17   Global Step: 293550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:20,601-Speed 8751.43 samples/sec   Loss 3.6140   LearningRate 0.0015   Epoch: 17   Global Step: 293560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:25:21,712-Speed 9213.49 samples/sec   Loss 3.5449   LearningRate 0.0015   Epoch: 17   Global Step: 293570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:25:22,825-Speed 9214.23 samples/sec   Loss 3.5668   LearningRate 0.0015   Epoch: 17   Global Step: 293580   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:25:23,921-Speed 9351.92 samples/sec   Loss 3.6162   LearningRate 0.0015   Epoch: 17   Global Step: 293590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:25,032-Speed 9222.47 samples/sec   Loss 3.5455   LearningRate 0.0015   Epoch: 17   Global Step: 293600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:26,160-Speed 9076.36 samples/sec   Loss 3.5743   LearningRate 0.0015   Epoch: 17   Global Step: 293610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:27,326-Speed 8787.47 samples/sec   Loss 3.6672   LearningRate 0.0015   Epoch: 17   Global Step: 293620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:28,445-Speed 9164.55 samples/sec   Loss 3.6550   LearningRate 0.0014   Epoch: 17   Global Step: 293630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:29,579-Speed 9033.25 samples/sec   Loss 3.6379   LearningRate 0.0014   Epoch: 17   Global Step: 293640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:30,654-Speed 9528.66 samples/sec   Loss 3.6204   LearningRate 0.0014   Epoch: 17   Global Step: 293650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:31,797-Speed 8962.63 samples/sec   Loss 3.5371   LearningRate 0.0014   Epoch: 17   Global Step: 293660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:32,986-Speed 8615.98 samples/sec   Loss 3.5061   LearningRate 0.0014   Epoch: 17   Global Step: 293670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:34,141-Speed 8869.65 samples/sec   Loss 3.5734   LearningRate 0.0014   Epoch: 17   Global Step: 293680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:35,303-Speed 8821.84 samples/sec   Loss 3.5929   LearningRate 0.0014   Epoch: 17   Global Step: 293690   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:25:36,440-Speed 9013.67 samples/sec   Loss 3.5504   LearningRate 0.0014   Epoch: 17   Global Step: 293700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:37,596-Speed 8861.81 samples/sec   Loss 3.6601   LearningRate 0.0014   Epoch: 17   Global Step: 293710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:38,751-Speed 8869.31 samples/sec   Loss 3.6237   LearningRate 0.0014   Epoch: 17   Global Step: 293720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:39,885-Speed 9034.06 samples/sec   Loss 3.5605   LearningRate 0.0014   Epoch: 17   Global Step: 293730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:41,047-Speed 8822.29 samples/sec   Loss 3.5597   LearningRate 0.0014   Epoch: 17   Global Step: 293740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:42,125-Speed 9507.89 samples/sec   Loss 3.5787   LearningRate 0.0014   Epoch: 17   Global Step: 293750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:43,210-Speed 9443.58 samples/sec   Loss 3.5943   LearningRate 0.0014   Epoch: 17   Global Step: 293760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:44,352-Speed 8971.48 samples/sec   Loss 3.5715   LearningRate 0.0014   Epoch: 17   Global Step: 293770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:45,449-Speed 9335.08 samples/sec   Loss 3.6062   LearningRate 0.0014   Epoch: 17   Global Step: 293780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:46,548-Speed 9320.91 samples/sec   Loss 3.6753   LearningRate 0.0014   Epoch: 17   Global Step: 293790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:47,655-Speed 9260.06 samples/sec   Loss 3.5315   LearningRate 0.0014   Epoch: 17   Global Step: 293800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:48,745-Speed 9397.40 samples/sec   Loss 3.6209   LearningRate 0.0014   Epoch: 17   Global Step: 293810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:49,884-Speed 8998.01 samples/sec   Loss 3.5581   LearningRate 0.0014   Epoch: 17   Global Step: 293820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:51,002-Speed 9166.64 samples/sec   Loss 3.6496   LearningRate 0.0014   Epoch: 17   Global Step: 293830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:52,112-Speed 9231.75 samples/sec   Loss 3.4629   LearningRate 0.0014   Epoch: 17   Global Step: 293840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:53,296-Speed 8648.80 samples/sec   Loss 3.5201   LearningRate 0.0014   Epoch: 17   Global Step: 293850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:54,431-Speed 9027.97 samples/sec   Loss 3.6269   LearningRate 0.0014   Epoch: 17   Global Step: 293860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:55,585-Speed 8880.22 samples/sec   Loss 3.5986   LearningRate 0.0014   Epoch: 17   Global Step: 293870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:56,703-Speed 9158.35 samples/sec   Loss 3.6023   LearningRate 0.0014   Epoch: 17   Global Step: 293880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:57,853-Speed 8915.49 samples/sec   Loss 3.6045   LearningRate 0.0014   Epoch: 17   Global Step: 293890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:25:58,980-Speed 9091.20 samples/sec   Loss 3.6096   LearningRate 0.0014   Epoch: 17   Global Step: 293900   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:26:00,090-Speed 9234.26 samples/sec   Loss 3.6347   LearningRate 0.0014   Epoch: 17   Global Step: 293910   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:26:01,207-Speed 9169.27 samples/sec   Loss 3.6331   LearningRate 0.0014   Epoch: 17   Global Step: 293920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:26:02,353-Speed 8943.32 samples/sec   Loss 3.5930   LearningRate 0.0014   Epoch: 17   Global Step: 293930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:26:03,504-Speed 8900.59 samples/sec   Loss 3.5826   LearningRate 0.0014   Epoch: 17   Global Step: 293940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:26:04,643-Speed 9001.94 samples/sec   Loss 3.5570   LearningRate 0.0014   Epoch: 17   Global Step: 293950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:26:05,767-Speed 9114.01 samples/sec   Loss 3.5864   LearningRate 0.0014   Epoch: 17   Global Step: 293960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:26:06,901-Speed 9035.75 samples/sec   Loss 3.6109   LearningRate 0.0014   Epoch: 17   Global Step: 293970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:26:08,045-Speed 8955.89 samples/sec   Loss 3.6092   LearningRate 0.0014   Epoch: 17   Global Step: 293980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:26:09,159-Speed 9197.23 samples/sec   Loss 3.5715   LearningRate 0.0014   Epoch: 17   Global Step: 293990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:26:10,299-Speed 8987.38 samples/sec   Loss 3.5933   LearningRate 0.0014   Epoch: 17   Global Step: 294000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:26:32,314-[lfw][294000]XNorm: 6.678325
Training: 2022-04-11 23:26:32,315-[lfw][294000]Accuracy-Flip: 0.99733+-0.00309
Training: 2022-04-11 23:26:32,315-[lfw][294000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:26:57,790-[cfp_fp][294000]XNorm: 5.814234
Training: 2022-04-11 23:26:57,791-[cfp_fp][294000]Accuracy-Flip: 0.97214+-0.00909
Training: 2022-04-11 23:26:57,791-[cfp_fp][294000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:27:19,777-[agedb_30][294000]XNorm: 6.492500
Training: 2022-04-11 23:27:19,778-[agedb_30][294000]Accuracy-Flip: 0.97017+-0.00831
Training: 2022-04-11 23:27:19,778-[agedb_30][294000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:27:20,912-Speed 145.02 samples/sec   Loss 3.5274   LearningRate 0.0014   Epoch: 17   Global Step: 294010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:22,013-Speed 9307.29 samples/sec   Loss 3.5308   LearningRate 0.0014   Epoch: 17   Global Step: 294020   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:27:23,146-Speed 9037.81 samples/sec   Loss 3.5980   LearningRate 0.0014   Epoch: 17   Global Step: 294030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:24,273-Speed 9093.16 samples/sec   Loss 3.6103   LearningRate 0.0014   Epoch: 17   Global Step: 294040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:25,382-Speed 9240.43 samples/sec   Loss 3.5567   LearningRate 0.0014   Epoch: 17   Global Step: 294050   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:26,517-Speed 9029.62 samples/sec   Loss 3.6077   LearningRate 0.0014   Epoch: 17   Global Step: 294060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:27,719-Speed 8523.44 samples/sec   Loss 3.6317   LearningRate 0.0014   Epoch: 17   Global Step: 294070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:28,851-Speed 9050.65 samples/sec   Loss 3.6269   LearningRate 0.0014   Epoch: 17   Global Step: 294080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:29,979-Speed 9083.84 samples/sec   Loss 3.5526   LearningRate 0.0014   Epoch: 17   Global Step: 294090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:31,080-Speed 9308.00 samples/sec   Loss 3.6313   LearningRate 0.0014   Epoch: 17   Global Step: 294100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:32,258-Speed 8696.69 samples/sec   Loss 3.5565   LearningRate 0.0014   Epoch: 17   Global Step: 294110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:33,370-Speed 9209.52 samples/sec   Loss 3.6602   LearningRate 0.0014   Epoch: 17   Global Step: 294120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:34,494-Speed 9117.29 samples/sec   Loss 3.6134   LearningRate 0.0014   Epoch: 17   Global Step: 294130   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:27:35,578-Speed 9452.29 samples/sec   Loss 3.5652   LearningRate 0.0014   Epoch: 17   Global Step: 294140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:36,719-Speed 8986.45 samples/sec   Loss 3.5971   LearningRate 0.0014   Epoch: 17   Global Step: 294150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:37,891-Speed 8741.52 samples/sec   Loss 3.6306   LearningRate 0.0014   Epoch: 17   Global Step: 294160   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:38,990-Speed 9324.06 samples/sec   Loss 3.6288   LearningRate 0.0014   Epoch: 17   Global Step: 294170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:40,136-Speed 8933.44 samples/sec   Loss 3.6021   LearningRate 0.0014   Epoch: 17   Global Step: 294180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:41,307-Speed 8751.17 samples/sec   Loss 3.6385   LearningRate 0.0014   Epoch: 17   Global Step: 294190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:42,428-Speed 9145.39 samples/sec   Loss 3.5327   LearningRate 0.0014   Epoch: 17   Global Step: 294200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:43,526-Speed 9331.41 samples/sec   Loss 3.6508   LearningRate 0.0014   Epoch: 17   Global Step: 294210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:44,641-Speed 9184.82 samples/sec   Loss 3.6010   LearningRate 0.0014   Epoch: 17   Global Step: 294220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:45,735-Speed 9367.58 samples/sec   Loss 3.5564   LearningRate 0.0014   Epoch: 17   Global Step: 294230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:46,857-Speed 9129.35 samples/sec   Loss 3.6773   LearningRate 0.0014   Epoch: 17   Global Step: 294240   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:27:47,945-Speed 9422.63 samples/sec   Loss 3.6087   LearningRate 0.0014   Epoch: 17   Global Step: 294250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:49,120-Speed 8714.36 samples/sec   Loss 3.5738   LearningRate 0.0014   Epoch: 17   Global Step: 294260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:50,248-Speed 9085.14 samples/sec   Loss 3.6756   LearningRate 0.0014   Epoch: 17   Global Step: 294270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:51,395-Speed 8935.14 samples/sec   Loss 3.5886   LearningRate 0.0014   Epoch: 17   Global Step: 294280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:52,540-Speed 8952.00 samples/sec   Loss 3.6313   LearningRate 0.0014   Epoch: 17   Global Step: 294290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:53,668-Speed 9083.69 samples/sec   Loss 3.5446   LearningRate 0.0014   Epoch: 17   Global Step: 294300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:54,801-Speed 9040.01 samples/sec   Loss 3.6463   LearningRate 0.0014   Epoch: 17   Global Step: 294310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:55,938-Speed 9011.17 samples/sec   Loss 3.5638   LearningRate 0.0014   Epoch: 17   Global Step: 294320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:57,054-Speed 9184.36 samples/sec   Loss 3.5769   LearningRate 0.0014   Epoch: 17   Global Step: 294330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:58,163-Speed 9233.63 samples/sec   Loss 3.5966   LearningRate 0.0014   Epoch: 17   Global Step: 294340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:27:59,253-Speed 9404.22 samples/sec   Loss 3.5546   LearningRate 0.0014   Epoch: 17   Global Step: 294350   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:28:00,313-Speed 9664.41 samples/sec   Loss 3.6372   LearningRate 0.0014   Epoch: 17   Global Step: 294360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:01,449-Speed 9023.21 samples/sec   Loss 3.6250   LearningRate 0.0014   Epoch: 17   Global Step: 294370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:02,606-Speed 8853.51 samples/sec   Loss 3.6581   LearningRate 0.0014   Epoch: 17   Global Step: 294380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:03,743-Speed 9012.32 samples/sec   Loss 3.6573   LearningRate 0.0014   Epoch: 17   Global Step: 294390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:04,895-Speed 8889.16 samples/sec   Loss 3.6810   LearningRate 0.0014   Epoch: 17   Global Step: 294400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:05,996-Speed 9309.24 samples/sec   Loss 3.5590   LearningRate 0.0014   Epoch: 17   Global Step: 294410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:07,168-Speed 8746.45 samples/sec   Loss 3.5725   LearningRate 0.0014   Epoch: 17   Global Step: 294420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:08,293-Speed 9110.36 samples/sec   Loss 3.6037   LearningRate 0.0014   Epoch: 17   Global Step: 294430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:09,420-Speed 9087.52 samples/sec   Loss 3.5652   LearningRate 0.0014   Epoch: 17   Global Step: 294440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:10,566-Speed 8942.80 samples/sec   Loss 3.5632   LearningRate 0.0014   Epoch: 17   Global Step: 294450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:11,735-Speed 8769.82 samples/sec   Loss 3.6496   LearningRate 0.0014   Epoch: 17   Global Step: 294460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:12,898-Speed 8804.82 samples/sec   Loss 3.5595   LearningRate 0.0014   Epoch: 17   Global Step: 294470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:14,011-Speed 9214.01 samples/sec   Loss 3.6022   LearningRate 0.0014   Epoch: 17   Global Step: 294480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:15,102-Speed 9390.36 samples/sec   Loss 3.6473   LearningRate 0.0014   Epoch: 17   Global Step: 294490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:16,221-Speed 9156.40 samples/sec   Loss 3.5587   LearningRate 0.0014   Epoch: 17   Global Step: 294500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:17,345-Speed 9114.06 samples/sec   Loss 3.6590   LearningRate 0.0014   Epoch: 17   Global Step: 294510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:18,532-Speed 8626.01 samples/sec   Loss 3.5838   LearningRate 0.0014   Epoch: 17   Global Step: 294520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:19,648-Speed 9182.01 samples/sec   Loss 3.6833   LearningRate 0.0014   Epoch: 17   Global Step: 294530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:20,763-Speed 9190.68 samples/sec   Loss 3.5269   LearningRate 0.0014   Epoch: 17   Global Step: 294540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:21,920-Speed 8853.53 samples/sec   Loss 3.6227   LearningRate 0.0014   Epoch: 17   Global Step: 294550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:23,059-Speed 8995.55 samples/sec   Loss 3.5314   LearningRate 0.0014   Epoch: 17   Global Step: 294560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:24,192-Speed 9044.56 samples/sec   Loss 3.5841   LearningRate 0.0014   Epoch: 17   Global Step: 294570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:25,359-Speed 8781.18 samples/sec   Loss 3.6103   LearningRate 0.0014   Epoch: 17   Global Step: 294580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:26,496-Speed 9007.83 samples/sec   Loss 3.5919   LearningRate 0.0014   Epoch: 17   Global Step: 294590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:27,640-Speed 8961.96 samples/sec   Loss 3.5976   LearningRate 0.0014   Epoch: 17   Global Step: 294600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:28,758-Speed 9165.19 samples/sec   Loss 3.6235   LearningRate 0.0014   Epoch: 17   Global Step: 294610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:29,960-Speed 8523.93 samples/sec   Loss 3.6385   LearningRate 0.0014   Epoch: 17   Global Step: 294620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:31,076-Speed 9173.67 samples/sec   Loss 3.5226   LearningRate 0.0014   Epoch: 17   Global Step: 294630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-11 23:28:32,215-Speed 9001.75 samples/sec   Loss 3.5473   LearningRate 0.0014   Epoch: 17   Global Step: 294640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:33,417-Speed 8521.07 samples/sec   Loss 3.5414   LearningRate 0.0014   Epoch: 17   Global Step: 294650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:34,578-Speed 8824.05 samples/sec   Loss 3.5388   LearningRate 0.0014   Epoch: 17   Global Step: 294660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:35,700-Speed 9135.83 samples/sec   Loss 3.5858   LearningRate 0.0014   Epoch: 17   Global Step: 294670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:36,798-Speed 9332.96 samples/sec   Loss 3.6382   LearningRate 0.0014   Epoch: 17   Global Step: 294680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:37,949-Speed 8904.18 samples/sec   Loss 3.5816   LearningRate 0.0014   Epoch: 17   Global Step: 294690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:39,087-Speed 8999.22 samples/sec   Loss 3.5869   LearningRate 0.0014   Epoch: 17   Global Step: 294700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:40,244-Speed 8860.10 samples/sec   Loss 3.6215   LearningRate 0.0014   Epoch: 17   Global Step: 294710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:41,403-Speed 8832.64 samples/sec   Loss 3.5505   LearningRate 0.0014   Epoch: 17   Global Step: 294720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:42,535-Speed 9057.06 samples/sec   Loss 3.5691   LearningRate 0.0014   Epoch: 17   Global Step: 294730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:43,700-Speed 8793.98 samples/sec   Loss 3.6085   LearningRate 0.0014   Epoch: 17   Global Step: 294740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:44,806-Speed 9260.77 samples/sec   Loss 3.6831   LearningRate 0.0014   Epoch: 17   Global Step: 294750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:45,954-Speed 8928.59 samples/sec   Loss 3.6073   LearningRate 0.0014   Epoch: 17   Global Step: 294760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:47,087-Speed 9042.55 samples/sec   Loss 3.6399   LearningRate 0.0014   Epoch: 17   Global Step: 294770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:48,261-Speed 8727.01 samples/sec   Loss 3.5288   LearningRate 0.0014   Epoch: 17   Global Step: 294780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:49,424-Speed 8807.74 samples/sec   Loss 3.6417   LearningRate 0.0014   Epoch: 17   Global Step: 294790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:50,539-Speed 9191.30 samples/sec   Loss 3.6330   LearningRate 0.0014   Epoch: 17   Global Step: 294800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:51,657-Speed 9163.54 samples/sec   Loss 3.5747   LearningRate 0.0014   Epoch: 17   Global Step: 294810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:52,810-Speed 8887.12 samples/sec   Loss 3.5730   LearningRate 0.0014   Epoch: 17   Global Step: 294820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:53,946-Speed 9020.52 samples/sec   Loss 3.5967   LearningRate 0.0014   Epoch: 17   Global Step: 294830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:55,080-Speed 9031.50 samples/sec   Loss 3.5557   LearningRate 0.0014   Epoch: 17   Global Step: 294840   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:56,237-Speed 8860.51 samples/sec   Loss 3.6018   LearningRate 0.0014   Epoch: 17   Global Step: 294850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:57,340-Speed 9287.44 samples/sec   Loss 3.5835   LearningRate 0.0014   Epoch: 17   Global Step: 294860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:58,486-Speed 8939.82 samples/sec   Loss 3.6206   LearningRate 0.0014   Epoch: 17   Global Step: 294870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:28:59,618-Speed 9051.25 samples/sec   Loss 3.6465   LearningRate 0.0014   Epoch: 17   Global Step: 294880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:00,719-Speed 9309.08 samples/sec   Loss 3.6117   LearningRate 0.0014   Epoch: 17   Global Step: 294890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:01,846-Speed 9086.77 samples/sec   Loss 3.5224   LearningRate 0.0014   Epoch: 17   Global Step: 294900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:03,007-Speed 8827.62 samples/sec   Loss 3.5911   LearningRate 0.0014   Epoch: 17   Global Step: 294910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:04,163-Speed 8866.80 samples/sec   Loss 3.6101   LearningRate 0.0014   Epoch: 17   Global Step: 294920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:05,332-Speed 8770.00 samples/sec   Loss 3.5767   LearningRate 0.0014   Epoch: 17   Global Step: 294930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:06,438-Speed 9258.99 samples/sec   Loss 3.6099   LearningRate 0.0014   Epoch: 17   Global Step: 294940   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:29:07,580-Speed 8976.12 samples/sec   Loss 3.6488   LearningRate 0.0014   Epoch: 17   Global Step: 294950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:08,708-Speed 9084.80 samples/sec   Loss 3.6146   LearningRate 0.0014   Epoch: 17   Global Step: 294960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:09,815-Speed 9255.62 samples/sec   Loss 3.5794   LearningRate 0.0014   Epoch: 17   Global Step: 294970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:10,935-Speed 9149.74 samples/sec   Loss 3.5947   LearningRate 0.0014   Epoch: 17   Global Step: 294980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:12,026-Speed 9392.33 samples/sec   Loss 3.5757   LearningRate 0.0014   Epoch: 17   Global Step: 294990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:13,135-Speed 9236.80 samples/sec   Loss 3.5609   LearningRate 0.0014   Epoch: 17   Global Step: 295000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:14,303-Speed 8771.98 samples/sec   Loss 3.6121   LearningRate 0.0014   Epoch: 17   Global Step: 295010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:15,414-Speed 9225.18 samples/sec   Loss 3.6371   LearningRate 0.0014   Epoch: 17   Global Step: 295020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:16,568-Speed 8879.54 samples/sec   Loss 3.5202   LearningRate 0.0014   Epoch: 17   Global Step: 295030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:17,683-Speed 9185.67 samples/sec   Loss 3.6147   LearningRate 0.0013   Epoch: 17   Global Step: 295040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:18,805-Speed 9128.41 samples/sec   Loss 3.6675   LearningRate 0.0013   Epoch: 17   Global Step: 295050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:29:19,937-Speed 9053.02 samples/sec   Loss 3.5913   LearningRate 0.0013   Epoch: 17   Global Step: 295060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:21,020-Speed 9462.86 samples/sec   Loss 3.5780   LearningRate 0.0013   Epoch: 17   Global Step: 295070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:22,092-Speed 9558.69 samples/sec   Loss 3.5357   LearningRate 0.0013   Epoch: 17   Global Step: 295080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:23,236-Speed 8957.71 samples/sec   Loss 3.6238   LearningRate 0.0013   Epoch: 17   Global Step: 295090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:24,355-Speed 9156.70 samples/sec   Loss 3.6507   LearningRate 0.0013   Epoch: 17   Global Step: 295100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:25,438-Speed 9459.13 samples/sec   Loss 3.5715   LearningRate 0.0013   Epoch: 17   Global Step: 295110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:26,554-Speed 9181.23 samples/sec   Loss 3.5130   LearningRate 0.0013   Epoch: 17   Global Step: 295120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:27,701-Speed 8932.52 samples/sec   Loss 3.5731   LearningRate 0.0013   Epoch: 17   Global Step: 295130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:28,835-Speed 9036.46 samples/sec   Loss 3.5409   LearningRate 0.0013   Epoch: 17   Global Step: 295140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:29,947-Speed 9213.48 samples/sec   Loss 3.5538   LearningRate 0.0013   Epoch: 17   Global Step: 295150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:31,110-Speed 8807.00 samples/sec   Loss 3.5522   LearningRate 0.0013   Epoch: 17   Global Step: 295160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:29:32,212-Speed 9300.40 samples/sec   Loss 3.5863   LearningRate 0.0013   Epoch: 17   Global Step: 295170   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:33,299-Speed 9423.32 samples/sec   Loss 3.6772   LearningRate 0.0013   Epoch: 17   Global Step: 295180   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:34,427-Speed 9081.72 samples/sec   Loss 3.6300   LearningRate 0.0013   Epoch: 17   Global Step: 295190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:35,557-Speed 9066.36 samples/sec   Loss 3.6294   LearningRate 0.0013   Epoch: 17   Global Step: 295200   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:36,667-Speed 9234.28 samples/sec   Loss 3.6567   LearningRate 0.0013   Epoch: 17   Global Step: 295210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:37,800-Speed 9046.16 samples/sec   Loss 3.5487   LearningRate 0.0013   Epoch: 17   Global Step: 295220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:38,934-Speed 9028.34 samples/sec   Loss 3.5918   LearningRate 0.0013   Epoch: 17   Global Step: 295230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:40,042-Speed 9248.06 samples/sec   Loss 3.6137   LearningRate 0.0013   Epoch: 17   Global Step: 295240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:41,166-Speed 9118.88 samples/sec   Loss 3.6678   LearningRate 0.0013   Epoch: 17   Global Step: 295250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:42,332-Speed 8788.88 samples/sec   Loss 3.6006   LearningRate 0.0013   Epoch: 17   Global Step: 295260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:43,471-Speed 8999.11 samples/sec   Loss 3.6180   LearningRate 0.0013   Epoch: 17   Global Step: 295270   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:29:44,572-Speed 9310.52 samples/sec   Loss 3.6352   LearningRate 0.0013   Epoch: 17   Global Step: 295280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:45,660-Speed 9413.47 samples/sec   Loss 3.5744   LearningRate 0.0013   Epoch: 17   Global Step: 295290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:46,776-Speed 9177.64 samples/sec   Loss 3.5821   LearningRate 0.0013   Epoch: 17   Global Step: 295300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:47,908-Speed 9051.89 samples/sec   Loss 3.6365   LearningRate 0.0013   Epoch: 17   Global Step: 295310   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:49,028-Speed 9150.42 samples/sec   Loss 3.5480   LearningRate 0.0013   Epoch: 17   Global Step: 295320   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:50,119-Speed 9390.35 samples/sec   Loss 3.5417   LearningRate 0.0013   Epoch: 17   Global Step: 295330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:51,256-Speed 9006.57 samples/sec   Loss 3.6423   LearningRate 0.0013   Epoch: 17   Global Step: 295340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:52,388-Speed 9054.78 samples/sec   Loss 3.5156   LearningRate 0.0013   Epoch: 17   Global Step: 295350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:53,608-Speed 8400.03 samples/sec   Loss 3.6115   LearningRate 0.0013   Epoch: 17   Global Step: 295360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:54,728-Speed 9142.45 samples/sec   Loss 3.6755   LearningRate 0.0013   Epoch: 17   Global Step: 295370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:55,861-Speed 9047.39 samples/sec   Loss 3.5555   LearningRate 0.0013   Epoch: 17   Global Step: 295380   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:29:56,940-Speed 9496.05 samples/sec   Loss 3.5793   LearningRate 0.0013   Epoch: 17   Global Step: 295390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:58,106-Speed 8787.79 samples/sec   Loss 3.5440   LearningRate 0.0013   Epoch: 17   Global Step: 295400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:29:59,269-Speed 8813.23 samples/sec   Loss 3.6943   LearningRate 0.0013   Epoch: 17   Global Step: 295410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:00,423-Speed 8880.93 samples/sec   Loss 3.7340   LearningRate 0.0013   Epoch: 17   Global Step: 295420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:01,559-Speed 9019.04 samples/sec   Loss 3.6071   LearningRate 0.0013   Epoch: 17   Global Step: 295430   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:02,699-Speed 8987.66 samples/sec   Loss 3.5516   LearningRate 0.0013   Epoch: 17   Global Step: 295440   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:03,840-Speed 8975.41 samples/sec   Loss 3.5230   LearningRate 0.0013   Epoch: 17   Global Step: 295450   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:04,983-Speed 8962.47 samples/sec   Loss 3.5653   LearningRate 0.0013   Epoch: 17   Global Step: 295460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:06,120-Speed 9013.80 samples/sec   Loss 3.5906   LearningRate 0.0013   Epoch: 17   Global Step: 295470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:07,209-Speed 9408.09 samples/sec   Loss 3.5369   LearningRate 0.0013   Epoch: 17   Global Step: 295480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:08,318-Speed 9241.10 samples/sec   Loss 3.6112   LearningRate 0.0013   Epoch: 17   Global Step: 295490   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:30:09,442-Speed 9114.43 samples/sec   Loss 3.5953   LearningRate 0.0013   Epoch: 17   Global Step: 295500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:10,589-Speed 8932.73 samples/sec   Loss 3.5720   LearningRate 0.0013   Epoch: 17   Global Step: 295510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:11,725-Speed 9017.47 samples/sec   Loss 3.4987   LearningRate 0.0013   Epoch: 17   Global Step: 295520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:12,860-Speed 9028.87 samples/sec   Loss 3.6179   LearningRate 0.0013   Epoch: 17   Global Step: 295530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:14,003-Speed 8967.08 samples/sec   Loss 3.6237   LearningRate 0.0013   Epoch: 17   Global Step: 295540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:15,127-Speed 9116.22 samples/sec   Loss 3.5854   LearningRate 0.0013   Epoch: 17   Global Step: 295550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:16,264-Speed 9009.86 samples/sec   Loss 3.6127   LearningRate 0.0013   Epoch: 17   Global Step: 295560   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:17,347-Speed 9462.01 samples/sec   Loss 3.5886   LearningRate 0.0013   Epoch: 17   Global Step: 295570   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:18,467-Speed 9141.33 samples/sec   Loss 3.6143   LearningRate 0.0013   Epoch: 17   Global Step: 295580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:19,579-Speed 9225.64 samples/sec   Loss 3.5688   LearningRate 0.0013   Epoch: 17   Global Step: 295590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:20,697-Speed 9159.86 samples/sec   Loss 3.6099   LearningRate 0.0013   Epoch: 17   Global Step: 295600   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:30:21,823-Speed 9099.18 samples/sec   Loss 3.5240   LearningRate 0.0013   Epoch: 17   Global Step: 295610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:22,944-Speed 9139.03 samples/sec   Loss 3.6153   LearningRate 0.0013   Epoch: 17   Global Step: 295620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:24,090-Speed 8946.05 samples/sec   Loss 3.6349   LearningRate 0.0013   Epoch: 17   Global Step: 295630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:25,194-Speed 9279.20 samples/sec   Loss 3.6551   LearningRate 0.0013   Epoch: 17   Global Step: 295640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:26,301-Speed 9257.84 samples/sec   Loss 3.5387   LearningRate 0.0013   Epoch: 17   Global Step: 295650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:27,430-Speed 9074.52 samples/sec   Loss 3.5649   LearningRate 0.0013   Epoch: 17   Global Step: 295660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:28,598-Speed 8771.08 samples/sec   Loss 3.5759   LearningRate 0.0013   Epoch: 17   Global Step: 295670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:29,783-Speed 8646.28 samples/sec   Loss 3.6522   LearningRate 0.0013   Epoch: 17   Global Step: 295680   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:30,899-Speed 9178.25 samples/sec   Loss 3.7082   LearningRate 0.0013   Epoch: 17   Global Step: 295690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:32,064-Speed 8793.64 samples/sec   Loss 3.5680   LearningRate 0.0013   Epoch: 17   Global Step: 295700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:33,210-Speed 8940.45 samples/sec   Loss 3.6275   LearningRate 0.0013   Epoch: 17   Global Step: 295710   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:30:34,358-Speed 8923.00 samples/sec   Loss 3.5876   LearningRate 0.0013   Epoch: 17   Global Step: 295720   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:30:35,478-Speed 9150.20 samples/sec   Loss 3.6184   LearningRate 0.0013   Epoch: 17   Global Step: 295730   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:30:36,601-Speed 9124.90 samples/sec   Loss 3.6254   LearningRate 0.0013   Epoch: 17   Global Step: 295740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:37,735-Speed 9039.57 samples/sec   Loss 3.5337   LearningRate 0.0013   Epoch: 17   Global Step: 295750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:38,829-Speed 9370.14 samples/sec   Loss 3.5802   LearningRate 0.0013   Epoch: 17   Global Step: 295760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:39,962-Speed 9040.12 samples/sec   Loss 3.6332   LearningRate 0.0013   Epoch: 17   Global Step: 295770   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:41,068-Speed 9269.02 samples/sec   Loss 3.5997   LearningRate 0.0013   Epoch: 17   Global Step: 295780   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:42,217-Speed 8910.88 samples/sec   Loss 3.5502   LearningRate 0.0013   Epoch: 17   Global Step: 295790   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:43,353-Speed 9024.21 samples/sec   Loss 3.6000   LearningRate 0.0013   Epoch: 17   Global Step: 295800   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:44,495-Speed 8971.80 samples/sec   Loss 3.5280   LearningRate 0.0013   Epoch: 17   Global Step: 295810   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:45,599-Speed 9280.70 samples/sec   Loss 3.6284   LearningRate 0.0013   Epoch: 17   Global Step: 295820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:46,744-Speed 8945.64 samples/sec   Loss 3.5420   LearningRate 0.0013   Epoch: 17   Global Step: 295830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:47,875-Speed 9060.68 samples/sec   Loss 3.6885   LearningRate 0.0013   Epoch: 17   Global Step: 295840   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:30:49,023-Speed 8921.83 samples/sec   Loss 3.5668   LearningRate 0.0013   Epoch: 17   Global Step: 295850   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:50,157-Speed 9038.89 samples/sec   Loss 3.5716   LearningRate 0.0013   Epoch: 17   Global Step: 295860   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:51,263-Speed 9261.02 samples/sec   Loss 3.5538   LearningRate 0.0013   Epoch: 17   Global Step: 295870   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:52,399-Speed 9021.14 samples/sec   Loss 3.5721   LearningRate 0.0013   Epoch: 17   Global Step: 295880   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:53,511-Speed 9212.81 samples/sec   Loss 3.6324   LearningRate 0.0013   Epoch: 17   Global Step: 295890   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:54,662-Speed 8900.05 samples/sec   Loss 3.6579   LearningRate 0.0013   Epoch: 17   Global Step: 295900   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:55,794-Speed 9050.73 samples/sec   Loss 3.5035   LearningRate 0.0013   Epoch: 17   Global Step: 295910   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:56,933-Speed 9002.53 samples/sec   Loss 3.6017   LearningRate 0.0013   Epoch: 17   Global Step: 295920   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:58,054-Speed 9138.77 samples/sec   Loss 3.5387   LearningRate 0.0013   Epoch: 17   Global Step: 295930   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:30:59,198-Speed 8958.19 samples/sec   Loss 3.5976   LearningRate 0.0013   Epoch: 17   Global Step: 295940   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:31:00,318-Speed 9149.13 samples/sec   Loss 3.6602   LearningRate 0.0013   Epoch: 17   Global Step: 295950   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:31:01,475-Speed 8857.54 samples/sec   Loss 3.6195   LearningRate 0.0013   Epoch: 17   Global Step: 295960   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:31:02,617-Speed 8970.39 samples/sec   Loss 3.5902   LearningRate 0.0013   Epoch: 17   Global Step: 295970   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:31:03,727-Speed 9225.69 samples/sec   Loss 3.6035   LearningRate 0.0013   Epoch: 17   Global Step: 295980   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:31:04,872-Speed 8947.03 samples/sec   Loss 3.5529   LearningRate 0.0013   Epoch: 17   Global Step: 295990   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:31:05,998-Speed 9103.32 samples/sec   Loss 3.5897   LearningRate 0.0013   Epoch: 17   Global Step: 296000   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:31:28,083-[lfw][296000]XNorm: 6.651927
Training: 2022-04-11 23:31:28,084-[lfw][296000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-04-11 23:31:28,085-[lfw][296000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:31:53,721-[cfp_fp][296000]XNorm: 5.814749
Training: 2022-04-11 23:31:53,722-[cfp_fp][296000]Accuracy-Flip: 0.97371+-0.00811
Training: 2022-04-11 23:31:53,722-[cfp_fp][296000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:32:15,864-[agedb_30][296000]XNorm: 6.480942
Training: 2022-04-11 23:32:15,865-[agedb_30][296000]Accuracy-Flip: 0.97133+-0.00859
Training: 2022-04-11 23:32:15,866-[agedb_30][296000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:32:16,945-Speed 144.33 samples/sec   Loss 3.6813   LearningRate 0.0013   Epoch: 17   Global Step: 296010   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:18,050-Speed 9277.27 samples/sec   Loss 3.6317   LearningRate 0.0013   Epoch: 17   Global Step: 296020   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:19,197-Speed 8932.36 samples/sec   Loss 3.5965   LearningRate 0.0013   Epoch: 17   Global Step: 296030   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:20,347-Speed 8910.21 samples/sec   Loss 3.6347   LearningRate 0.0013   Epoch: 17   Global Step: 296040   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:21,477-Speed 9065.76 samples/sec   Loss 3.6276   LearningRate 0.0013   Epoch: 17   Global Step: 296050   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:32:22,599-Speed 9128.41 samples/sec   Loss 3.6345   LearningRate 0.0013   Epoch: 17   Global Step: 296060   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:23,714-Speed 9197.63 samples/sec   Loss 3.5498   LearningRate 0.0013   Epoch: 17   Global Step: 296070   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:24,843-Speed 9076.58 samples/sec   Loss 3.5390   LearningRate 0.0013   Epoch: 17   Global Step: 296080   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:25,915-Speed 9553.72 samples/sec   Loss 3.5350   LearningRate 0.0013   Epoch: 17   Global Step: 296090   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:27,041-Speed 9101.01 samples/sec   Loss 3.6764   LearningRate 0.0013   Epoch: 17   Global Step: 296100   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:28,146-Speed 9267.13 samples/sec   Loss 3.6137   LearningRate 0.0013   Epoch: 17   Global Step: 296110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:29,315-Speed 8767.79 samples/sec   Loss 3.6829   LearningRate 0.0013   Epoch: 17   Global Step: 296120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:30,441-Speed 9100.22 samples/sec   Loss 3.5883   LearningRate 0.0013   Epoch: 17   Global Step: 296130   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:31,565-Speed 9113.34 samples/sec   Loss 3.5655   LearningRate 0.0013   Epoch: 17   Global Step: 296140   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:32,686-Speed 9142.09 samples/sec   Loss 3.7027   LearningRate 0.0013   Epoch: 17   Global Step: 296150   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:33,794-Speed 9243.30 samples/sec   Loss 3.5477   LearningRate 0.0013   Epoch: 17   Global Step: 296160   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:32:34,925-Speed 9062.32 samples/sec   Loss 3.6734   LearningRate 0.0013   Epoch: 17   Global Step: 296170   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:32:36,073-Speed 8923.54 samples/sec   Loss 3.4974   LearningRate 0.0013   Epoch: 17   Global Step: 296180   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:32:37,203-Speed 9075.17 samples/sec   Loss 3.5208   LearningRate 0.0013   Epoch: 17   Global Step: 296190   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:32:38,354-Speed 8896.87 samples/sec   Loss 3.5677   LearningRate 0.0013   Epoch: 17   Global Step: 296200   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:32:39,461-Speed 9254.70 samples/sec   Loss 3.5822   LearningRate 0.0013   Epoch: 17   Global Step: 296210   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:40,572-Speed 9222.33 samples/sec   Loss 3.5760   LearningRate 0.0013   Epoch: 17   Global Step: 296220   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:41,645-Speed 9549.88 samples/sec   Loss 3.6269   LearningRate 0.0013   Epoch: 17   Global Step: 296230   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:42,765-Speed 9147.60 samples/sec   Loss 3.5422   LearningRate 0.0013   Epoch: 17   Global Step: 296240   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:43,876-Speed 9226.95 samples/sec   Loss 3.6400   LearningRate 0.0013   Epoch: 17   Global Step: 296250   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:45,017-Speed 8978.80 samples/sec   Loss 3.6111   LearningRate 0.0013   Epoch: 17   Global Step: 296260   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:46,100-Speed 9462.81 samples/sec   Loss 3.6508   LearningRate 0.0013   Epoch: 17   Global Step: 296270   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:47,201-Speed 9304.72 samples/sec   Loss 3.6179   LearningRate 0.0013   Epoch: 17   Global Step: 296280   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:48,322-Speed 9138.39 samples/sec   Loss 3.6135   LearningRate 0.0013   Epoch: 17   Global Step: 296290   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:49,499-Speed 8706.37 samples/sec   Loss 3.6188   LearningRate 0.0013   Epoch: 17   Global Step: 296300   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:50,630-Speed 9056.65 samples/sec   Loss 3.6055   LearningRate 0.0013   Epoch: 17   Global Step: 296310   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:32:51,739-Speed 9236.07 samples/sec   Loss 3.6478   LearningRate 0.0013   Epoch: 17   Global Step: 296320   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:32:52,847-Speed 9253.53 samples/sec   Loss 3.5524   LearningRate 0.0013   Epoch: 17   Global Step: 296330   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:53,985-Speed 8997.16 samples/sec   Loss 3.6223   LearningRate 0.0013   Epoch: 17   Global Step: 296340   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:55,087-Speed 9302.34 samples/sec   Loss 3.5798   LearningRate 0.0013   Epoch: 17   Global Step: 296350   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:56,208-Speed 9144.90 samples/sec   Loss 3.5408   LearningRate 0.0013   Epoch: 17   Global Step: 296360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:57,345-Speed 9005.38 samples/sec   Loss 3.6043   LearningRate 0.0013   Epoch: 17   Global Step: 296370   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:58,455-Speed 9232.82 samples/sec   Loss 3.5697   LearningRate 0.0013   Epoch: 17   Global Step: 296380   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:32:59,546-Speed 9392.27 samples/sec   Loss 3.6732   LearningRate 0.0013   Epoch: 17   Global Step: 296390   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:00,697-Speed 8896.12 samples/sec   Loss 3.5968   LearningRate 0.0013   Epoch: 17   Global Step: 296400   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:01,871-Speed 8728.43 samples/sec   Loss 3.6073   LearningRate 0.0013   Epoch: 17   Global Step: 296410   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:03,007-Speed 9024.78 samples/sec   Loss 3.5750   LearningRate 0.0013   Epoch: 17   Global Step: 296420   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:04,132-Speed 9105.47 samples/sec   Loss 3.5312   LearningRate 0.0013   Epoch: 17   Global Step: 296430   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:33:05,278-Speed 8939.06 samples/sec   Loss 3.5621   LearningRate 0.0013   Epoch: 17   Global Step: 296440   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:33:06,439-Speed 8822.60 samples/sec   Loss 3.6411   LearningRate 0.0013   Epoch: 17   Global Step: 296450   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:33:07,554-Speed 9198.82 samples/sec   Loss 3.5810   LearningRate 0.0013   Epoch: 17   Global Step: 296460   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:08,693-Speed 8993.83 samples/sec   Loss 3.5863   LearningRate 0.0013   Epoch: 17   Global Step: 296470   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:09,846-Speed 8883.77 samples/sec   Loss 3.6000   LearningRate 0.0013   Epoch: 17   Global Step: 296480   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:10,953-Speed 9255.89 samples/sec   Loss 3.6906   LearningRate 0.0013   Epoch: 17   Global Step: 296490   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:12,059-Speed 9263.12 samples/sec   Loss 3.5641   LearningRate 0.0012   Epoch: 17   Global Step: 296500   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:13,237-Speed 8692.85 samples/sec   Loss 3.6846   LearningRate 0.0012   Epoch: 17   Global Step: 296510   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:14,385-Speed 8934.50 samples/sec   Loss 3.5222   LearningRate 0.0012   Epoch: 17   Global Step: 296520   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:15,521-Speed 9017.64 samples/sec   Loss 3.6580   LearningRate 0.0012   Epoch: 17   Global Step: 296530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:16,681-Speed 8833.38 samples/sec   Loss 3.6004   LearningRate 0.0012   Epoch: 17   Global Step: 296540   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:17,797-Speed 9174.73 samples/sec   Loss 3.6254   LearningRate 0.0012   Epoch: 17   Global Step: 296550   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:18,942-Speed 8951.54 samples/sec   Loss 3.5220   LearningRate 0.0012   Epoch: 17   Global Step: 296560   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:33:20,091-Speed 8917.00 samples/sec   Loss 3.5661   LearningRate 0.0012   Epoch: 17   Global Step: 296570   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:33:21,187-Speed 9353.30 samples/sec   Loss 3.6093   LearningRate 0.0012   Epoch: 17   Global Step: 296580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:22,280-Speed 9377.12 samples/sec   Loss 3.6791   LearningRate 0.0012   Epoch: 17   Global Step: 296590   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:23,362-Speed 9464.56 samples/sec   Loss 3.5236   LearningRate 0.0012   Epoch: 17   Global Step: 296600   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:24,492-Speed 9065.84 samples/sec   Loss 3.5418   LearningRate 0.0012   Epoch: 17   Global Step: 296610   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:25,652-Speed 8834.88 samples/sec   Loss 3.6615   LearningRate 0.0012   Epoch: 17   Global Step: 296620   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:26,790-Speed 9008.19 samples/sec   Loss 3.5972   LearningRate 0.0012   Epoch: 17   Global Step: 296630   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:27,882-Speed 9377.39 samples/sec   Loss 3.6773   LearningRate 0.0012   Epoch: 17   Global Step: 296640   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:28,981-Speed 9319.34 samples/sec   Loss 3.6426   LearningRate 0.0012   Epoch: 17   Global Step: 296650   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:30,096-Speed 9190.23 samples/sec   Loss 3.6110   LearningRate 0.0012   Epoch: 17   Global Step: 296660   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:31,195-Speed 9326.48 samples/sec   Loss 3.5924   LearningRate 0.0012   Epoch: 17   Global Step: 296670   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:32,308-Speed 9200.14 samples/sec   Loss 3.6330   LearningRate 0.0012   Epoch: 17   Global Step: 296680   Fp16 Grad Scale: 131072   Required: 2 hours
Training: 2022-04-11 23:33:33,396-Speed 9422.30 samples/sec   Loss 3.6084   LearningRate 0.0012   Epoch: 17   Global Step: 296690   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:34,537-Speed 8977.26 samples/sec   Loss 3.6880   LearningRate 0.0012   Epoch: 17   Global Step: 296700   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:35,678-Speed 8978.64 samples/sec   Loss 3.6650   LearningRate 0.0012   Epoch: 17   Global Step: 296710   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:36,803-Speed 9105.94 samples/sec   Loss 3.6112   LearningRate 0.0012   Epoch: 17   Global Step: 296720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:37,918-Speed 9190.84 samples/sec   Loss 3.6385   LearningRate 0.0012   Epoch: 17   Global Step: 296730   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:39,065-Speed 8933.47 samples/sec   Loss 3.6256   LearningRate 0.0012   Epoch: 17   Global Step: 296740   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:40,199-Speed 9040.37 samples/sec   Loss 3.6770   LearningRate 0.0012   Epoch: 17   Global Step: 296750   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:41,357-Speed 8848.04 samples/sec   Loss 3.5698   LearningRate 0.0012   Epoch: 17   Global Step: 296760   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-11 23:33:42,516-Speed 8840.42 samples/sec   Loss 3.5906   LearningRate 0.0012   Epoch: 17   Global Step: 296770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:43,651-Speed 9021.81 samples/sec   Loss 3.5983   LearningRate 0.0012   Epoch: 17   Global Step: 296780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:44,778-Speed 9097.13 samples/sec   Loss 3.5755   LearningRate 0.0012   Epoch: 17   Global Step: 296790   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:33:45,912-Speed 9036.66 samples/sec   Loss 3.6753   LearningRate 0.0012   Epoch: 17   Global Step: 296800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:47,014-Speed 9296.51 samples/sec   Loss 3.6647   LearningRate 0.0012   Epoch: 17   Global Step: 296810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:48,116-Speed 9297.68 samples/sec   Loss 3.6629   LearningRate 0.0012   Epoch: 17   Global Step: 296820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:49,248-Speed 9055.03 samples/sec   Loss 3.6316   LearningRate 0.0012   Epoch: 17   Global Step: 296830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:50,406-Speed 8840.37 samples/sec   Loss 3.5790   LearningRate 0.0012   Epoch: 17   Global Step: 296840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:51,530-Speed 9116.34 samples/sec   Loss 3.5315   LearningRate 0.0012   Epoch: 17   Global Step: 296850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:52,678-Speed 8927.85 samples/sec   Loss 3.6228   LearningRate 0.0012   Epoch: 17   Global Step: 296860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:53,831-Speed 8886.14 samples/sec   Loss 3.5551   LearningRate 0.0012   Epoch: 17   Global Step: 296870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:55,017-Speed 8636.49 samples/sec   Loss 3.6934   LearningRate 0.0012   Epoch: 17   Global Step: 296880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:56,151-Speed 9035.12 samples/sec   Loss 3.5980   LearningRate 0.0012   Epoch: 17   Global Step: 296890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:33:57,303-Speed 8908.59 samples/sec   Loss 3.5899   LearningRate 0.0012   Epoch: 17   Global Step: 296900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:33:58,445-Speed 8971.59 samples/sec   Loss 3.6579   LearningRate 0.0012   Epoch: 17   Global Step: 296910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:33:59,585-Speed 8991.96 samples/sec   Loss 3.6331   LearningRate 0.0012   Epoch: 17   Global Step: 296920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:00,702-Speed 9166.61 samples/sec   Loss 3.5378   LearningRate 0.0012   Epoch: 17   Global Step: 296930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:01,853-Speed 8902.29 samples/sec   Loss 3.6231   LearningRate 0.0012   Epoch: 17   Global Step: 296940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:02,986-Speed 9042.81 samples/sec   Loss 3.5920   LearningRate 0.0012   Epoch: 17   Global Step: 296950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:04,096-Speed 9227.50 samples/sec   Loss 3.6274   LearningRate 0.0012   Epoch: 17   Global Step: 296960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:05,189-Speed 9375.43 samples/sec   Loss 3.6384   LearningRate 0.0012   Epoch: 17   Global Step: 296970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:06,291-Speed 9303.16 samples/sec   Loss 3.5958   LearningRate 0.0012   Epoch: 17   Global Step: 296980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:07,372-Speed 9482.20 samples/sec   Loss 3.6401   LearningRate 0.0012   Epoch: 17   Global Step: 296990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:08,494-Speed 9125.25 samples/sec   Loss 3.5928   LearningRate 0.0012   Epoch: 17   Global Step: 297000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:09,626-Speed 9049.06 samples/sec   Loss 3.5656   LearningRate 0.0012   Epoch: 17   Global Step: 297010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:10,764-Speed 9005.17 samples/sec   Loss 3.5841   LearningRate 0.0012   Epoch: 17   Global Step: 297020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:34:11,857-Speed 9372.46 samples/sec   Loss 3.6606   LearningRate 0.0012   Epoch: 17   Global Step: 297030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:12,992-Speed 9028.97 samples/sec   Loss 3.6414   LearningRate 0.0012   Epoch: 17   Global Step: 297040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:14,104-Speed 9216.08 samples/sec   Loss 3.5915   LearningRate 0.0012   Epoch: 17   Global Step: 297050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:15,218-Speed 9198.76 samples/sec   Loss 3.5826   LearningRate 0.0012   Epoch: 17   Global Step: 297060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:16,352-Speed 9036.43 samples/sec   Loss 3.6290   LearningRate 0.0012   Epoch: 17   Global Step: 297070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:17,473-Speed 9136.10 samples/sec   Loss 3.6439   LearningRate 0.0012   Epoch: 17   Global Step: 297080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:18,628-Speed 8877.69 samples/sec   Loss 3.5647   LearningRate 0.0012   Epoch: 17   Global Step: 297090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:19,736-Speed 9241.21 samples/sec   Loss 3.5970   LearningRate 0.0012   Epoch: 17   Global Step: 297100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:20,826-Speed 9404.15 samples/sec   Loss 3.5964   LearningRate 0.0012   Epoch: 17   Global Step: 297110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:21,955-Speed 9069.81 samples/sec   Loss 3.6113   LearningRate 0.0012   Epoch: 17   Global Step: 297120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:23,034-Speed 9496.52 samples/sec   Loss 3.6120   LearningRate 0.0012   Epoch: 17   Global Step: 297130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:24,188-Speed 8878.52 samples/sec   Loss 3.5406   LearningRate 0.0012   Epoch: 17   Global Step: 297140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:25,331-Speed 8969.63 samples/sec   Loss 3.6609   LearningRate 0.0012   Epoch: 17   Global Step: 297150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:26,490-Speed 8840.70 samples/sec   Loss 3.5242   LearningRate 0.0012   Epoch: 17   Global Step: 297160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:27,625-Speed 9026.60 samples/sec   Loss 3.6853   LearningRate 0.0012   Epoch: 17   Global Step: 297170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:28,757-Speed 9046.27 samples/sec   Loss 3.6395   LearningRate 0.0012   Epoch: 17   Global Step: 297180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:29,895-Speed 9001.85 samples/sec   Loss 3.6446   LearningRate 0.0012   Epoch: 17   Global Step: 297190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:31,037-Speed 8975.90 samples/sec   Loss 3.5823   LearningRate 0.0012   Epoch: 17   Global Step: 297200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:32,119-Speed 9469.61 samples/sec   Loss 3.5882   LearningRate 0.0012   Epoch: 17   Global Step: 297210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:33,201-Speed 9465.27 samples/sec   Loss 3.6606   LearningRate 0.0012   Epoch: 17   Global Step: 297220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:34,339-Speed 9006.82 samples/sec   Loss 3.6229   LearningRate 0.0012   Epoch: 17   Global Step: 297230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:34:35,430-Speed 9394.32 samples/sec   Loss 3.6169   LearningRate 0.0012   Epoch: 17   Global Step: 297240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:36,603-Speed 8733.15 samples/sec   Loss 3.5700   LearningRate 0.0012   Epoch: 17   Global Step: 297250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:37,737-Speed 9036.94 samples/sec   Loss 3.6781   LearningRate 0.0012   Epoch: 17   Global Step: 297260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:38,874-Speed 9011.32 samples/sec   Loss 3.5980   LearningRate 0.0012   Epoch: 17   Global Step: 297270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:39,976-Speed 9302.71 samples/sec   Loss 3.5692   LearningRate 0.0012   Epoch: 17   Global Step: 297280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:34:41,091-Speed 9187.28 samples/sec   Loss 3.5307   LearningRate 0.0012   Epoch: 17   Global Step: 297290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:34:42,238-Speed 8930.19 samples/sec   Loss 3.6916   LearningRate 0.0012   Epoch: 17   Global Step: 297300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:34:43,318-Speed 9495.20 samples/sec   Loss 3.5118   LearningRate 0.0012   Epoch: 17   Global Step: 297310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:34:44,519-Speed 8526.93 samples/sec   Loss 3.5715   LearningRate 0.0012   Epoch: 17   Global Step: 297320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:34:45,655-Speed 9019.01 samples/sec   Loss 3.6291   LearningRate 0.0012   Epoch: 17   Global Step: 297330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:34:46,790-Speed 9031.26 samples/sec   Loss 3.6822   LearningRate 0.0012   Epoch: 17   Global Step: 297340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:34:47,925-Speed 9022.81 samples/sec   Loss 3.6399   LearningRate 0.0012   Epoch: 17   Global Step: 297350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:34:49,012-Speed 9429.69 samples/sec   Loss 3.6782   LearningRate 0.0012   Epoch: 17   Global Step: 297360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:34:50,170-Speed 8844.54 samples/sec   Loss 3.6189   LearningRate 0.0012   Epoch: 17   Global Step: 297370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:34:51,303-Speed 9044.49 samples/sec   Loss 3.5732   LearningRate 0.0012   Epoch: 17   Global Step: 297380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:52,406-Speed 9292.42 samples/sec   Loss 3.6113   LearningRate 0.0012   Epoch: 17   Global Step: 297390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:53,539-Speed 9039.43 samples/sec   Loss 3.5960   LearningRate 0.0012   Epoch: 17   Global Step: 297400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:54,669-Speed 9069.30 samples/sec   Loss 3.5532   LearningRate 0.0012   Epoch: 17   Global Step: 297410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:55,748-Speed 9498.71 samples/sec   Loss 3.6109   LearningRate 0.0012   Epoch: 17   Global Step: 297420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:56,859-Speed 9220.76 samples/sec   Loss 3.6645   LearningRate 0.0012   Epoch: 17   Global Step: 297430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:58,020-Speed 8823.18 samples/sec   Loss 3.5909   LearningRate 0.0012   Epoch: 17   Global Step: 297440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:34:59,152-Speed 9051.34 samples/sec   Loss 3.5877   LearningRate 0.0012   Epoch: 17   Global Step: 297450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:00,317-Speed 8790.97 samples/sec   Loss 3.6305   LearningRate 0.0012   Epoch: 17   Global Step: 297460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:01,415-Speed 9332.70 samples/sec   Loss 3.5550   LearningRate 0.0012   Epoch: 17   Global Step: 297470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:02,586-Speed 8755.99 samples/sec   Loss 3.5921   LearningRate 0.0012   Epoch: 17   Global Step: 297480   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:35:03,721-Speed 9026.92 samples/sec   Loss 3.6651   LearningRate 0.0012   Epoch: 17   Global Step: 297490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:35:04,827-Speed 9258.50 samples/sec   Loss 3.5351   LearningRate 0.0012   Epoch: 17   Global Step: 297500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:05,984-Speed 8857.62 samples/sec   Loss 3.6114   LearningRate 0.0012   Epoch: 17   Global Step: 297510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:07,104-Speed 9149.40 samples/sec   Loss 3.4914   LearningRate 0.0012   Epoch: 17   Global Step: 297520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:08,228-Speed 9108.88 samples/sec   Loss 3.5753   LearningRate 0.0012   Epoch: 17   Global Step: 297530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:09,357-Speed 9075.09 samples/sec   Loss 3.6490   LearningRate 0.0012   Epoch: 17   Global Step: 297540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:10,491-Speed 9041.34 samples/sec   Loss 3.6078   LearningRate 0.0012   Epoch: 17   Global Step: 297550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:11,625-Speed 9027.43 samples/sec   Loss 3.5389   LearningRate 0.0012   Epoch: 17   Global Step: 297560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:12,786-Speed 8834.15 samples/sec   Loss 3.5531   LearningRate 0.0012   Epoch: 17   Global Step: 297570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:13,907-Speed 9137.48 samples/sec   Loss 3.5443   LearningRate 0.0012   Epoch: 17   Global Step: 297580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:15,051-Speed 8960.74 samples/sec   Loss 3.6095   LearningRate 0.0012   Epoch: 17   Global Step: 297590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:16,170-Speed 9154.57 samples/sec   Loss 3.5415   LearningRate 0.0012   Epoch: 17   Global Step: 297600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:35:17,304-Speed 9034.03 samples/sec   Loss 3.6615   LearningRate 0.0012   Epoch: 17   Global Step: 297610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:18,428-Speed 9112.08 samples/sec   Loss 3.5897   LearningRate 0.0012   Epoch: 17   Global Step: 297620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:19,545-Speed 9177.62 samples/sec   Loss 3.6232   LearningRate 0.0012   Epoch: 17   Global Step: 297630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:20,661-Speed 9178.84 samples/sec   Loss 3.6919   LearningRate 0.0012   Epoch: 17   Global Step: 297640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:21,779-Speed 9163.51 samples/sec   Loss 3.6215   LearningRate 0.0012   Epoch: 17   Global Step: 297650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:22,943-Speed 8800.63 samples/sec   Loss 3.6527   LearningRate 0.0012   Epoch: 17   Global Step: 297660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:24,096-Speed 8888.55 samples/sec   Loss 3.6493   LearningRate 0.0012   Epoch: 17   Global Step: 297670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:25,227-Speed 9061.48 samples/sec   Loss 3.5445   LearningRate 0.0012   Epoch: 17   Global Step: 297680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:26,348-Speed 9135.94 samples/sec   Loss 3.6734   LearningRate 0.0012   Epoch: 17   Global Step: 297690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:27,459-Speed 9223.26 samples/sec   Loss 3.6014   LearningRate 0.0012   Epoch: 17   Global Step: 297700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:28,569-Speed 9234.43 samples/sec   Loss 3.5615   LearningRate 0.0012   Epoch: 17   Global Step: 297710   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:35:29,658-Speed 9401.39 samples/sec   Loss 3.5016   LearningRate 0.0012   Epoch: 17   Global Step: 297720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:30,757-Speed 9323.25 samples/sec   Loss 3.4956   LearningRate 0.0012   Epoch: 17   Global Step: 297730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:31,900-Speed 8968.42 samples/sec   Loss 3.5895   LearningRate 0.0012   Epoch: 17   Global Step: 297740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:33,039-Speed 8997.21 samples/sec   Loss 3.6489   LearningRate 0.0012   Epoch: 17   Global Step: 297750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:34,229-Speed 8613.60 samples/sec   Loss 3.5858   LearningRate 0.0012   Epoch: 17   Global Step: 297760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:35,364-Speed 9025.07 samples/sec   Loss 3.6519   LearningRate 0.0012   Epoch: 17   Global Step: 297770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:36,464-Speed 9311.81 samples/sec   Loss 3.6891   LearningRate 0.0012   Epoch: 17   Global Step: 297780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:37,603-Speed 8994.69 samples/sec   Loss 3.5909   LearningRate 0.0012   Epoch: 17   Global Step: 297790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:38,692-Speed 9407.33 samples/sec   Loss 3.6613   LearningRate 0.0012   Epoch: 17   Global Step: 297800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:39,843-Speed 8901.26 samples/sec   Loss 3.5782   LearningRate 0.0012   Epoch: 17   Global Step: 297810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:41,014-Speed 8753.44 samples/sec   Loss 3.6613   LearningRate 0.0012   Epoch: 17   Global Step: 297820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:42,125-Speed 9221.86 samples/sec   Loss 3.6099   LearningRate 0.0012   Epoch: 17   Global Step: 297830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:43,253-Speed 9080.03 samples/sec   Loss 3.5786   LearningRate 0.0012   Epoch: 17   Global Step: 297840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:44,389-Speed 9023.13 samples/sec   Loss 3.5783   LearningRate 0.0012   Epoch: 17   Global Step: 297850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:45,523-Speed 9031.07 samples/sec   Loss 3.6406   LearningRate 0.0012   Epoch: 17   Global Step: 297860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:46,658-Speed 9032.26 samples/sec   Loss 3.5616   LearningRate 0.0012   Epoch: 17   Global Step: 297870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:47,805-Speed 8927.62 samples/sec   Loss 3.6403   LearningRate 0.0012   Epoch: 17   Global Step: 297880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:48,950-Speed 8949.08 samples/sec   Loss 3.6061   LearningRate 0.0012   Epoch: 17   Global Step: 297890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:50,053-Speed 9294.68 samples/sec   Loss 3.6803   LearningRate 0.0012   Epoch: 17   Global Step: 297900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:51,153-Speed 9320.86 samples/sec   Loss 3.5675   LearningRate 0.0012   Epoch: 17   Global Step: 297910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:35:52,332-Speed 8688.40 samples/sec   Loss 3.5804   LearningRate 0.0012   Epoch: 17   Global Step: 297920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:35:53,465-Speed 9040.69 samples/sec   Loss 3.6144   LearningRate 0.0012   Epoch: 17   Global Step: 297930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:35:54,608-Speed 8961.18 samples/sec   Loss 3.6543   LearningRate 0.0012   Epoch: 17   Global Step: 297940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:35:55,765-Speed 8855.81 samples/sec   Loss 3.6131   LearningRate 0.0012   Epoch: 17   Global Step: 297950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:35:56,898-Speed 9048.96 samples/sec   Loss 3.6470   LearningRate 0.0012   Epoch: 17   Global Step: 297960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:35:58,019-Speed 9136.35 samples/sec   Loss 3.6963   LearningRate 0.0012   Epoch: 17   Global Step: 297970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:35:59,129-Speed 9227.82 samples/sec   Loss 3.6786   LearningRate 0.0012   Epoch: 17   Global Step: 297980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:36:00,312-Speed 8664.91 samples/sec   Loss 3.6096   LearningRate 0.0012   Epoch: 17   Global Step: 297990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:36:01,425-Speed 9207.15 samples/sec   Loss 3.7219   LearningRate 0.0012   Epoch: 17   Global Step: 298000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:36:23,429-[lfw][298000]XNorm: 6.631757
Training: 2022-04-11 23:36:23,429-[lfw][298000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-11 23:36:23,430-[lfw][298000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:36:48,935-[cfp_fp][298000]XNorm: 5.795037
Training: 2022-04-11 23:36:48,936-[cfp_fp][298000]Accuracy-Flip: 0.97186+-0.00888
Training: 2022-04-11 23:36:48,936-[cfp_fp][298000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:37:10,975-[agedb_30][298000]XNorm: 6.459667
Training: 2022-04-11 23:37:10,975-[agedb_30][298000]Accuracy-Flip: 0.97383+-0.00823
Training: 2022-04-11 23:37:10,976-[agedb_30][298000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:37:12,120-Speed 144.85 samples/sec   Loss 3.6005   LearningRate 0.0012   Epoch: 17   Global Step: 298010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:37:13,252-Speed 9048.65 samples/sec   Loss 3.6458   LearningRate 0.0012   Epoch: 17   Global Step: 298020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:14,346-Speed 9367.40 samples/sec   Loss 3.6031   LearningRate 0.0011   Epoch: 17   Global Step: 298030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:15,478-Speed 9051.44 samples/sec   Loss 3.5254   LearningRate 0.0011   Epoch: 17   Global Step: 298040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:16,629-Speed 8902.83 samples/sec   Loss 3.6564   LearningRate 0.0011   Epoch: 17   Global Step: 298050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:17,783-Speed 8875.55 samples/sec   Loss 3.6315   LearningRate 0.0011   Epoch: 17   Global Step: 298060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:18,893-Speed 9231.98 samples/sec   Loss 3.5010   LearningRate 0.0011   Epoch: 17   Global Step: 298070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:20,022-Speed 9081.58 samples/sec   Loss 3.5984   LearningRate 0.0011   Epoch: 17   Global Step: 298080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:21,126-Speed 9278.28 samples/sec   Loss 3.6468   LearningRate 0.0011   Epoch: 17   Global Step: 298090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:22,317-Speed 8607.53 samples/sec   Loss 3.6147   LearningRate 0.0011   Epoch: 17   Global Step: 298100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:23,424-Speed 9254.26 samples/sec   Loss 3.5280   LearningRate 0.0011   Epoch: 17   Global Step: 298110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:24,596-Speed 8739.16 samples/sec   Loss 3.6503   LearningRate 0.0011   Epoch: 17   Global Step: 298120   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:37:25,732-Speed 9016.02 samples/sec   Loss 3.6179   LearningRate 0.0011   Epoch: 17   Global Step: 298130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:26,883-Speed 8903.38 samples/sec   Loss 3.6171   LearningRate 0.0011   Epoch: 17   Global Step: 298140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:28,080-Speed 8558.26 samples/sec   Loss 3.5133   LearningRate 0.0011   Epoch: 17   Global Step: 298150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:29,203-Speed 9125.73 samples/sec   Loss 3.6155   LearningRate 0.0011   Epoch: 17   Global Step: 298160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:30,326-Speed 9124.75 samples/sec   Loss 3.6151   LearningRate 0.0011   Epoch: 17   Global Step: 298170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:31,418-Speed 9379.52 samples/sec   Loss 3.6133   LearningRate 0.0011   Epoch: 17   Global Step: 298180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:32,521-Speed 9291.68 samples/sec   Loss 3.6657   LearningRate 0.0011   Epoch: 17   Global Step: 298190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:33,649-Speed 9088.71 samples/sec   Loss 3.5520   LearningRate 0.0011   Epoch: 17   Global Step: 298200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:34,782-Speed 9041.03 samples/sec   Loss 3.5887   LearningRate 0.0011   Epoch: 17   Global Step: 298210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:35,861-Speed 9492.25 samples/sec   Loss 3.5993   LearningRate 0.0011   Epoch: 17   Global Step: 298220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:36,992-Speed 9059.28 samples/sec   Loss 3.6122   LearningRate 0.0011   Epoch: 17   Global Step: 298230   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:37:38,078-Speed 9440.09 samples/sec   Loss 3.6033   LearningRate 0.0011   Epoch: 17   Global Step: 298240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:39,215-Speed 9011.17 samples/sec   Loss 3.6208   LearningRate 0.0011   Epoch: 17   Global Step: 298250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:40,361-Speed 8938.52 samples/sec   Loss 3.5617   LearningRate 0.0011   Epoch: 17   Global Step: 298260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:41,463-Speed 9296.27 samples/sec   Loss 3.6070   LearningRate 0.0011   Epoch: 17   Global Step: 298270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:42,626-Speed 8815.47 samples/sec   Loss 3.6850   LearningRate 0.0011   Epoch: 17   Global Step: 298280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:43,801-Speed 8718.50 samples/sec   Loss 3.5222   LearningRate 0.0011   Epoch: 17   Global Step: 298290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:44,926-Speed 9106.75 samples/sec   Loss 3.6210   LearningRate 0.0011   Epoch: 17   Global Step: 298300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:46,074-Speed 8925.35 samples/sec   Loss 3.5667   LearningRate 0.0011   Epoch: 17   Global Step: 298310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:47,219-Speed 8948.69 samples/sec   Loss 3.6047   LearningRate 0.0011   Epoch: 17   Global Step: 298320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:48,362-Speed 8963.08 samples/sec   Loss 3.6740   LearningRate 0.0011   Epoch: 17   Global Step: 298330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:49,489-Speed 9096.48 samples/sec   Loss 3.6973   LearningRate 0.0011   Epoch: 17   Global Step: 298340   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:37:50,589-Speed 9318.23 samples/sec   Loss 3.6582   LearningRate 0.0011   Epoch: 17   Global Step: 298350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:51,674-Speed 9438.19 samples/sec   Loss 3.6321   LearningRate 0.0011   Epoch: 17   Global Step: 298360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:52,784-Speed 9230.03 samples/sec   Loss 3.6651   LearningRate 0.0011   Epoch: 17   Global Step: 298370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:53,930-Speed 8939.69 samples/sec   Loss 3.5855   LearningRate 0.0011   Epoch: 17   Global Step: 298380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:55,014-Speed 9450.72 samples/sec   Loss 3.5640   LearningRate 0.0011   Epoch: 17   Global Step: 298390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:56,151-Speed 9012.69 samples/sec   Loss 3.5742   LearningRate 0.0011   Epoch: 17   Global Step: 298400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:57,258-Speed 9258.98 samples/sec   Loss 3.5237   LearningRate 0.0011   Epoch: 17   Global Step: 298410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:58,369-Speed 9221.58 samples/sec   Loss 3.5318   LearningRate 0.0011   Epoch: 17   Global Step: 298420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:37:59,512-Speed 8970.98 samples/sec   Loss 3.6357   LearningRate 0.0011   Epoch: 17   Global Step: 298430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:00,641-Speed 9071.20 samples/sec   Loss 3.6371   LearningRate 0.0011   Epoch: 17   Global Step: 298440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:01,754-Speed 9205.64 samples/sec   Loss 3.6480   LearningRate 0.0011   Epoch: 17   Global Step: 298450   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:38:02,829-Speed 9537.17 samples/sec   Loss 3.5977   LearningRate 0.0011   Epoch: 17   Global Step: 298460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:03,950-Speed 9134.12 samples/sec   Loss 3.5088   LearningRate 0.0011   Epoch: 17   Global Step: 298470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:05,108-Speed 8848.80 samples/sec   Loss 3.6095   LearningRate 0.0011   Epoch: 17   Global Step: 298480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:06,208-Speed 9315.11 samples/sec   Loss 3.5673   LearningRate 0.0011   Epoch: 17   Global Step: 298490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:07,304-Speed 9350.89 samples/sec   Loss 3.5691   LearningRate 0.0011   Epoch: 17   Global Step: 298500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:08,435-Speed 9058.65 samples/sec   Loss 3.6123   LearningRate 0.0011   Epoch: 17   Global Step: 298510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:09,524-Speed 9400.91 samples/sec   Loss 3.6509   LearningRate 0.0011   Epoch: 17   Global Step: 298520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:10,649-Speed 9114.75 samples/sec   Loss 3.6200   LearningRate 0.0011   Epoch: 17   Global Step: 298530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:11,747-Speed 9326.46 samples/sec   Loss 3.6104   LearningRate 0.0011   Epoch: 17   Global Step: 298540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:12,895-Speed 8925.94 samples/sec   Loss 3.5617   LearningRate 0.0011   Epoch: 17   Global Step: 298550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:14,008-Speed 9205.44 samples/sec   Loss 3.6376   LearningRate 0.0011   Epoch: 17   Global Step: 298560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:38:15,183-Speed 8720.69 samples/sec   Loss 3.5723   LearningRate 0.0011   Epoch: 17   Global Step: 298570   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:38:16,313-Speed 9070.86 samples/sec   Loss 3.6008   LearningRate 0.0011   Epoch: 17   Global Step: 298580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:38:17,438-Speed 9103.17 samples/sec   Loss 3.5855   LearningRate 0.0011   Epoch: 17   Global Step: 298590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:18,561-Speed 9124.49 samples/sec   Loss 3.6068   LearningRate 0.0011   Epoch: 17   Global Step: 298600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:19,686-Speed 9114.77 samples/sec   Loss 3.6531   LearningRate 0.0011   Epoch: 17   Global Step: 298610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:20,800-Speed 9197.22 samples/sec   Loss 3.7123   LearningRate 0.0011   Epoch: 17   Global Step: 298620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:21,923-Speed 9118.45 samples/sec   Loss 3.5947   LearningRate 0.0011   Epoch: 17   Global Step: 298630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:23,028-Speed 9276.76 samples/sec   Loss 3.5781   LearningRate 0.0011   Epoch: 17   Global Step: 298640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:24,160-Speed 9052.67 samples/sec   Loss 3.5407   LearningRate 0.0011   Epoch: 17   Global Step: 298650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:25,317-Speed 8855.00 samples/sec   Loss 3.5586   LearningRate 0.0011   Epoch: 17   Global Step: 298660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:26,432-Speed 9188.77 samples/sec   Loss 3.6674   LearningRate 0.0011   Epoch: 17   Global Step: 298670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:27,598-Speed 8782.17 samples/sec   Loss 3.5519   LearningRate 0.0011   Epoch: 17   Global Step: 298680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:28,762-Speed 8799.73 samples/sec   Loss 3.6431   LearningRate 0.0011   Epoch: 17   Global Step: 298690   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:38:29,900-Speed 9007.36 samples/sec   Loss 3.5889   LearningRate 0.0011   Epoch: 17   Global Step: 298700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:38:31,009-Speed 9240.60 samples/sec   Loss 3.5888   LearningRate 0.0011   Epoch: 17   Global Step: 298710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:32,135-Speed 9102.59 samples/sec   Loss 3.5049   LearningRate 0.0011   Epoch: 17   Global Step: 298720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:33,295-Speed 8830.60 samples/sec   Loss 3.6426   LearningRate 0.0011   Epoch: 17   Global Step: 298730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:34,454-Speed 8842.23 samples/sec   Loss 3.5422   LearningRate 0.0011   Epoch: 17   Global Step: 298740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:35,606-Speed 8892.19 samples/sec   Loss 3.6232   LearningRate 0.0011   Epoch: 17   Global Step: 298750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:36,747-Speed 8979.80 samples/sec   Loss 3.6166   LearningRate 0.0011   Epoch: 17   Global Step: 298760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:37,817-Speed 9573.47 samples/sec   Loss 3.7409   LearningRate 0.0011   Epoch: 17   Global Step: 298770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:38,936-Speed 9159.20 samples/sec   Loss 3.5975   LearningRate 0.0011   Epoch: 17   Global Step: 298780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:40,069-Speed 9041.97 samples/sec   Loss 3.6352   LearningRate 0.0011   Epoch: 17   Global Step: 298790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:41,213-Speed 8956.26 samples/sec   Loss 3.6117   LearningRate 0.0011   Epoch: 17   Global Step: 298800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:42,358-Speed 8949.54 samples/sec   Loss 3.6179   LearningRate 0.0011   Epoch: 17   Global Step: 298810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:38:43,517-Speed 8837.67 samples/sec   Loss 3.6308   LearningRate 0.0011   Epoch: 17   Global Step: 298820   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:38:44,634-Speed 9177.17 samples/sec   Loss 3.6592   LearningRate 0.0011   Epoch: 17   Global Step: 298830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:45,783-Speed 8916.03 samples/sec   Loss 3.5287   LearningRate 0.0011   Epoch: 17   Global Step: 298840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:46,925-Speed 8974.24 samples/sec   Loss 3.5864   LearningRate 0.0011   Epoch: 17   Global Step: 298850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:48,122-Speed 8556.49 samples/sec   Loss 3.5826   LearningRate 0.0011   Epoch: 17   Global Step: 298860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:49,247-Speed 9109.03 samples/sec   Loss 3.6007   LearningRate 0.0011   Epoch: 17   Global Step: 298870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:50,341-Speed 9362.46 samples/sec   Loss 3.6316   LearningRate 0.0011   Epoch: 17   Global Step: 298880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:51,452-Speed 9224.72 samples/sec   Loss 3.5399   LearningRate 0.0011   Epoch: 17   Global Step: 298890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:52,579-Speed 9098.10 samples/sec   Loss 3.5962   LearningRate 0.0011   Epoch: 17   Global Step: 298900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:53,706-Speed 9091.06 samples/sec   Loss 3.6100   LearningRate 0.0011   Epoch: 17   Global Step: 298910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:54,805-Speed 9321.75 samples/sec   Loss 3.5987   LearningRate 0.0011   Epoch: 17   Global Step: 298920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:38:55,877-Speed 9551.91 samples/sec   Loss 3.6798   LearningRate 0.0011   Epoch: 17   Global Step: 298930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:38:57,011-Speed 9036.95 samples/sec   Loss 3.5596   LearningRate 0.0011   Epoch: 17   Global Step: 298940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:38:58,169-Speed 8850.37 samples/sec   Loss 3.6420   LearningRate 0.0011   Epoch: 17   Global Step: 298950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:38:59,340-Speed 8749.01 samples/sec   Loss 3.6034   LearningRate 0.0011   Epoch: 17   Global Step: 298960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:00,458-Speed 9162.18 samples/sec   Loss 3.5973   LearningRate 0.0011   Epoch: 17   Global Step: 298970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:01,601-Speed 8961.89 samples/sec   Loss 3.6261   LearningRate 0.0011   Epoch: 17   Global Step: 298980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:02,707-Speed 9269.53 samples/sec   Loss 3.6116   LearningRate 0.0011   Epoch: 17   Global Step: 298990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:03,858-Speed 8908.44 samples/sec   Loss 3.6040   LearningRate 0.0011   Epoch: 17   Global Step: 299000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:04,947-Speed 9408.14 samples/sec   Loss 3.5902   LearningRate 0.0011   Epoch: 17   Global Step: 299010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:06,112-Speed 8795.36 samples/sec   Loss 3.6160   LearningRate 0.0011   Epoch: 17   Global Step: 299020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:07,284-Speed 8740.82 samples/sec   Loss 3.6576   LearningRate 0.0011   Epoch: 17   Global Step: 299030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:08,457-Speed 8731.27 samples/sec   Loss 3.5185   LearningRate 0.0011   Epoch: 17   Global Step: 299040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:09,598-Speed 8977.64 samples/sec   Loss 3.6508   LearningRate 0.0011   Epoch: 17   Global Step: 299050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:10,726-Speed 9089.81 samples/sec   Loss 3.5522   LearningRate 0.0011   Epoch: 17   Global Step: 299060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:11,907-Speed 8678.29 samples/sec   Loss 3.5425   LearningRate 0.0011   Epoch: 17   Global Step: 299070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:13,035-Speed 9084.49 samples/sec   Loss 3.5453   LearningRate 0.0011   Epoch: 17   Global Step: 299080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:14,206-Speed 8750.02 samples/sec   Loss 3.5313   LearningRate 0.0011   Epoch: 17   Global Step: 299090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:15,344-Speed 8999.81 samples/sec   Loss 3.5900   LearningRate 0.0011   Epoch: 17   Global Step: 299100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:16,532-Speed 8625.37 samples/sec   Loss 3.6129   LearningRate 0.0011   Epoch: 17   Global Step: 299110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:17,648-Speed 9181.84 samples/sec   Loss 3.5653   LearningRate 0.0011   Epoch: 17   Global Step: 299120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:18,840-Speed 8594.86 samples/sec   Loss 3.5693   LearningRate 0.0011   Epoch: 17   Global Step: 299130   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:39:19,945-Speed 9279.27 samples/sec   Loss 3.6565   LearningRate 0.0011   Epoch: 17   Global Step: 299140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:21,062-Speed 9171.05 samples/sec   Loss 3.6664   LearningRate 0.0011   Epoch: 17   Global Step: 299150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:22,218-Speed 8857.03 samples/sec   Loss 3.6756   LearningRate 0.0011   Epoch: 17   Global Step: 299160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:23,388-Speed 8761.50 samples/sec   Loss 3.6634   LearningRate 0.0011   Epoch: 17   Global Step: 299170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:24,521-Speed 9043.54 samples/sec   Loss 3.5535   LearningRate 0.0011   Epoch: 17   Global Step: 299180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:25,616-Speed 9351.23 samples/sec   Loss 3.5770   LearningRate 0.0011   Epoch: 17   Global Step: 299190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:26,698-Speed 9474.25 samples/sec   Loss 3.6221   LearningRate 0.0011   Epoch: 17   Global Step: 299200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:27,793-Speed 9349.78 samples/sec   Loss 3.6226   LearningRate 0.0011   Epoch: 17   Global Step: 299210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:28,874-Speed 9481.33 samples/sec   Loss 3.5792   LearningRate 0.0011   Epoch: 17   Global Step: 299220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:30,028-Speed 8882.48 samples/sec   Loss 3.6921   LearningRate 0.0011   Epoch: 17   Global Step: 299230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:31,155-Speed 9090.94 samples/sec   Loss 3.6231   LearningRate 0.0011   Epoch: 17   Global Step: 299240   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:39:32,249-Speed 9368.23 samples/sec   Loss 3.6282   LearningRate 0.0011   Epoch: 17   Global Step: 299250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:33,350-Speed 9307.68 samples/sec   Loss 3.5401   LearningRate 0.0011   Epoch: 17   Global Step: 299260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:34,469-Speed 9156.57 samples/sec   Loss 3.6108   LearningRate 0.0011   Epoch: 17   Global Step: 299270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:35,530-Speed 9652.13 samples/sec   Loss 3.6378   LearningRate 0.0011   Epoch: 17   Global Step: 299280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:36,649-Speed 9156.43 samples/sec   Loss 3.5635   LearningRate 0.0011   Epoch: 17   Global Step: 299290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:37,764-Speed 9189.76 samples/sec   Loss 3.5516   LearningRate 0.0011   Epoch: 17   Global Step: 299300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:38,902-Speed 9004.63 samples/sec   Loss 3.6261   LearningRate 0.0011   Epoch: 17   Global Step: 299310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:40,005-Speed 9284.80 samples/sec   Loss 3.6522   LearningRate 0.0011   Epoch: 17   Global Step: 299320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:41,121-Speed 9183.90 samples/sec   Loss 3.5994   LearningRate 0.0011   Epoch: 17   Global Step: 299330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:42,250-Speed 9074.27 samples/sec   Loss 3.6523   LearningRate 0.0011   Epoch: 17   Global Step: 299340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:43,410-Speed 8837.06 samples/sec   Loss 3.5429   LearningRate 0.0011   Epoch: 17   Global Step: 299350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:44,536-Speed 9095.51 samples/sec   Loss 3.5439   LearningRate 0.0011   Epoch: 17   Global Step: 299360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:45,627-Speed 9394.84 samples/sec   Loss 3.7330   LearningRate 0.0011   Epoch: 17   Global Step: 299370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:46,775-Speed 8920.62 samples/sec   Loss 3.5339   LearningRate 0.0011   Epoch: 17   Global Step: 299380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:39:47,880-Speed 9281.34 samples/sec   Loss 3.6576   LearningRate 0.0011   Epoch: 17   Global Step: 299390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:48,961-Speed 9475.25 samples/sec   Loss 3.5864   LearningRate 0.0011   Epoch: 17   Global Step: 299400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:50,122-Speed 8826.57 samples/sec   Loss 3.6921   LearningRate 0.0011   Epoch: 17   Global Step: 299410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:51,253-Speed 9063.46 samples/sec   Loss 3.6371   LearningRate 0.0011   Epoch: 17   Global Step: 299420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:52,367-Speed 9195.73 samples/sec   Loss 3.6203   LearningRate 0.0011   Epoch: 17   Global Step: 299430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:53,460-Speed 9374.89 samples/sec   Loss 3.6608   LearningRate 0.0011   Epoch: 17   Global Step: 299440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:54,570-Speed 9224.90 samples/sec   Loss 3.5611   LearningRate 0.0011   Epoch: 17   Global Step: 299450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:55,722-Speed 8896.64 samples/sec   Loss 3.5512   LearningRate 0.0011   Epoch: 17   Global Step: 299460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:56,863-Speed 8980.16 samples/sec   Loss 3.7280   LearningRate 0.0011   Epoch: 17   Global Step: 299470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:57,989-Speed 9100.24 samples/sec   Loss 3.6364   LearningRate 0.0011   Epoch: 17   Global Step: 299480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:39:59,113-Speed 9113.40 samples/sec   Loss 3.5930   LearningRate 0.0011   Epoch: 17   Global Step: 299490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:40:00,241-Speed 9081.44 samples/sec   Loss 3.6128   LearningRate 0.0011   Epoch: 17   Global Step: 299500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:01,336-Speed 9358.28 samples/sec   Loss 3.6399   LearningRate 0.0011   Epoch: 17   Global Step: 299510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:02,490-Speed 8880.27 samples/sec   Loss 3.7003   LearningRate 0.0011   Epoch: 17   Global Step: 299520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:03,606-Speed 9181.32 samples/sec   Loss 3.6184   LearningRate 0.0011   Epoch: 17   Global Step: 299530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:04,748-Speed 8976.55 samples/sec   Loss 3.6087   LearningRate 0.0011   Epoch: 17   Global Step: 299540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:40:05,886-Speed 9001.35 samples/sec   Loss 3.6049   LearningRate 0.0011   Epoch: 17   Global Step: 299550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:40:06,998-Speed 9217.01 samples/sec   Loss 3.5763   LearningRate 0.0011   Epoch: 17   Global Step: 299560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:40:08,202-Speed 8506.58 samples/sec   Loss 3.6215   LearningRate 0.0011   Epoch: 17   Global Step: 299570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:40:09,341-Speed 8999.25 samples/sec   Loss 3.5845   LearningRate 0.0011   Epoch: 17   Global Step: 299580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:40:10,486-Speed 8951.42 samples/sec   Loss 3.5567   LearningRate 0.0011   Epoch: 17   Global Step: 299590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:40:11,632-Speed 8936.39 samples/sec   Loss 3.4977   LearningRate 0.0011   Epoch: 17   Global Step: 299600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:40:12,748-Speed 9186.54 samples/sec   Loss 3.5731   LearningRate 0.0011   Epoch: 17   Global Step: 299610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:40:13,968-Speed 8399.71 samples/sec   Loss 3.6239   LearningRate 0.0010   Epoch: 17   Global Step: 299620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:40:15,070-Speed 9293.85 samples/sec   Loss 3.6547   LearningRate 0.0010   Epoch: 17   Global Step: 299630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:40:16,212-Speed 8976.93 samples/sec   Loss 3.5759   LearningRate 0.0010   Epoch: 17   Global Step: 299640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:17,296-Speed 9449.21 samples/sec   Loss 3.6793   LearningRate 0.0010   Epoch: 17   Global Step: 299650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:18,398-Speed 9292.59 samples/sec   Loss 3.6368   LearningRate 0.0010   Epoch: 17   Global Step: 299660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:19,564-Speed 8792.52 samples/sec   Loss 3.5945   LearningRate 0.0010   Epoch: 17   Global Step: 299670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:20,696-Speed 9055.58 samples/sec   Loss 3.7155   LearningRate 0.0010   Epoch: 17   Global Step: 299680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:21,861-Speed 8788.84 samples/sec   Loss 3.6179   LearningRate 0.0010   Epoch: 17   Global Step: 299690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:22,994-Speed 9048.76 samples/sec   Loss 3.5238   LearningRate 0.0010   Epoch: 17   Global Step: 299700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:24,150-Speed 8857.41 samples/sec   Loss 3.6400   LearningRate 0.0010   Epoch: 17   Global Step: 299710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:25,298-Speed 8930.73 samples/sec   Loss 3.5985   LearningRate 0.0010   Epoch: 17   Global Step: 299720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:26,448-Speed 8907.67 samples/sec   Loss 3.6178   LearningRate 0.0010   Epoch: 17   Global Step: 299730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:27,534-Speed 9433.43 samples/sec   Loss 3.6059   LearningRate 0.0010   Epoch: 17   Global Step: 299740   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:40:28,680-Speed 8941.29 samples/sec   Loss 3.6095   LearningRate 0.0010   Epoch: 17   Global Step: 299750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:29,829-Speed 8919.59 samples/sec   Loss 3.5727   LearningRate 0.0010   Epoch: 17   Global Step: 299760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:30,968-Speed 8995.25 samples/sec   Loss 3.5922   LearningRate 0.0010   Epoch: 17   Global Step: 299770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:32,097-Speed 9079.28 samples/sec   Loss 3.6348   LearningRate 0.0010   Epoch: 17   Global Step: 299780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:33,227-Speed 9065.22 samples/sec   Loss 3.6242   LearningRate 0.0010   Epoch: 17   Global Step: 299790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:34,337-Speed 9230.66 samples/sec   Loss 3.5566   LearningRate 0.0010   Epoch: 17   Global Step: 299800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:35,478-Speed 8981.79 samples/sec   Loss 3.6526   LearningRate 0.0010   Epoch: 17   Global Step: 299810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:36,599-Speed 9141.45 samples/sec   Loss 3.5958   LearningRate 0.0010   Epoch: 17   Global Step: 299820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:37,704-Speed 9265.46 samples/sec   Loss 3.5292   LearningRate 0.0010   Epoch: 17   Global Step: 299830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:38,848-Speed 8962.54 samples/sec   Loss 3.6860   LearningRate 0.0010   Epoch: 17   Global Step: 299840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:39,962-Speed 9190.04 samples/sec   Loss 3.5658   LearningRate 0.0010   Epoch: 17   Global Step: 299850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:41,133-Speed 8754.09 samples/sec   Loss 3.6207   LearningRate 0.0010   Epoch: 17   Global Step: 299860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:42,244-Speed 9217.47 samples/sec   Loss 3.5566   LearningRate 0.0010   Epoch: 17   Global Step: 299870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:43,376-Speed 9056.51 samples/sec   Loss 3.6609   LearningRate 0.0010   Epoch: 17   Global Step: 299880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:44,570-Speed 8583.96 samples/sec   Loss 3.6316   LearningRate 0.0010   Epoch: 17   Global Step: 299890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:45,666-Speed 9345.93 samples/sec   Loss 3.6014   LearningRate 0.0010   Epoch: 17   Global Step: 299900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:46,788-Speed 9130.39 samples/sec   Loss 3.6667   LearningRate 0.0010   Epoch: 17   Global Step: 299910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:47,942-Speed 8886.49 samples/sec   Loss 3.6703   LearningRate 0.0010   Epoch: 17   Global Step: 299920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:49,075-Speed 9038.31 samples/sec   Loss 3.5757   LearningRate 0.0010   Epoch: 17   Global Step: 299930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:50,199-Speed 9115.55 samples/sec   Loss 3.6126   LearningRate 0.0010   Epoch: 17   Global Step: 299940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:51,327-Speed 9084.87 samples/sec   Loss 3.5398   LearningRate 0.0010   Epoch: 17   Global Step: 299950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:40:52,447-Speed 9146.51 samples/sec   Loss 3.5903   LearningRate 0.0010   Epoch: 17   Global Step: 299960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:40:53,549-Speed 9296.98 samples/sec   Loss 3.5906   LearningRate 0.0010   Epoch: 17   Global Step: 299970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:54,642-Speed 9378.46 samples/sec   Loss 3.5982   LearningRate 0.0010   Epoch: 17   Global Step: 299980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:55,769-Speed 9093.01 samples/sec   Loss 3.5253   LearningRate 0.0010   Epoch: 17   Global Step: 299990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:40:56,892-Speed 9121.10 samples/sec   Loss 3.5668   LearningRate 0.0010   Epoch: 17   Global Step: 300000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:41:18,848-[lfw][300000]XNorm: 6.630975
Training: 2022-04-11 23:41:18,849-[lfw][300000]Accuracy-Flip: 0.99700+-0.00287
Training: 2022-04-11 23:41:18,850-[lfw][300000]Accuracy-Highest: 0.99733
Training: 2022-04-11 23:41:44,291-[cfp_fp][300000]XNorm: 5.798590
Training: 2022-04-11 23:41:44,291-[cfp_fp][300000]Accuracy-Flip: 0.97286+-0.00908
Training: 2022-04-11 23:41:44,292-[cfp_fp][300000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:42:06,178-[agedb_30][300000]XNorm: 6.468586
Training: 2022-04-11 23:42:06,179-[agedb_30][300000]Accuracy-Flip: 0.97083+-0.00814
Training: 2022-04-11 23:42:06,180-[agedb_30][300000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:42:07,287-Speed 145.47 samples/sec   Loss 3.6515   LearningRate 0.0010   Epoch: 17   Global Step: 300010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:08,418-Speed 9057.17 samples/sec   Loss 3.6476   LearningRate 0.0010   Epoch: 17   Global Step: 300020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:09,534-Speed 9180.64 samples/sec   Loss 3.6389   LearningRate 0.0010   Epoch: 17   Global Step: 300030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:10,620-Speed 9433.08 samples/sec   Loss 3.5735   LearningRate 0.0010   Epoch: 17   Global Step: 300040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:11,730-Speed 9229.80 samples/sec   Loss 3.6222   LearningRate 0.0010   Epoch: 17   Global Step: 300050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:12,851-Speed 9138.48 samples/sec   Loss 3.5939   LearningRate 0.0010   Epoch: 17   Global Step: 300060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:13,994-Speed 8966.22 samples/sec   Loss 3.6879   LearningRate 0.0010   Epoch: 17   Global Step: 300070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:15,167-Speed 8733.54 samples/sec   Loss 3.5766   LearningRate 0.0010   Epoch: 17   Global Step: 300080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:16,305-Speed 9000.65 samples/sec   Loss 3.6813   LearningRate 0.0010   Epoch: 17   Global Step: 300090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:17,413-Speed 9246.09 samples/sec   Loss 3.7307   LearningRate 0.0010   Epoch: 17   Global Step: 300100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:18,504-Speed 9397.35 samples/sec   Loss 3.6408   LearningRate 0.0010   Epoch: 17   Global Step: 300110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:19,608-Speed 9281.78 samples/sec   Loss 3.6052   LearningRate 0.0010   Epoch: 17   Global Step: 300120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:20,701-Speed 9377.10 samples/sec   Loss 3.5985   LearningRate 0.0010   Epoch: 17   Global Step: 300130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:21,827-Speed 9097.10 samples/sec   Loss 3.5886   LearningRate 0.0010   Epoch: 17   Global Step: 300140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:22,927-Speed 9315.69 samples/sec   Loss 3.6677   LearningRate 0.0010   Epoch: 17   Global Step: 300150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:24,109-Speed 8665.14 samples/sec   Loss 3.5269   LearningRate 0.0010   Epoch: 17   Global Step: 300160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:25,194-Speed 9449.27 samples/sec   Loss 3.5638   LearningRate 0.0010   Epoch: 17   Global Step: 300170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:26,346-Speed 8890.98 samples/sec   Loss 3.6593   LearningRate 0.0010   Epoch: 17   Global Step: 300180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:27,479-Speed 9039.98 samples/sec   Loss 3.7380   LearningRate 0.0010   Epoch: 17   Global Step: 300190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:28,642-Speed 8816.59 samples/sec   Loss 3.5506   LearningRate 0.0010   Epoch: 17   Global Step: 300200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:29,762-Speed 9147.63 samples/sec   Loss 3.6100   LearningRate 0.0010   Epoch: 17   Global Step: 300210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:30,885-Speed 9124.30 samples/sec   Loss 3.6183   LearningRate 0.0010   Epoch: 17   Global Step: 300220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:32,051-Speed 8782.38 samples/sec   Loss 3.6770   LearningRate 0.0010   Epoch: 17   Global Step: 300230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:33,163-Speed 9216.53 samples/sec   Loss 3.6136   LearningRate 0.0010   Epoch: 17   Global Step: 300240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:34,266-Speed 9289.37 samples/sec   Loss 3.5962   LearningRate 0.0010   Epoch: 17   Global Step: 300250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:35,358-Speed 9383.94 samples/sec   Loss 3.6556   LearningRate 0.0010   Epoch: 17   Global Step: 300260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:36,461-Speed 9290.59 samples/sec   Loss 3.6988   LearningRate 0.0010   Epoch: 17   Global Step: 300270   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:42:37,598-Speed 9012.39 samples/sec   Loss 3.5403   LearningRate 0.0010   Epoch: 17   Global Step: 300280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:38,715-Speed 9171.52 samples/sec   Loss 3.6185   LearningRate 0.0010   Epoch: 17   Global Step: 300290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:39,817-Speed 9298.75 samples/sec   Loss 3.6248   LearningRate 0.0010   Epoch: 17   Global Step: 300300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:40,945-Speed 9078.16 samples/sec   Loss 3.6627   LearningRate 0.0010   Epoch: 17   Global Step: 300310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:42,095-Speed 8913.48 samples/sec   Loss 3.6659   LearningRate 0.0010   Epoch: 17   Global Step: 300320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:43,241-Speed 8941.49 samples/sec   Loss 3.5771   LearningRate 0.0010   Epoch: 17   Global Step: 300330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:44,448-Speed 8484.39 samples/sec   Loss 3.5368   LearningRate 0.0010   Epoch: 17   Global Step: 300340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:45,517-Speed 9585.74 samples/sec   Loss 3.6665   LearningRate 0.0010   Epoch: 17   Global Step: 300350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:46,641-Speed 9114.76 samples/sec   Loss 3.5153   LearningRate 0.0010   Epoch: 17   Global Step: 300360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:47,766-Speed 9109.82 samples/sec   Loss 3.5904   LearningRate 0.0010   Epoch: 17   Global Step: 300370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:48,933-Speed 8781.19 samples/sec   Loss 3.5545   LearningRate 0.0010   Epoch: 17   Global Step: 300380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:50,070-Speed 9012.78 samples/sec   Loss 3.5860   LearningRate 0.0010   Epoch: 17   Global Step: 300390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:51,154-Speed 9450.15 samples/sec   Loss 3.5096   LearningRate 0.0010   Epoch: 17   Global Step: 300400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:52,267-Speed 9208.72 samples/sec   Loss 3.6048   LearningRate 0.0010   Epoch: 17   Global Step: 300410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:53,404-Speed 9013.31 samples/sec   Loss 3.6340   LearningRate 0.0010   Epoch: 17   Global Step: 300420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:42:54,949-Speed 6627.55 samples/sec   Loss 3.6367   LearningRate 0.0010   Epoch: 17   Global Step: 300430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:25,541-Speed 334.75 samples/sec   Loss 3.6035   LearningRate 0.0010   Epoch: 18   Global Step: 300440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:26,713-Speed 8741.42 samples/sec   Loss 3.2657   LearningRate 0.0010   Epoch: 18   Global Step: 300450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:27,904-Speed 8605.68 samples/sec   Loss 3.3355   LearningRate 0.0010   Epoch: 18   Global Step: 300460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:29,443-Speed 6655.46 samples/sec   Loss 3.3000   LearningRate 0.0010   Epoch: 18   Global Step: 300470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:43:31,200-Speed 5829.15 samples/sec   Loss 3.3441   LearningRate 0.0010   Epoch: 18   Global Step: 300480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:43:32,513-Speed 7806.00 samples/sec   Loss 3.4213   LearningRate 0.0010   Epoch: 18   Global Step: 300490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:43:33,871-Speed 7549.70 samples/sec   Loss 3.3405   LearningRate 0.0010   Epoch: 18   Global Step: 300500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:43:35,045-Speed 8722.42 samples/sec   Loss 3.3965   LearningRate 0.0010   Epoch: 18   Global Step: 300510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:43:36,192-Speed 8934.70 samples/sec   Loss 3.3640   LearningRate 0.0010   Epoch: 18   Global Step: 300520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:43:37,317-Speed 9107.25 samples/sec   Loss 3.3693   LearningRate 0.0010   Epoch: 18   Global Step: 300530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:43:38,450-Speed 9042.20 samples/sec   Loss 3.3966   LearningRate 0.0010   Epoch: 18   Global Step: 300540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:43:39,595-Speed 8946.83 samples/sec   Loss 3.3274   LearningRate 0.0010   Epoch: 18   Global Step: 300550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:43:40,796-Speed 8532.55 samples/sec   Loss 3.3346   LearningRate 0.0010   Epoch: 18   Global Step: 300560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:43:42,032-Speed 8288.16 samples/sec   Loss 3.4010   LearningRate 0.0010   Epoch: 18   Global Step: 300570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:43,192-Speed 8832.69 samples/sec   Loss 3.3158   LearningRate 0.0010   Epoch: 18   Global Step: 300580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:44,385-Speed 8586.26 samples/sec   Loss 3.3223   LearningRate 0.0010   Epoch: 18   Global Step: 300590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:45,510-Speed 9108.28 samples/sec   Loss 3.3848   LearningRate 0.0010   Epoch: 18   Global Step: 300600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:46,658-Speed 8924.85 samples/sec   Loss 3.3825   LearningRate 0.0010   Epoch: 18   Global Step: 300610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:47,779-Speed 9142.31 samples/sec   Loss 3.3659   LearningRate 0.0010   Epoch: 18   Global Step: 300620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:48,920-Speed 8975.25 samples/sec   Loss 3.3726   LearningRate 0.0010   Epoch: 18   Global Step: 300630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:50,022-Speed 9299.66 samples/sec   Loss 3.3768   LearningRate 0.0010   Epoch: 18   Global Step: 300640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:51,193-Speed 8750.27 samples/sec   Loss 3.3941   LearningRate 0.0010   Epoch: 18   Global Step: 300650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:52,320-Speed 9090.47 samples/sec   Loss 3.3591   LearningRate 0.0010   Epoch: 18   Global Step: 300660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:53,551-Speed 8327.13 samples/sec   Loss 3.2969   LearningRate 0.0010   Epoch: 18   Global Step: 300670   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:43:54,652-Speed 9303.35 samples/sec   Loss 3.3674   LearningRate 0.0010   Epoch: 18   Global Step: 300680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:55,812-Speed 8829.92 samples/sec   Loss 3.3075   LearningRate 0.0010   Epoch: 18   Global Step: 300690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:56,961-Speed 8920.10 samples/sec   Loss 3.3306   LearningRate 0.0010   Epoch: 18   Global Step: 300700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:58,108-Speed 8929.14 samples/sec   Loss 3.3264   LearningRate 0.0010   Epoch: 18   Global Step: 300710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:43:59,255-Speed 8929.96 samples/sec   Loss 3.3798   LearningRate 0.0010   Epoch: 18   Global Step: 300720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:00,440-Speed 8645.32 samples/sec   Loss 3.3467   LearningRate 0.0010   Epoch: 18   Global Step: 300730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:01,569-Speed 9076.76 samples/sec   Loss 3.3417   LearningRate 0.0010   Epoch: 18   Global Step: 300740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:02,719-Speed 8910.82 samples/sec   Loss 3.3414   LearningRate 0.0010   Epoch: 18   Global Step: 300750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:03,837-Speed 9170.16 samples/sec   Loss 3.3057   LearningRate 0.0010   Epoch: 18   Global Step: 300760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:04,989-Speed 8893.77 samples/sec   Loss 3.3293   LearningRate 0.0010   Epoch: 18   Global Step: 300770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:06,151-Speed 8817.03 samples/sec   Loss 3.4128   LearningRate 0.0010   Epoch: 18   Global Step: 300780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:07,334-Speed 8659.63 samples/sec   Loss 3.4016   LearningRate 0.0010   Epoch: 18   Global Step: 300790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:08,537-Speed 8516.62 samples/sec   Loss 3.3433   LearningRate 0.0010   Epoch: 18   Global Step: 300800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:09,812-Speed 8038.30 samples/sec   Loss 3.3275   LearningRate 0.0010   Epoch: 18   Global Step: 300810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:10,935-Speed 9124.14 samples/sec   Loss 3.3211   LearningRate 0.0010   Epoch: 18   Global Step: 300820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:12,070-Speed 9032.49 samples/sec   Loss 3.3781   LearningRate 0.0010   Epoch: 18   Global Step: 300830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:13,196-Speed 9094.92 samples/sec   Loss 3.4035   LearningRate 0.0010   Epoch: 18   Global Step: 300840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:14,332-Speed 9016.34 samples/sec   Loss 3.3670   LearningRate 0.0010   Epoch: 18   Global Step: 300850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:15,469-Speed 9012.67 samples/sec   Loss 3.3672   LearningRate 0.0010   Epoch: 18   Global Step: 300860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:16,580-Speed 9221.30 samples/sec   Loss 3.4176   LearningRate 0.0010   Epoch: 18   Global Step: 300870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:17,707-Speed 9088.58 samples/sec   Loss 3.3276   LearningRate 0.0010   Epoch: 18   Global Step: 300880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:18,842-Speed 9033.32 samples/sec   Loss 3.3801   LearningRate 0.0010   Epoch: 18   Global Step: 300890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:19,993-Speed 8897.63 samples/sec   Loss 3.3534   LearningRate 0.0010   Epoch: 18   Global Step: 300900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:21,139-Speed 8943.58 samples/sec   Loss 3.3669   LearningRate 0.0010   Epoch: 18   Global Step: 300910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:22,266-Speed 9087.84 samples/sec   Loss 3.4464   LearningRate 0.0010   Epoch: 18   Global Step: 300920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:23,407-Speed 8985.89 samples/sec   Loss 3.3136   LearningRate 0.0010   Epoch: 18   Global Step: 300930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:24,545-Speed 9002.29 samples/sec   Loss 3.3065   LearningRate 0.0010   Epoch: 18   Global Step: 300940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:25,667-Speed 9131.14 samples/sec   Loss 3.3397   LearningRate 0.0010   Epoch: 18   Global Step: 300950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:26,812-Speed 8945.31 samples/sec   Loss 3.3679   LearningRate 0.0010   Epoch: 18   Global Step: 300960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:27,945-Speed 9043.97 samples/sec   Loss 3.3413   LearningRate 0.0010   Epoch: 18   Global Step: 300970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:29,058-Speed 9209.58 samples/sec   Loss 3.3143   LearningRate 0.0010   Epoch: 18   Global Step: 300980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:30,172-Speed 9191.43 samples/sec   Loss 3.3198   LearningRate 0.0010   Epoch: 18   Global Step: 300990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:31,321-Speed 8919.23 samples/sec   Loss 3.2829   LearningRate 0.0010   Epoch: 18   Global Step: 301000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:32,489-Speed 8773.58 samples/sec   Loss 3.4486   LearningRate 0.0010   Epoch: 18   Global Step: 301010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:33,651-Speed 8815.49 samples/sec   Loss 3.3067   LearningRate 0.0010   Epoch: 18   Global Step: 301020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:34,765-Speed 9200.35 samples/sec   Loss 3.2889   LearningRate 0.0010   Epoch: 18   Global Step: 301030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:35,884-Speed 9159.20 samples/sec   Loss 3.2962   LearningRate 0.0010   Epoch: 18   Global Step: 301040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:37,026-Speed 8966.12 samples/sec   Loss 3.3424   LearningRate 0.0010   Epoch: 18   Global Step: 301050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:38,169-Speed 8969.03 samples/sec   Loss 3.3606   LearningRate 0.0010   Epoch: 18   Global Step: 301060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:39,309-Speed 8986.49 samples/sec   Loss 3.3814   LearningRate 0.0010   Epoch: 18   Global Step: 301070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:40,470-Speed 8826.64 samples/sec   Loss 3.3861   LearningRate 0.0010   Epoch: 18   Global Step: 301080   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:44:41,599-Speed 9076.22 samples/sec   Loss 3.3470   LearningRate 0.0010   Epoch: 18   Global Step: 301090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:42,736-Speed 9006.26 samples/sec   Loss 3.3765   LearningRate 0.0010   Epoch: 18   Global Step: 301100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:43,866-Speed 9071.46 samples/sec   Loss 3.4373   LearningRate 0.0010   Epoch: 18   Global Step: 301110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:44,975-Speed 9238.74 samples/sec   Loss 3.3831   LearningRate 0.0010   Epoch: 18   Global Step: 301120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:46,106-Speed 9057.80 samples/sec   Loss 3.3215   LearningRate 0.0010   Epoch: 18   Global Step: 301130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:47,232-Speed 9102.57 samples/sec   Loss 3.2248   LearningRate 0.0010   Epoch: 18   Global Step: 301140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:48,522-Speed 7939.82 samples/sec   Loss 3.3714   LearningRate 0.0010   Epoch: 18   Global Step: 301150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:49,614-Speed 9386.26 samples/sec   Loss 3.3030   LearningRate 0.0010   Epoch: 18   Global Step: 301160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:50,915-Speed 7875.62 samples/sec   Loss 3.3846   LearningRate 0.0010   Epoch: 18   Global Step: 301170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:52,359-Speed 7093.43 samples/sec   Loss 3.2810   LearningRate 0.0010   Epoch: 18   Global Step: 301180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:53,639-Speed 8004.41 samples/sec   Loss 3.3859   LearningRate 0.0010   Epoch: 18   Global Step: 301190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:54,967-Speed 7714.59 samples/sec   Loss 3.3231   LearningRate 0.0010   Epoch: 18   Global Step: 301200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:56,104-Speed 9006.36 samples/sec   Loss 3.4226   LearningRate 0.0010   Epoch: 18   Global Step: 301210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:57,233-Speed 9078.04 samples/sec   Loss 3.3780   LearningRate 0.0010   Epoch: 18   Global Step: 301220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:58,333-Speed 9316.64 samples/sec   Loss 3.3492   LearningRate 0.0010   Epoch: 18   Global Step: 301230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:44:59,456-Speed 9123.25 samples/sec   Loss 3.3328   LearningRate 0.0010   Epoch: 18   Global Step: 301240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:00,609-Speed 8890.08 samples/sec   Loss 3.3411   LearningRate 0.0010   Epoch: 18   Global Step: 301250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:01,672-Speed 9641.67 samples/sec   Loss 3.3169   LearningRate 0.0010   Epoch: 18   Global Step: 301260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:02,778-Speed 9261.28 samples/sec   Loss 3.3871   LearningRate 0.0010   Epoch: 18   Global Step: 301270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:03,905-Speed 9095.61 samples/sec   Loss 3.3934   LearningRate 0.0010   Epoch: 18   Global Step: 301280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:05,020-Speed 9187.43 samples/sec   Loss 3.3578   LearningRate 0.0009   Epoch: 18   Global Step: 301290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:45:06,131-Speed 9221.13 samples/sec   Loss 3.3896   LearningRate 0.0009   Epoch: 18   Global Step: 301300   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:45:07,278-Speed 8930.17 samples/sec   Loss 3.3706   LearningRate 0.0009   Epoch: 18   Global Step: 301310   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:45:08,393-Speed 9193.02 samples/sec   Loss 3.4040   LearningRate 0.0009   Epoch: 18   Global Step: 301320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:09,530-Speed 9013.06 samples/sec   Loss 3.3405   LearningRate 0.0009   Epoch: 18   Global Step: 301330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:10,650-Speed 9147.44 samples/sec   Loss 3.3691   LearningRate 0.0009   Epoch: 18   Global Step: 301340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:11,796-Speed 8943.46 samples/sec   Loss 3.3614   LearningRate 0.0009   Epoch: 18   Global Step: 301350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:12,908-Speed 9212.84 samples/sec   Loss 3.2757   LearningRate 0.0009   Epoch: 18   Global Step: 301360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:14,082-Speed 8725.69 samples/sec   Loss 3.3323   LearningRate 0.0009   Epoch: 18   Global Step: 301370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:15,213-Speed 9063.90 samples/sec   Loss 3.3567   LearningRate 0.0009   Epoch: 18   Global Step: 301380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:16,325-Speed 9215.07 samples/sec   Loss 3.3453   LearningRate 0.0009   Epoch: 18   Global Step: 301390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:17,429-Speed 9279.69 samples/sec   Loss 3.3471   LearningRate 0.0009   Epoch: 18   Global Step: 301400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:18,557-Speed 9084.13 samples/sec   Loss 3.3294   LearningRate 0.0009   Epoch: 18   Global Step: 301410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:19,646-Speed 9400.30 samples/sec   Loss 3.3106   LearningRate 0.0009   Epoch: 18   Global Step: 301420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:20,772-Speed 9105.37 samples/sec   Loss 3.4118   LearningRate 0.0009   Epoch: 18   Global Step: 301430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:21,942-Speed 8754.90 samples/sec   Loss 3.4259   LearningRate 0.0009   Epoch: 18   Global Step: 301440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:23,097-Speed 8876.34 samples/sec   Loss 3.4084   LearningRate 0.0009   Epoch: 18   Global Step: 301450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:24,193-Speed 9346.72 samples/sec   Loss 3.3560   LearningRate 0.0009   Epoch: 18   Global Step: 301460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:25,273-Speed 9488.55 samples/sec   Loss 3.4110   LearningRate 0.0009   Epoch: 18   Global Step: 301470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:26,440-Speed 8779.93 samples/sec   Loss 3.3648   LearningRate 0.0009   Epoch: 18   Global Step: 301480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:27,613-Speed 8730.90 samples/sec   Loss 3.3157   LearningRate 0.0009   Epoch: 18   Global Step: 301490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:28,763-Speed 8916.23 samples/sec   Loss 3.2067   LearningRate 0.0009   Epoch: 18   Global Step: 301500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:29,893-Speed 9064.29 samples/sec   Loss 3.4352   LearningRate 0.0009   Epoch: 18   Global Step: 301510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:31,027-Speed 9033.47 samples/sec   Loss 3.3757   LearningRate 0.0009   Epoch: 18   Global Step: 301520   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:45:32,137-Speed 9231.64 samples/sec   Loss 3.2625   LearningRate 0.0009   Epoch: 18   Global Step: 301530   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:45:33,280-Speed 8971.87 samples/sec   Loss 3.3882   LearningRate 0.0009   Epoch: 18   Global Step: 301540   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:45:34,446-Speed 8782.84 samples/sec   Loss 3.3341   LearningRate 0.0009   Epoch: 18   Global Step: 301550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:35,578-Speed 9052.90 samples/sec   Loss 3.3535   LearningRate 0.0009   Epoch: 18   Global Step: 301560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:36,713-Speed 9025.75 samples/sec   Loss 3.3673   LearningRate 0.0009   Epoch: 18   Global Step: 301570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:37,841-Speed 9086.43 samples/sec   Loss 3.3566   LearningRate 0.0009   Epoch: 18   Global Step: 301580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:38,923-Speed 9472.40 samples/sec   Loss 3.3747   LearningRate 0.0009   Epoch: 18   Global Step: 301590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:40,081-Speed 8845.38 samples/sec   Loss 3.3730   LearningRate 0.0009   Epoch: 18   Global Step: 301600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:41,192-Speed 9223.75 samples/sec   Loss 3.4211   LearningRate 0.0009   Epoch: 18   Global Step: 301610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:42,320-Speed 9084.70 samples/sec   Loss 3.4473   LearningRate 0.0009   Epoch: 18   Global Step: 301620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:43,401-Speed 9476.41 samples/sec   Loss 3.3146   LearningRate 0.0009   Epoch: 18   Global Step: 301630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:44,522-Speed 9134.84 samples/sec   Loss 3.4192   LearningRate 0.0009   Epoch: 18   Global Step: 301640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:45,664-Speed 8976.98 samples/sec   Loss 3.3220   LearningRate 0.0009   Epoch: 18   Global Step: 301650   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:45:46,826-Speed 8811.64 samples/sec   Loss 3.3567   LearningRate 0.0009   Epoch: 18   Global Step: 301660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:47,967-Speed 8981.74 samples/sec   Loss 3.3654   LearningRate 0.0009   Epoch: 18   Global Step: 301670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:49,106-Speed 8992.68 samples/sec   Loss 3.3265   LearningRate 0.0009   Epoch: 18   Global Step: 301680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:50,230-Speed 9116.41 samples/sec   Loss 3.3447   LearningRate 0.0009   Epoch: 18   Global Step: 301690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:51,388-Speed 8848.14 samples/sec   Loss 3.3719   LearningRate 0.0009   Epoch: 18   Global Step: 301700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:52,542-Speed 8883.92 samples/sec   Loss 3.3059   LearningRate 0.0009   Epoch: 18   Global Step: 301710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:53,697-Speed 8879.86 samples/sec   Loss 3.3613   LearningRate 0.0009   Epoch: 18   Global Step: 301720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:54,840-Speed 8962.27 samples/sec   Loss 3.3322   LearningRate 0.0009   Epoch: 18   Global Step: 301730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:55,970-Speed 9066.25 samples/sec   Loss 3.3443   LearningRate 0.0009   Epoch: 18   Global Step: 301740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:57,113-Speed 8960.86 samples/sec   Loss 3.4072   LearningRate 0.0009   Epoch: 18   Global Step: 301750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:45:58,229-Speed 9186.46 samples/sec   Loss 3.3774   LearningRate 0.0009   Epoch: 18   Global Step: 301760   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:45:59,325-Speed 9345.37 samples/sec   Loss 3.3849   LearningRate 0.0009   Epoch: 18   Global Step: 301770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:00,416-Speed 9391.73 samples/sec   Loss 3.3092   LearningRate 0.0009   Epoch: 18   Global Step: 301780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:01,500-Speed 9453.39 samples/sec   Loss 3.3880   LearningRate 0.0009   Epoch: 18   Global Step: 301790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:02,606-Speed 9260.49 samples/sec   Loss 3.4493   LearningRate 0.0009   Epoch: 18   Global Step: 301800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:03,730-Speed 9121.05 samples/sec   Loss 3.3064   LearningRate 0.0009   Epoch: 18   Global Step: 301810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:04,813-Speed 9465.37 samples/sec   Loss 3.3934   LearningRate 0.0009   Epoch: 18   Global Step: 301820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:05,900-Speed 9425.96 samples/sec   Loss 3.3279   LearningRate 0.0009   Epoch: 18   Global Step: 301830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:07,065-Speed 8789.14 samples/sec   Loss 3.3286   LearningRate 0.0009   Epoch: 18   Global Step: 301840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:08,215-Speed 8913.23 samples/sec   Loss 3.5060   LearningRate 0.0009   Epoch: 18   Global Step: 301850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:09,347-Speed 9051.06 samples/sec   Loss 3.4183   LearningRate 0.0009   Epoch: 18   Global Step: 301860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:10,478-Speed 9061.98 samples/sec   Loss 3.3778   LearningRate 0.0009   Epoch: 18   Global Step: 301870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:46:11,669-Speed 8604.64 samples/sec   Loss 3.3779   LearningRate 0.0009   Epoch: 18   Global Step: 301880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:12,817-Speed 8926.57 samples/sec   Loss 3.3371   LearningRate 0.0009   Epoch: 18   Global Step: 301890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:13,967-Speed 8906.04 samples/sec   Loss 3.3112   LearningRate 0.0009   Epoch: 18   Global Step: 301900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:15,092-Speed 9111.16 samples/sec   Loss 3.4066   LearningRate 0.0009   Epoch: 18   Global Step: 301910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:16,237-Speed 8942.24 samples/sec   Loss 3.3732   LearningRate 0.0009   Epoch: 18   Global Step: 301920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:17,329-Speed 9382.00 samples/sec   Loss 3.3435   LearningRate 0.0009   Epoch: 18   Global Step: 301930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:18,476-Speed 8932.53 samples/sec   Loss 3.3513   LearningRate 0.0009   Epoch: 18   Global Step: 301940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:19,636-Speed 8832.27 samples/sec   Loss 3.3934   LearningRate 0.0009   Epoch: 18   Global Step: 301950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:20,756-Speed 9151.23 samples/sec   Loss 3.4058   LearningRate 0.0009   Epoch: 18   Global Step: 301960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:21,852-Speed 9343.36 samples/sec   Loss 3.3881   LearningRate 0.0009   Epoch: 18   Global Step: 301970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:22,951-Speed 9326.64 samples/sec   Loss 3.3528   LearningRate 0.0009   Epoch: 18   Global Step: 301980   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:46:24,151-Speed 8542.10 samples/sec   Loss 3.2640   LearningRate 0.0009   Epoch: 18   Global Step: 301990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:25,290-Speed 8994.98 samples/sec   Loss 3.3380   LearningRate 0.0009   Epoch: 18   Global Step: 302000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:46:47,231-[lfw][302000]XNorm: 6.591494
Training: 2022-04-11 23:46:47,232-[lfw][302000]Accuracy-Flip: 0.99750+-0.00300
Training: 2022-04-11 23:46:47,232-[lfw][302000]Accuracy-Highest: 0.99750
Training: 2022-04-11 23:47:12,505-[cfp_fp][302000]XNorm: 5.771600
Training: 2022-04-11 23:47:12,506-[cfp_fp][302000]Accuracy-Flip: 0.97257+-0.00837
Training: 2022-04-11 23:47:12,506-[cfp_fp][302000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:47:34,353-[agedb_30][302000]XNorm: 6.428232
Training: 2022-04-11 23:47:34,354-[agedb_30][302000]Accuracy-Flip: 0.97250+-0.00814
Training: 2022-04-11 23:47:34,354-[agedb_30][302000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:47:35,499-Speed 145.85 samples/sec   Loss 3.4101   LearningRate 0.0009   Epoch: 18   Global Step: 302010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:36,618-Speed 9152.70 samples/sec   Loss 3.2880   LearningRate 0.0009   Epoch: 18   Global Step: 302020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:37,721-Speed 9292.88 samples/sec   Loss 3.3480   LearningRate 0.0009   Epoch: 18   Global Step: 302030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:38,816-Speed 9358.95 samples/sec   Loss 3.4467   LearningRate 0.0009   Epoch: 18   Global Step: 302040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:39,960-Speed 8949.83 samples/sec   Loss 3.3116   LearningRate 0.0009   Epoch: 18   Global Step: 302050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:41,095-Speed 9025.92 samples/sec   Loss 3.4011   LearningRate 0.0009   Epoch: 18   Global Step: 302060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:42,262-Speed 8784.01 samples/sec   Loss 3.3799   LearningRate 0.0009   Epoch: 18   Global Step: 302070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:43,432-Speed 8758.63 samples/sec   Loss 3.3595   LearningRate 0.0009   Epoch: 18   Global Step: 302080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:44,579-Speed 8935.39 samples/sec   Loss 3.3321   LearningRate 0.0009   Epoch: 18   Global Step: 302090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:47:45,696-Speed 9173.80 samples/sec   Loss 3.2945   LearningRate 0.0009   Epoch: 18   Global Step: 302100   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:47:46,817-Speed 9139.49 samples/sec   Loss 3.2612   LearningRate 0.0009   Epoch: 18   Global Step: 302110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:47,935-Speed 9165.04 samples/sec   Loss 3.3206   LearningRate 0.0009   Epoch: 18   Global Step: 302120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:49,074-Speed 8991.27 samples/sec   Loss 3.3491   LearningRate 0.0009   Epoch: 18   Global Step: 302130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:50,167-Speed 9376.92 samples/sec   Loss 3.3424   LearningRate 0.0009   Epoch: 18   Global Step: 302140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:51,355-Speed 8624.09 samples/sec   Loss 3.4001   LearningRate 0.0009   Epoch: 18   Global Step: 302150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:52,468-Speed 9204.42 samples/sec   Loss 3.3703   LearningRate 0.0009   Epoch: 18   Global Step: 302160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:53,566-Speed 9334.47 samples/sec   Loss 3.3987   LearningRate 0.0009   Epoch: 18   Global Step: 302170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:54,665-Speed 9323.62 samples/sec   Loss 3.3240   LearningRate 0.0009   Epoch: 18   Global Step: 302180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:55,758-Speed 9369.68 samples/sec   Loss 3.3952   LearningRate 0.0009   Epoch: 18   Global Step: 302190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:56,920-Speed 8822.64 samples/sec   Loss 3.4083   LearningRate 0.0009   Epoch: 18   Global Step: 302200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:58,023-Speed 9288.38 samples/sec   Loss 3.3923   LearningRate 0.0009   Epoch: 18   Global Step: 302210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:47:59,170-Speed 8935.57 samples/sec   Loss 3.4028   LearningRate 0.0009   Epoch: 18   Global Step: 302220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:00,297-Speed 9092.99 samples/sec   Loss 3.3390   LearningRate 0.0009   Epoch: 18   Global Step: 302230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:01,370-Speed 9544.47 samples/sec   Loss 3.3349   LearningRate 0.0009   Epoch: 18   Global Step: 302240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:02,489-Speed 9155.61 samples/sec   Loss 3.2685   LearningRate 0.0009   Epoch: 18   Global Step: 302250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:03,588-Speed 9326.64 samples/sec   Loss 3.3305   LearningRate 0.0009   Epoch: 18   Global Step: 302260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:04,705-Speed 9170.14 samples/sec   Loss 3.3810   LearningRate 0.0009   Epoch: 18   Global Step: 302270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:05,848-Speed 8968.08 samples/sec   Loss 3.3594   LearningRate 0.0009   Epoch: 18   Global Step: 302280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:06,954-Speed 9264.98 samples/sec   Loss 3.4072   LearningRate 0.0009   Epoch: 18   Global Step: 302290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:08,036-Speed 9465.48 samples/sec   Loss 3.3332   LearningRate 0.0009   Epoch: 18   Global Step: 302300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:09,180-Speed 8955.55 samples/sec   Loss 3.3298   LearningRate 0.0009   Epoch: 18   Global Step: 302310   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:48:10,285-Speed 9280.79 samples/sec   Loss 3.3769   LearningRate 0.0009   Epoch: 18   Global Step: 302320   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:48:11,415-Speed 9068.36 samples/sec   Loss 3.3592   LearningRate 0.0009   Epoch: 18   Global Step: 302330   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:48:12,546-Speed 9055.23 samples/sec   Loss 3.3443   LearningRate 0.0009   Epoch: 18   Global Step: 302340   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:48:13,637-Speed 9389.71 samples/sec   Loss 3.3492   LearningRate 0.0009   Epoch: 18   Global Step: 302350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:14,729-Speed 9387.64 samples/sec   Loss 3.4190   LearningRate 0.0009   Epoch: 18   Global Step: 302360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:15,851-Speed 9129.76 samples/sec   Loss 3.3238   LearningRate 0.0009   Epoch: 18   Global Step: 302370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:16,995-Speed 8961.21 samples/sec   Loss 3.2824   LearningRate 0.0009   Epoch: 18   Global Step: 302380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:48:18,103-Speed 9242.49 samples/sec   Loss 3.3172   LearningRate 0.0009   Epoch: 18   Global Step: 302390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:48:19,241-Speed 9004.73 samples/sec   Loss 3.4601   LearningRate 0.0009   Epoch: 18   Global Step: 302400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:48:20,367-Speed 9098.76 samples/sec   Loss 3.3714   LearningRate 0.0009   Epoch: 18   Global Step: 302410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:48:21,513-Speed 8941.72 samples/sec   Loss 3.4187   LearningRate 0.0009   Epoch: 18   Global Step: 302420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:48:22,707-Speed 8584.97 samples/sec   Loss 3.4100   LearningRate 0.0009   Epoch: 18   Global Step: 302430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:48:23,823-Speed 9172.21 samples/sec   Loss 3.4027   LearningRate 0.0009   Epoch: 18   Global Step: 302440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:48:24,904-Speed 9492.32 samples/sec   Loss 3.3383   LearningRate 0.0009   Epoch: 18   Global Step: 302450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:48:26,019-Speed 9195.63 samples/sec   Loss 3.4154   LearningRate 0.0009   Epoch: 18   Global Step: 302460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:48:27,137-Speed 9158.04 samples/sec   Loss 3.3957   LearningRate 0.0009   Epoch: 18   Global Step: 302470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:48:28,233-Speed 9352.41 samples/sec   Loss 3.4137   LearningRate 0.0009   Epoch: 18   Global Step: 302480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:29,340-Speed 9253.06 samples/sec   Loss 3.3548   LearningRate 0.0009   Epoch: 18   Global Step: 302490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:30,437-Speed 9341.83 samples/sec   Loss 3.4146   LearningRate 0.0009   Epoch: 18   Global Step: 302500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:31,599-Speed 8817.30 samples/sec   Loss 3.3201   LearningRate 0.0009   Epoch: 18   Global Step: 302510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:32,805-Speed 8492.93 samples/sec   Loss 3.4354   LearningRate 0.0009   Epoch: 18   Global Step: 302520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:33,950-Speed 8952.91 samples/sec   Loss 3.3926   LearningRate 0.0009   Epoch: 18   Global Step: 302530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:35,062-Speed 9222.21 samples/sec   Loss 3.3439   LearningRate 0.0009   Epoch: 18   Global Step: 302540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:36,230-Speed 8767.03 samples/sec   Loss 3.3257   LearningRate 0.0009   Epoch: 18   Global Step: 302550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:37,325-Speed 9356.08 samples/sec   Loss 3.3577   LearningRate 0.0009   Epoch: 18   Global Step: 302560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:38,415-Speed 9401.99 samples/sec   Loss 3.3429   LearningRate 0.0009   Epoch: 18   Global Step: 302570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:39,512-Speed 9338.11 samples/sec   Loss 3.3912   LearningRate 0.0009   Epoch: 18   Global Step: 302580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:48:40,589-Speed 9517.99 samples/sec   Loss 3.3837   LearningRate 0.0009   Epoch: 18   Global Step: 302590   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:48:41,675-Speed 9435.58 samples/sec   Loss 3.3529   LearningRate 0.0009   Epoch: 18   Global Step: 302600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:42,785-Speed 9235.80 samples/sec   Loss 3.3191   LearningRate 0.0009   Epoch: 18   Global Step: 302610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:43,870-Speed 9437.65 samples/sec   Loss 3.3154   LearningRate 0.0009   Epoch: 18   Global Step: 302620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:44,942-Speed 9555.85 samples/sec   Loss 3.3955   LearningRate 0.0009   Epoch: 18   Global Step: 302630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:46,115-Speed 8736.80 samples/sec   Loss 3.3513   LearningRate 0.0009   Epoch: 18   Global Step: 302640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:47,239-Speed 9117.66 samples/sec   Loss 3.3965   LearningRate 0.0009   Epoch: 18   Global Step: 302650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:48,306-Speed 9602.20 samples/sec   Loss 3.3909   LearningRate 0.0009   Epoch: 18   Global Step: 302660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:49,451-Speed 8942.75 samples/sec   Loss 3.4113   LearningRate 0.0009   Epoch: 18   Global Step: 302670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:50,630-Speed 8694.57 samples/sec   Loss 3.3123   LearningRate 0.0009   Epoch: 18   Global Step: 302680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:51,776-Speed 8936.93 samples/sec   Loss 3.3306   LearningRate 0.0009   Epoch: 18   Global Step: 302690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:52,907-Speed 9065.77 samples/sec   Loss 3.3712   LearningRate 0.0009   Epoch: 18   Global Step: 302700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:54,036-Speed 9079.17 samples/sec   Loss 3.3897   LearningRate 0.0009   Epoch: 18   Global Step: 302710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:55,166-Speed 9067.02 samples/sec   Loss 3.3944   LearningRate 0.0009   Epoch: 18   Global Step: 302720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:56,267-Speed 9308.12 samples/sec   Loss 3.3545   LearningRate 0.0009   Epoch: 18   Global Step: 302730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:57,406-Speed 8995.01 samples/sec   Loss 3.3473   LearningRate 0.0009   Epoch: 18   Global Step: 302740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:58,508-Speed 9289.95 samples/sec   Loss 3.4069   LearningRate 0.0009   Epoch: 18   Global Step: 302750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:48:59,593-Speed 9450.15 samples/sec   Loss 3.3376   LearningRate 0.0009   Epoch: 18   Global Step: 302760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:00,748-Speed 8866.64 samples/sec   Loss 3.3466   LearningRate 0.0009   Epoch: 18   Global Step: 302770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:01,878-Speed 9067.83 samples/sec   Loss 3.3863   LearningRate 0.0009   Epoch: 18   Global Step: 302780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:02,983-Speed 9273.68 samples/sec   Loss 3.3600   LearningRate 0.0009   Epoch: 18   Global Step: 302790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:04,088-Speed 9278.55 samples/sec   Loss 3.3859   LearningRate 0.0009   Epoch: 18   Global Step: 302800   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:49:05,175-Speed 9424.48 samples/sec   Loss 3.3789   LearningRate 0.0009   Epoch: 18   Global Step: 302810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:06,341-Speed 8781.44 samples/sec   Loss 3.3289   LearningRate 0.0009   Epoch: 18   Global Step: 302820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:07,504-Speed 8812.83 samples/sec   Loss 3.3749   LearningRate 0.0009   Epoch: 18   Global Step: 302830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:08,642-Speed 9006.06 samples/sec   Loss 3.3996   LearningRate 0.0009   Epoch: 18   Global Step: 302840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:09,812-Speed 8752.16 samples/sec   Loss 3.3743   LearningRate 0.0009   Epoch: 18   Global Step: 302850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:10,977-Speed 8802.30 samples/sec   Loss 3.3896   LearningRate 0.0009   Epoch: 18   Global Step: 302860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:12,072-Speed 9353.95 samples/sec   Loss 3.3687   LearningRate 0.0009   Epoch: 18   Global Step: 302870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:13,213-Speed 8984.95 samples/sec   Loss 3.3897   LearningRate 0.0009   Epoch: 18   Global Step: 302880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:14,282-Speed 9576.61 samples/sec   Loss 3.4062   LearningRate 0.0009   Epoch: 18   Global Step: 302890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:15,396-Speed 9198.09 samples/sec   Loss 3.3350   LearningRate 0.0009   Epoch: 18   Global Step: 302900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:16,488-Speed 9382.71 samples/sec   Loss 3.3813   LearningRate 0.0009   Epoch: 18   Global Step: 302910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:17,607-Speed 9165.43 samples/sec   Loss 3.3909   LearningRate 0.0009   Epoch: 18   Global Step: 302920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:18,724-Speed 9170.33 samples/sec   Loss 3.3508   LearningRate 0.0009   Epoch: 18   Global Step: 302930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:19,844-Speed 9144.71 samples/sec   Loss 3.3344   LearningRate 0.0009   Epoch: 18   Global Step: 302940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:20,931-Speed 9426.29 samples/sec   Loss 3.3382   LearningRate 0.0009   Epoch: 18   Global Step: 302950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:22,032-Speed 9309.39 samples/sec   Loss 3.2762   LearningRate 0.0009   Epoch: 18   Global Step: 302960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:23,167-Speed 9028.44 samples/sec   Loss 3.2396   LearningRate 0.0009   Epoch: 18   Global Step: 302970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:24,275-Speed 9248.68 samples/sec   Loss 3.3607   LearningRate 0.0009   Epoch: 18   Global Step: 302980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:49:25,429-Speed 8877.33 samples/sec   Loss 3.3362   LearningRate 0.0009   Epoch: 18   Global Step: 302990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:49:26,548-Speed 9160.91 samples/sec   Loss 3.3788   LearningRate 0.0009   Epoch: 18   Global Step: 303000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:49:27,714-Speed 8786.91 samples/sec   Loss 3.3588   LearningRate 0.0009   Epoch: 18   Global Step: 303010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:49:28,794-Speed 9483.69 samples/sec   Loss 3.3105   LearningRate 0.0009   Epoch: 18   Global Step: 303020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:49:29,883-Speed 9409.47 samples/sec   Loss 3.4189   LearningRate 0.0009   Epoch: 18   Global Step: 303030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:49:30,985-Speed 9302.84 samples/sec   Loss 3.3720   LearningRate 0.0009   Epoch: 18   Global Step: 303040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:49:32,095-Speed 9226.51 samples/sec   Loss 3.3584   LearningRate 0.0008   Epoch: 18   Global Step: 303050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:49:33,221-Speed 9104.73 samples/sec   Loss 3.3314   LearningRate 0.0008   Epoch: 18   Global Step: 303060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:49:34,355-Speed 9038.07 samples/sec   Loss 3.3783   LearningRate 0.0008   Epoch: 18   Global Step: 303070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:49:35,512-Speed 8855.65 samples/sec   Loss 3.4129   LearningRate 0.0008   Epoch: 18   Global Step: 303080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:36,620-Speed 9245.50 samples/sec   Loss 3.3784   LearningRate 0.0008   Epoch: 18   Global Step: 303090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:37,747-Speed 9094.80 samples/sec   Loss 3.3756   LearningRate 0.0008   Epoch: 18   Global Step: 303100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:38,874-Speed 9085.19 samples/sec   Loss 3.3805   LearningRate 0.0008   Epoch: 18   Global Step: 303110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:39,986-Speed 9216.87 samples/sec   Loss 3.4169   LearningRate 0.0008   Epoch: 18   Global Step: 303120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:41,083-Speed 9341.02 samples/sec   Loss 3.2689   LearningRate 0.0008   Epoch: 18   Global Step: 303130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:42,229-Speed 8939.90 samples/sec   Loss 3.3381   LearningRate 0.0008   Epoch: 18   Global Step: 303140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:43,365-Speed 9019.89 samples/sec   Loss 3.3345   LearningRate 0.0008   Epoch: 18   Global Step: 303150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:44,513-Speed 8924.31 samples/sec   Loss 3.3386   LearningRate 0.0008   Epoch: 18   Global Step: 303160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:45,674-Speed 8823.06 samples/sec   Loss 3.3747   LearningRate 0.0008   Epoch: 18   Global Step: 303170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:46,825-Speed 8901.95 samples/sec   Loss 3.3227   LearningRate 0.0008   Epoch: 18   Global Step: 303180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:47,984-Speed 8840.20 samples/sec   Loss 3.3297   LearningRate 0.0008   Epoch: 18   Global Step: 303190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:49,126-Speed 8971.10 samples/sec   Loss 3.3921   LearningRate 0.0008   Epoch: 18   Global Step: 303200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:50,183-Speed 9697.28 samples/sec   Loss 3.3627   LearningRate 0.0008   Epoch: 18   Global Step: 303210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:51,281-Speed 9330.13 samples/sec   Loss 3.3575   LearningRate 0.0008   Epoch: 18   Global Step: 303220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:52,431-Speed 8914.23 samples/sec   Loss 3.3317   LearningRate 0.0008   Epoch: 18   Global Step: 303230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:53,627-Speed 8562.70 samples/sec   Loss 3.3066   LearningRate 0.0008   Epoch: 18   Global Step: 303240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:54,721-Speed 9372.53 samples/sec   Loss 3.3832   LearningRate 0.0008   Epoch: 18   Global Step: 303250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:55,779-Speed 9678.37 samples/sec   Loss 3.3399   LearningRate 0.0008   Epoch: 18   Global Step: 303260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:56,869-Speed 9401.58 samples/sec   Loss 3.3811   LearningRate 0.0008   Epoch: 18   Global Step: 303270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:49:57,980-Speed 9221.16 samples/sec   Loss 3.4247   LearningRate 0.0008   Epoch: 18   Global Step: 303280   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:49:59,126-Speed 8937.69 samples/sec   Loss 3.4118   LearningRate 0.0008   Epoch: 18   Global Step: 303290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:00,219-Speed 9376.68 samples/sec   Loss 3.4570   LearningRate 0.0008   Epoch: 18   Global Step: 303300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:01,341-Speed 9128.46 samples/sec   Loss 3.2830   LearningRate 0.0008   Epoch: 18   Global Step: 303310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:02,482-Speed 8978.20 samples/sec   Loss 3.3313   LearningRate 0.0008   Epoch: 18   Global Step: 303320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:03,619-Speed 9012.69 samples/sec   Loss 3.3961   LearningRate 0.0008   Epoch: 18   Global Step: 303330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:04,790-Speed 8757.06 samples/sec   Loss 3.3847   LearningRate 0.0008   Epoch: 18   Global Step: 303340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:05,911-Speed 9139.29 samples/sec   Loss 3.3523   LearningRate 0.0008   Epoch: 18   Global Step: 303350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:07,015-Speed 9282.09 samples/sec   Loss 3.3924   LearningRate 0.0008   Epoch: 18   Global Step: 303360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:08,166-Speed 8900.42 samples/sec   Loss 3.2671   LearningRate 0.0008   Epoch: 18   Global Step: 303370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:09,317-Speed 8906.10 samples/sec   Loss 3.3736   LearningRate 0.0008   Epoch: 18   Global Step: 303380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:10,460-Speed 8957.46 samples/sec   Loss 3.3688   LearningRate 0.0008   Epoch: 18   Global Step: 303390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:11,518-Speed 9687.61 samples/sec   Loss 3.3407   LearningRate 0.0008   Epoch: 18   Global Step: 303400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:12,613-Speed 9360.48 samples/sec   Loss 3.3618   LearningRate 0.0008   Epoch: 18   Global Step: 303410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:13,765-Speed 8893.57 samples/sec   Loss 3.3897   LearningRate 0.0008   Epoch: 18   Global Step: 303420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:14,903-Speed 9006.44 samples/sec   Loss 3.3454   LearningRate 0.0008   Epoch: 18   Global Step: 303430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:15,977-Speed 9537.62 samples/sec   Loss 3.2302   LearningRate 0.0008   Epoch: 18   Global Step: 303440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:17,091-Speed 9197.14 samples/sec   Loss 3.4742   LearningRate 0.0008   Epoch: 18   Global Step: 303450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:18,250-Speed 8840.30 samples/sec   Loss 3.3175   LearningRate 0.0008   Epoch: 18   Global Step: 303460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:19,370-Speed 9148.96 samples/sec   Loss 3.3968   LearningRate 0.0008   Epoch: 18   Global Step: 303470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:20,527-Speed 8851.17 samples/sec   Loss 3.3853   LearningRate 0.0008   Epoch: 18   Global Step: 303480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:21,605-Speed 9506.00 samples/sec   Loss 3.3152   LearningRate 0.0008   Epoch: 18   Global Step: 303490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:22,788-Speed 8659.13 samples/sec   Loss 3.3829   LearningRate 0.0008   Epoch: 18   Global Step: 303500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:23,999-Speed 8464.06 samples/sec   Loss 3.3481   LearningRate 0.0008   Epoch: 18   Global Step: 303510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:25,131-Speed 9056.39 samples/sec   Loss 3.4337   LearningRate 0.0008   Epoch: 18   Global Step: 303520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:26,279-Speed 8921.31 samples/sec   Loss 3.3560   LearningRate 0.0008   Epoch: 18   Global Step: 303530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:27,409-Speed 9067.20 samples/sec   Loss 3.3902   LearningRate 0.0008   Epoch: 18   Global Step: 303540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:28,526-Speed 9177.38 samples/sec   Loss 3.3745   LearningRate 0.0008   Epoch: 18   Global Step: 303550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:29,661-Speed 9028.73 samples/sec   Loss 3.4316   LearningRate 0.0008   Epoch: 18   Global Step: 303560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:30,789-Speed 9080.45 samples/sec   Loss 3.3306   LearningRate 0.0008   Epoch: 18   Global Step: 303570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:31,882-Speed 9370.68 samples/sec   Loss 3.3667   LearningRate 0.0008   Epoch: 18   Global Step: 303580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:33,000-Speed 9170.19 samples/sec   Loss 3.3537   LearningRate 0.0008   Epoch: 18   Global Step: 303590   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:50:34,068-Speed 9598.52 samples/sec   Loss 3.4062   LearningRate 0.0008   Epoch: 18   Global Step: 303600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:35,214-Speed 8934.07 samples/sec   Loss 3.2476   LearningRate 0.0008   Epoch: 18   Global Step: 303610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:36,351-Speed 9015.82 samples/sec   Loss 3.3527   LearningRate 0.0008   Epoch: 18   Global Step: 303620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:37,486-Speed 9023.53 samples/sec   Loss 3.3899   LearningRate 0.0008   Epoch: 18   Global Step: 303630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:38,658-Speed 8739.64 samples/sec   Loss 3.3024   LearningRate 0.0008   Epoch: 18   Global Step: 303640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:39,747-Speed 9408.35 samples/sec   Loss 3.4272   LearningRate 0.0008   Epoch: 18   Global Step: 303650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:40,848-Speed 9314.49 samples/sec   Loss 3.3422   LearningRate 0.0008   Epoch: 18   Global Step: 303660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:41,948-Speed 9316.07 samples/sec   Loss 3.4043   LearningRate 0.0008   Epoch: 18   Global Step: 303670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:43,102-Speed 8876.30 samples/sec   Loss 3.3749   LearningRate 0.0008   Epoch: 18   Global Step: 303680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:44,252-Speed 8906.23 samples/sec   Loss 3.4492   LearningRate 0.0008   Epoch: 18   Global Step: 303690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:45,360-Speed 9249.15 samples/sec   Loss 3.3933   LearningRate 0.0008   Epoch: 18   Global Step: 303700   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:50:46,500-Speed 8990.79 samples/sec   Loss 3.3845   LearningRate 0.0008   Epoch: 18   Global Step: 303710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:47,633-Speed 9039.97 samples/sec   Loss 3.4423   LearningRate 0.0008   Epoch: 18   Global Step: 303720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:48,742-Speed 9235.29 samples/sec   Loss 3.3819   LearningRate 0.0008   Epoch: 18   Global Step: 303730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:49,877-Speed 9028.97 samples/sec   Loss 3.4440   LearningRate 0.0008   Epoch: 18   Global Step: 303740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:50,992-Speed 9190.47 samples/sec   Loss 3.4074   LearningRate 0.0008   Epoch: 18   Global Step: 303750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:52,090-Speed 9331.45 samples/sec   Loss 3.3157   LearningRate 0.0008   Epoch: 18   Global Step: 303760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:53,240-Speed 8911.47 samples/sec   Loss 3.4301   LearningRate 0.0008   Epoch: 18   Global Step: 303770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:54,394-Speed 8879.75 samples/sec   Loss 3.3797   LearningRate 0.0008   Epoch: 18   Global Step: 303780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:55,486-Speed 9379.75 samples/sec   Loss 3.3627   LearningRate 0.0008   Epoch: 18   Global Step: 303790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:56,573-Speed 9421.70 samples/sec   Loss 3.4119   LearningRate 0.0008   Epoch: 18   Global Step: 303800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:57,684-Speed 9227.70 samples/sec   Loss 3.4188   LearningRate 0.0008   Epoch: 18   Global Step: 303810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:58,860-Speed 8708.01 samples/sec   Loss 3.4041   LearningRate 0.0008   Epoch: 18   Global Step: 303820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:50:59,992-Speed 9053.12 samples/sec   Loss 3.3638   LearningRate 0.0008   Epoch: 18   Global Step: 303830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:51:01,073-Speed 9474.91 samples/sec   Loss 3.4045   LearningRate 0.0008   Epoch: 18   Global Step: 303840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:51:02,215-Speed 8976.81 samples/sec   Loss 3.3358   LearningRate 0.0008   Epoch: 18   Global Step: 303850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:51:03,327-Speed 9212.80 samples/sec   Loss 3.4394   LearningRate 0.0008   Epoch: 18   Global Step: 303860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:51:04,424-Speed 9341.01 samples/sec   Loss 3.3247   LearningRate 0.0008   Epoch: 18   Global Step: 303870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:51:05,530-Speed 9262.69 samples/sec   Loss 3.4227   LearningRate 0.0008   Epoch: 18   Global Step: 303880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:51:06,607-Speed 9533.12 samples/sec   Loss 3.3181   LearningRate 0.0008   Epoch: 18   Global Step: 303890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:51:07,745-Speed 9005.23 samples/sec   Loss 3.4997   LearningRate 0.0008   Epoch: 18   Global Step: 303900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:51:08,856-Speed 9224.54 samples/sec   Loss 3.3588   LearningRate 0.0008   Epoch: 18   Global Step: 303910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:51:09,957-Speed 9300.48 samples/sec   Loss 3.3529   LearningRate 0.0008   Epoch: 18   Global Step: 303920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:51:11,071-Speed 9201.38 samples/sec   Loss 3.4347   LearningRate 0.0008   Epoch: 18   Global Step: 303930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:51:12,180-Speed 9244.63 samples/sec   Loss 3.3582   LearningRate 0.0008   Epoch: 18   Global Step: 303940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:51:13,293-Speed 9202.91 samples/sec   Loss 3.3761   LearningRate 0.0008   Epoch: 18   Global Step: 303950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:51:14,446-Speed 8883.48 samples/sec   Loss 3.3308   LearningRate 0.0008   Epoch: 18   Global Step: 303960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:51:15,580-Speed 9035.80 samples/sec   Loss 3.3258   LearningRate 0.0008   Epoch: 18   Global Step: 303970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:51:16,666-Speed 9437.38 samples/sec   Loss 3.4147   LearningRate 0.0008   Epoch: 18   Global Step: 303980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:51:17,800-Speed 9036.00 samples/sec   Loss 3.3320   LearningRate 0.0008   Epoch: 18   Global Step: 303990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:51:18,959-Speed 8835.61 samples/sec   Loss 3.3686   LearningRate 0.0008   Epoch: 18   Global Step: 304000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:51:41,136-[lfw][304000]XNorm: 6.605141
Training: 2022-04-11 23:51:41,137-[lfw][304000]Accuracy-Flip: 0.99633+-0.00233
Training: 2022-04-11 23:51:41,138-[lfw][304000]Accuracy-Highest: 0.99750
Training: 2022-04-11 23:52:06,653-[cfp_fp][304000]XNorm: 5.764245
Training: 2022-04-11 23:52:06,654-[cfp_fp][304000]Accuracy-Flip: 0.97286+-0.00777
Training: 2022-04-11 23:52:06,654-[cfp_fp][304000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:52:28,635-[agedb_30][304000]XNorm: 6.438392
Training: 2022-04-11 23:52:28,636-[agedb_30][304000]Accuracy-Flip: 0.97383+-0.00823
Training: 2022-04-11 23:52:28,637-[agedb_30][304000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:52:29,797-Speed 144.56 samples/sec   Loss 3.3829   LearningRate 0.0008   Epoch: 18   Global Step: 304010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:30,903-Speed 9261.35 samples/sec   Loss 3.2926   LearningRate 0.0008   Epoch: 18   Global Step: 304020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:32,010-Speed 9250.46 samples/sec   Loss 3.3262   LearningRate 0.0008   Epoch: 18   Global Step: 304030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:33,132-Speed 9131.21 samples/sec   Loss 3.3630   LearningRate 0.0008   Epoch: 18   Global Step: 304040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:34,255-Speed 9130.70 samples/sec   Loss 3.4174   LearningRate 0.0008   Epoch: 18   Global Step: 304050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:35,361-Speed 9262.17 samples/sec   Loss 3.3732   LearningRate 0.0008   Epoch: 18   Global Step: 304060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:36,518-Speed 8855.57 samples/sec   Loss 3.3586   LearningRate 0.0008   Epoch: 18   Global Step: 304070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:37,661-Speed 8968.36 samples/sec   Loss 3.4312   LearningRate 0.0008   Epoch: 18   Global Step: 304080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:38,824-Speed 8802.86 samples/sec   Loss 3.4036   LearningRate 0.0008   Epoch: 18   Global Step: 304090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:39,939-Speed 9193.92 samples/sec   Loss 3.3391   LearningRate 0.0008   Epoch: 18   Global Step: 304100   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:52:41,031-Speed 9387.11 samples/sec   Loss 3.3185   LearningRate 0.0008   Epoch: 18   Global Step: 304110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:42,164-Speed 9039.44 samples/sec   Loss 3.4142   LearningRate 0.0008   Epoch: 18   Global Step: 304120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:43,390-Speed 8356.95 samples/sec   Loss 3.4670   LearningRate 0.0008   Epoch: 18   Global Step: 304130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:44,558-Speed 8770.46 samples/sec   Loss 3.3813   LearningRate 0.0008   Epoch: 18   Global Step: 304140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:45,705-Speed 8929.94 samples/sec   Loss 3.4032   LearningRate 0.0008   Epoch: 18   Global Step: 304150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:46,820-Speed 9191.27 samples/sec   Loss 3.3257   LearningRate 0.0008   Epoch: 18   Global Step: 304160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:47,952-Speed 9049.44 samples/sec   Loss 3.3945   LearningRate 0.0008   Epoch: 18   Global Step: 304170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:49,134-Speed 8670.61 samples/sec   Loss 3.4079   LearningRate 0.0008   Epoch: 18   Global Step: 304180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:50,225-Speed 9391.53 samples/sec   Loss 3.3503   LearningRate 0.0008   Epoch: 18   Global Step: 304190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:51,304-Speed 9492.36 samples/sec   Loss 3.3429   LearningRate 0.0008   Epoch: 18   Global Step: 304200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:52,419-Speed 9190.63 samples/sec   Loss 3.3861   LearningRate 0.0008   Epoch: 18   Global Step: 304210   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:52:53,507-Speed 9422.88 samples/sec   Loss 3.3223   LearningRate 0.0008   Epoch: 18   Global Step: 304220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:54,705-Speed 8548.85 samples/sec   Loss 3.3115   LearningRate 0.0008   Epoch: 18   Global Step: 304230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:55,799-Speed 9371.29 samples/sec   Loss 3.3299   LearningRate 0.0008   Epoch: 18   Global Step: 304240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:56,898-Speed 9316.63 samples/sec   Loss 3.3680   LearningRate 0.0008   Epoch: 18   Global Step: 304250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:57,993-Speed 9356.04 samples/sec   Loss 3.4706   LearningRate 0.0008   Epoch: 18   Global Step: 304260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:52:59,108-Speed 9190.36 samples/sec   Loss 3.4124   LearningRate 0.0008   Epoch: 18   Global Step: 304270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:00,223-Speed 9191.88 samples/sec   Loss 3.3842   LearningRate 0.0008   Epoch: 18   Global Step: 304280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:01,332-Speed 9237.37 samples/sec   Loss 3.3574   LearningRate 0.0008   Epoch: 18   Global Step: 304290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:02,458-Speed 9094.87 samples/sec   Loss 3.3969   LearningRate 0.0008   Epoch: 18   Global Step: 304300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:03,603-Speed 8952.17 samples/sec   Loss 3.3364   LearningRate 0.0008   Epoch: 18   Global Step: 304310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:04,710-Speed 9261.39 samples/sec   Loss 3.4508   LearningRate 0.0008   Epoch: 18   Global Step: 304320   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:53:05,851-Speed 8979.02 samples/sec   Loss 3.3331   LearningRate 0.0008   Epoch: 18   Global Step: 304330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:06,967-Speed 9179.76 samples/sec   Loss 3.3533   LearningRate 0.0008   Epoch: 18   Global Step: 304340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:08,088-Speed 9135.65 samples/sec   Loss 3.3571   LearningRate 0.0008   Epoch: 18   Global Step: 304350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:09,238-Speed 8912.89 samples/sec   Loss 3.4003   LearningRate 0.0008   Epoch: 18   Global Step: 304360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:10,379-Speed 8973.88 samples/sec   Loss 3.3159   LearningRate 0.0008   Epoch: 18   Global Step: 304370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:11,507-Speed 9087.52 samples/sec   Loss 3.4169   LearningRate 0.0008   Epoch: 18   Global Step: 304380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:12,606-Speed 9325.90 samples/sec   Loss 3.3533   LearningRate 0.0008   Epoch: 18   Global Step: 304390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:13,736-Speed 9068.06 samples/sec   Loss 3.3298   LearningRate 0.0008   Epoch: 18   Global Step: 304400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:14,852-Speed 9177.72 samples/sec   Loss 3.4409   LearningRate 0.0008   Epoch: 18   Global Step: 304410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:15,957-Speed 9274.53 samples/sec   Loss 3.3986   LearningRate 0.0008   Epoch: 18   Global Step: 304420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:17,076-Speed 9157.76 samples/sec   Loss 3.4269   LearningRate 0.0008   Epoch: 18   Global Step: 304430   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:53:18,177-Speed 9307.54 samples/sec   Loss 3.3026   LearningRate 0.0008   Epoch: 18   Global Step: 304440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:19,257-Speed 9484.15 samples/sec   Loss 3.3026   LearningRate 0.0008   Epoch: 18   Global Step: 304450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:20,311-Speed 9717.76 samples/sec   Loss 3.4755   LearningRate 0.0008   Epoch: 18   Global Step: 304460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:21,438-Speed 9087.75 samples/sec   Loss 3.3242   LearningRate 0.0008   Epoch: 18   Global Step: 304470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:22,616-Speed 8703.52 samples/sec   Loss 3.2760   LearningRate 0.0008   Epoch: 18   Global Step: 304480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:23,719-Speed 9287.34 samples/sec   Loss 3.3798   LearningRate 0.0008   Epoch: 18   Global Step: 304490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:24,807-Speed 9414.80 samples/sec   Loss 3.4141   LearningRate 0.0008   Epoch: 18   Global Step: 304500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:25,883-Speed 9522.35 samples/sec   Loss 3.3466   LearningRate 0.0008   Epoch: 18   Global Step: 304510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:27,033-Speed 8907.96 samples/sec   Loss 3.4566   LearningRate 0.0008   Epoch: 18   Global Step: 304520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:28,211-Speed 8699.20 samples/sec   Loss 3.3436   LearningRate 0.0008   Epoch: 18   Global Step: 304530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:29,356-Speed 8955.50 samples/sec   Loss 3.4065   LearningRate 0.0008   Epoch: 18   Global Step: 304540   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:53:30,445-Speed 9406.60 samples/sec   Loss 3.3910   LearningRate 0.0008   Epoch: 18   Global Step: 304550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:31,568-Speed 9124.03 samples/sec   Loss 3.3159   LearningRate 0.0008   Epoch: 18   Global Step: 304560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:32,714-Speed 8941.94 samples/sec   Loss 3.4276   LearningRate 0.0008   Epoch: 18   Global Step: 304570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:33,850-Speed 9018.32 samples/sec   Loss 3.4420   LearningRate 0.0008   Epoch: 18   Global Step: 304580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:34,994-Speed 8961.78 samples/sec   Loss 3.3665   LearningRate 0.0008   Epoch: 18   Global Step: 304590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:36,079-Speed 9442.10 samples/sec   Loss 3.3980   LearningRate 0.0008   Epoch: 18   Global Step: 304600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:37,191-Speed 9211.67 samples/sec   Loss 3.3553   LearningRate 0.0008   Epoch: 18   Global Step: 304610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:38,285-Speed 9370.87 samples/sec   Loss 3.4477   LearningRate 0.0008   Epoch: 18   Global Step: 304620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:39,442-Speed 8850.66 samples/sec   Loss 3.3942   LearningRate 0.0008   Epoch: 18   Global Step: 304630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:40,560-Speed 9165.73 samples/sec   Loss 3.3630   LearningRate 0.0008   Epoch: 18   Global Step: 304640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:41,711-Speed 8904.61 samples/sec   Loss 3.3208   LearningRate 0.0008   Epoch: 18   Global Step: 304650   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:53:42,813-Speed 9296.52 samples/sec   Loss 3.4175   LearningRate 0.0008   Epoch: 18   Global Step: 304660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:43,994-Speed 8674.14 samples/sec   Loss 3.4338   LearningRate 0.0008   Epoch: 18   Global Step: 304670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:45,132-Speed 9006.87 samples/sec   Loss 3.3833   LearningRate 0.0008   Epoch: 18   Global Step: 304680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:46,274-Speed 8966.39 samples/sec   Loss 3.3044   LearningRate 0.0008   Epoch: 18   Global Step: 304690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:47,381-Speed 9257.81 samples/sec   Loss 3.3911   LearningRate 0.0008   Epoch: 18   Global Step: 304700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:48,513-Speed 9052.20 samples/sec   Loss 3.4124   LearningRate 0.0008   Epoch: 18   Global Step: 304710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:49,636-Speed 9123.80 samples/sec   Loss 3.3450   LearningRate 0.0008   Epoch: 18   Global Step: 304720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:50,742-Speed 9263.26 samples/sec   Loss 3.3586   LearningRate 0.0008   Epoch: 18   Global Step: 304730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:51,860-Speed 9166.80 samples/sec   Loss 3.4298   LearningRate 0.0008   Epoch: 18   Global Step: 304740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:52,997-Speed 9011.00 samples/sec   Loss 3.3118   LearningRate 0.0008   Epoch: 18   Global Step: 304750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:54,108-Speed 9218.90 samples/sec   Loss 3.3652   LearningRate 0.0008   Epoch: 18   Global Step: 304760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:55,226-Speed 9164.32 samples/sec   Loss 3.3612   LearningRate 0.0008   Epoch: 18   Global Step: 304770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:56,317-Speed 9390.79 samples/sec   Loss 3.4264   LearningRate 0.0008   Epoch: 18   Global Step: 304780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:57,415-Speed 9333.98 samples/sec   Loss 3.4228   LearningRate 0.0008   Epoch: 18   Global Step: 304790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:58,538-Speed 9122.76 samples/sec   Loss 3.4347   LearningRate 0.0008   Epoch: 18   Global Step: 304800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:53:59,627-Speed 9409.90 samples/sec   Loss 3.4869   LearningRate 0.0008   Epoch: 18   Global Step: 304810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:00,765-Speed 9005.19 samples/sec   Loss 3.3392   LearningRate 0.0008   Epoch: 18   Global Step: 304820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:01,869-Speed 9276.15 samples/sec   Loss 3.3459   LearningRate 0.0008   Epoch: 18   Global Step: 304830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:03,000-Speed 9059.19 samples/sec   Loss 3.4348   LearningRate 0.0008   Epoch: 18   Global Step: 304840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:04,103-Speed 9291.05 samples/sec   Loss 3.4408   LearningRate 0.0008   Epoch: 18   Global Step: 304850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:05,200-Speed 9340.73 samples/sec   Loss 3.3953   LearningRate 0.0008   Epoch: 18   Global Step: 304860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:06,310-Speed 9230.14 samples/sec   Loss 3.3646   LearningRate 0.0008   Epoch: 18   Global Step: 304870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:07,412-Speed 9305.84 samples/sec   Loss 3.4002   LearningRate 0.0008   Epoch: 18   Global Step: 304880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:08,515-Speed 9284.66 samples/sec   Loss 3.3798   LearningRate 0.0008   Epoch: 18   Global Step: 304890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:09,619-Speed 9280.88 samples/sec   Loss 3.3990   LearningRate 0.0008   Epoch: 18   Global Step: 304900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:10,740-Speed 9145.41 samples/sec   Loss 3.4084   LearningRate 0.0008   Epoch: 18   Global Step: 304910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:11,909-Speed 8765.95 samples/sec   Loss 3.4174   LearningRate 0.0007   Epoch: 18   Global Step: 304920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:12,994-Speed 9435.39 samples/sec   Loss 3.3153   LearningRate 0.0007   Epoch: 18   Global Step: 304930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:14,099-Speed 9277.11 samples/sec   Loss 3.3143   LearningRate 0.0007   Epoch: 18   Global Step: 304940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:15,207-Speed 9245.18 samples/sec   Loss 3.4270   LearningRate 0.0007   Epoch: 18   Global Step: 304950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:16,320-Speed 9203.65 samples/sec   Loss 3.4419   LearningRate 0.0007   Epoch: 18   Global Step: 304960   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:54:17,433-Speed 9208.21 samples/sec   Loss 3.4024   LearningRate 0.0007   Epoch: 18   Global Step: 304970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:18,574-Speed 8980.88 samples/sec   Loss 3.3638   LearningRate 0.0007   Epoch: 18   Global Step: 304980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:19,739-Speed 8791.15 samples/sec   Loss 3.3907   LearningRate 0.0007   Epoch: 18   Global Step: 304990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:20,825-Speed 9438.12 samples/sec   Loss 3.4343   LearningRate 0.0007   Epoch: 18   Global Step: 305000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:21,928-Speed 9287.19 samples/sec   Loss 3.4254   LearningRate 0.0007   Epoch: 18   Global Step: 305010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:23,062-Speed 9036.27 samples/sec   Loss 3.3521   LearningRate 0.0007   Epoch: 18   Global Step: 305020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:24,233-Speed 8748.68 samples/sec   Loss 3.4055   LearningRate 0.0007   Epoch: 18   Global Step: 305030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:25,366-Speed 9050.68 samples/sec   Loss 3.3063   LearningRate 0.0007   Epoch: 18   Global Step: 305040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:26,478-Speed 9207.64 samples/sec   Loss 3.3542   LearningRate 0.0007   Epoch: 18   Global Step: 305050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:27,653-Speed 8722.95 samples/sec   Loss 3.3765   LearningRate 0.0007   Epoch: 18   Global Step: 305060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:28,777-Speed 9115.54 samples/sec   Loss 3.4175   LearningRate 0.0007   Epoch: 18   Global Step: 305070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:29,889-Speed 9209.33 samples/sec   Loss 3.4471   LearningRate 0.0007   Epoch: 18   Global Step: 305080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:31,028-Speed 9001.31 samples/sec   Loss 3.3753   LearningRate 0.0007   Epoch: 18   Global Step: 305090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:32,147-Speed 9151.98 samples/sec   Loss 3.3863   LearningRate 0.0007   Epoch: 18   Global Step: 305100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:33,292-Speed 8950.02 samples/sec   Loss 3.4888   LearningRate 0.0007   Epoch: 18   Global Step: 305110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:34,439-Speed 8936.32 samples/sec   Loss 3.3065   LearningRate 0.0007   Epoch: 18   Global Step: 305120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:35,564-Speed 9107.43 samples/sec   Loss 3.4381   LearningRate 0.0007   Epoch: 18   Global Step: 305130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:36,674-Speed 9230.98 samples/sec   Loss 3.3831   LearningRate 0.0007   Epoch: 18   Global Step: 305140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:37,802-Speed 9082.09 samples/sec   Loss 3.4412   LearningRate 0.0007   Epoch: 18   Global Step: 305150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:38,968-Speed 8780.44 samples/sec   Loss 3.3241   LearningRate 0.0007   Epoch: 18   Global Step: 305160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:40,124-Speed 8865.37 samples/sec   Loss 3.4168   LearningRate 0.0007   Epoch: 18   Global Step: 305170   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:54:41,258-Speed 9039.76 samples/sec   Loss 3.4074   LearningRate 0.0007   Epoch: 18   Global Step: 305180   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:54:42,381-Speed 9126.62 samples/sec   Loss 3.3759   LearningRate 0.0007   Epoch: 18   Global Step: 305190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:43,492-Speed 9218.90 samples/sec   Loss 3.3973   LearningRate 0.0007   Epoch: 18   Global Step: 305200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:44,584-Speed 9382.50 samples/sec   Loss 3.4013   LearningRate 0.0007   Epoch: 18   Global Step: 305210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:45,642-Speed 9686.15 samples/sec   Loss 3.4653   LearningRate 0.0007   Epoch: 18   Global Step: 305220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:46,749-Speed 9252.99 samples/sec   Loss 3.3980   LearningRate 0.0007   Epoch: 18   Global Step: 305230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:47,898-Speed 8917.43 samples/sec   Loss 3.3231   LearningRate 0.0007   Epoch: 18   Global Step: 305240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:49,024-Speed 9101.60 samples/sec   Loss 3.3310   LearningRate 0.0007   Epoch: 18   Global Step: 305250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:50,148-Speed 9112.85 samples/sec   Loss 3.3997   LearningRate 0.0007   Epoch: 18   Global Step: 305260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:51,264-Speed 9177.24 samples/sec   Loss 3.4093   LearningRate 0.0007   Epoch: 18   Global Step: 305270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:52,373-Speed 9242.31 samples/sec   Loss 3.3900   LearningRate 0.0007   Epoch: 18   Global Step: 305280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:53,521-Speed 8931.27 samples/sec   Loss 3.3849   LearningRate 0.0007   Epoch: 18   Global Step: 305290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:54:54,701-Speed 8682.56 samples/sec   Loss 3.3199   LearningRate 0.0007   Epoch: 18   Global Step: 305300   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:54:55,825-Speed 9119.59 samples/sec   Loss 3.3181   LearningRate 0.0007   Epoch: 18   Global Step: 305310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:56,917-Speed 9384.36 samples/sec   Loss 3.3334   LearningRate 0.0007   Epoch: 18   Global Step: 305320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:57,994-Speed 9511.39 samples/sec   Loss 3.2369   LearningRate 0.0007   Epoch: 18   Global Step: 305330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:54:59,110-Speed 9182.09 samples/sec   Loss 3.3641   LearningRate 0.0007   Epoch: 18   Global Step: 305340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:00,201-Speed 9384.04 samples/sec   Loss 3.3595   LearningRate 0.0007   Epoch: 18   Global Step: 305350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:01,307-Speed 9263.04 samples/sec   Loss 3.4295   LearningRate 0.0007   Epoch: 18   Global Step: 305360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:02,425-Speed 9174.92 samples/sec   Loss 3.3557   LearningRate 0.0007   Epoch: 18   Global Step: 305370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:03,500-Speed 9529.85 samples/sec   Loss 3.3830   LearningRate 0.0007   Epoch: 18   Global Step: 305380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:04,635-Speed 9025.79 samples/sec   Loss 3.4053   LearningRate 0.0007   Epoch: 18   Global Step: 305390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:05,746-Speed 9226.10 samples/sec   Loss 3.3230   LearningRate 0.0007   Epoch: 18   Global Step: 305400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:06,870-Speed 9112.63 samples/sec   Loss 3.3344   LearningRate 0.0007   Epoch: 18   Global Step: 305410   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:55:07,977-Speed 9255.36 samples/sec   Loss 3.3940   LearningRate 0.0007   Epoch: 18   Global Step: 305420   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:55:09,077-Speed 9317.40 samples/sec   Loss 3.3346   LearningRate 0.0007   Epoch: 18   Global Step: 305430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:10,170-Speed 9388.20 samples/sec   Loss 3.3753   LearningRate 0.0007   Epoch: 18   Global Step: 305440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:11,316-Speed 8946.09 samples/sec   Loss 3.4098   LearningRate 0.0007   Epoch: 18   Global Step: 305450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:12,410-Speed 9367.09 samples/sec   Loss 3.4215   LearningRate 0.0007   Epoch: 18   Global Step: 305460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:13,541-Speed 9054.55 samples/sec   Loss 3.4131   LearningRate 0.0007   Epoch: 18   Global Step: 305470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:14,741-Speed 8540.54 samples/sec   Loss 3.3561   LearningRate 0.0007   Epoch: 18   Global Step: 305480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:15,836-Speed 9351.16 samples/sec   Loss 3.3636   LearningRate 0.0007   Epoch: 18   Global Step: 305490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:16,930-Speed 9376.82 samples/sec   Loss 3.3661   LearningRate 0.0007   Epoch: 18   Global Step: 305500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:18,030-Speed 9306.19 samples/sec   Loss 3.4609   LearningRate 0.0007   Epoch: 18   Global Step: 305510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:19,152-Speed 9137.55 samples/sec   Loss 3.4580   LearningRate 0.0007   Epoch: 18   Global Step: 305520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:20,240-Speed 9417.30 samples/sec   Loss 3.3700   LearningRate 0.0007   Epoch: 18   Global Step: 305530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:21,353-Speed 9200.56 samples/sec   Loss 3.4475   LearningRate 0.0007   Epoch: 18   Global Step: 305540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:22,443-Speed 9403.19 samples/sec   Loss 3.3218   LearningRate 0.0007   Epoch: 18   Global Step: 305550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:23,559-Speed 9180.39 samples/sec   Loss 3.3685   LearningRate 0.0007   Epoch: 18   Global Step: 305560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:24,667-Speed 9252.44 samples/sec   Loss 3.3543   LearningRate 0.0007   Epoch: 18   Global Step: 305570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:25,793-Speed 9092.99 samples/sec   Loss 3.3615   LearningRate 0.0007   Epoch: 18   Global Step: 305580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:26,936-Speed 8968.85 samples/sec   Loss 3.4462   LearningRate 0.0007   Epoch: 18   Global Step: 305590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:28,042-Speed 9259.30 samples/sec   Loss 3.4282   LearningRate 0.0007   Epoch: 18   Global Step: 305600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:29,129-Speed 9433.78 samples/sec   Loss 3.2864   LearningRate 0.0007   Epoch: 18   Global Step: 305610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:30,258-Speed 9075.13 samples/sec   Loss 3.3918   LearningRate 0.0007   Epoch: 18   Global Step: 305620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:31,370-Speed 9212.58 samples/sec   Loss 3.4045   LearningRate 0.0007   Epoch: 18   Global Step: 305630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:32,463-Speed 9369.89 samples/sec   Loss 3.4160   LearningRate 0.0007   Epoch: 18   Global Step: 305640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:55:33,582-Speed 9161.69 samples/sec   Loss 3.4454   LearningRate 0.0007   Epoch: 18   Global Step: 305650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:34,759-Speed 8702.79 samples/sec   Loss 3.4272   LearningRate 0.0007   Epoch: 18   Global Step: 305660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:35,890-Speed 9058.78 samples/sec   Loss 3.3569   LearningRate 0.0007   Epoch: 18   Global Step: 305670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:37,019-Speed 9071.60 samples/sec   Loss 3.3327   LearningRate 0.0007   Epoch: 18   Global Step: 305680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:38,121-Speed 9299.11 samples/sec   Loss 3.4363   LearningRate 0.0007   Epoch: 18   Global Step: 305690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:39,266-Speed 8947.21 samples/sec   Loss 3.4203   LearningRate 0.0007   Epoch: 18   Global Step: 305700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:40,347-Speed 9480.52 samples/sec   Loss 3.3734   LearningRate 0.0007   Epoch: 18   Global Step: 305710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:41,435-Speed 9425.76 samples/sec   Loss 3.3781   LearningRate 0.0007   Epoch: 18   Global Step: 305720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:42,525-Speed 9394.07 samples/sec   Loss 3.3401   LearningRate 0.0007   Epoch: 18   Global Step: 305730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:43,638-Speed 9204.12 samples/sec   Loss 3.4028   LearningRate 0.0007   Epoch: 18   Global Step: 305740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:44,726-Speed 9422.57 samples/sec   Loss 3.4508   LearningRate 0.0007   Epoch: 18   Global Step: 305750   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:55:45,815-Speed 9406.88 samples/sec   Loss 3.4428   LearningRate 0.0007   Epoch: 18   Global Step: 305760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:46,934-Speed 9159.48 samples/sec   Loss 3.4101   LearningRate 0.0007   Epoch: 18   Global Step: 305770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:48,091-Speed 8854.76 samples/sec   Loss 3.3973   LearningRate 0.0007   Epoch: 18   Global Step: 305780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:49,286-Speed 8569.45 samples/sec   Loss 3.4600   LearningRate 0.0007   Epoch: 18   Global Step: 305790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:50,393-Speed 9257.01 samples/sec   Loss 3.4774   LearningRate 0.0007   Epoch: 18   Global Step: 305800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:51,472-Speed 9492.72 samples/sec   Loss 3.3791   LearningRate 0.0007   Epoch: 18   Global Step: 305810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:52,618-Speed 8946.51 samples/sec   Loss 3.4534   LearningRate 0.0007   Epoch: 18   Global Step: 305820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:53,762-Speed 8948.83 samples/sec   Loss 3.3747   LearningRate 0.0007   Epoch: 18   Global Step: 305830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:54,870-Speed 9250.11 samples/sec   Loss 3.2965   LearningRate 0.0007   Epoch: 18   Global Step: 305840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:55,980-Speed 9227.05 samples/sec   Loss 3.3449   LearningRate 0.0007   Epoch: 18   Global Step: 305850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:57,064-Speed 9458.27 samples/sec   Loss 3.3537   LearningRate 0.0007   Epoch: 18   Global Step: 305860   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:55:58,172-Speed 9249.01 samples/sec   Loss 3.3670   LearningRate 0.0007   Epoch: 18   Global Step: 305870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:55:59,310-Speed 9001.54 samples/sec   Loss 3.3817   LearningRate 0.0007   Epoch: 18   Global Step: 305880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:00,449-Speed 8999.35 samples/sec   Loss 3.3985   LearningRate 0.0007   Epoch: 18   Global Step: 305890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:01,658-Speed 8476.43 samples/sec   Loss 3.3176   LearningRate 0.0007   Epoch: 18   Global Step: 305900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:02,817-Speed 8839.79 samples/sec   Loss 3.3287   LearningRate 0.0007   Epoch: 18   Global Step: 305910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:03,926-Speed 9235.78 samples/sec   Loss 3.4481   LearningRate 0.0007   Epoch: 18   Global Step: 305920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:05,026-Speed 9314.29 samples/sec   Loss 3.4419   LearningRate 0.0007   Epoch: 18   Global Step: 305930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:06,213-Speed 8626.62 samples/sec   Loss 3.4142   LearningRate 0.0007   Epoch: 18   Global Step: 305940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:07,342-Speed 9077.85 samples/sec   Loss 3.4412   LearningRate 0.0007   Epoch: 18   Global Step: 305950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:08,488-Speed 8939.31 samples/sec   Loss 3.4517   LearningRate 0.0007   Epoch: 18   Global Step: 305960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:09,612-Speed 9121.13 samples/sec   Loss 3.2815   LearningRate 0.0007   Epoch: 18   Global Step: 305970   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:56:10,695-Speed 9454.53 samples/sec   Loss 3.3566   LearningRate 0.0007   Epoch: 18   Global Step: 305980   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:56:11,811-Speed 9182.49 samples/sec   Loss 3.3770   LearningRate 0.0007   Epoch: 18   Global Step: 305990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:12,908-Speed 9340.12 samples/sec   Loss 3.3589   LearningRate 0.0007   Epoch: 18   Global Step: 306000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:56:34,989-[lfw][306000]XNorm: 6.541931
Training: 2022-04-11 23:56:34,990-[lfw][306000]Accuracy-Flip: 0.99650+-0.00273
Training: 2022-04-11 23:56:34,990-[lfw][306000]Accuracy-Highest: 0.99750
Training: 2022-04-11 23:57:00,479-[cfp_fp][306000]XNorm: 5.713169
Training: 2022-04-11 23:57:00,480-[cfp_fp][306000]Accuracy-Flip: 0.97214+-0.00915
Training: 2022-04-11 23:57:00,480-[cfp_fp][306000]Accuracy-Highest: 0.97386
Training: 2022-04-11 23:57:22,456-[agedb_30][306000]XNorm: 6.375520
Training: 2022-04-11 23:57:22,457-[agedb_30][306000]Accuracy-Flip: 0.97350+-0.00709
Training: 2022-04-11 23:57:22,457-[agedb_30][306000]Accuracy-Highest: 0.97417
Training: 2022-04-11 23:57:23,561-Speed 144.94 samples/sec   Loss 3.3760   LearningRate 0.0007   Epoch: 18   Global Step: 306010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:24,671-Speed 9228.78 samples/sec   Loss 3.4078   LearningRate 0.0007   Epoch: 18   Global Step: 306020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:25,860-Speed 8615.42 samples/sec   Loss 3.3874   LearningRate 0.0007   Epoch: 18   Global Step: 306030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:26,957-Speed 9337.61 samples/sec   Loss 3.4063   LearningRate 0.0007   Epoch: 18   Global Step: 306040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:28,071-Speed 9196.95 samples/sec   Loss 3.4466   LearningRate 0.0007   Epoch: 18   Global Step: 306050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:29,225-Speed 8881.09 samples/sec   Loss 3.4125   LearningRate 0.0007   Epoch: 18   Global Step: 306060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:30,350-Speed 9108.00 samples/sec   Loss 3.3340   LearningRate 0.0007   Epoch: 18   Global Step: 306070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:31,475-Speed 9110.82 samples/sec   Loss 3.3941   LearningRate 0.0007   Epoch: 18   Global Step: 306080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:32,576-Speed 9302.70 samples/sec   Loss 3.3701   LearningRate 0.0007   Epoch: 18   Global Step: 306090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:33,775-Speed 8543.66 samples/sec   Loss 3.3911   LearningRate 0.0007   Epoch: 18   Global Step: 306100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:34,894-Speed 9159.63 samples/sec   Loss 3.3565   LearningRate 0.0007   Epoch: 18   Global Step: 306110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:35,975-Speed 9473.52 samples/sec   Loss 3.4310   LearningRate 0.0007   Epoch: 18   Global Step: 306120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:37,072-Speed 9344.83 samples/sec   Loss 3.3812   LearningRate 0.0007   Epoch: 18   Global Step: 306130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:38,206-Speed 9029.85 samples/sec   Loss 3.2617   LearningRate 0.0007   Epoch: 18   Global Step: 306140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:39,335-Speed 9073.12 samples/sec   Loss 3.4096   LearningRate 0.0007   Epoch: 18   Global Step: 306150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:40,448-Speed 9208.33 samples/sec   Loss 3.3487   LearningRate 0.0007   Epoch: 18   Global Step: 306160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:41,571-Speed 9125.71 samples/sec   Loss 3.4016   LearningRate 0.0007   Epoch: 18   Global Step: 306170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:42,710-Speed 8997.24 samples/sec   Loss 3.4397   LearningRate 0.0007   Epoch: 18   Global Step: 306180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:43,824-Speed 9190.77 samples/sec   Loss 3.4458   LearningRate 0.0007   Epoch: 18   Global Step: 306190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:44,947-Speed 9125.78 samples/sec   Loss 3.3983   LearningRate 0.0007   Epoch: 18   Global Step: 306200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:46,039-Speed 9382.07 samples/sec   Loss 3.3986   LearningRate 0.0007   Epoch: 18   Global Step: 306210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:47,136-Speed 9350.27 samples/sec   Loss 3.3421   LearningRate 0.0007   Epoch: 18   Global Step: 306220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:48,272-Speed 9015.15 samples/sec   Loss 3.4824   LearningRate 0.0007   Epoch: 18   Global Step: 306230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:49,406-Speed 9032.64 samples/sec   Loss 3.3871   LearningRate 0.0007   Epoch: 18   Global Step: 306240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:50,485-Speed 9497.58 samples/sec   Loss 3.3897   LearningRate 0.0007   Epoch: 18   Global Step: 306250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:51,631-Speed 8941.50 samples/sec   Loss 3.4346   LearningRate 0.0007   Epoch: 18   Global Step: 306260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:52,691-Speed 9665.50 samples/sec   Loss 3.3605   LearningRate 0.0007   Epoch: 18   Global Step: 306270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:53,783-Speed 9381.35 samples/sec   Loss 3.3947   LearningRate 0.0007   Epoch: 18   Global Step: 306280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:54,927-Speed 8958.55 samples/sec   Loss 3.3515   LearningRate 0.0007   Epoch: 18   Global Step: 306290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:57:56,001-Speed 9541.62 samples/sec   Loss 3.4248   LearningRate 0.0007   Epoch: 18   Global Step: 306300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:57,115-Speed 9190.31 samples/sec   Loss 3.3169   LearningRate 0.0007   Epoch: 18   Global Step: 306310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:58,220-Speed 9272.20 samples/sec   Loss 3.3823   LearningRate 0.0007   Epoch: 18   Global Step: 306320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:57:59,384-Speed 8806.89 samples/sec   Loss 3.3880   LearningRate 0.0007   Epoch: 18   Global Step: 306330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:00,498-Speed 9195.36 samples/sec   Loss 3.4622   LearningRate 0.0007   Epoch: 18   Global Step: 306340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:01,628-Speed 9071.05 samples/sec   Loss 3.3596   LearningRate 0.0007   Epoch: 18   Global Step: 306350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:02,799-Speed 8747.13 samples/sec   Loss 3.4109   LearningRate 0.0007   Epoch: 18   Global Step: 306360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:03,936-Speed 9012.54 samples/sec   Loss 3.4947   LearningRate 0.0007   Epoch: 18   Global Step: 306370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:05,066-Speed 9066.77 samples/sec   Loss 3.3391   LearningRate 0.0007   Epoch: 18   Global Step: 306380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:06,187-Speed 9140.81 samples/sec   Loss 3.4296   LearningRate 0.0007   Epoch: 18   Global Step: 306390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:07,300-Speed 9207.45 samples/sec   Loss 3.3994   LearningRate 0.0007   Epoch: 18   Global Step: 306400   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:58:08,428-Speed 9081.51 samples/sec   Loss 3.4832   LearningRate 0.0007   Epoch: 18   Global Step: 306410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:09,552-Speed 9116.91 samples/sec   Loss 3.3937   LearningRate 0.0007   Epoch: 18   Global Step: 306420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:10,700-Speed 8925.28 samples/sec   Loss 3.4050   LearningRate 0.0007   Epoch: 18   Global Step: 306430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:11,839-Speed 8994.95 samples/sec   Loss 3.4120   LearningRate 0.0007   Epoch: 18   Global Step: 306440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:12,968-Speed 9074.05 samples/sec   Loss 3.4323   LearningRate 0.0007   Epoch: 18   Global Step: 306450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:14,090-Speed 9132.33 samples/sec   Loss 3.3446   LearningRate 0.0007   Epoch: 18   Global Step: 306460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:15,219-Speed 9072.21 samples/sec   Loss 3.3240   LearningRate 0.0007   Epoch: 18   Global Step: 306470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:16,370-Speed 8905.89 samples/sec   Loss 3.4011   LearningRate 0.0007   Epoch: 18   Global Step: 306480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:17,492-Speed 9131.14 samples/sec   Loss 3.3854   LearningRate 0.0007   Epoch: 18   Global Step: 306490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:18,651-Speed 8836.80 samples/sec   Loss 3.4140   LearningRate 0.0007   Epoch: 18   Global Step: 306500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:19,750-Speed 9327.24 samples/sec   Loss 3.3876   LearningRate 0.0007   Epoch: 18   Global Step: 306510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:20,905-Speed 8864.13 samples/sec   Loss 3.3524   LearningRate 0.0007   Epoch: 18   Global Step: 306520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:22,038-Speed 9044.84 samples/sec   Loss 3.4336   LearningRate 0.0007   Epoch: 18   Global Step: 306530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:23,160-Speed 9132.21 samples/sec   Loss 3.4040   LearningRate 0.0007   Epoch: 18   Global Step: 306540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:24,311-Speed 8904.69 samples/sec   Loss 3.3784   LearningRate 0.0007   Epoch: 18   Global Step: 306550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:25,484-Speed 8731.81 samples/sec   Loss 3.3667   LearningRate 0.0007   Epoch: 18   Global Step: 306560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:26,590-Speed 9272.76 samples/sec   Loss 3.3530   LearningRate 0.0007   Epoch: 18   Global Step: 306570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:58:27,696-Speed 9266.73 samples/sec   Loss 3.4255   LearningRate 0.0007   Epoch: 18   Global Step: 306580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:58:28,805-Speed 9240.71 samples/sec   Loss 3.4596   LearningRate 0.0007   Epoch: 18   Global Step: 306590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:58:29,881-Speed 9522.54 samples/sec   Loss 3.3543   LearningRate 0.0007   Epoch: 18   Global Step: 306600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:58:30,992-Speed 9222.36 samples/sec   Loss 3.3631   LearningRate 0.0007   Epoch: 18   Global Step: 306610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:58:32,159-Speed 8778.64 samples/sec   Loss 3.3775   LearningRate 0.0007   Epoch: 18   Global Step: 306620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:58:33,252-Speed 9371.03 samples/sec   Loss 3.3581   LearningRate 0.0007   Epoch: 18   Global Step: 306630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:58:34,389-Speed 9015.12 samples/sec   Loss 3.3857   LearningRate 0.0007   Epoch: 18   Global Step: 306640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:58:35,496-Speed 9251.98 samples/sec   Loss 3.4346   LearningRate 0.0007   Epoch: 18   Global Step: 306650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:58:36,597-Speed 9308.26 samples/sec   Loss 3.3212   LearningRate 0.0007   Epoch: 18   Global Step: 306660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:58:37,690-Speed 9375.79 samples/sec   Loss 3.4033   LearningRate 0.0007   Epoch: 18   Global Step: 306670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:38,822-Speed 9049.35 samples/sec   Loss 3.3494   LearningRate 0.0007   Epoch: 18   Global Step: 306680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:39,976-Speed 8882.19 samples/sec   Loss 3.4318   LearningRate 0.0007   Epoch: 18   Global Step: 306690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:41,117-Speed 8978.20 samples/sec   Loss 3.3836   LearningRate 0.0007   Epoch: 18   Global Step: 306700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:42,287-Speed 8760.59 samples/sec   Loss 3.4302   LearningRate 0.0007   Epoch: 18   Global Step: 306710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:43,445-Speed 8846.66 samples/sec   Loss 3.3842   LearningRate 0.0007   Epoch: 18   Global Step: 306720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:44,526-Speed 9480.29 samples/sec   Loss 3.4215   LearningRate 0.0007   Epoch: 18   Global Step: 306730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:45,650-Speed 9114.27 samples/sec   Loss 3.3678   LearningRate 0.0007   Epoch: 18   Global Step: 306740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:46,775-Speed 9109.27 samples/sec   Loss 3.4069   LearningRate 0.0007   Epoch: 18   Global Step: 306750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:47,932-Speed 8853.21 samples/sec   Loss 3.3943   LearningRate 0.0007   Epoch: 18   Global Step: 306760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:49,011-Speed 9499.99 samples/sec   Loss 3.3105   LearningRate 0.0007   Epoch: 18   Global Step: 306770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:50,090-Speed 9491.94 samples/sec   Loss 3.3858   LearningRate 0.0007   Epoch: 18   Global Step: 306780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:51,181-Speed 9395.89 samples/sec   Loss 3.4237   LearningRate 0.0007   Epoch: 18   Global Step: 306790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:52,326-Speed 8947.18 samples/sec   Loss 3.4842   LearningRate 0.0007   Epoch: 18   Global Step: 306800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:53,431-Speed 9271.24 samples/sec   Loss 3.3930   LearningRate 0.0007   Epoch: 18   Global Step: 306810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:54,553-Speed 9130.41 samples/sec   Loss 3.3587   LearningRate 0.0007   Epoch: 18   Global Step: 306820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:55,719-Speed 8788.25 samples/sec   Loss 3.3404   LearningRate 0.0007   Epoch: 18   Global Step: 306830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:56,842-Speed 9123.03 samples/sec   Loss 3.3492   LearningRate 0.0007   Epoch: 18   Global Step: 306840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:58,042-Speed 8535.96 samples/sec   Loss 3.3698   LearningRate 0.0007   Epoch: 18   Global Step: 306850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:58:59,143-Speed 9312.56 samples/sec   Loss 3.4575   LearningRate 0.0007   Epoch: 18   Global Step: 306860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:00,279-Speed 9022.90 samples/sec   Loss 3.4762   LearningRate 0.0007   Epoch: 18   Global Step: 306870   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:59:01,354-Speed 9525.66 samples/sec   Loss 3.3771   LearningRate 0.0007   Epoch: 18   Global Step: 306880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:59:02,483-Speed 9076.65 samples/sec   Loss 3.4547   LearningRate 0.0007   Epoch: 18   Global Step: 306890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:59:03,564-Speed 9475.42 samples/sec   Loss 3.4329   LearningRate 0.0007   Epoch: 18   Global Step: 306900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:04,642-Speed 9508.81 samples/sec   Loss 3.4286   LearningRate 0.0006   Epoch: 18   Global Step: 306910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:05,735-Speed 9369.88 samples/sec   Loss 3.3764   LearningRate 0.0006   Epoch: 18   Global Step: 306920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:06,865-Speed 9071.16 samples/sec   Loss 3.3548   LearningRate 0.0006   Epoch: 18   Global Step: 306930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:07,965-Speed 9315.70 samples/sec   Loss 3.4162   LearningRate 0.0006   Epoch: 18   Global Step: 306940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:09,142-Speed 8698.49 samples/sec   Loss 3.3501   LearningRate 0.0006   Epoch: 18   Global Step: 306950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:10,274-Speed 9054.11 samples/sec   Loss 3.4546   LearningRate 0.0006   Epoch: 18   Global Step: 306960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:11,369-Speed 9359.54 samples/sec   Loss 3.4090   LearningRate 0.0006   Epoch: 18   Global Step: 306970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:12,588-Speed 8405.81 samples/sec   Loss 3.3488   LearningRate 0.0006   Epoch: 18   Global Step: 306980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:13,687-Speed 9322.05 samples/sec   Loss 3.4266   LearningRate 0.0006   Epoch: 18   Global Step: 306990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:14,797-Speed 9225.93 samples/sec   Loss 3.4216   LearningRate 0.0006   Epoch: 18   Global Step: 307000   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:59:15,949-Speed 8892.47 samples/sec   Loss 3.3544   LearningRate 0.0006   Epoch: 18   Global Step: 307010   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:59:17,071-Speed 9137.71 samples/sec   Loss 3.3928   LearningRate 0.0006   Epoch: 18   Global Step: 307020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:18,264-Speed 8587.12 samples/sec   Loss 3.3805   LearningRate 0.0006   Epoch: 18   Global Step: 307030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:19,437-Speed 8736.49 samples/sec   Loss 3.4299   LearningRate 0.0006   Epoch: 18   Global Step: 307040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:20,540-Speed 9288.47 samples/sec   Loss 3.3929   LearningRate 0.0006   Epoch: 18   Global Step: 307050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:21,642-Speed 9302.41 samples/sec   Loss 3.4158   LearningRate 0.0006   Epoch: 18   Global Step: 307060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:22,759-Speed 9169.54 samples/sec   Loss 3.3853   LearningRate 0.0006   Epoch: 18   Global Step: 307070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:23,859-Speed 9313.29 samples/sec   Loss 3.3975   LearningRate 0.0006   Epoch: 18   Global Step: 307080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:25,033-Speed 8727.84 samples/sec   Loss 3.4023   LearningRate 0.0006   Epoch: 18   Global Step: 307090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:26,149-Speed 9187.94 samples/sec   Loss 3.3708   LearningRate 0.0006   Epoch: 18   Global Step: 307100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:27,284-Speed 9023.57 samples/sec   Loss 3.3544   LearningRate 0.0006   Epoch: 18   Global Step: 307110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:28,415-Speed 9063.22 samples/sec   Loss 3.4066   LearningRate 0.0006   Epoch: 18   Global Step: 307120   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:59:29,563-Speed 8925.66 samples/sec   Loss 3.3643   LearningRate 0.0006   Epoch: 18   Global Step: 307130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:30,663-Speed 9312.55 samples/sec   Loss 3.4355   LearningRate 0.0006   Epoch: 18   Global Step: 307140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:31,783-Speed 9148.69 samples/sec   Loss 3.3258   LearningRate 0.0006   Epoch: 18   Global Step: 307150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:32,857-Speed 9539.50 samples/sec   Loss 3.3848   LearningRate 0.0006   Epoch: 18   Global Step: 307160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:33,968-Speed 9221.26 samples/sec   Loss 3.4385   LearningRate 0.0006   Epoch: 18   Global Step: 307170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:35,083-Speed 9193.39 samples/sec   Loss 3.4125   LearningRate 0.0006   Epoch: 18   Global Step: 307180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:59:36,205-Speed 9129.69 samples/sec   Loss 3.3515   LearningRate 0.0006   Epoch: 18   Global Step: 307190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:59:37,334-Speed 9076.30 samples/sec   Loss 3.4474   LearningRate 0.0006   Epoch: 18   Global Step: 307200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:59:38,439-Speed 9273.94 samples/sec   Loss 3.4138   LearningRate 0.0006   Epoch: 18   Global Step: 307210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:59:39,584-Speed 8948.58 samples/sec   Loss 3.3543   LearningRate 0.0006   Epoch: 18   Global Step: 307220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:59:40,703-Speed 9156.51 samples/sec   Loss 3.4269   LearningRate 0.0006   Epoch: 18   Global Step: 307230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:59:41,773-Speed 9574.11 samples/sec   Loss 3.4338   LearningRate 0.0006   Epoch: 18   Global Step: 307240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:59:42,873-Speed 9317.72 samples/sec   Loss 3.3865   LearningRate 0.0006   Epoch: 18   Global Step: 307250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:59:44,003-Speed 9069.76 samples/sec   Loss 3.4067   LearningRate 0.0006   Epoch: 18   Global Step: 307260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:59:45,106-Speed 9283.51 samples/sec   Loss 3.4248   LearningRate 0.0006   Epoch: 18   Global Step: 307270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-11 23:59:46,205-Speed 9326.35 samples/sec   Loss 3.4077   LearningRate 0.0006   Epoch: 18   Global Step: 307280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:47,379-Speed 8727.98 samples/sec   Loss 3.3700   LearningRate 0.0006   Epoch: 18   Global Step: 307290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:48,501-Speed 9135.48 samples/sec   Loss 3.4622   LearningRate 0.0006   Epoch: 18   Global Step: 307300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:49,674-Speed 8733.53 samples/sec   Loss 3.4300   LearningRate 0.0006   Epoch: 18   Global Step: 307310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:50,803-Speed 9072.03 samples/sec   Loss 3.4957   LearningRate 0.0006   Epoch: 18   Global Step: 307320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:51,900-Speed 9341.48 samples/sec   Loss 3.4258   LearningRate 0.0006   Epoch: 18   Global Step: 307330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:53,003-Speed 9289.27 samples/sec   Loss 3.4140   LearningRate 0.0006   Epoch: 18   Global Step: 307340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:54,141-Speed 9004.56 samples/sec   Loss 3.3626   LearningRate 0.0006   Epoch: 18   Global Step: 307350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:55,245-Speed 9280.91 samples/sec   Loss 3.4755   LearningRate 0.0006   Epoch: 18   Global Step: 307360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:56,393-Speed 8925.32 samples/sec   Loss 3.4503   LearningRate 0.0006   Epoch: 18   Global Step: 307370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:57,524-Speed 9056.92 samples/sec   Loss 3.3802   LearningRate 0.0006   Epoch: 18   Global Step: 307380   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-11 23:59:58,632-Speed 9245.68 samples/sec   Loss 3.3606   LearningRate 0.0006   Epoch: 18   Global Step: 307390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-11 23:59:59,698-Speed 9618.87 samples/sec   Loss 3.3108   LearningRate 0.0006   Epoch: 18   Global Step: 307400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:00,832-Speed 9032.96 samples/sec   Loss 3.3067   LearningRate 0.0006   Epoch: 18   Global Step: 307410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:01,996-Speed 8805.46 samples/sec   Loss 3.4093   LearningRate 0.0006   Epoch: 18   Global Step: 307420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:03,129-Speed 9041.32 samples/sec   Loss 3.3544   LearningRate 0.0006   Epoch: 18   Global Step: 307430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:04,197-Speed 9594.38 samples/sec   Loss 3.3304   LearningRate 0.0006   Epoch: 18   Global Step: 307440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:05,306-Speed 9232.17 samples/sec   Loss 3.3565   LearningRate 0.0006   Epoch: 18   Global Step: 307450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:06,404-Speed 9333.08 samples/sec   Loss 3.3852   LearningRate 0.0006   Epoch: 18   Global Step: 307460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:07,482-Speed 9503.60 samples/sec   Loss 3.3756   LearningRate 0.0006   Epoch: 18   Global Step: 307470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:08,622-Speed 8992.10 samples/sec   Loss 3.3794   LearningRate 0.0006   Epoch: 18   Global Step: 307480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:09,756-Speed 9031.48 samples/sec   Loss 3.4016   LearningRate 0.0006   Epoch: 18   Global Step: 307490   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:00:10,857-Speed 9309.79 samples/sec   Loss 3.4238   LearningRate 0.0006   Epoch: 18   Global Step: 307500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:11,968-Speed 9223.73 samples/sec   Loss 3.4229   LearningRate 0.0006   Epoch: 18   Global Step: 307510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:13,142-Speed 8725.99 samples/sec   Loss 3.3778   LearningRate 0.0006   Epoch: 18   Global Step: 307520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:14,272-Speed 9066.25 samples/sec   Loss 3.4432   LearningRate 0.0006   Epoch: 18   Global Step: 307530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:15,440-Speed 8769.65 samples/sec   Loss 3.4077   LearningRate 0.0006   Epoch: 18   Global Step: 307540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:16,561-Speed 9153.51 samples/sec   Loss 3.3027   LearningRate 0.0006   Epoch: 18   Global Step: 307550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:17,688-Speed 9090.77 samples/sec   Loss 3.3415   LearningRate 0.0006   Epoch: 18   Global Step: 307560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:18,824-Speed 9013.57 samples/sec   Loss 3.4750   LearningRate 0.0006   Epoch: 18   Global Step: 307570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:19,937-Speed 9210.18 samples/sec   Loss 3.4720   LearningRate 0.0006   Epoch: 18   Global Step: 307580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:21,079-Speed 8968.08 samples/sec   Loss 3.4313   LearningRate 0.0006   Epoch: 18   Global Step: 307590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:22,177-Speed 9333.87 samples/sec   Loss 3.4334   LearningRate 0.0006   Epoch: 18   Global Step: 307600   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:00:23,274-Speed 9339.55 samples/sec   Loss 3.4670   LearningRate 0.0006   Epoch: 18   Global Step: 307610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:24,475-Speed 8527.85 samples/sec   Loss 3.4407   LearningRate 0.0006   Epoch: 18   Global Step: 307620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:25,632-Speed 8860.65 samples/sec   Loss 3.3916   LearningRate 0.0006   Epoch: 18   Global Step: 307630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:26,767-Speed 9026.35 samples/sec   Loss 3.4088   LearningRate 0.0006   Epoch: 18   Global Step: 307640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:27,876-Speed 9237.26 samples/sec   Loss 3.3680   LearningRate 0.0006   Epoch: 18   Global Step: 307650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:28,988-Speed 9216.35 samples/sec   Loss 3.3818   LearningRate 0.0006   Epoch: 18   Global Step: 307660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:30,114-Speed 9104.08 samples/sec   Loss 3.3017   LearningRate 0.0006   Epoch: 18   Global Step: 307670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:31,206-Speed 9382.07 samples/sec   Loss 3.3801   LearningRate 0.0006   Epoch: 18   Global Step: 307680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:32,368-Speed 8817.07 samples/sec   Loss 3.3742   LearningRate 0.0006   Epoch: 18   Global Step: 307690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:33,507-Speed 9001.02 samples/sec   Loss 3.4493   LearningRate 0.0006   Epoch: 18   Global Step: 307700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:34,652-Speed 8945.20 samples/sec   Loss 3.3702   LearningRate 0.0006   Epoch: 18   Global Step: 307710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:35,799-Speed 8937.59 samples/sec   Loss 3.3746   LearningRate 0.0006   Epoch: 18   Global Step: 307720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:36,961-Speed 8817.85 samples/sec   Loss 3.3848   LearningRate 0.0006   Epoch: 18   Global Step: 307730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:38,125-Speed 8799.64 samples/sec   Loss 3.4232   LearningRate 0.0006   Epoch: 18   Global Step: 307740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:39,243-Speed 9165.05 samples/sec   Loss 3.4329   LearningRate 0.0006   Epoch: 18   Global Step: 307750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:40,366-Speed 9131.99 samples/sec   Loss 3.4704   LearningRate 0.0006   Epoch: 18   Global Step: 307760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:41,509-Speed 8959.29 samples/sec   Loss 3.4359   LearningRate 0.0006   Epoch: 18   Global Step: 307770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:42,704-Speed 8574.66 samples/sec   Loss 3.3418   LearningRate 0.0006   Epoch: 18   Global Step: 307780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:43,840-Speed 9023.39 samples/sec   Loss 3.3587   LearningRate 0.0006   Epoch: 18   Global Step: 307790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:44,956-Speed 9179.74 samples/sec   Loss 3.3697   LearningRate 0.0006   Epoch: 18   Global Step: 307800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:46,058-Speed 9298.66 samples/sec   Loss 3.4938   LearningRate 0.0006   Epoch: 18   Global Step: 307810   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:00:47,180-Speed 9128.70 samples/sec   Loss 3.3806   LearningRate 0.0006   Epoch: 18   Global Step: 307820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:48,288-Speed 9248.79 samples/sec   Loss 3.4385   LearningRate 0.0006   Epoch: 18   Global Step: 307830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:49,438-Speed 8909.03 samples/sec   Loss 3.3561   LearningRate 0.0006   Epoch: 18   Global Step: 307840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:50,593-Speed 8874.53 samples/sec   Loss 3.3686   LearningRate 0.0006   Epoch: 18   Global Step: 307850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:51,768-Speed 8718.58 samples/sec   Loss 3.4753   LearningRate 0.0006   Epoch: 18   Global Step: 307860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:52,864-Speed 9346.72 samples/sec   Loss 3.4398   LearningRate 0.0006   Epoch: 18   Global Step: 307870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:54,016-Speed 8898.07 samples/sec   Loss 3.3418   LearningRate 0.0006   Epoch: 18   Global Step: 307880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:55,150-Speed 9030.42 samples/sec   Loss 3.4300   LearningRate 0.0006   Epoch: 18   Global Step: 307890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:56,264-Speed 9197.04 samples/sec   Loss 3.4364   LearningRate 0.0006   Epoch: 18   Global Step: 307900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:57,422-Speed 8851.80 samples/sec   Loss 3.3712   LearningRate 0.0006   Epoch: 18   Global Step: 307910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:00:58,562-Speed 8986.74 samples/sec   Loss 3.4119   LearningRate 0.0006   Epoch: 18   Global Step: 307920   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:00:59,711-Speed 8916.54 samples/sec   Loss 3.4702   LearningRate 0.0006   Epoch: 18   Global Step: 307930   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:01:00,850-Speed 8995.14 samples/sec   Loss 3.4589   LearningRate 0.0006   Epoch: 18   Global Step: 307940   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:01:01,941-Speed 9389.76 samples/sec   Loss 3.4644   LearningRate 0.0006   Epoch: 18   Global Step: 307950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:01:03,038-Speed 9339.57 samples/sec   Loss 3.3837   LearningRate 0.0006   Epoch: 18   Global Step: 307960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:01:04,226-Speed 8623.96 samples/sec   Loss 3.3939   LearningRate 0.0006   Epoch: 18   Global Step: 307970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:01:05,311-Speed 9442.35 samples/sec   Loss 3.4082   LearningRate 0.0006   Epoch: 18   Global Step: 307980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:01:06,383-Speed 9561.12 samples/sec   Loss 3.3908   LearningRate 0.0006   Epoch: 18   Global Step: 307990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:01:07,495-Speed 9215.54 samples/sec   Loss 3.4820   LearningRate 0.0006   Epoch: 18   Global Step: 308000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:01:29,593-[lfw][308000]XNorm: 6.600562
Training: 2022-04-12 00:01:29,594-[lfw][308000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-12 00:01:29,595-[lfw][308000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:01:55,072-[cfp_fp][308000]XNorm: 5.769856
Training: 2022-04-12 00:01:55,073-[cfp_fp][308000]Accuracy-Flip: 0.97200+-0.00874
Training: 2022-04-12 00:01:55,074-[cfp_fp][308000]Accuracy-Highest: 0.97386
Training: 2022-04-12 00:02:17,045-[agedb_30][308000]XNorm: 6.437475
Training: 2022-04-12 00:02:17,046-[agedb_30][308000]Accuracy-Flip: 0.97117+-0.00885
Training: 2022-04-12 00:02:17,047-[agedb_30][308000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:02:18,165-Speed 144.90 samples/sec   Loss 3.4228   LearningRate 0.0006   Epoch: 18   Global Step: 308010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:19,298-Speed 9041.75 samples/sec   Loss 3.4025   LearningRate 0.0006   Epoch: 18   Global Step: 308020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:20,459-Speed 8830.24 samples/sec   Loss 3.3388   LearningRate 0.0006   Epoch: 18   Global Step: 308030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:21,614-Speed 8871.92 samples/sec   Loss 3.3906   LearningRate 0.0006   Epoch: 18   Global Step: 308040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:22,749-Speed 9024.42 samples/sec   Loss 3.4121   LearningRate 0.0006   Epoch: 18   Global Step: 308050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:23,852-Speed 9283.58 samples/sec   Loss 3.3815   LearningRate 0.0006   Epoch: 18   Global Step: 308060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:24,966-Speed 9202.35 samples/sec   Loss 3.3398   LearningRate 0.0006   Epoch: 18   Global Step: 308070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:26,075-Speed 9242.28 samples/sec   Loss 3.4447   LearningRate 0.0006   Epoch: 18   Global Step: 308080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:27,209-Speed 9034.82 samples/sec   Loss 3.3266   LearningRate 0.0006   Epoch: 18   Global Step: 308090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:28,369-Speed 8829.52 samples/sec   Loss 3.3938   LearningRate 0.0006   Epoch: 18   Global Step: 308100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:29,508-Speed 8999.29 samples/sec   Loss 3.4080   LearningRate 0.0006   Epoch: 18   Global Step: 308110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:30,610-Speed 9289.22 samples/sec   Loss 3.3486   LearningRate 0.0006   Epoch: 18   Global Step: 308120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:31,732-Speed 9134.87 samples/sec   Loss 3.3141   LearningRate 0.0006   Epoch: 18   Global Step: 308130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:32,809-Speed 9513.20 samples/sec   Loss 3.3830   LearningRate 0.0006   Epoch: 18   Global Step: 308140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:34,009-Speed 8537.98 samples/sec   Loss 3.4113   LearningRate 0.0006   Epoch: 18   Global Step: 308150   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:02:35,101-Speed 9386.99 samples/sec   Loss 3.4685   LearningRate 0.0006   Epoch: 18   Global Step: 308160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:36,246-Speed 8947.78 samples/sec   Loss 3.3790   LearningRate 0.0006   Epoch: 18   Global Step: 308170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:37,376-Speed 9067.58 samples/sec   Loss 3.3956   LearningRate 0.0006   Epoch: 18   Global Step: 308180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:38,546-Speed 8750.10 samples/sec   Loss 3.3247   LearningRate 0.0006   Epoch: 18   Global Step: 308190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:39,699-Speed 8893.25 samples/sec   Loss 3.3102   LearningRate 0.0006   Epoch: 18   Global Step: 308200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:40,820-Speed 9147.09 samples/sec   Loss 3.4757   LearningRate 0.0006   Epoch: 18   Global Step: 308210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:41,942-Speed 9125.28 samples/sec   Loss 3.3626   LearningRate 0.0006   Epoch: 18   Global Step: 308220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:43,066-Speed 9118.34 samples/sec   Loss 3.3981   LearningRate 0.0006   Epoch: 18   Global Step: 308230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:44,175-Speed 9235.90 samples/sec   Loss 3.3662   LearningRate 0.0006   Epoch: 18   Global Step: 308240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:45,344-Speed 8768.16 samples/sec   Loss 3.4541   LearningRate 0.0006   Epoch: 18   Global Step: 308250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:46,448-Speed 9274.26 samples/sec   Loss 3.3757   LearningRate 0.0006   Epoch: 18   Global Step: 308260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:47,520-Speed 9566.25 samples/sec   Loss 3.3925   LearningRate 0.0006   Epoch: 18   Global Step: 308270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:48,655-Speed 9028.16 samples/sec   Loss 3.3900   LearningRate 0.0006   Epoch: 18   Global Step: 308280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:49,766-Speed 9230.01 samples/sec   Loss 3.3932   LearningRate 0.0006   Epoch: 18   Global Step: 308290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:50,912-Speed 8937.47 samples/sec   Loss 3.4042   LearningRate 0.0006   Epoch: 18   Global Step: 308300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:52,037-Speed 9112.00 samples/sec   Loss 3.3548   LearningRate 0.0006   Epoch: 18   Global Step: 308310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:53,127-Speed 9400.13 samples/sec   Loss 3.3573   LearningRate 0.0006   Epoch: 18   Global Step: 308320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:54,237-Speed 9230.13 samples/sec   Loss 3.4003   LearningRate 0.0006   Epoch: 18   Global Step: 308330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:55,382-Speed 8943.13 samples/sec   Loss 3.4222   LearningRate 0.0006   Epoch: 18   Global Step: 308340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:56,509-Speed 9092.28 samples/sec   Loss 3.4507   LearningRate 0.0006   Epoch: 18   Global Step: 308350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:57,639-Speed 9063.51 samples/sec   Loss 3.4380   LearningRate 0.0006   Epoch: 18   Global Step: 308360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:58,756-Speed 9174.01 samples/sec   Loss 3.3916   LearningRate 0.0006   Epoch: 18   Global Step: 308370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:02:59,877-Speed 9151.60 samples/sec   Loss 3.3425   LearningRate 0.0006   Epoch: 18   Global Step: 308380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:01,103-Speed 8357.18 samples/sec   Loss 3.3736   LearningRate 0.0006   Epoch: 18   Global Step: 308390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:02,207-Speed 9280.48 samples/sec   Loss 3.3988   LearningRate 0.0006   Epoch: 18   Global Step: 308400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:03,315-Speed 9247.04 samples/sec   Loss 3.5148   LearningRate 0.0006   Epoch: 18   Global Step: 308410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:04,418-Speed 9286.00 samples/sec   Loss 3.3851   LearningRate 0.0006   Epoch: 18   Global Step: 308420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:05,502-Speed 9455.93 samples/sec   Loss 3.3227   LearningRate 0.0006   Epoch: 18   Global Step: 308430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:06,609-Speed 9255.58 samples/sec   Loss 3.4220   LearningRate 0.0006   Epoch: 18   Global Step: 308440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:07,727-Speed 9162.11 samples/sec   Loss 3.3923   LearningRate 0.0006   Epoch: 18   Global Step: 308450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:08,895-Speed 8766.49 samples/sec   Loss 3.3496   LearningRate 0.0006   Epoch: 18   Global Step: 308460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:09,998-Speed 9293.13 samples/sec   Loss 3.4676   LearningRate 0.0006   Epoch: 18   Global Step: 308470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:11,147-Speed 8917.14 samples/sec   Loss 3.4449   LearningRate 0.0006   Epoch: 18   Global Step: 308480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:12,260-Speed 9205.56 samples/sec   Loss 3.3921   LearningRate 0.0006   Epoch: 18   Global Step: 308490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:13,352-Speed 9384.63 samples/sec   Loss 3.3223   LearningRate 0.0006   Epoch: 18   Global Step: 308500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:14,512-Speed 8834.74 samples/sec   Loss 3.3187   LearningRate 0.0006   Epoch: 18   Global Step: 308510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:15,618-Speed 9263.99 samples/sec   Loss 3.4373   LearningRate 0.0006   Epoch: 18   Global Step: 308520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:16,714-Speed 9350.61 samples/sec   Loss 3.4614   LearningRate 0.0006   Epoch: 18   Global Step: 308530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:17,788-Speed 9542.37 samples/sec   Loss 3.3914   LearningRate 0.0006   Epoch: 18   Global Step: 308540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:18,917-Speed 9075.82 samples/sec   Loss 3.3779   LearningRate 0.0006   Epoch: 18   Global Step: 308550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:20,000-Speed 9459.71 samples/sec   Loss 3.4682   LearningRate 0.0006   Epoch: 18   Global Step: 308560   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:03:21,105-Speed 9264.55 samples/sec   Loss 3.4281   LearningRate 0.0006   Epoch: 18   Global Step: 308570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:22,270-Speed 8801.74 samples/sec   Loss 3.3936   LearningRate 0.0006   Epoch: 18   Global Step: 308580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:23,420-Speed 8905.20 samples/sec   Loss 3.4848   LearningRate 0.0006   Epoch: 18   Global Step: 308590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:24,521-Speed 9306.19 samples/sec   Loss 3.3866   LearningRate 0.0006   Epoch: 18   Global Step: 308600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:25,654-Speed 9043.68 samples/sec   Loss 3.4017   LearningRate 0.0006   Epoch: 18   Global Step: 308610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:26,708-Speed 9718.79 samples/sec   Loss 3.3593   LearningRate 0.0006   Epoch: 18   Global Step: 308620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:27,894-Speed 8636.09 samples/sec   Loss 3.4282   LearningRate 0.0006   Epoch: 18   Global Step: 308630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:28,943-Speed 9772.84 samples/sec   Loss 3.4553   LearningRate 0.0006   Epoch: 18   Global Step: 308640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:30,030-Speed 9430.25 samples/sec   Loss 3.2621   LearningRate 0.0006   Epoch: 18   Global Step: 308650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:31,218-Speed 8620.91 samples/sec   Loss 3.4332   LearningRate 0.0006   Epoch: 18   Global Step: 308660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:03:32,382-Speed 8802.59 samples/sec   Loss 3.4023   LearningRate 0.0006   Epoch: 18   Global Step: 308670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:03:33,551-Speed 8766.95 samples/sec   Loss 3.4333   LearningRate 0.0006   Epoch: 18   Global Step: 308680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:03:34,655-Speed 9277.96 samples/sec   Loss 3.3433   LearningRate 0.0006   Epoch: 18   Global Step: 308690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:03:35,799-Speed 8953.55 samples/sec   Loss 3.4183   LearningRate 0.0006   Epoch: 18   Global Step: 308700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:03:36,973-Speed 8732.38 samples/sec   Loss 3.3650   LearningRate 0.0006   Epoch: 18   Global Step: 308710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:03:38,112-Speed 8996.72 samples/sec   Loss 3.3194   LearningRate 0.0006   Epoch: 18   Global Step: 308720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:03:39,231-Speed 9150.13 samples/sec   Loss 3.3726   LearningRate 0.0006   Epoch: 18   Global Step: 308730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:03:40,310-Speed 9496.55 samples/sec   Loss 3.3565   LearningRate 0.0006   Epoch: 18   Global Step: 308740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:03:41,400-Speed 9405.73 samples/sec   Loss 3.4145   LearningRate 0.0006   Epoch: 18   Global Step: 308750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:03:42,522-Speed 9127.94 samples/sec   Loss 3.4261   LearningRate 0.0006   Epoch: 18   Global Step: 308760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:43,659-Speed 9011.84 samples/sec   Loss 3.3515   LearningRate 0.0006   Epoch: 18   Global Step: 308770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:44,773-Speed 9198.86 samples/sec   Loss 3.3626   LearningRate 0.0006   Epoch: 18   Global Step: 308780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:45,898-Speed 9101.49 samples/sec   Loss 3.3297   LearningRate 0.0006   Epoch: 18   Global Step: 308790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:47,004-Speed 9281.95 samples/sec   Loss 3.4031   LearningRate 0.0006   Epoch: 18   Global Step: 308800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:48,110-Speed 9262.12 samples/sec   Loss 3.3286   LearningRate 0.0006   Epoch: 18   Global Step: 308810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:49,269-Speed 8836.64 samples/sec   Loss 3.2943   LearningRate 0.0006   Epoch: 18   Global Step: 308820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:50,372-Speed 9292.03 samples/sec   Loss 3.4043   LearningRate 0.0006   Epoch: 18   Global Step: 308830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:51,572-Speed 8538.52 samples/sec   Loss 3.3971   LearningRate 0.0006   Epoch: 18   Global Step: 308840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:52,697-Speed 9104.39 samples/sec   Loss 3.3602   LearningRate 0.0006   Epoch: 18   Global Step: 308850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:53,811-Speed 9197.86 samples/sec   Loss 3.3535   LearningRate 0.0006   Epoch: 18   Global Step: 308860   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:03:54,914-Speed 9290.60 samples/sec   Loss 3.4528   LearningRate 0.0006   Epoch: 18   Global Step: 308870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:56,049-Speed 9033.80 samples/sec   Loss 3.3767   LearningRate 0.0006   Epoch: 18   Global Step: 308880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:57,216-Speed 8778.80 samples/sec   Loss 3.3510   LearningRate 0.0006   Epoch: 18   Global Step: 308890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:58,283-Speed 9598.31 samples/sec   Loss 3.4209   LearningRate 0.0006   Epoch: 18   Global Step: 308900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:03:59,395-Speed 9225.47 samples/sec   Loss 3.3231   LearningRate 0.0006   Epoch: 18   Global Step: 308910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:00,529-Speed 9035.93 samples/sec   Loss 3.4320   LearningRate 0.0006   Epoch: 18   Global Step: 308920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:01,659-Speed 9061.96 samples/sec   Loss 3.3378   LearningRate 0.0006   Epoch: 18   Global Step: 308930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:02,772-Speed 9208.82 samples/sec   Loss 3.3820   LearningRate 0.0006   Epoch: 18   Global Step: 308940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:03,845-Speed 9550.27 samples/sec   Loss 3.2783   LearningRate 0.0006   Epoch: 18   Global Step: 308950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:04,911-Speed 9602.97 samples/sec   Loss 3.4185   LearningRate 0.0006   Epoch: 18   Global Step: 308960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:06,003-Speed 9390.01 samples/sec   Loss 3.3673   LearningRate 0.0006   Epoch: 18   Global Step: 308970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:07,118-Speed 9183.78 samples/sec   Loss 3.4431   LearningRate 0.0006   Epoch: 18   Global Step: 308980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:08,273-Speed 8872.01 samples/sec   Loss 3.3946   LearningRate 0.0006   Epoch: 18   Global Step: 308990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:09,447-Speed 8730.50 samples/sec   Loss 3.4017   LearningRate 0.0006   Epoch: 18   Global Step: 309000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:10,579-Speed 9055.60 samples/sec   Loss 3.3694   LearningRate 0.0006   Epoch: 18   Global Step: 309010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:11,756-Speed 8705.39 samples/sec   Loss 3.4015   LearningRate 0.0006   Epoch: 18   Global Step: 309020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:12,844-Speed 9418.95 samples/sec   Loss 3.4428   LearningRate 0.0006   Epoch: 18   Global Step: 309030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:13,997-Speed 8888.44 samples/sec   Loss 3.3913   LearningRate 0.0006   Epoch: 18   Global Step: 309040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:15,181-Speed 8649.97 samples/sec   Loss 3.4077   LearningRate 0.0006   Epoch: 18   Global Step: 309050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:16,314-Speed 9044.82 samples/sec   Loss 3.3893   LearningRate 0.0006   Epoch: 18   Global Step: 309060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:17,428-Speed 9200.76 samples/sec   Loss 3.3302   LearningRate 0.0005   Epoch: 18   Global Step: 309070   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:04:18,546-Speed 9168.97 samples/sec   Loss 3.3769   LearningRate 0.0005   Epoch: 18   Global Step: 309080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:19,635-Speed 9407.55 samples/sec   Loss 3.4297   LearningRate 0.0005   Epoch: 18   Global Step: 309090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:20,704-Speed 9579.57 samples/sec   Loss 3.3559   LearningRate 0.0005   Epoch: 18   Global Step: 309100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:21,835-Speed 9060.40 samples/sec   Loss 3.4342   LearningRate 0.0005   Epoch: 18   Global Step: 309110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:22,945-Speed 9230.93 samples/sec   Loss 3.4007   LearningRate 0.0005   Epoch: 18   Global Step: 309120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:24,101-Speed 8863.86 samples/sec   Loss 3.4969   LearningRate 0.0005   Epoch: 18   Global Step: 309130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:25,242-Speed 8980.87 samples/sec   Loss 3.5102   LearningRate 0.0005   Epoch: 18   Global Step: 309140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:26,356-Speed 9195.45 samples/sec   Loss 3.3918   LearningRate 0.0005   Epoch: 18   Global Step: 309150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:27,497-Speed 8976.74 samples/sec   Loss 3.3888   LearningRate 0.0005   Epoch: 18   Global Step: 309160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:28,622-Speed 9113.08 samples/sec   Loss 3.3039   LearningRate 0.0005   Epoch: 18   Global Step: 309170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:29,714-Speed 9379.36 samples/sec   Loss 3.3498   LearningRate 0.0005   Epoch: 18   Global Step: 309180   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:04:30,830-Speed 9181.36 samples/sec   Loss 3.3782   LearningRate 0.0005   Epoch: 18   Global Step: 309190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:31,962-Speed 9056.44 samples/sec   Loss 3.3524   LearningRate 0.0005   Epoch: 18   Global Step: 309200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:33,080-Speed 9165.77 samples/sec   Loss 3.3770   LearningRate 0.0005   Epoch: 18   Global Step: 309210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:34,147-Speed 9606.76 samples/sec   Loss 3.4835   LearningRate 0.0005   Epoch: 18   Global Step: 309220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:35,275-Speed 9076.17 samples/sec   Loss 3.3157   LearningRate 0.0005   Epoch: 18   Global Step: 309230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:36,387-Speed 9212.16 samples/sec   Loss 3.4205   LearningRate 0.0005   Epoch: 18   Global Step: 309240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:37,530-Speed 8964.18 samples/sec   Loss 3.3836   LearningRate 0.0005   Epoch: 18   Global Step: 309250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:38,713-Speed 8666.70 samples/sec   Loss 3.4116   LearningRate 0.0005   Epoch: 18   Global Step: 309260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:39,836-Speed 9119.07 samples/sec   Loss 3.3603   LearningRate 0.0005   Epoch: 18   Global Step: 309270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:40,946-Speed 9237.24 samples/sec   Loss 3.4493   LearningRate 0.0005   Epoch: 18   Global Step: 309280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:42,093-Speed 8933.26 samples/sec   Loss 3.4121   LearningRate 0.0005   Epoch: 18   Global Step: 309290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:04:43,191-Speed 9329.13 samples/sec   Loss 3.3369   LearningRate 0.0005   Epoch: 18   Global Step: 309300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:44,284-Speed 9379.51 samples/sec   Loss 3.3650   LearningRate 0.0005   Epoch: 18   Global Step: 309310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:45,464-Speed 8682.56 samples/sec   Loss 3.4680   LearningRate 0.0005   Epoch: 18   Global Step: 309320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:46,565-Speed 9302.05 samples/sec   Loss 3.4165   LearningRate 0.0005   Epoch: 18   Global Step: 309330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:04:47,673-Speed 9256.63 samples/sec   Loss 3.4130   LearningRate 0.0005   Epoch: 18   Global Step: 309340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:04:48,797-Speed 9109.21 samples/sec   Loss 3.4785   LearningRate 0.0005   Epoch: 18   Global Step: 309350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:04:49,950-Speed 8883.81 samples/sec   Loss 3.3846   LearningRate 0.0005   Epoch: 18   Global Step: 309360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:04:51,066-Speed 9186.81 samples/sec   Loss 3.5348   LearningRate 0.0005   Epoch: 18   Global Step: 309370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:04:52,169-Speed 9291.24 samples/sec   Loss 3.4757   LearningRate 0.0005   Epoch: 18   Global Step: 309380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:04:53,321-Speed 8893.73 samples/sec   Loss 3.2976   LearningRate 0.0005   Epoch: 18   Global Step: 309390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:04:54,400-Speed 9488.57 samples/sec   Loss 3.4126   LearningRate 0.0005   Epoch: 18   Global Step: 309400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:04:55,493-Speed 9381.42 samples/sec   Loss 3.3891   LearningRate 0.0005   Epoch: 18   Global Step: 309410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:04:56,638-Speed 8947.52 samples/sec   Loss 3.3719   LearningRate 0.0005   Epoch: 18   Global Step: 309420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:04:57,777-Speed 8994.55 samples/sec   Loss 3.3545   LearningRate 0.0005   Epoch: 18   Global Step: 309430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:04:58,892-Speed 9183.92 samples/sec   Loss 3.4334   LearningRate 0.0005   Epoch: 18   Global Step: 309440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:00,047-Speed 8873.57 samples/sec   Loss 3.3293   LearningRate 0.0005   Epoch: 18   Global Step: 309450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:01,174-Speed 9091.64 samples/sec   Loss 3.3641   LearningRate 0.0005   Epoch: 18   Global Step: 309460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:02,300-Speed 9096.75 samples/sec   Loss 3.3532   LearningRate 0.0005   Epoch: 18   Global Step: 309470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:03,455-Speed 8874.99 samples/sec   Loss 3.3524   LearningRate 0.0005   Epoch: 18   Global Step: 309480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:04,585-Speed 9067.56 samples/sec   Loss 3.3653   LearningRate 0.0005   Epoch: 18   Global Step: 309490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:05,737-Speed 8890.36 samples/sec   Loss 3.4247   LearningRate 0.0005   Epoch: 18   Global Step: 309500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:06,823-Speed 9435.94 samples/sec   Loss 3.4561   LearningRate 0.0005   Epoch: 18   Global Step: 309510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:07,961-Speed 9003.61 samples/sec   Loss 3.3882   LearningRate 0.0005   Epoch: 18   Global Step: 309520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:09,096-Speed 9026.88 samples/sec   Loss 3.3962   LearningRate 0.0005   Epoch: 18   Global Step: 309530   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:05:10,226-Speed 9070.23 samples/sec   Loss 3.4128   LearningRate 0.0005   Epoch: 18   Global Step: 309540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:11,346-Speed 9151.19 samples/sec   Loss 3.3814   LearningRate 0.0005   Epoch: 18   Global Step: 309550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:12,477-Speed 9054.55 samples/sec   Loss 3.4166   LearningRate 0.0005   Epoch: 18   Global Step: 309560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:13,639-Speed 8819.10 samples/sec   Loss 3.4090   LearningRate 0.0005   Epoch: 18   Global Step: 309570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:14,737-Speed 9325.23 samples/sec   Loss 3.4674   LearningRate 0.0005   Epoch: 18   Global Step: 309580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:15,870-Speed 9044.13 samples/sec   Loss 3.3802   LearningRate 0.0005   Epoch: 18   Global Step: 309590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:16,982-Speed 9219.92 samples/sec   Loss 3.3750   LearningRate 0.0005   Epoch: 18   Global Step: 309600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:18,073-Speed 9389.24 samples/sec   Loss 3.4286   LearningRate 0.0005   Epoch: 18   Global Step: 309610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:19,191-Speed 9164.97 samples/sec   Loss 3.3937   LearningRate 0.0005   Epoch: 18   Global Step: 309620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:20,351-Speed 8833.39 samples/sec   Loss 3.3964   LearningRate 0.0005   Epoch: 18   Global Step: 309630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:21,464-Speed 9202.88 samples/sec   Loss 3.3611   LearningRate 0.0005   Epoch: 18   Global Step: 309640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:22,608-Speed 8954.30 samples/sec   Loss 3.4189   LearningRate 0.0005   Epoch: 18   Global Step: 309650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:23,737-Speed 9079.18 samples/sec   Loss 3.4676   LearningRate 0.0005   Epoch: 18   Global Step: 309660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:24,839-Speed 9296.59 samples/sec   Loss 3.3912   LearningRate 0.0005   Epoch: 18   Global Step: 309670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:25,940-Speed 9307.04 samples/sec   Loss 3.3688   LearningRate 0.0005   Epoch: 18   Global Step: 309680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:27,082-Speed 8970.54 samples/sec   Loss 3.3084   LearningRate 0.0005   Epoch: 18   Global Step: 309690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:28,176-Speed 9360.99 samples/sec   Loss 3.3589   LearningRate 0.0005   Epoch: 18   Global Step: 309700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:29,325-Speed 8923.81 samples/sec   Loss 3.4580   LearningRate 0.0005   Epoch: 18   Global Step: 309710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:30,444-Speed 9160.13 samples/sec   Loss 3.4088   LearningRate 0.0005   Epoch: 18   Global Step: 309720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:31,617-Speed 8733.60 samples/sec   Loss 3.3493   LearningRate 0.0005   Epoch: 18   Global Step: 309730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:32,761-Speed 8953.84 samples/sec   Loss 3.4536   LearningRate 0.0005   Epoch: 18   Global Step: 309740   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:05:33,849-Speed 9415.23 samples/sec   Loss 3.4080   LearningRate 0.0005   Epoch: 18   Global Step: 309750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:34,956-Speed 9257.35 samples/sec   Loss 3.3847   LearningRate 0.0005   Epoch: 18   Global Step: 309760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:36,075-Speed 9159.34 samples/sec   Loss 3.4379   LearningRate 0.0005   Epoch: 18   Global Step: 309770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:37,166-Speed 9399.66 samples/sec   Loss 3.4170   LearningRate 0.0005   Epoch: 18   Global Step: 309780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:38,330-Speed 8797.03 samples/sec   Loss 3.3623   LearningRate 0.0005   Epoch: 18   Global Step: 309790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:39,455-Speed 9108.57 samples/sec   Loss 3.3993   LearningRate 0.0005   Epoch: 18   Global Step: 309800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:40,581-Speed 9101.33 samples/sec   Loss 3.3974   LearningRate 0.0005   Epoch: 18   Global Step: 309810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:41,701-Speed 9146.17 samples/sec   Loss 3.2821   LearningRate 0.0005   Epoch: 18   Global Step: 309820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:42,816-Speed 9189.51 samples/sec   Loss 3.2912   LearningRate 0.0005   Epoch: 18   Global Step: 309830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:43,939-Speed 9123.61 samples/sec   Loss 3.4399   LearningRate 0.0005   Epoch: 18   Global Step: 309840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:45,095-Speed 8868.11 samples/sec   Loss 3.4761   LearningRate 0.0005   Epoch: 18   Global Step: 309850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:46,205-Speed 9231.38 samples/sec   Loss 3.4011   LearningRate 0.0005   Epoch: 18   Global Step: 309860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:47,309-Speed 9289.31 samples/sec   Loss 3.4362   LearningRate 0.0005   Epoch: 18   Global Step: 309870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:48,464-Speed 8868.17 samples/sec   Loss 3.3859   LearningRate 0.0005   Epoch: 18   Global Step: 309880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:49,583-Speed 9155.91 samples/sec   Loss 3.3075   LearningRate 0.0005   Epoch: 18   Global Step: 309890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:50,757-Speed 8728.22 samples/sec   Loss 3.3476   LearningRate 0.0005   Epoch: 18   Global Step: 309900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:51,900-Speed 8962.12 samples/sec   Loss 3.3486   LearningRate 0.0005   Epoch: 18   Global Step: 309910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:53,021-Speed 9141.91 samples/sec   Loss 3.4417   LearningRate 0.0005   Epoch: 18   Global Step: 309920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:54,148-Speed 9093.37 samples/sec   Loss 3.4245   LearningRate 0.0005   Epoch: 18   Global Step: 309930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:55,279-Speed 9054.22 samples/sec   Loss 3.4208   LearningRate 0.0005   Epoch: 18   Global Step: 309940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:56,395-Speed 9181.58 samples/sec   Loss 3.4331   LearningRate 0.0005   Epoch: 18   Global Step: 309950   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:05:57,484-Speed 9412.31 samples/sec   Loss 3.4084   LearningRate 0.0005   Epoch: 18   Global Step: 309960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:58,604-Speed 9142.21 samples/sec   Loss 3.3714   LearningRate 0.0005   Epoch: 18   Global Step: 309970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:05:59,767-Speed 8815.21 samples/sec   Loss 3.4358   LearningRate 0.0005   Epoch: 18   Global Step: 309980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:06:00,911-Speed 8957.20 samples/sec   Loss 3.3813   LearningRate 0.0005   Epoch: 18   Global Step: 309990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:06:02,001-Speed 9396.46 samples/sec   Loss 3.3740   LearningRate 0.0005   Epoch: 18   Global Step: 310000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:06:23,904-[lfw][310000]XNorm: 6.594072
Training: 2022-04-12 00:06:23,905-[lfw][310000]Accuracy-Flip: 0.99667+-0.00298
Training: 2022-04-12 00:06:23,906-[lfw][310000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:06:49,238-[cfp_fp][310000]XNorm: 5.756133
Training: 2022-04-12 00:06:49,239-[cfp_fp][310000]Accuracy-Flip: 0.97114+-0.00861
Training: 2022-04-12 00:06:49,240-[cfp_fp][310000]Accuracy-Highest: 0.97386
Training: 2022-04-12 00:07:11,078-[agedb_30][310000]XNorm: 6.425412
Training: 2022-04-12 00:07:11,079-[agedb_30][310000]Accuracy-Flip: 0.97300+-0.00795
Training: 2022-04-12 00:07:11,079-[agedb_30][310000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:07:12,222-Speed 145.83 samples/sec   Loss 3.3671   LearningRate 0.0005   Epoch: 18   Global Step: 310010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:13,324-Speed 9291.80 samples/sec   Loss 3.3562   LearningRate 0.0005   Epoch: 18   Global Step: 310020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:14,424-Speed 9316.38 samples/sec   Loss 3.4797   LearningRate 0.0005   Epoch: 18   Global Step: 310030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:15,552-Speed 9081.36 samples/sec   Loss 3.3406   LearningRate 0.0005   Epoch: 18   Global Step: 310040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:16,667-Speed 9192.26 samples/sec   Loss 3.4566   LearningRate 0.0005   Epoch: 18   Global Step: 310050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:17,760-Speed 9371.65 samples/sec   Loss 3.4169   LearningRate 0.0005   Epoch: 18   Global Step: 310060   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:07:18,837-Speed 9516.01 samples/sec   Loss 3.4022   LearningRate 0.0005   Epoch: 18   Global Step: 310070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:19,961-Speed 9118.83 samples/sec   Loss 3.3763   LearningRate 0.0005   Epoch: 18   Global Step: 310080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:21,066-Speed 9269.39 samples/sec   Loss 3.3948   LearningRate 0.0005   Epoch: 18   Global Step: 310090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:22,236-Speed 8757.40 samples/sec   Loss 3.3891   LearningRate 0.0005   Epoch: 18   Global Step: 310100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:23,364-Speed 9078.75 samples/sec   Loss 3.4371   LearningRate 0.0005   Epoch: 18   Global Step: 310110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:24,579-Speed 8433.03 samples/sec   Loss 3.4228   LearningRate 0.0005   Epoch: 18   Global Step: 310120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:25,713-Speed 9036.73 samples/sec   Loss 3.3596   LearningRate 0.0005   Epoch: 18   Global Step: 310130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:26,797-Speed 9449.42 samples/sec   Loss 3.3913   LearningRate 0.0005   Epoch: 18   Global Step: 310140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:27,976-Speed 8687.68 samples/sec   Loss 3.3463   LearningRate 0.0005   Epoch: 18   Global Step: 310150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:29,099-Speed 9126.30 samples/sec   Loss 3.3865   LearningRate 0.0005   Epoch: 18   Global Step: 310160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:30,232-Speed 9038.90 samples/sec   Loss 3.3238   LearningRate 0.0005   Epoch: 18   Global Step: 310170   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:07:31,402-Speed 8764.36 samples/sec   Loss 3.4118   LearningRate 0.0005   Epoch: 18   Global Step: 310180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:32,526-Speed 9118.32 samples/sec   Loss 3.4733   LearningRate 0.0005   Epoch: 18   Global Step: 310190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:33,656-Speed 9066.75 samples/sec   Loss 3.3904   LearningRate 0.0005   Epoch: 18   Global Step: 310200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:34,789-Speed 9042.79 samples/sec   Loss 3.4082   LearningRate 0.0005   Epoch: 18   Global Step: 310210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:35,922-Speed 9045.59 samples/sec   Loss 3.4695   LearningRate 0.0005   Epoch: 18   Global Step: 310220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:37,026-Speed 9277.93 samples/sec   Loss 3.3788   LearningRate 0.0005   Epoch: 18   Global Step: 310230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:38,167-Speed 8977.48 samples/sec   Loss 3.4294   LearningRate 0.0005   Epoch: 18   Global Step: 310240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:39,332-Speed 8791.55 samples/sec   Loss 3.4603   LearningRate 0.0005   Epoch: 18   Global Step: 310250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:40,465-Speed 9050.26 samples/sec   Loss 3.3325   LearningRate 0.0005   Epoch: 18   Global Step: 310260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:41,593-Speed 9084.47 samples/sec   Loss 3.4472   LearningRate 0.0005   Epoch: 18   Global Step: 310270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:42,712-Speed 9157.55 samples/sec   Loss 3.4654   LearningRate 0.0005   Epoch: 18   Global Step: 310280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:43,828-Speed 9173.21 samples/sec   Loss 3.3595   LearningRate 0.0005   Epoch: 18   Global Step: 310290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:44,946-Speed 9164.23 samples/sec   Loss 3.5002   LearningRate 0.0005   Epoch: 18   Global Step: 310300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:46,058-Speed 9216.35 samples/sec   Loss 3.3592   LearningRate 0.0005   Epoch: 18   Global Step: 310310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:47,178-Speed 9149.07 samples/sec   Loss 3.2683   LearningRate 0.0005   Epoch: 18   Global Step: 310320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:48,341-Speed 8812.03 samples/sec   Loss 3.3704   LearningRate 0.0005   Epoch: 18   Global Step: 310330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:49,510-Speed 8761.16 samples/sec   Loss 3.3399   LearningRate 0.0005   Epoch: 18   Global Step: 310340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:50,611-Speed 9305.31 samples/sec   Loss 3.3551   LearningRate 0.0005   Epoch: 18   Global Step: 310350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:51,734-Speed 9124.04 samples/sec   Loss 3.3687   LearningRate 0.0005   Epoch: 18   Global Step: 310360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:52,898-Speed 8813.74 samples/sec   Loss 3.3422   LearningRate 0.0005   Epoch: 18   Global Step: 310370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:54,070-Speed 8736.06 samples/sec   Loss 3.4037   LearningRate 0.0005   Epoch: 18   Global Step: 310380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:55,173-Speed 9295.74 samples/sec   Loss 3.3881   LearningRate 0.0005   Epoch: 18   Global Step: 310390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:56,366-Speed 8587.25 samples/sec   Loss 3.3959   LearningRate 0.0005   Epoch: 18   Global Step: 310400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:57,461-Speed 9350.02 samples/sec   Loss 3.3576   LearningRate 0.0005   Epoch: 18   Global Step: 310410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:58,569-Speed 9251.00 samples/sec   Loss 3.4004   LearningRate 0.0005   Epoch: 18   Global Step: 310420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:07:59,721-Speed 8899.31 samples/sec   Loss 3.4485   LearningRate 0.0005   Epoch: 18   Global Step: 310430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:00,842-Speed 9134.46 samples/sec   Loss 3.3651   LearningRate 0.0005   Epoch: 18   Global Step: 310440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:01,973-Speed 9060.35 samples/sec   Loss 3.3453   LearningRate 0.0005   Epoch: 18   Global Step: 310450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:03,085-Speed 9218.06 samples/sec   Loss 3.3952   LearningRate 0.0005   Epoch: 18   Global Step: 310460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:04,223-Speed 8997.81 samples/sec   Loss 3.3951   LearningRate 0.0005   Epoch: 18   Global Step: 310470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:05,358-Speed 9027.77 samples/sec   Loss 3.4928   LearningRate 0.0005   Epoch: 18   Global Step: 310480   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:08:06,509-Speed 8907.60 samples/sec   Loss 3.4132   LearningRate 0.0005   Epoch: 18   Global Step: 310490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:07,642-Speed 9036.25 samples/sec   Loss 3.4376   LearningRate 0.0005   Epoch: 18   Global Step: 310500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:08,770-Speed 9082.10 samples/sec   Loss 3.3690   LearningRate 0.0005   Epoch: 18   Global Step: 310510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:09,931-Speed 8825.32 samples/sec   Loss 3.4576   LearningRate 0.0005   Epoch: 18   Global Step: 310520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:11,034-Speed 9297.25 samples/sec   Loss 3.3908   LearningRate 0.0005   Epoch: 18   Global Step: 310530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:12,155-Speed 9137.69 samples/sec   Loss 3.4274   LearningRate 0.0005   Epoch: 18   Global Step: 310540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:13,339-Speed 8656.07 samples/sec   Loss 3.5070   LearningRate 0.0005   Epoch: 18   Global Step: 310550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:14,484-Speed 8949.80 samples/sec   Loss 3.5679   LearningRate 0.0005   Epoch: 18   Global Step: 310560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:15,616-Speed 9045.13 samples/sec   Loss 3.4037   LearningRate 0.0005   Epoch: 18   Global Step: 310570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:16,710-Speed 9366.38 samples/sec   Loss 3.4343   LearningRate 0.0005   Epoch: 18   Global Step: 310580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:17,805-Speed 9360.62 samples/sec   Loss 3.3385   LearningRate 0.0005   Epoch: 18   Global Step: 310590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:18,982-Speed 8702.71 samples/sec   Loss 3.4094   LearningRate 0.0005   Epoch: 18   Global Step: 310600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:20,147-Speed 8795.12 samples/sec   Loss 3.3422   LearningRate 0.0005   Epoch: 18   Global Step: 310610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:21,241-Speed 9369.31 samples/sec   Loss 3.4231   LearningRate 0.0005   Epoch: 18   Global Step: 310620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:22,372-Speed 9059.20 samples/sec   Loss 3.4583   LearningRate 0.0005   Epoch: 18   Global Step: 310630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:23,501-Speed 9069.76 samples/sec   Loss 3.3860   LearningRate 0.0005   Epoch: 18   Global Step: 310640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:24,652-Speed 8905.05 samples/sec   Loss 3.4204   LearningRate 0.0005   Epoch: 18   Global Step: 310650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:25,820-Speed 8768.33 samples/sec   Loss 3.4248   LearningRate 0.0005   Epoch: 18   Global Step: 310660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:26,963-Speed 8965.62 samples/sec   Loss 3.4229   LearningRate 0.0005   Epoch: 18   Global Step: 310670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:28,040-Speed 9511.22 samples/sec   Loss 3.3544   LearningRate 0.0005   Epoch: 18   Global Step: 310680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:29,130-Speed 9403.37 samples/sec   Loss 3.4178   LearningRate 0.0005   Epoch: 18   Global Step: 310690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:30,255-Speed 9113.56 samples/sec   Loss 3.3982   LearningRate 0.0005   Epoch: 18   Global Step: 310700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:31,326-Speed 9566.76 samples/sec   Loss 3.3716   LearningRate 0.0005   Epoch: 18   Global Step: 310710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:32,443-Speed 9167.05 samples/sec   Loss 3.4291   LearningRate 0.0005   Epoch: 18   Global Step: 310720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:33,559-Speed 9185.95 samples/sec   Loss 3.4403   LearningRate 0.0005   Epoch: 18   Global Step: 310730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:34,681-Speed 9127.10 samples/sec   Loss 3.4122   LearningRate 0.0005   Epoch: 18   Global Step: 310740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:35,806-Speed 9107.15 samples/sec   Loss 3.4362   LearningRate 0.0005   Epoch: 18   Global Step: 310750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:36,943-Speed 9009.22 samples/sec   Loss 3.3551   LearningRate 0.0005   Epoch: 18   Global Step: 310760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:38,043-Speed 9314.45 samples/sec   Loss 3.3768   LearningRate 0.0005   Epoch: 18   Global Step: 310770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:39,147-Speed 9280.76 samples/sec   Loss 3.4134   LearningRate 0.0005   Epoch: 18   Global Step: 310780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:40,250-Speed 9286.89 samples/sec   Loss 3.4458   LearningRate 0.0005   Epoch: 18   Global Step: 310790   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:08:41,378-Speed 9090.23 samples/sec   Loss 3.5310   LearningRate 0.0005   Epoch: 18   Global Step: 310800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:42,510-Speed 9053.38 samples/sec   Loss 3.4661   LearningRate 0.0005   Epoch: 18   Global Step: 310810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:43,648-Speed 9001.21 samples/sec   Loss 3.4233   LearningRate 0.0005   Epoch: 18   Global Step: 310820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:44,752-Speed 9279.16 samples/sec   Loss 3.4071   LearningRate 0.0005   Epoch: 18   Global Step: 310830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:45,850-Speed 9336.97 samples/sec   Loss 3.3444   LearningRate 0.0005   Epoch: 18   Global Step: 310840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:46,992-Speed 8968.74 samples/sec   Loss 3.4175   LearningRate 0.0005   Epoch: 18   Global Step: 310850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:48,096-Speed 9280.88 samples/sec   Loss 3.4717   LearningRate 0.0005   Epoch: 18   Global Step: 310860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:49,218-Speed 9139.31 samples/sec   Loss 3.3461   LearningRate 0.0005   Epoch: 18   Global Step: 310870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:50,319-Speed 9300.15 samples/sec   Loss 3.4218   LearningRate 0.0005   Epoch: 18   Global Step: 310880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:51,465-Speed 8944.14 samples/sec   Loss 3.4538   LearningRate 0.0005   Epoch: 18   Global Step: 310890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:52,592-Speed 9090.48 samples/sec   Loss 3.3261   LearningRate 0.0005   Epoch: 18   Global Step: 310900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:53,696-Speed 9277.85 samples/sec   Loss 3.3843   LearningRate 0.0005   Epoch: 18   Global Step: 310910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:54,844-Speed 8922.88 samples/sec   Loss 3.4347   LearningRate 0.0005   Epoch: 18   Global Step: 310920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:55,961-Speed 9178.87 samples/sec   Loss 3.4609   LearningRate 0.0005   Epoch: 18   Global Step: 310930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:57,076-Speed 9186.78 samples/sec   Loss 3.4433   LearningRate 0.0005   Epoch: 18   Global Step: 310940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:58,196-Speed 9144.34 samples/sec   Loss 3.4252   LearningRate 0.0005   Epoch: 18   Global Step: 310950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:08:59,301-Speed 9275.32 samples/sec   Loss 3.3706   LearningRate 0.0005   Epoch: 18   Global Step: 310960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:00,429-Speed 9088.10 samples/sec   Loss 3.3720   LearningRate 0.0005   Epoch: 18   Global Step: 310970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:01,531-Speed 9294.31 samples/sec   Loss 3.4146   LearningRate 0.0005   Epoch: 18   Global Step: 310980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:02,628-Speed 9342.61 samples/sec   Loss 3.4129   LearningRate 0.0005   Epoch: 18   Global Step: 310990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:03,749-Speed 9140.46 samples/sec   Loss 3.4334   LearningRate 0.0005   Epoch: 18   Global Step: 311000   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:09:04,869-Speed 9144.69 samples/sec   Loss 3.4100   LearningRate 0.0005   Epoch: 18   Global Step: 311010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:06,016-Speed 8931.92 samples/sec   Loss 3.3915   LearningRate 0.0005   Epoch: 18   Global Step: 311020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:07,119-Speed 9295.21 samples/sec   Loss 3.3918   LearningRate 0.0005   Epoch: 18   Global Step: 311030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:08,224-Speed 9263.63 samples/sec   Loss 3.4255   LearningRate 0.0005   Epoch: 18   Global Step: 311040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:09,353-Speed 9080.53 samples/sec   Loss 3.4265   LearningRate 0.0005   Epoch: 18   Global Step: 311050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:10,455-Speed 9291.37 samples/sec   Loss 3.4138   LearningRate 0.0005   Epoch: 18   Global Step: 311060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:11,568-Speed 9215.15 samples/sec   Loss 3.3248   LearningRate 0.0005   Epoch: 18   Global Step: 311070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:12,680-Speed 9211.91 samples/sec   Loss 3.3943   LearningRate 0.0005   Epoch: 18   Global Step: 311080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:13,817-Speed 9006.02 samples/sec   Loss 3.4636   LearningRate 0.0005   Epoch: 18   Global Step: 311090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:14,894-Speed 9516.59 samples/sec   Loss 3.4082   LearningRate 0.0005   Epoch: 18   Global Step: 311100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:15,995-Speed 9307.12 samples/sec   Loss 3.4275   LearningRate 0.0005   Epoch: 18   Global Step: 311110   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:09:17,073-Speed 9499.44 samples/sec   Loss 3.4438   LearningRate 0.0005   Epoch: 18   Global Step: 311120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:18,177-Speed 9296.11 samples/sec   Loss 3.4352   LearningRate 0.0005   Epoch: 18   Global Step: 311130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:19,334-Speed 8852.30 samples/sec   Loss 3.3805   LearningRate 0.0005   Epoch: 18   Global Step: 311140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:20,488-Speed 8878.85 samples/sec   Loss 3.4668   LearningRate 0.0005   Epoch: 18   Global Step: 311150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:21,592-Speed 9280.80 samples/sec   Loss 3.3865   LearningRate 0.0005   Epoch: 18   Global Step: 311160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:22,720-Speed 9084.43 samples/sec   Loss 3.4698   LearningRate 0.0005   Epoch: 18   Global Step: 311170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:23,818-Speed 9329.46 samples/sec   Loss 3.3175   LearningRate 0.0005   Epoch: 18   Global Step: 311180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:24,922-Speed 9275.21 samples/sec   Loss 3.4660   LearningRate 0.0005   Epoch: 18   Global Step: 311190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:26,022-Speed 9326.20 samples/sec   Loss 3.4196   LearningRate 0.0005   Epoch: 18   Global Step: 311200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:27,127-Speed 9279.85 samples/sec   Loss 3.3544   LearningRate 0.0005   Epoch: 18   Global Step: 311210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:28,249-Speed 9134.96 samples/sec   Loss 3.4228   LearningRate 0.0005   Epoch: 18   Global Step: 311220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:29,358-Speed 9240.44 samples/sec   Loss 3.4168   LearningRate 0.0005   Epoch: 18   Global Step: 311230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:30,454-Speed 9350.92 samples/sec   Loss 3.3786   LearningRate 0.0005   Epoch: 18   Global Step: 311240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:31,574-Speed 9144.34 samples/sec   Loss 3.4150   LearningRate 0.0005   Epoch: 18   Global Step: 311250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:32,677-Speed 9288.85 samples/sec   Loss 3.4524   LearningRate 0.0005   Epoch: 18   Global Step: 311260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:33,807-Speed 9067.57 samples/sec   Loss 3.4193   LearningRate 0.0005   Epoch: 18   Global Step: 311270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:34,897-Speed 9400.64 samples/sec   Loss 3.3762   LearningRate 0.0005   Epoch: 18   Global Step: 311280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:36,021-Speed 9109.35 samples/sec   Loss 3.3756   LearningRate 0.0005   Epoch: 18   Global Step: 311290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:37,113-Speed 9384.44 samples/sec   Loss 3.3963   LearningRate 0.0005   Epoch: 18   Global Step: 311300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:38,233-Speed 9146.38 samples/sec   Loss 3.4188   LearningRate 0.0005   Epoch: 18   Global Step: 311310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:39,344-Speed 9223.23 samples/sec   Loss 3.2828   LearningRate 0.0005   Epoch: 18   Global Step: 311320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:40,484-Speed 8988.50 samples/sec   Loss 3.3916   LearningRate 0.0005   Epoch: 18   Global Step: 311330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:41,645-Speed 8827.37 samples/sec   Loss 3.3479   LearningRate 0.0005   Epoch: 18   Global Step: 311340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:42,781-Speed 9016.44 samples/sec   Loss 3.4417   LearningRate 0.0005   Epoch: 18   Global Step: 311350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:43,897-Speed 9185.98 samples/sec   Loss 3.4282   LearningRate 0.0005   Epoch: 18   Global Step: 311360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:45,011-Speed 9197.08 samples/sec   Loss 3.3950   LearningRate 0.0005   Epoch: 18   Global Step: 311370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:46,149-Speed 9003.84 samples/sec   Loss 3.2840   LearningRate 0.0005   Epoch: 18   Global Step: 311380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:47,249-Speed 9317.46 samples/sec   Loss 3.4551   LearningRate 0.0005   Epoch: 18   Global Step: 311390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:09:48,323-Speed 9540.01 samples/sec   Loss 3.4210   LearningRate 0.0005   Epoch: 18   Global Step: 311400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:49,416-Speed 9375.44 samples/sec   Loss 3.3945   LearningRate 0.0005   Epoch: 18   Global Step: 311410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:50,554-Speed 9001.33 samples/sec   Loss 3.4719   LearningRate 0.0005   Epoch: 18   Global Step: 311420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:51,647-Speed 9377.79 samples/sec   Loss 3.4286   LearningRate 0.0004   Epoch: 18   Global Step: 311430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:52,846-Speed 8539.91 samples/sec   Loss 3.3898   LearningRate 0.0004   Epoch: 18   Global Step: 311440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:53,991-Speed 8954.13 samples/sec   Loss 3.4682   LearningRate 0.0004   Epoch: 18   Global Step: 311450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:55,133-Speed 8966.48 samples/sec   Loss 3.4613   LearningRate 0.0004   Epoch: 18   Global Step: 311460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:56,254-Speed 9140.17 samples/sec   Loss 3.3433   LearningRate 0.0004   Epoch: 18   Global Step: 311470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:57,343-Speed 9408.47 samples/sec   Loss 3.3211   LearningRate 0.0004   Epoch: 18   Global Step: 311480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:58,447-Speed 9284.18 samples/sec   Loss 3.5032   LearningRate 0.0004   Epoch: 18   Global Step: 311490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:09:59,616-Speed 8763.06 samples/sec   Loss 3.4003   LearningRate 0.0004   Epoch: 18   Global Step: 311500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:00,782-Speed 8790.75 samples/sec   Loss 3.3517   LearningRate 0.0004   Epoch: 18   Global Step: 311510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:01,914-Speed 9048.16 samples/sec   Loss 3.3692   LearningRate 0.0004   Epoch: 18   Global Step: 311520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:03,003-Speed 9412.42 samples/sec   Loss 3.3606   LearningRate 0.0004   Epoch: 18   Global Step: 311530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:04,081-Speed 9506.82 samples/sec   Loss 3.4451   LearningRate 0.0004   Epoch: 18   Global Step: 311540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:05,170-Speed 9405.75 samples/sec   Loss 3.4224   LearningRate 0.0004   Epoch: 18   Global Step: 311550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:06,266-Speed 9345.64 samples/sec   Loss 3.5074   LearningRate 0.0004   Epoch: 18   Global Step: 311560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:07,409-Speed 8965.79 samples/sec   Loss 3.4191   LearningRate 0.0004   Epoch: 18   Global Step: 311570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:08,508-Speed 9317.16 samples/sec   Loss 3.4355   LearningRate 0.0004   Epoch: 18   Global Step: 311580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:09,620-Speed 9214.22 samples/sec   Loss 3.4498   LearningRate 0.0004   Epoch: 18   Global Step: 311590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:10,709-Speed 9417.10 samples/sec   Loss 3.3846   LearningRate 0.0004   Epoch: 18   Global Step: 311600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:11,841-Speed 9048.85 samples/sec   Loss 3.3901   LearningRate 0.0004   Epoch: 18   Global Step: 311610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:12,969-Speed 9087.19 samples/sec   Loss 3.4158   LearningRate 0.0004   Epoch: 18   Global Step: 311620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:14,093-Speed 9117.08 samples/sec   Loss 3.3312   LearningRate 0.0004   Epoch: 18   Global Step: 311630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:15,222-Speed 9070.94 samples/sec   Loss 3.3667   LearningRate 0.0004   Epoch: 18   Global Step: 311640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:16,401-Speed 8688.00 samples/sec   Loss 3.3723   LearningRate 0.0004   Epoch: 18   Global Step: 311650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:17,570-Speed 8765.33 samples/sec   Loss 3.4387   LearningRate 0.0004   Epoch: 18   Global Step: 311660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:18,684-Speed 9204.69 samples/sec   Loss 3.3377   LearningRate 0.0004   Epoch: 18   Global Step: 311670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:19,858-Speed 8726.77 samples/sec   Loss 3.3996   LearningRate 0.0004   Epoch: 18   Global Step: 311680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:20,990-Speed 9043.84 samples/sec   Loss 3.3883   LearningRate 0.0004   Epoch: 18   Global Step: 311690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:22,123-Speed 9047.08 samples/sec   Loss 3.4474   LearningRate 0.0004   Epoch: 18   Global Step: 311700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:23,296-Speed 8735.56 samples/sec   Loss 3.3624   LearningRate 0.0004   Epoch: 18   Global Step: 311710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:24,436-Speed 8987.42 samples/sec   Loss 3.3441   LearningRate 0.0004   Epoch: 18   Global Step: 311720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:25,555-Speed 9152.30 samples/sec   Loss 3.3672   LearningRate 0.0004   Epoch: 18   Global Step: 311730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:26,672-Speed 9173.39 samples/sec   Loss 3.3537   LearningRate 0.0004   Epoch: 18   Global Step: 311740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:27,760-Speed 9423.49 samples/sec   Loss 3.4238   LearningRate 0.0004   Epoch: 18   Global Step: 311750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:28,864-Speed 9278.81 samples/sec   Loss 3.3408   LearningRate 0.0004   Epoch: 18   Global Step: 311760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:29,991-Speed 9116.27 samples/sec   Loss 3.4249   LearningRate 0.0004   Epoch: 18   Global Step: 311770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:31,077-Speed 9433.83 samples/sec   Loss 3.3832   LearningRate 0.0004   Epoch: 18   Global Step: 311780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:32,192-Speed 9192.01 samples/sec   Loss 3.3610   LearningRate 0.0004   Epoch: 18   Global Step: 311790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:33,299-Speed 9251.19 samples/sec   Loss 3.5063   LearningRate 0.0004   Epoch: 18   Global Step: 311800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:34,416-Speed 9175.39 samples/sec   Loss 3.4159   LearningRate 0.0004   Epoch: 18   Global Step: 311810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:35,513-Speed 9340.88 samples/sec   Loss 3.4299   LearningRate 0.0004   Epoch: 18   Global Step: 311820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:36,615-Speed 9298.17 samples/sec   Loss 3.3975   LearningRate 0.0004   Epoch: 18   Global Step: 311830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:37,734-Speed 9161.39 samples/sec   Loss 3.4201   LearningRate 0.0004   Epoch: 18   Global Step: 311840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:38,871-Speed 9006.71 samples/sec   Loss 3.3748   LearningRate 0.0004   Epoch: 18   Global Step: 311850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:40,029-Speed 8852.66 samples/sec   Loss 3.4832   LearningRate 0.0004   Epoch: 18   Global Step: 311860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:41,162-Speed 9042.49 samples/sec   Loss 3.4734   LearningRate 0.0004   Epoch: 18   Global Step: 311870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:42,284-Speed 9136.84 samples/sec   Loss 3.3714   LearningRate 0.0004   Epoch: 18   Global Step: 311880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:43,445-Speed 8824.43 samples/sec   Loss 3.4288   LearningRate 0.0004   Epoch: 18   Global Step: 311890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:44,594-Speed 8911.12 samples/sec   Loss 3.4099   LearningRate 0.0004   Epoch: 18   Global Step: 311900   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:10:45,673-Speed 9500.73 samples/sec   Loss 3.4019   LearningRate 0.0004   Epoch: 18   Global Step: 311910   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:10:46,736-Speed 9634.47 samples/sec   Loss 3.4139   LearningRate 0.0004   Epoch: 18   Global Step: 311920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:47,861-Speed 9112.72 samples/sec   Loss 3.3994   LearningRate 0.0004   Epoch: 18   Global Step: 311930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:48,969-Speed 9246.30 samples/sec   Loss 3.4169   LearningRate 0.0004   Epoch: 18   Global Step: 311940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:50,117-Speed 8921.50 samples/sec   Loss 3.4277   LearningRate 0.0004   Epoch: 18   Global Step: 311950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:51,235-Speed 9171.96 samples/sec   Loss 3.4762   LearningRate 0.0004   Epoch: 18   Global Step: 311960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:52,371-Speed 9017.23 samples/sec   Loss 3.4076   LearningRate 0.0004   Epoch: 18   Global Step: 311970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:53,542-Speed 8744.21 samples/sec   Loss 3.3510   LearningRate 0.0004   Epoch: 18   Global Step: 311980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:54,675-Speed 9046.64 samples/sec   Loss 3.4148   LearningRate 0.0004   Epoch: 18   Global Step: 311990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:10:55,798-Speed 9122.37 samples/sec   Loss 3.4304   LearningRate 0.0004   Epoch: 18   Global Step: 312000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:11:17,577-[lfw][312000]XNorm: 6.563531
Training: 2022-04-12 00:11:17,578-[lfw][312000]Accuracy-Flip: 0.99617+-0.00289
Training: 2022-04-12 00:11:17,579-[lfw][312000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:11:42,768-[cfp_fp][312000]XNorm: 5.733843
Training: 2022-04-12 00:11:42,769-[cfp_fp][312000]Accuracy-Flip: 0.97243+-0.00860
Training: 2022-04-12 00:11:42,769-[cfp_fp][312000]Accuracy-Highest: 0.97386
Training: 2022-04-12 00:12:04,474-[agedb_30][312000]XNorm: 6.401253
Training: 2022-04-12 00:12:04,475-[agedb_30][312000]Accuracy-Flip: 0.97250+-0.00857
Training: 2022-04-12 00:12:04,475-[agedb_30][312000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:12:05,597-Speed 146.71 samples/sec   Loss 3.4054   LearningRate 0.0004   Epoch: 18   Global Step: 312010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:06,758-Speed 8827.61 samples/sec   Loss 3.3517   LearningRate 0.0004   Epoch: 18   Global Step: 312020   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:12:07,867-Speed 9239.69 samples/sec   Loss 3.5021   LearningRate 0.0004   Epoch: 18   Global Step: 312030   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:12:08,969-Speed 9294.69 samples/sec   Loss 3.3910   LearningRate 0.0004   Epoch: 18   Global Step: 312040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:12:10,034-Speed 9619.63 samples/sec   Loss 3.4016   LearningRate 0.0004   Epoch: 18   Global Step: 312050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:11,128-Speed 9371.56 samples/sec   Loss 3.3555   LearningRate 0.0004   Epoch: 18   Global Step: 312060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:12,230-Speed 9294.99 samples/sec   Loss 3.4246   LearningRate 0.0004   Epoch: 18   Global Step: 312070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:13,323-Speed 9376.79 samples/sec   Loss 3.4221   LearningRate 0.0004   Epoch: 18   Global Step: 312080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:14,438-Speed 9184.09 samples/sec   Loss 3.4180   LearningRate 0.0004   Epoch: 18   Global Step: 312090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:15,549-Speed 9228.33 samples/sec   Loss 3.3764   LearningRate 0.0004   Epoch: 18   Global Step: 312100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:16,671-Speed 9132.79 samples/sec   Loss 3.4256   LearningRate 0.0004   Epoch: 18   Global Step: 312110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:17,761-Speed 9406.20 samples/sec   Loss 3.4467   LearningRate 0.0004   Epoch: 18   Global Step: 312120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:18,860-Speed 9329.49 samples/sec   Loss 3.2966   LearningRate 0.0004   Epoch: 18   Global Step: 312130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:19,975-Speed 9184.66 samples/sec   Loss 3.4190   LearningRate 0.0004   Epoch: 18   Global Step: 312140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:21,117-Speed 8974.07 samples/sec   Loss 3.2877   LearningRate 0.0004   Epoch: 18   Global Step: 312150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:22,227-Speed 9225.27 samples/sec   Loss 3.4498   LearningRate 0.0004   Epoch: 18   Global Step: 312160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:23,366-Speed 9000.03 samples/sec   Loss 3.2824   LearningRate 0.0004   Epoch: 18   Global Step: 312170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:24,511-Speed 8949.37 samples/sec   Loss 3.4088   LearningRate 0.0004   Epoch: 18   Global Step: 312180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:25,635-Speed 9113.26 samples/sec   Loss 3.4456   LearningRate 0.0004   Epoch: 18   Global Step: 312190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:26,780-Speed 8950.90 samples/sec   Loss 3.4314   LearningRate 0.0004   Epoch: 18   Global Step: 312200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:27,886-Speed 9263.35 samples/sec   Loss 3.4202   LearningRate 0.0004   Epoch: 18   Global Step: 312210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:29,027-Speed 8980.14 samples/sec   Loss 3.4013   LearningRate 0.0004   Epoch: 18   Global Step: 312220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:30,211-Speed 8655.53 samples/sec   Loss 3.3339   LearningRate 0.0004   Epoch: 18   Global Step: 312230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:31,317-Speed 9267.83 samples/sec   Loss 3.3957   LearningRate 0.0004   Epoch: 18   Global Step: 312240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:32,406-Speed 9408.31 samples/sec   Loss 3.3049   LearningRate 0.0004   Epoch: 18   Global Step: 312250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:12:33,524-Speed 9162.22 samples/sec   Loss 3.3888   LearningRate 0.0004   Epoch: 18   Global Step: 312260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:34,637-Speed 9206.41 samples/sec   Loss 3.3903   LearningRate 0.0004   Epoch: 18   Global Step: 312270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:35,726-Speed 9410.03 samples/sec   Loss 3.3567   LearningRate 0.0004   Epoch: 18   Global Step: 312280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:36,783-Speed 9692.39 samples/sec   Loss 3.3427   LearningRate 0.0004   Epoch: 18   Global Step: 312290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:37,915-Speed 9045.31 samples/sec   Loss 3.4256   LearningRate 0.0004   Epoch: 18   Global Step: 312300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:38,997-Speed 9468.36 samples/sec   Loss 3.4028   LearningRate 0.0004   Epoch: 18   Global Step: 312310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:40,104-Speed 9260.79 samples/sec   Loss 3.3535   LearningRate 0.0004   Epoch: 18   Global Step: 312320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:41,228-Speed 9121.50 samples/sec   Loss 3.3934   LearningRate 0.0004   Epoch: 18   Global Step: 312330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:42,415-Speed 8626.96 samples/sec   Loss 3.4767   LearningRate 0.0004   Epoch: 18   Global Step: 312340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:43,607-Speed 8596.08 samples/sec   Loss 3.3391   LearningRate 0.0004   Epoch: 18   Global Step: 312350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:44,711-Speed 9276.77 samples/sec   Loss 3.4930   LearningRate 0.0004   Epoch: 18   Global Step: 312360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:45,844-Speed 9046.87 samples/sec   Loss 3.3913   LearningRate 0.0004   Epoch: 18   Global Step: 312370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:46,953-Speed 9234.34 samples/sec   Loss 3.3207   LearningRate 0.0004   Epoch: 18   Global Step: 312380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:48,059-Speed 9276.94 samples/sec   Loss 3.3672   LearningRate 0.0004   Epoch: 18   Global Step: 312390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:49,155-Speed 9348.12 samples/sec   Loss 3.4848   LearningRate 0.0004   Epoch: 18   Global Step: 312400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:50,263-Speed 9245.77 samples/sec   Loss 3.3545   LearningRate 0.0004   Epoch: 18   Global Step: 312410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:51,404-Speed 8980.64 samples/sec   Loss 3.3563   LearningRate 0.0004   Epoch: 18   Global Step: 312420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:52,552-Speed 8924.79 samples/sec   Loss 3.4152   LearningRate 0.0004   Epoch: 18   Global Step: 312430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:53,650-Speed 9329.85 samples/sec   Loss 3.3713   LearningRate 0.0004   Epoch: 18   Global Step: 312440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:54,751-Speed 9298.54 samples/sec   Loss 3.4169   LearningRate 0.0004   Epoch: 18   Global Step: 312450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:55,891-Speed 8989.93 samples/sec   Loss 3.3770   LearningRate 0.0004   Epoch: 18   Global Step: 312460   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:12:56,999-Speed 9247.79 samples/sec   Loss 3.4213   LearningRate 0.0004   Epoch: 18   Global Step: 312470   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:12:58,099-Speed 9317.85 samples/sec   Loss 3.4110   LearningRate 0.0004   Epoch: 18   Global Step: 312480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:12:59,231-Speed 9050.75 samples/sec   Loss 3.4373   LearningRate 0.0004   Epoch: 18   Global Step: 312490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:00,343-Speed 9215.09 samples/sec   Loss 3.4093   LearningRate 0.0004   Epoch: 18   Global Step: 312500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:01,454-Speed 9217.60 samples/sec   Loss 3.4150   LearningRate 0.0004   Epoch: 18   Global Step: 312510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:02,533-Speed 9500.57 samples/sec   Loss 3.3861   LearningRate 0.0004   Epoch: 18   Global Step: 312520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:03,643-Speed 9224.46 samples/sec   Loss 3.4489   LearningRate 0.0004   Epoch: 18   Global Step: 312530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:04,793-Speed 8909.06 samples/sec   Loss 3.4617   LearningRate 0.0004   Epoch: 18   Global Step: 312540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:05,886-Speed 9374.52 samples/sec   Loss 3.4429   LearningRate 0.0004   Epoch: 18   Global Step: 312550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:07,003-Speed 9179.12 samples/sec   Loss 3.4172   LearningRate 0.0004   Epoch: 18   Global Step: 312560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:08,113-Speed 9228.84 samples/sec   Loss 3.3794   LearningRate 0.0004   Epoch: 18   Global Step: 312570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:09,222-Speed 9234.53 samples/sec   Loss 3.3683   LearningRate 0.0004   Epoch: 18   Global Step: 312580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:13:10,331-Speed 9243.45 samples/sec   Loss 3.3197   LearningRate 0.0004   Epoch: 18   Global Step: 312590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:11,414-Speed 9467.45 samples/sec   Loss 3.4480   LearningRate 0.0004   Epoch: 18   Global Step: 312600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:12,521-Speed 9254.37 samples/sec   Loss 3.4199   LearningRate 0.0004   Epoch: 18   Global Step: 312610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:13,682-Speed 8821.42 samples/sec   Loss 3.3721   LearningRate 0.0004   Epoch: 18   Global Step: 312620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:14,795-Speed 9202.15 samples/sec   Loss 3.3741   LearningRate 0.0004   Epoch: 18   Global Step: 312630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:15,919-Speed 9114.85 samples/sec   Loss 3.3995   LearningRate 0.0004   Epoch: 18   Global Step: 312640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:17,081-Speed 8819.26 samples/sec   Loss 3.3518   LearningRate 0.0004   Epoch: 18   Global Step: 312650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:18,199-Speed 9169.42 samples/sec   Loss 3.4102   LearningRate 0.0004   Epoch: 18   Global Step: 312660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:19,322-Speed 9123.75 samples/sec   Loss 3.3724   LearningRate 0.0004   Epoch: 18   Global Step: 312670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:20,464-Speed 8972.61 samples/sec   Loss 3.4107   LearningRate 0.0004   Epoch: 18   Global Step: 312680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:21,597-Speed 9035.76 samples/sec   Loss 3.4714   LearningRate 0.0004   Epoch: 18   Global Step: 312690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:22,715-Speed 9164.04 samples/sec   Loss 3.4517   LearningRate 0.0004   Epoch: 18   Global Step: 312700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:23,875-Speed 8836.47 samples/sec   Loss 3.4540   LearningRate 0.0004   Epoch: 18   Global Step: 312710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:24,993-Speed 9162.97 samples/sec   Loss 3.4617   LearningRate 0.0004   Epoch: 18   Global Step: 312720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:26,134-Speed 8978.45 samples/sec   Loss 3.3909   LearningRate 0.0004   Epoch: 18   Global Step: 312730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:27,232-Speed 9337.73 samples/sec   Loss 3.4514   LearningRate 0.0004   Epoch: 18   Global Step: 312740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:28,346-Speed 9194.48 samples/sec   Loss 3.3754   LearningRate 0.0004   Epoch: 18   Global Step: 312750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:29,508-Speed 8815.03 samples/sec   Loss 3.3781   LearningRate 0.0004   Epoch: 18   Global Step: 312760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:30,641-Speed 9053.59 samples/sec   Loss 3.3827   LearningRate 0.0004   Epoch: 18   Global Step: 312770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:31,779-Speed 9001.80 samples/sec   Loss 3.4332   LearningRate 0.0004   Epoch: 18   Global Step: 312780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:32,836-Speed 9692.93 samples/sec   Loss 3.4279   LearningRate 0.0004   Epoch: 18   Global Step: 312790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:33,958-Speed 9134.70 samples/sec   Loss 3.4058   LearningRate 0.0004   Epoch: 18   Global Step: 312800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:35,133-Speed 8717.06 samples/sec   Loss 3.4078   LearningRate 0.0004   Epoch: 18   Global Step: 312810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:36,195-Speed 9645.48 samples/sec   Loss 3.4585   LearningRate 0.0004   Epoch: 18   Global Step: 312820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:13:37,354-Speed 8845.02 samples/sec   Loss 3.4446   LearningRate 0.0004   Epoch: 18   Global Step: 312830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:13:38,502-Speed 8922.49 samples/sec   Loss 3.3961   LearningRate 0.0004   Epoch: 18   Global Step: 312840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:13:39,605-Speed 9290.69 samples/sec   Loss 3.3968   LearningRate 0.0004   Epoch: 18   Global Step: 312850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:13:40,742-Speed 9010.32 samples/sec   Loss 3.3611   LearningRate 0.0004   Epoch: 18   Global Step: 312860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:13:41,894-Speed 8894.19 samples/sec   Loss 3.4286   LearningRate 0.0004   Epoch: 18   Global Step: 312870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:13:43,067-Speed 8736.47 samples/sec   Loss 3.3816   LearningRate 0.0004   Epoch: 18   Global Step: 312880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:13:44,154-Speed 9425.74 samples/sec   Loss 3.3913   LearningRate 0.0004   Epoch: 18   Global Step: 312890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:13:45,254-Speed 9317.29 samples/sec   Loss 3.3487   LearningRate 0.0004   Epoch: 18   Global Step: 312900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:13:46,378-Speed 9117.75 samples/sec   Loss 3.3790   LearningRate 0.0004   Epoch: 18   Global Step: 312910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:13:47,493-Speed 9190.38 samples/sec   Loss 3.4782   LearningRate 0.0004   Epoch: 18   Global Step: 312920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:48,620-Speed 9090.87 samples/sec   Loss 3.4409   LearningRate 0.0004   Epoch: 18   Global Step: 312930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:49,703-Speed 9465.48 samples/sec   Loss 3.3733   LearningRate 0.0004   Epoch: 18   Global Step: 312940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:50,806-Speed 9290.15 samples/sec   Loss 3.3826   LearningRate 0.0004   Epoch: 18   Global Step: 312950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:51,917-Speed 9221.99 samples/sec   Loss 3.3985   LearningRate 0.0004   Epoch: 18   Global Step: 312960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:53,010-Speed 9367.58 samples/sec   Loss 3.4590   LearningRate 0.0004   Epoch: 18   Global Step: 312970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:54,134-Speed 9114.10 samples/sec   Loss 3.4192   LearningRate 0.0004   Epoch: 18   Global Step: 312980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:55,303-Speed 8764.89 samples/sec   Loss 3.3993   LearningRate 0.0004   Epoch: 18   Global Step: 312990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:56,421-Speed 9162.18 samples/sec   Loss 3.4358   LearningRate 0.0004   Epoch: 18   Global Step: 313000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:57,583-Speed 8816.74 samples/sec   Loss 3.4487   LearningRate 0.0004   Epoch: 18   Global Step: 313010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:58,737-Speed 8881.22 samples/sec   Loss 3.3845   LearningRate 0.0004   Epoch: 18   Global Step: 313020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:13:59,878-Speed 8979.66 samples/sec   Loss 3.3639   LearningRate 0.0004   Epoch: 18   Global Step: 313030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:01,036-Speed 8849.56 samples/sec   Loss 3.3933   LearningRate 0.0004   Epoch: 18   Global Step: 313040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:02,170-Speed 9041.45 samples/sec   Loss 3.3939   LearningRate 0.0004   Epoch: 18   Global Step: 313050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:03,242-Speed 9560.39 samples/sec   Loss 3.4743   LearningRate 0.0004   Epoch: 18   Global Step: 313060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:04,410-Speed 8772.70 samples/sec   Loss 3.3588   LearningRate 0.0004   Epoch: 18   Global Step: 313070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:05,523-Speed 9205.63 samples/sec   Loss 3.4381   LearningRate 0.0004   Epoch: 18   Global Step: 313080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:06,605-Speed 9470.17 samples/sec   Loss 3.4841   LearningRate 0.0004   Epoch: 18   Global Step: 313090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:07,705-Speed 9313.38 samples/sec   Loss 3.4653   LearningRate 0.0004   Epoch: 18   Global Step: 313100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:08,860-Speed 8866.06 samples/sec   Loss 3.3054   LearningRate 0.0004   Epoch: 18   Global Step: 313110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:09,939-Speed 9495.97 samples/sec   Loss 3.3820   LearningRate 0.0004   Epoch: 18   Global Step: 313120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:11,085-Speed 8947.49 samples/sec   Loss 3.4112   LearningRate 0.0004   Epoch: 18   Global Step: 313130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:12,227-Speed 8969.17 samples/sec   Loss 3.3552   LearningRate 0.0004   Epoch: 18   Global Step: 313140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:13,350-Speed 9121.10 samples/sec   Loss 3.3732   LearningRate 0.0004   Epoch: 18   Global Step: 313150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:14,444-Speed 9367.62 samples/sec   Loss 3.4390   LearningRate 0.0004   Epoch: 18   Global Step: 313160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:15,557-Speed 9206.94 samples/sec   Loss 3.3592   LearningRate 0.0004   Epoch: 18   Global Step: 313170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:16,692-Speed 9026.42 samples/sec   Loss 3.3585   LearningRate 0.0004   Epoch: 18   Global Step: 313180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:17,818-Speed 9104.37 samples/sec   Loss 3.4075   LearningRate 0.0004   Epoch: 18   Global Step: 313190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:18,962-Speed 8954.58 samples/sec   Loss 3.4862   LearningRate 0.0004   Epoch: 18   Global Step: 313200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:20,088-Speed 9106.91 samples/sec   Loss 3.3986   LearningRate 0.0004   Epoch: 18   Global Step: 313210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:21,215-Speed 9088.26 samples/sec   Loss 3.4385   LearningRate 0.0004   Epoch: 18   Global Step: 313220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:22,341-Speed 9100.42 samples/sec   Loss 3.3527   LearningRate 0.0004   Epoch: 18   Global Step: 313230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:23,522-Speed 8674.46 samples/sec   Loss 3.3814   LearningRate 0.0004   Epoch: 18   Global Step: 313240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:24,671-Speed 8918.32 samples/sec   Loss 3.4500   LearningRate 0.0004   Epoch: 18   Global Step: 313250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:25,745-Speed 9538.07 samples/sec   Loss 3.4226   LearningRate 0.0004   Epoch: 18   Global Step: 313260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:26,825-Speed 9484.93 samples/sec   Loss 3.3676   LearningRate 0.0004   Epoch: 18   Global Step: 313270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:27,985-Speed 8836.81 samples/sec   Loss 3.3583   LearningRate 0.0004   Epoch: 18   Global Step: 313280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:29,131-Speed 8938.08 samples/sec   Loss 3.3790   LearningRate 0.0004   Epoch: 18   Global Step: 313290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:30,249-Speed 9166.12 samples/sec   Loss 3.4325   LearningRate 0.0004   Epoch: 18   Global Step: 313300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:31,368-Speed 9153.83 samples/sec   Loss 3.3940   LearningRate 0.0004   Epoch: 18   Global Step: 313310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:32,500-Speed 9054.60 samples/sec   Loss 3.4187   LearningRate 0.0004   Epoch: 18   Global Step: 313320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:33,575-Speed 9526.19 samples/sec   Loss 3.3858   LearningRate 0.0004   Epoch: 18   Global Step: 313330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:34,731-Speed 8865.35 samples/sec   Loss 3.4050   LearningRate 0.0004   Epoch: 18   Global Step: 313340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:35,874-Speed 8964.79 samples/sec   Loss 3.3385   LearningRate 0.0004   Epoch: 18   Global Step: 313350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:36,985-Speed 9219.62 samples/sec   Loss 3.3863   LearningRate 0.0004   Epoch: 18   Global Step: 313360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:38,093-Speed 9249.51 samples/sec   Loss 3.3587   LearningRate 0.0004   Epoch: 18   Global Step: 313370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:39,197-Speed 9283.35 samples/sec   Loss 3.4161   LearningRate 0.0004   Epoch: 18   Global Step: 313380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:40,300-Speed 9289.70 samples/sec   Loss 3.3919   LearningRate 0.0004   Epoch: 18   Global Step: 313390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:41,398-Speed 9330.17 samples/sec   Loss 3.4045   LearningRate 0.0004   Epoch: 18   Global Step: 313400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:42,521-Speed 9126.05 samples/sec   Loss 3.4927   LearningRate 0.0004   Epoch: 18   Global Step: 313410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:43,606-Speed 9440.45 samples/sec   Loss 3.3489   LearningRate 0.0004   Epoch: 18   Global Step: 313420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:44,735-Speed 9075.78 samples/sec   Loss 3.3410   LearningRate 0.0004   Epoch: 18   Global Step: 313430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:45,864-Speed 9072.40 samples/sec   Loss 3.4136   LearningRate 0.0004   Epoch: 18   Global Step: 313440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:46,971-Speed 9265.41 samples/sec   Loss 3.4357   LearningRate 0.0004   Epoch: 18   Global Step: 313450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:48,109-Speed 9001.91 samples/sec   Loss 3.3717   LearningRate 0.0004   Epoch: 18   Global Step: 313460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:14:49,217-Speed 9246.99 samples/sec   Loss 3.3624   LearningRate 0.0004   Epoch: 18   Global Step: 313470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:50,352-Speed 9027.28 samples/sec   Loss 3.4106   LearningRate 0.0004   Epoch: 18   Global Step: 313480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:51,466-Speed 9201.27 samples/sec   Loss 3.4079   LearningRate 0.0004   Epoch: 18   Global Step: 313490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:52,608-Speed 8965.31 samples/sec   Loss 3.3995   LearningRate 0.0004   Epoch: 18   Global Step: 313500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:53,767-Speed 8843.94 samples/sec   Loss 3.4035   LearningRate 0.0004   Epoch: 18   Global Step: 313510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:54,903-Speed 9024.11 samples/sec   Loss 3.3360   LearningRate 0.0004   Epoch: 18   Global Step: 313520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:56,095-Speed 8592.87 samples/sec   Loss 3.3700   LearningRate 0.0004   Epoch: 18   Global Step: 313530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:57,235-Speed 8985.61 samples/sec   Loss 3.4058   LearningRate 0.0004   Epoch: 18   Global Step: 313540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:58,373-Speed 9003.75 samples/sec   Loss 3.3479   LearningRate 0.0004   Epoch: 18   Global Step: 313550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:14:59,476-Speed 9287.36 samples/sec   Loss 3.4113   LearningRate 0.0004   Epoch: 18   Global Step: 313560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:15:00,632-Speed 8862.84 samples/sec   Loss 3.4386   LearningRate 0.0004   Epoch: 18   Global Step: 313570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:01,769-Speed 9012.93 samples/sec   Loss 3.4026   LearningRate 0.0004   Epoch: 18   Global Step: 313580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:02,909-Speed 8987.68 samples/sec   Loss 3.4784   LearningRate 0.0004   Epoch: 18   Global Step: 313590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:03,998-Speed 9405.06 samples/sec   Loss 3.3842   LearningRate 0.0004   Epoch: 18   Global Step: 313600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:05,110-Speed 9218.31 samples/sec   Loss 3.4162   LearningRate 0.0004   Epoch: 18   Global Step: 313610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:06,240-Speed 9068.66 samples/sec   Loss 3.3908   LearningRate 0.0004   Epoch: 18   Global Step: 313620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:07,374-Speed 9036.60 samples/sec   Loss 3.4290   LearningRate 0.0004   Epoch: 18   Global Step: 313630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:08,500-Speed 9101.69 samples/sec   Loss 3.4703   LearningRate 0.0004   Epoch: 18   Global Step: 313640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:09,612-Speed 9211.09 samples/sec   Loss 3.4163   LearningRate 0.0004   Epoch: 18   Global Step: 313650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:10,720-Speed 9250.17 samples/sec   Loss 3.4887   LearningRate 0.0004   Epoch: 18   Global Step: 313660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:11,864-Speed 8949.61 samples/sec   Loss 3.3864   LearningRate 0.0004   Epoch: 18   Global Step: 313670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:12,983-Speed 9159.93 samples/sec   Loss 3.4208   LearningRate 0.0004   Epoch: 18   Global Step: 313680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:14,110-Speed 9090.68 samples/sec   Loss 3.3962   LearningRate 0.0004   Epoch: 18   Global Step: 313690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:15,200-Speed 9395.24 samples/sec   Loss 3.3755   LearningRate 0.0004   Epoch: 18   Global Step: 313700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:16,331-Speed 9066.05 samples/sec   Loss 3.4239   LearningRate 0.0004   Epoch: 18   Global Step: 313710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:17,451-Speed 9152.21 samples/sec   Loss 3.3620   LearningRate 0.0004   Epoch: 18   Global Step: 313720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:18,597-Speed 8941.96 samples/sec   Loss 3.4473   LearningRate 0.0004   Epoch: 18   Global Step: 313730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:19,750-Speed 8883.26 samples/sec   Loss 3.3621   LearningRate 0.0004   Epoch: 18   Global Step: 313740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:20,918-Speed 8773.94 samples/sec   Loss 3.4005   LearningRate 0.0004   Epoch: 18   Global Step: 313750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:22,066-Speed 8919.33 samples/sec   Loss 3.4719   LearningRate 0.0004   Epoch: 18   Global Step: 313760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:23,150-Speed 9458.31 samples/sec   Loss 3.4333   LearningRate 0.0004   Epoch: 18   Global Step: 313770   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:15:24,286-Speed 9016.96 samples/sec   Loss 3.4461   LearningRate 0.0004   Epoch: 18   Global Step: 313780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:25,409-Speed 9122.15 samples/sec   Loss 3.3148   LearningRate 0.0004   Epoch: 18   Global Step: 313790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:26,531-Speed 9134.28 samples/sec   Loss 3.4442   LearningRate 0.0004   Epoch: 18   Global Step: 313800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:27,631-Speed 9312.60 samples/sec   Loss 3.5248   LearningRate 0.0004   Epoch: 18   Global Step: 313810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:28,735-Speed 9280.24 samples/sec   Loss 3.4178   LearningRate 0.0004   Epoch: 18   Global Step: 313820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:29,838-Speed 9290.48 samples/sec   Loss 3.4698   LearningRate 0.0004   Epoch: 18   Global Step: 313830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:30,985-Speed 8930.42 samples/sec   Loss 3.4601   LearningRate 0.0004   Epoch: 18   Global Step: 313840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:32,186-Speed 8530.27 samples/sec   Loss 3.4088   LearningRate 0.0004   Epoch: 18   Global Step: 313850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:33,336-Speed 8908.89 samples/sec   Loss 3.4252   LearningRate 0.0004   Epoch: 18   Global Step: 313860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:34,478-Speed 8970.59 samples/sec   Loss 3.4396   LearningRate 0.0004   Epoch: 18   Global Step: 313870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:35,612-Speed 9040.96 samples/sec   Loss 3.3892   LearningRate 0.0004   Epoch: 18   Global Step: 313880   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:15:36,666-Speed 9720.75 samples/sec   Loss 3.4051   LearningRate 0.0004   Epoch: 18   Global Step: 313890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:37,798-Speed 9050.73 samples/sec   Loss 3.3473   LearningRate 0.0004   Epoch: 18   Global Step: 313900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:38,919-Speed 9137.63 samples/sec   Loss 3.4242   LearningRate 0.0004   Epoch: 18   Global Step: 313910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:40,051-Speed 9050.26 samples/sec   Loss 3.3825   LearningRate 0.0004   Epoch: 18   Global Step: 313920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:41,204-Speed 8886.80 samples/sec   Loss 3.3836   LearningRate 0.0004   Epoch: 18   Global Step: 313930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:42,318-Speed 9200.69 samples/sec   Loss 3.3595   LearningRate 0.0004   Epoch: 18   Global Step: 313940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:43,459-Speed 8974.53 samples/sec   Loss 3.4173   LearningRate 0.0004   Epoch: 18   Global Step: 313950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:44,627-Speed 8772.67 samples/sec   Loss 3.4192   LearningRate 0.0004   Epoch: 18   Global Step: 313960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:45,712-Speed 9442.54 samples/sec   Loss 3.3360   LearningRate 0.0004   Epoch: 18   Global Step: 313970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:46,832-Speed 9150.24 samples/sec   Loss 3.3982   LearningRate 0.0004   Epoch: 18   Global Step: 313980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:47,953-Speed 9144.97 samples/sec   Loss 3.4434   LearningRate 0.0004   Epoch: 18   Global Step: 313990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:15:49,108-Speed 8872.83 samples/sec   Loss 3.4052   LearningRate 0.0004   Epoch: 18   Global Step: 314000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:16:11,743-[lfw][314000]XNorm: 6.549081
Training: 2022-04-12 00:16:11,744-[lfw][314000]Accuracy-Flip: 0.99700+-0.00287
Training: 2022-04-12 00:16:11,744-[lfw][314000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:16:37,250-[cfp_fp][314000]XNorm: 5.733874
Training: 2022-04-12 00:16:37,251-[cfp_fp][314000]Accuracy-Flip: 0.97514+-0.00793
Training: 2022-04-12 00:16:37,252-[cfp_fp][314000]Accuracy-Highest: 0.97514
Training: 2022-04-12 00:16:59,289-[agedb_30][314000]XNorm: 6.381650
Training: 2022-04-12 00:16:59,290-[agedb_30][314000]Accuracy-Flip: 0.97233+-0.00750
Training: 2022-04-12 00:16:59,291-[agedb_30][314000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:17:00,444-Speed 143.55 samples/sec   Loss 3.3967   LearningRate 0.0004   Epoch: 18   Global Step: 314010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:01,564-Speed 9152.79 samples/sec   Loss 3.3667   LearningRate 0.0004   Epoch: 18   Global Step: 314020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:02,701-Speed 9011.41 samples/sec   Loss 3.4136   LearningRate 0.0004   Epoch: 18   Global Step: 314030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:03,818-Speed 9167.66 samples/sec   Loss 3.4172   LearningRate 0.0004   Epoch: 18   Global Step: 314040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:04,972-Speed 8882.46 samples/sec   Loss 3.3669   LearningRate 0.0004   Epoch: 18   Global Step: 314050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:06,091-Speed 9155.03 samples/sec   Loss 3.3425   LearningRate 0.0004   Epoch: 18   Global Step: 314060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:07,206-Speed 9195.35 samples/sec   Loss 3.4544   LearningRate 0.0004   Epoch: 18   Global Step: 314070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:08,346-Speed 8986.89 samples/sec   Loss 3.4317   LearningRate 0.0003   Epoch: 18   Global Step: 314080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:09,461-Speed 9182.49 samples/sec   Loss 3.3650   LearningRate 0.0003   Epoch: 18   Global Step: 314090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:17:10,589-Speed 9086.08 samples/sec   Loss 3.4691   LearningRate 0.0003   Epoch: 18   Global Step: 314100   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:17:11,701-Speed 9215.56 samples/sec   Loss 3.3406   LearningRate 0.0003   Epoch: 18   Global Step: 314110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:12,817-Speed 9176.15 samples/sec   Loss 3.4729   LearningRate 0.0003   Epoch: 18   Global Step: 314120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:13,968-Speed 8904.26 samples/sec   Loss 3.3564   LearningRate 0.0003   Epoch: 18   Global Step: 314130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:15,067-Speed 9318.09 samples/sec   Loss 3.3675   LearningRate 0.0003   Epoch: 18   Global Step: 314140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:16,165-Speed 9330.55 samples/sec   Loss 3.4450   LearningRate 0.0003   Epoch: 18   Global Step: 314150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:17,340-Speed 8729.63 samples/sec   Loss 3.4759   LearningRate 0.0003   Epoch: 18   Global Step: 314160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:18,519-Speed 8688.29 samples/sec   Loss 3.4023   LearningRate 0.0003   Epoch: 18   Global Step: 314170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:19,709-Speed 8609.88 samples/sec   Loss 3.4160   LearningRate 0.0003   Epoch: 18   Global Step: 314180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:20,839-Speed 9059.87 samples/sec   Loss 3.4554   LearningRate 0.0003   Epoch: 18   Global Step: 314190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:21,999-Speed 8833.91 samples/sec   Loss 3.3380   LearningRate 0.0003   Epoch: 18   Global Step: 314200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:23,121-Speed 9135.42 samples/sec   Loss 3.5068   LearningRate 0.0003   Epoch: 18   Global Step: 314210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:24,208-Speed 9431.22 samples/sec   Loss 3.4498   LearningRate 0.0003   Epoch: 18   Global Step: 314220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:25,344-Speed 9019.18 samples/sec   Loss 3.3903   LearningRate 0.0003   Epoch: 18   Global Step: 314230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:26,502-Speed 8846.32 samples/sec   Loss 3.3902   LearningRate 0.0003   Epoch: 18   Global Step: 314240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:27,613-Speed 9220.72 samples/sec   Loss 3.3875   LearningRate 0.0003   Epoch: 18   Global Step: 314250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:28,667-Speed 9723.71 samples/sec   Loss 3.4251   LearningRate 0.0003   Epoch: 18   Global Step: 314260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:29,763-Speed 9346.66 samples/sec   Loss 3.3678   LearningRate 0.0003   Epoch: 18   Global Step: 314270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:30,821-Speed 9687.40 samples/sec   Loss 3.4515   LearningRate 0.0003   Epoch: 18   Global Step: 314280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:31,967-Speed 8938.51 samples/sec   Loss 3.3263   LearningRate 0.0003   Epoch: 18   Global Step: 314290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:33,139-Speed 8739.21 samples/sec   Loss 3.3802   LearningRate 0.0003   Epoch: 18   Global Step: 314300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:34,303-Speed 8805.01 samples/sec   Loss 3.3892   LearningRate 0.0003   Epoch: 18   Global Step: 314310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:35,427-Speed 9115.93 samples/sec   Loss 3.4583   LearningRate 0.0003   Epoch: 18   Global Step: 314320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:36,550-Speed 9125.98 samples/sec   Loss 3.4694   LearningRate 0.0003   Epoch: 18   Global Step: 314330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:37,665-Speed 9191.91 samples/sec   Loss 3.4020   LearningRate 0.0003   Epoch: 18   Global Step: 314340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:38,786-Speed 9134.56 samples/sec   Loss 3.4422   LearningRate 0.0003   Epoch: 18   Global Step: 314350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:39,888-Speed 9302.62 samples/sec   Loss 3.4223   LearningRate 0.0003   Epoch: 18   Global Step: 314360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:41,044-Speed 8860.14 samples/sec   Loss 3.4286   LearningRate 0.0003   Epoch: 18   Global Step: 314370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:42,160-Speed 9183.31 samples/sec   Loss 3.3631   LearningRate 0.0003   Epoch: 18   Global Step: 314380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:43,288-Speed 9081.63 samples/sec   Loss 3.4222   LearningRate 0.0003   Epoch: 18   Global Step: 314390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:44,406-Speed 9168.53 samples/sec   Loss 3.4368   LearningRate 0.0003   Epoch: 18   Global Step: 314400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:45,526-Speed 9144.61 samples/sec   Loss 3.4614   LearningRate 0.0003   Epoch: 18   Global Step: 314410   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:17:46,627-Speed 9307.86 samples/sec   Loss 3.3893   LearningRate 0.0003   Epoch: 18   Global Step: 314420   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:17:47,725-Speed 9332.57 samples/sec   Loss 3.3860   LearningRate 0.0003   Epoch: 18   Global Step: 314430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:48,846-Speed 9140.76 samples/sec   Loss 3.3935   LearningRate 0.0003   Epoch: 18   Global Step: 314440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:50,010-Speed 8799.58 samples/sec   Loss 3.4519   LearningRate 0.0003   Epoch: 18   Global Step: 314450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:51,135-Speed 9107.23 samples/sec   Loss 3.4079   LearningRate 0.0003   Epoch: 18   Global Step: 314460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:52,278-Speed 8961.39 samples/sec   Loss 3.3517   LearningRate 0.0003   Epoch: 18   Global Step: 314470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:53,387-Speed 9248.32 samples/sec   Loss 3.4445   LearningRate 0.0003   Epoch: 18   Global Step: 314480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:54,498-Speed 9218.14 samples/sec   Loss 3.4854   LearningRate 0.0003   Epoch: 18   Global Step: 314490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:55,659-Speed 8827.11 samples/sec   Loss 3.3303   LearningRate 0.0003   Epoch: 18   Global Step: 314500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:56,813-Speed 8882.02 samples/sec   Loss 3.4618   LearningRate 0.0003   Epoch: 18   Global Step: 314510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:57,931-Speed 9159.84 samples/sec   Loss 3.3481   LearningRate 0.0003   Epoch: 18   Global Step: 314520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:17:59,019-Speed 9416.32 samples/sec   Loss 3.3843   LearningRate 0.0003   Epoch: 18   Global Step: 314530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:00,116-Speed 9345.44 samples/sec   Loss 3.3986   LearningRate 0.0003   Epoch: 18   Global Step: 314540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:01,253-Speed 9010.63 samples/sec   Loss 3.4323   LearningRate 0.0003   Epoch: 18   Global Step: 314550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:02,391-Speed 9003.86 samples/sec   Loss 3.4162   LearningRate 0.0003   Epoch: 18   Global Step: 314560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:03,507-Speed 9179.91 samples/sec   Loss 3.3391   LearningRate 0.0003   Epoch: 18   Global Step: 314570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:04,618-Speed 9223.74 samples/sec   Loss 3.4990   LearningRate 0.0003   Epoch: 18   Global Step: 314580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:05,731-Speed 9210.99 samples/sec   Loss 3.4409   LearningRate 0.0003   Epoch: 18   Global Step: 314590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:06,803-Speed 9560.37 samples/sec   Loss 3.4499   LearningRate 0.0003   Epoch: 18   Global Step: 314600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:18:07,927-Speed 9111.43 samples/sec   Loss 3.2832   LearningRate 0.0003   Epoch: 18   Global Step: 314610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:18:09,022-Speed 9355.70 samples/sec   Loss 3.3506   LearningRate 0.0003   Epoch: 18   Global Step: 314620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:18:10,113-Speed 9391.10 samples/sec   Loss 3.4317   LearningRate 0.0003   Epoch: 18   Global Step: 314630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:18:11,199-Speed 9430.65 samples/sec   Loss 3.4564   LearningRate 0.0003   Epoch: 18   Global Step: 314640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:18:12,297-Speed 9330.94 samples/sec   Loss 3.3793   LearningRate 0.0003   Epoch: 18   Global Step: 314650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:18:13,422-Speed 9112.56 samples/sec   Loss 3.3944   LearningRate 0.0003   Epoch: 18   Global Step: 314660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:18:14,530-Speed 9241.57 samples/sec   Loss 3.4348   LearningRate 0.0003   Epoch: 18   Global Step: 314670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:18:15,632-Speed 9296.10 samples/sec   Loss 3.4070   LearningRate 0.0003   Epoch: 18   Global Step: 314680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:18:16,728-Speed 9360.82 samples/sec   Loss 3.3858   LearningRate 0.0003   Epoch: 18   Global Step: 314690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:18:17,844-Speed 9185.12 samples/sec   Loss 3.3746   LearningRate 0.0003   Epoch: 18   Global Step: 314700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:18,955-Speed 9223.14 samples/sec   Loss 3.4358   LearningRate 0.0003   Epoch: 18   Global Step: 314710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:20,065-Speed 9226.72 samples/sec   Loss 3.3899   LearningRate 0.0003   Epoch: 18   Global Step: 314720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:21,162-Speed 9341.87 samples/sec   Loss 3.4226   LearningRate 0.0003   Epoch: 18   Global Step: 314730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:22,282-Speed 9148.08 samples/sec   Loss 3.3957   LearningRate 0.0003   Epoch: 18   Global Step: 314740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:23,387-Speed 9281.24 samples/sec   Loss 3.4321   LearningRate 0.0003   Epoch: 18   Global Step: 314750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:24,470-Speed 9466.87 samples/sec   Loss 3.4321   LearningRate 0.0003   Epoch: 18   Global Step: 314760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:25,582-Speed 9212.52 samples/sec   Loss 3.3605   LearningRate 0.0003   Epoch: 18   Global Step: 314770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:26,670-Speed 9413.39 samples/sec   Loss 3.4349   LearningRate 0.0003   Epoch: 18   Global Step: 314780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:27,808-Speed 9005.27 samples/sec   Loss 3.4292   LearningRate 0.0003   Epoch: 18   Global Step: 314790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:28,914-Speed 9262.97 samples/sec   Loss 3.4353   LearningRate 0.0003   Epoch: 18   Global Step: 314800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:30,049-Speed 9023.94 samples/sec   Loss 3.4010   LearningRate 0.0003   Epoch: 18   Global Step: 314810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:31,137-Speed 9415.97 samples/sec   Loss 3.4539   LearningRate 0.0003   Epoch: 18   Global Step: 314820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:32,251-Speed 9204.31 samples/sec   Loss 3.4138   LearningRate 0.0003   Epoch: 18   Global Step: 314830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:33,407-Speed 8860.17 samples/sec   Loss 3.4599   LearningRate 0.0003   Epoch: 18   Global Step: 314840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:34,501-Speed 9365.78 samples/sec   Loss 3.3512   LearningRate 0.0003   Epoch: 18   Global Step: 314850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:35,619-Speed 9170.14 samples/sec   Loss 3.4225   LearningRate 0.0003   Epoch: 18   Global Step: 314860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:36,742-Speed 9120.66 samples/sec   Loss 3.4343   LearningRate 0.0003   Epoch: 18   Global Step: 314870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:37,857-Speed 9185.96 samples/sec   Loss 3.4600   LearningRate 0.0003   Epoch: 18   Global Step: 314880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:39,031-Speed 8729.40 samples/sec   Loss 3.4211   LearningRate 0.0003   Epoch: 18   Global Step: 314890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:40,132-Speed 9308.13 samples/sec   Loss 3.3641   LearningRate 0.0003   Epoch: 18   Global Step: 314900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:41,248-Speed 9194.41 samples/sec   Loss 3.3875   LearningRate 0.0003   Epoch: 18   Global Step: 314910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:42,332-Speed 9448.75 samples/sec   Loss 3.3544   LearningRate 0.0003   Epoch: 18   Global Step: 314920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:43,446-Speed 9199.99 samples/sec   Loss 3.3683   LearningRate 0.0003   Epoch: 18   Global Step: 314930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:44,595-Speed 8914.93 samples/sec   Loss 3.5218   LearningRate 0.0003   Epoch: 18   Global Step: 314940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:45,708-Speed 9203.97 samples/sec   Loss 3.5052   LearningRate 0.0003   Epoch: 18   Global Step: 314950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:46,838-Speed 9074.76 samples/sec   Loss 3.3032   LearningRate 0.0003   Epoch: 18   Global Step: 314960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:47,979-Speed 8979.79 samples/sec   Loss 3.4989   LearningRate 0.0003   Epoch: 18   Global Step: 314970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:49,098-Speed 9156.08 samples/sec   Loss 3.4714   LearningRate 0.0003   Epoch: 18   Global Step: 314980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:50,227-Speed 9071.41 samples/sec   Loss 3.3587   LearningRate 0.0003   Epoch: 18   Global Step: 314990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:51,364-Speed 9007.60 samples/sec   Loss 3.4186   LearningRate 0.0003   Epoch: 18   Global Step: 315000   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:18:52,485-Speed 9142.39 samples/sec   Loss 3.4103   LearningRate 0.0003   Epoch: 18   Global Step: 315010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:53,615-Speed 9066.30 samples/sec   Loss 3.4039   LearningRate 0.0003   Epoch: 18   Global Step: 315020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:54,766-Speed 8906.68 samples/sec   Loss 3.3708   LearningRate 0.0003   Epoch: 18   Global Step: 315030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:55,913-Speed 8928.00 samples/sec   Loss 3.4315   LearningRate 0.0003   Epoch: 18   Global Step: 315040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:57,031-Speed 9164.50 samples/sec   Loss 3.4109   LearningRate 0.0003   Epoch: 18   Global Step: 315050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:58,166-Speed 9030.11 samples/sec   Loss 3.3159   LearningRate 0.0003   Epoch: 18   Global Step: 315060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:18:59,279-Speed 9210.60 samples/sec   Loss 3.4196   LearningRate 0.0003   Epoch: 18   Global Step: 315070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:00,465-Speed 8639.71 samples/sec   Loss 3.3907   LearningRate 0.0003   Epoch: 18   Global Step: 315080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:01,545-Speed 9482.19 samples/sec   Loss 3.3593   LearningRate 0.0003   Epoch: 18   Global Step: 315090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:02,705-Speed 8832.19 samples/sec   Loss 3.4104   LearningRate 0.0003   Epoch: 18   Global Step: 315100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:03,818-Speed 9204.19 samples/sec   Loss 3.4747   LearningRate 0.0003   Epoch: 18   Global Step: 315110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:04,927-Speed 9241.78 samples/sec   Loss 3.4361   LearningRate 0.0003   Epoch: 18   Global Step: 315120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:06,053-Speed 9100.46 samples/sec   Loss 3.3859   LearningRate 0.0003   Epoch: 18   Global Step: 315130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:07,200-Speed 8933.87 samples/sec   Loss 3.3890   LearningRate 0.0003   Epoch: 18   Global Step: 315140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:08,358-Speed 8852.11 samples/sec   Loss 3.4035   LearningRate 0.0003   Epoch: 18   Global Step: 315150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:09,539-Speed 8676.47 samples/sec   Loss 3.3427   LearningRate 0.0003   Epoch: 18   Global Step: 315160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:10,681-Speed 8966.72 samples/sec   Loss 3.4275   LearningRate 0.0003   Epoch: 18   Global Step: 315170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:11,841-Speed 8834.40 samples/sec   Loss 3.4541   LearningRate 0.0003   Epoch: 18   Global Step: 315180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:12,969-Speed 9081.86 samples/sec   Loss 3.4423   LearningRate 0.0003   Epoch: 18   Global Step: 315190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:14,110-Speed 8982.49 samples/sec   Loss 3.4745   LearningRate 0.0003   Epoch: 18   Global Step: 315200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:15,284-Speed 8722.67 samples/sec   Loss 3.3840   LearningRate 0.0003   Epoch: 18   Global Step: 315210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:16,419-Speed 9031.82 samples/sec   Loss 3.3840   LearningRate 0.0003   Epoch: 18   Global Step: 315220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:17,574-Speed 8873.43 samples/sec   Loss 3.4259   LearningRate 0.0003   Epoch: 18   Global Step: 315230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:18,690-Speed 9177.22 samples/sec   Loss 3.4397   LearningRate 0.0003   Epoch: 18   Global Step: 315240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:19,863-Speed 8741.16 samples/sec   Loss 3.4750   LearningRate 0.0003   Epoch: 18   Global Step: 315250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:20,974-Speed 9216.84 samples/sec   Loss 3.3739   LearningRate 0.0003   Epoch: 18   Global Step: 315260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:22,126-Speed 8896.84 samples/sec   Loss 3.3663   LearningRate 0.0003   Epoch: 18   Global Step: 315270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:23,277-Speed 8896.93 samples/sec   Loss 3.4001   LearningRate 0.0003   Epoch: 18   Global Step: 315280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:24,370-Speed 9376.11 samples/sec   Loss 3.4724   LearningRate 0.0003   Epoch: 18   Global Step: 315290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:25,459-Speed 9410.85 samples/sec   Loss 3.4251   LearningRate 0.0003   Epoch: 18   Global Step: 315300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:26,617-Speed 8850.73 samples/sec   Loss 3.4300   LearningRate 0.0003   Epoch: 18   Global Step: 315310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:27,743-Speed 9094.78 samples/sec   Loss 3.4249   LearningRate 0.0003   Epoch: 18   Global Step: 315320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:28,855-Speed 9218.25 samples/sec   Loss 3.3185   LearningRate 0.0003   Epoch: 18   Global Step: 315330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:30,052-Speed 8555.75 samples/sec   Loss 3.3468   LearningRate 0.0003   Epoch: 18   Global Step: 315340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:31,175-Speed 9124.49 samples/sec   Loss 3.4783   LearningRate 0.0003   Epoch: 18   Global Step: 315350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:32,296-Speed 9137.00 samples/sec   Loss 3.4040   LearningRate 0.0003   Epoch: 18   Global Step: 315360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:33,400-Speed 9278.84 samples/sec   Loss 3.4393   LearningRate 0.0003   Epoch: 18   Global Step: 315370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:34,629-Speed 8342.88 samples/sec   Loss 3.4058   LearningRate 0.0003   Epoch: 18   Global Step: 315380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:35,748-Speed 9155.62 samples/sec   Loss 3.3074   LearningRate 0.0003   Epoch: 18   Global Step: 315390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:36,864-Speed 9187.26 samples/sec   Loss 3.3446   LearningRate 0.0003   Epoch: 18   Global Step: 315400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:38,015-Speed 8897.00 samples/sec   Loss 3.4829   LearningRate 0.0003   Epoch: 18   Global Step: 315410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:39,171-Speed 8867.00 samples/sec   Loss 3.3293   LearningRate 0.0003   Epoch: 18   Global Step: 315420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:40,244-Speed 9547.06 samples/sec   Loss 3.3828   LearningRate 0.0003   Epoch: 18   Global Step: 315430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:41,347-Speed 9287.49 samples/sec   Loss 3.4108   LearningRate 0.0003   Epoch: 18   Global Step: 315440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:42,424-Speed 9512.19 samples/sec   Loss 3.4790   LearningRate 0.0003   Epoch: 18   Global Step: 315450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:43,561-Speed 9009.13 samples/sec   Loss 3.4174   LearningRate 0.0003   Epoch: 18   Global Step: 315460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:44,700-Speed 8999.53 samples/sec   Loss 3.4971   LearningRate 0.0003   Epoch: 18   Global Step: 315470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:19:45,803-Speed 9285.72 samples/sec   Loss 3.3621   LearningRate 0.0003   Epoch: 18   Global Step: 315480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:46,924-Speed 9140.21 samples/sec   Loss 3.3930   LearningRate 0.0003   Epoch: 18   Global Step: 315490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:48,074-Speed 8907.88 samples/sec   Loss 3.3825   LearningRate 0.0003   Epoch: 18   Global Step: 315500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:49,202-Speed 9088.73 samples/sec   Loss 3.3800   LearningRate 0.0003   Epoch: 18   Global Step: 315510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:50,319-Speed 9173.07 samples/sec   Loss 3.4279   LearningRate 0.0003   Epoch: 18   Global Step: 315520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:51,424-Speed 9269.70 samples/sec   Loss 3.4221   LearningRate 0.0003   Epoch: 18   Global Step: 315530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:52,559-Speed 9024.71 samples/sec   Loss 3.4516   LearningRate 0.0003   Epoch: 18   Global Step: 315540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:53,681-Speed 9133.76 samples/sec   Loss 3.3850   LearningRate 0.0003   Epoch: 18   Global Step: 315550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:54,802-Speed 9144.28 samples/sec   Loss 3.4755   LearningRate 0.0003   Epoch: 18   Global Step: 315560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:55,870-Speed 9595.24 samples/sec   Loss 3.4369   LearningRate 0.0003   Epoch: 18   Global Step: 315570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:57,000-Speed 9066.37 samples/sec   Loss 3.3477   LearningRate 0.0003   Epoch: 18   Global Step: 315580   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:19:58,119-Speed 9152.43 samples/sec   Loss 3.3894   LearningRate 0.0003   Epoch: 18   Global Step: 315590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:19:59,302-Speed 8664.08 samples/sec   Loss 3.3941   LearningRate 0.0003   Epoch: 18   Global Step: 315600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:00,402-Speed 9306.94 samples/sec   Loss 3.3459   LearningRate 0.0003   Epoch: 18   Global Step: 315610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:01,516-Speed 9199.60 samples/sec   Loss 3.4113   LearningRate 0.0003   Epoch: 18   Global Step: 315620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:02,657-Speed 8980.00 samples/sec   Loss 3.4680   LearningRate 0.0003   Epoch: 18   Global Step: 315630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:03,819-Speed 8815.19 samples/sec   Loss 3.4132   LearningRate 0.0003   Epoch: 18   Global Step: 315640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:04,924-Speed 9270.03 samples/sec   Loss 3.4450   LearningRate 0.0003   Epoch: 18   Global Step: 315650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:06,018-Speed 9377.16 samples/sec   Loss 3.4784   LearningRate 0.0003   Epoch: 18   Global Step: 315660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:07,128-Speed 9226.94 samples/sec   Loss 3.4935   LearningRate 0.0003   Epoch: 18   Global Step: 315670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:08,259-Speed 9061.45 samples/sec   Loss 3.3809   LearningRate 0.0003   Epoch: 18   Global Step: 315680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:09,372-Speed 9200.83 samples/sec   Loss 3.3982   LearningRate 0.0003   Epoch: 18   Global Step: 315690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:10,548-Speed 8711.42 samples/sec   Loss 3.4191   LearningRate 0.0003   Epoch: 18   Global Step: 315700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:11,693-Speed 8953.35 samples/sec   Loss 3.3708   LearningRate 0.0003   Epoch: 18   Global Step: 315710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:12,808-Speed 9186.04 samples/sec   Loss 3.3950   LearningRate 0.0003   Epoch: 18   Global Step: 315720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:13,938-Speed 9072.69 samples/sec   Loss 3.3730   LearningRate 0.0003   Epoch: 18   Global Step: 315730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:15,043-Speed 9266.70 samples/sec   Loss 3.4344   LearningRate 0.0003   Epoch: 18   Global Step: 315740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:16,167-Speed 9120.46 samples/sec   Loss 3.3973   LearningRate 0.0003   Epoch: 18   Global Step: 315750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:17,331-Speed 8800.73 samples/sec   Loss 3.4373   LearningRate 0.0003   Epoch: 18   Global Step: 315760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:18,433-Speed 9294.57 samples/sec   Loss 3.4494   LearningRate 0.0003   Epoch: 18   Global Step: 315770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:19,540-Speed 9261.45 samples/sec   Loss 3.3873   LearningRate 0.0003   Epoch: 18   Global Step: 315780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:20,655-Speed 9189.70 samples/sec   Loss 3.4606   LearningRate 0.0003   Epoch: 18   Global Step: 315790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:20:21,810-Speed 8873.98 samples/sec   Loss 3.4487   LearningRate 0.0003   Epoch: 18   Global Step: 315800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:20:22,950-Speed 8982.34 samples/sec   Loss 3.4444   LearningRate 0.0003   Epoch: 18   Global Step: 315810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:20:24,065-Speed 9194.05 samples/sec   Loss 3.4324   LearningRate 0.0003   Epoch: 18   Global Step: 315820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:20:25,163-Speed 9424.41 samples/sec   Loss 3.3874   LearningRate 0.0003   Epoch: 18   Global Step: 315830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:20:26,279-Speed 9181.62 samples/sec   Loss 3.4633   LearningRate 0.0003   Epoch: 18   Global Step: 315840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:20:27,359-Speed 9486.89 samples/sec   Loss 3.3306   LearningRate 0.0003   Epoch: 18   Global Step: 315850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:20:28,450-Speed 9384.49 samples/sec   Loss 3.3840   LearningRate 0.0003   Epoch: 18   Global Step: 315860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:20:29,624-Speed 8727.93 samples/sec   Loss 3.3244   LearningRate 0.0003   Epoch: 18   Global Step: 315870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:20:30,772-Speed 8926.32 samples/sec   Loss 3.3490   LearningRate 0.0003   Epoch: 18   Global Step: 315880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:20:31,909-Speed 9015.55 samples/sec   Loss 3.4357   LearningRate 0.0003   Epoch: 18   Global Step: 315890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:33,052-Speed 8965.18 samples/sec   Loss 3.3989   LearningRate 0.0003   Epoch: 18   Global Step: 315900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:34,145-Speed 9371.98 samples/sec   Loss 3.3920   LearningRate 0.0003   Epoch: 18   Global Step: 315910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:35,254-Speed 9239.28 samples/sec   Loss 3.3825   LearningRate 0.0003   Epoch: 18   Global Step: 315920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:36,371-Speed 9173.06 samples/sec   Loss 3.3556   LearningRate 0.0003   Epoch: 18   Global Step: 315930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:37,494-Speed 9124.67 samples/sec   Loss 3.4435   LearningRate 0.0003   Epoch: 18   Global Step: 315940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:38,643-Speed 8916.60 samples/sec   Loss 3.4309   LearningRate 0.0003   Epoch: 18   Global Step: 315950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:39,774-Speed 9060.65 samples/sec   Loss 3.4561   LearningRate 0.0003   Epoch: 18   Global Step: 315960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:40,869-Speed 9352.45 samples/sec   Loss 3.4152   LearningRate 0.0003   Epoch: 18   Global Step: 315970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:41,955-Speed 9441.69 samples/sec   Loss 3.4492   LearningRate 0.0003   Epoch: 18   Global Step: 315980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:43,061-Speed 9256.42 samples/sec   Loss 3.3403   LearningRate 0.0003   Epoch: 18   Global Step: 315990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:20:44,211-Speed 8916.44 samples/sec   Loss 3.4354   LearningRate 0.0003   Epoch: 18   Global Step: 316000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:21:06,002-[lfw][316000]XNorm: 6.569137
Training: 2022-04-12 00:21:06,003-[lfw][316000]Accuracy-Flip: 0.99650+-0.00283
Training: 2022-04-12 00:21:06,003-[lfw][316000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:21:31,225-[cfp_fp][316000]XNorm: 5.736773
Training: 2022-04-12 00:21:31,226-[cfp_fp][316000]Accuracy-Flip: 0.97300+-0.00898
Training: 2022-04-12 00:21:31,226-[cfp_fp][316000]Accuracy-Highest: 0.97514
Training: 2022-04-12 00:21:52,961-[agedb_30][316000]XNorm: 6.391359
Training: 2022-04-12 00:21:52,961-[agedb_30][316000]Accuracy-Flip: 0.97333+-0.00756
Training: 2022-04-12 00:21:52,962-[agedb_30][316000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:21:54,088-Speed 146.54 samples/sec   Loss 3.4275   LearningRate 0.0003   Epoch: 18   Global Step: 316010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:21:55,239-Speed 8903.48 samples/sec   Loss 3.3612   LearningRate 0.0003   Epoch: 18   Global Step: 316020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:21:56,365-Speed 9101.75 samples/sec   Loss 3.3582   LearningRate 0.0003   Epoch: 18   Global Step: 316030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:21:57,470-Speed 9269.61 samples/sec   Loss 3.3458   LearningRate 0.0003   Epoch: 18   Global Step: 316040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:21:58,551-Speed 9477.77 samples/sec   Loss 3.3445   LearningRate 0.0003   Epoch: 18   Global Step: 316050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:21:59,611-Speed 9668.78 samples/sec   Loss 3.5072   LearningRate 0.0003   Epoch: 18   Global Step: 316060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:00,721-Speed 9230.71 samples/sec   Loss 3.4644   LearningRate 0.0003   Epoch: 18   Global Step: 316070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:01,845-Speed 9119.17 samples/sec   Loss 3.3726   LearningRate 0.0003   Epoch: 18   Global Step: 316080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:02,924-Speed 9493.98 samples/sec   Loss 3.4156   LearningRate 0.0003   Epoch: 18   Global Step: 316090   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:22:04,037-Speed 9204.45 samples/sec   Loss 3.4640   LearningRate 0.0003   Epoch: 18   Global Step: 316100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:05,160-Speed 9126.58 samples/sec   Loss 3.4257   LearningRate 0.0003   Epoch: 18   Global Step: 316110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:06,305-Speed 8945.71 samples/sec   Loss 3.3787   LearningRate 0.0003   Epoch: 18   Global Step: 316120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:07,481-Speed 8714.66 samples/sec   Loss 3.3499   LearningRate 0.0003   Epoch: 18   Global Step: 316130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:08,663-Speed 8664.86 samples/sec   Loss 3.4128   LearningRate 0.0003   Epoch: 18   Global Step: 316140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:09,827-Speed 8798.81 samples/sec   Loss 3.3769   LearningRate 0.0003   Epoch: 18   Global Step: 316150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:10,951-Speed 9118.06 samples/sec   Loss 3.4589   LearningRate 0.0003   Epoch: 18   Global Step: 316160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:12,044-Speed 9375.78 samples/sec   Loss 3.4166   LearningRate 0.0003   Epoch: 18   Global Step: 316170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:13,133-Speed 9409.13 samples/sec   Loss 3.5365   LearningRate 0.0003   Epoch: 18   Global Step: 316180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:14,207-Speed 9539.15 samples/sec   Loss 3.3975   LearningRate 0.0003   Epoch: 18   Global Step: 316190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:15,302-Speed 9358.90 samples/sec   Loss 3.3939   LearningRate 0.0003   Epoch: 18   Global Step: 316200   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:22:16,468-Speed 8786.43 samples/sec   Loss 3.4249   LearningRate 0.0003   Epoch: 18   Global Step: 316210   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:22:17,567-Speed 9321.49 samples/sec   Loss 3.3872   LearningRate 0.0003   Epoch: 18   Global Step: 316220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:18,759-Speed 8600.39 samples/sec   Loss 3.2867   LearningRate 0.0003   Epoch: 18   Global Step: 316230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:19,932-Speed 8731.85 samples/sec   Loss 3.3810   LearningRate 0.0003   Epoch: 18   Global Step: 316240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:21,007-Speed 9537.55 samples/sec   Loss 3.3819   LearningRate 0.0003   Epoch: 18   Global Step: 316250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:22,164-Speed 8854.57 samples/sec   Loss 3.3776   LearningRate 0.0003   Epoch: 18   Global Step: 316260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:23,282-Speed 9159.29 samples/sec   Loss 3.3888   LearningRate 0.0003   Epoch: 18   Global Step: 316270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:24,423-Speed 8985.30 samples/sec   Loss 3.4223   LearningRate 0.0003   Epoch: 18   Global Step: 316280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:25,563-Speed 8982.15 samples/sec   Loss 3.4182   LearningRate 0.0003   Epoch: 18   Global Step: 316290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:26,740-Speed 8704.54 samples/sec   Loss 3.3031   LearningRate 0.0003   Epoch: 18   Global Step: 316300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:27,907-Speed 8780.76 samples/sec   Loss 3.4302   LearningRate 0.0003   Epoch: 18   Global Step: 316310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:28,997-Speed 9398.91 samples/sec   Loss 3.3285   LearningRate 0.0003   Epoch: 18   Global Step: 316320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:30,139-Speed 8969.99 samples/sec   Loss 3.4373   LearningRate 0.0003   Epoch: 18   Global Step: 316330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:31,290-Speed 8905.93 samples/sec   Loss 3.3158   LearningRate 0.0003   Epoch: 18   Global Step: 316340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:32,438-Speed 8919.86 samples/sec   Loss 3.4013   LearningRate 0.0003   Epoch: 18   Global Step: 316350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:33,624-Speed 8643.08 samples/sec   Loss 3.4145   LearningRate 0.0003   Epoch: 18   Global Step: 316360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:34,760-Speed 9015.90 samples/sec   Loss 3.3525   LearningRate 0.0003   Epoch: 18   Global Step: 316370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:35,900-Speed 8991.35 samples/sec   Loss 3.3894   LearningRate 0.0003   Epoch: 18   Global Step: 316380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:37,013-Speed 9213.61 samples/sec   Loss 3.4016   LearningRate 0.0003   Epoch: 18   Global Step: 316390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:38,091-Speed 9498.49 samples/sec   Loss 3.4598   LearningRate 0.0003   Epoch: 18   Global Step: 316400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:39,226-Speed 9027.47 samples/sec   Loss 3.4000   LearningRate 0.0003   Epoch: 18   Global Step: 316410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:40,361-Speed 9029.97 samples/sec   Loss 3.4159   LearningRate 0.0003   Epoch: 18   Global Step: 316420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:41,526-Speed 8796.56 samples/sec   Loss 3.3774   LearningRate 0.0003   Epoch: 18   Global Step: 316430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:42,655-Speed 9068.62 samples/sec   Loss 3.4730   LearningRate 0.0003   Epoch: 18   Global Step: 316440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:43,774-Speed 9161.71 samples/sec   Loss 3.4214   LearningRate 0.0003   Epoch: 18   Global Step: 316450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:44,943-Speed 8764.01 samples/sec   Loss 3.3856   LearningRate 0.0003   Epoch: 18   Global Step: 316460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:46,072-Speed 9070.01 samples/sec   Loss 3.4339   LearningRate 0.0003   Epoch: 18   Global Step: 316470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:47,228-Speed 8864.70 samples/sec   Loss 3.4650   LearningRate 0.0003   Epoch: 18   Global Step: 316480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:48,349-Speed 9143.41 samples/sec   Loss 3.4631   LearningRate 0.0003   Epoch: 18   Global Step: 316490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:49,449-Speed 9310.93 samples/sec   Loss 3.4303   LearningRate 0.0003   Epoch: 18   Global Step: 316500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:50,548-Speed 9322.05 samples/sec   Loss 3.4426   LearningRate 0.0003   Epoch: 18   Global Step: 316510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:51,720-Speed 8744.86 samples/sec   Loss 3.3744   LearningRate 0.0003   Epoch: 18   Global Step: 316520   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:22:52,865-Speed 8948.26 samples/sec   Loss 3.3992   LearningRate 0.0003   Epoch: 18   Global Step: 316530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:54,001-Speed 9019.84 samples/sec   Loss 3.3806   LearningRate 0.0003   Epoch: 18   Global Step: 316540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:55,112-Speed 9225.68 samples/sec   Loss 3.4421   LearningRate 0.0003   Epoch: 18   Global Step: 316550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:56,241-Speed 9072.11 samples/sec   Loss 3.3433   LearningRate 0.0003   Epoch: 18   Global Step: 316560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:57,333-Speed 9382.93 samples/sec   Loss 3.4300   LearningRate 0.0003   Epoch: 18   Global Step: 316570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:58,435-Speed 9300.43 samples/sec   Loss 3.3842   LearningRate 0.0003   Epoch: 18   Global Step: 316580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:22:59,538-Speed 9286.41 samples/sec   Loss 3.4616   LearningRate 0.0003   Epoch: 18   Global Step: 316590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:00,660-Speed 9134.33 samples/sec   Loss 3.3543   LearningRate 0.0003   Epoch: 18   Global Step: 316600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:01,804-Speed 8954.39 samples/sec   Loss 3.4538   LearningRate 0.0003   Epoch: 18   Global Step: 316610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:02,925-Speed 9137.26 samples/sec   Loss 3.3887   LearningRate 0.0003   Epoch: 18   Global Step: 316620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:04,082-Speed 8855.14 samples/sec   Loss 3.4913   LearningRate 0.0003   Epoch: 18   Global Step: 316630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:05,215-Speed 9047.07 samples/sec   Loss 3.4382   LearningRate 0.0003   Epoch: 18   Global Step: 316640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:06,368-Speed 8887.72 samples/sec   Loss 3.4909   LearningRate 0.0003   Epoch: 18   Global Step: 316650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:07,529-Speed 8826.60 samples/sec   Loss 3.3254   LearningRate 0.0003   Epoch: 18   Global Step: 316660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:08,657-Speed 9083.12 samples/sec   Loss 3.3575   LearningRate 0.0003   Epoch: 18   Global Step: 316670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:09,824-Speed 8781.62 samples/sec   Loss 3.3907   LearningRate 0.0003   Epoch: 18   Global Step: 316680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:10,939-Speed 9184.94 samples/sec   Loss 3.3812   LearningRate 0.0003   Epoch: 18   Global Step: 316690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:12,085-Speed 8942.05 samples/sec   Loss 3.4375   LearningRate 0.0003   Epoch: 18   Global Step: 316700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:13,243-Speed 8848.24 samples/sec   Loss 3.3453   LearningRate 0.0003   Epoch: 18   Global Step: 316710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:14,383-Speed 8995.86 samples/sec   Loss 3.3563   LearningRate 0.0003   Epoch: 18   Global Step: 316720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:15,508-Speed 9103.04 samples/sec   Loss 3.4274   LearningRate 0.0003   Epoch: 18   Global Step: 316730   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:23:16,617-Speed 9236.69 samples/sec   Loss 3.4102   LearningRate 0.0003   Epoch: 18   Global Step: 316740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:17,728-Speed 9231.88 samples/sec   Loss 3.3855   LearningRate 0.0003   Epoch: 18   Global Step: 316750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:18,873-Speed 8944.17 samples/sec   Loss 3.3999   LearningRate 0.0003   Epoch: 18   Global Step: 316760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:19,974-Speed 9311.54 samples/sec   Loss 3.3935   LearningRate 0.0003   Epoch: 18   Global Step: 316770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:21,097-Speed 9122.76 samples/sec   Loss 3.3968   LearningRate 0.0003   Epoch: 18   Global Step: 316780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:22,258-Speed 8825.11 samples/sec   Loss 3.3198   LearningRate 0.0003   Epoch: 18   Global Step: 316790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:23,415-Speed 8855.52 samples/sec   Loss 3.4892   LearningRate 0.0003   Epoch: 18   Global Step: 316800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:24,550-Speed 9024.76 samples/sec   Loss 3.3436   LearningRate 0.0003   Epoch: 18   Global Step: 316810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:25,719-Speed 8767.16 samples/sec   Loss 3.4188   LearningRate 0.0003   Epoch: 18   Global Step: 316820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:26,816-Speed 9342.32 samples/sec   Loss 3.3812   LearningRate 0.0003   Epoch: 18   Global Step: 316830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:27,927-Speed 9220.75 samples/sec   Loss 3.3727   LearningRate 0.0003   Epoch: 18   Global Step: 316840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:29,030-Speed 9293.27 samples/sec   Loss 3.3707   LearningRate 0.0003   Epoch: 18   Global Step: 316850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:30,194-Speed 8801.88 samples/sec   Loss 3.4525   LearningRate 0.0003   Epoch: 18   Global Step: 316860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:31,325-Speed 9058.63 samples/sec   Loss 3.3791   LearningRate 0.0003   Epoch: 18   Global Step: 316870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:32,418-Speed 9375.30 samples/sec   Loss 3.4597   LearningRate 0.0003   Epoch: 18   Global Step: 316880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:33,553-Speed 9029.00 samples/sec   Loss 3.4692   LearningRate 0.0003   Epoch: 18   Global Step: 316890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:34,738-Speed 8648.15 samples/sec   Loss 3.4206   LearningRate 0.0003   Epoch: 18   Global Step: 316900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:35,900-Speed 8814.04 samples/sec   Loss 3.5052   LearningRate 0.0003   Epoch: 18   Global Step: 316910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:37,034-Speed 9035.74 samples/sec   Loss 3.3527   LearningRate 0.0003   Epoch: 18   Global Step: 316920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:38,162-Speed 9086.75 samples/sec   Loss 3.4412   LearningRate 0.0003   Epoch: 18   Global Step: 316930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:39,288-Speed 9095.38 samples/sec   Loss 3.3971   LearningRate 0.0003   Epoch: 18   Global Step: 316940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:40,443-Speed 8872.91 samples/sec   Loss 3.3806   LearningRate 0.0003   Epoch: 18   Global Step: 316950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:41,596-Speed 8892.46 samples/sec   Loss 3.4796   LearningRate 0.0003   Epoch: 18   Global Step: 316960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:42,721-Speed 9102.26 samples/sec   Loss 3.2824   LearningRate 0.0003   Epoch: 18   Global Step: 316970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:43,850-Speed 9080.14 samples/sec   Loss 3.3668   LearningRate 0.0003   Epoch: 18   Global Step: 316980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:44,935-Speed 9438.14 samples/sec   Loss 3.3686   LearningRate 0.0003   Epoch: 18   Global Step: 316990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:46,033-Speed 9328.93 samples/sec   Loss 3.4462   LearningRate 0.0003   Epoch: 18   Global Step: 317000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:47,158-Speed 9112.86 samples/sec   Loss 3.4151   LearningRate 0.0003   Epoch: 18   Global Step: 317010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:48,245-Speed 9424.42 samples/sec   Loss 3.4155   LearningRate 0.0003   Epoch: 18   Global Step: 317020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:49,360-Speed 9182.76 samples/sec   Loss 3.3248   LearningRate 0.0003   Epoch: 18   Global Step: 317030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:50,506-Speed 8944.03 samples/sec   Loss 3.4052   LearningRate 0.0003   Epoch: 18   Global Step: 317040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:23:51,647-Speed 8986.51 samples/sec   Loss 3.5034   LearningRate 0.0003   Epoch: 18   Global Step: 317050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:52,745-Speed 9332.10 samples/sec   Loss 3.4058   LearningRate 0.0003   Epoch: 18   Global Step: 317060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:53,881-Speed 9018.48 samples/sec   Loss 3.4223   LearningRate 0.0003   Epoch: 18   Global Step: 317070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:54,967-Speed 9435.90 samples/sec   Loss 3.4754   LearningRate 0.0003   Epoch: 18   Global Step: 317080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:56,086-Speed 9155.16 samples/sec   Loss 3.4039   LearningRate 0.0003   Epoch: 18   Global Step: 317090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:57,247-Speed 8822.76 samples/sec   Loss 3.3941   LearningRate 0.0003   Epoch: 18   Global Step: 317100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:58,370-Speed 9122.56 samples/sec   Loss 3.4047   LearningRate 0.0003   Epoch: 18   Global Step: 317110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:23:59,719-Speed 7596.82 samples/sec   Loss 3.3648   LearningRate 0.0003   Epoch: 18   Global Step: 317120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:24:38,342-Speed 265.14 samples/sec   Loss 3.3116   LearningRate 0.0002   Epoch: 19   Global Step: 317130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:24:40,360-Speed 5076.44 samples/sec   Loss 3.3183   LearningRate 0.0002   Epoch: 19   Global Step: 317140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:24:41,735-Speed 7452.80 samples/sec   Loss 3.3292   LearningRate 0.0002   Epoch: 19   Global Step: 317150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:24:43,120-Speed 7396.13 samples/sec   Loss 3.3251   LearningRate 0.0002   Epoch: 19   Global Step: 317160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:24:44,439-Speed 7765.06 samples/sec   Loss 3.2841   LearningRate 0.0002   Epoch: 19   Global Step: 317170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:24:45,555-Speed 9189.87 samples/sec   Loss 3.3143   LearningRate 0.0002   Epoch: 19   Global Step: 317180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:24:47,042-Speed 6888.03 samples/sec   Loss 3.2816   LearningRate 0.0002   Epoch: 19   Global Step: 317190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:24:48,148-Speed 9261.63 samples/sec   Loss 3.2591   LearningRate 0.0002   Epoch: 19   Global Step: 317200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:24:49,316-Speed 8771.88 samples/sec   Loss 3.2764   LearningRate 0.0002   Epoch: 19   Global Step: 317210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:24:50,403-Speed 9426.13 samples/sec   Loss 3.3166   LearningRate 0.0002   Epoch: 19   Global Step: 317220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:24:51,504-Speed 9309.41 samples/sec   Loss 3.2450   LearningRate 0.0002   Epoch: 19   Global Step: 317230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:24:52,829-Speed 7729.82 samples/sec   Loss 3.2508   LearningRate 0.0002   Epoch: 19   Global Step: 317240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:24:53,987-Speed 8845.26 samples/sec   Loss 3.3051   LearningRate 0.0002   Epoch: 19   Global Step: 317250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:24:55,152-Speed 8795.50 samples/sec   Loss 3.2975   LearningRate 0.0002   Epoch: 19   Global Step: 317260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:24:56,843-Speed 6061.27 samples/sec   Loss 3.2642   LearningRate 0.0002   Epoch: 19   Global Step: 317270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:24:57,917-Speed 9541.40 samples/sec   Loss 3.3307   LearningRate 0.0002   Epoch: 19   Global Step: 317280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:24:59,201-Speed 7979.40 samples/sec   Loss 3.3193   LearningRate 0.0002   Epoch: 19   Global Step: 317290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:00,481-Speed 8004.16 samples/sec   Loss 3.3147   LearningRate 0.0002   Epoch: 19   Global Step: 317300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:01,595-Speed 9196.74 samples/sec   Loss 3.2980   LearningRate 0.0002   Epoch: 19   Global Step: 317310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:02,894-Speed 7884.48 samples/sec   Loss 3.2303   LearningRate 0.0002   Epoch: 19   Global Step: 317320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:04,056-Speed 8816.05 samples/sec   Loss 3.3658   LearningRate 0.0002   Epoch: 19   Global Step: 317330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:05,166-Speed 9235.37 samples/sec   Loss 3.2934   LearningRate 0.0002   Epoch: 19   Global Step: 317340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:06,258-Speed 9377.82 samples/sec   Loss 3.2920   LearningRate 0.0002   Epoch: 19   Global Step: 317350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:07,387-Speed 9077.27 samples/sec   Loss 3.2542   LearningRate 0.0002   Epoch: 19   Global Step: 317360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:08,535-Speed 8928.13 samples/sec   Loss 3.2752   LearningRate 0.0002   Epoch: 19   Global Step: 317370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:09,670-Speed 9019.59 samples/sec   Loss 3.2056   LearningRate 0.0002   Epoch: 19   Global Step: 317380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:10,794-Speed 9120.87 samples/sec   Loss 3.2339   LearningRate 0.0002   Epoch: 19   Global Step: 317390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:11,906-Speed 9208.76 samples/sec   Loss 3.3333   LearningRate 0.0002   Epoch: 19   Global Step: 317400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:13,008-Speed 9304.91 samples/sec   Loss 3.2998   LearningRate 0.0002   Epoch: 19   Global Step: 317410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:14,106-Speed 9327.96 samples/sec   Loss 3.3003   LearningRate 0.0002   Epoch: 19   Global Step: 317420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:15,347-Speed 8256.94 samples/sec   Loss 3.2246   LearningRate 0.0002   Epoch: 19   Global Step: 317430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:16,522-Speed 8719.76 samples/sec   Loss 3.3192   LearningRate 0.0002   Epoch: 19   Global Step: 317440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:17,631-Speed 9238.87 samples/sec   Loss 3.3138   LearningRate 0.0002   Epoch: 19   Global Step: 317450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:18,721-Speed 9402.06 samples/sec   Loss 3.4065   LearningRate 0.0002   Epoch: 19   Global Step: 317460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:19,841-Speed 9149.20 samples/sec   Loss 3.2989   LearningRate 0.0002   Epoch: 19   Global Step: 317470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:20,955-Speed 9201.33 samples/sec   Loss 3.2828   LearningRate 0.0002   Epoch: 19   Global Step: 317480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:22,073-Speed 9166.70 samples/sec   Loss 3.1985   LearningRate 0.0002   Epoch: 19   Global Step: 317490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:23,196-Speed 9118.92 samples/sec   Loss 3.2544   LearningRate 0.0002   Epoch: 19   Global Step: 317500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:24,317-Speed 9137.64 samples/sec   Loss 3.3179   LearningRate 0.0002   Epoch: 19   Global Step: 317510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:25,453-Speed 9018.50 samples/sec   Loss 3.3023   LearningRate 0.0002   Epoch: 19   Global Step: 317520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:26,584-Speed 9061.89 samples/sec   Loss 3.2663   LearningRate 0.0002   Epoch: 19   Global Step: 317530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:27,696-Speed 9213.40 samples/sec   Loss 3.2584   LearningRate 0.0002   Epoch: 19   Global Step: 317540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:28,831-Speed 9023.51 samples/sec   Loss 3.3168   LearningRate 0.0002   Epoch: 19   Global Step: 317550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:29,948-Speed 9174.72 samples/sec   Loss 3.2774   LearningRate 0.0002   Epoch: 19   Global Step: 317560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:31,035-Speed 9424.59 samples/sec   Loss 3.3818   LearningRate 0.0002   Epoch: 19   Global Step: 317570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:32,139-Speed 9285.49 samples/sec   Loss 3.4123   LearningRate 0.0002   Epoch: 19   Global Step: 317580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:33,275-Speed 9022.94 samples/sec   Loss 3.2742   LearningRate 0.0002   Epoch: 19   Global Step: 317590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:34,355-Speed 9490.12 samples/sec   Loss 3.2698   LearningRate 0.0002   Epoch: 19   Global Step: 317600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:35,459-Speed 9279.77 samples/sec   Loss 3.3480   LearningRate 0.0002   Epoch: 19   Global Step: 317610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:36,553-Speed 9360.39 samples/sec   Loss 3.2822   LearningRate 0.0002   Epoch: 19   Global Step: 317620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:37,652-Speed 9328.53 samples/sec   Loss 3.2728   LearningRate 0.0002   Epoch: 19   Global Step: 317630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:38,787-Speed 9024.42 samples/sec   Loss 3.2974   LearningRate 0.0002   Epoch: 19   Global Step: 317640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:39,901-Speed 9197.66 samples/sec   Loss 3.3121   LearningRate 0.0002   Epoch: 19   Global Step: 317650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:40,976-Speed 9528.83 samples/sec   Loss 3.2820   LearningRate 0.0002   Epoch: 19   Global Step: 317660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:42,070-Speed 9365.35 samples/sec   Loss 3.2415   LearningRate 0.0002   Epoch: 19   Global Step: 317670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:43,184-Speed 9206.43 samples/sec   Loss 3.2732   LearningRate 0.0002   Epoch: 19   Global Step: 317680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:44,274-Speed 9397.10 samples/sec   Loss 3.3185   LearningRate 0.0002   Epoch: 19   Global Step: 317690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:45,358-Speed 9451.23 samples/sec   Loss 3.2315   LearningRate 0.0002   Epoch: 19   Global Step: 317700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:46,456-Speed 9334.29 samples/sec   Loss 3.2255   LearningRate 0.0002   Epoch: 19   Global Step: 317710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:25:47,588-Speed 9046.01 samples/sec   Loss 3.2334   LearningRate 0.0002   Epoch: 19   Global Step: 317720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:48,684-Speed 9350.27 samples/sec   Loss 3.3128   LearningRate 0.0002   Epoch: 19   Global Step: 317730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:49,753-Speed 9590.20 samples/sec   Loss 3.3549   LearningRate 0.0002   Epoch: 19   Global Step: 317740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:50,845-Speed 9377.62 samples/sec   Loss 3.2477   LearningRate 0.0002   Epoch: 19   Global Step: 317750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:51,953-Speed 9253.82 samples/sec   Loss 3.2723   LearningRate 0.0002   Epoch: 19   Global Step: 317760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:53,116-Speed 8812.59 samples/sec   Loss 3.3954   LearningRate 0.0002   Epoch: 19   Global Step: 317770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:54,269-Speed 8879.39 samples/sec   Loss 3.3220   LearningRate 0.0002   Epoch: 19   Global Step: 317780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:55,380-Speed 9225.33 samples/sec   Loss 3.2928   LearningRate 0.0002   Epoch: 19   Global Step: 317790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:56,508-Speed 9082.30 samples/sec   Loss 3.2633   LearningRate 0.0002   Epoch: 19   Global Step: 317800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:57,585-Speed 9513.09 samples/sec   Loss 3.2712   LearningRate 0.0002   Epoch: 19   Global Step: 317810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:58,704-Speed 9154.46 samples/sec   Loss 3.3367   LearningRate 0.0002   Epoch: 19   Global Step: 317820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:25:59,827-Speed 9127.10 samples/sec   Loss 3.3369   LearningRate 0.0002   Epoch: 19   Global Step: 317830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:00,929-Speed 9294.90 samples/sec   Loss 3.3017   LearningRate 0.0002   Epoch: 19   Global Step: 317840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:02,039-Speed 9235.10 samples/sec   Loss 3.3225   LearningRate 0.0002   Epoch: 19   Global Step: 317850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:03,232-Speed 8590.06 samples/sec   Loss 3.3843   LearningRate 0.0002   Epoch: 19   Global Step: 317860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:04,339-Speed 9253.44 samples/sec   Loss 3.3375   LearningRate 0.0002   Epoch: 19   Global Step: 317870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:05,440-Speed 9308.49 samples/sec   Loss 3.3192   LearningRate 0.0002   Epoch: 19   Global Step: 317880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:06,539-Speed 9323.99 samples/sec   Loss 3.2817   LearningRate 0.0002   Epoch: 19   Global Step: 317890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:07,644-Speed 9268.21 samples/sec   Loss 3.2649   LearningRate 0.0002   Epoch: 19   Global Step: 317900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:08,776-Speed 9054.91 samples/sec   Loss 3.3126   LearningRate 0.0002   Epoch: 19   Global Step: 317910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:09,900-Speed 9117.15 samples/sec   Loss 3.3104   LearningRate 0.0002   Epoch: 19   Global Step: 317920   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:26:10,995-Speed 9357.64 samples/sec   Loss 3.2639   LearningRate 0.0002   Epoch: 19   Global Step: 317930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:12,078-Speed 9456.58 samples/sec   Loss 3.2083   LearningRate 0.0002   Epoch: 19   Global Step: 317940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:13,234-Speed 8868.03 samples/sec   Loss 3.3240   LearningRate 0.0002   Epoch: 19   Global Step: 317950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:14,394-Speed 8832.23 samples/sec   Loss 3.2824   LearningRate 0.0002   Epoch: 19   Global Step: 317960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:15,508-Speed 9196.46 samples/sec   Loss 3.3118   LearningRate 0.0002   Epoch: 19   Global Step: 317970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:16,591-Speed 9458.50 samples/sec   Loss 3.2740   LearningRate 0.0002   Epoch: 19   Global Step: 317980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:17,757-Speed 8788.17 samples/sec   Loss 3.2982   LearningRate 0.0002   Epoch: 19   Global Step: 317990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:18,839-Speed 9468.87 samples/sec   Loss 3.2792   LearningRate 0.0002   Epoch: 19   Global Step: 318000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:26:40,593-[lfw][318000]XNorm: 6.554832
Training: 2022-04-12 00:26:40,593-[lfw][318000]Accuracy-Flip: 0.99633+-0.00287
Training: 2022-04-12 00:26:40,593-[lfw][318000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:27:05,756-[cfp_fp][318000]XNorm: 5.732424
Training: 2022-04-12 00:27:05,756-[cfp_fp][318000]Accuracy-Flip: 0.97443+-0.00683
Training: 2022-04-12 00:27:05,757-[cfp_fp][318000]Accuracy-Highest: 0.97514
Training: 2022-04-12 00:27:27,468-[agedb_30][318000]XNorm: 6.385354
Training: 2022-04-12 00:27:27,469-[agedb_30][318000]Accuracy-Flip: 0.97333+-0.00782
Training: 2022-04-12 00:27:27,469-[agedb_30][318000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:27:28,579-Speed 146.83 samples/sec   Loss 3.2781   LearningRate 0.0002   Epoch: 19   Global Step: 318010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:29,722-Speed 8966.16 samples/sec   Loss 3.3494   LearningRate 0.0002   Epoch: 19   Global Step: 318020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:30,782-Speed 9658.39 samples/sec   Loss 3.3319   LearningRate 0.0002   Epoch: 19   Global Step: 318030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:31,859-Speed 9517.94 samples/sec   Loss 3.3351   LearningRate 0.0002   Epoch: 19   Global Step: 318040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:33,000-Speed 8977.11 samples/sec   Loss 3.3226   LearningRate 0.0002   Epoch: 19   Global Step: 318050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:34,115-Speed 9189.64 samples/sec   Loss 3.2921   LearningRate 0.0002   Epoch: 19   Global Step: 318060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:35,197-Speed 9478.11 samples/sec   Loss 3.3069   LearningRate 0.0002   Epoch: 19   Global Step: 318070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:36,351-Speed 8875.80 samples/sec   Loss 3.2918   LearningRate 0.0002   Epoch: 19   Global Step: 318080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:37,466-Speed 9188.77 samples/sec   Loss 3.3108   LearningRate 0.0002   Epoch: 19   Global Step: 318090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:38,593-Speed 9088.75 samples/sec   Loss 3.2813   LearningRate 0.0002   Epoch: 19   Global Step: 318100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:39,676-Speed 9466.51 samples/sec   Loss 3.2251   LearningRate 0.0002   Epoch: 19   Global Step: 318110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:40,723-Speed 9785.79 samples/sec   Loss 3.2810   LearningRate 0.0002   Epoch: 19   Global Step: 318120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:41,812-Speed 9405.97 samples/sec   Loss 3.2635   LearningRate 0.0002   Epoch: 19   Global Step: 318130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:42,906-Speed 9366.12 samples/sec   Loss 3.2751   LearningRate 0.0002   Epoch: 19   Global Step: 318140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:44,020-Speed 9199.26 samples/sec   Loss 3.2666   LearningRate 0.0002   Epoch: 19   Global Step: 318150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:45,109-Speed 9405.76 samples/sec   Loss 3.3012   LearningRate 0.0002   Epoch: 19   Global Step: 318160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:46,214-Speed 9270.84 samples/sec   Loss 3.2153   LearningRate 0.0002   Epoch: 19   Global Step: 318170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:47,341-Speed 9092.28 samples/sec   Loss 3.3136   LearningRate 0.0002   Epoch: 19   Global Step: 318180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:48,475-Speed 9038.21 samples/sec   Loss 3.2216   LearningRate 0.0002   Epoch: 19   Global Step: 318190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:27:49,590-Speed 9186.13 samples/sec   Loss 3.3190   LearningRate 0.0002   Epoch: 19   Global Step: 318200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:27:50,706-Speed 9180.71 samples/sec   Loss 3.3507   LearningRate 0.0002   Epoch: 19   Global Step: 318210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:27:51,850-Speed 8958.87 samples/sec   Loss 3.3027   LearningRate 0.0002   Epoch: 19   Global Step: 318220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:27:53,015-Speed 8795.95 samples/sec   Loss 3.3177   LearningRate 0.0002   Epoch: 19   Global Step: 318230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:27:54,164-Speed 8919.06 samples/sec   Loss 3.2759   LearningRate 0.0002   Epoch: 19   Global Step: 318240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:27:55,252-Speed 9415.66 samples/sec   Loss 3.3099   LearningRate 0.0002   Epoch: 19   Global Step: 318250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:27:56,380-Speed 9084.92 samples/sec   Loss 3.2997   LearningRate 0.0002   Epoch: 19   Global Step: 318260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:27:57,510-Speed 9063.80 samples/sec   Loss 3.2673   LearningRate 0.0002   Epoch: 19   Global Step: 318270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:27:58,615-Speed 9279.32 samples/sec   Loss 3.2839   LearningRate 0.0002   Epoch: 19   Global Step: 318280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:27:59,705-Speed 9394.51 samples/sec   Loss 3.3259   LearningRate 0.0002   Epoch: 19   Global Step: 318290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:00,787-Speed 9471.99 samples/sec   Loss 3.3043   LearningRate 0.0002   Epoch: 19   Global Step: 318300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:01,894-Speed 9259.95 samples/sec   Loss 3.2299   LearningRate 0.0002   Epoch: 19   Global Step: 318310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:02,967-Speed 9551.90 samples/sec   Loss 3.3335   LearningRate 0.0002   Epoch: 19   Global Step: 318320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:04,088-Speed 9135.59 samples/sec   Loss 3.3088   LearningRate 0.0002   Epoch: 19   Global Step: 318330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:05,213-Speed 9108.91 samples/sec   Loss 3.2908   LearningRate 0.0002   Epoch: 19   Global Step: 318340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:06,320-Speed 9251.80 samples/sec   Loss 3.3149   LearningRate 0.0002   Epoch: 19   Global Step: 318350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:07,448-Speed 9088.59 samples/sec   Loss 3.3356   LearningRate 0.0002   Epoch: 19   Global Step: 318360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:08,564-Speed 9178.63 samples/sec   Loss 3.3151   LearningRate 0.0002   Epoch: 19   Global Step: 318370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:09,688-Speed 9109.13 samples/sec   Loss 3.3299   LearningRate 0.0002   Epoch: 19   Global Step: 318380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:10,772-Speed 9457.23 samples/sec   Loss 3.3087   LearningRate 0.0002   Epoch: 19   Global Step: 318390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:11,889-Speed 9166.67 samples/sec   Loss 3.2204   LearningRate 0.0002   Epoch: 19   Global Step: 318400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:13,046-Speed 8859.98 samples/sec   Loss 3.3377   LearningRate 0.0002   Epoch: 19   Global Step: 318410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:28:14,187-Speed 8981.83 samples/sec   Loss 3.2026   LearningRate 0.0002   Epoch: 19   Global Step: 318420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:15,358-Speed 8748.12 samples/sec   Loss 3.2353   LearningRate 0.0002   Epoch: 19   Global Step: 318430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:16,543-Speed 8645.80 samples/sec   Loss 3.3209   LearningRate 0.0002   Epoch: 19   Global Step: 318440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:17,643-Speed 9318.22 samples/sec   Loss 3.2991   LearningRate 0.0002   Epoch: 19   Global Step: 318450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:18,769-Speed 9097.70 samples/sec   Loss 3.2825   LearningRate 0.0002   Epoch: 19   Global Step: 318460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:19,887-Speed 9170.44 samples/sec   Loss 3.2647   LearningRate 0.0002   Epoch: 19   Global Step: 318470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:20,948-Speed 9650.09 samples/sec   Loss 3.2050   LearningRate 0.0002   Epoch: 19   Global Step: 318480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:22,088-Speed 8993.58 samples/sec   Loss 3.2801   LearningRate 0.0002   Epoch: 19   Global Step: 318490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:23,184-Speed 9343.05 samples/sec   Loss 3.2967   LearningRate 0.0002   Epoch: 19   Global Step: 318500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:24,296-Speed 9212.76 samples/sec   Loss 3.3087   LearningRate 0.0002   Epoch: 19   Global Step: 318510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:25,393-Speed 9337.85 samples/sec   Loss 3.3484   LearningRate 0.0002   Epoch: 19   Global Step: 318520   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:28:26,509-Speed 9183.29 samples/sec   Loss 3.3263   LearningRate 0.0002   Epoch: 19   Global Step: 318530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:27,626-Speed 9170.69 samples/sec   Loss 3.3466   LearningRate 0.0002   Epoch: 19   Global Step: 318540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:28,741-Speed 9191.12 samples/sec   Loss 3.3448   LearningRate 0.0002   Epoch: 19   Global Step: 318550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:29,861-Speed 9147.34 samples/sec   Loss 3.2663   LearningRate 0.0002   Epoch: 19   Global Step: 318560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:30,938-Speed 9510.30 samples/sec   Loss 3.2284   LearningRate 0.0002   Epoch: 19   Global Step: 318570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:32,062-Speed 9118.21 samples/sec   Loss 3.2699   LearningRate 0.0002   Epoch: 19   Global Step: 318580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:33,161-Speed 9327.51 samples/sec   Loss 3.3049   LearningRate 0.0002   Epoch: 19   Global Step: 318590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:34,263-Speed 9293.34 samples/sec   Loss 3.3364   LearningRate 0.0002   Epoch: 19   Global Step: 318600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:35,388-Speed 9112.60 samples/sec   Loss 3.3681   LearningRate 0.0002   Epoch: 19   Global Step: 318610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:36,542-Speed 8876.16 samples/sec   Loss 3.3093   LearningRate 0.0002   Epoch: 19   Global Step: 318620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:37,648-Speed 9264.95 samples/sec   Loss 3.2880   LearningRate 0.0002   Epoch: 19   Global Step: 318630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:38,807-Speed 8842.69 samples/sec   Loss 3.2162   LearningRate 0.0002   Epoch: 19   Global Step: 318640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:39,917-Speed 9231.81 samples/sec   Loss 3.2815   LearningRate 0.0002   Epoch: 19   Global Step: 318650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:41,040-Speed 9121.61 samples/sec   Loss 3.2884   LearningRate 0.0002   Epoch: 19   Global Step: 318660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:42,199-Speed 8836.76 samples/sec   Loss 3.3045   LearningRate 0.0002   Epoch: 19   Global Step: 318670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:43,269-Speed 9586.61 samples/sec   Loss 3.3504   LearningRate 0.0002   Epoch: 19   Global Step: 318680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:44,396-Speed 9089.69 samples/sec   Loss 3.2467   LearningRate 0.0002   Epoch: 19   Global Step: 318690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:45,507-Speed 9222.66 samples/sec   Loss 3.2820   LearningRate 0.0002   Epoch: 19   Global Step: 318700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:46,596-Speed 9403.86 samples/sec   Loss 3.2738   LearningRate 0.0002   Epoch: 19   Global Step: 318710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:47,730-Speed 9038.28 samples/sec   Loss 3.3566   LearningRate 0.0002   Epoch: 19   Global Step: 318720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:48,846-Speed 9175.64 samples/sec   Loss 3.2572   LearningRate 0.0002   Epoch: 19   Global Step: 318730   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:28:49,963-Speed 9180.87 samples/sec   Loss 3.3652   LearningRate 0.0002   Epoch: 19   Global Step: 318740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:51,136-Speed 8732.66 samples/sec   Loss 3.3304   LearningRate 0.0002   Epoch: 19   Global Step: 318750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:52,246-Speed 9226.26 samples/sec   Loss 3.2913   LearningRate 0.0002   Epoch: 19   Global Step: 318760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:53,376-Speed 9068.52 samples/sec   Loss 3.2498   LearningRate 0.0002   Epoch: 19   Global Step: 318770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:54,481-Speed 9275.00 samples/sec   Loss 3.2879   LearningRate 0.0002   Epoch: 19   Global Step: 318780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:55,597-Speed 9184.73 samples/sec   Loss 3.3442   LearningRate 0.0002   Epoch: 19   Global Step: 318790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:56,709-Speed 9210.38 samples/sec   Loss 3.3162   LearningRate 0.0002   Epoch: 19   Global Step: 318800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:57,862-Speed 8883.06 samples/sec   Loss 3.3550   LearningRate 0.0002   Epoch: 19   Global Step: 318810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:28:58,976-Speed 9203.05 samples/sec   Loss 3.2615   LearningRate 0.0002   Epoch: 19   Global Step: 318820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:00,110-Speed 9028.13 samples/sec   Loss 3.3304   LearningRate 0.0002   Epoch: 19   Global Step: 318830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:01,239-Speed 9075.24 samples/sec   Loss 3.2753   LearningRate 0.0002   Epoch: 19   Global Step: 318840   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:29:02,317-Speed 9513.50 samples/sec   Loss 3.2840   LearningRate 0.0002   Epoch: 19   Global Step: 318850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:03,478-Speed 8822.02 samples/sec   Loss 3.2169   LearningRate 0.0002   Epoch: 19   Global Step: 318860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:04,601-Speed 9123.02 samples/sec   Loss 3.2727   LearningRate 0.0002   Epoch: 19   Global Step: 318870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:05,732-Speed 9059.58 samples/sec   Loss 3.2669   LearningRate 0.0002   Epoch: 19   Global Step: 318880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:06,843-Speed 9220.33 samples/sec   Loss 3.3208   LearningRate 0.0002   Epoch: 19   Global Step: 318890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:08,012-Speed 8767.01 samples/sec   Loss 3.3140   LearningRate 0.0002   Epoch: 19   Global Step: 318900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:09,180-Speed 8771.43 samples/sec   Loss 3.2755   LearningRate 0.0002   Epoch: 19   Global Step: 318910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:10,300-Speed 9152.10 samples/sec   Loss 3.2637   LearningRate 0.0002   Epoch: 19   Global Step: 318920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:11,439-Speed 8990.74 samples/sec   Loss 3.2892   LearningRate 0.0002   Epoch: 19   Global Step: 318930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:12,578-Speed 8998.30 samples/sec   Loss 3.3262   LearningRate 0.0002   Epoch: 19   Global Step: 318940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:13,681-Speed 9294.16 samples/sec   Loss 3.2579   LearningRate 0.0002   Epoch: 19   Global Step: 318950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:14,791-Speed 9231.23 samples/sec   Loss 3.2180   LearningRate 0.0002   Epoch: 19   Global Step: 318960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:15,881-Speed 9402.20 samples/sec   Loss 3.2722   LearningRate 0.0002   Epoch: 19   Global Step: 318970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:16,970-Speed 9412.36 samples/sec   Loss 3.3045   LearningRate 0.0002   Epoch: 19   Global Step: 318980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:18,145-Speed 8714.79 samples/sec   Loss 3.3362   LearningRate 0.0002   Epoch: 19   Global Step: 318990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:19,228-Speed 9459.02 samples/sec   Loss 3.2913   LearningRate 0.0002   Epoch: 19   Global Step: 319000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:29:20,344-Speed 9189.98 samples/sec   Loss 3.2793   LearningRate 0.0002   Epoch: 19   Global Step: 319010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:29:21,483-Speed 8993.22 samples/sec   Loss 3.2518   LearningRate 0.0002   Epoch: 19   Global Step: 319020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:29:22,571-Speed 9422.64 samples/sec   Loss 3.3555   LearningRate 0.0002   Epoch: 19   Global Step: 319030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:29:23,754-Speed 8656.37 samples/sec   Loss 3.2371   LearningRate 0.0002   Epoch: 19   Global Step: 319040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:29:24,909-Speed 8873.56 samples/sec   Loss 3.2288   LearningRate 0.0002   Epoch: 19   Global Step: 319050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:29:26,012-Speed 9292.31 samples/sec   Loss 3.3014   LearningRate 0.0002   Epoch: 19   Global Step: 319060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:29:27,170-Speed 8848.12 samples/sec   Loss 3.3026   LearningRate 0.0002   Epoch: 19   Global Step: 319070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:29:28,327-Speed 8850.94 samples/sec   Loss 3.3150   LearningRate 0.0002   Epoch: 19   Global Step: 319080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:29:29,460-Speed 9044.01 samples/sec   Loss 3.2938   LearningRate 0.0002   Epoch: 19   Global Step: 319090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:29:30,566-Speed 9265.78 samples/sec   Loss 3.2702   LearningRate 0.0002   Epoch: 19   Global Step: 319100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:31,679-Speed 9202.07 samples/sec   Loss 3.2661   LearningRate 0.0002   Epoch: 19   Global Step: 319110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:32,816-Speed 9020.55 samples/sec   Loss 3.2761   LearningRate 0.0002   Epoch: 19   Global Step: 319120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:33,898-Speed 9466.58 samples/sec   Loss 3.2352   LearningRate 0.0002   Epoch: 19   Global Step: 319130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:35,059-Speed 8822.95 samples/sec   Loss 3.2120   LearningRate 0.0002   Epoch: 19   Global Step: 319140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:36,185-Speed 9100.95 samples/sec   Loss 3.2815   LearningRate 0.0002   Epoch: 19   Global Step: 319150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:37,283-Speed 9334.40 samples/sec   Loss 3.3072   LearningRate 0.0002   Epoch: 19   Global Step: 319160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:38,372-Speed 9407.04 samples/sec   Loss 3.3296   LearningRate 0.0002   Epoch: 19   Global Step: 319170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:39,568-Speed 8566.75 samples/sec   Loss 3.3177   LearningRate 0.0002   Epoch: 19   Global Step: 319180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:40,689-Speed 9133.47 samples/sec   Loss 3.2812   LearningRate 0.0002   Epoch: 19   Global Step: 319190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:41,806-Speed 9176.08 samples/sec   Loss 3.2617   LearningRate 0.0002   Epoch: 19   Global Step: 319200   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:29:42,898-Speed 9384.82 samples/sec   Loss 3.3243   LearningRate 0.0002   Epoch: 19   Global Step: 319210   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:29:43,992-Speed 9370.63 samples/sec   Loss 3.2669   LearningRate 0.0002   Epoch: 19   Global Step: 319220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:45,066-Speed 9537.70 samples/sec   Loss 3.2621   LearningRate 0.0002   Epoch: 19   Global Step: 319230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:46,200-Speed 9033.23 samples/sec   Loss 3.2722   LearningRate 0.0002   Epoch: 19   Global Step: 319240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:47,336-Speed 9018.70 samples/sec   Loss 3.2670   LearningRate 0.0002   Epoch: 19   Global Step: 319250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:48,472-Speed 9022.18 samples/sec   Loss 3.4567   LearningRate 0.0002   Epoch: 19   Global Step: 319260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:49,561-Speed 9406.83 samples/sec   Loss 3.2670   LearningRate 0.0002   Epoch: 19   Global Step: 319270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:50,687-Speed 9109.79 samples/sec   Loss 3.1973   LearningRate 0.0002   Epoch: 19   Global Step: 319280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:51,832-Speed 8949.16 samples/sec   Loss 3.3612   LearningRate 0.0002   Epoch: 19   Global Step: 319290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:52,981-Speed 8917.96 samples/sec   Loss 3.2570   LearningRate 0.0002   Epoch: 19   Global Step: 319300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:54,097-Speed 9175.97 samples/sec   Loss 3.2899   LearningRate 0.0002   Epoch: 19   Global Step: 319310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:55,191-Speed 9368.81 samples/sec   Loss 3.2641   LearningRate 0.0002   Epoch: 19   Global Step: 319320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:56,299-Speed 9244.16 samples/sec   Loss 3.2853   LearningRate 0.0002   Epoch: 19   Global Step: 319330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:57,416-Speed 9173.05 samples/sec   Loss 3.3420   LearningRate 0.0002   Epoch: 19   Global Step: 319340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:58,495-Speed 9495.61 samples/sec   Loss 3.3213   LearningRate 0.0002   Epoch: 19   Global Step: 319350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:29:59,621-Speed 9103.22 samples/sec   Loss 3.2400   LearningRate 0.0002   Epoch: 19   Global Step: 319360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:00,751-Speed 9067.13 samples/sec   Loss 3.3236   LearningRate 0.0002   Epoch: 19   Global Step: 319370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:01,885-Speed 9033.00 samples/sec   Loss 3.2752   LearningRate 0.0002   Epoch: 19   Global Step: 319380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:02,972-Speed 9428.01 samples/sec   Loss 3.3034   LearningRate 0.0002   Epoch: 19   Global Step: 319390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:04,078-Speed 9265.49 samples/sec   Loss 3.2999   LearningRate 0.0002   Epoch: 19   Global Step: 319400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:05,172-Speed 9360.39 samples/sec   Loss 3.2349   LearningRate 0.0002   Epoch: 19   Global Step: 319410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:06,297-Speed 9105.93 samples/sec   Loss 3.3300   LearningRate 0.0002   Epoch: 19   Global Step: 319420   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:30:07,401-Speed 9281.18 samples/sec   Loss 3.3274   LearningRate 0.0002   Epoch: 19   Global Step: 319430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:08,522-Speed 9143.99 samples/sec   Loss 3.2043   LearningRate 0.0002   Epoch: 19   Global Step: 319440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:09,650-Speed 9085.04 samples/sec   Loss 3.2879   LearningRate 0.0002   Epoch: 19   Global Step: 319450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:10,718-Speed 9593.41 samples/sec   Loss 3.2334   LearningRate 0.0002   Epoch: 19   Global Step: 319460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:11,818-Speed 9313.36 samples/sec   Loss 3.2205   LearningRate 0.0002   Epoch: 19   Global Step: 319470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:12,898-Speed 9485.13 samples/sec   Loss 3.4333   LearningRate 0.0002   Epoch: 19   Global Step: 319480   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:14,037-Speed 9003.67 samples/sec   Loss 3.3118   LearningRate 0.0002   Epoch: 19   Global Step: 319490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:15,151-Speed 9195.45 samples/sec   Loss 3.2871   LearningRate 0.0002   Epoch: 19   Global Step: 319500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:16,240-Speed 9406.50 samples/sec   Loss 3.3171   LearningRate 0.0002   Epoch: 19   Global Step: 319510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:17,363-Speed 9125.72 samples/sec   Loss 3.2748   LearningRate 0.0002   Epoch: 19   Global Step: 319520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:18,484-Speed 9143.28 samples/sec   Loss 3.3013   LearningRate 0.0002   Epoch: 19   Global Step: 319530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:19,595-Speed 9216.66 samples/sec   Loss 3.3007   LearningRate 0.0002   Epoch: 19   Global Step: 319540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:20,690-Speed 9364.64 samples/sec   Loss 3.3312   LearningRate 0.0002   Epoch: 19   Global Step: 319550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:21,787-Speed 9338.99 samples/sec   Loss 3.2544   LearningRate 0.0002   Epoch: 19   Global Step: 319560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:22,904-Speed 9167.12 samples/sec   Loss 3.2428   LearningRate 0.0002   Epoch: 19   Global Step: 319570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:24,058-Speed 8883.16 samples/sec   Loss 3.2834   LearningRate 0.0002   Epoch: 19   Global Step: 319580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:25,209-Speed 8899.96 samples/sec   Loss 3.3148   LearningRate 0.0002   Epoch: 19   Global Step: 319590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:26,420-Speed 8461.06 samples/sec   Loss 3.2345   LearningRate 0.0002   Epoch: 19   Global Step: 319600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:27,519-Speed 9322.97 samples/sec   Loss 3.3434   LearningRate 0.0002   Epoch: 19   Global Step: 319610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:28,665-Speed 8939.89 samples/sec   Loss 3.2827   LearningRate 0.0002   Epoch: 19   Global Step: 319620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:29,764-Speed 9325.54 samples/sec   Loss 3.2370   LearningRate 0.0002   Epoch: 19   Global Step: 319630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:30,853-Speed 9405.96 samples/sec   Loss 3.2358   LearningRate 0.0002   Epoch: 19   Global Step: 319640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:31,988-Speed 9028.12 samples/sec   Loss 3.3381   LearningRate 0.0002   Epoch: 19   Global Step: 319650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:33,139-Speed 8908.12 samples/sec   Loss 3.3109   LearningRate 0.0002   Epoch: 19   Global Step: 319660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:34,298-Speed 8835.25 samples/sec   Loss 3.4082   LearningRate 0.0002   Epoch: 19   Global Step: 319670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:35,369-Speed 9568.34 samples/sec   Loss 3.3358   LearningRate 0.0002   Epoch: 19   Global Step: 319680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:36,475-Speed 9262.40 samples/sec   Loss 3.3657   LearningRate 0.0002   Epoch: 19   Global Step: 319690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:37,582-Speed 9254.26 samples/sec   Loss 3.2937   LearningRate 0.0002   Epoch: 19   Global Step: 319700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:38,700-Speed 9164.11 samples/sec   Loss 3.2996   LearningRate 0.0002   Epoch: 19   Global Step: 319710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:39,843-Speed 8967.98 samples/sec   Loss 3.2590   LearningRate 0.0002   Epoch: 19   Global Step: 319720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:40,918-Speed 9523.85 samples/sec   Loss 3.2713   LearningRate 0.0002   Epoch: 19   Global Step: 319730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:41,977-Speed 9681.40 samples/sec   Loss 3.3489   LearningRate 0.0002   Epoch: 19   Global Step: 319740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:43,076-Speed 9322.27 samples/sec   Loss 3.2554   LearningRate 0.0002   Epoch: 19   Global Step: 319750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:44,225-Speed 8920.35 samples/sec   Loss 3.3251   LearningRate 0.0002   Epoch: 19   Global Step: 319760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:45,328-Speed 9288.73 samples/sec   Loss 3.2077   LearningRate 0.0002   Epoch: 19   Global Step: 319770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:46,425-Speed 9343.54 samples/sec   Loss 3.2678   LearningRate 0.0002   Epoch: 19   Global Step: 319780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:47,551-Speed 9097.45 samples/sec   Loss 3.3158   LearningRate 0.0002   Epoch: 19   Global Step: 319790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:48,644-Speed 9377.23 samples/sec   Loss 3.2806   LearningRate 0.0002   Epoch: 19   Global Step: 319800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:49,772-Speed 9086.13 samples/sec   Loss 3.2927   LearningRate 0.0002   Epoch: 19   Global Step: 319810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:50,856-Speed 9448.19 samples/sec   Loss 3.3023   LearningRate 0.0002   Epoch: 19   Global Step: 319820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:51,981-Speed 9106.57 samples/sec   Loss 3.3147   LearningRate 0.0002   Epoch: 19   Global Step: 319830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:53,102-Speed 9137.65 samples/sec   Loss 3.2290   LearningRate 0.0002   Epoch: 19   Global Step: 319840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:54,228-Speed 9104.11 samples/sec   Loss 3.2937   LearningRate 0.0002   Epoch: 19   Global Step: 319850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:55,311-Speed 9458.72 samples/sec   Loss 3.3185   LearningRate 0.0002   Epoch: 19   Global Step: 319860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:56,401-Speed 9399.62 samples/sec   Loss 3.2899   LearningRate 0.0002   Epoch: 19   Global Step: 319870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:57,575-Speed 8725.04 samples/sec   Loss 3.3321   LearningRate 0.0002   Epoch: 19   Global Step: 319880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:58,713-Speed 9003.59 samples/sec   Loss 3.2875   LearningRate 0.0002   Epoch: 19   Global Step: 319890   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:30:59,838-Speed 9108.51 samples/sec   Loss 3.2656   LearningRate 0.0002   Epoch: 19   Global Step: 319900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:31:00,930-Speed 9377.89 samples/sec   Loss 3.3616   LearningRate 0.0002   Epoch: 19   Global Step: 319910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:31:02,018-Speed 9423.97 samples/sec   Loss 3.2974   LearningRate 0.0002   Epoch: 19   Global Step: 319920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:31:03,117-Speed 9325.53 samples/sec   Loss 3.2850   LearningRate 0.0002   Epoch: 19   Global Step: 319930   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:31:04,246-Speed 9075.44 samples/sec   Loss 3.2237   LearningRate 0.0002   Epoch: 19   Global Step: 319940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:31:05,323-Speed 9512.63 samples/sec   Loss 3.2999   LearningRate 0.0002   Epoch: 19   Global Step: 319950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:31:06,471-Speed 8929.00 samples/sec   Loss 3.3027   LearningRate 0.0002   Epoch: 19   Global Step: 319960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:31:07,592-Speed 9133.98 samples/sec   Loss 3.3039   LearningRate 0.0002   Epoch: 19   Global Step: 319970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:31:08,728-Speed 9024.79 samples/sec   Loss 3.2882   LearningRate 0.0002   Epoch: 19   Global Step: 319980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:31:09,821-Speed 9373.62 samples/sec   Loss 3.2200   LearningRate 0.0002   Epoch: 19   Global Step: 319990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:31:10,901-Speed 9480.99 samples/sec   Loss 3.3068   LearningRate 0.0002   Epoch: 19   Global Step: 320000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:31:32,812-[lfw][320000]XNorm: 6.561927
Training: 2022-04-12 00:31:32,813-[lfw][320000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-04-12 00:31:32,814-[lfw][320000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:31:58,171-[cfp_fp][320000]XNorm: 5.731094
Training: 2022-04-12 00:31:58,172-[cfp_fp][320000]Accuracy-Flip: 0.97271+-0.00907
Training: 2022-04-12 00:31:58,172-[cfp_fp][320000]Accuracy-Highest: 0.97514
Training: 2022-04-12 00:32:20,120-[agedb_30][320000]XNorm: 6.391900
Training: 2022-04-12 00:32:20,121-[agedb_30][320000]Accuracy-Flip: 0.97233+-0.00782
Training: 2022-04-12 00:32:20,121-[agedb_30][320000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:32:21,219-Speed 145.63 samples/sec   Loss 3.3279   LearningRate 0.0002   Epoch: 19   Global Step: 320010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:22,317-Speed 9325.98 samples/sec   Loss 3.2269   LearningRate 0.0002   Epoch: 19   Global Step: 320020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:23,408-Speed 9398.27 samples/sec   Loss 3.3593   LearningRate 0.0002   Epoch: 19   Global Step: 320030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:24,546-Speed 9000.02 samples/sec   Loss 3.3558   LearningRate 0.0002   Epoch: 19   Global Step: 320040   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:32:25,691-Speed 8948.09 samples/sec   Loss 3.2497   LearningRate 0.0002   Epoch: 19   Global Step: 320050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:26,851-Speed 8835.65 samples/sec   Loss 3.2922   LearningRate 0.0002   Epoch: 19   Global Step: 320060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:27,980-Speed 9076.36 samples/sec   Loss 3.3138   LearningRate 0.0002   Epoch: 19   Global Step: 320070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:29,136-Speed 8863.23 samples/sec   Loss 3.2817   LearningRate 0.0002   Epoch: 19   Global Step: 320080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:30,316-Speed 8684.42 samples/sec   Loss 3.2876   LearningRate 0.0002   Epoch: 19   Global Step: 320090   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:31,424-Speed 9242.85 samples/sec   Loss 3.2883   LearningRate 0.0002   Epoch: 19   Global Step: 320100   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:32,547-Speed 9125.40 samples/sec   Loss 3.2861   LearningRate 0.0002   Epoch: 19   Global Step: 320110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:33,687-Speed 8989.41 samples/sec   Loss 3.3041   LearningRate 0.0002   Epoch: 19   Global Step: 320120   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:34,787-Speed 9310.43 samples/sec   Loss 3.2457   LearningRate 0.0002   Epoch: 19   Global Step: 320130   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:35,912-Speed 9112.56 samples/sec   Loss 3.1916   LearningRate 0.0002   Epoch: 19   Global Step: 320140   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:37,074-Speed 8812.72 samples/sec   Loss 3.3169   LearningRate 0.0002   Epoch: 19   Global Step: 320150   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:38,186-Speed 9213.36 samples/sec   Loss 3.2855   LearningRate 0.0002   Epoch: 19   Global Step: 320160   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:39,293-Speed 9259.05 samples/sec   Loss 3.3578   LearningRate 0.0002   Epoch: 19   Global Step: 320170   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:40,431-Speed 9003.97 samples/sec   Loss 3.2657   LearningRate 0.0002   Epoch: 19   Global Step: 320180   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:41,607-Speed 8711.21 samples/sec   Loss 3.2638   LearningRate 0.0002   Epoch: 19   Global Step: 320190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:42,693-Speed 9435.69 samples/sec   Loss 3.3725   LearningRate 0.0002   Epoch: 19   Global Step: 320200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:43,784-Speed 9391.38 samples/sec   Loss 3.3044   LearningRate 0.0002   Epoch: 19   Global Step: 320210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:44,883-Speed 9326.48 samples/sec   Loss 3.2817   LearningRate 0.0002   Epoch: 19   Global Step: 320220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:45,984-Speed 9299.33 samples/sec   Loss 3.3368   LearningRate 0.0002   Epoch: 19   Global Step: 320230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:47,112-Speed 9090.55 samples/sec   Loss 3.3897   LearningRate 0.0002   Epoch: 19   Global Step: 320240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:48,268-Speed 8858.83 samples/sec   Loss 3.3356   LearningRate 0.0002   Epoch: 19   Global Step: 320250   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:32:49,376-Speed 9251.12 samples/sec   Loss 3.2361   LearningRate 0.0002   Epoch: 19   Global Step: 320260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:50,492-Speed 9183.03 samples/sec   Loss 3.2841   LearningRate 0.0002   Epoch: 19   Global Step: 320270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:51,602-Speed 9235.40 samples/sec   Loss 3.2781   LearningRate 0.0002   Epoch: 19   Global Step: 320280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:52,701-Speed 9316.76 samples/sec   Loss 3.2840   LearningRate 0.0002   Epoch: 19   Global Step: 320290   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:53,848-Speed 8932.10 samples/sec   Loss 3.2553   LearningRate 0.0002   Epoch: 19   Global Step: 320300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:54,969-Speed 9144.10 samples/sec   Loss 3.3252   LearningRate 0.0002   Epoch: 19   Global Step: 320310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:56,096-Speed 9090.07 samples/sec   Loss 3.2480   LearningRate 0.0002   Epoch: 19   Global Step: 320320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:57,181-Speed 9440.88 samples/sec   Loss 3.2731   LearningRate 0.0002   Epoch: 19   Global Step: 320330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:58,315-Speed 9040.40 samples/sec   Loss 3.3126   LearningRate 0.0002   Epoch: 19   Global Step: 320340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:32:59,432-Speed 9170.89 samples/sec   Loss 3.2677   LearningRate 0.0002   Epoch: 19   Global Step: 320350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:00,554-Speed 9130.45 samples/sec   Loss 3.3881   LearningRate 0.0002   Epoch: 19   Global Step: 320360   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:33:01,677-Speed 9121.91 samples/sec   Loss 3.2506   LearningRate 0.0002   Epoch: 19   Global Step: 320370   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:33:02,778-Speed 9308.46 samples/sec   Loss 3.2380   LearningRate 0.0002   Epoch: 19   Global Step: 320380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:03,901-Speed 9122.33 samples/sec   Loss 3.2820   LearningRate 0.0002   Epoch: 19   Global Step: 320390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:05,013-Speed 9214.04 samples/sec   Loss 3.2440   LearningRate 0.0002   Epoch: 19   Global Step: 320400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:06,121-Speed 9249.80 samples/sec   Loss 3.2743   LearningRate 0.0002   Epoch: 19   Global Step: 320410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:07,216-Speed 9355.71 samples/sec   Loss 3.2957   LearningRate 0.0002   Epoch: 19   Global Step: 320420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:08,367-Speed 8911.39 samples/sec   Loss 3.2974   LearningRate 0.0002   Epoch: 19   Global Step: 320430   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:09,473-Speed 9262.59 samples/sec   Loss 3.1725   LearningRate 0.0002   Epoch: 19   Global Step: 320440   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:10,639-Speed 8783.10 samples/sec   Loss 3.3014   LearningRate 0.0002   Epoch: 19   Global Step: 320450   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:11,770-Speed 9066.00 samples/sec   Loss 3.2764   LearningRate 0.0002   Epoch: 19   Global Step: 320460   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:12,916-Speed 8934.48 samples/sec   Loss 3.3569   LearningRate 0.0002   Epoch: 19   Global Step: 320470   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:14,015-Speed 9324.07 samples/sec   Loss 3.3188   LearningRate 0.0002   Epoch: 19   Global Step: 320480   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:33:15,161-Speed 8939.61 samples/sec   Loss 3.3397   LearningRate 0.0002   Epoch: 19   Global Step: 320490   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:16,242-Speed 9481.93 samples/sec   Loss 3.2807   LearningRate 0.0002   Epoch: 19   Global Step: 320500   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:17,356-Speed 9198.86 samples/sec   Loss 3.2338   LearningRate 0.0002   Epoch: 19   Global Step: 320510   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:18,451-Speed 9363.21 samples/sec   Loss 3.2355   LearningRate 0.0002   Epoch: 19   Global Step: 320520   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:19,607-Speed 8860.82 samples/sec   Loss 3.3047   LearningRate 0.0002   Epoch: 19   Global Step: 320530   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:20,720-Speed 9206.29 samples/sec   Loss 3.2779   LearningRate 0.0002   Epoch: 19   Global Step: 320540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:21,809-Speed 9400.79 samples/sec   Loss 3.1799   LearningRate 0.0002   Epoch: 19   Global Step: 320550   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:22,926-Speed 9174.60 samples/sec   Loss 3.3259   LearningRate 0.0002   Epoch: 19   Global Step: 320560   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:24,006-Speed 9489.56 samples/sec   Loss 3.2529   LearningRate 0.0002   Epoch: 19   Global Step: 320570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:25,109-Speed 9289.95 samples/sec   Loss 3.2798   LearningRate 0.0002   Epoch: 19   Global Step: 320580   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:26,243-Speed 9035.52 samples/sec   Loss 3.2356   LearningRate 0.0002   Epoch: 19   Global Step: 320590   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:27,320-Speed 9517.40 samples/sec   Loss 3.2422   LearningRate 0.0002   Epoch: 19   Global Step: 320600   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:28,391-Speed 9560.78 samples/sec   Loss 3.3147   LearningRate 0.0002   Epoch: 19   Global Step: 320610   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:29,514-Speed 9123.95 samples/sec   Loss 3.2936   LearningRate 0.0002   Epoch: 19   Global Step: 320620   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:30,631-Speed 9176.41 samples/sec   Loss 3.2921   LearningRate 0.0002   Epoch: 19   Global Step: 320630   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:31,755-Speed 9116.07 samples/sec   Loss 3.3507   LearningRate 0.0002   Epoch: 19   Global Step: 320640   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:32,907-Speed 8895.07 samples/sec   Loss 3.2424   LearningRate 0.0002   Epoch: 19   Global Step: 320650   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:34,066-Speed 8836.38 samples/sec   Loss 3.3308   LearningRate 0.0002   Epoch: 19   Global Step: 320660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:35,148-Speed 9469.23 samples/sec   Loss 3.2982   LearningRate 0.0002   Epoch: 19   Global Step: 320670   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:36,252-Speed 9280.44 samples/sec   Loss 3.3001   LearningRate 0.0002   Epoch: 19   Global Step: 320680   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:37,359-Speed 9259.45 samples/sec   Loss 3.2439   LearningRate 0.0002   Epoch: 19   Global Step: 320690   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:38,481-Speed 9130.08 samples/sec   Loss 3.2703   LearningRate 0.0002   Epoch: 19   Global Step: 320700   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:39,626-Speed 8946.19 samples/sec   Loss 3.2535   LearningRate 0.0002   Epoch: 19   Global Step: 320710   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:40,730-Speed 9286.38 samples/sec   Loss 3.2749   LearningRate 0.0002   Epoch: 19   Global Step: 320720   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:41,848-Speed 9162.98 samples/sec   Loss 3.3069   LearningRate 0.0002   Epoch: 19   Global Step: 320730   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:42,973-Speed 9107.71 samples/sec   Loss 3.2507   LearningRate 0.0002   Epoch: 19   Global Step: 320740   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:44,130-Speed 8852.20 samples/sec   Loss 3.3136   LearningRate 0.0002   Epoch: 19   Global Step: 320750   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:45,219-Speed 9416.78 samples/sec   Loss 3.2403   LearningRate 0.0002   Epoch: 19   Global Step: 320760   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:46,341-Speed 9129.86 samples/sec   Loss 3.2601   LearningRate 0.0002   Epoch: 19   Global Step: 320770   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:47,454-Speed 9204.45 samples/sec   Loss 3.3285   LearningRate 0.0002   Epoch: 19   Global Step: 320780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:48,573-Speed 9159.85 samples/sec   Loss 3.2744   LearningRate 0.0002   Epoch: 19   Global Step: 320790   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:49,690-Speed 9172.80 samples/sec   Loss 3.3256   LearningRate 0.0002   Epoch: 19   Global Step: 320800   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:50,802-Speed 9214.28 samples/sec   Loss 3.2500   LearningRate 0.0002   Epoch: 19   Global Step: 320810   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:51,904-Speed 9300.31 samples/sec   Loss 3.2563   LearningRate 0.0002   Epoch: 19   Global Step: 320820   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:52,999-Speed 9360.75 samples/sec   Loss 3.3143   LearningRate 0.0002   Epoch: 19   Global Step: 320830   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:54,098-Speed 9318.44 samples/sec   Loss 3.2699   LearningRate 0.0002   Epoch: 19   Global Step: 320840   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:55,165-Speed 9605.40 samples/sec   Loss 3.1917   LearningRate 0.0002   Epoch: 19   Global Step: 320850   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:56,294-Speed 9078.13 samples/sec   Loss 3.3510   LearningRate 0.0002   Epoch: 19   Global Step: 320860   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:57,417-Speed 9121.44 samples/sec   Loss 3.2711   LearningRate 0.0002   Epoch: 19   Global Step: 320870   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:58,502-Speed 9445.75 samples/sec   Loss 3.2435   LearningRate 0.0002   Epoch: 19   Global Step: 320880   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:33:59,634-Speed 9054.22 samples/sec   Loss 3.2326   LearningRate 0.0002   Epoch: 19   Global Step: 320890   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:34:00,754-Speed 9142.85 samples/sec   Loss 3.2146   LearningRate 0.0001   Epoch: 19   Global Step: 320900   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:01,875-Speed 9144.10 samples/sec   Loss 3.3165   LearningRate 0.0001   Epoch: 19   Global Step: 320910   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:02,995-Speed 9143.90 samples/sec   Loss 3.2948   LearningRate 0.0001   Epoch: 19   Global Step: 320920   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:04,133-Speed 9008.97 samples/sec   Loss 3.3659   LearningRate 0.0001   Epoch: 19   Global Step: 320930   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:05,221-Speed 9414.06 samples/sec   Loss 3.3308   LearningRate 0.0001   Epoch: 19   Global Step: 320940   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:06,346-Speed 9113.02 samples/sec   Loss 3.2659   LearningRate 0.0001   Epoch: 19   Global Step: 320950   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:07,457-Speed 9221.79 samples/sec   Loss 3.3566   LearningRate 0.0001   Epoch: 19   Global Step: 320960   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:08,531-Speed 9541.93 samples/sec   Loss 3.3212   LearningRate 0.0001   Epoch: 19   Global Step: 320970   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:09,640-Speed 9239.19 samples/sec   Loss 3.3185   LearningRate 0.0001   Epoch: 19   Global Step: 320980   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:10,780-Speed 8987.13 samples/sec   Loss 3.2561   LearningRate 0.0001   Epoch: 19   Global Step: 320990   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:11,839-Speed 9677.29 samples/sec   Loss 3.2835   LearningRate 0.0001   Epoch: 19   Global Step: 321000   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:12,953-Speed 9192.04 samples/sec   Loss 3.2503   LearningRate 0.0001   Epoch: 19   Global Step: 321010   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:14,075-Speed 9136.12 samples/sec   Loss 3.3236   LearningRate 0.0001   Epoch: 19   Global Step: 321020   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:15,227-Speed 8890.52 samples/sec   Loss 3.2700   LearningRate 0.0001   Epoch: 19   Global Step: 321030   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:16,347-Speed 9151.58 samples/sec   Loss 3.3645   LearningRate 0.0001   Epoch: 19   Global Step: 321040   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:17,467-Speed 9146.93 samples/sec   Loss 3.2768   LearningRate 0.0001   Epoch: 19   Global Step: 321050   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:18,578-Speed 9217.14 samples/sec   Loss 3.2763   LearningRate 0.0001   Epoch: 19   Global Step: 321060   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:19,695-Speed 9178.66 samples/sec   Loss 3.2753   LearningRate 0.0001   Epoch: 19   Global Step: 321070   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:20,797-Speed 9294.02 samples/sec   Loss 3.3344   LearningRate 0.0001   Epoch: 19   Global Step: 321080   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:21,905-Speed 9246.18 samples/sec   Loss 3.3087   LearningRate 0.0001   Epoch: 19   Global Step: 321090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:34:23,037-Speed 9057.70 samples/sec   Loss 3.3156   LearningRate 0.0001   Epoch: 19   Global Step: 321100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:34:24,140-Speed 9284.26 samples/sec   Loss 3.2993   LearningRate 0.0001   Epoch: 19   Global Step: 321110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:34:25,293-Speed 8889.88 samples/sec   Loss 3.2527   LearningRate 0.0001   Epoch: 19   Global Step: 321120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:34:26,459-Speed 8791.13 samples/sec   Loss 3.3400   LearningRate 0.0001   Epoch: 19   Global Step: 321130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:34:27,547-Speed 9413.14 samples/sec   Loss 3.2563   LearningRate 0.0001   Epoch: 19   Global Step: 321140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:34:28,646-Speed 9321.41 samples/sec   Loss 3.2892   LearningRate 0.0001   Epoch: 19   Global Step: 321150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:34:29,746-Speed 9315.33 samples/sec   Loss 3.2767   LearningRate 0.0001   Epoch: 19   Global Step: 321160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:34:30,867-Speed 9139.25 samples/sec   Loss 3.2601   LearningRate 0.0001   Epoch: 19   Global Step: 321170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:34:31,973-Speed 9262.92 samples/sec   Loss 3.2091   LearningRate 0.0001   Epoch: 19   Global Step: 321180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:34:33,105-Speed 9054.44 samples/sec   Loss 3.2251   LearningRate 0.0001   Epoch: 19   Global Step: 321190   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:34,245-Speed 8985.23 samples/sec   Loss 3.2717   LearningRate 0.0001   Epoch: 19   Global Step: 321200   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:35,369-Speed 9118.68 samples/sec   Loss 3.2628   LearningRate 0.0001   Epoch: 19   Global Step: 321210   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:36,480-Speed 9222.05 samples/sec   Loss 3.3687   LearningRate 0.0001   Epoch: 19   Global Step: 321220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:37,569-Speed 9404.08 samples/sec   Loss 3.3409   LearningRate 0.0001   Epoch: 19   Global Step: 321230   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:38,698-Speed 9074.14 samples/sec   Loss 3.3058   LearningRate 0.0001   Epoch: 19   Global Step: 321240   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:39,822-Speed 9119.18 samples/sec   Loss 3.2701   LearningRate 0.0001   Epoch: 19   Global Step: 321250   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:40,898-Speed 9524.47 samples/sec   Loss 3.3210   LearningRate 0.0001   Epoch: 19   Global Step: 321260   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:42,011-Speed 9209.50 samples/sec   Loss 3.3125   LearningRate 0.0001   Epoch: 19   Global Step: 321270   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:43,114-Speed 9282.61 samples/sec   Loss 3.2610   LearningRate 0.0001   Epoch: 19   Global Step: 321280   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:44,225-Speed 9230.77 samples/sec   Loss 3.2697   LearningRate 0.0001   Epoch: 19   Global Step: 321290   Fp16 Grad Scale: 131072   Required: 1 hours
Training: 2022-04-12 00:34:45,329-Speed 9278.92 samples/sec   Loss 3.2845   LearningRate 0.0001   Epoch: 19   Global Step: 321300   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:46,458-Speed 9076.96 samples/sec   Loss 3.3650   LearningRate 0.0001   Epoch: 19   Global Step: 321310   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:47,564-Speed 9258.38 samples/sec   Loss 3.2403   LearningRate 0.0001   Epoch: 19   Global Step: 321320   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:48,736-Speed 8749.05 samples/sec   Loss 3.2538   LearningRate 0.0001   Epoch: 19   Global Step: 321330   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:49,867-Speed 9060.50 samples/sec   Loss 3.2385   LearningRate 0.0001   Epoch: 19   Global Step: 321340   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:50,996-Speed 9071.61 samples/sec   Loss 3.2678   LearningRate 0.0001   Epoch: 19   Global Step: 321350   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:52,091-Speed 9359.77 samples/sec   Loss 3.3463   LearningRate 0.0001   Epoch: 19   Global Step: 321360   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:53,197-Speed 9266.19 samples/sec   Loss 3.3549   LearningRate 0.0001   Epoch: 19   Global Step: 321370   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:54,352-Speed 8872.96 samples/sec   Loss 3.2402   LearningRate 0.0001   Epoch: 19   Global Step: 321380   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:55,453-Speed 9307.58 samples/sec   Loss 3.1853   LearningRate 0.0001   Epoch: 19   Global Step: 321390   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:56,558-Speed 9273.89 samples/sec   Loss 3.2339   LearningRate 0.0001   Epoch: 19   Global Step: 321400   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:57,640-Speed 9463.24 samples/sec   Loss 3.2656   LearningRate 0.0001   Epoch: 19   Global Step: 321410   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:58,758-Speed 9167.60 samples/sec   Loss 3.3638   LearningRate 0.0001   Epoch: 19   Global Step: 321420   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-04-12 00:34:59,819-Speed 9657.87 samples/sec   Loss 3.2403   LearningRate 0.0001   Epoch: 19   Global Step: 321430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:35:00,899-Speed 9489.51 samples/sec   Loss 3.2712   LearningRate 0.0001   Epoch: 19   Global Step: 321440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:35:02,000-Speed 9302.50 samples/sec   Loss 3.2753   LearningRate 0.0001   Epoch: 19   Global Step: 321450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:35:03,097-Speed 9347.66 samples/sec   Loss 3.3187   LearningRate 0.0001   Epoch: 19   Global Step: 321460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:35:04,220-Speed 9118.00 samples/sec   Loss 3.2522   LearningRate 0.0001   Epoch: 19   Global Step: 321470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:35:05,360-Speed 8991.26 samples/sec   Loss 3.3121   LearningRate 0.0001   Epoch: 19   Global Step: 321480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-12 00:35:06,491-Speed 9061.69 samples/sec   Loss 3.2779   LearningRate 0.0001   Epoch: 19   Global Step: 321490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:07,633-Speed 8974.11 samples/sec   Loss 3.2830   LearningRate 0.0001   Epoch: 19   Global Step: 321500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:08,763-Speed 9067.05 samples/sec   Loss 3.2727   LearningRate 0.0001   Epoch: 19   Global Step: 321510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:09,878-Speed 9188.92 samples/sec   Loss 3.2489   LearningRate 0.0001   Epoch: 19   Global Step: 321520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:10,998-Speed 9145.73 samples/sec   Loss 3.3294   LearningRate 0.0001   Epoch: 19   Global Step: 321530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:12,123-Speed 9105.60 samples/sec   Loss 3.3146   LearningRate 0.0001   Epoch: 19   Global Step: 321540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:13,196-Speed 9555.01 samples/sec   Loss 3.2540   LearningRate 0.0001   Epoch: 19   Global Step: 321550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:14,302-Speed 9256.90 samples/sec   Loss 3.2814   LearningRate 0.0001   Epoch: 19   Global Step: 321560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:15,467-Speed 8796.66 samples/sec   Loss 3.2695   LearningRate 0.0001   Epoch: 19   Global Step: 321570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:16,591-Speed 9112.33 samples/sec   Loss 3.2584   LearningRate 0.0001   Epoch: 19   Global Step: 321580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:17,733-Speed 8973.36 samples/sec   Loss 3.3285   LearningRate 0.0001   Epoch: 19   Global Step: 321590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:18,871-Speed 9009.53 samples/sec   Loss 3.2851   LearningRate 0.0001   Epoch: 19   Global Step: 321600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:19,987-Speed 9185.12 samples/sec   Loss 3.3065   LearningRate 0.0001   Epoch: 19   Global Step: 321610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:21,119-Speed 9052.17 samples/sec   Loss 3.2249   LearningRate 0.0001   Epoch: 19   Global Step: 321620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:22,222-Speed 9284.30 samples/sec   Loss 3.2411   LearningRate 0.0001   Epoch: 19   Global Step: 321630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:23,330-Speed 9244.86 samples/sec   Loss 3.2592   LearningRate 0.0001   Epoch: 19   Global Step: 321640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:24,473-Speed 8968.46 samples/sec   Loss 3.2642   LearningRate 0.0001   Epoch: 19   Global Step: 321650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:25,647-Speed 8723.98 samples/sec   Loss 3.3576   LearningRate 0.0001   Epoch: 19   Global Step: 321660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:26,766-Speed 9157.28 samples/sec   Loss 3.2491   LearningRate 0.0001   Epoch: 19   Global Step: 321670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:27,879-Speed 9208.35 samples/sec   Loss 3.3127   LearningRate 0.0001   Epoch: 19   Global Step: 321680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:29,042-Speed 8808.11 samples/sec   Loss 3.2484   LearningRate 0.0001   Epoch: 19   Global Step: 321690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:30,162-Speed 9144.07 samples/sec   Loss 3.2776   LearningRate 0.0001   Epoch: 19   Global Step: 321700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:31,291-Speed 9081.03 samples/sec   Loss 3.2287   LearningRate 0.0001   Epoch: 19   Global Step: 321710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:32,390-Speed 9321.96 samples/sec   Loss 3.2275   LearningRate 0.0001   Epoch: 19   Global Step: 321720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:33,491-Speed 9303.20 samples/sec   Loss 3.2775   LearningRate 0.0001   Epoch: 19   Global Step: 321730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:34,663-Speed 8742.22 samples/sec   Loss 3.2127   LearningRate 0.0001   Epoch: 19   Global Step: 321740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:35,761-Speed 9327.98 samples/sec   Loss 3.3498   LearningRate 0.0001   Epoch: 19   Global Step: 321750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:36,869-Speed 9252.69 samples/sec   Loss 3.2869   LearningRate 0.0001   Epoch: 19   Global Step: 321760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:37,976-Speed 9259.76 samples/sec   Loss 3.2925   LearningRate 0.0001   Epoch: 19   Global Step: 321770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:39,185-Speed 8476.10 samples/sec   Loss 3.3151   LearningRate 0.0001   Epoch: 19   Global Step: 321780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:40,290-Speed 9271.71 samples/sec   Loss 3.3052   LearningRate 0.0001   Epoch: 19   Global Step: 321790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:41,382-Speed 9384.49 samples/sec   Loss 3.2745   LearningRate 0.0001   Epoch: 19   Global Step: 321800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:42,471-Speed 9406.19 samples/sec   Loss 3.3598   LearningRate 0.0001   Epoch: 19   Global Step: 321810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:43,567-Speed 9342.76 samples/sec   Loss 3.3210   LearningRate 0.0001   Epoch: 19   Global Step: 321820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:44,680-Speed 9207.39 samples/sec   Loss 3.3233   LearningRate 0.0001   Epoch: 19   Global Step: 321830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:45,770-Speed 9401.36 samples/sec   Loss 3.2727   LearningRate 0.0001   Epoch: 19   Global Step: 321840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:46,892-Speed 9132.59 samples/sec   Loss 3.2500   LearningRate 0.0001   Epoch: 19   Global Step: 321850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:48,012-Speed 9146.05 samples/sec   Loss 3.3289   LearningRate 0.0001   Epoch: 19   Global Step: 321860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:35:49,124-Speed 9212.39 samples/sec   Loss 3.2610   LearningRate 0.0001   Epoch: 19   Global Step: 321870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:50,213-Speed 9408.31 samples/sec   Loss 3.1802   LearningRate 0.0001   Epoch: 19   Global Step: 321880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:51,327-Speed 9201.01 samples/sec   Loss 3.3898   LearningRate 0.0001   Epoch: 19   Global Step: 321890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:52,403-Speed 9522.90 samples/sec   Loss 3.3513   LearningRate 0.0001   Epoch: 19   Global Step: 321900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:53,550-Speed 8933.21 samples/sec   Loss 3.3016   LearningRate 0.0001   Epoch: 19   Global Step: 321910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:54,737-Speed 8629.12 samples/sec   Loss 3.3098   LearningRate 0.0001   Epoch: 19   Global Step: 321920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:55,878-Speed 8986.02 samples/sec   Loss 3.2840   LearningRate 0.0001   Epoch: 19   Global Step: 321930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:56,977-Speed 9328.05 samples/sec   Loss 3.2535   LearningRate 0.0001   Epoch: 19   Global Step: 321940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:58,069-Speed 9376.14 samples/sec   Loss 3.2846   LearningRate 0.0001   Epoch: 19   Global Step: 321950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:35:59,223-Speed 8878.24 samples/sec   Loss 3.2390   LearningRate 0.0001   Epoch: 19   Global Step: 321960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:36:00,377-Speed 8881.55 samples/sec   Loss 3.2178   LearningRate 0.0001   Epoch: 19   Global Step: 321970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:36:01,485-Speed 9244.93 samples/sec   Loss 3.3057   LearningRate 0.0001   Epoch: 19   Global Step: 321980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:36:02,630-Speed 8949.37 samples/sec   Loss 3.2830   LearningRate 0.0001   Epoch: 19   Global Step: 321990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:36:03,732-Speed 9295.79 samples/sec   Loss 3.3657   LearningRate 0.0001   Epoch: 19   Global Step: 322000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:36:25,860-[lfw][322000]XNorm: 6.541839
Training: 2022-04-12 00:36:25,861-[lfw][322000]Accuracy-Flip: 0.99667+-0.00269
Training: 2022-04-12 00:36:25,861-[lfw][322000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:36:51,110-[cfp_fp][322000]XNorm: 5.712872
Training: 2022-04-12 00:36:51,110-[cfp_fp][322000]Accuracy-Flip: 0.97371+-0.00767
Training: 2022-04-12 00:36:51,111-[cfp_fp][322000]Accuracy-Highest: 0.97514
Training: 2022-04-12 00:37:12,947-[agedb_30][322000]XNorm: 6.374946
Training: 2022-04-12 00:37:12,947-[agedb_30][322000]Accuracy-Flip: 0.97150+-0.00783
Training: 2022-04-12 00:37:12,947-[agedb_30][322000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:37:14,056-Speed 145.61 samples/sec   Loss 3.3088   LearningRate 0.0001   Epoch: 19   Global Step: 322010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:15,162-Speed 9264.82 samples/sec   Loss 3.3498   LearningRate 0.0001   Epoch: 19   Global Step: 322020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:16,301-Speed 8995.57 samples/sec   Loss 3.3168   LearningRate 0.0001   Epoch: 19   Global Step: 322030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:17,448-Speed 8932.26 samples/sec   Loss 3.2901   LearningRate 0.0001   Epoch: 19   Global Step: 322040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:18,564-Speed 9180.66 samples/sec   Loss 3.2890   LearningRate 0.0001   Epoch: 19   Global Step: 322050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:19,672-Speed 9252.48 samples/sec   Loss 3.3044   LearningRate 0.0001   Epoch: 19   Global Step: 322060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:20,795-Speed 9117.51 samples/sec   Loss 3.2958   LearningRate 0.0001   Epoch: 19   Global Step: 322070   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:37:21,871-Speed 9526.44 samples/sec   Loss 3.3874   LearningRate 0.0001   Epoch: 19   Global Step: 322080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:22,970-Speed 9319.93 samples/sec   Loss 3.3205   LearningRate 0.0001   Epoch: 19   Global Step: 322090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:24,100-Speed 9067.20 samples/sec   Loss 3.2691   LearningRate 0.0001   Epoch: 19   Global Step: 322100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:25,193-Speed 9375.58 samples/sec   Loss 3.2962   LearningRate 0.0001   Epoch: 19   Global Step: 322110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:26,291-Speed 9377.65 samples/sec   Loss 3.2625   LearningRate 0.0001   Epoch: 19   Global Step: 322120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:27,380-Speed 9404.18 samples/sec   Loss 3.2585   LearningRate 0.0001   Epoch: 19   Global Step: 322130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:28,560-Speed 8681.54 samples/sec   Loss 3.3603   LearningRate 0.0001   Epoch: 19   Global Step: 322140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:29,702-Speed 8977.40 samples/sec   Loss 3.2468   LearningRate 0.0001   Epoch: 19   Global Step: 322150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:30,825-Speed 9119.88 samples/sec   Loss 3.2416   LearningRate 0.0001   Epoch: 19   Global Step: 322160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:31,948-Speed 9118.57 samples/sec   Loss 3.2643   LearningRate 0.0001   Epoch: 19   Global Step: 322170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:33,061-Speed 9217.17 samples/sec   Loss 3.3292   LearningRate 0.0001   Epoch: 19   Global Step: 322180   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:37:34,147-Speed 9432.71 samples/sec   Loss 3.2828   LearningRate 0.0001   Epoch: 19   Global Step: 322190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:35,270-Speed 9118.94 samples/sec   Loss 3.3006   LearningRate 0.0001   Epoch: 19   Global Step: 322200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:36,373-Speed 9293.85 samples/sec   Loss 3.2689   LearningRate 0.0001   Epoch: 19   Global Step: 322210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:37,447-Speed 9532.84 samples/sec   Loss 3.3104   LearningRate 0.0001   Epoch: 19   Global Step: 322220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:38,621-Speed 8733.59 samples/sec   Loss 3.3083   LearningRate 0.0001   Epoch: 19   Global Step: 322230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:37:39,796-Speed 8717.98 samples/sec   Loss 3.3431   LearningRate 0.0001   Epoch: 19   Global Step: 322240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:37:40,876-Speed 9484.62 samples/sec   Loss 3.2441   LearningRate 0.0001   Epoch: 19   Global Step: 322250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:37:42,016-Speed 8987.80 samples/sec   Loss 3.3739   LearningRate 0.0001   Epoch: 19   Global Step: 322260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:37:43,183-Speed 8785.87 samples/sec   Loss 3.3148   LearningRate 0.0001   Epoch: 19   Global Step: 322270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:37:44,275-Speed 9382.99 samples/sec   Loss 3.3038   LearningRate 0.0001   Epoch: 19   Global Step: 322280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:37:45,372-Speed 9337.36 samples/sec   Loss 3.2580   LearningRate 0.0001   Epoch: 19   Global Step: 322290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:37:46,446-Speed 9541.52 samples/sec   Loss 3.2782   LearningRate 0.0001   Epoch: 19   Global Step: 322300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:37:47,553-Speed 9260.09 samples/sec   Loss 3.3089   LearningRate 0.0001   Epoch: 19   Global Step: 322310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:37:48,652-Speed 9319.38 samples/sec   Loss 3.3193   LearningRate 0.0001   Epoch: 19   Global Step: 322320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:37:49,726-Speed 9542.66 samples/sec   Loss 3.3051   LearningRate 0.0001   Epoch: 19   Global Step: 322330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:50,825-Speed 9323.77 samples/sec   Loss 3.2515   LearningRate 0.0001   Epoch: 19   Global Step: 322340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:51,938-Speed 9204.97 samples/sec   Loss 3.2403   LearningRate 0.0001   Epoch: 19   Global Step: 322350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:53,058-Speed 9146.79 samples/sec   Loss 3.1940   LearningRate 0.0001   Epoch: 19   Global Step: 322360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:54,225-Speed 8777.40 samples/sec   Loss 3.2729   LearningRate 0.0001   Epoch: 19   Global Step: 322370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:55,351-Speed 9105.84 samples/sec   Loss 3.3278   LearningRate 0.0001   Epoch: 19   Global Step: 322380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:56,440-Speed 9410.28 samples/sec   Loss 3.2850   LearningRate 0.0001   Epoch: 19   Global Step: 322390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:57,539-Speed 9322.92 samples/sec   Loss 3.2197   LearningRate 0.0001   Epoch: 19   Global Step: 322400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:58,657-Speed 9159.28 samples/sec   Loss 3.2937   LearningRate 0.0001   Epoch: 19   Global Step: 322410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:37:59,767-Speed 9238.41 samples/sec   Loss 3.2613   LearningRate 0.0001   Epoch: 19   Global Step: 322420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:00,858-Speed 9387.91 samples/sec   Loss 3.2712   LearningRate 0.0001   Epoch: 19   Global Step: 322430   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:38:01,965-Speed 9260.95 samples/sec   Loss 3.3508   LearningRate 0.0001   Epoch: 19   Global Step: 322440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:03,082-Speed 9172.97 samples/sec   Loss 3.2871   LearningRate 0.0001   Epoch: 19   Global Step: 322450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:04,213-Speed 9063.56 samples/sec   Loss 3.2644   LearningRate 0.0001   Epoch: 19   Global Step: 322460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:05,334-Speed 9136.85 samples/sec   Loss 3.2713   LearningRate 0.0001   Epoch: 19   Global Step: 322470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:06,436-Speed 9296.11 samples/sec   Loss 3.3064   LearningRate 0.0001   Epoch: 19   Global Step: 322480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:07,564-Speed 9086.45 samples/sec   Loss 3.2877   LearningRate 0.0001   Epoch: 19   Global Step: 322490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:08,633-Speed 9583.80 samples/sec   Loss 3.3341   LearningRate 0.0001   Epoch: 19   Global Step: 322500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:09,743-Speed 9223.15 samples/sec   Loss 3.2687   LearningRate 0.0001   Epoch: 19   Global Step: 322510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:10,866-Speed 9123.47 samples/sec   Loss 3.2817   LearningRate 0.0001   Epoch: 19   Global Step: 322520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:11,994-Speed 9082.89 samples/sec   Loss 3.3254   LearningRate 0.0001   Epoch: 19   Global Step: 322530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:13,069-Speed 9533.63 samples/sec   Loss 3.2454   LearningRate 0.0001   Epoch: 19   Global Step: 322540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:14,240-Speed 8752.38 samples/sec   Loss 3.3008   LearningRate 0.0001   Epoch: 19   Global Step: 322550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:15,389-Speed 8918.80 samples/sec   Loss 3.2223   LearningRate 0.0001   Epoch: 19   Global Step: 322560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:16,513-Speed 9115.74 samples/sec   Loss 3.2258   LearningRate 0.0001   Epoch: 19   Global Step: 322570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:17,611-Speed 9331.84 samples/sec   Loss 3.2718   LearningRate 0.0001   Epoch: 19   Global Step: 322580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:18,703-Speed 9382.17 samples/sec   Loss 3.3592   LearningRate 0.0001   Epoch: 19   Global Step: 322590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:19,808-Speed 9276.16 samples/sec   Loss 3.3263   LearningRate 0.0001   Epoch: 19   Global Step: 322600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:20,890-Speed 9469.27 samples/sec   Loss 3.3512   LearningRate 0.0001   Epoch: 19   Global Step: 322610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:21,992-Speed 9295.24 samples/sec   Loss 3.3564   LearningRate 0.0001   Epoch: 19   Global Step: 322620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:23,085-Speed 9376.02 samples/sec   Loss 3.3507   LearningRate 0.0001   Epoch: 19   Global Step: 322630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:24,170-Speed 9447.09 samples/sec   Loss 3.2478   LearningRate 0.0001   Epoch: 19   Global Step: 322640   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:38:25,245-Speed 9531.78 samples/sec   Loss 3.2460   LearningRate 0.0001   Epoch: 19   Global Step: 322650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:26,347-Speed 9297.70 samples/sec   Loss 3.2836   LearningRate 0.0001   Epoch: 19   Global Step: 322660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:27,479-Speed 9049.50 samples/sec   Loss 3.2502   LearningRate 0.0001   Epoch: 19   Global Step: 322670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:28,604-Speed 9105.57 samples/sec   Loss 3.2823   LearningRate 0.0001   Epoch: 19   Global Step: 322680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:29,751-Speed 8935.99 samples/sec   Loss 3.2164   LearningRate 0.0001   Epoch: 19   Global Step: 322690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:30,888-Speed 9013.43 samples/sec   Loss 3.2961   LearningRate 0.0001   Epoch: 19   Global Step: 322700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:32,014-Speed 9100.25 samples/sec   Loss 3.2764   LearningRate 0.0001   Epoch: 19   Global Step: 322710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:33,164-Speed 8912.77 samples/sec   Loss 3.2993   LearningRate 0.0001   Epoch: 19   Global Step: 322720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:34,348-Speed 8654.45 samples/sec   Loss 3.3004   LearningRate 0.0001   Epoch: 19   Global Step: 322730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:35,494-Speed 8941.71 samples/sec   Loss 3.3582   LearningRate 0.0001   Epoch: 19   Global Step: 322740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:36,619-Speed 9105.67 samples/sec   Loss 3.2950   LearningRate 0.0001   Epoch: 19   Global Step: 322750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:37,756-Speed 9013.11 samples/sec   Loss 3.3308   LearningRate 0.0001   Epoch: 19   Global Step: 322760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:38,899-Speed 8964.04 samples/sec   Loss 3.3376   LearningRate 0.0001   Epoch: 19   Global Step: 322770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:40,024-Speed 9111.34 samples/sec   Loss 3.2495   LearningRate 0.0001   Epoch: 19   Global Step: 322780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:41,113-Speed 9406.54 samples/sec   Loss 3.3258   LearningRate 0.0001   Epoch: 19   Global Step: 322790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:42,259-Speed 8937.79 samples/sec   Loss 3.2226   LearningRate 0.0001   Epoch: 19   Global Step: 322800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:43,380-Speed 9139.66 samples/sec   Loss 3.2727   LearningRate 0.0001   Epoch: 19   Global Step: 322810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:44,490-Speed 9233.82 samples/sec   Loss 3.3897   LearningRate 0.0001   Epoch: 19   Global Step: 322820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:45,637-Speed 8936.53 samples/sec   Loss 3.3401   LearningRate 0.0001   Epoch: 19   Global Step: 322830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:46,763-Speed 9098.03 samples/sec   Loss 3.3494   LearningRate 0.0001   Epoch: 19   Global Step: 322840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:47,896-Speed 9039.12 samples/sec   Loss 3.3083   LearningRate 0.0001   Epoch: 19   Global Step: 322850   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:38:49,025-Speed 9077.55 samples/sec   Loss 3.2341   LearningRate 0.0001   Epoch: 19   Global Step: 322860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:50,170-Speed 8953.84 samples/sec   Loss 3.3733   LearningRate 0.0001   Epoch: 19   Global Step: 322870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:51,363-Speed 8587.82 samples/sec   Loss 3.2757   LearningRate 0.0001   Epoch: 19   Global Step: 322880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:52,453-Speed 9395.98 samples/sec   Loss 3.3076   LearningRate 0.0001   Epoch: 19   Global Step: 322890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:53,568-Speed 9191.02 samples/sec   Loss 3.2984   LearningRate 0.0001   Epoch: 19   Global Step: 322900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:54,701-Speed 9039.24 samples/sec   Loss 3.2536   LearningRate 0.0001   Epoch: 19   Global Step: 322910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:55,815-Speed 9205.17 samples/sec   Loss 3.2544   LearningRate 0.0001   Epoch: 19   Global Step: 322920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:56,951-Speed 9025.57 samples/sec   Loss 3.2776   LearningRate 0.0001   Epoch: 19   Global Step: 322930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:58,102-Speed 8901.88 samples/sec   Loss 3.3080   LearningRate 0.0001   Epoch: 19   Global Step: 322940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:38:59,246-Speed 8954.61 samples/sec   Loss 3.2566   LearningRate 0.0001   Epoch: 19   Global Step: 322950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:00,388-Speed 8977.50 samples/sec   Loss 3.3219   LearningRate 0.0001   Epoch: 19   Global Step: 322960   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:39:01,499-Speed 9216.12 samples/sec   Loss 3.3083   LearningRate 0.0001   Epoch: 19   Global Step: 322970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:02,650-Speed 8902.90 samples/sec   Loss 3.3182   LearningRate 0.0001   Epoch: 19   Global Step: 322980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:03,780-Speed 9071.90 samples/sec   Loss 3.2753   LearningRate 0.0001   Epoch: 19   Global Step: 322990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:04,917-Speed 9008.38 samples/sec   Loss 3.3985   LearningRate 0.0001   Epoch: 19   Global Step: 323000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:06,040-Speed 9124.28 samples/sec   Loss 3.2556   LearningRate 0.0001   Epoch: 19   Global Step: 323010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:07,176-Speed 9023.03 samples/sec   Loss 3.2370   LearningRate 0.0001   Epoch: 19   Global Step: 323020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:08,297-Speed 9133.27 samples/sec   Loss 3.2030   LearningRate 0.0001   Epoch: 19   Global Step: 323030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:09,465-Speed 8777.86 samples/sec   Loss 3.2931   LearningRate 0.0001   Epoch: 19   Global Step: 323040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:10,588-Speed 9123.03 samples/sec   Loss 3.3000   LearningRate 0.0001   Epoch: 19   Global Step: 323050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:11,703-Speed 9184.56 samples/sec   Loss 3.2391   LearningRate 0.0001   Epoch: 19   Global Step: 323060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:12,841-Speed 9004.27 samples/sec   Loss 3.1880   LearningRate 0.0001   Epoch: 19   Global Step: 323070   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:39:13,948-Speed 9257.53 samples/sec   Loss 3.3273   LearningRate 0.0001   Epoch: 19   Global Step: 323080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:15,048-Speed 9320.57 samples/sec   Loss 3.3016   LearningRate 0.0001   Epoch: 19   Global Step: 323090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:16,115-Speed 9598.55 samples/sec   Loss 3.2375   LearningRate 0.0001   Epoch: 19   Global Step: 323100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:17,221-Speed 9268.06 samples/sec   Loss 3.3375   LearningRate 0.0001   Epoch: 19   Global Step: 323110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:18,300-Speed 9493.43 samples/sec   Loss 3.2770   LearningRate 0.0001   Epoch: 19   Global Step: 323120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:19,468-Speed 8769.75 samples/sec   Loss 3.3205   LearningRate 0.0001   Epoch: 19   Global Step: 323130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:20,623-Speed 8873.20 samples/sec   Loss 3.2254   LearningRate 0.0001   Epoch: 19   Global Step: 323140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:21,732-Speed 9236.40 samples/sec   Loss 3.3172   LearningRate 0.0001   Epoch: 19   Global Step: 323150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:22,863-Speed 9058.35 samples/sec   Loss 3.3429   LearningRate 0.0001   Epoch: 19   Global Step: 323160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:24,033-Speed 8762.35 samples/sec   Loss 3.3191   LearningRate 0.0001   Epoch: 19   Global Step: 323170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:25,109-Speed 9520.77 samples/sec   Loss 3.2763   LearningRate 0.0001   Epoch: 19   Global Step: 323180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:26,254-Speed 8947.09 samples/sec   Loss 3.3762   LearningRate 0.0001   Epoch: 19   Global Step: 323190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:27,400-Speed 8943.36 samples/sec   Loss 3.2274   LearningRate 0.0001   Epoch: 19   Global Step: 323200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:28,531-Speed 9060.52 samples/sec   Loss 3.2605   LearningRate 0.0001   Epoch: 19   Global Step: 323210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:29,682-Speed 8894.92 samples/sec   Loss 3.2897   LearningRate 0.0001   Epoch: 19   Global Step: 323220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:30,795-Speed 9207.43 samples/sec   Loss 3.2695   LearningRate 0.0001   Epoch: 19   Global Step: 323230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:31,931-Speed 9020.79 samples/sec   Loss 3.3754   LearningRate 0.0001   Epoch: 19   Global Step: 323240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:33,006-Speed 9529.63 samples/sec   Loss 3.2612   LearningRate 0.0001   Epoch: 19   Global Step: 323250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:34,123-Speed 9175.04 samples/sec   Loss 3.3196   LearningRate 0.0001   Epoch: 19   Global Step: 323260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:35,241-Speed 9167.16 samples/sec   Loss 3.3419   LearningRate 0.0001   Epoch: 19   Global Step: 323270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:36,369-Speed 9083.38 samples/sec   Loss 3.2263   LearningRate 0.0001   Epoch: 19   Global Step: 323280   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:39:37,484-Speed 9185.52 samples/sec   Loss 3.3299   LearningRate 0.0001   Epoch: 19   Global Step: 323290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:38,593-Speed 9242.13 samples/sec   Loss 3.2694   LearningRate 0.0001   Epoch: 19   Global Step: 323300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:39,737-Speed 8952.73 samples/sec   Loss 3.3088   LearningRate 0.0001   Epoch: 19   Global Step: 323310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:40,859-Speed 9136.95 samples/sec   Loss 3.3146   LearningRate 0.0001   Epoch: 19   Global Step: 323320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:41,997-Speed 9000.92 samples/sec   Loss 3.2748   LearningRate 0.0001   Epoch: 19   Global Step: 323330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:43,142-Speed 8948.05 samples/sec   Loss 3.2579   LearningRate 0.0001   Epoch: 19   Global Step: 323340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:44,321-Speed 8689.49 samples/sec   Loss 3.2847   LearningRate 0.0001   Epoch: 19   Global Step: 323350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:45,402-Speed 9482.64 samples/sec   Loss 3.2723   LearningRate 0.0001   Epoch: 19   Global Step: 323360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:46,534-Speed 9049.45 samples/sec   Loss 3.2892   LearningRate 0.0001   Epoch: 19   Global Step: 323370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:47,660-Speed 9102.04 samples/sec   Loss 3.2944   LearningRate 0.0001   Epoch: 19   Global Step: 323380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:48,809-Speed 8920.59 samples/sec   Loss 3.2737   LearningRate 0.0001   Epoch: 19   Global Step: 323390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:49,959-Speed 8908.96 samples/sec   Loss 3.2614   LearningRate 0.0001   Epoch: 19   Global Step: 323400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:51,095-Speed 9017.99 samples/sec   Loss 3.2783   LearningRate 0.0001   Epoch: 19   Global Step: 323410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:52,222-Speed 9087.27 samples/sec   Loss 3.3263   LearningRate 0.0001   Epoch: 19   Global Step: 323420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:53,367-Speed 8953.89 samples/sec   Loss 3.2363   LearningRate 0.0001   Epoch: 19   Global Step: 323430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:54,487-Speed 9145.68 samples/sec   Loss 3.2769   LearningRate 0.0001   Epoch: 19   Global Step: 323440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:55,583-Speed 9355.14 samples/sec   Loss 3.2200   LearningRate 0.0001   Epoch: 19   Global Step: 323450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:56,706-Speed 9120.82 samples/sec   Loss 3.2876   LearningRate 0.0001   Epoch: 19   Global Step: 323460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:57,806-Speed 9308.45 samples/sec   Loss 3.3051   LearningRate 0.0001   Epoch: 19   Global Step: 323470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:39:58,931-Speed 9112.01 samples/sec   Loss 3.2631   LearningRate 0.0001   Epoch: 19   Global Step: 323480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:00,007-Speed 9518.41 samples/sec   Loss 3.3065   LearningRate 0.0001   Epoch: 19   Global Step: 323490   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:40:01,136-Speed 9075.41 samples/sec   Loss 3.3229   LearningRate 0.0001   Epoch: 19   Global Step: 323500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:02,266-Speed 9069.87 samples/sec   Loss 3.3119   LearningRate 0.0001   Epoch: 19   Global Step: 323510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:03,399-Speed 9046.24 samples/sec   Loss 3.2431   LearningRate 0.0001   Epoch: 19   Global Step: 323520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:04,496-Speed 9339.71 samples/sec   Loss 3.2991   LearningRate 0.0001   Epoch: 19   Global Step: 323530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:05,656-Speed 8829.81 samples/sec   Loss 3.2884   LearningRate 0.0001   Epoch: 19   Global Step: 323540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:06,749-Speed 9372.91 samples/sec   Loss 3.2963   LearningRate 0.0001   Epoch: 19   Global Step: 323550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:07,885-Speed 9018.22 samples/sec   Loss 3.2396   LearningRate 0.0001   Epoch: 19   Global Step: 323560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:09,009-Speed 9120.12 samples/sec   Loss 3.3066   LearningRate 0.0001   Epoch: 19   Global Step: 323570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:10,154-Speed 8947.05 samples/sec   Loss 3.3182   LearningRate 0.0001   Epoch: 19   Global Step: 323580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:11,307-Speed 8885.37 samples/sec   Loss 3.2386   LearningRate 0.0001   Epoch: 19   Global Step: 323590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:12,425-Speed 9168.31 samples/sec   Loss 3.2685   LearningRate 0.0001   Epoch: 19   Global Step: 323600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:13,520-Speed 9353.34 samples/sec   Loss 3.2808   LearningRate 0.0001   Epoch: 19   Global Step: 323610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:14,652-Speed 9066.86 samples/sec   Loss 3.3221   LearningRate 0.0001   Epoch: 19   Global Step: 323620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:15,811-Speed 8834.75 samples/sec   Loss 3.3542   LearningRate 0.0001   Epoch: 19   Global Step: 323630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:16,943-Speed 9053.10 samples/sec   Loss 3.3109   LearningRate 0.0001   Epoch: 19   Global Step: 323640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:18,055-Speed 9212.05 samples/sec   Loss 3.3699   LearningRate 0.0001   Epoch: 19   Global Step: 323650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:19,159-Speed 9280.90 samples/sec   Loss 3.2407   LearningRate 0.0001   Epoch: 19   Global Step: 323660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:20,243-Speed 9455.52 samples/sec   Loss 3.3151   LearningRate 0.0001   Epoch: 19   Global Step: 323670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:21,323-Speed 9488.30 samples/sec   Loss 3.3191   LearningRate 0.0001   Epoch: 19   Global Step: 323680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:22,442-Speed 9151.97 samples/sec   Loss 3.2816   LearningRate 0.0001   Epoch: 19   Global Step: 323690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:23,577-Speed 9029.33 samples/sec   Loss 3.3001   LearningRate 0.0001   Epoch: 19   Global Step: 323700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:24,647-Speed 9578.92 samples/sec   Loss 3.2657   LearningRate 0.0001   Epoch: 19   Global Step: 323710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:25,785-Speed 9008.70 samples/sec   Loss 3.3152   LearningRate 0.0001   Epoch: 19   Global Step: 323720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:26,911-Speed 9098.88 samples/sec   Loss 3.2771   LearningRate 0.0001   Epoch: 19   Global Step: 323730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:28,054-Speed 8961.22 samples/sec   Loss 3.3798   LearningRate 0.0001   Epoch: 19   Global Step: 323740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:29,164-Speed 9226.12 samples/sec   Loss 3.2792   LearningRate 0.0001   Epoch: 19   Global Step: 323750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:30,300-Speed 9027.89 samples/sec   Loss 3.2598   LearningRate 0.0001   Epoch: 19   Global Step: 323760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:31,413-Speed 9204.31 samples/sec   Loss 3.2004   LearningRate 0.0001   Epoch: 19   Global Step: 323770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:32,548-Speed 9031.01 samples/sec   Loss 3.2795   LearningRate 0.0001   Epoch: 19   Global Step: 323780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:33,689-Speed 8979.58 samples/sec   Loss 3.2156   LearningRate 0.0001   Epoch: 19   Global Step: 323790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:34,808-Speed 9150.76 samples/sec   Loss 3.3054   LearningRate 0.0001   Epoch: 19   Global Step: 323800   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:40:35,921-Speed 9203.84 samples/sec   Loss 3.2863   LearningRate 0.0001   Epoch: 19   Global Step: 323810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:37,019-Speed 9337.76 samples/sec   Loss 3.2517   LearningRate 0.0001   Epoch: 19   Global Step: 323820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:38,122-Speed 9285.80 samples/sec   Loss 3.3516   LearningRate 0.0001   Epoch: 19   Global Step: 323830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:39,255-Speed 9045.94 samples/sec   Loss 3.2803   LearningRate 0.0001   Epoch: 19   Global Step: 323840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:40,364-Speed 9236.78 samples/sec   Loss 3.3275   LearningRate 0.0001   Epoch: 19   Global Step: 323850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:41,472-Speed 9246.64 samples/sec   Loss 3.2818   LearningRate 0.0001   Epoch: 19   Global Step: 323860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:42,575-Speed 9287.34 samples/sec   Loss 3.3227   LearningRate 0.0001   Epoch: 19   Global Step: 323870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:43,685-Speed 9227.83 samples/sec   Loss 3.3007   LearningRate 0.0001   Epoch: 19   Global Step: 323880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:44,775-Speed 9410.86 samples/sec   Loss 3.3180   LearningRate 0.0001   Epoch: 19   Global Step: 323890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:45,896-Speed 9136.26 samples/sec   Loss 3.2101   LearningRate 0.0001   Epoch: 19   Global Step: 323900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:46,965-Speed 9582.20 samples/sec   Loss 3.3143   LearningRate 0.0001   Epoch: 19   Global Step: 323910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:48,105-Speed 8992.95 samples/sec   Loss 3.3258   LearningRate 0.0001   Epoch: 19   Global Step: 323920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:49,229-Speed 9113.49 samples/sec   Loss 3.2313   LearningRate 0.0001   Epoch: 19   Global Step: 323930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:50,364-Speed 9026.03 samples/sec   Loss 3.3142   LearningRate 0.0001   Epoch: 19   Global Step: 323940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:51,470-Speed 9268.19 samples/sec   Loss 3.2604   LearningRate 0.0001   Epoch: 19   Global Step: 323950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:52,583-Speed 9205.82 samples/sec   Loss 3.3480   LearningRate 0.0001   Epoch: 19   Global Step: 323960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:53,710-Speed 9091.36 samples/sec   Loss 3.3315   LearningRate 0.0001   Epoch: 19   Global Step: 323970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:54,792-Speed 9464.72 samples/sec   Loss 3.1542   LearningRate 0.0001   Epoch: 19   Global Step: 323980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:55,886-Speed 9371.81 samples/sec   Loss 3.3075   LearningRate 0.0001   Epoch: 19   Global Step: 323990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:40:56,980-Speed 9363.93 samples/sec   Loss 3.2660   LearningRate 0.0001   Epoch: 19   Global Step: 324000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:41:19,091-[lfw][324000]XNorm: 6.549769
Training: 2022-04-12 00:41:19,092-[lfw][324000]Accuracy-Flip: 0.99700+-0.00287
Training: 2022-04-12 00:41:19,092-[lfw][324000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:41:44,565-[cfp_fp][324000]XNorm: 5.724918
Training: 2022-04-12 00:41:44,566-[cfp_fp][324000]Accuracy-Flip: 0.97543+-0.00823
Training: 2022-04-12 00:41:44,566-[cfp_fp][324000]Accuracy-Highest: 0.97543
Training: 2022-04-12 00:42:06,544-[agedb_30][324000]XNorm: 6.387645
Training: 2022-04-12 00:42:06,545-[agedb_30][324000]Accuracy-Flip: 0.97383+-0.00827
Training: 2022-04-12 00:42:06,545-[agedb_30][324000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:42:07,639-Speed 144.92 samples/sec   Loss 3.3260   LearningRate 0.0001   Epoch: 19   Global Step: 324010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:08,739-Speed 9318.14 samples/sec   Loss 3.2844   LearningRate 0.0001   Epoch: 19   Global Step: 324020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:09,890-Speed 8899.50 samples/sec   Loss 3.3204   LearningRate 0.0001   Epoch: 19   Global Step: 324030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:11,035-Speed 8950.57 samples/sec   Loss 3.3096   LearningRate 0.0001   Epoch: 19   Global Step: 324040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:12,124-Speed 9403.75 samples/sec   Loss 3.2987   LearningRate 0.0001   Epoch: 19   Global Step: 324050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:13,260-Speed 9021.60 samples/sec   Loss 3.3154   LearningRate 0.0001   Epoch: 19   Global Step: 324060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:14,449-Speed 8617.30 samples/sec   Loss 3.2771   LearningRate 0.0001   Epoch: 19   Global Step: 324070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:15,538-Speed 9408.87 samples/sec   Loss 3.3463   LearningRate 0.0001   Epoch: 19   Global Step: 324080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:16,692-Speed 8878.86 samples/sec   Loss 3.2888   LearningRate 0.0001   Epoch: 19   Global Step: 324090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:17,831-Speed 8998.08 samples/sec   Loss 3.3399   LearningRate 0.0001   Epoch: 19   Global Step: 324100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:18,964-Speed 9039.69 samples/sec   Loss 3.2761   LearningRate 0.0001   Epoch: 19   Global Step: 324110   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:42:20,072-Speed 9251.21 samples/sec   Loss 3.3122   LearningRate 0.0001   Epoch: 19   Global Step: 324120   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:42:21,199-Speed 9087.59 samples/sec   Loss 3.2988   LearningRate 0.0001   Epoch: 19   Global Step: 324130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:22,296-Speed 9338.36 samples/sec   Loss 3.2481   LearningRate 0.0001   Epoch: 19   Global Step: 324140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:23,442-Speed 8948.29 samples/sec   Loss 3.3271   LearningRate 0.0001   Epoch: 19   Global Step: 324150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:24,595-Speed 8889.20 samples/sec   Loss 3.3862   LearningRate 0.0001   Epoch: 19   Global Step: 324160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:25,700-Speed 9272.35 samples/sec   Loss 3.2853   LearningRate 0.0001   Epoch: 19   Global Step: 324170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:26,793-Speed 9379.10 samples/sec   Loss 3.3132   LearningRate 0.0001   Epoch: 19   Global Step: 324180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:27,896-Speed 9285.40 samples/sec   Loss 3.2689   LearningRate 0.0001   Epoch: 19   Global Step: 324190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:28,998-Speed 9294.30 samples/sec   Loss 3.3350   LearningRate 0.0001   Epoch: 19   Global Step: 324200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:30,072-Speed 9541.52 samples/sec   Loss 3.2631   LearningRate 0.0001   Epoch: 19   Global Step: 324210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:31,180-Speed 9254.70 samples/sec   Loss 3.3726   LearningRate 0.0001   Epoch: 19   Global Step: 324220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:32,292-Speed 9211.21 samples/sec   Loss 3.2960   LearningRate 0.0001   Epoch: 19   Global Step: 324230   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:42:33,393-Speed 9309.15 samples/sec   Loss 3.2590   LearningRate 0.0001   Epoch: 19   Global Step: 324240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:34,463-Speed 9572.94 samples/sec   Loss 3.2995   LearningRate 0.0001   Epoch: 19   Global Step: 324250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:35,566-Speed 9285.90 samples/sec   Loss 3.2970   LearningRate 0.0001   Epoch: 19   Global Step: 324260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:36,697-Speed 9063.08 samples/sec   Loss 3.3211   LearningRate 0.0001   Epoch: 19   Global Step: 324270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:37,860-Speed 8814.52 samples/sec   Loss 3.3223   LearningRate 0.0001   Epoch: 19   Global Step: 324280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:38,991-Speed 9054.62 samples/sec   Loss 3.2754   LearningRate 0.0001   Epoch: 19   Global Step: 324290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:40,142-Speed 8900.61 samples/sec   Loss 3.2611   LearningRate 0.0001   Epoch: 19   Global Step: 324300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:41,297-Speed 8873.33 samples/sec   Loss 3.3245   LearningRate 0.0001   Epoch: 19   Global Step: 324310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:42,403-Speed 9265.83 samples/sec   Loss 3.2835   LearningRate 0.0001   Epoch: 19   Global Step: 324320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:43,526-Speed 9122.05 samples/sec   Loss 3.4040   LearningRate 0.0001   Epoch: 19   Global Step: 324330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:44,646-Speed 9148.13 samples/sec   Loss 3.2292   LearningRate 0.0001   Epoch: 19   Global Step: 324340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:45,737-Speed 9395.14 samples/sec   Loss 3.3474   LearningRate 0.0001   Epoch: 19   Global Step: 324350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:46,839-Speed 9297.66 samples/sec   Loss 3.2206   LearningRate 0.0001   Epoch: 19   Global Step: 324360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:42:47,963-Speed 9112.86 samples/sec   Loss 3.3201   LearningRate 0.0001   Epoch: 19   Global Step: 324370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:42:49,027-Speed 9628.82 samples/sec   Loss 3.3366   LearningRate 0.0001   Epoch: 19   Global Step: 324380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:42:50,140-Speed 9209.49 samples/sec   Loss 3.3273   LearningRate 0.0001   Epoch: 19   Global Step: 324390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:42:51,264-Speed 9116.80 samples/sec   Loss 3.2057   LearningRate 0.0001   Epoch: 19   Global Step: 324400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:42:52,394-Speed 9062.70 samples/sec   Loss 3.2384   LearningRate 0.0001   Epoch: 19   Global Step: 324410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:42:53,541-Speed 8936.82 samples/sec   Loss 3.2283   LearningRate 0.0001   Epoch: 19   Global Step: 324420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:42:54,661-Speed 9145.62 samples/sec   Loss 3.3702   LearningRate 0.0001   Epoch: 19   Global Step: 324430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:42:55,783-Speed 9130.39 samples/sec   Loss 3.2481   LearningRate 0.0001   Epoch: 19   Global Step: 324440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:42:56,930-Speed 8937.74 samples/sec   Loss 3.2504   LearningRate 0.0001   Epoch: 19   Global Step: 324450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:42:58,080-Speed 8909.63 samples/sec   Loss 3.4196   LearningRate 0.0001   Epoch: 19   Global Step: 324460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:42:59,209-Speed 9072.35 samples/sec   Loss 3.2646   LearningRate 0.0001   Epoch: 19   Global Step: 324470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:00,311-Speed 9295.99 samples/sec   Loss 3.3151   LearningRate 0.0001   Epoch: 19   Global Step: 324480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:01,409-Speed 9339.27 samples/sec   Loss 3.3683   LearningRate 0.0001   Epoch: 19   Global Step: 324490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:02,513-Speed 9286.06 samples/sec   Loss 3.2714   LearningRate 0.0001   Epoch: 19   Global Step: 324500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:03,618-Speed 9271.72 samples/sec   Loss 3.2404   LearningRate 0.0001   Epoch: 19   Global Step: 324510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:04,762-Speed 8955.00 samples/sec   Loss 3.3727   LearningRate 0.0001   Epoch: 19   Global Step: 324520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:05,872-Speed 9225.70 samples/sec   Loss 3.2122   LearningRate 0.0001   Epoch: 19   Global Step: 324530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:06,979-Speed 9256.40 samples/sec   Loss 3.3018   LearningRate 0.0001   Epoch: 19   Global Step: 324540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:08,138-Speed 8840.16 samples/sec   Loss 3.2332   LearningRate 0.0001   Epoch: 19   Global Step: 324550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:09,262-Speed 9115.57 samples/sec   Loss 3.2508   LearningRate 0.0001   Epoch: 19   Global Step: 324560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:10,371-Speed 9242.18 samples/sec   Loss 3.2952   LearningRate 0.0001   Epoch: 19   Global Step: 324570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:11,435-Speed 9626.41 samples/sec   Loss 3.3079   LearningRate 0.0001   Epoch: 19   Global Step: 324580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:12,522-Speed 9427.57 samples/sec   Loss 3.2657   LearningRate 0.0001   Epoch: 19   Global Step: 324590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:13,597-Speed 9525.46 samples/sec   Loss 3.3018   LearningRate 0.0001   Epoch: 19   Global Step: 324600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:14,726-Speed 9075.23 samples/sec   Loss 3.4165   LearningRate 0.0001   Epoch: 19   Global Step: 324610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:15,881-Speed 8878.79 samples/sec   Loss 3.4163   LearningRate 0.0001   Epoch: 19   Global Step: 324620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:16,971-Speed 9393.99 samples/sec   Loss 3.2949   LearningRate 0.0001   Epoch: 19   Global Step: 324630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:18,104-Speed 9045.59 samples/sec   Loss 3.2564   LearningRate 0.0001   Epoch: 19   Global Step: 324640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:19,187-Speed 9458.27 samples/sec   Loss 3.2261   LearningRate 0.0001   Epoch: 19   Global Step: 324650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:20,302-Speed 9196.06 samples/sec   Loss 3.3138   LearningRate 0.0001   Epoch: 19   Global Step: 324660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:21,420-Speed 9168.69 samples/sec   Loss 3.2774   LearningRate 0.0001   Epoch: 19   Global Step: 324670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:22,537-Speed 9174.83 samples/sec   Loss 3.3036   LearningRate 0.0001   Epoch: 19   Global Step: 324680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:23,610-Speed 9542.47 samples/sec   Loss 3.3418   LearningRate 0.0001   Epoch: 19   Global Step: 324690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:24,748-Speed 9004.54 samples/sec   Loss 3.3089   LearningRate 0.0001   Epoch: 19   Global Step: 324700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:25,845-Speed 9343.24 samples/sec   Loss 3.2599   LearningRate 0.0001   Epoch: 19   Global Step: 324710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:26,952-Speed 9253.26 samples/sec   Loss 3.3375   LearningRate 0.0001   Epoch: 19   Global Step: 324720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:28,137-Speed 8646.00 samples/sec   Loss 3.2895   LearningRate 0.0001   Epoch: 19   Global Step: 324730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:29,279-Speed 8970.01 samples/sec   Loss 3.2916   LearningRate 0.0001   Epoch: 19   Global Step: 324740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:30,378-Speed 9324.70 samples/sec   Loss 3.3360   LearningRate 0.0001   Epoch: 19   Global Step: 324750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:31,500-Speed 9137.29 samples/sec   Loss 3.3169   LearningRate 0.0001   Epoch: 19   Global Step: 324760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:32,645-Speed 8954.02 samples/sec   Loss 3.3042   LearningRate 0.0001   Epoch: 19   Global Step: 324770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:33,761-Speed 9176.18 samples/sec   Loss 3.2066   LearningRate 0.0001   Epoch: 19   Global Step: 324780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:34,873-Speed 9220.01 samples/sec   Loss 3.2321   LearningRate 0.0001   Epoch: 19   Global Step: 324790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:36,013-Speed 8986.61 samples/sec   Loss 3.3691   LearningRate 0.0001   Epoch: 19   Global Step: 324800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:37,124-Speed 9221.36 samples/sec   Loss 3.3168   LearningRate 0.0001   Epoch: 19   Global Step: 324810   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:43:38,225-Speed 9312.55 samples/sec   Loss 3.2723   LearningRate 0.0001   Epoch: 19   Global Step: 324820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:39,321-Speed 9343.60 samples/sec   Loss 3.2173   LearningRate 0.0001   Epoch: 19   Global Step: 324830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:40,404-Speed 9459.68 samples/sec   Loss 3.3357   LearningRate 0.0001   Epoch: 19   Global Step: 324840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:41,496-Speed 9384.70 samples/sec   Loss 3.3326   LearningRate 0.0001   Epoch: 19   Global Step: 324850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:42,635-Speed 8993.19 samples/sec   Loss 3.2834   LearningRate 0.0001   Epoch: 19   Global Step: 324860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:43,730-Speed 9359.33 samples/sec   Loss 3.2778   LearningRate 0.0001   Epoch: 19   Global Step: 324870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:44,848-Speed 9166.12 samples/sec   Loss 3.3209   LearningRate 0.0001   Epoch: 19   Global Step: 324880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:45,940-Speed 9377.22 samples/sec   Loss 3.2272   LearningRate 0.0001   Epoch: 19   Global Step: 324890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:47,073-Speed 9050.50 samples/sec   Loss 3.3483   LearningRate 0.0001   Epoch: 19   Global Step: 324900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:48,200-Speed 9084.45 samples/sec   Loss 3.3614   LearningRate 0.0001   Epoch: 19   Global Step: 324910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:49,293-Speed 9379.38 samples/sec   Loss 3.3065   LearningRate 0.0001   Epoch: 19   Global Step: 324920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:50,415-Speed 9132.21 samples/sec   Loss 3.3242   LearningRate 0.0001   Epoch: 19   Global Step: 324930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:51,565-Speed 8909.64 samples/sec   Loss 3.2792   LearningRate 0.0001   Epoch: 19   Global Step: 324940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:52,695-Speed 9075.37 samples/sec   Loss 3.3032   LearningRate 0.0001   Epoch: 19   Global Step: 324950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:53,794-Speed 9315.81 samples/sec   Loss 3.3133   LearningRate 0.0001   Epoch: 19   Global Step: 324960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:54,944-Speed 8916.61 samples/sec   Loss 3.3177   LearningRate 0.0001   Epoch: 19   Global Step: 324970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:56,037-Speed 9366.24 samples/sec   Loss 3.2848   LearningRate 0.0001   Epoch: 19   Global Step: 324980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:43:57,118-Speed 9482.57 samples/sec   Loss 3.3160   LearningRate 0.0001   Epoch: 19   Global Step: 324990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:58,263-Speed 8948.59 samples/sec   Loss 3.3187   LearningRate 0.0001   Epoch: 19   Global Step: 325000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:43:59,398-Speed 9026.99 samples/sec   Loss 3.1996   LearningRate 0.0001   Epoch: 19   Global Step: 325010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:00,501-Speed 9285.73 samples/sec   Loss 3.3523   LearningRate 0.0001   Epoch: 19   Global Step: 325020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:01,569-Speed 9601.43 samples/sec   Loss 3.3090   LearningRate 0.0001   Epoch: 19   Global Step: 325030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:02,672-Speed 9281.61 samples/sec   Loss 3.2754   LearningRate 0.0001   Epoch: 19   Global Step: 325040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:03,791-Speed 9162.49 samples/sec   Loss 3.3418   LearningRate 0.0001   Epoch: 19   Global Step: 325050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:04,921-Speed 9062.01 samples/sec   Loss 3.3341   LearningRate 0.0001   Epoch: 19   Global Step: 325060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:06,056-Speed 9033.39 samples/sec   Loss 3.3181   LearningRate 0.0001   Epoch: 19   Global Step: 325070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:07,188-Speed 9049.81 samples/sec   Loss 3.3979   LearningRate 0.0001   Epoch: 19   Global Step: 325080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:08,309-Speed 9141.98 samples/sec   Loss 3.2821   LearningRate 0.0001   Epoch: 19   Global Step: 325090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:09,429-Speed 9144.44 samples/sec   Loss 3.2372   LearningRate 0.0001   Epoch: 19   Global Step: 325100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:10,546-Speed 9175.62 samples/sec   Loss 3.2500   LearningRate 0.0001   Epoch: 19   Global Step: 325110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:11,661-Speed 9194.22 samples/sec   Loss 3.3262   LearningRate 0.0001   Epoch: 19   Global Step: 325120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:12,778-Speed 9171.97 samples/sec   Loss 3.3195   LearningRate 0.0001   Epoch: 19   Global Step: 325130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:13,880-Speed 9297.76 samples/sec   Loss 3.3119   LearningRate 0.0001   Epoch: 19   Global Step: 325140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:14,989-Speed 9234.76 samples/sec   Loss 3.2602   LearningRate 0.0001   Epoch: 19   Global Step: 325150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:16,110-Speed 9143.98 samples/sec   Loss 3.2814   LearningRate 0.0001   Epoch: 19   Global Step: 325160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:17,252-Speed 8967.13 samples/sec   Loss 3.2937   LearningRate 0.0001   Epoch: 19   Global Step: 325170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:18,395-Speed 8964.86 samples/sec   Loss 3.2198   LearningRate 0.0001   Epoch: 19   Global Step: 325180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:19,514-Speed 9154.06 samples/sec   Loss 3.3291   LearningRate 0.0001   Epoch: 19   Global Step: 325190   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:44:20,576-Speed 9651.93 samples/sec   Loss 3.3247   LearningRate 0.0001   Epoch: 19   Global Step: 325200   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:44:21,693-Speed 9172.37 samples/sec   Loss 3.2661   LearningRate 0.0001   Epoch: 19   Global Step: 325210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:22,793-Speed 9317.10 samples/sec   Loss 3.3444   LearningRate 0.0001   Epoch: 19   Global Step: 325220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:23,911-Speed 9161.75 samples/sec   Loss 3.3288   LearningRate 0.0001   Epoch: 19   Global Step: 325230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:25,075-Speed 8801.32 samples/sec   Loss 3.2691   LearningRate 0.0001   Epoch: 19   Global Step: 325240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:26,202-Speed 9091.70 samples/sec   Loss 3.3175   LearningRate 0.0001   Epoch: 19   Global Step: 325250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:27,320-Speed 9162.49 samples/sec   Loss 3.3212   LearningRate 0.0001   Epoch: 19   Global Step: 325260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:28,410-Speed 9403.68 samples/sec   Loss 3.2981   LearningRate 0.0001   Epoch: 19   Global Step: 325270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:29,507-Speed 9337.51 samples/sec   Loss 3.3124   LearningRate 0.0001   Epoch: 19   Global Step: 325280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:30,613-Speed 9270.13 samples/sec   Loss 3.2639   LearningRate 0.0001   Epoch: 19   Global Step: 325290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:31,751-Speed 9001.94 samples/sec   Loss 3.3000   LearningRate 0.0001   Epoch: 19   Global Step: 325300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:32,866-Speed 9193.22 samples/sec   Loss 3.3427   LearningRate 0.0001   Epoch: 19   Global Step: 325310   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:44:34,003-Speed 9008.62 samples/sec   Loss 3.2074   LearningRate 0.0001   Epoch: 19   Global Step: 325320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:35,126-Speed 9124.92 samples/sec   Loss 3.3028   LearningRate 0.0001   Epoch: 19   Global Step: 325330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:36,242-Speed 9177.06 samples/sec   Loss 3.3211   LearningRate 0.0001   Epoch: 19   Global Step: 325340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:37,388-Speed 8938.44 samples/sec   Loss 3.2736   LearningRate 0.0001   Epoch: 19   Global Step: 325350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:38,546-Speed 8850.31 samples/sec   Loss 3.2910   LearningRate 0.0001   Epoch: 19   Global Step: 325360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:39,671-Speed 9107.44 samples/sec   Loss 3.3589   LearningRate 0.0001   Epoch: 19   Global Step: 325370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:40,770-Speed 9325.73 samples/sec   Loss 3.3174   LearningRate 0.0001   Epoch: 19   Global Step: 325380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:41,854-Speed 9451.72 samples/sec   Loss 3.2764   LearningRate 0.0001   Epoch: 19   Global Step: 325390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:42,943-Speed 9406.52 samples/sec   Loss 3.3195   LearningRate 0.0001   Epoch: 19   Global Step: 325400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:44,079-Speed 9019.92 samples/sec   Loss 3.3037   LearningRate 0.0001   Epoch: 19   Global Step: 325410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:45,171-Speed 9384.80 samples/sec   Loss 3.1650   LearningRate 0.0001   Epoch: 19   Global Step: 325420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:46,298-Speed 9087.32 samples/sec   Loss 3.3419   LearningRate 0.0001   Epoch: 19   Global Step: 325430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:47,428-Speed 9073.86 samples/sec   Loss 3.2574   LearningRate 0.0001   Epoch: 19   Global Step: 325440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:48,501-Speed 9546.89 samples/sec   Loss 3.1733   LearningRate 0.0001   Epoch: 19   Global Step: 325450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:49,655-Speed 8878.41 samples/sec   Loss 3.2757   LearningRate 0.0001   Epoch: 19   Global Step: 325460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:50,796-Speed 8978.24 samples/sec   Loss 3.2626   LearningRate 0.0001   Epoch: 19   Global Step: 325470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:51,939-Speed 8965.60 samples/sec   Loss 3.3154   LearningRate 0.0001   Epoch: 19   Global Step: 325480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:53,059-Speed 9151.88 samples/sec   Loss 3.3010   LearningRate 0.0001   Epoch: 19   Global Step: 325490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:54,197-Speed 9003.70 samples/sec   Loss 3.4011   LearningRate 0.0001   Epoch: 19   Global Step: 325500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:55,332-Speed 9025.19 samples/sec   Loss 3.2629   LearningRate 0.0001   Epoch: 19   Global Step: 325510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:56,422-Speed 9397.17 samples/sec   Loss 3.2368   LearningRate 0.0001   Epoch: 19   Global Step: 325520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:57,531-Speed 9244.44 samples/sec   Loss 3.2604   LearningRate 0.0001   Epoch: 19   Global Step: 325530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:58,632-Speed 9303.18 samples/sec   Loss 3.3042   LearningRate 0.0001   Epoch: 19   Global Step: 325540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:44:59,757-Speed 9104.78 samples/sec   Loss 3.2837   LearningRate 0.0001   Epoch: 19   Global Step: 325550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:00,857-Speed 9318.32 samples/sec   Loss 3.2557   LearningRate 0.0001   Epoch: 19   Global Step: 325560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:02,020-Speed 8814.54 samples/sec   Loss 3.3666   LearningRate 0.0001   Epoch: 19   Global Step: 325570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:03,127-Speed 9255.94 samples/sec   Loss 3.2768   LearningRate 0.0001   Epoch: 19   Global Step: 325580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:04,299-Speed 8736.52 samples/sec   Loss 3.3257   LearningRate 0.0001   Epoch: 19   Global Step: 325590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:05,391-Speed 9380.82 samples/sec   Loss 3.2880   LearningRate 0.0001   Epoch: 19   Global Step: 325600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:06,535-Speed 8959.35 samples/sec   Loss 3.3260   LearningRate 0.0001   Epoch: 19   Global Step: 325610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:07,631-Speed 9349.36 samples/sec   Loss 3.2568   LearningRate 0.0001   Epoch: 19   Global Step: 325620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:08,718-Speed 9437.03 samples/sec   Loss 3.3001   LearningRate 0.0001   Epoch: 19   Global Step: 325630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:09,844-Speed 9094.41 samples/sec   Loss 3.2430   LearningRate 0.0001   Epoch: 19   Global Step: 325640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:10,948-Speed 9278.01 samples/sec   Loss 3.2583   LearningRate 0.0001   Epoch: 19   Global Step: 325650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:12,053-Speed 9271.58 samples/sec   Loss 3.2744   LearningRate 0.0001   Epoch: 19   Global Step: 325660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:13,140-Speed 9426.64 samples/sec   Loss 3.2601   LearningRate 0.0001   Epoch: 19   Global Step: 325670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:14,254-Speed 9202.46 samples/sec   Loss 3.3385   LearningRate 0.0001   Epoch: 19   Global Step: 325680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:15,327-Speed 9543.13 samples/sec   Loss 3.3576   LearningRate 0.0001   Epoch: 19   Global Step: 325690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:16,445-Speed 9165.22 samples/sec   Loss 3.2756   LearningRate 0.0001   Epoch: 19   Global Step: 325700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:17,559-Speed 9195.83 samples/sec   Loss 3.2308   LearningRate 0.0001   Epoch: 19   Global Step: 325710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:18,676-Speed 9171.27 samples/sec   Loss 3.2788   LearningRate 0.0001   Epoch: 19   Global Step: 325720   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:45:19,839-Speed 8812.77 samples/sec   Loss 3.3015   LearningRate 0.0001   Epoch: 19   Global Step: 325730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:20,987-Speed 8927.47 samples/sec   Loss 3.2632   LearningRate 0.0001   Epoch: 19   Global Step: 325740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:22,083-Speed 9345.42 samples/sec   Loss 3.3280   LearningRate 0.0001   Epoch: 19   Global Step: 325750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:23,176-Speed 9380.04 samples/sec   Loss 3.3299   LearningRate 0.0001   Epoch: 19   Global Step: 325760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:24,298-Speed 9125.62 samples/sec   Loss 3.2110   LearningRate 0.0001   Epoch: 19   Global Step: 325770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:25,444-Speed 8948.90 samples/sec   Loss 3.2982   LearningRate 0.0001   Epoch: 19   Global Step: 325780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:26,572-Speed 9082.31 samples/sec   Loss 3.3482   LearningRate 0.0001   Epoch: 19   Global Step: 325790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:27,680-Speed 9246.96 samples/sec   Loss 3.2445   LearningRate 0.0001   Epoch: 19   Global Step: 325800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:28,757-Speed 9517.34 samples/sec   Loss 3.2720   LearningRate 0.0001   Epoch: 19   Global Step: 325810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:29,797-Speed 9850.59 samples/sec   Loss 3.2820   LearningRate 0.0001   Epoch: 19   Global Step: 325820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:30,932-Speed 9023.75 samples/sec   Loss 3.2605   LearningRate 0.0001   Epoch: 19   Global Step: 325830   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:45:32,059-Speed 9094.66 samples/sec   Loss 3.3067   LearningRate 0.0001   Epoch: 19   Global Step: 325840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:33,216-Speed 8860.31 samples/sec   Loss 3.3499   LearningRate 0.0001   Epoch: 19   Global Step: 325850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:34,291-Speed 9528.74 samples/sec   Loss 3.3028   LearningRate 0.0001   Epoch: 19   Global Step: 325860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:35,425-Speed 9033.41 samples/sec   Loss 3.2716   LearningRate 0.0001   Epoch: 19   Global Step: 325870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:36,585-Speed 8832.11 samples/sec   Loss 3.1905   LearningRate 0.0001   Epoch: 19   Global Step: 325880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:37,728-Speed 8960.51 samples/sec   Loss 3.3331   LearningRate 0.0001   Epoch: 19   Global Step: 325890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:38,864-Speed 9021.52 samples/sec   Loss 3.2583   LearningRate 0.0001   Epoch: 19   Global Step: 325900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:40,003-Speed 8994.40 samples/sec   Loss 3.3374   LearningRate 0.0001   Epoch: 19   Global Step: 325910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:41,082-Speed 9499.80 samples/sec   Loss 3.3054   LearningRate 0.0001   Epoch: 19   Global Step: 325920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:45:42,196-Speed 9194.03 samples/sec   Loss 3.3065   LearningRate 0.0001   Epoch: 19   Global Step: 325930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:45:43,343-Speed 8936.42 samples/sec   Loss 3.2540   LearningRate 0.0001   Epoch: 19   Global Step: 325940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:45:44,448-Speed 9272.57 samples/sec   Loss 3.3115   LearningRate 0.0001   Epoch: 19   Global Step: 325950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:45:45,521-Speed 9545.34 samples/sec   Loss 3.2588   LearningRate 0.0001   Epoch: 19   Global Step: 325960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:45:46,641-Speed 9149.10 samples/sec   Loss 3.3507   LearningRate 0.0001   Epoch: 19   Global Step: 325970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:45:47,717-Speed 9528.25 samples/sec   Loss 3.3268   LearningRate 0.0001   Epoch: 19   Global Step: 325980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:45:48,801-Speed 9451.37 samples/sec   Loss 3.2652   LearningRate 0.0001   Epoch: 19   Global Step: 325990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:45:49,871-Speed 9570.21 samples/sec   Loss 3.3027   LearningRate 0.0001   Epoch: 19   Global Step: 326000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:46:11,846-[lfw][326000]XNorm: 6.548740
Training: 2022-04-12 00:46:11,847-[lfw][326000]Accuracy-Flip: 0.99650+-0.00263
Training: 2022-04-12 00:46:11,847-[lfw][326000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:46:37,252-[cfp_fp][326000]XNorm: 5.721344
Training: 2022-04-12 00:46:37,253-[cfp_fp][326000]Accuracy-Flip: 0.97186+-0.00892
Training: 2022-04-12 00:46:37,253-[cfp_fp][326000]Accuracy-Highest: 0.97543
Training: 2022-04-12 00:46:59,122-[agedb_30][326000]XNorm: 6.378732
Training: 2022-04-12 00:46:59,122-[agedb_30][326000]Accuracy-Flip: 0.97217+-0.00803
Training: 2022-04-12 00:46:59,122-[agedb_30][326000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:47:00,237-Speed 145.53 samples/sec   Loss 3.2839   LearningRate 0.0001   Epoch: 19   Global Step: 326010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:01,352-Speed 9220.53 samples/sec   Loss 3.2458   LearningRate 0.0001   Epoch: 19   Global Step: 326020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:02,511-Speed 8839.45 samples/sec   Loss 3.2810   LearningRate 0.0001   Epoch: 19   Global Step: 326030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:03,634-Speed 9122.17 samples/sec   Loss 3.3563   LearningRate 0.0001   Epoch: 19   Global Step: 326040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:04,744-Speed 9236.98 samples/sec   Loss 3.4477   LearningRate 0.0001   Epoch: 19   Global Step: 326050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:05,849-Speed 9264.94 samples/sec   Loss 3.2926   LearningRate 0.0001   Epoch: 19   Global Step: 326060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:06,979-Speed 9070.33 samples/sec   Loss 3.3483   LearningRate 0.0001   Epoch: 19   Global Step: 326070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:08,104-Speed 9106.60 samples/sec   Loss 3.2510   LearningRate 0.0001   Epoch: 19   Global Step: 326080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:09,227-Speed 9130.55 samples/sec   Loss 3.3628   LearningRate 0.0001   Epoch: 19   Global Step: 326090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:10,380-Speed 8883.81 samples/sec   Loss 3.3451   LearningRate 0.0001   Epoch: 19   Global Step: 326100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:11,530-Speed 8906.10 samples/sec   Loss 3.3114   LearningRate 0.0001   Epoch: 19   Global Step: 326110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:12,688-Speed 8845.55 samples/sec   Loss 3.2867   LearningRate 0.0001   Epoch: 19   Global Step: 326120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:13,811-Speed 9124.16 samples/sec   Loss 3.3004   LearningRate 0.0001   Epoch: 19   Global Step: 326130   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:47:14,964-Speed 8892.45 samples/sec   Loss 3.3422   LearningRate 0.0001   Epoch: 19   Global Step: 326140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:16,080-Speed 9178.42 samples/sec   Loss 3.1896   LearningRate 0.0001   Epoch: 19   Global Step: 326150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:17,213-Speed 9043.57 samples/sec   Loss 3.3454   LearningRate 0.0001   Epoch: 19   Global Step: 326160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:18,360-Speed 8933.04 samples/sec   Loss 3.3264   LearningRate 0.0001   Epoch: 19   Global Step: 326170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:19,486-Speed 9095.81 samples/sec   Loss 3.2478   LearningRate 0.0001   Epoch: 19   Global Step: 326180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:20,578-Speed 9393.03 samples/sec   Loss 3.1820   LearningRate 0.0001   Epoch: 19   Global Step: 326190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:21,676-Speed 9324.03 samples/sec   Loss 3.3316   LearningRate 0.0001   Epoch: 19   Global Step: 326200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:22,749-Speed 9548.80 samples/sec   Loss 3.2772   LearningRate 0.0001   Epoch: 19   Global Step: 326210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:23,885-Speed 9022.30 samples/sec   Loss 3.2812   LearningRate 0.0001   Epoch: 19   Global Step: 326220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:25,020-Speed 9028.35 samples/sec   Loss 3.3419   LearningRate 0.0001   Epoch: 19   Global Step: 326230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:26,097-Speed 9509.23 samples/sec   Loss 3.2233   LearningRate 0.0001   Epoch: 19   Global Step: 326240   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:47:27,163-Speed 9616.08 samples/sec   Loss 3.3207   LearningRate 0.0001   Epoch: 19   Global Step: 326250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:28,325-Speed 8814.15 samples/sec   Loss 3.3306   LearningRate 0.0001   Epoch: 19   Global Step: 326260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:29,441-Speed 9179.33 samples/sec   Loss 3.2828   LearningRate 0.0001   Epoch: 19   Global Step: 326270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:30,568-Speed 9092.45 samples/sec   Loss 3.2780   LearningRate 0.0001   Epoch: 19   Global Step: 326280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:31,726-Speed 8853.86 samples/sec   Loss 3.3743   LearningRate 0.0001   Epoch: 19   Global Step: 326290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:32,853-Speed 9088.89 samples/sec   Loss 3.2611   LearningRate 0.0001   Epoch: 19   Global Step: 326300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:33,999-Speed 8939.97 samples/sec   Loss 3.2179   LearningRate 0.0001   Epoch: 19   Global Step: 326310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:35,095-Speed 9348.67 samples/sec   Loss 3.3304   LearningRate 0.0001   Epoch: 19   Global Step: 326320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:36,238-Speed 8970.02 samples/sec   Loss 3.2101   LearningRate 0.0001   Epoch: 19   Global Step: 326330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:37,355-Speed 9172.65 samples/sec   Loss 3.2954   LearningRate 0.0001   Epoch: 19   Global Step: 326340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:47:38,528-Speed 8738.92 samples/sec   Loss 3.2995   LearningRate 0.0001   Epoch: 19   Global Step: 326350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:39,631-Speed 9291.11 samples/sec   Loss 3.2515   LearningRate 0.0000   Epoch: 19   Global Step: 326360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:40,768-Speed 9007.75 samples/sec   Loss 3.3195   LearningRate 0.0000   Epoch: 19   Global Step: 326370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:41,908-Speed 8985.31 samples/sec   Loss 3.3163   LearningRate 0.0000   Epoch: 19   Global Step: 326380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:42,996-Speed 9421.18 samples/sec   Loss 3.2688   LearningRate 0.0000   Epoch: 19   Global Step: 326390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:44,120-Speed 9114.58 samples/sec   Loss 3.2936   LearningRate 0.0000   Epoch: 19   Global Step: 326400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:45,249-Speed 9072.15 samples/sec   Loss 3.3066   LearningRate 0.0000   Epoch: 19   Global Step: 326410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:46,345-Speed 9352.59 samples/sec   Loss 3.3359   LearningRate 0.0000   Epoch: 19   Global Step: 326420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:47,482-Speed 9012.05 samples/sec   Loss 3.3015   LearningRate 0.0000   Epoch: 19   Global Step: 326430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:48,650-Speed 8766.61 samples/sec   Loss 3.2636   LearningRate 0.0000   Epoch: 19   Global Step: 326440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:49,781-Speed 9063.17 samples/sec   Loss 3.3815   LearningRate 0.0000   Epoch: 19   Global Step: 326450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:50,853-Speed 9562.40 samples/sec   Loss 3.2186   LearningRate 0.0000   Epoch: 19   Global Step: 326460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:51,937-Speed 9457.30 samples/sec   Loss 3.1690   LearningRate 0.0000   Epoch: 19   Global Step: 326470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:53,002-Speed 9615.54 samples/sec   Loss 3.1867   LearningRate 0.0000   Epoch: 19   Global Step: 326480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:54,123-Speed 9145.92 samples/sec   Loss 3.2924   LearningRate 0.0000   Epoch: 19   Global Step: 326490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:55,272-Speed 8909.89 samples/sec   Loss 3.2996   LearningRate 0.0000   Epoch: 19   Global Step: 326500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:56,381-Speed 9238.13 samples/sec   Loss 3.2641   LearningRate 0.0000   Epoch: 19   Global Step: 326510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:57,485-Speed 9283.42 samples/sec   Loss 3.2923   LearningRate 0.0000   Epoch: 19   Global Step: 326520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:58,554-Speed 9585.33 samples/sec   Loss 3.2988   LearningRate 0.0000   Epoch: 19   Global Step: 326530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:47:59,647-Speed 9373.06 samples/sec   Loss 3.2999   LearningRate 0.0000   Epoch: 19   Global Step: 326540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:00,765-Speed 9165.19 samples/sec   Loss 3.2551   LearningRate 0.0000   Epoch: 19   Global Step: 326550   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:48:01,905-Speed 8994.32 samples/sec   Loss 3.2734   LearningRate 0.0000   Epoch: 19   Global Step: 326560   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:48:03,053-Speed 8923.07 samples/sec   Loss 3.3214   LearningRate 0.0000   Epoch: 19   Global Step: 326570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:04,222-Speed 8759.01 samples/sec   Loss 3.2818   LearningRate 0.0000   Epoch: 19   Global Step: 326580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:05,400-Speed 8697.12 samples/sec   Loss 3.2905   LearningRate 0.0000   Epoch: 19   Global Step: 326590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:06,536-Speed 9021.96 samples/sec   Loss 3.3272   LearningRate 0.0000   Epoch: 19   Global Step: 326600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:07,632-Speed 9350.01 samples/sec   Loss 3.3358   LearningRate 0.0000   Epoch: 19   Global Step: 326610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:08,741-Speed 9244.68 samples/sec   Loss 3.2833   LearningRate 0.0000   Epoch: 19   Global Step: 326620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:09,872-Speed 9056.29 samples/sec   Loss 3.2886   LearningRate 0.0000   Epoch: 19   Global Step: 326630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:10,990-Speed 9164.51 samples/sec   Loss 3.3369   LearningRate 0.0000   Epoch: 19   Global Step: 326640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:12,077-Speed 9426.76 samples/sec   Loss 3.2580   LearningRate 0.0000   Epoch: 19   Global Step: 326650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:13,237-Speed 8832.80 samples/sec   Loss 3.2795   LearningRate 0.0000   Epoch: 19   Global Step: 326660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:14,358-Speed 9135.93 samples/sec   Loss 3.3099   LearningRate 0.0000   Epoch: 19   Global Step: 326670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:48:15,483-Speed 9106.96 samples/sec   Loss 3.2896   LearningRate 0.0000   Epoch: 19   Global Step: 326680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:48:16,581-Speed 9336.68 samples/sec   Loss 3.2137   LearningRate 0.0000   Epoch: 19   Global Step: 326690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:48:17,712-Speed 9051.92 samples/sec   Loss 3.2575   LearningRate 0.0000   Epoch: 19   Global Step: 326700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:48:18,832-Speed 9153.14 samples/sec   Loss 3.2540   LearningRate 0.0000   Epoch: 19   Global Step: 326710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:48:19,962-Speed 9062.03 samples/sec   Loss 3.2809   LearningRate 0.0000   Epoch: 19   Global Step: 326720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:48:21,078-Speed 9184.36 samples/sec   Loss 3.3219   LearningRate 0.0000   Epoch: 19   Global Step: 326730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:48:22,214-Speed 9020.87 samples/sec   Loss 3.3005   LearningRate 0.0000   Epoch: 19   Global Step: 326740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:48:23,339-Speed 9108.88 samples/sec   Loss 3.3068   LearningRate 0.0000   Epoch: 19   Global Step: 326750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:48:24,452-Speed 9200.23 samples/sec   Loss 3.3291   LearningRate 0.0000   Epoch: 19   Global Step: 326760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:48:25,595-Speed 8964.04 samples/sec   Loss 3.3306   LearningRate 0.0000   Epoch: 19   Global Step: 326770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:26,730-Speed 9032.93 samples/sec   Loss 3.3169   LearningRate 0.0000   Epoch: 19   Global Step: 326780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:27,819-Speed 9406.24 samples/sec   Loss 3.3193   LearningRate 0.0000   Epoch: 19   Global Step: 326790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:28,964-Speed 8951.73 samples/sec   Loss 3.2497   LearningRate 0.0000   Epoch: 19   Global Step: 326800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:30,050-Speed 9433.89 samples/sec   Loss 3.3952   LearningRate 0.0000   Epoch: 19   Global Step: 326810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:31,130-Speed 9482.38 samples/sec   Loss 3.1759   LearningRate 0.0000   Epoch: 19   Global Step: 326820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:32,199-Speed 9590.01 samples/sec   Loss 3.3502   LearningRate 0.0000   Epoch: 19   Global Step: 326830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:33,291-Speed 9380.48 samples/sec   Loss 3.2518   LearningRate 0.0000   Epoch: 19   Global Step: 326840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:34,424-Speed 9045.78 samples/sec   Loss 3.2996   LearningRate 0.0000   Epoch: 19   Global Step: 326850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:35,560-Speed 9020.70 samples/sec   Loss 3.3188   LearningRate 0.0000   Epoch: 19   Global Step: 326860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:36,663-Speed 9282.42 samples/sec   Loss 3.3034   LearningRate 0.0000   Epoch: 19   Global Step: 326870   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:48:37,755-Speed 9389.43 samples/sec   Loss 3.2646   LearningRate 0.0000   Epoch: 19   Global Step: 326880   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:48:38,875-Speed 9148.03 samples/sec   Loss 3.2217   LearningRate 0.0000   Epoch: 19   Global Step: 326890   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:48:39,955-Speed 9490.68 samples/sec   Loss 3.2851   LearningRate 0.0000   Epoch: 19   Global Step: 326900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:41,018-Speed 9635.96 samples/sec   Loss 3.2445   LearningRate 0.0000   Epoch: 19   Global Step: 326910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:42,099-Speed 9474.02 samples/sec   Loss 3.3624   LearningRate 0.0000   Epoch: 19   Global Step: 326920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:43,194-Speed 9360.64 samples/sec   Loss 3.3253   LearningRate 0.0000   Epoch: 19   Global Step: 326930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:44,335-Speed 8980.36 samples/sec   Loss 3.3294   LearningRate 0.0000   Epoch: 19   Global Step: 326940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:45,483-Speed 8927.81 samples/sec   Loss 3.2485   LearningRate 0.0000   Epoch: 19   Global Step: 326950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:46,580-Speed 9341.71 samples/sec   Loss 3.2982   LearningRate 0.0000   Epoch: 19   Global Step: 326960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:47,703-Speed 9123.97 samples/sec   Loss 3.2550   LearningRate 0.0000   Epoch: 19   Global Step: 326970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:48,818-Speed 9185.64 samples/sec   Loss 3.2051   LearningRate 0.0000   Epoch: 19   Global Step: 326980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:49,912-Speed 9362.84 samples/sec   Loss 3.3226   LearningRate 0.0000   Epoch: 19   Global Step: 326990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:51,035-Speed 9129.17 samples/sec   Loss 3.3780   LearningRate 0.0000   Epoch: 19   Global Step: 327000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:52,122-Speed 9425.17 samples/sec   Loss 3.3074   LearningRate 0.0000   Epoch: 19   Global Step: 327010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:53,271-Speed 8912.75 samples/sec   Loss 3.2745   LearningRate 0.0000   Epoch: 19   Global Step: 327020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:54,400-Speed 9081.52 samples/sec   Loss 3.2708   LearningRate 0.0000   Epoch: 19   Global Step: 327030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:55,487-Speed 9425.05 samples/sec   Loss 3.2661   LearningRate 0.0000   Epoch: 19   Global Step: 327040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:56,572-Speed 9436.61 samples/sec   Loss 3.2818   LearningRate 0.0000   Epoch: 19   Global Step: 327050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:57,681-Speed 9242.68 samples/sec   Loss 3.2287   LearningRate 0.0000   Epoch: 19   Global Step: 327060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:48:58,874-Speed 8586.39 samples/sec   Loss 3.3258   LearningRate 0.0000   Epoch: 19   Global Step: 327070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:00,048-Speed 8727.51 samples/sec   Loss 3.2791   LearningRate 0.0000   Epoch: 19   Global Step: 327080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:01,200-Speed 8890.35 samples/sec   Loss 3.2854   LearningRate 0.0000   Epoch: 19   Global Step: 327090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:02,294-Speed 9376.29 samples/sec   Loss 3.2403   LearningRate 0.0000   Epoch: 19   Global Step: 327100   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:49:03,397-Speed 9290.07 samples/sec   Loss 3.3357   LearningRate 0.0000   Epoch: 19   Global Step: 327110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:04,538-Speed 8982.72 samples/sec   Loss 3.3074   LearningRate 0.0000   Epoch: 19   Global Step: 327120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:05,690-Speed 8889.49 samples/sec   Loss 3.3415   LearningRate 0.0000   Epoch: 19   Global Step: 327130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:06,891-Speed 8532.98 samples/sec   Loss 3.3176   LearningRate 0.0000   Epoch: 19   Global Step: 327140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:07,979-Speed 9418.59 samples/sec   Loss 3.3326   LearningRate 0.0000   Epoch: 19   Global Step: 327150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:09,100-Speed 9138.69 samples/sec   Loss 3.4121   LearningRate 0.0000   Epoch: 19   Global Step: 327160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:10,219-Speed 9157.78 samples/sec   Loss 3.2928   LearningRate 0.0000   Epoch: 19   Global Step: 327170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:11,307-Speed 9420.42 samples/sec   Loss 3.3422   LearningRate 0.0000   Epoch: 19   Global Step: 327180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:12,408-Speed 9300.71 samples/sec   Loss 3.3213   LearningRate 0.0000   Epoch: 19   Global Step: 327190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:13,543-Speed 9031.08 samples/sec   Loss 3.3378   LearningRate 0.0000   Epoch: 19   Global Step: 327200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:14,668-Speed 9107.38 samples/sec   Loss 3.3905   LearningRate 0.0000   Epoch: 19   Global Step: 327210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:15,811-Speed 8963.08 samples/sec   Loss 3.3082   LearningRate 0.0000   Epoch: 19   Global Step: 327220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:16,896-Speed 9442.73 samples/sec   Loss 3.2565   LearningRate 0.0000   Epoch: 19   Global Step: 327230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:18,091-Speed 8569.87 samples/sec   Loss 3.1994   LearningRate 0.0000   Epoch: 19   Global Step: 327240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:19,206-Speed 9188.05 samples/sec   Loss 3.3039   LearningRate 0.0000   Epoch: 19   Global Step: 327250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:20,316-Speed 9236.66 samples/sec   Loss 3.2520   LearningRate 0.0000   Epoch: 19   Global Step: 327260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:21,382-Speed 9611.40 samples/sec   Loss 3.3031   LearningRate 0.0000   Epoch: 19   Global Step: 327270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:22,498-Speed 9181.78 samples/sec   Loss 3.2374   LearningRate 0.0000   Epoch: 19   Global Step: 327280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:23,635-Speed 9015.60 samples/sec   Loss 3.2713   LearningRate 0.0000   Epoch: 19   Global Step: 327290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:24,761-Speed 9092.77 samples/sec   Loss 3.3457   LearningRate 0.0000   Epoch: 19   Global Step: 327300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:25,897-Speed 9017.64 samples/sec   Loss 3.3030   LearningRate 0.0000   Epoch: 19   Global Step: 327310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:26,972-Speed 9538.76 samples/sec   Loss 3.3262   LearningRate 0.0000   Epoch: 19   Global Step: 327320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:28,071-Speed 9314.93 samples/sec   Loss 3.3015   LearningRate 0.0000   Epoch: 19   Global Step: 327330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:29,171-Speed 9318.33 samples/sec   Loss 3.3059   LearningRate 0.0000   Epoch: 19   Global Step: 327340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:30,301-Speed 9066.42 samples/sec   Loss 3.3230   LearningRate 0.0000   Epoch: 19   Global Step: 327350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:31,448-Speed 8928.23 samples/sec   Loss 3.2025   LearningRate 0.0000   Epoch: 19   Global Step: 327360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:32,590-Speed 8979.32 samples/sec   Loss 3.2741   LearningRate 0.0000   Epoch: 19   Global Step: 327370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:33,786-Speed 8563.34 samples/sec   Loss 3.3491   LearningRate 0.0000   Epoch: 19   Global Step: 327380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:34,879-Speed 9380.79 samples/sec   Loss 3.2333   LearningRate 0.0000   Epoch: 19   Global Step: 327390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:35,933-Speed 9714.58 samples/sec   Loss 3.3523   LearningRate 0.0000   Epoch: 19   Global Step: 327400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:37,050-Speed 9176.31 samples/sec   Loss 3.3625   LearningRate 0.0000   Epoch: 19   Global Step: 327410   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:49:38,161-Speed 9223.55 samples/sec   Loss 3.3378   LearningRate 0.0000   Epoch: 19   Global Step: 327420   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:49:39,261-Speed 9313.94 samples/sec   Loss 3.3039   LearningRate 0.0000   Epoch: 19   Global Step: 327430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:40,392-Speed 9060.80 samples/sec   Loss 3.2682   LearningRate 0.0000   Epoch: 19   Global Step: 327440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:41,494-Speed 9305.93 samples/sec   Loss 3.3048   LearningRate 0.0000   Epoch: 19   Global Step: 327450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:42,608-Speed 9188.63 samples/sec   Loss 3.2406   LearningRate 0.0000   Epoch: 19   Global Step: 327460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:43,769-Speed 8830.03 samples/sec   Loss 3.3312   LearningRate 0.0000   Epoch: 19   Global Step: 327470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:44,869-Speed 9318.73 samples/sec   Loss 3.2678   LearningRate 0.0000   Epoch: 19   Global Step: 327480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:45,931-Speed 9639.92 samples/sec   Loss 3.3646   LearningRate 0.0000   Epoch: 19   Global Step: 327490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:47,070-Speed 8996.17 samples/sec   Loss 3.3111   LearningRate 0.0000   Epoch: 19   Global Step: 327500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:48,182-Speed 9216.38 samples/sec   Loss 3.2829   LearningRate 0.0000   Epoch: 19   Global Step: 327510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:49,282-Speed 9315.53 samples/sec   Loss 3.2258   LearningRate 0.0000   Epoch: 19   Global Step: 327520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:50,402-Speed 9144.16 samples/sec   Loss 3.2929   LearningRate 0.0000   Epoch: 19   Global Step: 327530   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:49:51,509-Speed 9256.96 samples/sec   Loss 3.3057   LearningRate 0.0000   Epoch: 19   Global Step: 327540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:52,692-Speed 8662.58 samples/sec   Loss 3.3750   LearningRate 0.0000   Epoch: 19   Global Step: 327550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:53,774-Speed 9468.91 samples/sec   Loss 3.3580   LearningRate 0.0000   Epoch: 19   Global Step: 327560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:54,886-Speed 9211.10 samples/sec   Loss 3.3504   LearningRate 0.0000   Epoch: 19   Global Step: 327570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:55,998-Speed 9212.31 samples/sec   Loss 3.3037   LearningRate 0.0000   Epoch: 19   Global Step: 327580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:57,074-Speed 9527.58 samples/sec   Loss 3.3058   LearningRate 0.0000   Epoch: 19   Global Step: 327590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:58,163-Speed 9407.63 samples/sec   Loss 3.2805   LearningRate 0.0000   Epoch: 19   Global Step: 327600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:49:59,287-Speed 9114.47 samples/sec   Loss 3.2598   LearningRate 0.0000   Epoch: 19   Global Step: 327610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:00,419-Speed 9057.85 samples/sec   Loss 3.3773   LearningRate 0.0000   Epoch: 19   Global Step: 327620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:01,497-Speed 9500.79 samples/sec   Loss 3.2701   LearningRate 0.0000   Epoch: 19   Global Step: 327630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:02,575-Speed 9506.07 samples/sec   Loss 3.2661   LearningRate 0.0000   Epoch: 19   Global Step: 327640   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:50:03,674-Speed 9323.38 samples/sec   Loss 3.2902   LearningRate 0.0000   Epoch: 19   Global Step: 327650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:04,767-Speed 9370.15 samples/sec   Loss 3.4144   LearningRate 0.0000   Epoch: 19   Global Step: 327660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:05,884-Speed 9174.04 samples/sec   Loss 3.2606   LearningRate 0.0000   Epoch: 19   Global Step: 327670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:06,988-Speed 9279.49 samples/sec   Loss 3.2693   LearningRate 0.0000   Epoch: 19   Global Step: 327680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:08,090-Speed 9304.73 samples/sec   Loss 3.3332   LearningRate 0.0000   Epoch: 19   Global Step: 327690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:09,233-Speed 8961.16 samples/sec   Loss 3.2522   LearningRate 0.0000   Epoch: 19   Global Step: 327700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:10,295-Speed 9647.32 samples/sec   Loss 3.3685   LearningRate 0.0000   Epoch: 19   Global Step: 327710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:11,377-Speed 9465.49 samples/sec   Loss 3.2273   LearningRate 0.0000   Epoch: 19   Global Step: 327720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:12,470-Speed 9381.53 samples/sec   Loss 3.2775   LearningRate 0.0000   Epoch: 19   Global Step: 327730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:13,544-Speed 9543.60 samples/sec   Loss 3.2419   LearningRate 0.0000   Epoch: 19   Global Step: 327740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:14,699-Speed 8869.80 samples/sec   Loss 3.3591   LearningRate 0.0000   Epoch: 19   Global Step: 327750   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:50:15,814-Speed 9191.24 samples/sec   Loss 3.3101   LearningRate 0.0000   Epoch: 19   Global Step: 327760   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:50:16,900-Speed 9431.19 samples/sec   Loss 3.3368   LearningRate 0.0000   Epoch: 19   Global Step: 327770   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:50:18,014-Speed 9203.15 samples/sec   Loss 3.2687   LearningRate 0.0000   Epoch: 19   Global Step: 327780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:19,159-Speed 8942.90 samples/sec   Loss 3.3467   LearningRate 0.0000   Epoch: 19   Global Step: 327790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:20,234-Speed 9536.05 samples/sec   Loss 3.3167   LearningRate 0.0000   Epoch: 19   Global Step: 327800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:21,356-Speed 9127.66 samples/sec   Loss 3.3482   LearningRate 0.0000   Epoch: 19   Global Step: 327810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:22,425-Speed 9579.98 samples/sec   Loss 3.2923   LearningRate 0.0000   Epoch: 19   Global Step: 327820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:23,553-Speed 9088.27 samples/sec   Loss 3.2361   LearningRate 0.0000   Epoch: 19   Global Step: 327830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:24,664-Speed 9220.49 samples/sec   Loss 3.2546   LearningRate 0.0000   Epoch: 19   Global Step: 327840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:25,831-Speed 8780.16 samples/sec   Loss 3.2733   LearningRate 0.0000   Epoch: 19   Global Step: 327850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:26,990-Speed 8841.43 samples/sec   Loss 3.2349   LearningRate 0.0000   Epoch: 19   Global Step: 327860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:28,098-Speed 9245.93 samples/sec   Loss 3.2601   LearningRate 0.0000   Epoch: 19   Global Step: 327870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:29,221-Speed 9130.09 samples/sec   Loss 3.3170   LearningRate 0.0000   Epoch: 19   Global Step: 327880   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:50:30,395-Speed 8724.05 samples/sec   Loss 3.3538   LearningRate 0.0000   Epoch: 19   Global Step: 327890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:31,494-Speed 9322.12 samples/sec   Loss 3.3786   LearningRate 0.0000   Epoch: 19   Global Step: 327900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:32,642-Speed 8926.56 samples/sec   Loss 3.3242   LearningRate 0.0000   Epoch: 19   Global Step: 327910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:33,820-Speed 8701.46 samples/sec   Loss 3.2828   LearningRate 0.0000   Epoch: 19   Global Step: 327920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:34,906-Speed 9433.58 samples/sec   Loss 3.2791   LearningRate 0.0000   Epoch: 19   Global Step: 327930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:36,012-Speed 9266.39 samples/sec   Loss 3.2910   LearningRate 0.0000   Epoch: 19   Global Step: 327940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:37,106-Speed 9370.72 samples/sec   Loss 3.1996   LearningRate 0.0000   Epoch: 19   Global Step: 327950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:38,206-Speed 9310.70 samples/sec   Loss 3.2699   LearningRate 0.0000   Epoch: 19   Global Step: 327960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:39,284-Speed 9503.26 samples/sec   Loss 3.3356   LearningRate 0.0000   Epoch: 19   Global Step: 327970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:40,402-Speed 9166.05 samples/sec   Loss 3.2941   LearningRate 0.0000   Epoch: 19   Global Step: 327980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:41,531-Speed 9069.24 samples/sec   Loss 3.3323   LearningRate 0.0000   Epoch: 19   Global Step: 327990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:50:42,634-Speed 9290.21 samples/sec   Loss 3.3207   LearningRate 0.0000   Epoch: 19   Global Step: 328000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:51:04,508-[lfw][328000]XNorm: 6.515726
Training: 2022-04-12 00:51:04,508-[lfw][328000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-12 00:51:04,508-[lfw][328000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:51:29,940-[cfp_fp][328000]XNorm: 5.690911
Training: 2022-04-12 00:51:29,941-[cfp_fp][328000]Accuracy-Flip: 0.97286+-0.00838
Training: 2022-04-12 00:51:29,941-[cfp_fp][328000]Accuracy-Highest: 0.97543
Training: 2022-04-12 00:51:51,814-[agedb_30][328000]XNorm: 6.350155
Training: 2022-04-12 00:51:51,815-[agedb_30][328000]Accuracy-Flip: 0.97217+-0.00806
Training: 2022-04-12 00:51:51,815-[agedb_30][328000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:51:52,916-Speed 145.70 samples/sec   Loss 3.2589   LearningRate 0.0000   Epoch: 19   Global Step: 328010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:51:53,993-Speed 9520.11 samples/sec   Loss 3.3285   LearningRate 0.0000   Epoch: 19   Global Step: 328020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:51:55,067-Speed 9532.41 samples/sec   Loss 3.2296   LearningRate 0.0000   Epoch: 19   Global Step: 328030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:51:56,162-Speed 9365.11 samples/sec   Loss 3.3119   LearningRate 0.0000   Epoch: 19   Global Step: 328040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:51:57,266-Speed 9277.50 samples/sec   Loss 3.3819   LearningRate 0.0000   Epoch: 19   Global Step: 328050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:51:58,374-Speed 9245.98 samples/sec   Loss 3.2495   LearningRate 0.0000   Epoch: 19   Global Step: 328060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:51:59,456-Speed 9470.82 samples/sec   Loss 3.3074   LearningRate 0.0000   Epoch: 19   Global Step: 328070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:52:00,583-Speed 9095.89 samples/sec   Loss 3.2388   LearningRate 0.0000   Epoch: 19   Global Step: 328080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:52:01,697-Speed 9197.05 samples/sec   Loss 3.3069   LearningRate 0.0000   Epoch: 19   Global Step: 328090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:52:02,797-Speed 9311.07 samples/sec   Loss 3.2792   LearningRate 0.0000   Epoch: 19   Global Step: 328100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:52:03,894-Speed 9341.13 samples/sec   Loss 3.2507   LearningRate 0.0000   Epoch: 19   Global Step: 328110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:52:05,009-Speed 9186.04 samples/sec   Loss 3.2416   LearningRate 0.0000   Epoch: 19   Global Step: 328120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:52:06,104-Speed 9360.04 samples/sec   Loss 3.2700   LearningRate 0.0000   Epoch: 19   Global Step: 328130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:07,211-Speed 9255.69 samples/sec   Loss 3.2851   LearningRate 0.0000   Epoch: 19   Global Step: 328140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:08,303-Speed 9382.08 samples/sec   Loss 3.3012   LearningRate 0.0000   Epoch: 19   Global Step: 328150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:09,441-Speed 9006.41 samples/sec   Loss 3.3038   LearningRate 0.0000   Epoch: 19   Global Step: 328160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:10,566-Speed 9101.71 samples/sec   Loss 3.2703   LearningRate 0.0000   Epoch: 19   Global Step: 328170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:11,694-Speed 9081.28 samples/sec   Loss 3.2706   LearningRate 0.0000   Epoch: 19   Global Step: 328180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:12,798-Speed 9286.41 samples/sec   Loss 3.2410   LearningRate 0.0000   Epoch: 19   Global Step: 328190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:13,921-Speed 9120.56 samples/sec   Loss 3.2355   LearningRate 0.0000   Epoch: 19   Global Step: 328200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:15,003-Speed 9470.85 samples/sec   Loss 3.3016   LearningRate 0.0000   Epoch: 19   Global Step: 328210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:16,084-Speed 9483.18 samples/sec   Loss 3.2798   LearningRate 0.0000   Epoch: 19   Global Step: 328220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:17,271-Speed 8629.84 samples/sec   Loss 3.2373   LearningRate 0.0000   Epoch: 19   Global Step: 328230   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:52:18,414-Speed 8960.28 samples/sec   Loss 3.2319   LearningRate 0.0000   Epoch: 19   Global Step: 328240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:19,529-Speed 9193.17 samples/sec   Loss 3.2814   LearningRate 0.0000   Epoch: 19   Global Step: 328250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:20,649-Speed 9152.19 samples/sec   Loss 3.2395   LearningRate 0.0000   Epoch: 19   Global Step: 328260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:21,747-Speed 9327.53 samples/sec   Loss 3.2947   LearningRate 0.0000   Epoch: 19   Global Step: 328270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:22,836-Speed 9411.96 samples/sec   Loss 3.2910   LearningRate 0.0000   Epoch: 19   Global Step: 328280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:23,966-Speed 9064.08 samples/sec   Loss 3.2800   LearningRate 0.0000   Epoch: 19   Global Step: 328290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:25,101-Speed 9028.59 samples/sec   Loss 3.2914   LearningRate 0.0000   Epoch: 19   Global Step: 328300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:26,239-Speed 9006.05 samples/sec   Loss 3.3100   LearningRate 0.0000   Epoch: 19   Global Step: 328310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:27,354-Speed 9189.38 samples/sec   Loss 3.2968   LearningRate 0.0000   Epoch: 19   Global Step: 328320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:28,505-Speed 8902.39 samples/sec   Loss 3.3155   LearningRate 0.0000   Epoch: 19   Global Step: 328330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:29,620-Speed 9188.85 samples/sec   Loss 3.3403   LearningRate 0.0000   Epoch: 19   Global Step: 328340   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:52:30,743-Speed 9124.62 samples/sec   Loss 3.3664   LearningRate 0.0000   Epoch: 19   Global Step: 328350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:31,819-Speed 9534.49 samples/sec   Loss 3.2683   LearningRate 0.0000   Epoch: 19   Global Step: 328360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:32,929-Speed 9226.49 samples/sec   Loss 3.3566   LearningRate 0.0000   Epoch: 19   Global Step: 328370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:34,112-Speed 8662.43 samples/sec   Loss 3.2484   LearningRate 0.0000   Epoch: 19   Global Step: 328380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:35,266-Speed 8877.10 samples/sec   Loss 3.3180   LearningRate 0.0000   Epoch: 19   Global Step: 328390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:36,358-Speed 9380.69 samples/sec   Loss 3.3400   LearningRate 0.0000   Epoch: 19   Global Step: 328400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:37,530-Speed 8745.59 samples/sec   Loss 3.3036   LearningRate 0.0000   Epoch: 19   Global Step: 328410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:38,639-Speed 9237.87 samples/sec   Loss 3.2205   LearningRate 0.0000   Epoch: 19   Global Step: 328420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:39,741-Speed 9302.83 samples/sec   Loss 3.2752   LearningRate 0.0000   Epoch: 19   Global Step: 328430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:40,848-Speed 9256.68 samples/sec   Loss 3.3304   LearningRate 0.0000   Epoch: 19   Global Step: 328440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:41,975-Speed 9091.78 samples/sec   Loss 3.2941   LearningRate 0.0000   Epoch: 19   Global Step: 328450   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:52:43,064-Speed 9414.09 samples/sec   Loss 3.2425   LearningRate 0.0000   Epoch: 19   Global Step: 328460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:44,163-Speed 9333.35 samples/sec   Loss 3.3563   LearningRate 0.0000   Epoch: 19   Global Step: 328470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:45,254-Speed 9390.73 samples/sec   Loss 3.2950   LearningRate 0.0000   Epoch: 19   Global Step: 328480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:46,398-Speed 8956.14 samples/sec   Loss 3.2981   LearningRate 0.0000   Epoch: 19   Global Step: 328490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:47,540-Speed 8978.79 samples/sec   Loss 3.2364   LearningRate 0.0000   Epoch: 19   Global Step: 328500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:48,667-Speed 9091.02 samples/sec   Loss 3.2952   LearningRate 0.0000   Epoch: 19   Global Step: 328510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:49,817-Speed 8904.92 samples/sec   Loss 3.2780   LearningRate 0.0000   Epoch: 19   Global Step: 328520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:50,869-Speed 9742.41 samples/sec   Loss 3.2942   LearningRate 0.0000   Epoch: 19   Global Step: 328530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:51,955-Speed 9434.17 samples/sec   Loss 3.3186   LearningRate 0.0000   Epoch: 19   Global Step: 328540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:53,033-Speed 9499.50 samples/sec   Loss 3.3352   LearningRate 0.0000   Epoch: 19   Global Step: 328550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:54,116-Speed 9460.52 samples/sec   Loss 3.3127   LearningRate 0.0000   Epoch: 19   Global Step: 328560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:55,176-Speed 9669.18 samples/sec   Loss 3.3663   LearningRate 0.0000   Epoch: 19   Global Step: 328570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:56,304-Speed 9087.81 samples/sec   Loss 3.2808   LearningRate 0.0000   Epoch: 19   Global Step: 328580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:57,416-Speed 9216.19 samples/sec   Loss 3.2613   LearningRate 0.0000   Epoch: 19   Global Step: 328590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:58,502-Speed 9433.52 samples/sec   Loss 3.2797   LearningRate 0.0000   Epoch: 19   Global Step: 328600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:52:59,621-Speed 9152.51 samples/sec   Loss 3.2823   LearningRate 0.0000   Epoch: 19   Global Step: 328610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:00,704-Speed 9463.57 samples/sec   Loss 3.3145   LearningRate 0.0000   Epoch: 19   Global Step: 328620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:01,824-Speed 9156.85 samples/sec   Loss 3.2744   LearningRate 0.0000   Epoch: 19   Global Step: 328630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:02,934-Speed 9224.32 samples/sec   Loss 3.2955   LearningRate 0.0000   Epoch: 19   Global Step: 328640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:04,120-Speed 8638.01 samples/sec   Loss 3.3091   LearningRate 0.0000   Epoch: 19   Global Step: 328650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:05,255-Speed 9028.16 samples/sec   Loss 3.2956   LearningRate 0.0000   Epoch: 19   Global Step: 328660   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:53:06,311-Speed 9699.07 samples/sec   Loss 3.3571   LearningRate 0.0000   Epoch: 19   Global Step: 328670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:07,432-Speed 9146.90 samples/sec   Loss 3.2360   LearningRate 0.0000   Epoch: 19   Global Step: 328680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:08,558-Speed 9102.83 samples/sec   Loss 3.2665   LearningRate 0.0000   Epoch: 19   Global Step: 328690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:09,701-Speed 8957.21 samples/sec   Loss 3.3069   LearningRate 0.0000   Epoch: 19   Global Step: 328700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:10,832-Speed 9060.46 samples/sec   Loss 3.2288   LearningRate 0.0000   Epoch: 19   Global Step: 328710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:11,950-Speed 9165.46 samples/sec   Loss 3.3762   LearningRate 0.0000   Epoch: 19   Global Step: 328720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:13,057-Speed 9255.90 samples/sec   Loss 3.2885   LearningRate 0.0000   Epoch: 19   Global Step: 328730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:14,146-Speed 9415.50 samples/sec   Loss 3.3275   LearningRate 0.0000   Epoch: 19   Global Step: 328740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:15,265-Speed 9152.07 samples/sec   Loss 3.2788   LearningRate 0.0000   Epoch: 19   Global Step: 328750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:16,352-Speed 9429.21 samples/sec   Loss 3.3118   LearningRate 0.0000   Epoch: 19   Global Step: 328760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:17,440-Speed 9420.55 samples/sec   Loss 3.2958   LearningRate 0.0000   Epoch: 19   Global Step: 328770   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:53:18,569-Speed 9077.43 samples/sec   Loss 3.2656   LearningRate 0.0000   Epoch: 19   Global Step: 328780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:19,692-Speed 9119.04 samples/sec   Loss 3.3003   LearningRate 0.0000   Epoch: 19   Global Step: 328790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:20,797-Speed 9270.39 samples/sec   Loss 3.2379   LearningRate 0.0000   Epoch: 19   Global Step: 328800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:21,900-Speed 9293.09 samples/sec   Loss 3.2667   LearningRate 0.0000   Epoch: 19   Global Step: 328810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:22,983-Speed 9459.16 samples/sec   Loss 3.3400   LearningRate 0.0000   Epoch: 19   Global Step: 328820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:24,112-Speed 9078.50 samples/sec   Loss 3.2581   LearningRate 0.0000   Epoch: 19   Global Step: 328830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:25,222-Speed 9223.70 samples/sec   Loss 3.1944   LearningRate 0.0000   Epoch: 19   Global Step: 328840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:26,361-Speed 9001.37 samples/sec   Loss 3.3453   LearningRate 0.0000   Epoch: 19   Global Step: 328850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:27,481-Speed 9147.12 samples/sec   Loss 3.2595   LearningRate 0.0000   Epoch: 19   Global Step: 328860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:28,625-Speed 8957.32 samples/sec   Loss 3.3015   LearningRate 0.0000   Epoch: 19   Global Step: 328870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:29,811-Speed 8642.65 samples/sec   Loss 3.3100   LearningRate 0.0000   Epoch: 19   Global Step: 328880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:30,886-Speed 9525.16 samples/sec   Loss 3.2864   LearningRate 0.0000   Epoch: 19   Global Step: 328890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:31,992-Speed 9266.78 samples/sec   Loss 3.2485   LearningRate 0.0000   Epoch: 19   Global Step: 328900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:33,089-Speed 9337.03 samples/sec   Loss 3.2455   LearningRate 0.0000   Epoch: 19   Global Step: 328910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:34,193-Speed 9278.73 samples/sec   Loss 3.2826   LearningRate 0.0000   Epoch: 19   Global Step: 328920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:35,271-Speed 9507.43 samples/sec   Loss 3.2173   LearningRate 0.0000   Epoch: 19   Global Step: 328930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:36,330-Speed 9679.72 samples/sec   Loss 3.3264   LearningRate 0.0000   Epoch: 19   Global Step: 328940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:37,495-Speed 8794.24 samples/sec   Loss 3.2533   LearningRate 0.0000   Epoch: 19   Global Step: 328950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:38,693-Speed 8557.92 samples/sec   Loss 3.2241   LearningRate 0.0000   Epoch: 19   Global Step: 328960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:39,775-Speed 9469.36 samples/sec   Loss 3.3053   LearningRate 0.0000   Epoch: 19   Global Step: 328970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:40,854-Speed 9491.61 samples/sec   Loss 3.3901   LearningRate 0.0000   Epoch: 19   Global Step: 328980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:41,947-Speed 9376.15 samples/sec   Loss 3.2493   LearningRate 0.0000   Epoch: 19   Global Step: 328990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:43,041-Speed 9364.31 samples/sec   Loss 3.2474   LearningRate 0.0000   Epoch: 19   Global Step: 329000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:44,134-Speed 9376.13 samples/sec   Loss 3.2547   LearningRate 0.0000   Epoch: 19   Global Step: 329010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:45,211-Speed 9512.81 samples/sec   Loss 3.3523   LearningRate 0.0000   Epoch: 19   Global Step: 329020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:46,296-Speed 9442.10 samples/sec   Loss 3.2900   LearningRate 0.0000   Epoch: 19   Global Step: 329030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:47,432-Speed 9021.98 samples/sec   Loss 3.3140   LearningRate 0.0000   Epoch: 19   Global Step: 329040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:48,588-Speed 8860.02 samples/sec   Loss 3.1978   LearningRate 0.0000   Epoch: 19   Global Step: 329050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:49,750-Speed 8823.31 samples/sec   Loss 3.3456   LearningRate 0.0000   Epoch: 19   Global Step: 329060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:53:50,853-Speed 9287.92 samples/sec   Loss 3.2928   LearningRate 0.0000   Epoch: 19   Global Step: 329070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:51,956-Speed 9290.30 samples/sec   Loss 3.2672   LearningRate 0.0000   Epoch: 19   Global Step: 329080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:53,094-Speed 9005.84 samples/sec   Loss 3.2793   LearningRate 0.0000   Epoch: 19   Global Step: 329090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:54,247-Speed 8882.13 samples/sec   Loss 3.2605   LearningRate 0.0000   Epoch: 19   Global Step: 329100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:55,378-Speed 9061.18 samples/sec   Loss 3.2170   LearningRate 0.0000   Epoch: 19   Global Step: 329110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:56,489-Speed 9230.16 samples/sec   Loss 3.2043   LearningRate 0.0000   Epoch: 19   Global Step: 329120   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:57,651-Speed 8820.41 samples/sec   Loss 3.3169   LearningRate 0.0000   Epoch: 19   Global Step: 329130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:58,776-Speed 9105.83 samples/sec   Loss 3.3133   LearningRate 0.0000   Epoch: 19   Global Step: 329140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:53:59,899-Speed 9123.47 samples/sec   Loss 3.2984   LearningRate 0.0000   Epoch: 19   Global Step: 329150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:54:01,002-Speed 9292.65 samples/sec   Loss 3.2151   LearningRate 0.0000   Epoch: 19   Global Step: 329160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:54:02,108-Speed 9259.30 samples/sec   Loss 3.3207   LearningRate 0.0000   Epoch: 19   Global Step: 329170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:03,235-Speed 9091.99 samples/sec   Loss 3.3186   LearningRate 0.0000   Epoch: 19   Global Step: 329180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:04,308-Speed 9546.12 samples/sec   Loss 3.2480   LearningRate 0.0000   Epoch: 19   Global Step: 329190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:05,370-Speed 9653.24 samples/sec   Loss 3.2825   LearningRate 0.0000   Epoch: 19   Global Step: 329200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:06,501-Speed 9052.68 samples/sec   Loss 3.3136   LearningRate 0.0000   Epoch: 19   Global Step: 329210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:07,674-Speed 8740.80 samples/sec   Loss 3.3281   LearningRate 0.0000   Epoch: 19   Global Step: 329220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:08,814-Speed 8987.88 samples/sec   Loss 3.3387   LearningRate 0.0000   Epoch: 19   Global Step: 329230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:09,944-Speed 9068.30 samples/sec   Loss 3.2860   LearningRate 0.0000   Epoch: 19   Global Step: 329240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:11,075-Speed 9062.23 samples/sec   Loss 3.1992   LearningRate 0.0000   Epoch: 19   Global Step: 329250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:12,212-Speed 9009.99 samples/sec   Loss 3.3118   LearningRate 0.0000   Epoch: 19   Global Step: 329260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:13,310-Speed 9332.19 samples/sec   Loss 3.3476   LearningRate 0.0000   Epoch: 19   Global Step: 329270   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:54:14,424-Speed 9206.87 samples/sec   Loss 3.3431   LearningRate 0.0000   Epoch: 19   Global Step: 329280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:15,505-Speed 9472.55 samples/sec   Loss 3.2647   LearningRate 0.0000   Epoch: 19   Global Step: 329290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:16,638-Speed 9047.40 samples/sec   Loss 3.2940   LearningRate 0.0000   Epoch: 19   Global Step: 329300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:17,772-Speed 9030.26 samples/sec   Loss 3.2992   LearningRate 0.0000   Epoch: 19   Global Step: 329310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:18,903-Speed 9058.90 samples/sec   Loss 3.2807   LearningRate 0.0000   Epoch: 19   Global Step: 329320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:19,998-Speed 9361.10 samples/sec   Loss 3.2212   LearningRate 0.0000   Epoch: 19   Global Step: 329330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:21,102-Speed 9275.85 samples/sec   Loss 3.3307   LearningRate 0.0000   Epoch: 19   Global Step: 329340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:22,219-Speed 9171.96 samples/sec   Loss 3.1885   LearningRate 0.0000   Epoch: 19   Global Step: 329350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:23,391-Speed 8745.67 samples/sec   Loss 3.3664   LearningRate 0.0000   Epoch: 19   Global Step: 329360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:24,567-Speed 8713.29 samples/sec   Loss 3.2767   LearningRate 0.0000   Epoch: 19   Global Step: 329370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:25,713-Speed 8937.97 samples/sec   Loss 3.2599   LearningRate 0.0000   Epoch: 19   Global Step: 329380   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:54:26,831-Speed 9164.76 samples/sec   Loss 3.2421   LearningRate 0.0000   Epoch: 19   Global Step: 329390   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:54:27,992-Speed 8830.20 samples/sec   Loss 3.3041   LearningRate 0.0000   Epoch: 19   Global Step: 329400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:29,109-Speed 9170.07 samples/sec   Loss 3.2415   LearningRate 0.0000   Epoch: 19   Global Step: 329410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:30,259-Speed 8912.14 samples/sec   Loss 3.2955   LearningRate 0.0000   Epoch: 19   Global Step: 329420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:31,364-Speed 9272.52 samples/sec   Loss 3.3137   LearningRate 0.0000   Epoch: 19   Global Step: 329430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:32,467-Speed 9287.14 samples/sec   Loss 3.4026   LearningRate 0.0000   Epoch: 19   Global Step: 329440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:33,533-Speed 9611.37 samples/sec   Loss 3.3001   LearningRate 0.0000   Epoch: 19   Global Step: 329450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:34,637-Speed 9278.16 samples/sec   Loss 3.2805   LearningRate 0.0000   Epoch: 19   Global Step: 329460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:35,749-Speed 9214.32 samples/sec   Loss 3.2846   LearningRate 0.0000   Epoch: 19   Global Step: 329470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:36,840-Speed 9393.80 samples/sec   Loss 3.2225   LearningRate 0.0000   Epoch: 19   Global Step: 329480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:37,917-Speed 9521.18 samples/sec   Loss 3.3243   LearningRate 0.0000   Epoch: 19   Global Step: 329490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:39,044-Speed 9090.26 samples/sec   Loss 3.3229   LearningRate 0.0000   Epoch: 19   Global Step: 329500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:40,153-Speed 9232.15 samples/sec   Loss 3.2927   LearningRate 0.0000   Epoch: 19   Global Step: 329510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:41,306-Speed 8889.61 samples/sec   Loss 3.3232   LearningRate 0.0000   Epoch: 19   Global Step: 329520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:42,419-Speed 9202.97 samples/sec   Loss 3.2266   LearningRate 0.0000   Epoch: 19   Global Step: 329530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:43,526-Speed 9258.50 samples/sec   Loss 3.3476   LearningRate 0.0000   Epoch: 19   Global Step: 329540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:44,651-Speed 9108.97 samples/sec   Loss 3.2380   LearningRate 0.0000   Epoch: 19   Global Step: 329550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:45,728-Speed 9516.19 samples/sec   Loss 3.3595   LearningRate 0.0000   Epoch: 19   Global Step: 329560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:46,862-Speed 9036.62 samples/sec   Loss 3.2453   LearningRate 0.0000   Epoch: 19   Global Step: 329570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:47,974-Speed 9211.97 samples/sec   Loss 3.2315   LearningRate 0.0000   Epoch: 19   Global Step: 329580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:49,128-Speed 8877.11 samples/sec   Loss 3.3477   LearningRate 0.0000   Epoch: 19   Global Step: 329590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:50,224-Speed 9353.25 samples/sec   Loss 3.2861   LearningRate 0.0000   Epoch: 19   Global Step: 329600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:51,337-Speed 9207.14 samples/sec   Loss 3.2969   LearningRate 0.0000   Epoch: 19   Global Step: 329610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:52,451-Speed 9197.63 samples/sec   Loss 3.2492   LearningRate 0.0000   Epoch: 19   Global Step: 329620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:53,596-Speed 8944.61 samples/sec   Loss 3.3100   LearningRate 0.0000   Epoch: 19   Global Step: 329630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:54,743-Speed 8931.53 samples/sec   Loss 3.2675   LearningRate 0.0000   Epoch: 19   Global Step: 329640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:55,868-Speed 9111.30 samples/sec   Loss 3.3054   LearningRate 0.0000   Epoch: 19   Global Step: 329650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:56,949-Speed 9482.59 samples/sec   Loss 3.3186   LearningRate 0.0000   Epoch: 19   Global Step: 329660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:58,019-Speed 9573.50 samples/sec   Loss 3.2413   LearningRate 0.0000   Epoch: 19   Global Step: 329670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:54:59,128-Speed 9234.26 samples/sec   Loss 3.3360   LearningRate 0.0000   Epoch: 19   Global Step: 329680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:00,306-Speed 8701.65 samples/sec   Loss 3.3115   LearningRate 0.0000   Epoch: 19   Global Step: 329690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:01,432-Speed 9097.20 samples/sec   Loss 3.2021   LearningRate 0.0000   Epoch: 19   Global Step: 329700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:02,526-Speed 9362.54 samples/sec   Loss 3.3088   LearningRate 0.0000   Epoch: 19   Global Step: 329710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:03,628-Speed 9299.16 samples/sec   Loss 3.2987   LearningRate 0.0000   Epoch: 19   Global Step: 329720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:04,809-Speed 8679.74 samples/sec   Loss 3.2750   LearningRate 0.0000   Epoch: 19   Global Step: 329730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:05,925-Speed 9175.58 samples/sec   Loss 3.3146   LearningRate 0.0000   Epoch: 19   Global Step: 329740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:07,048-Speed 9134.62 samples/sec   Loss 3.2776   LearningRate 0.0000   Epoch: 19   Global Step: 329750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:08,173-Speed 9108.14 samples/sec   Loss 3.2980   LearningRate 0.0000   Epoch: 19   Global Step: 329760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:09,322-Speed 8918.39 samples/sec   Loss 3.2910   LearningRate 0.0000   Epoch: 19   Global Step: 329770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:10,426-Speed 9284.70 samples/sec   Loss 3.2964   LearningRate 0.0000   Epoch: 19   Global Step: 329780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:11,506-Speed 9484.78 samples/sec   Loss 3.2640   LearningRate 0.0000   Epoch: 19   Global Step: 329790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:12,600-Speed 9362.37 samples/sec   Loss 3.2601   LearningRate 0.0000   Epoch: 19   Global Step: 329800   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:55:13,782-Speed 8673.05 samples/sec   Loss 3.2125   LearningRate 0.0000   Epoch: 19   Global Step: 329810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:14,895-Speed 9202.02 samples/sec   Loss 3.3274   LearningRate 0.0000   Epoch: 19   Global Step: 329820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:15,985-Speed 9406.76 samples/sec   Loss 3.3011   LearningRate 0.0000   Epoch: 19   Global Step: 329830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:17,086-Speed 9299.17 samples/sec   Loss 3.2071   LearningRate 0.0000   Epoch: 19   Global Step: 329840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:18,177-Speed 9395.10 samples/sec   Loss 3.2528   LearningRate 0.0000   Epoch: 19   Global Step: 329850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:19,256-Speed 9493.72 samples/sec   Loss 3.2770   LearningRate 0.0000   Epoch: 19   Global Step: 329860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:20,330-Speed 9540.61 samples/sec   Loss 3.3131   LearningRate 0.0000   Epoch: 19   Global Step: 329870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:21,441-Speed 9221.20 samples/sec   Loss 3.3300   LearningRate 0.0000   Epoch: 19   Global Step: 329880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:22,551-Speed 9228.63 samples/sec   Loss 3.2315   LearningRate 0.0000   Epoch: 19   Global Step: 329890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:23,693-Speed 8977.10 samples/sec   Loss 3.3650   LearningRate 0.0000   Epoch: 19   Global Step: 329900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:24,829-Speed 9015.36 samples/sec   Loss 3.3742   LearningRate 0.0000   Epoch: 19   Global Step: 329910   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:55:25,924-Speed 9365.16 samples/sec   Loss 3.2167   LearningRate 0.0000   Epoch: 19   Global Step: 329920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:27,004-Speed 9482.97 samples/sec   Loss 3.2089   LearningRate 0.0000   Epoch: 19   Global Step: 329930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:28,105-Speed 9309.49 samples/sec   Loss 3.3508   LearningRate 0.0000   Epoch: 19   Global Step: 329940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:29,216-Speed 9217.26 samples/sec   Loss 3.3291   LearningRate 0.0000   Epoch: 19   Global Step: 329950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:30,319-Speed 9290.13 samples/sec   Loss 3.3192   LearningRate 0.0000   Epoch: 19   Global Step: 329960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:31,457-Speed 9010.32 samples/sec   Loss 3.3037   LearningRate 0.0000   Epoch: 19   Global Step: 329970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:32,581-Speed 9115.76 samples/sec   Loss 3.2742   LearningRate 0.0000   Epoch: 19   Global Step: 329980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:33,736-Speed 8866.35 samples/sec   Loss 3.2220   LearningRate 0.0000   Epoch: 19   Global Step: 329990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:34,881-Speed 8950.99 samples/sec   Loss 3.2903   LearningRate 0.0000   Epoch: 19   Global Step: 330000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:55:56,753-[lfw][330000]XNorm: 6.533692
Training: 2022-04-12 00:55:56,754-[lfw][330000]Accuracy-Flip: 0.99633+-0.00287
Training: 2022-04-12 00:55:56,754-[lfw][330000]Accuracy-Highest: 0.99750
Training: 2022-04-12 00:56:22,045-[cfp_fp][330000]XNorm: 5.707386
Training: 2022-04-12 00:56:22,046-[cfp_fp][330000]Accuracy-Flip: 0.97386+-0.00888
Training: 2022-04-12 00:56:22,046-[cfp_fp][330000]Accuracy-Highest: 0.97543
Training: 2022-04-12 00:56:43,894-[agedb_30][330000]XNorm: 6.365573
Training: 2022-04-12 00:56:43,894-[agedb_30][330000]Accuracy-Flip: 0.97400+-0.00720
Training: 2022-04-12 00:56:43,894-[agedb_30][330000]Accuracy-Highest: 0.97417
Training: 2022-04-12 00:56:45,053-Speed 145.93 samples/sec   Loss 3.2732   LearningRate 0.0000   Epoch: 19   Global Step: 330010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:46,187-Speed 9031.81 samples/sec   Loss 3.3010   LearningRate 0.0000   Epoch: 19   Global Step: 330020   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:56:47,310-Speed 9119.94 samples/sec   Loss 3.2775   LearningRate 0.0000   Epoch: 19   Global Step: 330030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:48,388-Speed 9513.77 samples/sec   Loss 3.2932   LearningRate 0.0000   Epoch: 19   Global Step: 330040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:49,535-Speed 8929.03 samples/sec   Loss 3.3366   LearningRate 0.0000   Epoch: 19   Global Step: 330050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:50,655-Speed 9152.40 samples/sec   Loss 3.1703   LearningRate 0.0000   Epoch: 19   Global Step: 330060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:51,786-Speed 9057.37 samples/sec   Loss 3.2951   LearningRate 0.0000   Epoch: 19   Global Step: 330070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:52,884-Speed 9330.47 samples/sec   Loss 3.2730   LearningRate 0.0000   Epoch: 19   Global Step: 330080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:54,029-Speed 8946.49 samples/sec   Loss 3.3658   LearningRate 0.0000   Epoch: 19   Global Step: 330090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:55,155-Speed 9099.66 samples/sec   Loss 3.2280   LearningRate 0.0000   Epoch: 19   Global Step: 330100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:56,256-Speed 9306.10 samples/sec   Loss 3.2998   LearningRate 0.0000   Epoch: 19   Global Step: 330110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:57,403-Speed 8933.03 samples/sec   Loss 3.2664   LearningRate 0.0000   Epoch: 19   Global Step: 330120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:58,494-Speed 9391.87 samples/sec   Loss 3.3686   LearningRate 0.0000   Epoch: 19   Global Step: 330130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:56:59,595-Speed 9309.42 samples/sec   Loss 3.2610   LearningRate 0.0000   Epoch: 19   Global Step: 330140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:00,710-Speed 9188.16 samples/sec   Loss 3.3301   LearningRate 0.0000   Epoch: 19   Global Step: 330150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:01,923-Speed 8446.29 samples/sec   Loss 3.3137   LearningRate 0.0000   Epoch: 19   Global Step: 330160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:03,017-Speed 9363.59 samples/sec   Loss 3.2414   LearningRate 0.0000   Epoch: 19   Global Step: 330170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:04,130-Speed 9214.89 samples/sec   Loss 3.2495   LearningRate 0.0000   Epoch: 19   Global Step: 330180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:05,237-Speed 9255.48 samples/sec   Loss 3.2689   LearningRate 0.0000   Epoch: 19   Global Step: 330190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:06,367-Speed 9065.87 samples/sec   Loss 3.3149   LearningRate 0.0000   Epoch: 19   Global Step: 330200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:07,455-Speed 9413.30 samples/sec   Loss 3.2506   LearningRate 0.0000   Epoch: 19   Global Step: 330210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:08,575-Speed 9146.56 samples/sec   Loss 3.3055   LearningRate 0.0000   Epoch: 19   Global Step: 330220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:09,670-Speed 9360.50 samples/sec   Loss 3.2679   LearningRate 0.0000   Epoch: 19   Global Step: 330230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:10,766-Speed 9342.69 samples/sec   Loss 3.2677   LearningRate 0.0000   Epoch: 19   Global Step: 330240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:11,876-Speed 9236.88 samples/sec   Loss 3.2271   LearningRate 0.0000   Epoch: 19   Global Step: 330250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:13,011-Speed 9037.65 samples/sec   Loss 3.2859   LearningRate 0.0000   Epoch: 19   Global Step: 330260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:14,100-Speed 9413.67 samples/sec   Loss 3.3035   LearningRate 0.0000   Epoch: 19   Global Step: 330270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:15,229-Speed 9079.19 samples/sec   Loss 3.2718   LearningRate 0.0000   Epoch: 19   Global Step: 330280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:16,355-Speed 9098.75 samples/sec   Loss 3.2277   LearningRate 0.0000   Epoch: 19   Global Step: 330290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:17,470-Speed 9188.92 samples/sec   Loss 3.2969   LearningRate 0.0000   Epoch: 19   Global Step: 330300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:18,586-Speed 9184.01 samples/sec   Loss 3.3188   LearningRate 0.0000   Epoch: 19   Global Step: 330310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:19,758-Speed 8745.51 samples/sec   Loss 3.3135   LearningRate 0.0000   Epoch: 19   Global Step: 330320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:20,901-Speed 8959.84 samples/sec   Loss 3.2368   LearningRate 0.0000   Epoch: 19   Global Step: 330330   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:57:22,008-Speed 9251.69 samples/sec   Loss 3.3275   LearningRate 0.0000   Epoch: 19   Global Step: 330340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:23,118-Speed 9232.28 samples/sec   Loss 3.3237   LearningRate 0.0000   Epoch: 19   Global Step: 330350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:24,272-Speed 8880.55 samples/sec   Loss 3.2500   LearningRate 0.0000   Epoch: 19   Global Step: 330360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:25,398-Speed 9100.28 samples/sec   Loss 3.2535   LearningRate 0.0000   Epoch: 19   Global Step: 330370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:26,529-Speed 9058.00 samples/sec   Loss 3.2732   LearningRate 0.0000   Epoch: 19   Global Step: 330380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:27,660-Speed 9060.26 samples/sec   Loss 3.3130   LearningRate 0.0000   Epoch: 19   Global Step: 330390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:28,801-Speed 8974.92 samples/sec   Loss 3.3059   LearningRate 0.0000   Epoch: 19   Global Step: 330400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:29,932-Speed 9061.29 samples/sec   Loss 3.2778   LearningRate 0.0000   Epoch: 19   Global Step: 330410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:31,077-Speed 8943.10 samples/sec   Loss 3.2387   LearningRate 0.0000   Epoch: 19   Global Step: 330420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:32,189-Speed 9224.33 samples/sec   Loss 3.4047   LearningRate 0.0000   Epoch: 19   Global Step: 330430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:33,306-Speed 9174.90 samples/sec   Loss 3.2659   LearningRate 0.0000   Epoch: 19   Global Step: 330440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:34,416-Speed 9227.17 samples/sec   Loss 3.2557   LearningRate 0.0000   Epoch: 19   Global Step: 330450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:35,536-Speed 9151.17 samples/sec   Loss 3.3379   LearningRate 0.0000   Epoch: 19   Global Step: 330460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:36,671-Speed 9031.19 samples/sec   Loss 3.3172   LearningRate 0.0000   Epoch: 19   Global Step: 330470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:37,782-Speed 9225.57 samples/sec   Loss 3.2390   LearningRate 0.0000   Epoch: 19   Global Step: 330480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:38,895-Speed 9199.66 samples/sec   Loss 3.2840   LearningRate 0.0000   Epoch: 19   Global Step: 330490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:40,045-Speed 8914.37 samples/sec   Loss 3.3304   LearningRate 0.0000   Epoch: 19   Global Step: 330500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:41,167-Speed 9126.46 samples/sec   Loss 3.2603   LearningRate 0.0000   Epoch: 19   Global Step: 330510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:42,288-Speed 9145.86 samples/sec   Loss 3.2652   LearningRate 0.0000   Epoch: 19   Global Step: 330520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:43,448-Speed 8830.60 samples/sec   Loss 3.3536   LearningRate 0.0000   Epoch: 19   Global Step: 330530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:44,555-Speed 9255.45 samples/sec   Loss 3.2736   LearningRate 0.0000   Epoch: 19   Global Step: 330540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:45,629-Speed 9541.19 samples/sec   Loss 3.3802   LearningRate 0.0000   Epoch: 19   Global Step: 330550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:57:46,759-Speed 9068.47 samples/sec   Loss 3.2640   LearningRate 0.0000   Epoch: 19   Global Step: 330560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:57:47,879-Speed 9152.12 samples/sec   Loss 3.3357   LearningRate 0.0000   Epoch: 19   Global Step: 330570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:57:48,968-Speed 9406.61 samples/sec   Loss 3.3337   LearningRate 0.0000   Epoch: 19   Global Step: 330580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:57:50,108-Speed 8985.84 samples/sec   Loss 3.2721   LearningRate 0.0000   Epoch: 19   Global Step: 330590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:57:51,250-Speed 8977.74 samples/sec   Loss 3.3116   LearningRate 0.0000   Epoch: 19   Global Step: 330600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:57:52,413-Speed 8810.51 samples/sec   Loss 3.2657   LearningRate 0.0000   Epoch: 19   Global Step: 330610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:57:53,518-Speed 9268.57 samples/sec   Loss 3.2602   LearningRate 0.0000   Epoch: 19   Global Step: 330620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:57:54,647-Speed 9079.60 samples/sec   Loss 3.3163   LearningRate 0.0000   Epoch: 19   Global Step: 330630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:57:55,770-Speed 9121.43 samples/sec   Loss 3.2444   LearningRate 0.0000   Epoch: 19   Global Step: 330640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:57:56,933-Speed 8809.91 samples/sec   Loss 3.3468   LearningRate 0.0000   Epoch: 19   Global Step: 330650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:58,033-Speed 9311.84 samples/sec   Loss 3.2300   LearningRate 0.0000   Epoch: 19   Global Step: 330660   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:57:59,134-Speed 9307.12 samples/sec   Loss 3.2822   LearningRate 0.0000   Epoch: 19   Global Step: 330670   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:00,234-Speed 9318.68 samples/sec   Loss 3.2290   LearningRate 0.0000   Epoch: 19   Global Step: 330680   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:01,336-Speed 9296.71 samples/sec   Loss 3.3152   LearningRate 0.0000   Epoch: 19   Global Step: 330690   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:02,451-Speed 9191.16 samples/sec   Loss 3.2915   LearningRate 0.0000   Epoch: 19   Global Step: 330700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:03,562-Speed 9221.46 samples/sec   Loss 3.2618   LearningRate 0.0000   Epoch: 19   Global Step: 330710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:04,637-Speed 9530.67 samples/sec   Loss 3.3093   LearningRate 0.0000   Epoch: 19   Global Step: 330720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:05,740-Speed 9285.03 samples/sec   Loss 3.2982   LearningRate 0.0000   Epoch: 19   Global Step: 330730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:06,854-Speed 9204.93 samples/sec   Loss 3.3592   LearningRate 0.0000   Epoch: 19   Global Step: 330740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:07,969-Speed 9188.87 samples/sec   Loss 3.3045   LearningRate 0.0000   Epoch: 19   Global Step: 330750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:09,120-Speed 8900.61 samples/sec   Loss 3.1943   LearningRate 0.0000   Epoch: 19   Global Step: 330760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:10,237-Speed 9166.74 samples/sec   Loss 3.2696   LearningRate 0.0000   Epoch: 19   Global Step: 330770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:11,338-Speed 9318.36 samples/sec   Loss 3.2374   LearningRate 0.0000   Epoch: 19   Global Step: 330780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:12,437-Speed 9321.08 samples/sec   Loss 3.3527   LearningRate 0.0000   Epoch: 19   Global Step: 330790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:13,568-Speed 9062.33 samples/sec   Loss 3.2660   LearningRate 0.0000   Epoch: 19   Global Step: 330800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:14,709-Speed 8976.41 samples/sec   Loss 3.2622   LearningRate 0.0000   Epoch: 19   Global Step: 330810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:15,850-Speed 8980.65 samples/sec   Loss 3.3335   LearningRate 0.0000   Epoch: 19   Global Step: 330820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:16,932-Speed 9468.98 samples/sec   Loss 3.3102   LearningRate 0.0000   Epoch: 19   Global Step: 330830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:18,058-Speed 9100.10 samples/sec   Loss 3.2603   LearningRate 0.0000   Epoch: 19   Global Step: 330840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:19,171-Speed 9211.97 samples/sec   Loss 3.2554   LearningRate 0.0000   Epoch: 19   Global Step: 330850   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:58:20,260-Speed 9409.84 samples/sec   Loss 3.2964   LearningRate 0.0000   Epoch: 19   Global Step: 330860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:21,365-Speed 9272.05 samples/sec   Loss 3.3299   LearningRate 0.0000   Epoch: 19   Global Step: 330870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:22,530-Speed 8792.84 samples/sec   Loss 3.2553   LearningRate 0.0000   Epoch: 19   Global Step: 330880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:23,638-Speed 9251.08 samples/sec   Loss 3.2893   LearningRate 0.0000   Epoch: 19   Global Step: 330890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:24,726-Speed 9415.52 samples/sec   Loss 3.2276   LearningRate 0.0000   Epoch: 19   Global Step: 330900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:25,869-Speed 8959.55 samples/sec   Loss 3.3177   LearningRate 0.0000   Epoch: 19   Global Step: 330910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:26,981-Speed 9216.99 samples/sec   Loss 3.3368   LearningRate 0.0000   Epoch: 19   Global Step: 330920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:28,085-Speed 9282.84 samples/sec   Loss 3.3321   LearningRate 0.0000   Epoch: 19   Global Step: 330930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:29,246-Speed 8825.11 samples/sec   Loss 3.2645   LearningRate 0.0000   Epoch: 19   Global Step: 330940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:30,388-Speed 8976.21 samples/sec   Loss 3.2490   LearningRate 0.0000   Epoch: 19   Global Step: 330950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:31,479-Speed 9393.09 samples/sec   Loss 3.3469   LearningRate 0.0000   Epoch: 19   Global Step: 330960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:32,591-Speed 9207.80 samples/sec   Loss 3.3647   LearningRate 0.0000   Epoch: 19   Global Step: 330970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:33,682-Speed 9390.11 samples/sec   Loss 3.2961   LearningRate 0.0000   Epoch: 19   Global Step: 330980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:34,772-Speed 9402.49 samples/sec   Loss 3.2691   LearningRate 0.0000   Epoch: 19   Global Step: 330990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:35,911-Speed 8991.57 samples/sec   Loss 3.3106   LearningRate 0.0000   Epoch: 19   Global Step: 331000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:37,063-Speed 8895.00 samples/sec   Loss 3.2982   LearningRate 0.0000   Epoch: 19   Global Step: 331010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:38,253-Speed 8612.91 samples/sec   Loss 3.3132   LearningRate 0.0000   Epoch: 19   Global Step: 331020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:39,392-Speed 8988.96 samples/sec   Loss 3.2377   LearningRate 0.0000   Epoch: 19   Global Step: 331030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:40,508-Speed 9182.41 samples/sec   Loss 3.2922   LearningRate 0.0000   Epoch: 19   Global Step: 331040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:41,607-Speed 9325.17 samples/sec   Loss 3.3340   LearningRate 0.0000   Epoch: 19   Global Step: 331050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:42,743-Speed 9018.57 samples/sec   Loss 3.3289   LearningRate 0.0000   Epoch: 19   Global Step: 331060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:43,927-Speed 8656.54 samples/sec   Loss 3.3072   LearningRate 0.0000   Epoch: 19   Global Step: 331070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:45,028-Speed 9305.96 samples/sec   Loss 3.2838   LearningRate 0.0000   Epoch: 19   Global Step: 331080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:46,138-Speed 9228.69 samples/sec   Loss 3.3202   LearningRate 0.0000   Epoch: 19   Global Step: 331090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:47,227-Speed 9411.98 samples/sec   Loss 3.2817   LearningRate 0.0000   Epoch: 19   Global Step: 331100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:48,323-Speed 9349.18 samples/sec   Loss 3.2964   LearningRate 0.0000   Epoch: 19   Global Step: 331110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:49,425-Speed 9298.98 samples/sec   Loss 3.2924   LearningRate 0.0000   Epoch: 19   Global Step: 331120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:50,514-Speed 9405.57 samples/sec   Loss 3.2292   LearningRate 0.0000   Epoch: 19   Global Step: 331130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:51,634-Speed 9147.01 samples/sec   Loss 3.2832   LearningRate 0.0000   Epoch: 19   Global Step: 331140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:52,775-Speed 8982.59 samples/sec   Loss 3.3174   LearningRate 0.0000   Epoch: 19   Global Step: 331150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:53,892-Speed 9173.29 samples/sec   Loss 3.3007   LearningRate 0.0000   Epoch: 19   Global Step: 331160   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:58:55,023-Speed 9062.00 samples/sec   Loss 3.2626   LearningRate 0.0000   Epoch: 19   Global Step: 331170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:56,148-Speed 9102.22 samples/sec   Loss 3.2447   LearningRate 0.0000   Epoch: 19   Global Step: 331180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:57,316-Speed 8771.17 samples/sec   Loss 3.3052   LearningRate 0.0000   Epoch: 19   Global Step: 331190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:58,421-Speed 9272.66 samples/sec   Loss 3.3073   LearningRate 0.0000   Epoch: 19   Global Step: 331200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:58:59,530-Speed 9238.41 samples/sec   Loss 3.2940   LearningRate 0.0000   Epoch: 19   Global Step: 331210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:00,622-Speed 9384.16 samples/sec   Loss 3.2583   LearningRate 0.0000   Epoch: 19   Global Step: 331220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:01,791-Speed 8768.73 samples/sec   Loss 3.2558   LearningRate 0.0000   Epoch: 19   Global Step: 331230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:02,964-Speed 8733.03 samples/sec   Loss 3.2469   LearningRate 0.0000   Epoch: 19   Global Step: 331240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:04,057-Speed 9374.04 samples/sec   Loss 3.2609   LearningRate 0.0000   Epoch: 19   Global Step: 331250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:05,178-Speed 9144.00 samples/sec   Loss 3.2839   LearningRate 0.0000   Epoch: 19   Global Step: 331260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:06,233-Speed 9705.50 samples/sec   Loss 3.2146   LearningRate 0.0000   Epoch: 19   Global Step: 331270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:07,327-Speed 9364.50 samples/sec   Loss 3.2964   LearningRate 0.0000   Epoch: 19   Global Step: 331280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:08,412-Speed 9443.27 samples/sec   Loss 3.2914   LearningRate 0.0000   Epoch: 19   Global Step: 331290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:09,515-Speed 9292.76 samples/sec   Loss 3.2847   LearningRate 0.0000   Epoch: 19   Global Step: 331300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:10,591-Speed 9520.06 samples/sec   Loss 3.3257   LearningRate 0.0000   Epoch: 19   Global Step: 331310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:11,675-Speed 9450.08 samples/sec   Loss 3.2313   LearningRate 0.0000   Epoch: 19   Global Step: 331320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:12,835-Speed 8835.80 samples/sec   Loss 3.2885   LearningRate 0.0000   Epoch: 19   Global Step: 331330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:13,936-Speed 9305.85 samples/sec   Loss 3.3241   LearningRate 0.0000   Epoch: 19   Global Step: 331340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:15,075-Speed 9001.98 samples/sec   Loss 3.2973   LearningRate 0.0000   Epoch: 19   Global Step: 331350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:16,220-Speed 8942.31 samples/sec   Loss 3.2913   LearningRate 0.0000   Epoch: 19   Global Step: 331360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:17,387-Speed 8779.49 samples/sec   Loss 3.3207   LearningRate 0.0000   Epoch: 19   Global Step: 331370   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 00:59:18,505-Speed 9165.87 samples/sec   Loss 3.3194   LearningRate 0.0000   Epoch: 19   Global Step: 331380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:19,684-Speed 8694.93 samples/sec   Loss 3.2205   LearningRate 0.0000   Epoch: 19   Global Step: 331390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:20,795-Speed 9222.39 samples/sec   Loss 3.2846   LearningRate 0.0000   Epoch: 19   Global Step: 331400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:21,909-Speed 9202.95 samples/sec   Loss 3.2861   LearningRate 0.0000   Epoch: 19   Global Step: 331410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:22,999-Speed 9397.88 samples/sec   Loss 3.3423   LearningRate 0.0000   Epoch: 19   Global Step: 331420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:24,113-Speed 9197.95 samples/sec   Loss 3.3001   LearningRate 0.0000   Epoch: 19   Global Step: 331430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:25,250-Speed 9010.33 samples/sec   Loss 3.2263   LearningRate 0.0000   Epoch: 19   Global Step: 331440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:26,381-Speed 9055.10 samples/sec   Loss 3.3192   LearningRate 0.0000   Epoch: 19   Global Step: 331450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:27,507-Speed 9100.31 samples/sec   Loss 3.2830   LearningRate 0.0000   Epoch: 19   Global Step: 331460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:28,575-Speed 9609.03 samples/sec   Loss 3.3383   LearningRate 0.0000   Epoch: 19   Global Step: 331470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:29,740-Speed 8797.56 samples/sec   Loss 3.3383   LearningRate 0.0000   Epoch: 19   Global Step: 331480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:30,912-Speed 8742.54 samples/sec   Loss 3.3056   LearningRate 0.0000   Epoch: 19   Global Step: 331490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:32,039-Speed 9091.85 samples/sec   Loss 3.2700   LearningRate 0.0000   Epoch: 19   Global Step: 331500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:33,158-Speed 9161.29 samples/sec   Loss 3.1974   LearningRate 0.0000   Epoch: 19   Global Step: 331510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:34,297-Speed 8991.56 samples/sec   Loss 3.2867   LearningRate 0.0000   Epoch: 19   Global Step: 331520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:35,415-Speed 9163.67 samples/sec   Loss 3.2598   LearningRate 0.0000   Epoch: 19   Global Step: 331530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:36,536-Speed 9136.75 samples/sec   Loss 3.3054   LearningRate 0.0000   Epoch: 19   Global Step: 331540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:37,645-Speed 9245.97 samples/sec   Loss 3.2341   LearningRate 0.0000   Epoch: 19   Global Step: 331550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:38,785-Speed 8986.59 samples/sec   Loss 3.2696   LearningRate 0.0000   Epoch: 19   Global Step: 331560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:39,874-Speed 9408.59 samples/sec   Loss 3.2934   LearningRate 0.0000   Epoch: 19   Global Step: 331570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:41,011-Speed 9015.00 samples/sec   Loss 3.3378   LearningRate 0.0000   Epoch: 19   Global Step: 331580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:42,116-Speed 9268.84 samples/sec   Loss 3.2761   LearningRate 0.0000   Epoch: 19   Global Step: 331590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:43,233-Speed 9182.24 samples/sec   Loss 3.3078   LearningRate 0.0000   Epoch: 19   Global Step: 331600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:44,325-Speed 9376.82 samples/sec   Loss 3.2560   LearningRate 0.0000   Epoch: 19   Global Step: 331610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:45,435-Speed 9233.03 samples/sec   Loss 3.2328   LearningRate 0.0000   Epoch: 19   Global Step: 331620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:46,531-Speed 9347.66 samples/sec   Loss 3.2724   LearningRate 0.0000   Epoch: 19   Global Step: 331630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 00:59:47,660-Speed 9072.59 samples/sec   Loss 3.3535   LearningRate 0.0000   Epoch: 19   Global Step: 331640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:48,776-Speed 9182.45 samples/sec   Loss 3.2389   LearningRate 0.0000   Epoch: 19   Global Step: 331650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:49,926-Speed 8917.40 samples/sec   Loss 3.3209   LearningRate 0.0000   Epoch: 19   Global Step: 331660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:51,053-Speed 9088.01 samples/sec   Loss 3.3941   LearningRate 0.0000   Epoch: 19   Global Step: 331670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:52,148-Speed 9362.03 samples/sec   Loss 3.3189   LearningRate 0.0000   Epoch: 19   Global Step: 331680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:53,277-Speed 9071.92 samples/sec   Loss 3.2795   LearningRate 0.0000   Epoch: 19   Global Step: 331690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:54,424-Speed 8935.01 samples/sec   Loss 3.3349   LearningRate 0.0000   Epoch: 19   Global Step: 331700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:55,572-Speed 8920.01 samples/sec   Loss 3.2496   LearningRate 0.0000   Epoch: 19   Global Step: 331710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:56,666-Speed 9369.03 samples/sec   Loss 3.3109   LearningRate 0.0000   Epoch: 19   Global Step: 331720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:57,798-Speed 9049.89 samples/sec   Loss 3.2159   LearningRate 0.0000   Epoch: 19   Global Step: 331730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 00:59:58,898-Speed 9318.00 samples/sec   Loss 3.2732   LearningRate 0.0000   Epoch: 19   Global Step: 331740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:00,009-Speed 9221.25 samples/sec   Loss 3.2577   LearningRate 0.0000   Epoch: 19   Global Step: 331750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:01,132-Speed 9125.76 samples/sec   Loss 3.3672   LearningRate 0.0000   Epoch: 19   Global Step: 331760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:02,270-Speed 9000.42 samples/sec   Loss 3.3409   LearningRate 0.0000   Epoch: 19   Global Step: 331770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:03,408-Speed 9004.84 samples/sec   Loss 3.2712   LearningRate 0.0000   Epoch: 19   Global Step: 331780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:04,508-Speed 9316.06 samples/sec   Loss 3.3494   LearningRate 0.0000   Epoch: 19   Global Step: 331790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:05,589-Speed 9474.91 samples/sec   Loss 3.2931   LearningRate 0.0000   Epoch: 19   Global Step: 331800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:06,716-Speed 9094.73 samples/sec   Loss 3.2987   LearningRate 0.0000   Epoch: 19   Global Step: 331810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:07,852-Speed 9020.51 samples/sec   Loss 3.3217   LearningRate 0.0000   Epoch: 19   Global Step: 331820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:08,952-Speed 9316.53 samples/sec   Loss 3.2872   LearningRate 0.0000   Epoch: 19   Global Step: 331830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:10,011-Speed 9674.77 samples/sec   Loss 3.3330   LearningRate 0.0000   Epoch: 19   Global Step: 331840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:11,162-Speed 8901.42 samples/sec   Loss 3.2757   LearningRate 0.0000   Epoch: 19   Global Step: 331850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:12,286-Speed 9109.67 samples/sec   Loss 3.3024   LearningRate 0.0000   Epoch: 19   Global Step: 331860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:13,414-Speed 9085.36 samples/sec   Loss 3.2944   LearningRate 0.0000   Epoch: 19   Global Step: 331870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:14,550-Speed 9021.50 samples/sec   Loss 3.2739   LearningRate 0.0000   Epoch: 19   Global Step: 331880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:15,679-Speed 9076.49 samples/sec   Loss 3.3015   LearningRate 0.0000   Epoch: 19   Global Step: 331890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:16,794-Speed 9192.98 samples/sec   Loss 3.3226   LearningRate 0.0000   Epoch: 19   Global Step: 331900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:17,876-Speed 9465.58 samples/sec   Loss 3.3142   LearningRate 0.0000   Epoch: 19   Global Step: 331910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:18,965-Speed 9407.88 samples/sec   Loss 3.3203   LearningRate 0.0000   Epoch: 19   Global Step: 331920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:20,062-Speed 9347.46 samples/sec   Loss 3.2620   LearningRate 0.0000   Epoch: 19   Global Step: 331930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:21,143-Speed 9472.61 samples/sec   Loss 3.3353   LearningRate 0.0000   Epoch: 19   Global Step: 331940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:22,240-Speed 9343.95 samples/sec   Loss 3.2416   LearningRate 0.0000   Epoch: 19   Global Step: 331950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:23,380-Speed 8983.13 samples/sec   Loss 3.3194   LearningRate 0.0000   Epoch: 19   Global Step: 331960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:24,525-Speed 8946.37 samples/sec   Loss 3.2807   LearningRate 0.0000   Epoch: 19   Global Step: 331970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:25,666-Speed 8987.65 samples/sec   Loss 3.2978   LearningRate 0.0000   Epoch: 19   Global Step: 331980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:26,795-Speed 9071.90 samples/sec   Loss 3.2595   LearningRate 0.0000   Epoch: 19   Global Step: 331990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:27,892-Speed 9335.05 samples/sec   Loss 3.1670   LearningRate 0.0000   Epoch: 19   Global Step: 332000   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:00:49,671-[lfw][332000]XNorm: 6.504402
Training: 2022-04-12 01:00:49,671-[lfw][332000]Accuracy-Flip: 0.99667+-0.00269
Training: 2022-04-12 01:00:49,672-[lfw][332000]Accuracy-Highest: 0.99750
Training: 2022-04-12 01:01:14,848-[cfp_fp][332000]XNorm: 5.682261
Training: 2022-04-12 01:01:14,849-[cfp_fp][332000]Accuracy-Flip: 0.97443+-0.00812
Training: 2022-04-12 01:01:14,849-[cfp_fp][332000]Accuracy-Highest: 0.97543
Training: 2022-04-12 01:01:36,562-[agedb_30][332000]XNorm: 6.337553
Training: 2022-04-12 01:01:36,563-[agedb_30][332000]Accuracy-Flip: 0.97217+-0.00817
Training: 2022-04-12 01:01:36,563-[agedb_30][332000]Accuracy-Highest: 0.97417
Training: 2022-04-12 01:01:37,701-Speed 146.69 samples/sec   Loss 3.2398   LearningRate 0.0000   Epoch: 19   Global Step: 332010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:38,814-Speed 9204.38 samples/sec   Loss 3.2623   LearningRate 0.0000   Epoch: 19   Global Step: 332020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:39,896-Speed 9475.86 samples/sec   Loss 3.3320   LearningRate 0.0000   Epoch: 19   Global Step: 332030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:40,956-Speed 9659.08 samples/sec   Loss 3.3023   LearningRate 0.0000   Epoch: 19   Global Step: 332040   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:01:42,051-Speed 9360.06 samples/sec   Loss 3.2722   LearningRate 0.0000   Epoch: 19   Global Step: 332050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:43,187-Speed 9020.47 samples/sec   Loss 3.2972   LearningRate 0.0000   Epoch: 19   Global Step: 332060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:44,278-Speed 9395.38 samples/sec   Loss 3.2994   LearningRate 0.0000   Epoch: 19   Global Step: 332070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:45,372-Speed 9358.86 samples/sec   Loss 3.3487   LearningRate 0.0000   Epoch: 19   Global Step: 332080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:46,445-Speed 9547.34 samples/sec   Loss 3.2009   LearningRate 0.0000   Epoch: 19   Global Step: 332090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:47,519-Speed 9544.26 samples/sec   Loss 3.2757   LearningRate 0.0000   Epoch: 19   Global Step: 332100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:48,674-Speed 8871.57 samples/sec   Loss 3.2584   LearningRate 0.0000   Epoch: 19   Global Step: 332110   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:49,774-Speed 9316.79 samples/sec   Loss 3.2400   LearningRate 0.0000   Epoch: 19   Global Step: 332120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:50,936-Speed 8814.05 samples/sec   Loss 3.4490   LearningRate 0.0000   Epoch: 19   Global Step: 332130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:52,031-Speed 9357.46 samples/sec   Loss 3.3160   LearningRate 0.0000   Epoch: 19   Global Step: 332140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:53,106-Speed 9532.36 samples/sec   Loss 3.2285   LearningRate 0.0000   Epoch: 19   Global Step: 332150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:54,188-Speed 9472.39 samples/sec   Loss 3.3444   LearningRate 0.0000   Epoch: 19   Global Step: 332160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:55,289-Speed 9306.57 samples/sec   Loss 3.2415   LearningRate 0.0000   Epoch: 19   Global Step: 332170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:56,428-Speed 8995.18 samples/sec   Loss 3.3190   LearningRate 0.0000   Epoch: 19   Global Step: 332180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:57,580-Speed 8890.87 samples/sec   Loss 3.3323   LearningRate 0.0000   Epoch: 19   Global Step: 332190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:58,691-Speed 9222.63 samples/sec   Loss 3.3825   LearningRate 0.0000   Epoch: 19   Global Step: 332200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:01:59,751-Speed 9666.86 samples/sec   Loss 3.2665   LearningRate 0.0000   Epoch: 19   Global Step: 332210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:00,826-Speed 9527.22 samples/sec   Loss 3.2939   LearningRate 0.0000   Epoch: 19   Global Step: 332220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:01,905-Speed 9500.38 samples/sec   Loss 3.2947   LearningRate 0.0000   Epoch: 19   Global Step: 332230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:03,029-Speed 9114.42 samples/sec   Loss 3.2498   LearningRate 0.0000   Epoch: 19   Global Step: 332240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:04,240-Speed 8462.31 samples/sec   Loss 3.2827   LearningRate 0.0000   Epoch: 19   Global Step: 332250   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:02:05,335-Speed 9352.13 samples/sec   Loss 3.3626   LearningRate 0.0000   Epoch: 19   Global Step: 332260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:06,468-Speed 9047.99 samples/sec   Loss 3.1880   LearningRate 0.0000   Epoch: 19   Global Step: 332270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:07,614-Speed 8938.15 samples/sec   Loss 3.2499   LearningRate 0.0000   Epoch: 19   Global Step: 332280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:08,735-Speed 9144.04 samples/sec   Loss 3.2140   LearningRate 0.0000   Epoch: 19   Global Step: 332290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:09,871-Speed 9020.70 samples/sec   Loss 3.3006   LearningRate 0.0000   Epoch: 19   Global Step: 332300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:11,025-Speed 8879.48 samples/sec   Loss 3.2452   LearningRate 0.0000   Epoch: 19   Global Step: 332310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:12,153-Speed 9084.02 samples/sec   Loss 3.2957   LearningRate 0.0000   Epoch: 19   Global Step: 332320   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:13,263-Speed 9235.26 samples/sec   Loss 3.3151   LearningRate 0.0000   Epoch: 19   Global Step: 332330   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:14,359-Speed 9344.77 samples/sec   Loss 3.3457   LearningRate 0.0000   Epoch: 19   Global Step: 332340   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:15,426-Speed 9599.11 samples/sec   Loss 3.3019   LearningRate 0.0000   Epoch: 19   Global Step: 332350   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:16,518-Speed 9382.08 samples/sec   Loss 3.2592   LearningRate 0.0000   Epoch: 19   Global Step: 332360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:17,657-Speed 8996.56 samples/sec   Loss 3.3222   LearningRate 0.0000   Epoch: 19   Global Step: 332370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:18,772-Speed 9188.42 samples/sec   Loss 3.3869   LearningRate 0.0000   Epoch: 19   Global Step: 332380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:19,859-Speed 9434.80 samples/sec   Loss 3.2653   LearningRate 0.0000   Epoch: 19   Global Step: 332390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:20,975-Speed 9182.19 samples/sec   Loss 3.3734   LearningRate 0.0000   Epoch: 19   Global Step: 332400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:22,110-Speed 9024.23 samples/sec   Loss 3.3564   LearningRate 0.0000   Epoch: 19   Global Step: 332410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:23,226-Speed 9178.39 samples/sec   Loss 3.2562   LearningRate 0.0000   Epoch: 19   Global Step: 332420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:24,341-Speed 9190.98 samples/sec   Loss 3.2419   LearningRate 0.0000   Epoch: 19   Global Step: 332430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:25,427-Speed 9429.18 samples/sec   Loss 3.2983   LearningRate 0.0000   Epoch: 19   Global Step: 332440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:26,526-Speed 9325.85 samples/sec   Loss 3.2735   LearningRate 0.0000   Epoch: 19   Global Step: 332450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:27,655-Speed 9074.17 samples/sec   Loss 3.2766   LearningRate 0.0000   Epoch: 19   Global Step: 332460   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:02:28,748-Speed 9373.42 samples/sec   Loss 3.2481   LearningRate 0.0000   Epoch: 19   Global Step: 332470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:29,880-Speed 9056.07 samples/sec   Loss 3.2899   LearningRate 0.0000   Epoch: 19   Global Step: 332480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:30,990-Speed 9235.77 samples/sec   Loss 3.2881   LearningRate 0.0000   Epoch: 19   Global Step: 332490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:32,100-Speed 9227.31 samples/sec   Loss 3.2490   LearningRate 0.0000   Epoch: 19   Global Step: 332500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:33,232-Speed 9051.37 samples/sec   Loss 3.3209   LearningRate 0.0000   Epoch: 19   Global Step: 332510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:34,366-Speed 9037.02 samples/sec   Loss 3.2712   LearningRate 0.0000   Epoch: 19   Global Step: 332520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:35,445-Speed 9496.35 samples/sec   Loss 3.2425   LearningRate 0.0000   Epoch: 19   Global Step: 332530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:36,520-Speed 9530.30 samples/sec   Loss 3.2136   LearningRate 0.0000   Epoch: 19   Global Step: 332540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:37,700-Speed 8686.42 samples/sec   Loss 3.3362   LearningRate 0.0000   Epoch: 19   Global Step: 332550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:38,799-Speed 9321.79 samples/sec   Loss 3.2810   LearningRate 0.0000   Epoch: 19   Global Step: 332560   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:39,936-Speed 9006.51 samples/sec   Loss 3.3484   LearningRate 0.0000   Epoch: 19   Global Step: 332570   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:41,070-Speed 9040.28 samples/sec   Loss 3.2247   LearningRate 0.0000   Epoch: 19   Global Step: 332580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:42,216-Speed 8936.77 samples/sec   Loss 3.2334   LearningRate 0.0000   Epoch: 19   Global Step: 332590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:43,354-Speed 9009.19 samples/sec   Loss 3.3362   LearningRate 0.0000   Epoch: 19   Global Step: 332600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:02:44,455-Speed 9306.00 samples/sec   Loss 3.3198   LearningRate 0.0000   Epoch: 19   Global Step: 332610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:02:45,549-Speed 9363.66 samples/sec   Loss 3.3067   LearningRate 0.0000   Epoch: 19   Global Step: 332620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:02:46,715-Speed 8789.29 samples/sec   Loss 3.2763   LearningRate 0.0000   Epoch: 19   Global Step: 332630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:02:47,829-Speed 9199.89 samples/sec   Loss 3.2309   LearningRate 0.0000   Epoch: 19   Global Step: 332640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:02:48,986-Speed 8856.67 samples/sec   Loss 3.3417   LearningRate 0.0000   Epoch: 19   Global Step: 332650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:02:50,077-Speed 9393.68 samples/sec   Loss 3.3619   LearningRate 0.0000   Epoch: 19   Global Step: 332660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:02:51,211-Speed 9033.80 samples/sec   Loss 3.2785   LearningRate 0.0000   Epoch: 19   Global Step: 332670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:02:52,323-Speed 9216.28 samples/sec   Loss 3.3413   LearningRate 0.0000   Epoch: 19   Global Step: 332680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:02:53,448-Speed 9102.11 samples/sec   Loss 3.3829   LearningRate 0.0000   Epoch: 19   Global Step: 332690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:02:54,629-Speed 8678.03 samples/sec   Loss 3.2387   LearningRate 0.0000   Epoch: 19   Global Step: 332700   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:55,747-Speed 9163.61 samples/sec   Loss 3.3294   LearningRate 0.0000   Epoch: 19   Global Step: 332710   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:56,874-Speed 9093.23 samples/sec   Loss 3.2517   LearningRate 0.0000   Epoch: 19   Global Step: 332720   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:57,994-Speed 9143.27 samples/sec   Loss 3.2296   LearningRate 0.0000   Epoch: 19   Global Step: 332730   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:02:59,119-Speed 9107.66 samples/sec   Loss 3.2779   LearningRate 0.0000   Epoch: 19   Global Step: 332740   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:00,226-Speed 9254.27 samples/sec   Loss 3.3266   LearningRate 0.0000   Epoch: 19   Global Step: 332750   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:01,337-Speed 9222.98 samples/sec   Loss 3.2672   LearningRate 0.0000   Epoch: 19   Global Step: 332760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:02,474-Speed 9016.50 samples/sec   Loss 3.3241   LearningRate 0.0000   Epoch: 19   Global Step: 332770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:03,597-Speed 9123.94 samples/sec   Loss 3.2949   LearningRate 0.0000   Epoch: 19   Global Step: 332780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:04,688-Speed 9383.87 samples/sec   Loss 3.2754   LearningRate 0.0000   Epoch: 19   Global Step: 332790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:05,795-Speed 9259.98 samples/sec   Loss 3.3176   LearningRate 0.0000   Epoch: 19   Global Step: 332800   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:06,903-Speed 9245.29 samples/sec   Loss 3.3177   LearningRate 0.0000   Epoch: 19   Global Step: 332810   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:08,022-Speed 9161.81 samples/sec   Loss 3.2522   LearningRate 0.0000   Epoch: 19   Global Step: 332820   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:09,141-Speed 9156.49 samples/sec   Loss 3.2922   LearningRate 0.0000   Epoch: 19   Global Step: 332830   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:10,257-Speed 9176.30 samples/sec   Loss 3.2618   LearningRate 0.0000   Epoch: 19   Global Step: 332840   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:11,355-Speed 9331.97 samples/sec   Loss 3.2822   LearningRate 0.0000   Epoch: 19   Global Step: 332850   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:12,482-Speed 9091.97 samples/sec   Loss 3.3070   LearningRate 0.0000   Epoch: 19   Global Step: 332860   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:13,577-Speed 9360.10 samples/sec   Loss 3.3202   LearningRate 0.0000   Epoch: 19   Global Step: 332870   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:14,664-Speed 9426.47 samples/sec   Loss 3.2435   LearningRate 0.0000   Epoch: 19   Global Step: 332880   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:15,770-Speed 9259.38 samples/sec   Loss 3.2090   LearningRate 0.0000   Epoch: 19   Global Step: 332890   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:16,922-Speed 8898.60 samples/sec   Loss 3.3221   LearningRate 0.0000   Epoch: 19   Global Step: 332900   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:18,054-Speed 9054.05 samples/sec   Loss 3.2999   LearningRate 0.0000   Epoch: 19   Global Step: 332910   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:19,171-Speed 9166.67 samples/sec   Loss 3.2145   LearningRate 0.0000   Epoch: 19   Global Step: 332920   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:20,281-Speed 9240.62 samples/sec   Loss 3.3263   LearningRate 0.0000   Epoch: 19   Global Step: 332930   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:21,372-Speed 9396.49 samples/sec   Loss 3.3485   LearningRate 0.0000   Epoch: 19   Global Step: 332940   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:22,516-Speed 8951.95 samples/sec   Loss 3.2957   LearningRate 0.0000   Epoch: 19   Global Step: 332950   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:23,636-Speed 9150.53 samples/sec   Loss 3.2737   LearningRate 0.0000   Epoch: 19   Global Step: 332960   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:24,736-Speed 9314.14 samples/sec   Loss 3.2977   LearningRate 0.0000   Epoch: 19   Global Step: 332970   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:25,840-Speed 9284.85 samples/sec   Loss 3.2908   LearningRate 0.0000   Epoch: 19   Global Step: 332980   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:26,956-Speed 9181.43 samples/sec   Loss 3.2743   LearningRate 0.0000   Epoch: 19   Global Step: 332990   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:28,067-Speed 9217.35 samples/sec   Loss 3.2534   LearningRate 0.0000   Epoch: 19   Global Step: 333000   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:03:29,198-Speed 9056.57 samples/sec   Loss 3.3077   LearningRate 0.0000   Epoch: 19   Global Step: 333010   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:30,294-Speed 9347.40 samples/sec   Loss 3.2955   LearningRate 0.0000   Epoch: 19   Global Step: 333020   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:31,398-Speed 9283.64 samples/sec   Loss 3.2495   LearningRate 0.0000   Epoch: 19   Global Step: 333030   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:32,543-Speed 8952.85 samples/sec   Loss 3.3486   LearningRate 0.0000   Epoch: 19   Global Step: 333040   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:33,665-Speed 9132.68 samples/sec   Loss 3.3443   LearningRate 0.0000   Epoch: 19   Global Step: 333050   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:34,770-Speed 9268.18 samples/sec   Loss 3.3299   LearningRate 0.0000   Epoch: 19   Global Step: 333060   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:35,879-Speed 9240.24 samples/sec   Loss 3.3149   LearningRate 0.0000   Epoch: 19   Global Step: 333070   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:36,994-Speed 9189.17 samples/sec   Loss 3.2992   LearningRate 0.0000   Epoch: 19   Global Step: 333080   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:38,118-Speed 9114.24 samples/sec   Loss 3.2788   LearningRate 0.0000   Epoch: 19   Global Step: 333090   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:39,237-Speed 9159.67 samples/sec   Loss 3.2914   LearningRate 0.0000   Epoch: 19   Global Step: 333100   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:40,369-Speed 9052.93 samples/sec   Loss 3.2200   LearningRate 0.0000   Epoch: 19   Global Step: 333110   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:03:41,473-Speed 9277.23 samples/sec   Loss 3.2511   LearningRate 0.0000   Epoch: 19   Global Step: 333120   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:42,594-Speed 9140.19 samples/sec   Loss 3.2673   LearningRate 0.0000   Epoch: 19   Global Step: 333130   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:43,771-Speed 8705.69 samples/sec   Loss 3.3193   LearningRate 0.0000   Epoch: 19   Global Step: 333140   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:44,848-Speed 9521.72 samples/sec   Loss 3.2532   LearningRate 0.0000   Epoch: 19   Global Step: 333150   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:45,954-Speed 9263.67 samples/sec   Loss 3.3166   LearningRate 0.0000   Epoch: 19   Global Step: 333160   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:47,101-Speed 8932.70 samples/sec   Loss 3.3154   LearningRate 0.0000   Epoch: 19   Global Step: 333170   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:48,215-Speed 9196.33 samples/sec   Loss 3.3406   LearningRate 0.0000   Epoch: 19   Global Step: 333180   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:49,313-Speed 9328.58 samples/sec   Loss 3.2252   LearningRate 0.0000   Epoch: 19   Global Step: 333190   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:50,457-Speed 8960.06 samples/sec   Loss 3.2860   LearningRate 0.0000   Epoch: 19   Global Step: 333200   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:51,618-Speed 8821.04 samples/sec   Loss 3.3256   LearningRate 0.0000   Epoch: 19   Global Step: 333210   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:52,743-Speed 9111.43 samples/sec   Loss 3.3692   LearningRate 0.0000   Epoch: 19   Global Step: 333220   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:53,889-Speed 8940.65 samples/sec   Loss 3.3723   LearningRate 0.0000   Epoch: 19   Global Step: 333230   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:55,027-Speed 9000.67 samples/sec   Loss 3.3203   LearningRate 0.0000   Epoch: 19   Global Step: 333240   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:56,131-Speed 9282.29 samples/sec   Loss 3.3050   LearningRate 0.0000   Epoch: 19   Global Step: 333250   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:57,183-Speed 9738.08 samples/sec   Loss 3.2314   LearningRate 0.0000   Epoch: 19   Global Step: 333260   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:58,277-Speed 9361.40 samples/sec   Loss 3.2804   LearningRate 0.0000   Epoch: 19   Global Step: 333270   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:03:59,403-Speed 9104.41 samples/sec   Loss 3.2841   LearningRate 0.0000   Epoch: 19   Global Step: 333280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:00,544-Speed 8978.32 samples/sec   Loss 3.2884   LearningRate 0.0000   Epoch: 19   Global Step: 333290   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:01,680-Speed 9017.31 samples/sec   Loss 3.2915   LearningRate 0.0000   Epoch: 19   Global Step: 333300   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:02,813-Speed 9044.78 samples/sec   Loss 3.3368   LearningRate 0.0000   Epoch: 19   Global Step: 333310   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:03,936-Speed 9126.99 samples/sec   Loss 3.2673   LearningRate 0.0000   Epoch: 19   Global Step: 333320   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:04:05,070-Speed 9034.74 samples/sec   Loss 3.3377   LearningRate 0.0000   Epoch: 19   Global Step: 333330   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:04:06,143-Speed 9546.94 samples/sec   Loss 3.2699   LearningRate 0.0000   Epoch: 19   Global Step: 333340   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:04:07,275-Speed 9053.86 samples/sec   Loss 3.3366   LearningRate 0.0000   Epoch: 19   Global Step: 333350   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:04:08,345-Speed 9570.13 samples/sec   Loss 3.2887   LearningRate 0.0000   Epoch: 19   Global Step: 333360   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:09,461-Speed 9183.26 samples/sec   Loss 3.2461   LearningRate 0.0000   Epoch: 19   Global Step: 333370   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:10,556-Speed 9356.61 samples/sec   Loss 3.2043   LearningRate 0.0000   Epoch: 19   Global Step: 333380   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:11,650-Speed 9369.74 samples/sec   Loss 3.2621   LearningRate 0.0000   Epoch: 19   Global Step: 333390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:12,764-Speed 9198.65 samples/sec   Loss 3.3158   LearningRate 0.0000   Epoch: 19   Global Step: 333400   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:13,924-Speed 8830.33 samples/sec   Loss 3.3329   LearningRate 0.0000   Epoch: 19   Global Step: 333410   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:15,057-Speed 9041.48 samples/sec   Loss 3.2689   LearningRate 0.0000   Epoch: 19   Global Step: 333420   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:16,183-Speed 9102.90 samples/sec   Loss 3.3077   LearningRate 0.0000   Epoch: 19   Global Step: 333430   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:17,244-Speed 9655.73 samples/sec   Loss 3.3684   LearningRate 0.0000   Epoch: 19   Global Step: 333440   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:18,387-Speed 8964.92 samples/sec   Loss 3.2889   LearningRate 0.0000   Epoch: 19   Global Step: 333450   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:19,498-Speed 9221.09 samples/sec   Loss 3.2848   LearningRate 0.0000   Epoch: 19   Global Step: 333460   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:20,611-Speed 9211.95 samples/sec   Loss 3.3271   LearningRate 0.0000   Epoch: 19   Global Step: 333470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:21,727-Speed 9184.47 samples/sec   Loss 3.3543   LearningRate 0.0000   Epoch: 19   Global Step: 333480   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:22,831-Speed 9273.34 samples/sec   Loss 3.3069   LearningRate 0.0000   Epoch: 19   Global Step: 333490   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:23,993-Speed 8817.53 samples/sec   Loss 3.2281   LearningRate 0.0000   Epoch: 19   Global Step: 333500   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:25,138-Speed 8950.32 samples/sec   Loss 3.3200   LearningRate 0.0000   Epoch: 19   Global Step: 333510   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:26,246-Speed 9247.70 samples/sec   Loss 3.3886   LearningRate 0.0000   Epoch: 19   Global Step: 333520   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:27,374-Speed 9088.87 samples/sec   Loss 3.3948   LearningRate 0.0000   Epoch: 19   Global Step: 333530   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:28,498-Speed 9115.83 samples/sec   Loss 3.3219   LearningRate 0.0000   Epoch: 19   Global Step: 333540   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:29,605-Speed 9255.00 samples/sec   Loss 3.3117   LearningRate 0.0000   Epoch: 19   Global Step: 333550   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:30,753-Speed 8920.91 samples/sec   Loss 3.3055   LearningRate 0.0000   Epoch: 19   Global Step: 333560   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:04:31,872-Speed 9157.95 samples/sec   Loss 3.3396   LearningRate 0.0000   Epoch: 19   Global Step: 333570   Fp16 Grad Scale: 131072   Required: 0 hours
Training: 2022-04-12 01:04:33,002-Speed 9069.47 samples/sec   Loss 3.2137   LearningRate 0.0000   Epoch: 19   Global Step: 333580   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:34,108-Speed 9261.34 samples/sec   Loss 3.2862   LearningRate 0.0000   Epoch: 19   Global Step: 333590   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:35,207-Speed 9328.72 samples/sec   Loss 3.3368   LearningRate 0.0000   Epoch: 19   Global Step: 333600   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:36,335-Speed 9075.87 samples/sec   Loss 3.3406   LearningRate 0.0000   Epoch: 19   Global Step: 333610   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:37,453-Speed 9171.66 samples/sec   Loss 3.3134   LearningRate 0.0000   Epoch: 19   Global Step: 333620   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:38,617-Speed 8797.32 samples/sec   Loss 3.3383   LearningRate 0.0000   Epoch: 19   Global Step: 333630   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:39,700-Speed 9463.57 samples/sec   Loss 3.3868   LearningRate 0.0000   Epoch: 19   Global Step: 333640   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:40,816-Speed 9184.25 samples/sec   Loss 3.2512   LearningRate 0.0000   Epoch: 19   Global Step: 333650   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:41,955-Speed 8989.89 samples/sec   Loss 3.2848   LearningRate 0.0000   Epoch: 19   Global Step: 333660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:43,122-Speed 8788.78 samples/sec   Loss 3.2724   LearningRate 0.0000   Epoch: 19   Global Step: 333670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:44,317-Speed 8575.12 samples/sec   Loss 3.2629   LearningRate 0.0000   Epoch: 19   Global Step: 333680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:45,422-Speed 9271.60 samples/sec   Loss 3.2388   LearningRate 0.0000   Epoch: 19   Global Step: 333690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:46,523-Speed 9309.43 samples/sec   Loss 3.2668   LearningRate 0.0000   Epoch: 19   Global Step: 333700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:47,655-Speed 9050.75 samples/sec   Loss 3.2845   LearningRate 0.0000   Epoch: 19   Global Step: 333710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:48,793-Speed 9002.44 samples/sec   Loss 3.3189   LearningRate 0.0000   Epoch: 19   Global Step: 333720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:49,912-Speed 9162.52 samples/sec   Loss 3.3094   LearningRate 0.0000   Epoch: 19   Global Step: 333730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:51,000-Speed 9421.08 samples/sec   Loss 3.3412   LearningRate 0.0000   Epoch: 19   Global Step: 333740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:52,134-Speed 9033.78 samples/sec   Loss 3.2917   LearningRate 0.0000   Epoch: 19   Global Step: 333750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:53,239-Speed 9268.22 samples/sec   Loss 3.2281   LearningRate 0.0000   Epoch: 19   Global Step: 333760   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:54,346-Speed 9258.97 samples/sec   Loss 3.3730   LearningRate 0.0000   Epoch: 19   Global Step: 333770   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:55,460-Speed 9198.14 samples/sec   Loss 3.2675   LearningRate 0.0000   Epoch: 19   Global Step: 333780   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:56,617-Speed 8851.07 samples/sec   Loss 3.3304   LearningRate 0.0000   Epoch: 19   Global Step: 333790   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-04-12 01:04:57,693-Speed 9530.24 samples/sec   Loss 3.3148   LearningRate 0.0000   Epoch: 19   Global Step: 333800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:04:59,087-Speed 7350.47 samples/sec   Loss 3.2615   LearningRate 0.0000   Epoch: 19   Global Step: 333810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-12 01:05:00,121-Speed 9903.82 samples/sec   Loss 3.2785   LearningRate 0.0000   Epoch: 19   Global Step: 333820   Fp16 Grad Scale: 32768   Required: -0 hours